<div dir="ltr"><div><div><div>Hi Rukmani,<br><br></div>I was able to get past the initial hang by reducing MAXBLOCKS. I normally use 500, but reduced it to 400 and then 300 to get through initialization. The problem now occurs much less frequently but has not been eliminated: it still happens if I refine too much during one step. I have been running with 48 CPUs per node so that each has 2 GB of memory, as on Stampede1. I had been using Stampede2 without problems for a couple of months before this suddenly started happening, so I suspect it coincides with a recent software upgrade there. I opened a ticket with TACC a few weeks ago, but they haven't responded.<br><br></div>Cheers,<br></div>-Jeremy<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 1, 2017 at 2:23 PM, Rukmani Vijayaraghavan <span dir="ltr">&lt;<a href="mailto:rukmani@virginia.edu" target="_blank">rukmani@virginia.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">Hi,<br>

<br>

I've been experiencing similar problems with FLASH4 on Stampede2 at TACC. My jobs hang on the initial block refinement step.<br>

<br>

I wrote to the people at TACC and they recommended running a test problem on 1 node with 16 cores on Stampede 1 and on 1 node with 64 cores on Stampede 2. When I tried this for the basic 2D Sedov test problem, Stampede 2 was 3 times slower than Stampede 1. Based on their response, it looks like FLASH is currently not well optimized for the new KNL nodes of Stampede 2 (<a href="https://portal.tacc.utexas.edu/user-guides/stampede2#bestpractices" rel="noreferrer" target="_blank">https://portal.tacc.utexas.ed<wbr>u/user-guides/stampede2#bestpr<wbr>actices</a>). I'm not sure if anything else is missing -- I tested my simulation setup on Stampede 1 and it worked just fine.<br>

<br>

Are there any other recommendations or fixes? Has anybody else successfully run large FLASH simulations on Stampede 2?<br>

<br>

Thanks,<br>

Rukmani<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hello,<br>

<br>

I have been experiencing a problem with my simulations hanging at random<br>

occasions while processing a refinement step (via gr_updateRefinement). The<br>

issue seems to be related to a mismatch between the processors sending and<br>

receiving block data in mpi_amr_redist_blk.F90. A stack trace shows that<br>

there is a single processor still making a call to send_block_data() while<br>

the rest have moved on to the subsequent MPI_ALLREDUCE() call (see excerpt<br>

below). The deadlock condition is repeatable: e.g. if it happened at step 3<br>

it will keep happening at the same point unless I change the grid structure<br>

by refining more or less. I have some of my own routines that are modifying<br>

the logical structures for marking blocks, as in refine(blockID) = .true.,<br>

but am not attempting to modify the grid through any other means. Is it<br>

possible there is a problem with my MPI setup? I am using FLASH4.4 on the<br>

new Stampede2 at TACC with 8 nodes by 48 processors each for 384 total<br>

processors.<br>

<br>

Thanks!<br>

-Jeremy<br>

<br>

flash4             000000000079E2D6  send_block_data_          269<br>

send_block_data.F90<br>

flash4             000000000070C3E4  amr_redist_blk_           674<br>

mpi_amr_redist_blk.F90<br>

flash4             00000000004FE560  amr_morton_order_         164<br>

amr_morton_order.F90<br>

flash4             000000000071A7DA  amr_refine_derefi         319<br>

mpi_amr_refine_derefine.F90<br>

flash4             00000000005D8B22  gr_updaterefineme         112<br>

gr_updateRefinement.F90<br>

flash4             0000000000457A10  grid_updaterefine          98<br>

Grid_updateRefinement.F90<br>

flash4             0000000000413F74  driver_evolveflas         390<br>

Driver_evolveFlash.F90<br>

flash4             000000000041DBB3  MAIN__                     51<br>

Flash.F90<br>

<br>

flash4             000000000070C50F  amr_redist_blk_           686<br>

mpi_amr_redist_blk.F90<br>

flash4             00000000004FE560  amr_morton_order_         164<br>

amr_morton_order.F90<br>

flash4             000000000071A7DA  amr_refine_derefi         319<br>

mpi_amr_refine_derefine.F90<br>

flash4             00000000005D8B22  gr_updaterefineme         112<br>

gr_updateRefinement.F90<br>

flash4             0000000000457A10  grid_updaterefine          98<br>

Grid_updateRefinement.F90<br>

flash4             0000000000413F74  driver_evolveflas         390<br>

Driver_evolveFlash.F90<br>

flash4             000000000041DBB3  MAIN__                     51<br>

Flash.F90<br>

</blockquote>

<br>

-- <br>

Rukmani Vijayaraghavan<br>

NSF Astronomy &amp; Astrophysics Postdoctoral Fellow<br>

University of Virginia<br>

<a href="mailto:rukmani@virginia.edu" target="_blank">rukmani@virginia.edu</a><br>

<br>

</div></div></blockquote></div><br></div>