[FLASH-USERS] gr_findAllNeghID

Jeremy S Ritter jritter at mail.utexas.edu
Mon Nov 27 20:49:55 EST 2017


Hi John,

Are you using Stampede2 at TACC? I had the exact same problem come up out
of nowhere over the summer. We have a custom set of particle mapping
routines that use almost exactly the same logic as the standard FLASH
routines, with some minor changes in unrelated parts. We could call the
same functions several times successfully while building the grid, then
suddenly one of the calls would cause this crash. Once it happened, it
would always happen at that same step upon restart unless I altered the
grid configuration somehow. I solved the problem by decreasing MAXBLOCKS
and/or changing the number of processors used, so that each processor had
more memory available and the blocks were distributed differently among
the processors. I had been using MAXBLOCKS=500 for years, with the
nodes/processes arranged to get 2 GB per processor, but have recently had
to lower that to 300 to stop this and other seemingly related problems.
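
To give a rough sense of why lowering MAXBLOCKS frees memory, here is a
back-of-the-envelope sketch in Python. It only counts one double-precision
solution array sized by MAXBLOCKS, so it should be read as a lower bound
rather than the real footprint. The function name and the values used for
the number of variables, block size, and guard cells are illustrative
assumptions, not numbers from my setup.

```python
# Rough estimate of the memory taken by one block-structured solution array
# per MPI process. All parameter values below are hypothetical examples.

BYTES_PER_REAL = 8  # double precision


def solution_array_bytes(maxblocks, nvars, nxb, nyb, nzb, nguard):
    """Bytes for a single array holding nvars variables on maxblocks blocks,
    where each block has nxb*nyb*nzb interior cells plus nguard guard cells
    on every side (3D)."""
    cells_per_block = (
        (nxb + 2 * nguard) * (nyb + 2 * nguard) * (nzb + 2 * nguard)
    )
    return maxblocks * nvars * cells_per_block * BYTES_PER_REAL


def to_mib(nbytes):
    """Convert bytes to MiB."""
    return nbytes / (1024 ** 2)


# Hypothetical configuration: 20 variables, 8^3 blocks, 4 guard cells.
config = dict(nvars=20, nxb=8, nyb=8, nzb=8, nguard=4)

for maxblocks in (500, 300):
    mib = to_mib(solution_array_bytes(maxblocks, **config))
    print(f"MAXBLOCKS={maxblocks}: about {mib:.1f} MiB for one solution array")
```

The actual per-process usage will be larger, since other arrays and
communication buffers also scale with the block count, but the linear
dependence on MAXBLOCKS is the point of the estimate.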

After weeks of debugging over the summer, my completely unsubstantiated
guess is that there is occasionally something fishy going on during the
MPI_SEND/RECV calls that exchange information between neighbors, and that
the fishiness comes down to one of the processors running out of memory and
then hanging, crashing, or otherwise not responding during the MPI call. I
have a ticket open with TACC but they haven't contributed any insight.

Cheers,
-Jeremy

On Mon, Nov 27, 2017 at 12:22 PM, John ZuHone <jzuhone at gmail.com> wrote:

> Hi all,
>
> I suspect that this is a question mostly for Klaus, but if anyone here has
> any thoughts I’d appreciate it.
>
> I’m developing a new module in which I need to know the grid information
> about not only the neighbor blocks but also the blocks on the corners. For
> that, I am attempting to use the gr_findAllNeghID function, which is used
> currently by the routines which map particle properties to the mesh. I am
> aware that this function needs to have a sane grid structure, which is
> typically set up during a call to Grid_fillGuardCells. I believe that
> calling the function gr_ensureValidNeighborInfo should do the job, if I’m
> not mistaken.
>
> However, after calling gr_findAllNeghID, I am getting errors like this,
> which indicate that the mesh is not in the proper state:
>
>  Block handle error for target block:          -1 , proc:          -1
>  . My block is:          31 and proc is:          30
>   and we were trying to find neighbors to guard cell region:           1
>            1           1 and my global space is: IAXIS=  -1.00000000000000
>   -1.00000000000000      JAXIS=  -1.00000000000000       -1.00000000000000
>  KAXIS=  -1.00000000000000       -1.00000000000000
>  (1) No block handle.... increase maxblocks_alloc
>  DRIVER_ABORT: Damn 1
>  Block handle error for target block:          -1 , proc:          -1
>  . My block is:         542 and proc is:         540
>   and we were trying to find neighbors to guard cell region:           1
>            1           1 and my global space is: IAXIS=  -1.00000000000000
>   -1.00000000000000      JAXIS=  -1.00000000000000       -1.00000000000000
>  KAXIS=  -1.00000000000000       -1.00000000000000
>  (1) No block handle.... increase maxblocks_alloc
>  Block handle error for target block:          -1 , proc:          -1
>  DRIVER_ABORT: Damn 1
>  . My block is:        1082 and proc is:        1080
>   and we were trying to find neighbors to guard cell region:           1
>            1           1 and my global space is: IAXIS=  -1.00000000000000
>   -1.00000000000000      JAXIS=  -1.00000000000000       -1.00000000000000
>  KAXIS=  -1.00000000000000       -1.00000000000000
>  (1) No block handle.... increase maxblocks_alloc
>  DRIVER_ABORT: Damn 1
>
> I’ve tried adding a call to Grid_fillGuardCells right before the call to
> gr_findAllNeghID (which is overkill for my application, but worth a shot),
> as well as a call to gr_ensureValidNeighborInfo(10), but neither of these
> works.
>
> Does anyone have any idea what I need to do to get this function to work
> properly?
>
> Thanks,
>
> John Z