[FLASH-USERS] Problems running at higher levels of refinement.

Klaus Weide klaus at flash.uchicago.edu
Mon Apr 24 13:17:40 EDT 2017


On Mon, 24 Apr 2017, Alexander Sheardown wrote:

> We appear to have a workaround for the MPI forking problem: reducing 
> the number of CPUs per node has solved the problem so far. The 
> only issue now is that the simulation will stop at some point and give this 
> error in the output file:
.....
> #3  0x4595CD in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356
> #4  0x47F686 in amr_1blk_guardcell_srl_ at amr_1blk_guardcell_srl.F90:753
> #5  0x59DA10 in amr_1blk_guardcell_ at mpi_amr_1blk_guardcell.F90:743
> #6  0x5F7273 in amr_guardcell_ at mpi_amr_guardcell.F90:301
> #7  0x41CC9A in grid_fillguardcells_ at Grid_fillGuardCells.F90:460
> #8  0x56FC61 in hy_uhd_unsplit_ at hy_uhd_unsplit.F90:296
> #9  0x437645 in hydro_ at Hydro.F90:67
> #10  0x409FC1 in driver_evolveflash_ at Driver_evolveFlash.F90:290
> #11  0x404CB6 in flash at Flash.F90:51
> #12  0x7F57EB37EB34
> 
> There seems to be some issue transferring guard cell information between 
> blocks. I had come across this 
> http://flash.uchicago.edu/pipermail/flash-users/2015-February/001637.html 

You could try to extend that approach by using a lower optimization level 
(-O0) for other files as well (for example, all files that appear in the 
backtrace above). Or just compile everything that way: run setup with 
-debug; the exact meaning of that depends on your Makefile.h.
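A minimal sketch of the per-file approach, in Python: it pulls the source 
file names out of a gfortran-style backtrace like the one above and prints 
GNU make target-specific variable lines that append -O0 for each 
corresponding object file. It assumes (hypothetically; check your own 
Makefile and Makefile.h) that foo.F90 compiles to foo.o and that the 
Fortran compile rule passes $(FFLAGS) to the compiler. The function names 
are made up for illustration.

```python
import re

# Matches gfortran-style backtrace frames such as
#   "#3  0x4595CD in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356"
# and captures the source file name ("amr_1blk_cc_cp_remote.F90").
# Frames without a source location (e.g. "#12  0x7F57EB37EB34") are skipped.
FRAME_RE = re.compile(r"\bat\s+([\w.]+\.[Ff]90):\d+")


def files_in_backtrace(text):
    """Return the unique Fortran source files named in a backtrace, in order."""
    seen = []
    for match in FRAME_RE.finditer(text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


def makefile_overrides(files, flag="-O0"):
    """Build GNU make target-specific lines that append `flag` to FFLAGS.

    Assumption (hypothetical, verify against your Makefile): the object for
    foo.F90 is foo.o, and FFLAGS is what the compile rule hands the compiler.
    """
    lines = []
    for name in files:
        stem = name.rsplit(".", 1)[0]  # "foo.F90" -> "foo"
        lines.append(f"{stem}.o: FFLAGS += {flag}")
    return "\n".join(lines)


if __name__ == "__main__":
    backtrace = """
> #3  0x4595CD in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356
> #4  0x47F686 in amr_1blk_guardcell_srl_ at amr_1blk_guardcell_srl.F90:753
> #12  0x7F57EB37EB34
"""
    print(makefile_overrides(files_in_backtrace(backtrace)))
```

The printed lines could then be appended to the end of the Makefile in 
the object directory. GCC uses the last -O option it sees, so an appended 
-O0 should override an earlier -O2 or -O3, unless another variable later 
on the compile command line adds its own -O flag.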

> which details an error with gcc 4.9.1 (although I am now using 
> openmpi/gcc/1.10.5)

This does not show the compiler version: 1.10.5 is (probably) the 
version of the OpenMPI library you are using. You may well be using
gcc 4.9. Run 'mpif90 -V' (or 'mpif90 --version' with GNU compilers) 
for version info.

> when filling guard cells and details a work around by 
> adding a few lines to the end of the Makefile which I have added. This 
> is what I have been using. Interestingly, if I don't add these lines to 
> the end of the Makefile the simulation will stop straight away after 
> producing the initial plot file. 

It would be interesting to know what error messages are produced.

> However, if I include the lines, then the 
> simulation will run a lot further than the initial plot file but will 
> eventually stop with the error shown above.

> Have you seen any related issue like this in FLASH before with problems transferring guard cell info?

The cases we know of appear to be consistent with the hypothesis that 
some compilers (or specific compiler versions) introduce errors when 
optimizing.

Klaus


