[FLASH-USERS] 4000+ cpus on Franklin

Seyit Hocuk seyit at astro.rug.nl
Wed Sep 2 05:00:28 EDT 2009


Hi, just a quick note on this.

For me, using -O3 in the compiler options got rid of the unwanted zero 
values (and the weird refinement-boundary fluctuations); building FLASH3 
without -O3 caused lots of problems for us. A sketch of how one might set 
this is below. I hope this is of some use for you.
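
For reference, the optimization flags live in the site-specific Makefile.h 
that the FLASH setup script picks up. A minimal sketch of the relevant 
lines follows; the macro names are the usual FLASH3 Makefile.h ones, but 
the compiler wrapper and the exact flag set here are only illustrative, 
not our actual Franklin settings:

    # Makefile.h fragment (illustrative; adapt to your site and compiler)
    FCOMP        = ftn          # Cray compiler wrapper, as on Franklin
    FFLAGS_OPT   = -c -O3       # optimized build; -O3 is what mattered for us
    FFLAGS_DEBUG = -c -g -O0    # debug build stays un-optimized
    FFLAGS_TEST  = -c -O2

If I remember right, the Makefile generated by a plain setup uses 
FFLAGS_OPT, while setup -debug or -test switches to the other two macros.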

Seyit



James Guillochon wrote:
> Hi Klaus,
>
> I am using FLASH 3.0. The problem occurs immediately after restart, 
> before the first time step. Here's a copy of the log before aborting:
>
> [ 08-31-2009  21:56:16.578 ] [GRID amr_refine_derefine]: initiating refinement
> [GRID amr_refine_derefine] min blks 17    max blks 21    tot blks 76393
> [GRID amr_refine_derefine] min leaf blks 13    max leaf blks 17    tot leaf blks 66844
> [ 08-31-2009  21:56:16.655 ] [GRID amr_refine_derefine]: refinement complete
> [DRIVER_ABORT] Driver_abort() called by PE         4093
> abort_message [flash_convert_cc_hook] Trying to convert non-zero mass-specific variable to per-volume form, but dens is zero!
>
> Here's the standard output:
>
>  file: wdacc_hdf5_chk_00170 opened for restart
>  read_data:  read         76393  blocks.
>  io_readData:  finished reading input file.
>  [Eos_init] Cannot open helm_table.bdat!
>  [Eos_init] Trying old helm_table.dat!
>  Source terms initialized
>  don_dist, don_mass    3359870241.058920        6.5677519229596569E+032
>  [EOS Helmholtz] WARNING!  Mask setting does not speed up Eos Helmholtz calls
>   iteration, no. not moved =             0        76246
>   iteration, no. not moved =             1        42904
>   iteration, no. not moved =             2            0
>  refined: total leaf blocks =         66844
>  refined: total blocks =         76393
> [flash_convert_cc_hook] PE=   4093, ivar=  4, why=2
>  Trying to convert non-zero mass-specific variable to per-volume form, but dens is zero!
> Application 1231431 exit codes: 1
> Application 1231431 exit signals: Killed
> Application 1231431 resources: utime 0, stime 0
>
> The error seems to be happening on only one processor, and it is one of 
> the very last indexed ones: there are 4096 processors in total, and the 
> error occurs on PE 4093.
>
> I've tried enabling "amr_error_checking" to dump some additional 
> information, but with that option enabled I end up with segmentation 
> faults on all processors. Here's the standard output just before crashing:
>
>  mpi_amr_1blk_restrict: after commsetup: pe             3
>  mpi_amr_1blk_restrict: after commsetup: pe             2
>  mpi_amr_1blk_restrict: pe             3  blk            10  ich =
>             1
>  mpi_amr_1blk_restrict: pe             3  blk            10  child =
>             3           11
>  mpi_amr_1blk_restrict: pe             3  blk            10  cnodetype =
>             1
>  mpi_amr_1blk_restrict: pe             3  blk            10  cempty =
>             0
>  mpi_amr_1blk_restrict: pe             2  blk             1  ich =
>             1
>  mpi_amr_1blk_restrict: pe             3  blk            10  calling perm
>  mpi_amr_1blk_restrict: pe             2  blk             1  child =
>             2            2
>  mpi_amr_1blk_restrict: pe             2  blk             1  cnodetype =
>             1
>  mpi_amr_1blk_restrict: pe             2  blk             1  cempty =
>             0
>  mpi_amr_1blk_restrict: pe             2  blk             1  calling perm
>  mpi_amr_1blk_restrict: pe             1  blk             2
>   after reset blockgeom
>  mpi_amr_1blk_restrict: pe             1  blk             2
>   bef reset amr_restrict_unk_fun
>  mpi_amr_1blk_restrict: pe             3  blk            10  exited perm
>  mpi_amr_1blk_restrict: pe             2  blk             1  exited perm
>  mpi_amr_1blk_restrict: pe             3  blk            10  calling blockgeom
>  mpi_amr_1blk_restrict: pe             1  blk             2
>   aft reset amr_restrict_unk_fun
>  mpi_amr_1blk_restrict: pe             2  blk             1  calling blockgeom
>  mpi_amr_1blk_restrict: pe             1  blk             2  aft lcc
>  mpi_amr_1blk_restrict: pe             1  blk             2  ich =
>             3
>  mpi_amr_1blk_restrict: pe             1  blk             2  child =
>             1            5
>  mpi_amr_1blk_restrict: pe             1  blk             2  cnodetype =
>             1
>  mpi_amr_1blk_restrict: pe             1  blk             2  cempty =
>             0
>  mpi_amr_1blk_restrict: pe             1  blk             2  calling perm
> Application 1231925 exit codes: 139
> Application 1231925 exit signals: Killed
> Application 1231925 resources: utime 4885, stime 185
>
> Unfortunately, I wasn't able to pin down what I actually did to fix 
> the problem I had with zero density values a few months ago. I had 
> been trying many different things, including changing bits of the 
> actual Simulation code.
>
> Your help is very much appreciated!
>



