[FLASH-USERS] 4000+ cpus on Franklin
Seyit Hocuk
seyit at astro.rug.nl
Wed Sep 2 05:00:28 EDT 2009
Hi, just a quick note on this.
For me, compiling with -O3 eliminated the spurious zero values (and the
odd fluctuations at refinement boundaries). Building FLASH3 without -O3
caused many problems.
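(For reference, the flags the FLASH build picks up come from your site's
Makefile.h; a hypothetical fragment below shows where -O3 would go. The
compiler name and the other flags are machine-dependent guesses, not a
known-good configuration.)

```makefile
# Hypothetical sites/<your-site>/Makefile.h fragment. The compiler and
# the remaining flags vary by machine; the point is only that -O3
# belongs in FFLAGS_OPT, which the default (optimized) build uses.
FCOMP        = ftn
FFLAGS_OPT   = -c -r8 -i4 -O3
FFLAGS_DEBUG = -c -r8 -i4 -g -O0
```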
I hope this is of some use to you.
Seyit
James Guillochon wrote:
> Hi Klaus,
>
> I am using FLASH 3.0. The problem occurs immediately after restart,
> before the first time step. Here's a copy of the log before aborting:
>
> [ 08-31-2009 21:56:16.578 ] [GRID amr_refine_derefine]: initiating refinement
> [GRID amr_refine_derefine] min blks 17 max blks 21 tot blks 76393
> [GRID amr_refine_derefine] min leaf blks 13 max leaf blks 17 tot leaf blks 66844
> [ 08-31-2009 21:56:16.655 ] [GRID amr_refine_derefine]: refinement complete
> [DRIVER_ABORT] Driver_abort() called by PE 4093
> abort_message [flash_convert_cc_hook] Trying to convert non-zero mass-specific variable to per-volume form, but dens is zero!
>
> Here's the standard output:
>
> file: wdacc_hdf5_chk_00170 opened for restart
> read_data: read 76393 blocks.
> io_readData: finished reading input file.
> [Eos_init] Cannot open helm_table.bdat!
> [Eos_init] Trying old helm_table.dat!
> Source terms initialized
> don_dist, don_mass 3359870241.058920 6.5677519229596569E+032
> [EOS Helmholtz] WARNING! Mask setting does not speed up Eos Helmholtz calls
> iteration, no. not moved = 0 76246
> iteration, no. not moved = 1 42904
> iteration, no. not moved = 2 0
> refined: total leaf blocks = 66844
> refined: total blocks = 76393
> [flash_convert_cc_hook] PE= 4093, ivar= 4, why=2
> Trying to convert non-zero mass-specific variable to per-volume form, but dens is zero!
> Application 1231431 exit codes: 1
> Application 1231431 exit signals: Killed
> Application 1231431 resources: utime 0, stime 0
>
> The error occurs on only one processor, and it is one of the very
> last indexed ones (PE 4093 out of 4096 total).
>
> I've tried enabling "amr_error_checking" to dump some additional
> information, but with that option enabled I end up with segmentation
> faults on all processors. Here's the standard output just before the crash:
>
> mpi_amr_1blk_restrict: after commsetup: pe 3
> mpi_amr_1blk_restrict: after commsetup: pe 2
> mpi_amr_1blk_restrict: pe 3 blk 10 ich = 1
> mpi_amr_1blk_restrict: pe 3 blk 10 child = 3 11
> mpi_amr_1blk_restrict: pe 3 blk 10 cnodetype = 1
> mpi_amr_1blk_restrict: pe 3 blk 10 cempty = 0
> mpi_amr_1blk_restrict: pe 2 blk 1 ich = 1
> mpi_amr_1blk_restrict: pe 3 blk 10 calling perm
> mpi_amr_1blk_restrict: pe 2 blk 1 child = 2 2
> mpi_amr_1blk_restrict: pe 2 blk 1 cnodetype = 1
> mpi_amr_1blk_restrict: pe 2 blk 1 cempty = 0
> mpi_amr_1blk_restrict: pe 2 blk 1 calling perm
> mpi_amr_1blk_restrict: pe 1 blk 2 after reset blockgeom
> mpi_amr_1blk_restrict: pe 1 blk 2 bef reset amr_restrict_unk_fun
> mpi_amr_1blk_restrict: pe 3 blk 10 exited perm
> mpi_amr_1blk_restrict: pe 2 blk 1 exited perm
> mpi_amr_1blk_restrict: pe 3 blk 10 calling blockgeom
> mpi_amr_1blk_restrict: pe 1 blk 2 aft reset amr_restrict_unk_fun
> mpi_amr_1blk_restrict: pe 2 blk 1 calling blockgeom
> mpi_amr_1blk_restrict: pe 1 blk 2 aft lcc
> mpi_amr_1blk_restrict: pe 1 blk 2 ich = 3
> mpi_amr_1blk_restrict: pe 1 blk 2 child = 1 5
> mpi_amr_1blk_restrict: pe 1 blk 2 cnodetype = 1
> mpi_amr_1blk_restrict: pe 1 blk 2 cempty = 0
> mpi_amr_1blk_restrict: pe 1 blk 2 calling perm
> Application 1231925 exit codes: 139
> Application 1231925 exit signals: Killed
> Application 1231925 resources: utime 4885, stime 185
>
> Unfortunately, I wasn't able to pin down what I actually did to fix
> the problem I had with zero density values a few months ago. I had
> been trying many different things, including changing bits of the
> actual Simulation code.
>
> Your help is very much appreciated!
>
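For anyone hitting the same abort: the conversion flash_convert_cc_hook
complains about is just scaling a mass-specific quantity (per gram) by the
density to get its per-volume form (per cm^3). A cell with zero dens but a
non-zero specific value has no well-defined per-volume counterpart, which is
what triggers the message. A minimal sketch of that check, with illustrative
names only (this is not FLASH's actual code):

```python
def to_per_volume(u_specific, dens):
    """Scale a mass-specific field (e.g. erg/g) by density (g/cm^3)
    to get its per-volume form (erg/cm^3).

    Illustrative only -- not FLASH's routine. The guard mirrors the
    abort above: a non-zero specific value in a zero-density cell
    cannot be converted consistently.
    """
    out = []
    for u, d in zip(u_specific, dens):
        if d == 0.0 and u != 0.0:
            raise ValueError("Trying to convert non-zero mass-specific "
                             "variable to per-volume form, but dens is zero!")
        out.append(u * d)
    return out

# A checkpoint cell with dens == 0 but non-zero specific energy would abort
# (hypothetical values):
ener = [1.0e17, 2.0e17]   # erg/g
dens = [1.0e6, 0.0]       # g/cm^3 -- the second cell is the bad one
```

So the thing to track down is how a zero-density cell carrying non-zero
mass-specific variables ended up in the checkpoint, or was produced by the
refinement pass that runs right after restart.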