[FLASH-USERS] truncation errors and restarting
Mateusz Ruszkowski
mateuszr at umich.edu
Mon Nov 17 13:49:31 EST 2008
Hi,
I am restarting a run from a checkpoint file after it crashed due to a
segmentation fault. For many steps after the restart the code behaves
*exactly* the same way as it did before the crash. However, at some
point the timestep changes by 10% in one step. The restarted run is now
past the point where the original run crashed and is running normally.
I understand that a problem like this may be very difficult to diagnose,
but I am wondering whether tiny truncation errors in the checkpoint file
could eventually have made the code behave slightly differently (e.g.,
different switches got activated), which in turn prevented the crash due
to the segmentation fault. Is this possible?
Mateusz
P.S. This is what the logs show around the time the timestep changes
(all previous timesteps were identical):
Old:
[ 11-17-2008 03:59:53.253 ] [gr_hgSolve]: gr_hgSolve: ite 4:
norm(residual)/norm(src) = 1.445309E-06
[ 11-17-2008 03:59:53.289 ] [mpi_amr_comm_setup]:
buffer_dim_send=291565, buffer_dim_recv=218221
[ 11-17-2008 03:59:53.816 ] [mpi_amr_comm_setup]:
buffer_dim_send=232373, buffer_dim_recv=188625
[ 11-17-2008 03:59:54.186 ] [mpi_amr_comm_setup]:
buffer_dim_send=227937, buffer_dim_recv=188625
[ 11-17-2008 03:59:54.692 ] [mpi_amr_comm_setup]:
buffer_dim_send=227353, buffer_dim_recv=188625
[ 11-17-2008 03:59:56.718 ] [gr_hgSolve]: gr_hgSolve: ite 5:
norm(residual)/norm(src) = 1.673032E-07
[ 11-17-2008 03:59:58.194 ] step: n=1029 t=7.484192E+05 dt=2.140401E+02
[ 11-17-2008 03:59:58.536 ] [mpi_amr_comm_setup]:
buffer_dim_send=5646649, buffer_dim_recv=4801841
[ 11-17-2008 04:00:07.804 ] [mpi_amr_comm_setup]:
buffer_dim_send=1310137, buffer_dim_recv=1107121
[ 11-17-2008 04:00:13.468 ] [mpi_amr_comm_setup]:
buffer_dim_send=4375321, buffer_dim_recv=3722705
Restarted:
[ 11-17-2008 11:29:55.951 ] [gr_hgSolve]: gr_hgSolve: ite 4:
norm(residual)/norm(src) = 1.445309E-06
[ 11-17-2008 11:29:55.986 ] [mpi_amr_comm_setup]:
buffer_dim_send=291565, buffer_dim_recv=218221
[ 11-17-2008 11:29:56.550 ] [mpi_amr_comm_setup]:
buffer_dim_send=232373, buffer_dim_recv=188625
[ 11-17-2008 11:29:56.921 ] [mpi_amr_comm_setup]:
buffer_dim_send=227937, buffer_dim_recv=188625
[ 11-17-2008 11:29:57.454 ] [mpi_amr_comm_setup]:
buffer_dim_send=227353, buffer_dim_recv=188625
[ 11-17-2008 11:29:59.433 ] [gr_hgSolve]: gr_hgSolve: ite 5:
norm(residual)/norm(src) = 1.673031E-07
[ 11-17-2008 11:30:05.081 ] step: n=1029 t=7.484192E+05 dt=2.506123E+02
[ 11-17-2008 11:30:05.436 ] [mpi_amr_comm_setup]:
buffer_dim_send=5646649, buffer_dim_recv=4801841
[ 11-17-2008 11:30:14.690 ] [mpi_amr_comm_setup]:
buffer_dim_send=1310137, buffer_dim_recv=1107121
[ 11-17-2008 11:30:20.366 ] [mpi_amr_comm_setup]:
buffer_dim_send=4375321, buffer_dim_recv=3722705
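
The two excerpts above can be compared numerically. The following is only a minimal sketch in Python: the numbers are copied by hand from the logs at step n=1029, and the helper name relative_difference is made up for illustration (it is not part of FLASH). It shows that the iteration-5 residual differs only in its last printed digit, while dt differs far more.

```python
# Values copied by hand from the "Old" and "Restarted" log excerpts (step n=1029).
old = {"residual_ite5": 1.673032e-07, "dt": 2.140401e+02}
restarted = {"residual_ite5": 1.673031e-07, "dt": 2.506123e+02}


def relative_difference(a, b):
    """Hypothetical helper: |a - b| scaled by the larger magnitude of the two."""
    return abs(a - b) / max(abs(a), abs(b))


# Print each quantity side by side with its relative difference.
for key in old:
    rel = relative_difference(old[key], restarted[key])
    print(f"{key}: old={old[key]:.6e} restarted={restarted[key]:.6e} "
          f"rel.diff={rel:.3e}")
```

Running the same comparison on earlier steps would show where the two runs first stop agreeing to all printed digits, which may be earlier than the step where dt visibly changes.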