[FLASH-USERS] Fatal error in MPI_Sendrecv: Unknown error class, error stack

g.granda at irya.unam.mx
Sun Dec 23 20:24:15 EST 2018


Hello FLASH users,
I've been running hydrodynamical simulations on a uniform grid for a
while without any trouble. However, I recently got an error after
increasing the resolution. Earlier increases in the resolution of these
simulations worked without any problem, which makes me suspect that this
issue could be caused by insufficient memory, but I'm not sure.
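One way to test the memory hypothesis is to estimate the per-rank footprint from the build settings in the log below: -nxb=64 -nyb=128 -nzb=128, iguard=4 guard cells per side, 8-byte reals (-fdefault-real-8) and one block per processor. This is a minimal sketch that gives only a rough lower bound: `NUNK_VARS`, the number of cell-centred variables stored in FLASH's `unk` array, is a hypothetical placeholder because the log does not report it, and FLASH also allocates scratch, flux and communication buffers that are not counted here.

```python
"""Rough per-rank memory estimate for one FLASH uniform-grid block (lower bound)."""

BYTES_PER_REAL = 8  # -fdefault-real-8 in the compiler flags


def cells_with_guards(nxb, nyb, nzb, nguard):
    """Cells in one block, counting guard cells on both sides of each axis."""
    return (nxb + 2 * nguard) * (nyb + 2 * nguard) * (nzb + 2 * nguard)


def unk_bytes(nxb, nyb, nzb, nguard, nvars):
    """Bytes for `nvars` double-precision cell-centred variables on one block."""
    return cells_with_guards(nxb, nyb, nzb, nguard) * nvars * BYTES_PER_REAL


# Values from the log: -nxb=64 -nyb=128 -nzb=128, iguard=4, one block per rank.
NXB, NYB, NZB, NGUARD = 64, 128, 128, 4
NUNK_VARS = 10  # HYPOTHETICAL: replace with the real variable count of the setup

if __name__ == "__main__":
    cells = cells_with_guards(NXB, NYB, NZB, NGUARD)
    mib = unk_bytes(NXB, NYB, NZB, NGUARD, NUNK_VARS) / 2**20
    print(f"cells per block incl. guards: {cells}")
    print(f"unk storage for {NUNK_VARS} variables: {mib:.1f} MiB per rank")
```

Multiplying the result by the number of ranks placed on each node, and repeating the estimate for the block size that last ran successfully, gives a first indication of whether the jump in resolution could have exhausted node memory.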


My log file shows the following:

  FLASH log file:  12-23-2018  16:37:39.277    Run number:  1
  
==============================================================================
  Number of MPI tasks:               1024
  MPI version:                          3
  MPI subversion:                       1
  Dimensionality:                       3
  Max Number of Blocks/Proc:            1
  Number x zones:                      64
  Number y zones:                     128
  Number z zones:                     128
  Setup stamp:     Wed Dec 12 23:25:03 2018
  Build stamp:     Wed Dec 12 23:25:32 2018
  System info:     Linux mouruka.crya.privado 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29
  Version:         FLASH 4.5_release
  Build directory: /home/guido/FLASH4.5/obj_lr
  Setup syntax:    /home/guido/FLASH4.5/bin/setup.py LinearRegime_mz -auto -3d -objdir=obj_lr +ug -site=irya.guido -nxb=64 -nyb=128 -nzb=128
  f compiler flags: 
/home/guido/libraries/compiled_with_gcc-7.3.0/mpich-3.2.1/bin/mpif90 -g 
-c -O2 -fdefault-real-8 -fdefault-double-8 -ffree-line-length-none 
-Wuninitialized -ggdb -c -O2 -fdefault-real-8 -fdefault-double-8 
-ffree-line-length-none -Wuninitialized -DMAXBLOCKS=1 -DNXB=64 -DNYB=128 
-DNZB=128 -DN_DIM=3
  c compiler flags:    
/home/guido/libraries/compiled_with_gcc-7.3.0/mpich-3.2.1/bin/mpicc 
-I/home/guido/libraries/compiled_with_gcc-7.3.0/hdf5-1.8.20/include 
-DH5_USE_16_API -O2 -c -DMAXBLOCKS=1 -DNXB=64 -DNYB=128 -DNZB=128 
-DN_DIM=3
  
==============================================================================
  Comment:  Linear Regime
  
==============================================================================
  FLASH Units used:
    Driver/DriverMain/Unsplit
    Driver/localAPI
    Grid/GridBoundaryConditions
    Grid/GridMain/UG
    Grid/localAPI
    IO/IOMain/hdf5/serial/UG
    IO/localAPI
    PhysicalConstants/PhysicalConstantsMain
    RuntimeParameters/RuntimeParametersMain
    Simulation/SimulationMain/LinearRegime_mz
    flashUtilities/contiguousConversion
    flashUtilities/general
    flashUtilities/interpolation/oneDim
    flashUtilities/nameValueLL
    flashUtilities/system/memoryUsage/legacy
    monitors/Logfile/LogfileMain
    monitors/Timers/TimersMain/MPINative
    physics/Eos/EosMain/Gamma
    physics/Eos/localAPI
    physics/Hydro/HydroMain/unsplit/Hydro_Unsplit
    physics/Hydro/localAPI
    physics/sourceTerms/Cool/CoolMain/equilibrium_cooling
  
==============================================================================
  RuntimeParameters:

  
==============================================================================
bndpriorityone              =          1
bndprioritythree            =          3
bndprioritytwo              =          2
checkpointfileintervalstep  =  100000000 [CHANGED]
checkpointfilenumber        =          0
dr_abortpause               =          2
dr_dtminbelowaction         =          1
dr_numposdefvars            =          4
drift_break_inst            =          0
drift_trunc_mantissa        =          2
drift_verbose_inst          =          0
eos_entrelescalechoice      =          6
eos_loglevel                =        700
fileformatversion           =          9
forcedplotfilenumber        =          0
hy_3torder                  =         -1
hydrocomputedtoption        =         -1
igridsize                   =          1
iprocs                      =         16 [CHANGED]
iguard                      =          4
irenorm                     =          0
jgridsize                   =          1
jprocs                      =          8 [CHANGED]
jguard                      =          4
kgridsize                   =          1
kprocs                      =          8 [CHANGED]
kguard                      =          4
memory_stat_freq            =     100000
meshcopycount               =          1
nbegin                      =          1
nblockx                     =          1
nblocky                     =          1
nblockz                     =          1
nend                        =    1000000 [CHANGED]
nsteptotalsts               =          5
order                       =          2
outputsplitnum              =          1
plotfileintervalstep        =    1000000 [CHANGED]
plotfilenumber              =          0
rolling_checkpoint          =      10000
sim_nk                      =          2 [CHANGED]
sweeporder                  =        123
transorder                  =          1
wr_integrals_freq           =          1
limitedslopebeta            =                 0.100E+01
t_cool_min                  =                 0.100E+02 [CHANGED]
cfl                         =                 0.500E+00 [CHANGED]
checkpointfileintervaltime  =                 0.343E+14 [CHANGED]
checkpointfileintervalz     =                 0.180+309
cvisc                       =                 0.100E+00
dr_dtmincontinue            =                 0.000E+00
dr_posdefdtfactor           =                 0.100E+01
dr_tstepslowstartfactor     =                 0.100E+00
dtinit                      =                 0.137E+13 [CHANGED]
dtmax                       =                 0.137E+13 [CHANGED]
dtmin                       =                 0.137E+11 [CHANGED]
eintswitch                  =                 0.000E+00
eos_singlespeciesa          =                 0.100E+01
eos_singlespeciesz          =                 0.100E+01
gamma                       =                 0.167E+01 [CHANGED]
hy_cflfallbackfactor        =                 0.900E+00
hy_fpresinmomflux           =                 0.100E+01
hybridorderkappa            =                 0.000E+00
mu_mol                      =                 0.127E+01
nd_cool_max                 =                 0.100E+21
nd_cool_min                 =                 0.100E-01
nusts                       =                 0.100E+00
plotfileintervaltime        =                 0.137E+13 [CHANGED]
plotfileintervalz           =                 0.180+309
radiusgp                    =                 0.200E+01
rss_limit                   =                -0.100E+01
sigmagp                     =                 0.300E+01
sim_num_dens                =                 0.300E+01
sim_rho_amp                 =                 0.100E-02
sim_temp                    =                 0.730E+03
small                       =                 0.100E-39 [CHANGED]
smalle                      =                 0.100E-09
smallp                      =                 0.100E-21 [CHANGED]
smallt                      =                 0.100E+01 [CHANGED]
smallu                      =                 0.100E-39 [CHANGED]
smallx                      =                 0.100E-09
smlrho                      =                 0.100E-29 [CHANGED]
tinitial                    =                 0.000E+00
tiny                        =                 0.100E-15
tmax                        =                 0.137E+15 [CHANGED]
tstep_change_factor         =                 0.200E+01
wall_clock_checkpoint       =                 0.108E+05 [CHANGED]
wall_clock_time_limit       =                 0.605E+06
xmax                        =                 0.154E+20 [CHANGED]
xmin                        =                -0.154E+20 [CHANGED]
ymax                        =                 0.154E+20 [CHANGED]
ymin                        =                -0.154E+20 [CHANGED]
zfinal                      =                 0.000E+00
zinitial                    =                -0.100E+01
zmax                        =                 0.154E+20 [CHANGED]
zmin                        =                -0.154E+20 [CHANGED]
riemannsolver               = HLLC
unitsystem                  = CGS                            [CHANGED]
basenm                      = lr_                            [CHANGED]
dr_posdefvar_1              = none
dr_posdefvar_2              = none
dr_posdefvar_3              = none
dr_posdefvar_4              = none
entropyfixmethod            = HARTENHYMAN
eosmode                     = dens_pres                      [CHANGED]
eosmodeinit                 = dens_ie
geometry                    = cartesian
grav_boundary_type          = isolated
hy_eosmodegc                = see eosMode
log_file                    = lr.log                         [CHANGED]
output_directory            =
pc_unitsbase                = CGS
plot_grid_var_1             = none
plot_grid_var_10            = none
plot_grid_var_11            = none
plot_grid_var_12            = none
plot_grid_var_2             = none
plot_grid_var_3             = none
plot_grid_var_4             = none
plot_grid_var_5             = none
plot_grid_var_6             = none
plot_grid_var_7             = none
plot_grid_var_8             = none
plot_grid_var_9             = none
plot_var_1                  = dens                           [CHANGED]
plot_var_10                 = none
plot_var_11                 = none
plot_var_12                 = none
plot_var_2                  = pres                           [CHANGED]
plot_var_3                  = temp                           [CHANGED]
plot_var_4                  = eint                           [CHANGED]
plot_var_5                  = velx                           [CHANGED]
plot_var_6                  = vely                           [CHANGED]
plot_var_7                  = velz                           [CHANGED]
plot_var_8                  = none
plot_var_9                  = none
prof_file                   = profile.dat
refine_var_thresh           = none
run_comment                 = Linear Regime                  [CHANGED]
run_number                  = 1
slopelimiter                = vanLeer
stats_file                  = flash.dat
wenomethod                  = WENO5
xl_boundary_type            = periodic
xr_boundary_type            = periodic
yl_boundary_type            = periodic
yr_boundary_type            = periodic
zl_boundary_type            = periodic
zr_boundary_type            = periodic
eosforriemann               =  F
addthermalflux              =  T
allowdtstsdominate          =  F
alwayscomputeuservars       =  T
alwaysrestrictcheckpoint    =  T
charlimiting                =  T
chkguardcellsinput          =  F
chkguardcellsoutput         =  F
compute_grid_size           =  T
conserveangmom              =  F
converttoconsvdformeshcalls =  F
corners                     =  F
dr_printtsteploc            =  T
dr_shortenlaststepbeforetmax =  F
dr_useposdefcomputedt       =  F
drift_tuples                =  F
eachprocwritesownabortlog   =  F
eachprocwritessummary       =  F
entropy                     =  F
flux_correct                =  F
geometryoverride            =  F
gr_bcenableapplymixedgds    =  T
hy_fallbacklowercfl         =  F
hy_fullspecmsfluxhandling   =  T
ignoreforcedplot            =  F
io_writemscalarintegrals    =  F
plotfilegridquantitydp      =  F
plotfilemetadatadp          =  F
reducegcellfills            =  F
restart                     =  F
shockdetect                 =  F
shocklowercfl               =  F
summaryoutputonly           =  F
threadblocklistbuild        =  F
threaddriverblocklist       =  F
threaddriverwithinblock     =  F
threadeoswithinblock        =  F
threadhydroblocklist        =  F
threadhydrowithinblock      =  F
threadraytracebuild         =  F
threadwithinblockbuild      =  F
typematchedxfer             =  T
unbiased_geometry           =  F
updatehydrofluxes           =  T
useburn                     =  F
usecollectivehdf5           =  T
useconductivity             =  F
usecool                     =  F
usecosmology                =  F
usedeleptonize              =  F
usediffuse                  =  F
usediffusecomputedtspecies  =  F
usediffusecomputedttherm    =  F
usediffusecomputedtvisc     =  F
usediffusecomputedtmagnetic =  F
useenergydeposition         =  F
useflame                    =  F
usegravity                  =  F
useheat                     =  F
useheatexchange             =  F
usehydro                    =  T
useincompns                 =  F
useionize                   =  F
uselegacylabels             =  T
usemagneticresistivity      =  F
usemassdiffusivity          =  F
useopacity                  =  F
useparticles                =  F
useplasmastate              =  F
usepolytrope                =  F
useprimordialchemistry      =  F
useprotonemission           =  F
useprotonimaging            =  F
useradtrans                 =  F
useraytrace                 =  F
usests                      =  F
usestsfordiffusion          =  F
usestir                     =  F
usethomsonscattering        =  F
usetreeray                  =  F
useturb                     =  T
useviscosity                =  F
usexrayimaging              =  F
use_3dfullctu               =  T
use_auxeinteqn              =  T
use_avisc                   =  F
use_cma_advection           =  F
use_cma_flattening          =  F
use_flattening              =  F
use_gravhalfupdate          =  T
use_hybridorder             =  F
use_steepening              =  F
use_upwindtvd               =  F
writestatsummary            =  T

  
==============================================================================

  Known units of measurement:

            Unit    CGS Value     Base Unit
   1          cm    1.0000        cm
   2           s    1.0000        s
   3           g    1.0000        g
   4           K    1.0000        K
   5         esu    1.0000        esu
   6         mol    1.0000        mol
   7           m    100.00        cm
   8          km    1.00000E+05   cm
   9          pc    3.08568E+18   cm
  10         kpc    3.08568E+21   cm
  11         Mpc    3.08568E+24   cm
  12         Gpc    3.08568E+27   cm
  13        Rsun    6.96000E+10   cm
  14          AU    1.49598E+13   cm
  15          yr    3.15569E+07   s
  16         Myr    3.15569E+13   s
  17         Gyr    3.15569E+16   s
  18          kg    1000.0        g
  19        Msun    1.98892E+33   g
  20         amu    1.66054E-24   g
  21          eV    11605.        K
  22           C    2.99792E+09   esu
  23        LFLY    3.08568E+24   cm
  24        TFLY    2.05759E+17   s
  25        MFLY    9.88470E+45   g
  26    clLength    3.08568E+24   cm
  27      clTime    3.15569E+16   s
  28      clMass    1.98892E+48   g
  29      clTemp    1.16045E+07   K
-----------End of Units--------------------

  Known physical constants:

     Constant Name       Constant Value    cm        s        g        K      esu      mol
   1              Newton    6.67408E-08   3.0     -2.0     -1.0      0.0      0.0      0.0
   2      speed of light    2.99792E+10   1.0     -1.0      0.0      0.0      0.0      0.0
   3              Planck    6.62607E-27   2.0     -1.0      1.0      0.0      0.0      0.0
   4     electron charge    4.80320E-10   0.0      0.0      0.0      0.0      1.0      0.0
   5       electron mass    9.10938E-28   0.0      0.0      1.0      0.0      0.0      0.0
   6         proton mass    1.67262E-24   0.0      0.0      1.0      0.0      0.0      0.0
   7      fine-structure    7.29735E-03   0.0      0.0      0.0      0.0      0.0      0.0
   8            Avogadro    6.02214E+23   0.0      0.0      0.0      0.0      0.0     -1.0
   9           Boltzmann    1.38065E-16   2.0     -2.0      1.0     -1.0      0.0      0.0
  10  ideal gas constant    8.31446E+07   2.0     -2.0      1.0     -1.0      0.0     -1.0
  11                Wien    0.28978       1.0      0.0      0.0      1.0      0.0      0.0
  12    Stefan-Boltzmann    5.67037E-05   0.0     -3.0      1.0     -4.0      0.0      0.0
  13  Radiation Constant    7.56572E-15  -1.0     -2.0      1.0     -4.0      0.0      0.0
  14                  pi    3.1416        0.0      0.0      0.0      0.0      0.0      0.0
  15                   e    2.7183        0.0      0.0      0.0      0.0      0.0      0.0
  16               Euler    0.57722       0.0      0.0      0.0      0.0      0.0      0.0
  
==============================================================================

  Multifluid database: not configured in

  
==============================================================================
  [ 12-23-2018  16:37:39.287 ] [gr_initGeometry] checking BCs for idir: 1
  [ 12-23-2018  16:37:39.288 ] [gr_initGeometry] checking BCs for idir: 2
  [ 12-23-2018  16:37:39.289 ] [gr_initGeometry] checking BCs for idir: 3
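Two quick consistency checks can be read off the log above: the processor grid (iprocs, jprocs, kprocs) should multiply to the number of MPI tasks, and the global resolution along each axis is the processor count times the block size. This is a minimal sketch using the logged values; the domain extent comes from xmin/xmax, which the log prints to only three significant figures, so the resulting cell size is approximate.

```python
"""Consistency checks for the uniform-grid decomposition reported in the log."""
from math import prod

PC_IN_CM = 3.08568e18  # "pc" entry of the log's unit table

procs = (16, 8, 8)        # iprocs, jprocs, kprocs
block = (64, 128, 128)    # nxb, nyb, nzb
n_mpi_tasks = 1024        # "Number of MPI tasks" in the log header
extent_cm = 0.154e20 - (-0.154e20)  # xmax - xmin (y and z are identical)


def global_zones(procs, block):
    """Global cells per axis = processors along that axis * cells per block."""
    return tuple(p * b for p, b in zip(procs, block))


if __name__ == "__main__":
    assert prod(procs) == n_mpi_tasks, "processor grid does not match task count"
    zones = global_zones(procs, block)
    dx_cm = extent_cm / zones[0]
    print(f"global grid: {zones}")
    print(f"cell size: {dx_cm:.3e} cm = {dx_cm / PC_IN_CM:.4f} pc (approx.)")
```

With these values the global grid is 1024 cells along each axis, about 1.07e9 cells in total, and because the three extents are equal the cells are cubic.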

Meanwhile, the error file shows:

Fatal error in MPI_Sendrecv: Unknown error class, error stack:
MPI_Sendrecv(237)..........: MPI_Sendrecv(sbuf=0x184d22a0, scount=1, dtype=USER<hvector>, dest=1, stag=1, rbuf=0x185982a0, rcount=1, dtype=USER<hvector>, src=3, rtag=1, comm=0x84000007, status=0x7fffd42b79b0) failed
MPID_nem_tcp_connpoll(1845): Communication error with rank 794: Connection reset by peer
/var/spool/torque/mom_priv/jobs/2392.mouruka.crya.privado.SC: line 12: /home/guido: is a directory
Fatal error in PMPI_Bcast: Unknown error class, error stack:
PMPI_Bcast(1600)........: MPI_Bcast(buf=0x7fff76136018, count=1, MPI_INTEGER, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1452)...:
MPIR_Bcast(1476)........:
MPIR_Bcast_intra(1249)..:
MPIR_SMP_Bcast(1081)....:
MPIR_Bcast_binomial(285):
MPIC_Send(300)..........:
MPID_Send(75)...........: Communication error with rank 8
MPIR_SMP_Bcast(1088)....:
MPIR_Bcast_binomial(310): Failure during collective
Fatal error in PMPI_Bcast: Unknown error class, error stack:
PMPI_Bcast(1600)........: MPI_Bcast(buf=0x7fff908136d8, count=1, MPI_INTEGER, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1452)...:
MPIR_Bcast(1476)........:
MPIR_Bcast_intra(1249)..:
MPIR_SMP_Bcast(1081)....:
MPIR_Bcast_binomial(310): Failure during collective
MPIR_SMP_Bcast(1088)....:
MPIR_Bcast_binomial(310): Failure during collective
Fatal error in PMPI_Bcast: Unknown error class, error stack:
PMPI_Bcast(1600)........: MPI_Bcast(buf=0x7fff1bc337d8, count=1, MPI_INTEGER, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1452)...:
MPIR_Bcast(1476)........:
MPIR_Bcast_intra(1249)..:
MPIR_SMP_Bcast(1088)....:
MPIR_Bcast_binomial(310): Failure during collective
Fatal error in PMPI_Bcast: Unknown error class, error stack:
PMPI_Bcast(1600)........: MPI_Bcast(buf=0x7fffa3446a98, count=1, MPI_INTEGER, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1452)...:
MPIR_Bcast(1476)........:
MPIR_Bcast_intra(1249)..:
MPIR_SMP_Bcast(1088)....:
MPIR_Bcast_binomial(310): Failure during collective
Fatal error in PMPI_Bcast: Unknown error class, error stack:
PMPI_Bcast(1600)........: MPI_Bcast(buf=0x7fffe4a93648, count=1, MPI_INTEGER, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1452)...:
MPIR_Bcast(1476)........:
MPIR_Bcast_intra(1249)..:
MPIR_SMP_Bcast(1088)....:
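Because the error file interleaves the stacks of many ranks, the first failure is easy to lose. This is a minimal sketch, assuming the error output has been read into a string, that counts which MPI calls failed and collects the peer ranks named in "Communication error with rank N" messages; the function name `summarize_mpich_errors` is hypothetical, and the regular expressions only target the MPICH message format shown above.

```python
"""Summarize which MPI calls failed and which peer ranks MPICH reported."""
import re
from collections import Counter

FATAL_RE = re.compile(r"^Fatal error in (\w+):", re.MULTILINE)
PEER_RE = re.compile(r"Communication error with rank (\d+)")


def summarize_mpich_errors(text):
    """Return (Counter of failing MPI calls, sorted list of unique peer ranks)."""
    calls = Counter(FATAL_RE.findall(text))
    peers = sorted({int(rank) for rank in PEER_RE.findall(text)})
    return calls, peers


# Abridged excerpt of the error file above.
SAMPLE = """\
Fatal error in MPI_Sendrecv: Unknown error class, error stack:
MPID_nem_tcp_connpoll(1845): Communication error with rank 794:
Fatal error in PMPI_Bcast: Unknown error class, error stack:
MPID_Send(75)...........: Communication error with rank 8
Fatal error in PMPI_Bcast: Unknown error class, error stack:
"""

if __name__ == "__main__":
    calls, peers = summarize_mpich_errors(SAMPLE)
    print(calls)
    print(peers)
```

Applied to the full file, this separates the single point-to-point failure (the MPI_Sendrecv that lost its connection to rank 794) from the repeated MPI_Bcast failures on other ranks.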

I didn't find a similar error on the FLASH forum. Do you know what is
going on?
Cheers,

                                                                          


