[FLASH-USERS] issues with IBM MPI and hypre

Roman Yurchak roman.yurchak at crans.org
Wed Apr 17 18:27:15 EDT 2013


Hello,

  I'm trying to run a simulation derived from LaserSlab on
ada.idris.fr (Intel CPUs with IBM's Parallel Operating Environment,
poe). It frequently hangs indefinitely at a seemingly random time step
because of some MPI communication error within hypre (no error messages
are printed). The same simulation runs perfectly well with ifort/icc and
Open MPI on another machine.

  On ada.idris.fr I am using ifort/icc 12.1.0, IBM MPI and hypre
2.9.0b, with the svn version of FLASH from Oct 12 (I should
probably update to FLASH 4.0.1). See the log file
http://perso.crans.org/yurchak/i/flash.log.txt for the compilation flags and
setup arguments. A few remarks:
   * the failures are reproducible: for a given setup, the simulation
always hangs at the same time step.
   * changing the hydro solver parameters, CFL, etc. only seems to change
the time step at which the hang occurs.
   * I tried recompiling FLASH and hypre with -O0, without much success.
   * some debugging shows that, when the hang occurs, the processes are
approximately in the following state:
        LapiImpl::Context:Advance
        MPIC_Wait()
        MPIR_Allreduce_intra()
        hypre_GMRESSetup()
        diff_advancetherm()
  (see http://perso.crans.org/yurchak/i/debug_tv_hypre.png for a more
complete TotalView snapshot of one of the processes)

 Any advice on how to deal with this kind of issue would be much
appreciated.

Thanks,
-- 
Roman Yurchak
Ph.D student
Laboratoire LULI
Ecole Polytechnique
