[FLASH-USERS] issues with IBM MPI and hypre
Roman Yurchak
roman.yurchak at crans.org
Wed Apr 17 18:27:15 EDT 2013
Hello,
I'm trying to run a simulation derived from LaserSlab on
ada.idris.fr (Intel CPUs with IBM's Parallel Operating Environment, poe),
and it frequently hangs indefinitely at a random time step, apparently
due to some MPI communication error within hypre (no error message is
printed). The same simulation runs perfectly well with ifort/icc and
Open MPI on another machine.
On ada.idris.fr I have ifort/icc 12.1.0, IBM MPI and hypre
2.9.0b, and I'm using the svn version of FLASH from Oct 12 (I should
probably update to FLASH 4.0.1). See the log file
http://perso.crans.org/yurchak/i/flash.log.txt for the compilation flags and
setup arguments. A few remarks:
* The failures are reproducible: for a given setup, the simulation
always hangs at the same time step.
* Changing the hydro solver parameters, CFL, etc. only seems to change
the time step at which the hang occurs.
* I tried recompiling FLASH and hypre with -O0, without much success.
* Some debugging shows that when it happens, the processes are
approximately in the following state:
LapiImpl::Context:Advance
MPIC_Wait()
MPIR_Allreduce_intra()
hypre_GMRESSetup()
diff_advancetherm()
(see http://perso.crans.org/yurchak/i/debug_tv_hypre.png for a more
complete snapshot of one of the processes with totalview)
Any advice on how to deal with this kind of issue would be much
appreciated.
Thanks,
--
Roman Yurchak
Ph.D student
Laboratoire LULI
Ecole Polytechnique