[FLASH-USERS] Flash4b hangs on BlueGene/Q system

Mirko Cestari m.cestari at cineca.it
Fri Nov 23 04:11:48 EST 2012


Dear Chris,
thanks a lot for your answer. You have probably pinpointed the
problem. Indeed, I have __mcount_internal
in my executable:

$ nm flash4 | egrep '( _mcount$|__mcount_internal$)'
0000000001904e78 D __mcount_internal
00000000019039f0 D _mcount


This is an example of how an object is compiled and how the application is linked:

============
mpixlf77_r -O0 -g -qstrict -q64  -qzerosize  -c -cpp  -qsuffix=cpp=F:cpp=F90 -qfree=f90 -WF,-DMAXBLOCKS=100 -WF,-DNXB=8 -WF,-DNYB=8 -WF,-DNZB=8 -WF,-DN_DIM=3 rp_getOpt.F90

mpixlf77_r   -o  flash4 Burn.o Burn_computeDt.o Burn_finalize.o ...
============

I use no "-pg" argument, so I don't really understand where the __mcount_internal
function comes from.     

Mirko 

--
Mirko Cestari, PhD
m.cestari at cineca.it
CINECA - SuperComputing Applications and Innovation Department - SCAI
via Magnanelli, 6/3 40033 Casalecchio di Reno (Bologna) - ITALY
www.cineca.it

----- Original Message -----
From: "Christopher Daley" <cdaley at flash.uchicago.edu>
To: "Mirko Cestari" <m.cestari at cineca.it>
Cc: flash-users at flash.uchicago.edu
Sent: Wednesday, November 21, 2012 6:42:36 PM
Subject: Re: [FLASH-USERS] Flash4b hangs on BlueGene/Q system

Hi Mirko,

We are able to run FLASH simulations on the Argonne BG/Q.

I would double check that you have no '-pg' in your compile
lines.  Maybe you also have custom static pattern rules in your
Makefile.h that contain '-pg'?
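
A minimal sketch of that check, assuming a POSIX shell and grep. The
function name find_pg_flags is only illustrative and is not part of FLASH:

```shell
# Hypothetical helper, not part of FLASH: print every line (with its line
# number) that passes -pg as a flag of its own. Pass it Makefile.h or any
# other makefile fragment used by the build.
find_pg_flags() {
    grep -nE -e '(^|[[:space:]=])-pg([[:space:]]|$)' "$@"
}
```

For example, "find_pg_flags Makefile.h" prints nothing and returns a
non-zero status when no -pg flag is present in the file.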

Based on your stack trace, you should see the _mcount symbol in your
Flash.o object and flash4 binary, i.e.

$ nm -A Flash.o flash4 | egrep '( _mcount$|__mcount_internal$)'
Flash.o:                 U _mcount
flash4:00000000023a47e0 D __mcount_internal
flash4:00000000023a3628 D _mcount
$

I can only get these symbols by compiling Flash.F90 with -pg.
If I remove -pg then I get
$ nm -A Flash.o flash4 | egrep '( _mcount$|__mcount_internal$)'
$

You should create a flash4 binary that has no reference to
_mcount and then repeat your flash4 run.
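
To find out which object files pull in the profiling hook, one option is
to run nm over every object in the build directory. A rough sketch,
assuming a POSIX shell and an nm that can read the objects; the name
objects_with_mcount is made up for illustration:

```shell
# Hypothetical helper: list the object files among the arguments that
# carry an undefined reference to _mcount (shown by nm as "U _mcount").
objects_with_mcount() {
    for obj in "$@"; do
        if nm "$obj" 2>/dev/null | grep -q ' U _mcount$'; then
            printf '%s\n' "$obj"
        fi
    done
}
```

Running "objects_with_mcount *.o" in the object directory then lists the
files whose compile lines are worth inspecting.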


(The only "embedded" profiling in FLASH4 needs to be explicitly
included in your FLASH application at setup time using
-unit=monitors/Profiler/ProfilerMain/mpihpm.)

Chris


On 11/21/2012 09:10 AM, Mirko Cestari wrote:
> Dear users,
> we are experiencing some issues in trying to run Flash4b on
> our BGQ system if compiled with native xl compilers.
> We didn't experience any problem on our previous
> BG/P system (we compiled and ran the same code/input successfully).
>
> The program seems to hang indefinitely at the very beginning of
> the simulation run. Checking with a debugging tool (totalview) the stack
> trace turns out to be
>
> Stack Trace
> C    __mcount_internal,     FP=19ffffb980
>       ._mcount,              FP=19ffffba00
> f90  flash,                 FP=19ffffbaa0
>       .generic_start_main,   FP=19ffffbd80
> C    __libc_start_main,     FP=19ffffbe40
>
> the execution stops at
>
>    =>  program Flash
>
> in Flash.F90. More precisely, the execution spins in an infinite (while) loop
> in the function  __mcount_internal in mcount.c
>
>   =>            while (atomic_compare_and_exchange_bool_acq (&p->mcount_hwthd, hwthd, -1));
>
> which to my understanding is a profiling function (please note
> no profiling flags have been used to compile the program; is there
> any profiling embedded in the code?).
>
> Compiling with gcc gives worse performance but no "freezing" problems.
>
> Have you run into similar problems on BG/Q systems? Can you point
> me to someone that might have encountered the same problem?
>
> Thanks in advance,
> Mirko
>
> --
> Mirko Cestari, PhD
> m.cestari at cineca.it
> CINECA - SuperComputing Applications and Innovation Department - SCAI
> via Magnanelli, 6/3 40033 Casalecchio di Reno (Bologna) - ITALY
> www.cineca.it
>




