[FLASH-USERS] Flash4b hangs on BlueGene/Q system

Chris Daley cdaley at flash.uchicago.edu
Fri Nov 23 17:08:26 EST 2012


Hi Mirko,

Your compiler option -cpp is being treated as -p by the compiler, where
option -p is

-p[g]  Sets up the object files produced by the compiler for profiling.
        -pg is like -p, but it produces more extensive statistics.


Here are the results of tests I ran on Vesta (BG/Q at Argonne National
Lab):

$ cat test.F90
subroutine test
   print *, "hello"
end subroutine test

$ xlf90_r -c test.F90
** test   === End of Compilation 1 ===
1501-510  Compilation successful for file test.F90.
$ nm test.o
0000000000000000 r .const_dr
0000000000000000 d _$STATIC
                  U _xlfBeginIO
                  U _xlfEndIO
                  U _xlfWriteLDChar
0000000000000000 D test

$ xlf90_r -c -cpp test.F90
** test   === End of Compilation 1 ===
1501-510  Compilation successful for file test.F90.
$ nm test.o
0000000000000000 r .const_dr
0000000000000000 d _$STATIC
                  U _mcount
                  U _xlfBeginIO
                  U _xlfEndIO
                  U _xlfWriteLDChar
0000000000000000 D test


Notice that when I include '-cpp' I get the '_mcount' symbol in the
object file.  You should remove the '-cpp' option from your compile
line.  Even better would be to download FLASH4.0 and base your
Makefile.h on ./sites/miralac1/Makefile.h.  This should allow you to
run FLASH on your BG/Q.
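
To find out which objects in a build still pull in the profiling hook,
the nm check above can be looped over every object file.  The following
is only a minimal sketch, assuming a POSIX shell with nm and grep
available on the front-end node; the function names has_mcount and
scan_objects are made up for illustration and are not part of FLASH.

```shell
#!/bin/sh
# Sketch only: has_mcount and scan_objects are hypothetical helper names.

# has_mcount reads `nm` output on stdin and succeeds if any line ends in
# the symbol _mcount or __mcount_internal (preceded by a space or at the
# start of the line, so names like my_mcount do not match).
has_mcount() {
    grep -Eq '(^| )(_mcount|__mcount_internal)$'
}

# scan_objects prints the name of each file argument whose symbol table
# references one of the profiling symbols.
scan_objects() {
    for obj in "$@"; do
        if nm "$obj" 2>/dev/null | has_mcount; then
            echo "$obj"
        fi
    done
}

# Example (run from the FLASH object directory):
#   scan_objects *.o flash4
```

An empty result would suggest that no object was compiled with a
profiling option; any file that is listed should be recompiled after
removing the offending flag.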

Normally these profiling symbols should not cause any problems.
However, keep in mind that the BG/Q is still very new and weird issues
can crop up.  In the last year we have had early access to the BG/Q at
Argonne National Lab and I have also had an issue with gprof
profiling.  I did not see the hang that you are encountering, but I
had an issue where 0 samples were recorded during gprof profiling.
Someone at Argonne National Lab had the same issue and advised me to
link against an old version of libc.a at an earlier efix level so that
I could continue with gprof profiling (the mcount symbols exist in
libc.a).  Perhaps the issues are related?

Chris



On 11/23/2012 03:11 AM, Mirko Cestari wrote:
> Dear Chris,
> thanks a lot for your answer. You probably pointed out the
> problem here. Indeed I have the __mcount_internal
> in my executable:
>
>   nm   flash4 | egrep '( _mcount$|__mcount_internal$)'
> 0000000001904e78 D __mcount_internal
> 00000000019039f0 D _mcount
>
>
> This is an example of how an object is compiled and the application linked
>
> ============
> mpixlf77_r -O0 -g -qstrict -q64  -qzerosize  -c -cpp  -qsuffix=cpp=F:cpp=F90 -qfree=f90 -WF,-DMAXBLOCKS=100 -WF,-DNXB=8 -WF,-DNYB=8 -WF,-DNZB=8 -WF,-DN_DIM=3 rp_getOpt.F90
>
> mpixlf77_r   -o  flash4 Burn.o Burn_computeDt.o Burn_finalize.o ...
> ==============================
>
> I use no "-pg" argument, so I don't really understand where the __mcount_internal
> function comes from.
>
> Mirko
>
> --
> Mirko Cestari, PhD
> m.cestari at cineca.it
> CINECA - SuperComputing Applications and Innovation Department - SCAI
> via Magnanelli, 6/3 40033 Casalecchio di Reno (Bologna) - ITALY
> www.cineca.it
>
> ----- Original Message -----
> From: "Christopher Daley" <cdaley at flash.uchicago.edu>
> To: "Mirko Cestari" <m.cestari at cineca.it>
> Cc: flash-users at flash.uchicago.edu
> Sent: Wednesday, November 21, 2012 6:42:36 PM
> Subject: Re: [FLASH-USERS] Flash4b hangs on BlueGene/Q system
>
> Hi Mirko,
>
> We are able to run FLASH simulations on the Argonne BG/Q.
>
> I would double check that you have no '-pg' in your compile
> lines.  Maybe you also have custom static pattern rules in your
> Makefile.h that contain '-pg'?
>
> Based on your stacktrace you should see _mcount symbol in your
> Flash.o and flash4 binary, i.e.
>
> $ nm -A Flash.o flash4 | egrep '( _mcount$|__mcount_internal$)'
> Flash.o:                 U _mcount
> flash4:00000000023a47e0 D __mcount_internal
> flash4:00000000023a3628 D _mcount
> $
>
> I can only get these symbols by compiling Flash.F90 with -pg.
> If I remove -pg then I get
> $ nm -A Flash.o flash4 | egrep '( _mcount$|__mcount_internal$)'
> $
>
> You should create a flash4 binary that has no reference to
> _mcount and then repeat your flash4 run.
>
>
> (The only "embedded" profiling in FLASH4 needs to be explicitly
> included in your FLASH application at setup time using
> -unit=monitors/Profiler/ProfilerMain/mpihpm.)
>
> Chris
>
>
> On 11/21/2012 09:10 AM, Mirko Cestari wrote:
>> Dear users,
>> we are experiencing some issues in trying to run Flash4b on
>> our BGQ system if compiled with native xl compilers.
>> We didn't experience any problems on our previous
>> BG/P system (we successfully compiled and ran the same code/input).
>>
>> The program seems to hang indefinitely at the very beginning of
>> the simulation run. Checking with a debugging tool (totalview) the stack
>> trace turns out to be
>>
>> Stack Trace
>> C    __mcount_internal,     FP=19ffffb980
>>        ._mcount,              FP=19ffffba00
>> f90  flash,                 FP=19ffffbaa0
>>        .generic_start_main,   FP=19ffffbd80
>> C    __libc_start_main,     FP=19ffffbe40
>>
>> the execution stops at
>>
>>     =>  program Flash
>>
>> in Flash.F90, more precisely, the execution runs in an infinite (while) loop
>> in the function  __mcount_internal in mcount.c
>>
>>    =>            while (atomic_compare_and_exchange_bool_acq (&p->mcount_hwthd, hwthd, -1));
>>
>> which to my understanding is a profiling function (please note
>> no profiling flags have been used to compile the program, is there
>> any profiling embedded in the code?).
>>
>> Compiling with gcc gives worse performance but no "freezing" problems.
>>
>> Have you run into similar problems on BG/Q systems? Can you point
>> me to someone that might have encountered the same problem?
>>
>> Thanks in advance,
>> Mirko
>>
>> --
>> Mirko Cestari, PhD
>> m.cestari at cineca.it
>> CINECA - SuperComputing Applications and Innovation Department - SCAI
>> via Magnanelli, 6/3 40033 Casalecchio di Reno (Bologna) - ITALY
>> www.cineca.it
>>



