[FLASH-USERS] Scaling problems
Ryan Farber
rjfarber at umich.edu
Wed Apr 5 14:32:41 EDT 2023
Thanks to Lee for pointing this out, and apologies to Pedro for glossing over
your initial message, where you mention that you tune iprocs, jprocs, nxb,
and nyb to keep about the same total number of grid points (so you are indeed
looking at strong scaling rather than weak scaling).
One more point regarding the SIGBUS issue: adding -mcmodel=large to your
FFLAGS_* (in your Makefile.h) might help. If you're on an Intel system, at
least, this enforces absolute addressing, whereas the default -mcmodel=small
uses relative addressing. From what I recall, I saw a ~2% performance
decrease when switching to -mcmodel=large (from -mcmodel=medium), but it
helped with memory issues at large core counts in the past.
Best,
--------
Ryan
On Wed, Apr 5, 2023 at 5:06 PM Ryan Farber <rjfarber at umich.edu> wrote:
> Hi Pedro,
>
> One follow-up point regarding your SIGBUS error: it looks like a memory
> access error. I'm wondering if you're requesting more logical cores than
> there are physical cores on the machine. I've found that to be problematic
> in the past.
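One rough way to check for this kind of oversubscription on a Linux node is to count the distinct (physical id, core id) pairs in /proc/cpuinfo and compare that with the number of MPI ranks placed on the node. The sketch below assumes the x86-style /proc/cpuinfo layout; `count_physical_cores` and `is_oversubscribed` are hypothetical helper names, not part of FLASH or any MPI library.

```python
import os


def count_physical_cores(cpuinfo_text: str) -> int:
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo-style text.

    Each logical CPU is a blank-line-separated record; hyperthreads on the
    same physical core share one (physical id, core id) pair. Returns 0 if
    no "core id" fields are present (e.g. some non-x86 kernels).
    """
    cores = set()
    phys, core = None, None
    # A trailing sentinel blank line flushes the last record.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if core is not None:
                cores.add((phys, core))
            phys, core = None, None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            phys = value.strip()
        elif key == "core id":
            core = value.strip()
    return len(cores)


def is_oversubscribed(ranks_on_node: int, physical_cores: int) -> bool:
    """True if more MPI ranks are placed on a node than it has physical cores."""
    return ranks_on_node > physical_cores


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        physical = count_physical_cores(f.read())
    if physical == 0:
        print("no 'core id' fields found; physical core count unknown")
    else:
        print(f"logical CPUs: {os.cpu_count()}, physical cores: {physical}")
        print("36 ranks oversubscribed?", is_oversubscribed(36, physical))
```

On a node with two-way hyperthreading, os.cpu_count() typically reports twice the physical core count, which is how a launch sized from the logical count can end up oversubscribing the physical cores. For multi-node jobs, compare against the ranks per node rather than the total.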
>
> It sounds like Lee might have the answer to your issue of too few zones per
> proc. One point I'm confused about, though, is whether you're studying
> strong or weak scaling. Based on your response to Paco (and the FLASH
> User's Guide, which I checked), in uniform grid mode there is one block per
> processor, and the number of zones per block is (usually) fixed at compile
> time. Doesn't that mean you're increasing the amount of work in proportion
> to the number of processors you use? In that case, seeing a constant
> "speedup" suggests good weak scaling.
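The distinction changes how the same timings should be read. A minimal sketch of the usual efficiency definitions, assuming wall-clock times t_ref on n_ref processes and t_n on n processes (the helper names are hypothetical): in strong scaling the total grid is fixed, so ideal runtime falls as n_ref/n; in weak scaling the work per process is fixed, so ideal runtime stays constant.

```python
def strong_scaling_efficiency(t_ref: float, n_ref: int, t_n: float, n: int) -> float:
    """Fixed total problem size: ideal t_n = t_ref * n_ref / n, giving 1.0."""
    return (t_ref * n_ref) / (t_n * n)


def weak_scaling_efficiency(t_ref: float, t_n: float) -> float:
    """Fixed work per process: ideal t_n = t_ref, giving 1.0."""
    return t_ref / t_n


def total_zones(n_procs: int, nxb: int, nyb: int) -> int:
    """2D uniform grid with one nxb x nyb block per process."""
    return n_procs * nxb * nyb


if __name__ == "__main__":
    # Hypothetical timings: identical wall time at 12 and 36 processes
    # with nxb = nyb = 16 held fixed, so the total work triples.
    t12, t36 = 100.0, 100.0
    print("zones:", total_zones(12, 16, 16), "->", total_zones(36, 16, 16))
    print("as strong scaling:", strong_scaling_efficiency(t12, 12, t36, 36))
    print("as weak scaling:", weak_scaling_efficiency(t12, t36))
```

With fixed block sizes, reading that pair of identical timings as a strong-scaling result gives an efficiency of 1/3, while reading it as weak scaling (work tripled, runtime unchanged) gives 1.0.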
>
> Best,
> --------
> Ryan
>
>
> On Wed, Apr 5, 2023 at 4:17 PM Leland Ellison <c.leland.ellison at gmail.com>
> wrote:
>
>> Hi Pedro,
>>
>> I suspect your per-block zone counts are too low to see any benefit from
>> adding more procs at this point. The details will depend on your specific
>> problem and hardware, of course, but when I've done strong scaling studies
>> I've found (rapidly) diminishing returns from adding more procs once I fall
>> below roughly 1000 zones per proc. I think this is what you're seeing in
>> your nxb=nyb=16 and nxb=nyb=28 cases. If you extend your scaling study to
>> cases with several thousand zones per proc, I bet you'll see more of the
>> expected behavior.
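Applying this rule of thumb to the two runs described in the thread (12 cores with nxb=nyb=28 and 36 cores with nxb=nyb=16) is simple arithmetic. The sketch also estimates guard-cell overhead, one plausible reason small blocks scale poorly: every block carries a halo of guard cells that must be filled from its neighbors. The value of 4 guard cells per side is an assumption (check NGUARD in your own build), and the helper names are hypothetical.

```python
def interior_zones(nxb: int, nyb: int) -> int:
    """Zones each process owns in a 2D uniform grid (one block per process)."""
    return nxb * nyb


def guard_overhead(nxb: int, nyb: int, nguard: int = 4) -> float:
    """Ratio of total cells (interior + guard) to interior cells for one 2D block.

    nguard=4 is an assumed value, not read from any FLASH build.
    """
    return (nxb + 2 * nguard) * (nyb + 2 * nguard) / (nxb * nyb)


ZONES_PER_PROC_RULE_OF_THUMB = 1000  # empirical threshold from the thread, not a FLASH constant

# (procs, nxb, nyb) for the two runs described in the thread.
RUNS = [(12, 28, 28), (36, 16, 16)]

if __name__ == "__main__":
    for procs, nxb, nyb in RUNS:
        per_proc = interior_zones(nxb, nyb)
        print(
            f"{procs:>3} procs, nxb=nyb={nxb}: {per_proc} zones/proc, "
            f"{procs * per_proc} zones total, "
            f"guard overhead x{guard_overhead(nxb, nyb):.2f}, "
            f"below rule of thumb: {per_proc < ZONES_PER_PROC_RULE_OF_THUMB}"
        )
```

Both runs have roughly 9,200 to 9,400 zones in total (9408 and 9216), consistent with holding the grid fixed, and both sit under 1000 zones per process (784 and 256). With the assumed 4 guard cells, a 16x16 block holds 2.25 times as many cells as it owns, versus about 1.65 times for a 28x28 block.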
>>
>> Hope this helps!
>> Lee
>>
>> ________________
>> Leland Ellison PhD
>> Computational Physicist
>> https://www.linkedin.com/in/clelandellison/
>> https://scholar.google.com/citations?user=1rfcVWgAAAAJ
>>
>> On Wed, Apr 5, 2023 at 6:46 AM pedro romero <
>> pedro.romero at greentownsbyfusion.com> wrote:
>>
>>> Hi Paco,
>>>
>>> As far as I know, using the uniform grid fixes the number of blocks at one
>>> per processor. Am I wrong? Do you mean fixing nxb and nyb while varying the
>>> number of cores?
>>>
>>> *From:* Francisco Holguin <opaco at umich.edu>
>>> *Sent:* Wednesday, April 5, 2023 15:17
>>> *To:* pedro romero <pedro.romero at greentownsbyfusion.com>
>>> *CC:* flash-users at flash.rochester.edu
>>> *Subject:* Re: [FLASH-USERS] Scaling problems
>>>
>>> Hi Pedro,
>>>
>>> What if you just fix the number of blocks and vary the number of cores?
>>>
>>> -Paco
>>>
>>> On Wed, Apr 5, 2023 at 5:13 AM pedro romero <
>>> pedro.romero at greentownsbyfusion.com> wrote:
>>>
>>> Hi all,
>>>
>>> I am trying to scale up to more computational resources and have come
>>> across a few issues. First, I am running the same +ug example (a
>>> modification of the 2D Zpinch template) while varying the number of cores,
>>> nxb, and nyb, but it shows no speedup as the number of cores increases (I am
>>> tuning Iprocs, Jprocs, nxb, and nyb so that the overall grid always stays
>>> approximately the same size).
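Because a uniform grid has one block per process, holding the global grid fixed requires nx = iprocs * nxb and ny = jprocs * nyb to hold exactly. The sketch below enumerates such layouts for a fixed global grid. The 96 x 96 grid in the example is a hypothetical choice (it matches 36 processes as 6 x 6 blocks of 16 x 16, but the thread does not state the actual iprocs and jprocs used), and `ug_layouts` is an illustrative helper, not a FLASH tool.

```python
def ug_layouts(nx: int, ny: int, max_procs: int) -> list[tuple[int, int, int, int]]:
    """Enumerate (iprocs, jprocs, nxb, nyb) that exactly tile an nx x ny grid.

    With one block per process, nx = iprocs * nxb and ny = jprocs * nyb,
    so only divisors of nx and ny give an exact tiling.
    """
    layouts = []
    for iprocs in range(1, nx + 1):
        if nx % iprocs:
            continue
        for jprocs in range(1, ny + 1):
            if ny % jprocs or iprocs * jprocs > max_procs:
                continue
            layouts.append((iprocs, jprocs, nx // iprocs, ny // jprocs))
    return layouts


if __name__ == "__main__":
    # Hypothetical 96 x 96 global grid, square blocks only, up to 144 processes.
    for iprocs, jprocs, nxb, nyb in ug_layouts(96, 96, 144):
        if iprocs == jprocs:
            print(
                f"{iprocs * jprocs:>4} procs: iprocs=jprocs={iprocs}, "
                f"nxb=nyb={nxb}, {nxb * nyb} zones/proc"
            )
```

Every layout in the list keeps the total zone count fixed, so comparing their runtimes is a strong-scaling study. For the 96 x 96 example, the zones per process fall below 1000 already at 16 processes (24 x 24 = 576).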
>>>
>>> Furthermore, at a certain number of cores the program execution is
>>> interrupted and I get a SIGBUS error (attached to this message). Am I
>>> missing something? Is there anything else I should consider?
>>>
>>> I am also attaching the log file of one successful run using 36 cores and
>>> nxb=nyb=16, which shows little or no speedup compared to a run on 12
>>> cores with nxb=nyb=28. Thank you in advance for any help.
>>>
>>> Pedro
>>>
>>> _______________________________________________
>>> flash-users mailing list
>>> flash-users at flash.rochester.edu
>>>
>>> For list info, including unsubscribe:
>>> https://flash.rochester.edu/mailman/listinfo/flash-users
>>>
>