[FLASH-USERS] MORE nodes MORE time on HPC
Adam Reyes
adam.reyes at rochester.edu
Fri Jun 17 16:38:44 EDT 2022
From the log files, it looks like the simulation is only using ~400 total
blocks. The HYPRE solvers scale better when there is a larger number of
blocks per MPI rank, so with ~5 blocks/rank in the two-node case it's not
surprising that the radiation solver's performance degrades.
You can get better scaling of the radTrans unit by using mesh replication
<https://flash.rochester.edu/site/flashcode/user_support/flash4_ug_4p62/node147.html#SECTION061212000000000000000>.
This scales only the radiation unit, so you can keep gaining speedup until
you're spending about as much time in radiation as in one of the other units.
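As a rough sketch of what that could look like for your run (the parameter
names here are from memory, so please check the linked user guide section
for the exact usage): with 20 groups you would rebuild with half the groups
per mesh, e.g.

  ./setup ... +mgd mgd_meshgroups=10 ...

set

  meshCopyCount = 2

in the runtime parameter file, and launch with twice the MPI ranks. Each of
the two mesh copies then owns 10 of the 20 groups, so the radiation solves
are spread over twice as many ranks while each copy keeps the same domain
decomposition for the rest of the physics.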
*************************************************************************
Adam Reyes
Code Group Leader, Flash Center for Computational Science
Research Scientist, Dept. of Physics and Astronomy
University of Rochester
River Campus: Bausch and Lomb Hall, 369
500 Wilson Blvd. PO Box 270171, Rochester, NY 14627
Email adam.reyes at rochester.edu
Web https://flash.rochester.edu
(he / him / his)
*************************************************************************
On Fri, Jun 17, 2022 at 3:26 PM Hansen, Eddie <ehansen at pas.rochester.edu>
wrote:
> Hello,
>
> Just to add to Ryan's very helpful insights, the Grid_advanceDiffusion
> routine is the main routine for all implicit diffusion solves (thermal
> conduction and radiation transport). FLASH effectively just sets up or
> defines the diffusion problems and the HYPRE library does all the hard
> work.
>
> It's not surprising that with 20 radiation groups the code spends a lot
> of time here. But since adding more cores makes it take longer, it does
> seem like the version of HYPRE you have might not be built with parallel
> (MPI) support.
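> One quick way to check this (just a sketch, assuming a typical hypre
> install; the macro names are from memory) is to grep the generated config
> header of your hypre installation, e.g.:
>
>   grep -i -E "SEQUENTIAL|MPI" /path/to/hypre/include/HYPRE_config.h
>
> A serial build typically defines HYPRE_SEQUENTIAL, whereas an MPI-enabled
> build should show the MPI-related defines instead.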
>
> --
> Eddie Hansen | Research Scientist
> Flash Center for Computational Science
> Dept. of Physics and Astronomy
> University of Rochester, Rochester, NY
> Cell 607-341-6126 | flash.rochester.edu
> (he / him / his)
> ------------------------------
> *From:* flash-users <flash-users-bounces at flash.rochester.edu> on behalf
> of Ryan Farber <rjfarber at umich.edu>
> *Sent:* Friday, June 17, 2022 3:12:30 PM
> *To:* 赵旭 <xuzhao1994 at sjtu.edu.cn>
> *Cc:* flash-users <flash-users at flash.rochester.edu>
> *Subject:* Re: [FLASH-USERS] MORE nodes MORE time on HPC
>
> Hi Zhao,
>
> I'll admit I haven't used the modules you're using, so I can't say much on
> the performance you're observing vs. what's expected. My guess is that
> there are versions of HYPRE with and without parallel support, and possibly
> yours is operating without parallel support?
>
> If you check towards the end of the log files, they specify the percentage
> of time different routines took up. For the one-node case, 47% of the
> evolution time is spent in "RadTrans", most of which occurs in
> "Grid_advanceDiffusion", and most of that time is spent in gr_hypreSolve
> (~27% of total walltime). The next most expensive unit is sourceTerms (37%
> of total walltime), most of which is spent in EnergyDeposition ->
> Transport Rays.
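> (As a quick illustration, something like "tail -n 150 1node.log" is enough
> to pull up that timer summary near the end of the logfile.)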
>
> In the two-node case, the logfile shows that 74% of the simulation time is
> spent in RadTrans, mostly (again) in "Grid_advanceDiffusion", and again
> most of that is spent in gr_hypreSolve (~66%). Given that about twice the
> fraction of time is spent there compared to the one-node case, there are
> likely some MPI_BCAST, MPI_ALLREDUCE, or other parallel inefficiencies
> occurring there. Possibly related to the code not handling cylindrical
> geometry well (just speculating).
>
> Looking into those routines may provide further elucidation, but I'd start
> by examining your hypre package. Best of luck!
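> For instance (just a sketch; adapt it to how your hypre was built and
> linked), something like
>
>   ldd ./flash4 | grep -i -E "hypre|mpi"
>
> will at least show which MPI and hypre shared libraries the executable
> actually picked up; if hypre is linked statically, check the hypre path in
> your site Makefile.h instead.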
>
> Best,
> --------
> Ryan
>
>
> On Fri, Jun 17, 2022 at 3:27 PM 赵旭 <xuzhao1994 at sjtu.edu.cn> wrote:
>
> Hi Ryan,
>
> Thank you for your reply.
>
> Please find attached the log files for case 1) - 1node.log and case 2) -
> 2nodes.log.
>
> The setup flag is
>
> ./setup -auto $SM_dir -2d +cylindrical -nxb=16 -nyb=16 -maxblocks=1000
> +hdf5typeio species=Cham,Fuel,Cone +mtmmmt +laser +pm4dev +uhd3t +mgd
> mgd_meshgroups=20 -objdir=~/zx/FLASH4.6.2/data/$Data_dir -parfile=$par_dir
>
> and the .par is also attached
>
> Is it the AMR mesh that caused the problem?
>
> > "(e.g., if you've written a lot of MPI_BCAST or MPI_ALLREDUCE calls)"
>
> I didn't write anything to the source code; I only changed run parameters
> for the targets and lasers, etc.
>
> Best,
>
> Zhao Xu
>
> ----- Original Message -----
> From: "Ryan Farber" <rjfarber at umich.edu>
> To: "赵旭" <xuzhao1994 at sjtu.edu.cn>
> Cc: "flash-users" <flash-users at flash.rochester.edu>
> Sent: Friday, June 17, 2022 5:31:46 PM
> Subject: Re: [FLASH-USERS] MORE nodes MORE time on HPC
>
> Hi Zhao,
>
> I'm having some trouble understanding exactly the cases you're comparing.
> If you attach logfiles for each case, that should clear things up.
>
> More generally, using more than one node / more processors increases the
> communication time, so if your problem doesn't scale well (e.g., if you've
> written a lot of MPI_BCAST or MPI_ALLREDUCE calls) then using more
> processors can result in a slower solution time.
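> A crude way to see where the scaling breaks down (just a sketch; adjust
> the core counts and parameter file to your setup, and keep the run short)
> is to time the same problem at a few rank counts, e.g.:
>
>   for n in 10 20 40 80; do
>       mpirun -np $n ./flash4 > scaling_${n}.log 2>&1
>   done
>
> and then compare the timer summaries at the end of each logfile.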
>
> Best,
> --------
> Ryan
>
>
> On Fri, Jun 17, 2022 at 7:19 AM 赵旭 <xuzhao1994 at sjtu.edu.cn> wrote:
>
> > Dear all,
> >
> > Sorry about the typo; the sentence should read:
> >
> > ''the results show that in case 2) it takes double or even more time
> > than 1)''
> >
> > That is, when I use more than 1 node, it takes more time.
> >
> >
> > ----- Original Message -----
> > From: "赵旭" <xuzhao1994 at sjtu.edu.cn>
> > To: "flash-users" <flash-users at flash.rochester.edu>
> > Sent: Friday, June 17, 2022 11:39:48 AM
> > Subject: [FLASH-USERS] MORE nodes MORE time on HPC
> >
> > Dear FLASH user & developers,
> >
> > I have a question about running the FLASH code on an HPC system. I am
> > running a modified laserslab case (changed .par and initBlock.F90 for the
> > laser and target) from the default one. I tried:
> > 1) running on 1 node with 40 cores (1 node contains 40 cores), and
> > 2) running on 2 nodes with 80 cores, with the same setup and parameters
> > as in 1).
> >
> > The results show that in case 1) it takes double or even more time than
> > 1), and this seems counterintuitive, because if I run a case with a
> > larger simulation box or finer resolution I have to use more cores.
> >
> > I tried two HPC systems: a) 1 node = 40 cores with 192 GB total memory,
> > and b) 1 node = 64 cores with 512 GB total memory. It takes about 2 times
> > as long on 2 nodes in a), and nearly 5 times as long on 2 nodes in b).
> >
> > I don't know if this problem comes from settings related to the HPC
> > system (like the MPI and HYPRE versions, or the job system) or from
> > settings related to the FLASH code (like in some source code files).
> >
> > I use gcc 7.5, python3.8, mpich 3.3.2, hypre 2.11.2, hdf5 1.10.5.
> >
> > Both HPCs use the Slurm job system, with a script like the one below:
> >
> > #!/bin/bash
> >
> > #SBATCH --job-name= # Name
> > #SBATCH --partition=64c512g # cpu
> > #SBATCH -n 128 # total cpu
> > #SBATCH --ntasks-per-node=64 # cpu/node
> > #SBATCH --output=%j.out
> > #SBATCH --error=%j.err
> >
> > mpirun ./flash4 >laser_slab.log
> >
> > I would appreciate any help.
> >
> > Thanks !
> >
> > --
> > Zhao Xu
> > Laboratory for Laser Plasmas (MoE)
> > Shanghai Jiao Tong University
> > 800 Dongchuan Rd, Shanghai 200240
>
> _______________________________________________
> flash-users mailing list
> flash-users at flash.rochester.edu
>
> For list info, including unsubscribe:
> https://flash.rochester.edu/mailman/listinfo/flash-users
>