<div dir="ltr">Hi Zhao,<div><br></div><div>Regarding "make" errors, note that you'll need to update the paths in the FLASH/sites/MACHINE_NAME/Makefile.h file. Unfortunately, I'm not sure about specific version numbers; I think you'll just have to experiment with the combinations available to you unless someone else chimes in. Does HYPRE ship some sort of "hello world" minimal working example for testing parallel scaling? A minimal sketch of the Makefile.h entries that usually need editing is included below the quoted thread.</div><div><br></div><div>Best,<br>--------<br>Ryan</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jun 20, 2022 at 4:40 AM 赵旭 <<a href="mailto:xuzhao1994@sjtu.edu.cn">xuzhao1994@sjtu.edu.cn</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Dear Adam, Eddie and Ryan,<br>
<br>
Thank you very much for your invaluable help and insightful suggestions. <br>
<br>
I would like to ask which versions of compiler (gcc, ...?), MPI (MPICH or OpenMPI, ...?), HYPRE, and HDF5 you suggest/prefer for FLASH 4.6.2.<br>
<br>
I have tried several combinations of these but run into a lot of trouble, such as "make" errors.<br>
<br>
For now I use gcc 11.2 + MPICH 3.4.2, but I am not sure whether this is suitable, or which versions of HYPRE and HDF5 should be used.<br>
<br>
Thank you again for your help.<br>
<br>
Zhao Xu<br>
<br>
<br>
<br>
----- Original Message -----<br>
From: "Adam Reyes" <<a href="mailto:adam.reyes@rochester.edu" target="_blank">adam.reyes@rochester.edu</a>><br>
To: "Eddie Hansen" <<a href="mailto:ehansen@pas.rochester.edu" target="_blank">ehansen@pas.rochester.edu</a>><br>
Cc: "Ryan Farber" <<a href="mailto:rjfarber@umich.edu" target="_blank">rjfarber@umich.edu</a>>, "赵旭" <<a href="mailto:xuzhao1994@sjtu.edu.cn" target="_blank">xuzhao1994@sjtu.edu.cn</a>>, "flash-users" <<a href="mailto:flash-users@flash.rochester.edu" target="_blank">flash-users@flash.rochester.edu</a>><br>
Sent: Saturday, June 18, 2022, 4:38:44 AM<br>
Subject: Re: [FLASH-USERS] MORE nodes MORE time on HPC<br>
<br>
From the log files, it looks like the simulation is only using ~400 total<br>
blocks. The HYPRE solvers scale better when there is a larger number of<br>
blocks per MPI rank, and with 5 blocks/rank in the 2-node case it's not<br>
surprising that there is a performance degradation in the radiation solver.<br>
<br>
You can get better scaling of the radTrans unit by using mesh replication<br>
<<a href="https://flash.rochester.edu/site/flashcode/user_support/flash4_ug_4p62/node147.html#SECTION061212000000000000000" rel="noreferrer" target="_blank">https://flash.rochester.edu/site/flashcode/user_support/flash4_ug_4p62/node147.html#SECTION061212000000000000000</a>>.<br>
This scales only the radiation unit, so you can keep getting speedup until<br>
you're spending about as much time in radiation as in one of the other units.<br>
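For example, with your 20 groups, a minimal sketch of what this might look<br>
like (parameter names as I recall them from the MGD and mesh-replication<br>
sections of the user guide; please double-check against your FLASH version):<br>
<br>
# setup: 10 groups per mesh copy instead of all 20 on one mesh<br>
./setup ... +mgd mgd_meshgroups=10 ...<br>
<br>
# flash.par: two replicated meshes, 20 groups in total<br>
meshCopyCount   = 2<br>
rt_mgdNumGroups = 20<br>
<br>
The requirement is mgd_meshgroups * meshCopyCount >= rt_mgdNumGroups; the MPI<br>
ranks are then divided between the mesh copies, so each copy advances only its<br>
share of the radiation groups.<br>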
<br>
*************************************************************************<br>
Adam Reyes<br>
<br>
Code Group Leader, Flash Center for Computational Science<br>
Research Scientist, Dept. of Physics and Astronomy<br>
University of Rochester<br>
River Campus: Bausch and Lomb Hall, 369<br>
500 Wilson Blvd. PO Box 270171, Rochester, NY 14627<br>
Email <a href="mailto:adam.reyes@rochester.edu" target="_blank">adam.reyes@rochester.edu</a><br>
Web <a href="https://flash.rochester.edu" rel="noreferrer" target="_blank">https://flash.rochester.edu</a><br>
(he / him / his)<br>
<br>
*************************************************************************<br>
<br>
On Fri, Jun 17, 2022 at 3:26 PM Hansen, Eddie <<a href="mailto:ehansen@pas.rochester.edu" target="_blank">ehansen@pas.rochester.edu</a>><br>
wrote:<br>
<br>
> Hello,<br>
><br>
> Just to add to Ryan's very helpful insights, the Grid_advanceDiffusion<br>
> routine is the main routine for all implicit diffusion solves (thermal<br>
> conduction and radiation transport). FLASH effectively just sets up or<br>
> defines the diffusion problems and the HYPRE library does all the hard<br>
> work.<br>
><br>
> It's not surprising that with 20 radiation groups, the code spends a lot<br>
> of time here. But with more cores taking longer, it does seem like the<br>
> version of HYPRE you have might not be parallelized.<br>
><br>
> --<br>
> Eddie Hansen | Research Scientist<br>
> Flash Center for Computational Science<br>
> Dept. of Physics and Astronomy<br>
> University of Rochester, Rochester, NY<br>
> Cell 607-341-6126 | <a href="http://flash.rochester.edu" rel="noreferrer" target="_blank">flash.rochester.edu</a><br>
> (he / him / his)<br>
> ------------------------------<br>
> *From:* flash-users <<a href="mailto:flash-users-bounces@flash.rochester.edu" target="_blank">flash-users-bounces@flash.rochester.edu</a>> on behalf<br>
> of Ryan Farber <<a href="mailto:rjfarber@umich.edu" target="_blank">rjfarber@umich.edu</a>><br>
> *Sent:* Friday, June 17, 2022 3:12:30 PM<br>
> *To:* 赵旭 <<a href="mailto:xuzhao1994@sjtu.edu.cn" target="_blank">xuzhao1994@sjtu.edu.cn</a>><br>
> *Cc:* flash-users <<a href="mailto:flash-users@flash.rochester.edu" target="_blank">flash-users@flash.rochester.edu</a>><br>
> *Subject:* Re: [FLASH-USERS] MORE nodes MORE time on HPC<br>
><br>
> Hi Zhao,<br>
><br>
> I'll admit I haven't used the modules you're using, so I can't say much about<br>
> the performance you're observing vs. what's expected. My guess is that there<br>
> are builds of HYPRE with and without parallel support, and possibly yours is<br>
> operating without parallel support?<br>
><br>
> If you check towards the end of the log files, they give the percentage of<br>
> time different routines took up. For the one-node case, 47% of the evolution<br>
> time is spent in "RadTrans", most of which occurs in "Grid_advanceDiffusion",<br>
> and most of that time is spent in gr_hypreSolve (~27% of total walltime). The<br>
> next most expensive unit is sourceTerms (37% of total walltime), most of which<br>
> is spent in EnergyDeposition -> Transport Rays.<br>
><br>
> In the two node case, the logfile shows that 74% of the simulation time is<br>
> spent in RadTrans, again mostly in "Grid_advanceDiffusion", and again most of<br>
> that is spent in gr_hypreSolve (~66%). Given that about twice as much time is<br>
> spent there compared to the one node case, there are likely some MPI_BCAST,<br>
> MPI_ALLREDUCE, or other parallel inefficiencies occurring there. Possibly this<br>
> is related to the code not handling cylindrical geometry well (just speculating).<br>
><br>
> Looking into those routines may provide further elucidation, but I'd start<br>
> by examining your HYPRE package. Best of luck!<br>
><br>
> Best,<br>
> --------<br>
> Ryan<br>
><br>
><br>
> On Fri, Jun 17, 2022 at 3:27 PM 赵旭 <<a href="mailto:xuzhao1994@sjtu.edu.cn" target="_blank">xuzhao1994@sjtu.edu.cn</a>> wrote:<br>
><br>
> Hi Ryan,<br>
><br>
> Thank you for your reply.<br>
><br>
> Please find attached the log files for case 1) (1node.log) and case 2)<br>
> (2nodes.log).<br>
><br>
> The setup flag is<br>
><br>
> ./setup -auto $SM_dir -2d +cylindrical -nxb=16 -nyb=16 -maxblocks=1000<br>
> +hdf5typeio species=Cham,Fuel,Cone +mtmmmt +laser +pm4dev +uhd3t +mgd<br>
> mgd_meshgroups=20 -objdir=~/zx/FLASH4.6.2/data/$Data_dir -parfile=$par_dir<br>
><br>
> and the .par file is also attached.<br>
><br>
> Is it the AMR mesh that caused the problem?<br>
><br>
> > "(e.g., if you've written a lot of MPI_BCAST or MPI_ALL_REDUCE calls) "<br>
><br>
> I didn't write anything into the source code; I only changed run parameters<br>
> for targets and lasers, etc.<br>
><br>
> Best,<br>
><br>
> Zhao Xu<br>
><br>
> ----- Original Message -----<br>
> From: "Ryan Farber" <<a href="mailto:rjfarber@umich.edu" target="_blank">rjfarber@umich.edu</a>><br>
> To: "赵旭" <<a href="mailto:xuzhao1994@sjtu.edu.cn" target="_blank">xuzhao1994@sjtu.edu.cn</a>><br>
> Cc: "flash-users" <<a href="mailto:flash-users@flash.rochester.edu" target="_blank">flash-users@flash.rochester.edu</a>><br>
> Sent: Friday, June 17, 2022, 5:31:46 PM<br>
> Subject: Re: [FLASH-USERS] MORE nodes MORE time on HPC<br>
><br>
> Hi Zhao,<br>
><br>
> I'm having some trouble understanding exactly which cases you're comparing.<br>
> If you attach logfiles for each case, that should clear things up.<br>
><br>
> More generally, using more than one node / more processors increases the<br>
> communication time, so if your problem doesn't scale well (e.g., if you've<br>
> written a lot of MPI_BCAST or MPI_ALLREDUCE calls) then using more<br>
> processors can result in a slower solution time.<br>
><br>
> Best,<br>
> --------<br>
> Ryan<br>
><br>
><br>
> On Fri, Jun 17, 2022 at 7:19 AM 赵旭 <<a href="mailto:xuzhao1994@sjtu.edu.cn" target="_blank">xuzhao1994@sjtu.edu.cn</a>> wrote:<br>
><br>
> > Dear all,<br>
> ><br>
> > sorry about the typo,<br>
> ><br>
> > '' the results show that in case 2) it takes double or even more time<br>
> > than 1) ''<br>
> ><br>
> > That is, when I use more than 1 node, it takes more time.<br>
> ><br>
> ><br>
> > ----- Original Message -----<br>
> > From: "赵旭" <<a href="mailto:xuzhao1994@sjtu.edu.cn" target="_blank">xuzhao1994@sjtu.edu.cn</a>><br>
> > To: "flash-users" <<a href="mailto:flash-users@flash.rochester.edu" target="_blank">flash-users@flash.rochester.edu</a>><br>
> > Sent: Friday, June 17, 2022, 11:39:48 AM<br>
> > Subject: [FLASH-USERS] MORE nodes MORE time on HPC<br>
> ><br>
> > Dear FLASH user & developers,<br>
> ><br>
> > I have a question about running the FLASH code on HPC. I am running a<br>
> > modified laserslab case (changed .par and initBlock.F90 for the laser and<br>
> > target) from the default one. I tried<br>
> > 1) running on 1 node with 40 cores (1 node contains 40 cores), and<br>
> > 2) running on 2 nodes with 80 cores with the same setup and parameters as<br>
> > in 1).<br>
> ><br>
> > the results show that in case 1) it takes double or even more time than<br>
> > 1), and this seems counterintuitive, because if I run a case with a larger<br>
> > simulation box or with finer resolution I have to use more cores.<br>
> ><br>
> > I tried two HPC systems: a) 1 node = 40 cores with 192 GB total memory, and<br>
> > b) 1 node = 64 cores with 512 GB total memory. It takes 2 times as long on<br>
> > 2 nodes in a) and nearly 5 times as long on 2 nodes in b).<br>
> ><br>
> > I don't know if this problem comes from settings related to the HPC system<br>
> > (like the MPI and HYPRE versions, or the job system) or settings related to<br>
> > the FLASH code (like in some source code files).<br>
> ><br>
> > I use gcc 7.5, python3.8, mpich 3.3.2, hypre 2.11.2, hdf5 1.10.5.<br>
> ><br>
> > Both HPCs use the Slurm job system, with a script like the one below:<br>
> ><br>
> > #!/bin/bash<br>
> ><br>
> > #SBATCH --job-name= # Name<br>
> > #SBATCH --partition=64c512g # cpu<br>
> > #SBATCH -n 128 # total cpu<br>
> > #SBATCH --ntasks-per-node=64 # cpu/node<br>
> > #SBATCH --output=%j.out<br>
> > #SBATCH --error=%j.err<br>
> ><br>
> > mpirun ./flash4 >laser_slab.log<br>
> ><br>
> > I would appreciate any help.<br>
> ><br>
> > Thanks!<br>
> ><br>
> > --<br>
> > Zhao Xu<br>
> > Laboratory for Laser Plasmas (MoE)<br>
> > Shanghai Jiao Tong University<br>
> > 800 Dongchuan Rd, Shanghai 200240<br>
> > _______________________________________________<br>
> > flash-users mailing list<br>
> > <a href="mailto:flash-users@flash.rochester.edu" target="_blank">flash-users@flash.rochester.edu</a><br>
> ><br>
> > For list info, including unsubscribe:<br>
> > <a href="https://flash.rochester.edu/mailman/listinfo/flash-users" rel="noreferrer" target="_blank">https://flash.rochester.edu/mailman/listinfo/flash-users</a><br>
> ><br>
> --<br>
> Zhao Xu<br>
> Laboratory for Laser Plasmas (MoE)<br>
> Shanghai Jiao Tong University<br>
> 800 Dongchuan Rd, Shanghai 200240<br>
><br>
> _______________________________________________<br>
> flash-users mailing list<br>
> <a href="mailto:flash-users@flash.rochester.edu" target="_blank">flash-users@flash.rochester.edu</a><br>
><br>
> For list info, including unsubscribe:<br>
> <a href="https://flash.rochester.edu/mailman/listinfo/flash-users" rel="noreferrer" target="_blank">https://flash.rochester.edu/mailman/listinfo/flash-users</a><br>
><br>
-- <br>
Zhao Xu<br>
Laboratory for Laser Plasmas (MoE)<br>
Shanghai Jiao Tong University<br>
800 Dongchuan Rd, Shanghai 200240<br>
</blockquote></div>
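<div dir="ltr"><br>P.S. As mentioned above, here is a minimal sketch of the Makefile.h entries that usually have to be pointed at your local installations. The variable names follow the prototypes shipped under FLASH/sites/; the exact compiler flags and library lists depend on your particular build, so treat this as a starting point rather than a known-good configuration:<br>
<br>
# adjust these to wherever your modules install the libraries<br>
MPI_PATH   = /path/to/mpich<br>
HDF5_PATH  = /path/to/hdf5<br>
HYPRE_PATH = /path/to/hypre<br>
<br>
# compilers: use the MPI wrappers so everything links against the same MPI<br>
FCOMP   = $(MPI_PATH)/bin/mpif90<br>
CCOMP   = $(MPI_PATH)/bin/mpicc<br>
CPPCOMP = $(MPI_PATH)/bin/mpicxx<br>
LINK    = $(MPI_PATH)/bin/mpif90<br>
<br>
# library link lines (exact lists vary by installation)<br>
LIB_HDF5  = -L$(HDF5_PATH)/lib -lhdf5 -lz<br>
LIB_HYPRE = -L$(HYPRE_PATH)/lib -lHYPRE<br>
<br>
Building HDF5 and HYPRE with the same MPI wrappers you compile FLASH with (the mpicc/mpif90 from the MPICH module you load) is also the easiest way to make sure the HYPRE you link against is the parallel build.</div>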