[FLASH-BUGS] Re: [FLASH-USERS] MPI bug on SGI
Paul Ricker
pmricker at uiuc.edu
Thu May 8 00:49:09 CDT 2003
Markus,
I believe I've fixed this problem. The MPI_ISENDs in the routine
move_block() in the file amr_redist_blk.F90 were not appropriately
completed via an MPI_WAITALL or similar call. I've modified the
file to keep track of send requests and to issue an MPI_WAITALL at
the end of amr_redist_blk() (as was done already with the non-blocking
receives posted in that routine). I've tested it under 2.2, and it
appears to work (I had a student who encountered the bug a few days
ago, so I was able to reproduce the problem and verify the fix).
The updated file is attached, and I have committed the changes to
2.3 (the file had not changed since 2.2).
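In outline, the change is just the standard request-tracking pattern for
non-blocking sends. Condensed from the attached file (declarations trimmed
and surrounding logic omitted), it looks like this:

    integer :: reqs(maxblocks_tr), stats(MPI_STATUS_SIZE,maxblocks_tr)
    integer :: nsend, ierr

    nsend = 0
    ...
    ! in move_block(): post the send and remember its request handle
    nsend = nsend + 1
    call MPI_ISEND (free(lb), 1, MPI_LOGICAL, old_loc(2,lb), maxblocks+lb, &
   &                MPI_COMM_WORLD, reqs(nsend), ierr)
    ...
    ! at the end of amr_redist_blk(): complete all outstanding sends
    if (nsend .gt. 0) call MPI_WAITALL (nsend, reqs, stats, ierr)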
Incidentally, I also checked the other AMR routines to see if there
are any similar problems elsewhere. I counted the calls to
non-blocking receives and sends, waitalls, waitanys, testalls, and
testanys. There were no test calls. Since each send/receive
creates a request that has to be waited on, one would expect the
code to be structured so that there are as many wait calls as
sends+receives. The results are below.
routine                    isend   irecv waitall waitany  #posts  #waits
---------------------------------------------------------------------------
amr_bsort                      0       1       1       0       1       1
amr_derefine_blocks            0       8       8       0       8       8
amr_flux_conserve_udt          0       3       1       0       3      1*
    OK  - all irecvs use same request array
amr_guardcell_cc_c_to_f        0       4       4       0       4       4
amr_guardcell_cc_srl           0       2       2       0       2       2
amr_morton                     0      10       8       0      10      8*
    OK  - all irecvs use same request array
amr_prolong_cc                 0       5       5       0       5       5
amr_redist_blk                 1       1       1       0       2      1*
    BAD - isend matched by iprobe & recv, no wait/test
amr_refine_blocks              0       6       6       0       6       6
amr_refine_derefine            0       2       2       0       2       2
amr_restrict_bnd_data          0       3       1       0       3      1*
    OK  - all irecvs use same request array
amr_restrict_cc                0       2       2       0       2       2
batchsend                      0       1       0       1       1       1
batchsend_dbl                  0       1       0       1       1       1
ref_marking                    1       1       2       0       2       2
---------------------------------------------------------------------------
Only four routines (marked * above) had more posted sends+receives than
waits. On closer inspection, three of them use the same request array in
all of their MPI_IRECV calls, so a single MPI_WAITALL covers every posted
request and the apparent discrepancy is not actually a problem. (There
could still be problems with the number of times each call site executes,
e.g. inside a loop; I didn't check for that.) amr_redist_blk was the only
routine with a discrepancy that could not be explained by re-use of the
same request array.
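For the three routines marked OK, the pattern is roughly the following
sketch (schematic only; the names here are made up rather than taken from
PARAMESH): each MPI_IRECV call site appends its request to one shared
array, and a single MPI_WAITALL completes all of them, so one wait call
site legitimately matches several receive call sites.

    integer :: nrecv, ierr, lb
    integer :: reqr(maxreq), statr(MPI_STATUS_SIZE,maxreq)

    nrecv = 0
    do lb = 1, nblk_a                            ! first group of receives
      nrecv = nrecv + 1
      call MPI_IRECV (bufa(1,lb), len_a, MPI_DOUBLE_PRECISION, src_a(lb), 1, &
   &                  MPI_COMM_WORLD, reqr(nrecv), ierr)
    end do
    do lb = 1, nblk_b                            ! second group, same array
      nrecv = nrecv + 1
      call MPI_IRECV (bufb(1,lb), len_b, MPI_DOUBLE_PRECISION, src_b(lb), 2, &
   &                  MPI_COMM_WORLD, reqr(nrecv), ierr)
    end do
    if (nrecv .gt. 0) call MPI_WAITALL (nrecv, reqr, statr, ierr)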
Let me know if this doesn't fix your problem.
Best regards,
Paul Ricker
--
---------------------------------------------------------------------
Paul M. Ricker                        Department of Astronomy
Assistant Professor                   National Center for Supercomputing Applications
pmricker at uiuc.edu                  University of Illinois at Urbana-Champaign
http://www.astro.uiuc.edu/~pmricker   Urbana IL 61801-3074
---------------------------------------------------------------------
On Thu, 2003-04-17 at 14:18, Markus Gross wrote:
> Hi!
>
> I believe I have just encountered a bug in at least one FLASH routine.
>
> Our System:
>
> SGI origin IRIX64 6.5 07091542 IP35
> MPI 3.2.0.7 (MPT 1.4) -mpi Version 1266208220
> MIPSpro Compilers: Version 7.4
>
> Problem:
>
> 3-D hydro starting with 5x5x3 blocks, maxref = 3
> we get a:
>
> *** MPI has run out of request entries.
> *** The current allocation level is:
> *** MPI_REQUEST_MAX = 16384
> IOT Trap
> MPI: MPI_COMM_WORLD rank 0 has terminated without calling MPI_Finalize()
> MPI: aborting job
>
> when:
>
> [04-17-2003 18:48.39] <<< refined: tot_blocks = 2723 >>>
> [04-17-2003 18:48.59] *** plot file: gas_pre_hdf_plt_cnt_0009
> [04-17-2003 18:49.08] *** wrote to gas_pre_hdf_plt_cnt_0009
> [04-17-2003 18:49.41] step 91 t= 2.664610E-06 dt= 1.431909E-08
> [04-17-2003 18:50.12] step 92 t= 2.693248E-06 dt= 1.426362E-08
>
> MPI_REQUEST_MAX = 16384 is a hard limit on our machine and can only be
> decreased using the environment variable.
>
>
> I tracked that down to at least one non-blocking send, matched by a blocking
> receive, where the send request was not freed. This happens at least in:
>
> line 241 AMR_redist_blk.F90 FLASH 2.2 & 2.1
>
>
> According to the MPI standard, that should be fixed by having an MPI_WAIT
> immediately following the non-blocking send.
>
> I attached a test program to this email which illustrates this point (at
> least on our system). The version as supplied should fail; if you uncomment
> the MPI_WAIT it should run forever.
>
> Please let me know what you think about this and, if I am right, whether
> there are more of these problems in the code (I presume you would know off
> the top of your head). I'll try to get through it tonight, but FLASH is a
> little bit bigger, so help would be appreciated.
>
> Regards,
>
> Markus.
-------------- next part --------------
subroutine amr_redist_blk(new_loc,nprocs,mype,lnblocks_old)
! $RCSfile: amr_redist_blk.F90,v $
! $Revision: 1.1.1.1 $
! $Date: 2002/11/27 20:00:50 $
use physicaldata
use tree
implicit none
include 'mpif.h'
integer istart,jstart,kstart,iend,jend,kend
parameter(istart = nguard)
parameter(jstart = nguard*k2d+1-k2d)
parameter(kstart = nguard*k3d+1-k3d)
parameter(iend = nguard+nxb+1)
parameter(jend = nguard*k2d+nyb+k2d)
parameter(kend = nguard*k3d+nzb+k3d)
integer :: new_loc(2,maxblocks_tr), old_loc(2,maxblocks_tr)
integer :: nprocs,mype,lnblocks_old
integer :: nrecv, nsend, lb,errorcode,ierr
logical :: free(maxblocks), moved(maxblocks), sent(maxblocks)
logical :: repeat, repeatt
integer :: reqr(maxblocks_tr), reqs(maxblocks_tr)
integer :: statr(MPI_STATUS_SIZE,maxblocks_tr)
integer :: stats(MPI_STATUS_SIZE,maxblocks_tr)
integer :: nmoved, nit
integer :: test(maxblocks), point_to(maxblocks)
integer :: nm, nm2, nm2_old
integer :: myblockint, block_int2d, block_int3d
! DEFINE BLOCK INTERIOR
call MPI_TYPE_VECTOR (nyb*k2d+k2d+1, &
& nvar*(nxb+2), &
& nvar*iu_bnd, &
& MPI_DOUBLE_PRECISION, &
& block_int2d, &
& ierr)
myblockint = block_int2d
if (ndim.eq.3) then
call MPI_TYPE_HVECTOR (nzb+2, &
& 1, &
& nvar*iu_bnd*ju_bnd*8, &
& block_int2d, &
& block_int3d, &
& ierr)
myblockint = block_int3d
end if
if (ndim.eq.3) call MPI_TYPE_COMMIT(block_int2d,ierr)
call MPI_TYPE_COMMIT(myblockint,ierr)
! 1) compute old_loc
call fill_old_loc (new_loc,old_loc,nprocs,mype)
!--------------
! treat unk
if(nvar.gt.0) then
! Post all receives
nrecv = 0
do lb = 1,new_lnblocks
if (.not.newchild(lb)) then
if (old_loc(2,lb).ne.mype) then
nrecv = nrecv + 1
call MPI_IRECV (unk(1,istart,jstart,kstart,lb), &
& 1, &
& myblockint, &
& old_loc(2,lb), &
& lb, &
& MPI_COMM_WORLD, &
& reqr(nrecv), &
& ierr)
end if
end if
end do
moved(:) = .false.
moved(lnblocks_old+1:maxblocks) = .true.
free(:) = .false.
free(lnblocks_old+1:maxblocks) = .true.
sent(:) = .false.
repeat = .TRUE.
nmoved = 0
test(:) = 0
point_to(:) = 0
nsend = 0
nit = 0
nm2 = 0
nm2_old = 1
do while (repeat.and.nm2.ne.nm2_old)
do lb = 1, max(lnblocks_old,new_lnblocks)
call move_block(lb, new_loc, old_loc, free, moved, sent, &
& lnblocks_old, mype, nmoved, &
& test, point_to, myblockint, reqs, nsend)
enddo
repeat = any(.not.moved(:))
call MPI_ALLREDUCE (repeat,repeatt,1,MPI_LOGICAL, &
& MPI_LOR,MPI_COMM_WORLD,ierr)
repeat = repeatt
nm2_old = nm2
nm = count(.not.moved(:))
call MPI_ALLREDUCE (nm,nm2,1,MPI_INTEGER, &
& MPI_SUM,MPI_COMM_WORLD,ierr)
if (mype.eq.0) then
print *,' iteration, no. not moved = ',nit,nm2
end if
nit = nit + 1
end do
if (nm2_old.eq.nm2.and.nm2.ne.0) then
if (mype.eq.0) then
print *,' ERROR: could not move all blocks in amr_redist_blk '
print *,' Try increasing maxblocks or use more processors '
print *,' nm2_old, nm2 = ',nm2_old,nm2
print *,' ABORTING !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
end if
call abort_flash("Error: could not move all blocks in amr_redist_blk")
end if
if (nrecv.gt.0) then
call MPI_WAITALL(nrecv,reqr,statr,ierr)
end if
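! Complete the non-blocking sends posted in move_block (this is the fix for
! the leaked send requests).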
if (nsend.gt.0) then
call MPI_WAITALL(nsend,reqs,stats,ierr)
end if
end if
call MPI_TYPE_FREE(myblockint,ierr)
if (ndim.eq.3) call MPI_TYPE_FREE(block_int2d,ierr)
return
end subroutine amr_redist_blk
recursive subroutine move_block (lb, new_loc, old_loc, free, &
& moved, sent, &
& lnblocks_old, mype, nmoved, &
& test, point_to, myblockint, &
& reqs, nsend)
use physicaldata
use tree
implicit none
include 'mpif.h'
integer istart,jstart,kstart,iend,jend,kend
parameter(istart = nguard)
parameter(jstart = nguard*k2d+1-k2d)
parameter(kstart = nguard*k3d+1-k3d)
parameter(iend = nguard+nxb+1)
parameter(jend = nguard*k2d+nyb+k2d)
parameter(kend = nguard*k3d+nzb+k3d)
integer :: new_loc(2,maxblocks_tr), old_loc(2,maxblocks_tr)
logical :: free(maxblocks), moved(maxblocks), sent(maxblocks)
integer :: reqs(maxblocks_tr), nsend
integer :: lb, lnblocks_old, mype
logical :: success
integer :: status(MPI_STATUS_SIZE)
integer :: reqr, ierr, nmoved, lb2
integer :: point_to(maxblocks),test(maxblocks)
integer :: myblockint
if (new_loc(1,lb).eq.lb.and.new_loc(2,lb).eq.mype) then
if (.not.moved(lb)) moved(lb) = .true.
return
end if
if (lb.le.max(lnblocks_old,new_lnblocks)) then
if (lb.le.lnblocks_old) then
if (new_loc(2,lb).ne.mype) then
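! The destination processor posts an MPI_ISEND of its free flag (tag
! maxblocks + destination block number) once the target slot is available;
! probe for that message before sending the block data with MPI_SSEND.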
success = .false.
call MPI_IPROBE (new_loc(2,lb), &
& maxblocks+new_loc(1,lb), &
& MPI_COMM_WORLD, &
& success, &
& status, &
& ierr)
if (.not.moved(lb).and.success) then
call MPI_RECV (success, &
& 1, &
& MPI_LOGICAL, &
& new_loc(2,lb), &
& maxblocks+new_loc(1,lb), &
& MPI_COMM_WORLD, &
& status, &
& ierr)
if (free(lb)) then
call MPI_SSEND ( &
& unk(1,istart,jstart,kstart,point_to(lb)), &
& 1, &
& myblockint, &
& new_loc(2,lb), &
& new_loc(1,lb), &
& MPI_COMM_WORLD, &
& ierr)
test(point_to(lb)) = -1
else
call MPI_SSEND ( &
& unk(1,istart,jstart,kstart,lb), &
& 1, &
& myblockint, &
& new_loc(2,lb), &
& new_loc(1,lb), &
& MPI_COMM_WORLD, &
& ierr)
free(lb) = .true.
end if
moved(lb) = .true.
end if
else
if (.not.moved(lb).and.free(new_loc(1,lb))) then
if (free(lb)) then
unk(:,istart:iend,jstart:jend,kstart:kend,new_loc(1,lb)) = &
& unk(:,istart:iend,jstart:jend,kstart:kend,point_to(lb))
test(point_to(lb)) = -1
else
unk(:,istart:iend,jstart:jend,kstart:kend,new_loc(1,lb)) = &
& unk(:,istart:iend,jstart:jend,kstart:kend,lb)
free(lb) = .true.
end if
moved(lb) = .true.
end if
end if
end if
if (lb.le.new_lnblocks) then
if (free(lb).and..not.sent(lb)) then
sent(lb) = .true.
if (.not.newchild(lb)) then
if (old_loc(2,lb).ne.mype) then
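! Tell the old owner of this block that its destination slot is free, and
! keep the request handle so amr_redist_blk can complete it with MPI_WAITALL.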
nsend = nsend + 1
call MPI_ISEND (free(lb), &
& 1, &
& MPI_LOGICAL, &
& old_loc(2,lb), &
& maxblocks+lb, &
& MPI_COMM_WORLD, &
& reqs(nsend), &
& ierr)
end if
end if
end if
end if
if (lb.le.lnblocks_old.and..not.free(lb)) then
nmoved = nmoved + 1
point_to(lb) = max(lnblocks_old,new_lnblocks)+nmoved
if (point_to(lb).gt.maxblocks) then
do lb2 = max(lnblocks_old,new_lnblocks)+1,maxblocks
if (test(lb2).eq.-1) then
point_to(lb) = lb2
go to 22
end if
end do
end if
22 if (point_to(lb).le.maxblocks) then
test(point_to(lb)) = 1
unk(:,istart:iend,jstart:jend,kstart:kend,point_to(lb)) = &
& unk(:,istart:iend,jstart:jend,kstart:kend,lb)
free(lb) = .TRUE.
end if
end if
else
return
end if
return
end subroutine move_block