[FLASH-BUGS] Re: [FLASH-USERS] MPI bug on SGI
Paul Ricker
pmricker at uiuc.edu
Thu May 8 00:49:09 CDT 2003
Markus,
I believe I've fixed this problem. The MPI_ISENDs in the routine
move_block() in the file amr_redist_blk.F90 were not appropriately
completed via an MPI_WAITALL or similar call. I've modified the
file to keep track of send requests and to issue an MPI_WAITALL at
the end of amr_redist_blk() (as was done already with the non-blocking
receives posted in that routine). I've tested it under 2.2, and it
appears to work (I had a student who encountered the bug a few days
ago, so I was able to reproduce the problem and verify the fix).
The updated file is attached, and I have committed the changes to
2.3 (the file had not changed since 2.2).
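In outline, the change is just the standard request-tracking pattern for
non-blocking sends. Condensed from the attached file (declarations trimmed
and surrounding logic omitted), it looks like this:

    integer :: reqs(maxblocks_tr), stats(MPI_STATUS_SIZE,maxblocks_tr)
    integer :: nsend, ierr

    nsend = 0
    ...
    ! in move_block(): post the send and remember its request handle
    nsend = nsend + 1
    call MPI_ISEND (free(lb), 1, MPI_LOGICAL, old_loc(2,lb), maxblocks+lb, &
   &                MPI_COMM_WORLD, reqs(nsend), ierr)
    ...
    ! at the end of amr_redist_blk(): complete all outstanding sends
    if (nsend .gt. 0) call MPI_WAITALL (nsend, reqs, stats, ierr)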
Incidentally, I also checked the other AMR routines to see if there
are any similar problems elsewhere. I counted the calls to
non-blocking receives and sends, waitalls, waitanys, testalls, and
testanys. There were no test calls. Since each send/receive
creates a request that has to be waited on, one would expect the
code to be structured so that there are as many wait calls as
sends+receives. The results are below.
routine                    isend   irecv waitall waitany  #posts  #waits
---------------------------------------------------------------------------
amr_bsort                      0       1       1       0       1       1
amr_derefine_blocks            0       8       8       0       8       8
amr_flux_conserve_udt          0       3       1       0       3      1*
    OK  - all irecvs use same request array
amr_guardcell_cc_c_to_f        0       4       4       0       4       4
amr_guardcell_cc_srl           0       2       2       0       2       2
amr_morton                     0      10       8       0      10      8*
    OK  - all irecvs use same request array
amr_prolong_cc                 0       5       5       0       5       5
amr_redist_blk                 1       1       1       0       2      1*
    BAD - isend matched by iprobe & recv, no wait/test
amr_refine_blocks              0       6       6       0       6       6
amr_refine_derefine            0       2       2       0       2       2
amr_restrict_bnd_data          0       3       1       0       3      1*
    OK  - all irecvs use same request array
amr_restrict_cc                0       2       2       0       2       2
batchsend                      0       1       0       1       1       1
batchsend_dbl                  0       1       0       1       1       1
ref_marking                    1       1       2       0       2       2
---------------------------------------------------------------------------
Only four routines (marked * above) had more posted sends+receives than
waits. On closer inspection, three of them use the same request array in
all of their MPI_IRECV calls, so a single MPI_WAITALL covers every posted
request and the apparent discrepancy is not actually a problem. (There
could still be problems with the number of times each call site executes,
e.g. inside a loop; I didn't check for that.) amr_redist_blk was the only
routine with a discrepancy that could not be explained by re-use of the
same request array.
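For the three routines marked OK, the pattern is roughly the following
sketch (schematic only; the names here are made up rather than taken from
PARAMESH): each MPI_IRECV call site appends its request to one shared
array, and a single MPI_WAITALL completes all of them, so one wait call
site legitimately matches several receive call sites.

    integer :: nrecv, ierr, lb
    integer :: reqr(maxreq), statr(MPI_STATUS_SIZE,maxreq)

    nrecv = 0
    do lb = 1, nblk_a                            ! first group of receives
      nrecv = nrecv + 1
      call MPI_IRECV (bufa(1,lb), len_a, MPI_DOUBLE_PRECISION, src_a(lb), 1, &
   &                  MPI_COMM_WORLD, reqr(nrecv), ierr)
    end do
    do lb = 1, nblk_b                            ! second group, same array
      nrecv = nrecv + 1
      call MPI_IRECV (bufb(1,lb), len_b, MPI_DOUBLE_PRECISION, src_b(lb), 2, &
   &                  MPI_COMM_WORLD, reqr(nrecv), ierr)
    end do
    if (nrecv .gt. 0) call MPI_WAITALL (nrecv, reqr, statr, ierr)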
Let me know if this doesn't fix your problem.
Best regards,
Paul Ricker
--
---------------------------------------------------------------------
Paul M. Ricker                        Department of Astronomy
Assistant Professor                   National Center for Supercomputing Applications
pmricker at uiuc.edu                  University of Illinois at Urbana-Champaign
http://www.astro.uiuc.edu/~pmricker   Urbana IL 61801-3074
---------------------------------------------------------------------
On Thu, 2003-04-17 at 14:18, Markus Gross wrote:
> Hi!
>
> I believe I have just encountered a bug in at least one FLASH routine.
>
> Our System:
>
> SGI origin IRIX64 6.5 07091542 IP35
> MPI 3.2.0.7 (MPT 1.4) -mpi Version 1266208220
> MIPSpro Compilers: Version 7.4
>
> Problem:
>
> 3-D hydro starting with 5x5x3 blocks, maxref = 3
> we get a:
>
> *** MPI has run out of request entries.
> *** The current allocation level is:
> *** MPI_REQUEST_MAX = 16384
> IOT Trap
> MPI: MPI_COMM_WORLD rank 0 has terminated without calling MPI_Finalize()
> MPI: aborting job
>
> when:
>
> [04-17-2003 18:48.39] <<< refined: tot_blocks = 2723 >>>
> [04-17-2003 18:48.59] *** plot file: gas_pre_hdf_plt_cnt_0009
> [04-17-2003 18:49.08] *** wrote to gas_pre_hdf_plt_cnt_0009
> [04-17-2003 18:49.41] step 91 t= 2.664610E-06 dt= 1.431909E-08
> [04-17-2003 18:50.12] step 92 t= 2.693248E-06 dt= 1.426362E-08
>
> MPI_REQUEST_MAX = 16384 is a hard limit on our machine and can only be
> decreased using the environment variable.
>
>
> I tracked that down to at least one non-blocking send, matched by a blocking
> receive, where the send request was not freed. This happens at least in:
>
> line 241 AMR_redist_blk.F90 FLASH 2.2 & 2.1
>
>
> According to the MPI standard, that should be fixed by having an MPI_WAIT
> immediately following the non-blocking send.
>
> I attached a test program to this email which illustrates this point (at
> least on our system). The version as supplied should fail; if you uncomment
> the MPI_WAIT it should run forever.
>
> Please let me know what you think about this and, if I am right, whether
> there are more of these problems in the code (I presume you would know off
> the top of your head). I'll try to get through it tonight, but FLASH is a
> little bit bigger, so help would be appreciated.
>
> Regards,
>
> Markus.
-------------- next part --------------
subroutine amr_redist_blk(new_loc,nprocs,mype,lnblocks_old)
! $RCSfile: amr_redist_blk.F90,v $
! $Revision: 1.1.1.1 $
! $Date: 2002/11/27 20:00:50 $
use physicaldata
use tree
implicit none
include 'mpif.h'
integer istart,jstart,kstart,iend,jend,kend
parameter(istart = nguard)
parameter(jstart = nguard*k2d+1-k2d)
parameter(kstart = nguard*k3d+1-k3d)
parameter(iend = nguard+nxb+1)
parameter(jend = nguard*k2d+nyb+k2d)
parameter(kend = nguard*k3d+nzb+k3d)
integer :: new_loc(2,maxblocks_tr), old_loc(2,maxblocks_tr)
integer :: nprocs,mype,lnblocks_old
integer :: nrecv, nsend, lb,errorcode,ierr
logical :: free(maxblocks), moved(maxblocks), sent(maxblocks)
logical :: repeat, repeatt
integer :: reqr(maxblocks_tr), reqs(maxblocks_tr)
integer :: statr(MPI_STATUS_SIZE,maxblocks_tr)
integer :: stats(MPI_STATUS_SIZE,maxblocks_tr)
integer :: nmoved, nit
integer :: test(maxblocks), point_to(maxblocks)
integer :: nm, nm2, nm2_old
integer :: myblockint, block_int2d, block_int3d
! DEFINE BLOCK INTERIOR
call MPI_TYPE_VECTOR (nyb*k2d+k2d+1, &
& nvar*(nxb+2), &
& nvar*iu_bnd, &
& MPI_DOUBLE_PRECISION, &
& block_int2d, &
& ierr)
myblockint = block_int2d
if (ndim.eq.3) then
call MPI_TYPE_HVECTOR (nzb+2, &
& 1, &
& nvar*iu_bnd*ju_bnd*8, &
& block_int2d, &
& block_int3d, &
& ierr)
myblockint = block_int3d
end if
if (ndim.eq.3) call MPI_TYPE_COMMIT(block_int2d,ierr)
call MPI_TYPE_COMMIT(myblockint,ierr)
! 1) compute old_loc
call fill_old_loc (new_loc,old_loc,nprocs,mype)
!--------------
! treat unk
if(nvar.gt.0) then
! Post all receives
nrecv = 0
do lb = 1,new_lnblocks
if (.not.newchild(lb)) then
if (old_loc(2,lb).ne.mype) then
nrecv = nrecv + 1
call MPI_IRECV (unk(1,istart,jstart,kstart,lb), &
& 1, &
& myblockint, &
& old_loc(2,lb), &
& lb, &
& MPI_COMM_WORLD, &
& reqr(nrecv), &
& ierr)
end if
end if
end do
moved(:) = .false.
moved(lnblocks_old+1:maxblocks) = .true.
free(:) = .false.
free(lnblocks_old+1:maxblocks) = .true.
sent(:) = .false.
repeat = .TRUE.
nmoved = 0
test(:) = 0
point_to(:) = 0
nsend = 0
nit = 0
nm2 = 0
nm2_old = 1
do while (repeat.and.nm2.ne.nm2_old)
do lb = 1, max(lnblocks_old,new_lnblocks)
call move_block(lb, new_loc, old_loc, free, moved, sent, &
& lnblocks_old, mype, nmoved, &
& test, point_to, myblockint, reqs, nsend)
enddo
repeat = any(.not.moved(:))
call MPI_ALLREDUCE (repeat,repeatt,1,MPI_LOGICAL, &
& MPI_LOR,MPI_COMM_WORLD,ierr)
repeat = repeatt
nm2_old = nm2
nm = count(.not.moved(:))
call MPI_ALLREDUCE (nm,nm2,1,MPI_INTEGER, &
& MPI_SUM,MPI_COMM_WORLD,ierr)
if (mype.eq.0) then
print *,' iteration, no. not moved = ',nit,nm2
end if
nit = nit + 1
end do
if (nm2_old.eq.nm2.and.nm2.ne.0) then
if (mype.eq.0) then
print *,' ERROR: could not move all blocks in amr_redist_blk '
print *,' Try increasing maxblocks or use more processors '
print *,' nm2_old, nm2 = ',nm2_old,nm2
print *,' ABORTING !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
end if
call abort_flash("Error: could not move all blocks in amr_redist_blk")
end if
if (nrecv.gt.0) then
call MPI_WAITALL(nrecv,reqr,statr,ierr)
end if
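! Complete the non-blocking sends posted in move_block (this is the fix for
! the leaked send requests).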
if (nsend.gt.0) then
call MPI_WAITALL(nsend,reqs,stats,ierr)
end if
end if
call MPI_TYPE_FREE(myblockint,ierr)
if (ndim.eq.3) call MPI_TYPE_FREE(block_int2d,ierr)
return
end subroutine amr_redist_blk
recursive subroutine move_block (lb, new_loc, old_loc, free, &
& moved, sent, &
& lnblocks_old, mype, nmoved, &
& test, point_to, myblockint, &
& reqs, nsend)
use physicaldata
use tree
implicit none
include 'mpif.h'
integer istart,jstart,kstart,iend,jend,kend
parameter(istart = nguard)
parameter(jstart = nguard*k2d+1-k2d)
parameter(kstart = nguard*k3d+1-k3d)
parameter(iend = nguard+nxb+1)
parameter(jend = nguard*k2d+nyb+k2d)
parameter(kend = nguard*k3d+nzb+k3d)
integer :: new_loc(2,maxblocks_tr), old_loc(2,maxblocks_tr)
logical :: free(maxblocks), moved(maxblocks), sent(maxblocks)
integer :: reqs(maxblocks_tr), nsend
integer :: lb, lnblocks_old, mype
logical :: success
integer :: status(MPI_STATUS_SIZE)
integer :: reqr, ierr, nmoved, lb2
integer :: point_to(maxblocks),test(maxblocks)
integer :: myblockint
if (new_loc(1,lb).eq.lb.and.new_loc(2,lb).eq.mype) then
if (.not.moved(lb)) moved(lb) = .true.
return
end if
if (lb.le.max(lnblocks_old,new_lnblocks)) then
if (lb.le.lnblocks_old) then
if (new_loc(2,lb).ne.mype) then
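! The destination processor posts an MPI_ISEND of its free flag (tag
! maxblocks + destination block number) once the target slot is available;
! probe for that message before sending the block data with MPI_SSEND.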
success = .false.
call MPI_IPROBE (new_loc(2,lb), &
& maxblocks+new_loc(1,lb), &
& MPI_COMM_WORLD, &
& success, &
& status, &
& ierr)
if (.not.moved(lb).and.success) then
call MPI_RECV (success, &
& 1, &
& MPI_LOGICAL, &
& new_loc(2,lb), &
& maxblocks+new_loc(1,lb), &
& MPI_COMM_WORLD, &
& status, &
& ierr)
if (free(lb)) then
call MPI_SSEND ( &
& unk(1,istart,jstart,kstart,point_to(lb)), &
& 1, &
& myblockint, &
& new_loc(2,lb), &
& new_loc(1,lb), &
& MPI_COMM_WORLD, &
& ierr)
test(point_to(lb)) = -1
else
call MPI_SSEND ( &
& unk(1,istart,jstart,kstart,lb), &
& 1, &
& myblockint, &
& new_loc(2,lb), &
& new_loc(1,lb), &
& MPI_COMM_WORLD, &
& ierr)
free(lb) = .true.
end if
moved(lb) = .true.
end if
else
if (.not.moved(lb).and.free(new_loc(1,lb))) then
if (free(lb)) then
unk(:,istart:iend,jstart:jend,kstart:kend,new_loc(1,lb)) = &
& unk(:,istart:iend,jstart:jend,kstart:kend,point_to(lb))
test(point_to(lb)) = -1
else
unk(:,istart:iend,jstart:jend,kstart:kend,new_loc(1,lb)) = &
& unk(:,istart:iend,jstart:jend,kstart:kend,lb)
free(lb) = .true.
end if
moved(lb) = .true.
end if
end if
end if
if (lb.le.new_lnblocks) then
if (free(lb).and..not.sent(lb)) then
sent(lb) = .true.
if (.not.newchild(lb)) then
if (old_loc(2,lb).ne.mype) then
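! Tell the old owner of this block that its destination slot is free, and
! keep the request handle so amr_redist_blk can complete it with MPI_WAITALL.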
nsend = nsend + 1
call MPI_ISEND (free(lb), &
& 1, &
& MPI_LOGICAL, &
& old_loc(2,lb), &
& maxblocks+lb, &
& MPI_COMM_WORLD, &
& reqs(nsend), &
& ierr)
end if
end if
end if
end if
if (lb.le.lnblocks_old.and..not.free(lb)) then
nmoved = nmoved + 1
point_to(lb) = max(lnblocks_old,new_lnblocks)+nmoved
if (point_to(lb).gt.maxblocks) then
do lb2 = max(lnblocks_old,new_lnblocks)+1,maxblocks
if (test(lb2).eq.-1) then
point_to(lb) = lb2
go to 22
end if
end do
end if
22 if (point_to(lb).le.maxblocks) then
test(point_to(lb)) = 1
unk(:,istart:iend,jstart:jend,kstart:kend,point_to(lb)) = &
& unk(:,istart:iend,jstart:jend,kstart:kend,lb)
free(lb) = .TRUE.
end if
end if
else
return
end if
return
end subroutine move_block