[FLASH-USERS] Oddness in results when using uneven number of parallel processors

Tue Sep 15 06:54:04 EDT 2009

Hi Anshu,

I realise now that my second block loop is also limited to each
processor's share of the blocks, which may be the problem. That is,
blockCount and blockList are divided (reduced) among the processors. The
first block loop can be parallelized; however, I want each block/zone to
run the complete block loop again, not just a part of it. The fact that I
noticed no problem when I used 2^x procs must come from the way I do
things in the code, and from the box being divided symmetrically with 2^x
procs. Like you said, it was just a coincidence.

I'm trying to understand what the MPI_Allreduce function does. Does it
do exactly what I want, i.e., make blockList and blockCount
complete/undivided again?
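
For reference, MPI_Allreduce combines one value (or array) from every
process with a reduction operation such as MPI_SUM and returns the
combined result on all processes. Below is a minimal standalone sketch of
the "compute locally, then sum over all processors" pattern. It is not
taken from FLASH or from the unit discussed here: the names localA and
globalA are hypothetical, and the local value is only a stand-in for a
real per-process summation.

```fortran
program allreduce_sketch
  use mpi
  implicit none

  integer :: ierr, myRank, numProcs
  double precision :: localA, globalA   ! hypothetical names, not from FLASH

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myRank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, numProcs, ierr)

  ! Stand-in for the partial sum accumulated over this process's own
  ! blocks (assumption: a real unit would loop over its local blocks here).
  localA = dble(myRank + 1)

  ! Sum localA over all processes; every process receives the same
  ! total in globalA.
  call MPI_Allreduce(localA, globalA, 1, MPI_DOUBLE_PRECISION, &
                     MPI_SUM, MPI_COMM_WORLD, ierr)

  ! With this stand-in, globalA equals 1 + 2 + ... + numProcs on every rank.
  print *, 'rank', myRank, ' local =', localA, ' global =', globalA

  call MPI_Finalize(ierr)
end program allreduce_sketch
```

Note that a reduction combines numeric values. Applied to blockCount with
MPI_SUM it would give the total number of blocks over all processes, but
it would not by itself rebuild a per-process list such as blockList, nor
give one process access to block data held by another.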

Thanks,
Seyit

Anshu Dubey wrote:
> Seyit,
>
> There is no implicit shared memory parallelism in FLASH. If you need A
> to be summed up over all processors, you have to make a call to the
> MPI_Allreduce function. As an example you can see the file
> source/Grid/GridSolvers/Multipole/gr_mpoleMoments.F90 where the
> Moments are first computed locally, and then summed over all
> processors with allreduce calls. The fact that you are seeing some
> reasonable values in some parallel situations has to be just a
> coincidence.
>
> Anshu
>
> On Mon, Sep 14, 2009 at 7:18 AM, Seyit Hocuk <seyit at astro.rug.nl> wrote:
>   
>> Hi Community,
>>
>> I am experiencing somewhat odd results with a newly written unit when I am
>> running with an odd number of processors. Everything seems fine when using 1 or
>> an even number of processors.
>> By the way, I realise now that it works for 2^x processors. So,
>> by even I mean 2^x, and by odd I mean anything other than 2^x.
>>
>>
>> The code goes something like this (A is important here):
>>
>> >> Unit <<
>> do blockID = 1, blockCount
>>   ...
>>   do k = blkLimits(LOW,KAXIS),blkLimits(HIGH,KAXIS)
>>     do j = blkLimits(LOW,JAXIS),blkLimits(HIGH,JAXIS)
>>       do i = blkLimits(LOW,IAXIS),blkLimits(HIGH,IAXIS)
>>         ...
>>         do LeafID = 1, blockCount
>>           call subroutine1
>>           ...
>> ...
>> end
>>
>> >> subroutine1 <<
>> ...
>> call subroutine2
>> ...
>> end
>>
>> >> subroutine2 <<
>> do n = blkLimits(LOW,KAXIS),blkLimits(HIGH,KAXIS)
>>   do m = blkLimits(LOW,JAXIS),blkLimits(HIGH,JAXIS)
>>     do l = blkLimits(LOW,IAXIS),blkLimits(HIGH,IAXIS)
>>       ...
>>       A = A + x   ! a summation
>>       ...
>> end
>>
>>
>> I was thinking that every processor calculates its own piece of the summation of
>> A but fails to communicate with the others, so that it is not the actual
>> sum I am seeing. However, I find it weird that it does work for certain
>> numbers of processors greater than 1. To solve this, I used the solnData
>> pointer. That is, instead of A, I tried solnData(A,i,j,k) so that the
>> different processors would be connected through this mediator. No change.
>>
>>
>> Kind Regards,
>> Seyit