This algorithm is independent of the number of processes involved. It is shown here for a reduction of a scalar integer using the Abelian function MYOP.
      SUBROUTINE pf_Reduce(myProc,nProc,in,n)
      IMPLICIT NONE
      INTEGER in
      INTEGER myProc,nProc,n
c     local variables
      integer dest,src,cdim,bit,index
c     reduce and combine down to process 0
      switch = 0
      cdim = log2ceil( nProc )
      bit = 2**cdim
      bit = bit/2
      DO src = nProc-1, 1, -1
         IF (src.LT.bit) bit = bit/2
         dest = XOR(src,bit)
         tmp@dest = out@src
         out = MYOP(out, tmp)
      ENDDO
c     propagate result up from 0
      bit = 1
      DO dest = 1, nProc-1
         IF (dest.GE.(bit*2)) bit = bit*2
         src = XOR(dest,bit)
         out@dest = out@src
      ENDDO
      RETURN
      END
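The communication pattern of the two loops can be checked with a small sequential simulation. The sketch below is in Python rather than Fortran and rests on several assumptions: `out@p` is read as the value held by process `p`, `log2ceil` is taken to return the smallest `c` with `2**c >= nProc`, and addition stands in for MYOP. The names `simulate_reduce` and `sends` are hypothetical and do not appear in the routine above.

```python
import operator


def log2ceil(n):
    """Smallest c such that 2**c >= n (assumed meaning of log2ceil)."""
    return (n - 1).bit_length()


def simulate_reduce(values, op):
    """Replay both loops sequentially; values[p] plays the role of out@p.

    Returns the final per-process values and the (src, dest) pairs
    used in the reduce phase.
    """
    out = list(values)
    n_proc = len(out)
    sends = []

    # Reduce and combine down to process 0: each src hands its partial
    # result to the process whose rank is src with its highest bit cleared.
    bit = 2 ** log2ceil(n_proc) // 2
    for src in range(n_proc - 1, 0, -1):
        if src < bit:
            bit //= 2
        dest = src ^ bit
        out[dest] = op(out[dest], out[src])
        sends.append((src, dest))

    # Propagate the result up from process 0 along the same tree.
    bit = 1
    for dest in range(1, n_proc):
        if dest >= bit * 2:
            bit *= 2
        src = dest ^ bit
        out[dest] = out[src]

    return out, sends


if __name__ == "__main__":
    # Five processes holding 1..5, with addition standing in for MYOP.
    print(simulate_reduce([1, 2, 3, 4, 5], operator.add))
    # prints ([15, 15, 15, 15, 15], [(4, 0), (3, 1), (2, 0), (1, 0)])
```

With five processes the process count is not a power of two, yet every process ends with the full sum, which illustrates the claim that the algorithm does not depend on the number of processes. Because partial results are combined in tree order rather than rank order, the simulation only matches a left-to-right reduction when the operator is commutative and associative.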