Using openMP for speedup

[2013-11-21 Thu]

I did an openMP course recently to finally learn a little about parallel programming. Here I pimp the Mandelbrot example with openMP. The code can be accessed with

git clone https://maurow@bitbucket.org/maurow/mauro_learning_fortran.git
cd mauro_learning_fortran
git checkout v0.3

and is in the folder mandel/.

The idea of openMP is pretty simple: annotate your code with some special openMP directives, tell gcc to compile it with openMP support and viola! Of course there is a bit more to it, the course notes are a good starting point: http://chryswoods.com/beginning_openmp.

Code

The code is basically the same as for Calling Fortran 90 from C (with some improvements and some more comments). The openMP related bits are the comments starting with $OMP. Note that the function had to loose its pure-ness as that does not go with openMP.

mandel.f90:

module mandel
  implicit none

  integer, parameter :: dp=kind(0.d0) ! double precision

contains

  pure function mandel_frac(z, c) result(out)
    ! The Mandelbrot map: z -> z^2 + c
    complex(dp), intent(in):: z, c
    complex(dp):: out
    out = z**2 + c
  end function mandel_frac

  function calc_num_iter(re, im, itermax, escape) result(out)
    ! Iterates on mandel_frac
    ! Input:
    !  - re/im: vector of real/imaginary values of grid
    !  - itermax: maximum of iterations done
    !  - escape: stops if abs(z)>escape, for Mandelbrot escape=2
    ! Output:
    !  - number of iterations until abs(z)>escape
    real(dp), intent(in):: re(:), im(:), escape
    integer, intent(in):: itermax
    integer:: out(size(re), size(im))
    integer:: ii, jj, kk, itt
    complex(dp):: zz, cc

    !$OMP PARALLEL  PRIVATE(jj, zz, cc, itt, kk)

    !$OMP DO
    do jj=1,size(im)
       do ii=1,size(re)
          zz = 0
          cc = cmplx(re(ii), im(jj), dp)
          do kk=1,itermax
             zz = mandel_frac(zz, cc)
             if (abs(zz)>escape) then
                exit
             end if
          end do
          if (kk>=itermax) then
             out(ii,jj) = -1
          else
             out(ii,jj) = kk
          end if
       end do
    end do
    !$OMP END PARALLEL

  end function calc_num_iter

end module mandel

$OMP PARALLEL PRIVATE(jj, zz, cc, itt, kk) starts a parallel section, i.e. it can be run by several threads, and $OMP END PARALLEL ends it. Each thread has its own copy of jj, zz, cc, itt, kk which is stated by the PRIVATE statement. The other variables they all share, thus all threads will write their bit to the out array.

$OMP DO states that the next for-loop can be executed in parallel.

Compiling

The same code can now be compiled with or without openMP depending on the presence of the gcc flag -fopenmp.

gfortran -c run_mandel_from_fortran.f90
# serial:
gfortran -c mandel.f90 -o  mandel.o
gfortran mandel.o run_mandel_from_fortran.o -o fout
# with openMP:
gfortran -fopenmp -c mandel.f90 -o  mandel_openMP.o
gfortran -fopenmp mandel_openMP.o run_mandel_from_fortran.o -o fout_openMP

Running

Set the number of threads the program can use and run it with

export OMP_NUM_THREADS=20
./fout_openMP

I can get a speedup of about 2.3 for a large set of points. Oddly, I get the largest speedup when I set OMP_NUM_THREADS > 10 even though my laptop has only 2 cores + 2 virtual ones.