Using openMP for speedup
I did an openMP course recently to finally learn a little about parallel programming. Here I pimp the Mandelbrot example with openMP. The code can be accessed with
git clone https://maurow@bitbucket.org/maurow/mauro_learning_fortran.git cd mauro_learning_fortran git checkout v0.3
and is in the folder mandel/
.
The idea of openMP is pretty simple: annotate your code with some
special openMP directives, tell gcc
to compile it with openMP
support and viola! Of course there is a bit more to it, the course
notes are a good starting point: http://chryswoods.com/beginning_openmp.
Code
The code is basically the same as for Calling Fortran 90 from C (with
some improvements and some more comments). The openMP related bits
are the comments starting with $OMP
. Note that the function had to
loose its pure
-ness as that does not go with openMP.
mandel.f90
:
module mandel implicit none integer, parameter :: dp=kind(0.d0) ! double precision contains pure function mandel_frac(z, c) result(out) ! The Mandelbrot map: z -> z^2 + c complex(dp), intent(in):: z, c complex(dp):: out out = z**2 + c end function mandel_frac function calc_num_iter(re, im, itermax, escape) result(out) ! Iterates on mandel_frac ! Input: ! - re/im: vector of real/imaginary values of grid ! - itermax: maximum of iterations done ! - escape: stops if abs(z)>escape, for Mandelbrot escape=2 ! Output: ! - number of iterations until abs(z)>escape real(dp), intent(in):: re(:), im(:), escape integer, intent(in):: itermax integer:: out(size(re), size(im)) integer:: ii, jj, kk, itt complex(dp):: zz, cc !$OMP PARALLEL PRIVATE(jj, zz, cc, itt, kk) !$OMP DO do jj=1,size(im) do ii=1,size(re) zz = 0 cc = cmplx(re(ii), im(jj), dp) do kk=1,itermax zz = mandel_frac(zz, cc) if (abs(zz)>escape) then exit end if end do if (kk>=itermax) then out(ii,jj) = -1 else out(ii,jj) = kk end if end do end do !$OMP END PARALLEL end function calc_num_iter end module mandel
$OMP PARALLEL PRIVATE(jj, zz, cc, itt, kk)
starts a parallel
section, i.e. it can be run by several threads, and $OMP END
PARALLEL
ends it. Each thread has its own copy of jj, zz, cc, itt,
kk
which is stated by the PRIVATE
statement. The other variables
they all share, thus all threads will write their bit to the out
array.
$OMP DO
states that the next for-loop can be executed in parallel.
Compiling
The same code can now be compiled with or without openMP depending on
the presence of the gcc
flag -fopenmp
.
gfortran -c run_mandel_from_fortran.f90 # serial: gfortran -c mandel.f90 -o mandel.o gfortran mandel.o run_mandel_from_fortran.o -o fout # with openMP: gfortran -fopenmp -c mandel.f90 -o mandel_openMP.o gfortran -fopenmp mandel_openMP.o run_mandel_from_fortran.o -o fout_openMP
Running
Set the number of threads the program can use and run it with
export OMP_NUM_THREADS=20 ./fout_openMP
I can get a speedup of about 2.3 for a large set of points. Oddly, I
get the largest speedup when I set OMP_NUM_THREADS
> 10 even though
my laptop has only 2 cores + 2 virtual ones.