Using openMP for speedup
I did an openMP course recently to finally learn a little about parallel programming. Here I pimp the Mandelbrot example with openMP. The code can be accessed with
git clone https://maurow@bitbucket.org/maurow/mauro_learning_fortran.git
cd mauro_learning_fortran
git checkout v0.3
and is in the folder mandel/.
The idea of openMP is pretty simple: annotate your code with some
special openMP directives, tell GCC (here gfortran) to compile it with
openMP support and voilà! Of course there is a bit more to it; the
course notes are a good starting point: http://chryswoods.com/beginning_openmp.
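To make the idea concrete, here is a minimal sketch of an annotated loop. The module omp_intro and the subroutine square_all are hypothetical names, not part of the repository. The only openMP-specific parts are the two !$OMP lines; everything else is ordinary Fortran, which is why the same file builds with or without openMP.

```fortran
module omp_intro
  implicit none
contains
  subroutine square_all(x)
    ! Squares every element of x in place.
    ! Compiled with -fopenmp, the loop iterations are divided among
    ! the threads; without it, the !$OMP lines are plain comments and
    ! the loop runs serially. The result is the same either way.
    real, intent(inout) :: x(:)
    integer :: i
    !$OMP PARALLEL DO
    do i = 1, size(x)
      x(i) = x(i)**2
    end do
    !$OMP END PARALLEL DO
  end subroutine square_all
end module omp_intro
```

This works without any further clauses because every iteration touches a different element x(i), so the threads never write to the same memory.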
Code
The code is basically the same as for Calling Fortran 90 from C (with
some improvements and some more comments). The openMP-related bits
are the comments starting with !$OMP. Note that calc_num_iter had to
lose its pure attribute, because openMP directives such as PARALLEL
are not allowed inside a pure procedure.
mandel.f90:
module mandel
  implicit none
  integer, parameter :: dp = kind(0.d0) ! double precision
contains
  pure function mandel_frac(z, c) result(out)
    ! The Mandelbrot map: z -> z^2 + c
    complex(dp), intent(in) :: z, c
    complex(dp) :: out
    out = z**2 + c
  end function mandel_frac

  function calc_num_iter(re, im, itermax, escape) result(out)
    ! Iterates on mandel_frac
    ! Input:
    ! - re/im: vector of real/imaginary values of grid
    ! - itermax: maximum of iterations done
    ! - escape: stops if abs(z)>escape, for Mandelbrot escape=2
    ! Output:
    ! - number of iterations until abs(z)>escape,
    !   or -1 if z did not escape within itermax iterations
    real(dp), intent(in) :: re(:), im(:), escape
    integer, intent(in) :: itermax
    integer :: out(size(re), size(im))
    integer :: ii, jj, kk
    complex(dp) :: zz, cc
    ! Start a team of threads: each thread gets its own copy of the
    ! PRIVATE variables, everything else (re, im, out, ...) is shared.
    !$OMP PARALLEL PRIVATE(ii, jj, zz, cc, kk)
    ! Divide the iterations of the outer jj loop among the threads.
    !$OMP DO
    do jj = 1, size(im)
      do ii = 1, size(re)
        zz = 0
        cc = cmplx(re(ii), im(jj), dp)
        do kk = 1, itermax
          zz = mandel_frac(zz, cc)
          if (abs(zz) > escape) exit
        end do
        ! If the loop ran to completion, kk = itermax+1: z never escaped.
        ! (Testing kk>=itermax would also flag an escape at the last step.)
        if (kk > itermax) then
          out(ii,jj) = -1
        else
          out(ii,jj) = kk
        end if
      end do
    end do
    !$OMP END DO
    !$OMP END PARALLEL
  end function calc_num_iter
end module mandel
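!$OMP PARALLEL PRIVATE(...) starts a parallel region, i.e. a block of
code that is executed by a whole team of threads, and !$OMP END
PARALLEL ends it. The PRIVATE clause gives each thread its own copy
of the listed variables (the loop counters and the temporaries zz and
cc), so the threads do not overwrite each other's intermediate
values. All other variables are shared; in particular, all threads
write their part of the result into the same out array. This is safe
because each value of jj is handled by exactly one thread, which
writes only the column out(:,jj). In Fortran, the counters of
sequential loops inside a parallel region, such as ii and kk, are
private even when they are not listed.
!$OMP DO states that the iterations of the following do loop (the one
over jj) are divided among the threads.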
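Forgetting a variable in the PRIVATE list silently creates a race condition. One way to guard against that is the DEFAULT(NONE) clause, which makes the compiler reject any variable in the region whose sharing is not stated. The sketch below uses the combined PARALLEL DO form (a shorthand for a PARALLEL region containing a single DO) on a hypothetical row_sums subroutine; the module and routine names are illustrative only.

```fortran
module scoping_demo
  implicit none
contains
  subroutine row_sums(a, s)
    ! s(i) = sum of row i of a, with the rows processed in parallel.
    ! DEFAULT(NONE) forces every variable used in the region to be
    ! listed as SHARED or PRIVATE, so a forgotten PRIVATE(acc) becomes
    ! a compile error instead of a race condition.
    real, intent(in)  :: a(:,:)
    real, intent(out) :: s(:)
    integer :: i, j
    real :: acc
    !$OMP PARALLEL DO DEFAULT(NONE) SHARED(a, s) PRIVATE(i, j, acc)
    do i = 1, size(a, 1)
      acc = 0
      do j = 1, size(a, 2)
        acc = acc + a(i, j)
      end do
      s(i) = acc   ! each i is owned by one thread, so no conflict
    end do
    !$OMP END PARALLEL DO
  end subroutine row_sums
end module scoping_demo
```

Here acc must be private: if it were shared, two threads could reset or add to it at the same time and the row sums would be wrong.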
Compiling
The same code can now be compiled with or without openMP, depending on
whether the gfortran flag -fopenmp is given. Without the flag, the
!$OMP lines are ordinary comments and the result is a serial program.
# serial (mandel.f90 first, so that mandel.mod exists for the driver):
gfortran -c mandel.f90 -o mandel.o
gfortran -c run_mandel_from_fortran.f90
gfortran mandel.o run_mandel_from_fortran.o -o fout
# with openMP (run_mandel_from_fortran.o is reused):
gfortran -fopenmp -c mandel.f90 -o mandel_openMP.o
gfortran -fopenmp mandel_openMP.o run_mandel_from_fortran.o -o fout_openMP
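Besides the !$OMP directives, gfortran also understands the !$ conditional-compilation sentinel: a line starting with "!$ " (note the space) is compiled only when -fopenmp is given. That is handy for calls into the openMP runtime library, which does not exist in a serial build. A small sketch, with hypothetical names thread_report and team_size, that reports how many threads a parallel region actually gets:

```fortran
module thread_report
  !$ use omp_lib
  implicit none
contains
  function team_size() result(n)
    ! Number of threads that execute a parallel region.
    ! The "!$ " lines are compiled only with -fopenmp; in a serial
    ! build they are comments, so the function simply returns 1.
    integer :: n
    n = 1
    !$OMP PARALLEL SHARED(n)
    !$OMP SINGLE
    !$ n = omp_get_num_threads()
    !$OMP END SINGLE
    !$OMP END PARALLEL
  end function team_size
end module thread_report
```

Printing team_size() from a driver built both ways should give 1 for the serial build and the value of OMP_NUM_THREADS (if set) for the openMP build.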
Running
Set the number of threads the program can use and run it with
export OMP_NUM_THREADS=20
./fout_openMP
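A speedup is a ratio of wall-clock times (serial time divided by parallel time), so it should be measured with a wall clock. The standard system_clock intrinsic works with and without -fopenmp; cpu_time is less suitable here since it may report the processor time summed over all threads. A minimal sketch of a timer, assuming a hypothetical module wall_timer:

```fortran
module wall_timer
  implicit none
  integer, parameter :: dp = kind(0.d0)
  integer, parameter :: i8 = selected_int_kind(18)
contains
  function wall_seconds() result(t)
    ! Wall-clock time in seconds from an arbitrary origin, so only
    ! the difference between two calls is meaningful.
    ! 64-bit arguments give gfortran's high-resolution clock.
    real(dp) :: t
    integer(i8) :: ticks, ticks_per_sec
    call system_clock(ticks, ticks_per_sec)
    t = real(ticks, dp) / real(ticks_per_sec, dp)
  end function wall_seconds
end module wall_timer
```

Setting t0 = wall_seconds() just before the call to calc_num_iter and printing wall_seconds() - t0 right after it, once in fout and once in fout_openMP, gives the two times whose ratio is the speedup.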
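I get a speedup of about 2.3 for a large set of points. Oddly, I get
the largest speedup when I set OMP_NUM_THREADS > 10, even though my
laptop has only 2 physical cores plus 2 virtual (hyper-threaded) ones.
A plausible explanation is load imbalance: points inside the
Mandelbrot set cost the full itermax iterations, while points far
outside escape after a few, so if the jj loop is cut into one large
contiguous chunk per thread, the thread that gets the middle of the
set does most of the work while the others sit idle. More threads
means smaller chunks, which spreads the expensive part more evenly.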
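If load imbalance is indeed the cause, openMP's SCHEDULE(DYNAMIC) clause attacks it directly: instead of fixed chunks, each thread fetches the next jj (chunk size 1 by default) as soon as it has finished its previous one, so the number of threads could then stay at the number of cores. The sketch below is a hypothetical variant, calc_num_iter_dyn in a module mandel_dynamic, not the code from the repository; it inlines z**2 + c instead of calling mandel_frac so that it is self-contained. I have not measured whether it beats the version above.

```fortran
module mandel_dynamic
  implicit none
  integer, parameter :: dp = kind(0.d0)
contains
  function calc_num_iter_dyn(re, im, itermax, escape) result(out)
    ! Same result as calc_num_iter (-1 = did not escape), but the jj
    ! iterations are handed out to threads one at a time as they
    ! finish (SCHEDULE(DYNAMIC)), not in fixed contiguous chunks.
    real(dp), intent(in) :: re(:), im(:), escape
    integer, intent(in) :: itermax
    integer :: out(size(re), size(im))
    integer :: ii, jj, kk
    complex(dp) :: zz, cc
    !$OMP PARALLEL DO SCHEDULE(DYNAMIC) PRIVATE(ii, jj, kk, zz, cc)
    do jj = 1, size(im)
      do ii = 1, size(re)
        zz = 0
        cc = cmplx(re(ii), im(jj), dp)
        do kk = 1, itermax
          zz = zz**2 + cc          ! the Mandelbrot map
          if (abs(zz) > escape) exit
        end do
        if (kk > itermax) then
          out(ii, jj) = -1
        else
          out(ii, jj) = kk
        end if
      end do
    end do
    !$OMP END PARALLEL DO
  end function calc_num_iter_dyn
end module mandel_dynamic
```

For example, c = 0 never escapes and gives -1, while c = 2 gives 2 (z goes 0, 2, 6 and only the second step exceeds 2). Dynamic scheduling adds a little bookkeeping per chunk, which is negligible here since each jj covers a whole column of points.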