# Using openMP for speedup

[2013-11-21 Thu]

I did an openMP course recently to finally learn a little about parallel programming. Here I pimp the Mandelbrot example with openMP. The code can be accessed with

```git clone https://maurow@bitbucket.org/maurow/mauro_learning_fortran.git
cd mauro_learning_fortran
git checkout v0.3
```

and is in the folder `mandel/`.

The idea of openMP is pretty simple: annotate your code with some special openMP directives, tell `gcc` to compile it with openMP support and viola! Of course there is a bit more to it, the course notes are a good starting point: http://chryswoods.com/beginning_openmp.

## Code

The code is basically the same as for Calling Fortran 90 from C (with some improvements and some more comments). The openMP related bits are the comments starting with `\$OMP`. Note that the function had to loose its `pure`-ness as that does not go with openMP.

`mandel.f90`:

```module mandel
implicit none

integer, parameter :: dp=kind(0.d0) ! double precision

contains

pure function mandel_frac(z, c) result(out)
! The Mandelbrot map: z -> z^2 + c
complex(dp), intent(in):: z, c
complex(dp):: out
out = z**2 + c
end function mandel_frac

function calc_num_iter(re, im, itermax, escape) result(out)
! Iterates on mandel_frac
! Input:
!  - re/im: vector of real/imaginary values of grid
!  - itermax: maximum of iterations done
!  - escape: stops if abs(z)>escape, for Mandelbrot escape=2
! Output:
!  - number of iterations until abs(z)>escape
real(dp), intent(in):: re(:), im(:), escape
integer, intent(in):: itermax
integer:: out(size(re), size(im))
integer:: ii, jj, kk, itt
complex(dp):: zz, cc

!\$OMP PARALLEL  PRIVATE(jj, zz, cc, itt, kk)

!\$OMP DO
do jj=1,size(im)
do ii=1,size(re)
zz = 0
cc = cmplx(re(ii), im(jj), dp)
do kk=1,itermax
zz = mandel_frac(zz, cc)
if (abs(zz)>escape) then
exit
end if
end do
if (kk>=itermax) then
out(ii,jj) = -1
else
out(ii,jj) = kk
end if
end do
end do
!\$OMP END PARALLEL

end function calc_num_iter

end module mandel
```

`\$OMP PARALLEL PRIVATE(jj, zz, cc, itt, kk)` starts a parallel section, i.e. it can be run by several threads, and ```\$OMP END PARALLEL``` ends it. Each thread has its own copy of ```jj, zz, cc, itt, kk``` which is stated by the `PRIVATE` statement. The other variables they all share, thus all threads will write their bit to the `out` array.

`\$OMP DO` states that the next for-loop can be executed in parallel.

## Compiling

The same code can now be compiled with or without openMP depending on the presence of the `gcc` flag `-fopenmp`.

```gfortran -c run_mandel_from_fortran.f90
# serial:
gfortran -c mandel.f90 -o  mandel.o
gfortran mandel.o run_mandel_from_fortran.o -o fout
# with openMP:
gfortran -fopenmp -c mandel.f90 -o  mandel_openMP.o
gfortran -fopenmp mandel_openMP.o run_mandel_from_fortran.o -o fout_openMP
```

## Running

Set the number of threads the program can use and run it with

```export OMP_NUM_THREADS=20
./fout_openMP
```

I can get a speedup of about 2.3 for a large set of points. Oddly, I get the largest speedup when I set `OMP_NUM_THREADS` > 10 even though my laptop has only 2 cores + 2 virtual ones.