Compressing sparse GeoTiffs using PackBits
GeoTiffs can be compressed using several algorithms (lossless and lossy) via the excellent command-line tools of GDAL. A nice round-up is in this blog post, which shows that if you want to optimize read-time (lossless), then the primitive PackBits algorithm is best. PackBits does best when there are a lot of identical bytes in a row. Let's try to take advantage of this when compressing a sparse geotiff (with sparse I mean a lot of no-data values).
I prepared an example geotiff raster with 1000x1000 Float32 values, the fill is about 10% and the nodata value is -9999.
>> gdalinfo sparse.tif
Driver: GTiff/GeoTIFF
Files: sparse.tif
Size is 1000, 1000
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (0.000000000000000,0.500000000000000)
Pixel Size = (0.000500000000000,-0.000500000000000)
Metadata:
  TIFFTAG_SOFTWARE=n/a
Image Structure Metadata:
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (   0.0000000,   0.5000000) (  0d 0' 0.01"E,  0d30' 0.00"N)
Lower Left  (   0.0000000,   0.0000000) (  0d 0' 0.01"E,  0d 0' 0.01"N)
Upper Right (   0.5000000,   0.5000000) (  0d30' 0.00"E,  0d30' 0.00"N)
Lower Right (   0.5000000,   0.0000000) (  0d30' 0.00"E,  0d 0' 0.01"N)
Center      (   0.2500000,   0.2500000) (  0d15' 0.00"E,  0d15' 0.00"N)
Band 1 Block=1000x2 Type=Float32, ColorInterp=Gray
  NoData Value=-9999
Note that it is in `Float32` with NoData value -9999.  If your
file is in Float64, I suggest you change that (as it is unlikely that
your data has 15 decimal digits of precision) with: gdal_translate
-ot float32 in.tif out.tif.
Let's try to just compress it using PackBits:
>> gdalwarp -overwrite -co COMPRESS=PACKBITS sparse.tif sparse-v1.tif >> ls -sh sparse* 3.9M sparse.tif 3.9M sparse-v1.tif
No improvement. As an aside, other compression schemes will work with this data:
>> gdalwarp -overwrite -co COMPRESS=DEFLATE sparse.tif sparse-deflate.tif >> ls -sh sparse* 520K sparse-deflate.tif 3.9M sparse.tif 3.9M sparse-v1.tif
but I want to stick to Packbits for improved reading speed.
The trick to make Packbits work is to re-encode the no-data value into
a value which has representation with a simple bit pattern.  Let's
first see what the byte pattern of my current NoData=-9999 value is:
julia> reinterpret(UInt32, -9999f0) 0xc61c3c00
Which shows bit-representation of -9999 in Float32 as hexadecimal "c61c3c00".
Remember that one byte is encoded with two hexadecimal numbers,
i.e. the bytes are "c6", "1c", "3c, "00".
Let's make a Float32 with all equal bytes (and also negative):
julia> reinterpret(Float32, parse(UInt32, "eeeeeeee", base = 16)) -3.697314f28
and use that to re-encode the NoData values (and compress):
>> gdalwarp -overwrite -srcnodata -9999 -dstnodata -3.697314e28 \
     -co COMPRESS=PACKBITS sparse.tif sparse-v2.tif
>> ls -sh sparse*
520K sparse-deflate.tif  3.9M sparse.tif  3.9M sparse-v1.tif  668K sparse-v2.tif
Cool, now my file compressed from 3.9M to 0.668M.
Let's just check the read-speeds (the reason I've used Packbits) with a ten times larger file than the above-provided example file:
>> time gdalinfo -checksum sparse-large.tif > /dev/null 0.17s user 0.03s system 99% cpu 0.202 total >> time gdalinfo -checksum sparse-large-v1.tif > /dev/null 0.60s user 0.15s system 99% cpu 0.750 total >> time gdalinfo -checksum sparse-large-v2.tif > /dev/null 0.65s user 0.12s system 99% cpu 0.781 total >> time gdalinfo -checksum sparse-large-deflate.tif > /dev/null 0.83s user 0.10s system 99% cpu 0.940 total
which shows 0.17s for uncompressed, 0.6s for "v1", 0.65s for "v2" and
0.83s for "deflate".  Which is a bit underwhelming, but hey, at least
Packbits is faster than Deflate.
Note that PackBits only really works for sparse datasets, or
datasets which have a lot of identical values (with simple byte
patterns).  Otherwise use Deflate or LZW.