Compressing sparse GeoTiffs using PackBits
GeoTiffs can be compressed using several algorithms (lossless and lossy) via the excellent command-line tools of GDAL. A nice round-up is in this blog post, which shows that if you want to optimize read-time (lossless), then the primitive PackBits algorithm is best. PackBits does best when there are a lot of identical bytes in a row. Let's try to take advantage of this when compressing a sparse geotiff (with sparse I mean a lot of no-data values).
I prepared an example geotiff raster with 1000x1000 Float32 values, the fill is about 10% and the nodata value is -9999.
>> gdalinfo sparse.tif Driver: GTiff/GeoTIFF Files: sparse.tif Size is 1000, 1000 Coordinate System is: GEOGCS["WGS 84", DATUM["WGS_1984", SPHEROID["WGS 84",6378137,298.257223563, AUTHORITY["EPSG","7030"]], AUTHORITY["EPSG","6326"]], PRIMEM["Greenwich",0], UNIT["degree",0.0174532925199433], AUTHORITY["EPSG","4326"]] Origin = (0.000000000000000,0.500000000000000) Pixel Size = (0.000500000000000,-0.000500000000000) Metadata: TIFFTAG_SOFTWARE=n/a Image Structure Metadata: INTERLEAVE=BAND Corner Coordinates: Upper Left ( 0.0000000, 0.5000000) ( 0d 0' 0.01"E, 0d30' 0.00"N) Lower Left ( 0.0000000, 0.0000000) ( 0d 0' 0.01"E, 0d 0' 0.01"N) Upper Right ( 0.5000000, 0.5000000) ( 0d30' 0.00"E, 0d30' 0.00"N) Lower Right ( 0.5000000, 0.0000000) ( 0d30' 0.00"E, 0d 0' 0.01"N) Center ( 0.2500000, 0.2500000) ( 0d15' 0.00"E, 0d15' 0.00"N) Band 1 Block=1000x2 Type=Float32, ColorInterp=Gray NoData Value=-9999
Note that it is in `Float32` with NoData
value -9999
. If your
file is in Float64, I suggest you change that (as it is unlikely that
your data has 15 decimal digits of precision) with: gdal_translate
-ot float32 in.tif out.tif
.
Let's try to just compress it using PackBits:
>> gdalwarp -overwrite -co COMPRESS=PACKBITS sparse.tif sparse-v1.tif >> ls -sh sparse* 3.9M sparse.tif 3.9M sparse-v1.tif
No improvement. As an aside, other compression schemes will work with this data:
>> gdalwarp -overwrite -co COMPRESS=DEFLATE sparse.tif sparse-deflate.tif >> ls -sh sparse* 520K sparse-deflate.tif 3.9M sparse.tif 3.9M sparse-v1.tif
but I want to stick to Packbits for improved reading speed.
The trick to make Packbits work is to re-encode the no-data value into
a value which has representation with a simple bit pattern. Let's
first see what the byte pattern of my current NoData=-9999
value is:
julia> reinterpret(UInt32, -9999f0) 0xc61c3c00
Which shows bit-representation of -9999
in Float32 as hexadecimal "c61c3c00".
Remember that one byte is encoded with two hexadecimal numbers,
i.e. the bytes are "c6", "1c", "3c, "00".
Let's make a Float32 with all equal bytes (and also negative):
julia> reinterpret(Float32, parse(UInt32, "eeeeeeee", base = 16)) -3.697314f28
and use that to re-encode the NoData values (and compress):
>> gdalwarp -overwrite -srcnodata -9999 -dstnodata -3.697314e28 \ -co COMPRESS=PACKBITS sparse.tif sparse-v2.tif >> ls -sh sparse* 520K sparse-deflate.tif 3.9M sparse.tif 3.9M sparse-v1.tif 668K sparse-v2.tif
Cool, now my file compressed from 3.9M to 0.668M.
Let's just check the read-speeds (the reason I've used Packbits) with a ten times larger file than the above-provided example file:
>> time gdalinfo -checksum sparse-large.tif > /dev/null 0.17s user 0.03s system 99% cpu 0.202 total >> time gdalinfo -checksum sparse-large-v1.tif > /dev/null 0.60s user 0.15s system 99% cpu 0.750 total >> time gdalinfo -checksum sparse-large-v2.tif > /dev/null 0.65s user 0.12s system 99% cpu 0.781 total >> time gdalinfo -checksum sparse-large-deflate.tif > /dev/null 0.83s user 0.10s system 99% cpu 0.940 total
which shows 0.17s for uncompressed, 0.6s for "v1", 0.65s for "v2" and
0.83s for "deflate". Which is a bit underwhelming, but hey, at least
Packbits is faster than Deflate
.
Note that PackBits only really works for sparse datasets, or
datasets which have a lot of identical values (with simple byte
patterns). Otherwise use Deflate
or LZW
.