I have been working through an exercise to find alternatives to storing raster data in an expensive proprietary database format. We have a lot of raster data in various formats (ECW, jpg, LZW and Packbits compressed TIFF, uncompressed TIFF, GeoTiff) and of different types (historic black and white mapping, full colour aerial photography, panchromatic (greyscale) aerial photography, full colour base mapping tiles at different resolutions). There is probably close to 1TB (~1000GB) of it. It is currently consumed by about 20 different applications in different ways – through a MapServer (ms4w) WMS, Oracle GeoRaster, ArcSDE raster and directory based image catalogues. There are multiple copies of the data in the different formats for the various applications and while it would be nice to have one datastore that every application accessed, in reality the data needs to be fit for purpose and so there probably will always be a number of options to consume it.
But to get back to the task at hand – alternative data storage and dissemination. Enter MapServer (mapserver.org) and GeoServer (geoserver.org) and PostgreSQL + PostGIS. The latest version of PostGIS can store raster in the database and both GeoServer and MapServer can serve mapping tiles from both directories and databases.
The existing MapServer uses directory based TIFF files in three resolutions (100%, 50% and 25% or full, quarter or sixteenth resolution) with a tile index and quad tree index for each dataset. It is very fast. It serves data through WMS to the CRM, ArcGIS desktop and the external tourism website.
The development GeoServer also uses directory based GeoTiff tiles but these are prepared differently to the MapServer imagery. The commands below (in Windows batch file format* run in the FWTools shell) are used to optimise the images in terms of final file size, image quality and performance using the GDAL toolset.
* variables on the Windows command line are defined %G but in a batch file are defined %%G – see ss64.com for more info.
In the processing workspace I created a directory structure to keep track of the files as they are created in each step:
\processing \gtiff # original source files \gtiff_rgb # uncompressed tiles (you need a LOT of disk space!) \gtiff_opt # optimised compressed final output
I copied a small python script that writes a projection file (.prj) for each TIF in the directory. Save the script in the image directory and call it in the FWTools shell like ” python create_prj.py”. See here and here and here for more info.
import glob content = 'PROJCS["OSGB 1936 / British National Grid", GEOGCS["OSGB 1936", DATUM["OSGB_1936", SPHEROID["Airy 1830",6377563.396,299.3249646, AUTHORITY["EPSG","7001"]], AUTHORITY["EPSG","6277"]], PRIMEM["Greenwich",0, AUTHORITY["EPSG","8901"]], UNIT["degree",0.01745329251994328, AUTHORITY["EPSG","9122"]], AUTHORITY["EPSG","4277"]], UNIT["metre",1, AUTHORITY["EPSG","9001"]], PROJECTION["Transverse_Mercator"], PARAMETER["latitude_of_origin",49], PARAMETER["central_meridian",-2], PARAMETER["scale_factor",0.9996012717], PARAMETER["false_easting",400000], PARAMETER["false_northing",-100000], AUTHORITY["EPSG","27700"], AXIS["Easting",EAST], AXIS["Northing",NORTH]]' tifs = glob.glob('*.tif') for tif in tifs: prj = tif.split('.') + '.prj' file = open(prj,'w') file.writelines(content) file.close()
BLACK AND WHITE TIFF IMAGES with LZW compression
After some further experimentation I found that steps 1 and 2 below we unnecessary and steps 3 and 4 produced what I needed. Bear in mind that the source images were two colour 1-bit LZW compressed tiffs and I created two colour 1-bit LZW compressed tiled geotiffs with world (.tfw) and projection (.prj) files. See below for coloured images.
1. Create uncompressed tiled RGB Geotiff with TIFF world files and encoded projection information. In this case for all the files in gtiff directory that have a .tif extension use gdal_translate to output geotiffs with EPSG:27700 (British National Grid) projection, internal tiles, external world files and expand the tiff to three channels RGB in the gtiff_rgb directory:
FOR /F %%G IN ('dir /b gtiff\*.tif') DO gdal_translate -of Gtiff \ -a_srs EPSG:27700 -co "TILED=YES" -co "TFW=YES" -expand rgb \ gtiff\%%G gtiff_rgb\%%G 2. Convert to 2-colour 8-bit raster overwriting the expanded RGB GeoTIFF from step 1.
FOR /F %%G IN ('dir /b gtiff\*.tif') DO rgb2pct.bat -n 2 \ -of GTiff gtiff_rgb\%%G gtiff_rgb\%%G
3. Optimise the 2-colour Geotiff by LZW compressing the output making sure the files are tiled and have embedded projection information.
FOR /F %%G IN ('dir /b gtiff\*.tif') DO gdal_translate -of Gtiff \ -a_srs EPSG:27700 -co "NBITS=1" -co "TILED=YES" -co "PROFILE=Geotiff" \ -co "TFW=YES" -co "COMPRESS=LZW" gtiff\%%G gtiff_opt\%%G
4. Add optimised overviews (full size + 8 overviews) using gdaladdo. I used the gauss resampling mode for better processing of sharp edges and noisy data. Overviews are JPEG compressed for smallest size.
FOR /F %%G IN ('dir /b gtiff\*.tif') DO gdaladdo -r gauss \ --config COMPRESS_OVERVIEW JPEG --config PHOTOMETRIC_OVERVIEW YCBCR \ --config INTERLEAVE_OVERVIEW PIXEL \ gtiff_opt\%%G 2 4 8 16 32 64 128 256
COLOUR TIFF IMAGES with LZW/Packbits compression
Create expanded tiled Geotiff. This step can result in output files of anything from 50MB to 200MB each depending on input. My input tiles ranged from 4000×4000 to 8000×8000 and the processing regularly filled up the 100GB scratch workspace I had.
FOR /F %%G IN ('dir /b gtiff\*.tif') DO gdal_translate -of Gtiff \ -a_srs EPSG:27700 -co "TILED=YES" -expand rgb gtiff\%%G gtiff_rgb\%%G
Optimise Geotiff. This step should get you some way back towards the original file size. By adjusting the JPEG_QUALITY you can find a balance between image quality and file size. The default is 75 but I upped that to 95 as the GeoServer output was not very good.
FOR /F %%G IN ('dir /b gtiff\*.tif') DO gdal_translate -of Gtiff \ -a_srs EPSG:27700 -co "TILED=YES" -co "TFW=YES" -co "PROFILE=Geotiff" \ -co "INTERLEAVE=PIXEL" -co "PHOTOMETRIC=YCBCR" -co "COMPRESS=JPEG" \ -co "JPEG_QUALITY=95" gtiff_rgb\%%G gtiff_opt\%%G
Add optimised overviews.
FOR /F %%G IN ('dir /b gtiff\*.tif') DO gdaladdo -r gauss \ --config COMPRESS_OVERVIEW JPEG --config PHOTOMETRIC_OVERVIEW YCBCR \ --config INTERLEAVE_OVERVIEW PIXEL gtiff_opt\%%G 2 4 8 16 32 64 128 256
Putting all the commands in one loop in a batch file may be the way to go as you can save a heap of disk space by deleting the uncompressed version once it has been processed. Unless you want to keep the uncompressed files, of course:
rem All in a wunner FOR /F %%G IN ('dir /b gtiff\*.tif') DO ( echo EXPANDING %%G... gdal_translate -of Gtiff -a_srs EPSG:27700 -co "TILED=YES" / -expand rgb gtiff\%%G gtiff_rgb\%%G echo OPTIMISING %%G... gdal_translate -of Gtiff -a_srs EPSG:27700 -co "TILED=YES" / -co "TFW=YES" -co "PROFILE=Geotiff" -co "INTERLEAVE=PIXEL" / -co "PHOTOMETRIC=YCBCR" -co "COMPRESS=JPEG" -co "JPEG_QUALITY=95" / gtiff_rgb\%%G gtiff_opt\%%G echo DELETING UNCOMPRESSED TIFF %%G... del gtiff_rgb\*.tif echo ADDING OVERVIEWS TO %%G... gdaladdo -r gauss --config COMPRESS_OVERVIEW JPEG / --config PHOTOMETRIC_OVERVIEW YCBCR --config INTERLEAVE_OVERVIEW PIXEL / gtiff_opt\%%G 2 4 8 16 32 64 128 256
I’ll post a comparison table of image file sizes here shortly.