I’m using ArchGDAL to save climate data as large 3D GeoTIFFs, but I’ve noticed that files created in ArchGDAL are roughly 3x larger than those created with gdal_* binaries. As a diagnostic I eliminated all my calculations and wrote an ArchGDAL function that just resaves its input file using a given compression method. Then I wrapped this in a test function that also recompresses the ArchGDAL output using gdal_translate and compares resulting file sizes.
Spoiler: small input GeoTIFFs have the same output size, but very large input files (4 GB) become much larger after passing through ArchGDAL.
First my test code:
using ArchGDAL, GDAL_jll
function copy_gtiff(infile, outfile; compressmethod="ZSTD")
ArchGDAL.readraster(infile) do origdataset
nbands = ArchGDAL.nraster(origdataset)
band1 = ArchGDAL.getband(origdataset, 1)
nodata = ArchGDAL.getnodatavalue(band1)
isfile(outfile) && rm(outfile)
origdata = origdataset[:,:,:]
ArchGDAL.create(outfile,
driver = ArchGDAL.getdriver("GTiff"),
width = ArchGDAL.width(origdataset),
height = ArchGDAL.height(origdataset),
nbands = nbands,
dtype = ArchGDAL.pixeltype(band1),
options = ["BIGTIFF=YES", "COMPRESS=$compressmethod"]
) do dataset
geotransform = ArchGDAL.getgeotransform(origdataset)
proj = ArchGDAL.getproj(origdataset)
ArchGDAL.setgeotransform!(dataset, geotransform)
ArchGDAL.setproj!(dataset, proj)
for b = 1:nbands
band = ArchGDAL.getband(dataset, b)
ArchGDAL.setnodatavalue!(band, nodata)
ArchGDAL.write!(band, origdata[:,:,b])
end
end
return origdata
end
end
function recompress_gtiff(infile, outfile; compressmethod="ZSTD")
gdal_translate_path() do gdal_translate
run(`$gdal_translate -q -co BIGTIFF=YES -co COMPRESS=$compressmethod $infile $outfile`)
end
end
filesizeMB(filename) = round(filesize(filename)/1024^2, digits=1)
function readalldata(filename)
ArchGDAL.readraster(filename) do dataset
return dataset[:,:,:]
end
end
function testcopy(compressmethod="ZSTD")
for nbands in [3, 30, 300, 1000, 2920]
infile = "test_ZSTD_$(nbands)bands.tif"
outfile1, outfile2 = tempname(), tempname()
println("\nTesting with $nbands bands, compression method = $compressmethod:")
print(" copying $infile with ArchGDAL...")
time = round(@elapsed origdata = copy_gtiff(infile, outfile1; compressmethod); digits=1)
println(" ($time s). Original file $(filesizeMB(infile)) MB, copy $(filesizeMB(outfile1)) MB.")
print(" recompressing with gdal_translate...")
time = round(@elapsed recompress_gtiff(outfile1, outfile2; compressmethod); digits=1)
println(" ($time s). Recompressed file $(filesizeMB(outfile2)) MB.")
print(" reading copied file...")
time = round(@elapsed newdata = readalldata(outfile1); digits=1)
println(" ($time s). Identical to original file: $(all(origdata .== newdata)).")
print(" reading recompressed file...")
time = round(@elapsed newdata = readalldata(outfile2); digits=1)
println(" ($time s). Identical to original file: $(all(origdata .== newdata)).")
rm(outfile1)
rm(outfile2)
end
end
Running the tests using ZSTD compression:
julia> testcopy("ZSTD")
Testing with 3 bands, compression method = ZSTD:
copying test_ZSTD_3bands.tif with ArchGDAL... (1.3 s). Original file 3.9 MB, copy 3.9 MB.
recompressing with gdal_translate... (1.2 s). Recompressed file 3.9 MB.
reading copied file... (0.0 s). Identical to original file: true.
reading recompressed file... (0.0 s). Identical to original file: true.
Testing with 30 bands, compression method = ZSTD:
copying test_ZSTD_30bands.tif with ArchGDAL... (2.1 s). Original file 40.2 MB, copy 40.2 MB.
recompressing with gdal_translate... (2.1 s). Recompressed file 40.2 MB.
reading copied file... (0.2 s). Identical to original file: true.
reading recompressed file... (0.3 s). Identical to original file: true.
Testing with 300 bands, compression method = ZSTD:
copying test_ZSTD_300bands.tif with ArchGDAL... (8.6 s). Original file 405.8 MB, copy 405.8 MB.
recompressing with gdal_translate... (8.4 s). Recompressed file 405.8 MB.
reading copied file... (1.6 s). Identical to original file: true.
reading recompressed file... (1.6 s). Identical to original file: true.
Testing with 1000 bands, compression method = ZSTD:
copying test_ZSTD_1000bands.tif with ArchGDAL... (23.1 s). Original file 1355.7 MB, copy 1355.7 MB.
recompressing with gdal_translate... (22.3 s). Recompressed file 1355.7 MB.
reading copied file... (5.6 s). Identical to original file: true.
reading recompressed file... (5.8 s). Identical to original file: true.
Testing with 2920 bands, compression method = ZSTD:
copying test_ZSTD_2920bands.tif with ArchGDAL... (171.5 s). Original file 3968.5 MB, copy 10906.3 MB.
recompressing with gdal_translate... (61.3 s). Recompressed file 3968.5 MB.
reading copied file... (15.1 s). Identical to original file: true.
reading recompressed file... (15.6 s). Identical to original file: true.
My 5 test files are available in a Box folder here. I get similar results for other compression methods. Some valid values are DEFLATE, LZW and NONE. My test results for these are in a log file in that Box folder. My installed versions: ArchGDAL v0.7.4 and GDAL_jll v300.202.100+0.
I suspect the problem is some kind of interaction with the BIGTIFF setting, which is required for creating GeoTIFFs larger than 4 GB. I’ll open an issue in the ArchGDAL repo for this, but first I wanted to make sure it’s not just me. For example, I could be missing some simple setting. Also I wonder if there is some kind of workaround I can use until the devs have time to look into this. Otherwise I’ll probably store my files in other format for now, e.g. HDF5.