I’m using ArchGDAL to save climate data as large 3D GeoTIFFs, but I’ve noticed that files created in ArchGDAL are roughly 3x larger than those created with gdal_* binaries. As a diagnostic I eliminated all my calculations and wrote an ArchGDAL function that just resaves its input file using a given compression method. Then I wrapped this in a test function that also recompresses the ArchGDAL output using gdal_translate and compares resulting file sizes.
Spoiler: small input GeoTIFFs have the same output size, but very large input files (4 GB) become much larger after passing through ArchGDAL.
First my test code:
using ArchGDAL, GDAL_jll
function copy_gtiff(infile, outfile; compressmethod="ZSTD")
    ArchGDAL.readraster(infile) do origdataset
        nbands = ArchGDAL.nraster(origdataset)
        band1 = ArchGDAL.getband(origdataset, 1)
        nodata = ArchGDAL.getnodatavalue(band1)
        isfile(outfile) && rm(outfile)
        origdata = origdataset[:,:,:]
        ArchGDAL.create(outfile,
            driver = ArchGDAL.getdriver("GTiff"),
            width = ArchGDAL.width(origdataset),
            height = ArchGDAL.height(origdataset),
            nbands = nbands,
            dtype = ArchGDAL.pixeltype(band1),
            options = ["BIGTIFF=YES", "COMPRESS=$compressmethod"]
        ) do dataset
            geotransform = ArchGDAL.getgeotransform(origdataset)
            proj = ArchGDAL.getproj(origdataset)
            ArchGDAL.setgeotransform!(dataset, geotransform)
            ArchGDAL.setproj!(dataset, proj)
            for b = 1:nbands
                band = ArchGDAL.getband(dataset, b)
                ArchGDAL.setnodatavalue!(band, nodata)
                ArchGDAL.write!(band, origdata[:,:,b])
            end
        end
        return origdata
    end
end
function recompress_gtiff(infile, outfile; compressmethod="ZSTD")
    gdal_translate_path() do gdal_translate
        run(`$gdal_translate -q -co BIGTIFF=YES -co COMPRESS=$compressmethod $infile $outfile`)
    end
end
filesizeMB(filename) = round(filesize(filename)/1024^2, digits=1)
function readalldata(filename)
    ArchGDAL.readraster(filename) do dataset
        return dataset[:,:,:]
    end
end
function testcopy(compressmethod="ZSTD")
    for nbands in [3, 30, 300, 1000, 2920]
        infile = "test_ZSTD_$(nbands)bands.tif"
        outfile1, outfile2 = tempname(), tempname()
        println("\nTesting with $nbands bands, compression method = $compressmethod:")
        print("  copying $infile with ArchGDAL...")
        time = round(@elapsed origdata = copy_gtiff(infile, outfile1; compressmethod); digits=1)
        println(" ($time s). Original file $(filesizeMB(infile)) MB, copy $(filesizeMB(outfile1)) MB.")
        print("  recompressing with gdal_translate...")
        time = round(@elapsed recompress_gtiff(outfile1, outfile2; compressmethod); digits=1)
        println(" ($time s). Recompressed file $(filesizeMB(outfile2)) MB.")
        print("  reading copied file...")
        time = round(@elapsed newdata = readalldata(outfile1); digits=1)
        println(" ($time s). Identical to original file: $(all(origdata .== newdata)).")
        print("  reading recompressed file...")
        time = round(@elapsed newdata = readalldata(outfile2); digits=1)
        println(" ($time s). Identical to original file: $(all(origdata .== newdata)).")
        rm(outfile1)
        rm(outfile2)
    end
end
Running the tests using ZSTD compression:
julia> testcopy("ZSTD")
Testing with 3 bands, compression method = ZSTD:
  copying test_ZSTD_3bands.tif with ArchGDAL... (1.3 s). Original file 3.9 MB, copy 3.9 MB.
  recompressing with gdal_translate... (1.2 s). Recompressed file 3.9 MB.
  reading copied file... (0.0 s). Identical to original file: true.
  reading recompressed file... (0.0 s). Identical to original file: true.
Testing with 30 bands, compression method = ZSTD:
  copying test_ZSTD_30bands.tif with ArchGDAL... (2.1 s). Original file 40.2 MB, copy 40.2 MB.
  recompressing with gdal_translate... (2.1 s). Recompressed file 40.2 MB.
  reading copied file... (0.2 s). Identical to original file: true.
  reading recompressed file... (0.3 s). Identical to original file: true.
Testing with 300 bands, compression method = ZSTD:
  copying test_ZSTD_300bands.tif with ArchGDAL... (8.6 s). Original file 405.8 MB, copy 405.8 MB.
  recompressing with gdal_translate... (8.4 s). Recompressed file 405.8 MB.
  reading copied file... (1.6 s). Identical to original file: true.
  reading recompressed file... (1.6 s). Identical to original file: true.
Testing with 1000 bands, compression method = ZSTD:
  copying test_ZSTD_1000bands.tif with ArchGDAL... (23.1 s). Original file 1355.7 MB, copy 1355.7 MB.
  recompressing with gdal_translate... (22.3 s). Recompressed file 1355.7 MB.
  reading copied file... (5.6 s). Identical to original file: true.
  reading recompressed file... (5.8 s). Identical to original file: true.
Testing with 2920 bands, compression method = ZSTD:
  copying test_ZSTD_2920bands.tif with ArchGDAL... (171.5 s). Original file 3968.5 MB, copy 10906.3 MB.
  recompressing with gdal_translate... (61.3 s). Recompressed file 3968.5 MB.
  reading copied file... (15.1 s). Identical to original file: true.
  reading recompressed file... (15.6 s). Identical to original file: true.
My 5 test files are available in a Box folder here. I get similar results for other compression methods. Some valid values are DEFLATE, LZW and NONE. My test results for these are in a log file in that Box folder. My installed versions: ArchGDAL v0.7.4 and GDAL_jll v300.202.100+0.
I suspect the problem is some kind of interaction with the BIGTIFF setting, which is required for creating GeoTIFFs larger than 4 GB. I’ll open an issue in the ArchGDAL repo for this, but first I wanted to make sure it’s not just me. For example, I could be missing some simple setting. Also I wonder if there is some kind of workaround I can use until the devs have time to look into this. Otherwise I’ll probably store my files in other format for now, e.g. HDF5.