[ANN] CodecInflate64.jl - Testing help needed from Windows ZIP file users

CodecInflate64.jl is a work-in-progress Julia implementation of Deflate64 decompression.

Deflate64 is an incompatible variant of deflate that Windows File Explorer sometimes uses when making large ZIP files.
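
To see whether a particular ZIP file contains any Deflate64 entries at all, one could tally the compression method of each entry. This is a minimal sketch, assuming ZipArchives is installed. `method_label`, `tally_methods`, and `tally_zipfile` are hypothetical helper names, not part of any package, and the method codes 0, 8, and 9 are the ones the test script in this post dispatches on.

```julia
using ZipArchives, Mmap

# Hypothetical helper: human-readable label for a ZIP compression method code.
# 0, 8, and 9 are the codes handled by the test script; other codes are reported as-is.
function method_label(code::Integer)
    if code == 0
        return "store"
    elseif code == 8
        return "deflate"
    elseif code == 9
        return "deflate64"
    else
        return "other ($(code))"
    end
end

# Hypothetical helper: count how many entries use each compression method.
function tally_methods(codes)
    counts = Dict{String,Int}()
    for code in codes
        label = method_label(code)
        counts[label] = get(counts, label, 0) + 1
    end
    return counts
end

# Memory-map the archive (as the test script does) and tally its entries.
function tally_zipfile(zipfile::AbstractString)
    r = ZipReader(mmap(open(zipfile; read=true)))
    codes = [ZipArchives.zip_compression_method(r, i) for i in 1:zip_nentries(r)]
    return tally_methods(codes)
end
```

Calling `tally_zipfile("myzipfile.zip")` returns a `Dict` of counts per method. Files that report at least one `deflate64` entry are the most useful for testing the Deflate64 path.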

If you have any large ZIP files created on Windows, please help me test this library by running the following script and reporting any errors here or on GitHub.

julia testzip.jl myzipfile.zip
# This script tests if CodecInflate64 can be used to correctly read a zip file.
# Run it with for example: `julia testzip.jl myzipfile.zip`
import Pkg
Pkg.activate(;temp=true)
Pkg.add([
    "CodecInflate64",
    "ZipArchives",
    "CRC32",
    "InputBuffers"
])

using ZipArchives, CRC32, InputBuffers, CodecInflate64, Mmap

function checkcrc32_zipfile(zipfile::String; bufsize=2^14)
    data = mmap(open(zipfile; read=true))
    r = ZipReader(data)
    for i in 1:zip_nentries(r)
        method = ZipArchives.zip_compression_method(r, i)
        a = ZipArchives.zip_entry_data_offset(r,i)
        s = zip_compressed_size(r,i)
        c = data[begin+a:begin+a+s-1]
        u = if method == 9
            Deflate64DecompressorStream(InputBuffer(c); bufsize)
        elseif method == 8
            DeflateDecompressorStream(InputBuffer(c); bufsize)
        elseif method == 0
            InputBuffer(c)
        else
            error("unknown method in $(repr(zipfile)) entry: $(i) name: $(repr(zip_name(r,i)))")
        end
        if crc32(u) != zip_stored_crc32(r, i)
            error("crc32 wrong for $(repr(zipfile)) entry: $(i) name: $(repr(zip_name(r,i)))")
        end
    end
    @info "$(zip_nentries(r)) entries in $(repr(zipfile)) successfully checked"
end

checkcrc32_zipfile(ARGS[1])

Any ideas for more automated compatibility testing would also be appreciated, but Microsoft seems to try to prevent this; see "Why is Windows Compressed Folders (Zip folders) support stuck at the turn of the century?" on The Old New Thing.

testzip.jl (1.3 KB)


@TimG Does the testzip.jl script work on the ZIP files you were having issues with?

@nhz2 Thank you for taking this initiative.

I’ve tried running testzip.jl on one of my zip files and got this result:

ERROR: LoadError: SystemError: opening file "GCS-00000140": No such file or directory
Stacktrace:
 [1] systemerror(p::String, errno::Int32; extrainfo::Nothing)
   @ Base .\error.jl:176
 [2] systemerror
   @ .\error.jl:175 [inlined]
 [3] open(fname::String; lock::Bool, read::Bool, write::Nothing, create::Nothing, truncate::Nothing, append::Nothing)
   @ Base .\iostream.jl:293
 [4] open
   @ .\iostream.jl:275 [inlined]
 [5] checkcrc32_zipfile(zipfile::String; bufsize::Int64)
   @ Main C:\Users\TGebbels\...\Documents\Julia Experimenting\testzip.jl:16
 [6] checkcrc32_zipfile(zipfile::String)
   @ Main C:\Users\TGebbels\...\Documents\Julia Experimenting\testzip.jl:15
 [7] top-level scope
   @ C:\Users\TGebbels\...\Documents\Julia Experimenting\testzip.jl:39
in expression starting at C:\Users\TGebbels\...\Documents\Julia Experimenting\testzip.jl:39

The files in my zip file are typically like this:

GCS-00000139 - YorkshireSculpturePark.xlsx
GCS-00000140 - YoungPeoplesSportPanel.xlsx
GCS-00000141 - Image1.jpg
GCS-00000141 - YoungpeoplegettheirskatesonStreetSkateTo.xlsx

and I wonder whether, on the face of it, the script does not allow for spaces in filenames?

Spaces in filenames are fine; just put the filename in quotes when running the script:
julia testzip.jl "my zip file with spaces.zip"
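
Without the quotes, the shell splits the path on spaces, so the script only sees the first piece (here "GCS-00000140") in ARGS[1]. A guard on the arguments could make this failure easier to recognize. This is a minimal sketch using only Base; `zip_path_from_args` is a hypothetical helper and is not part of the attached testzip.jl.

```julia
# Hypothetical helper, not part of the original testzip.jl.
# Returns the single ZIP path from the argument list, or throws a descriptive
# error when the shell has split an unquoted path into several arguments.
function zip_path_from_args(args::AbstractVector{<:AbstractString})
    if length(args) != 1
        error("expected exactly one argument but got $(length(args)): $(repr(args)). " *
              "If the path contains spaces, wrap it in quotes.")
    end
    path = args[1]
    isfile(path) || error("no such file: $(repr(path))")
    return path
end
```

With this in place, the last line of the script could become `checkcrc32_zipfile(zip_path_from_args(ARGS))`.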

Alternatively, you can run the checkcrc32_zipfile function directly in the REPL to avoid CLI issues:

using ZipArchives, CRC32, InputBuffers, CodecInflate64, Mmap

function checkcrc32_zipfile(zipfile::String; bufsize=2^14)
    data = mmap(open(zipfile; read=true))
    r = ZipReader(data)
    for i in 1:zip_nentries(r)
        method = ZipArchives.zip_compression_method(r, i)
        a = ZipArchives.zip_entry_data_offset(r,i)
        s = zip_compressed_size(r,i)
        c = data[begin+a:begin+a+s-1]
        u = if method == 9
            Deflate64DecompressorStream(InputBuffer(c); bufsize)
        elseif method == 8
            DeflateDecompressorStream(InputBuffer(c); bufsize)
        elseif method == 0
            InputBuffer(c)
        else
            error("unknown method: $(method) in $(repr(zipfile)) entry: $(i) name: $(repr(zip_name(r,i)))")
        end
        if crc32(u) != zip_stored_crc32(r, i)
            error("crc32 wrong for $(repr(zipfile)) entry: $(i) name: $(repr(zip_name(r,i)))")
        end
    end
    @info "$(zip_nentries(r)) entries in $(repr(zipfile)) successfully checked"
end

checkcrc32_zipfile("my zip file with spaces.zip")

Sorry for not understanding.

Tried again:

[ Info: 243 entries in "GCS-00000140 - YoungPeoplesSportPanel.zip" successfully checked