Efficient file hashing

If you just want a checksum, rather than a cryptographically secure hash, Julia 0.6 has a hardware-accelerated CRC-32c checksum function (https://github.com/JuliaLang/julia/pull/18297). Currently it is unexported/undocumented, but that will likely change in the future.

In the meantime, you can do:

Base.crc32c(read(filename))

to read in the whole file and compute the checksum. Alternatively, to avoid holding the whole file in memory, you can checksum it in chunks with something like:

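function checksum(filename, blocksize=16384)
    crc = zero(UInt32)  # running checksum, starts at 0
    open(filename, "r") do f
        while !eof(f)
            # read up to blocksize bytes and fold them into the running checksum
            crc = Base.crc32c(read(f, blocksize), crc)
        end
    end
    return crc
end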

(The answer is independent of blocksize.)

Update: CRC32c checksums were exported in Julia 0.7 and are now available in the CRC32c stdlib. You can checksum a file with:

using CRC32c
crc = open(crc32c, filename)
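For completeness, here is a rough sketch of the chunked version ported to the CRC32c stdlib, along with a quick check that it agrees with the one-shot `open(crc32c, filename)` result whatever the block size. The name `chunked_crc32c` and the temporary test file are just for illustration and are not part of the stdlib.

```julia
using CRC32c  # standard library in Julia 0.7 and later

# Hypothetical helper (not provided by the stdlib): checksum a file chunk by chunk.
function chunked_crc32c(filename; blocksize=16384)
    crc = zero(UInt32)
    open(filename, "r") do f
        while !eof(f)
            # read(f, n) returns at most n bytes; pass the previous crc to continue the checksum
            crc = crc32c(read(f, blocksize), crc)
        end
    end
    return crc
end

# Quick sanity check on a temporary file of random bytes
path, io = mktemp()
write(io, rand(UInt8, 100_000))
close(io)

whole = open(crc32c, path)                         # one-shot checksum of the whole file
@assert chunked_crc32c(path) == whole
@assert chunked_crc32c(path; blocksize=7) == whole # block size does not change the result
rm(path)
```

In practice `open(crc32c, filename)` should be all you need; the chunked form is mainly useful if you want to do something else with each block as you go.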
