Why do we have CRC32c but not CRC32?

I just found out that CRC32 and CRC32c are two different things, and that only the latter is in the stdlib: CRC32c · The Julia Language

julia> using CRC32c

julia> CRC32c.crc32c(UInt8[1,0,1,0,0,0,0,0,0,0])
0x61b7f6e9

julia> CRC32c.crc32c(UInt8[1,0,1,0,0,0,0,0,0,0]) |> Int
1639446249

In [1]: import numpy as np

In [2]: import zlib

In [5]: zlib.crc32(np.array([1,0,1,0,0,0,0,0,0,0], dtype=np.uint8))
Out[5]: 3236037590
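
For anyone else tripping over this: as far as I understand, the two checksums run the same algorithm and differ only in the generator polynomial (zlib's CRC-32 uses 0x04C11DB7, CRC-32C uses the Castagnoli polynomial 0x1EDC6F41). Here is a minimal bitwise sketch in pure Julia to make that concrete. `crc_bitwise` and the two constant names are made up for illustration, and this is far slower than either library.

```julia
using CRC32c: crc32c

# Bit-reversed ("reflected") forms of the two generator polynomials.
const CRC32_POLY  = 0xedb88320  # CRC-32 as computed by zlib (0x04C11DB7 reversed)
const CRC32C_POLY = 0x82f63b78  # CRC-32C, Castagnoli (0x1EDC6F41 reversed)

# Hypothetical helper: one bit at a time, no lookup table, no hardware support.
# `crc` is the checksum of any preceding data, so calls can be chained.
function crc_bitwise(data::AbstractVector{UInt8}, poly::UInt32, crc::UInt32 = 0x00000000)
    crc = ~crc                      # undo the final inversion of the previous call
    for byte in data
        crc ⊻= byte                 # fold the next byte into the low 8 bits
        for _ in 1:8
            crc = isodd(crc) ? (crc >> 1) ⊻ poly : crc >> 1
        end
    end
    return ~crc                     # final inversion
end

data = UInt8[1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
c32  = crc_bitwise(data, CRC32_POLY)    # should agree with Python's zlib.crc32
c32c = crc_bitwise(data, CRC32C_POLY)   # should agree with the stdlib
@assert c32c == crc32c(data)
@assert c32 != c32c
```

Same input, same loop, different polynomial, different checksum.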

I’m guessing there’s no particular reason other than that nobody has implemented it yet.

Would this be a welcomed feature?

Turns out it’s a pretty old piece of the stdlib that used to be internal-only, so it wasn’t a deliberate omission per se; CRC32c just happened to be what was used internally.

found GitHub - fhs/CRC32.jl: 32-bit cyclic redundancy check (CRC-32) checksum implementation for Julia

This package is deprecated. Please use the CRC32 implementation in any of the following libraries:

  • Zlib.jl – depends on zlib but about 26x faster

and in: GitHub - dcjones/Zlib.jl: zlib bindings for Julia

Note: This library is currently maintained, but should be considered deprecated in favor of Libz.jl, which is in every way better.

then in: GitHub - BioJulia/Libz.jl: Fast, flexible zlib bindings.

NOTE: If you are starting a new project on Julia 0.6 or later, it is recommended to use the CodecZlib.jl package instead. CodecZlib.jl and other packages offer more unified interfaces for a wide range of file formats.

and then you find out that GitHub - JuliaIO/CodecZlib.jl: zlib codecs for TranscodingStreams.jl doesn’t expose a crc32 call…

I’ll expose it for you.

julia> import Zlib_jll: libz

julia> function crc(crc = 0, buf = C_NULL, len = sizeof(buf))
           @ccall libz.crc32(crc::Culong, buf::Ptr{UInt8}, len::UInt)::Culong
       end
crc (generic function with 4 methods)

julia> _crc = crc()
0x0000000000000000

julia> _crc = crc(_crc, [1,2,3,4,5])
0x00000000baa24928

julia> _crc = crc(_crc, [1,2,3,4,5])
0x00000000b121350d

julia> crc(0, [1,2,3,4,5,1,2,3,4,5])
0x00000000b121350d
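
One thing to watch in the session above: `[1,2,3,4,5]` is a `Vector{Int}`, so `sizeof(buf)` is 40 and the checksum covers the raw 8-byte integers, not five bytes. Also, zlib.h declares the length argument of `crc32` as `uInt`, which corresponds to `Cuint` rather than Julia's `UInt`. A slightly stricter sketch, assuming you only want to accept byte vectors (`zlib_crc32` is a hypothetical name, not an existing API):

```julia
import Zlib_jll: libz

# zlib.h: uLong crc32(uLong crc, const Bytef *buf, uInt len)
function zlib_crc32(buf::Vector{UInt8}, crc::Integer = 0)
    len = length(buf)   # converting to Cuint throws if this exceeds typemax(UInt32)
    r = @ccall libz.crc32(crc::Culong, buf::Ptr{UInt8}, len::Cuint)::Culong
    return r % UInt32   # a CRC-32 always fits in 32 bits
end

bytes = UInt8[1, 2, 3, 4, 5]
raw   = collect(reinterpret(UInt8, Int64[1, 2, 3, 4, 5]))  # the 40 bytes hashed above

zlib_crc32(bytes)   # checksum of five bytes
zlib_crc32(raw)     # checksum of 40 bytes, a different value

# Chaining still works the same way as in the session above.
@assert zlib_crc32(bytes[4:5], zlib_crc32(bytes[1:3])) == zlib_crc32(bytes)
```

Restricting the signature to `Vector{UInt8}` turns the accidental `Vector{Int}` call into a `MethodError` instead of a silently different checksum.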

There is also GitHub - andrewcooke/CRC.jl: A Julia module (and command line tool) for calculating Cyclic Redundancy Checksums (CRCs).

Yes, part of the reason for the standard library is to expose functionality that we needed for Julia internals, but whose API we don’t want to tie to Julia Base because the internals may change in the future.

Note that the reason why we implemented CRC-32c rather than other variants of CRC-32 is that CRC-32c is hardware-accelerated, and we have a highly optimized implementation based on code originally posted by Mark Adler.

Because of that, the CRC32c standard library is a good default choice for performance reasons.
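
For reference, a short sketch of what using the stdlib looks like when you are free to pick the checksum yourself. It only uses the documented `crc32c` methods for strings, byte arrays, and `IO`; the input string is the conventional CRC test message.

```julia
using CRC32c: crc32c

msg = "123456789"
whole = crc32c(msg)                        # checksum of the whole string

# Incremental use: pass the checksum so far as the second argument.
part = crc32c("6789", crc32c("12345"))
@assert part == whole

# Byte arrays and IO streams give the same result for the same bytes.
@assert crc32c(Vector{UInt8}(msg)) == whole
@assert crc32c(IOBuffer(msg)) == whole
```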

Although it looks like zlib also uses optimized code by Adler, this still appears to be quite a bit slower than CRC32c. I guess it is calculating a CRC-32 variant that is not fully hardware accelerated?

On Intel x86_64 (Intel Core i7):

julia> using CRC32c, BenchmarkTools

julia> for k in (1,2,3,4,5)
          buf = rand(UInt8, 10^k)
          println("\nn = 10^$k:")
          @btime crc32c($buf)
          @btime crc(0, $buf) # @mkitti’s zlib interface
       end

n = 10^1:
  7.106 ns (0 allocations: 0 bytes)
  12.351 ns (0 allocations: 0 bytes)

n = 10^2:
  10.708 ns (0 allocations: 0 bytes)
  121.290 ns (0 allocations: 0 bytes)

n = 10^3:
  48.388 ns (0 allocations: 0 bytes)
  337.500 ns (0 allocations: 0 bytes)

n = 10^4:
  452.394 ns (0 allocations: 0 bytes)
  2.878 μs (0 allocations: 0 bytes)

n = 10^5:
  3.889 μs (0 allocations: 0 bytes)
  27.342 μs (0 allocations: 0 bytes)

On ARM (Apple M1 Pro):

n = 10^1:
  4.250 ns (0 allocations: 0 bytes)
  7.215 ns (0 allocations: 0 bytes)

n = 10^2:
  8.291 ns (0 allocations: 0 bytes)
  81.608 ns (0 allocations: 0 bytes)

n = 10^3:
  50.109 ns (0 allocations: 0 bytes)
  239.023 ns (0 allocations: 0 bytes)

n = 10^4:
  454.736 ns (0 allocations: 0 bytes)
  1.712 μs (0 allocations: 0 bytes)

n = 10^5:
  3.896 μs (0 allocations: 0 bytes)
  16.625 μs (0 allocations: 0 bytes)

(Which is confusing because CRC-32 does have hardware acceleration on ARM, which their implementation does appear to use very similarly to our code. Is this feature not enabled in the Zlib_jll build?)

I mean…

  1. that’s a jll package, not even a package
  2. which means that that’s not technically an API
  3. sure Zlib is pretty stable
  4. but my hacked up version (i.e. your code) won’t cover all the crc cases one may want (see the buffer and N-bytes methods supported by crc32c), or even corner cases that I can’t think of rn

so yeah, that’s not the way to go.

For now, I’m gonna use:

For no reason other than I already depend on this package.

I’m not sure why that follows. JLL packages are packages with semantic versioning, so you can reliably make your package depend on the documented zlib API.

It would be nice to register a simple CRC32 package which exposes the same API as CRC32c but uses zlib’s crc32.
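
To sketch what "same API" could involve: besides `crc32c(data, crc)`, the stdlib also has `crc32c(io, [nb,] crc)` for streams. The stream method can be layered on top of any buffer-level checksum that accepts a running value. `checksum_stream`, its `update` argument, and the chunk size below are all hypothetical choices for illustration. I plug in `crc32c` as the kernel only so the result can be checked against the stdlib; a zlib-backed function with the same `(bytes, crc)` signature would slot in the same way.

```julia
using CRC32c: crc32c

# Hypothetical helper: checksum up to `nb` bytes from `io`, reading in chunks.
# `update(bytes, crc)` is any incremental checksum over a contiguous byte buffer.
function checksum_stream(update, io::IO, nb::Integer = typemax(Int),
                         crc::UInt32 = 0x00000000; chunk_size::Integer = 24576)
    nb < 0 && throw(ArgumentError("number of bytes must be ≥ 0, got $nb"))
    buf = Vector{UInt8}(undef, min(nb, chunk_size))
    while nb > 0 && !eof(io)
        n = readbytes!(io, buf, min(nb, length(buf)))  # bytes actually read
        crc = update(view(buf, 1:n), crc)              # continue from running value
        nb -= n
    end
    return crc
end

data = rand(UInt8, 100_000)
@assert checksum_stream(crc32c, IOBuffer(data)) == crc32c(data)
@assert checksum_stream(crc32c, IOBuffer(data), 1000) == crc32c(data[1:1000])
```

Reading in fixed-size chunks keeps memory use constant no matter how large the stream is.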

Out of curiosity, why do you need CRC-32 specifically and not CRC-32c? Are you handling some foreign data that comes with a CRC-32 checksum?

are people looking into every symbol exported by each version of the binary upstream and their corresponding call signature when versioning the jll? (again, probably doesn’t matter for Zlib and crc32 as they are probably fixed forever now)

yeah unfortunately: https://github.com/root-project/root/blob/master/tree/ntuple/v7/doc/specifications.md#header-envelope

I created a draft package (still unregistered) here: GitHub - JuliaIO/CRC32.jl: CRC32 package for Julia (to be registered: JuliaRegistries/General#75000)

Update: CRC32 is now a registered package.

fhs merged an update to fhs/CRC32.jl yesterday that (1) updates the README to link to functioning CRC32 packages and (2) updates fhs/CRC32.jl to work with Julia 1.x (e.g. for pedagogy, since it is far simpler than the other implementations).
