Yes, part of the reason for the standard library is to expose functionality that we needed for Julia internals, but whose API we don’t want to tie to Julia Base because the internals may change in the future.
Note that the reason why we implemented CRC-32c rather than other variants of CRC-32 is that CRC-32c is hardware-accelerated, and we have a highly optimized implementation based on code originally posted by Mark Adler.
That makes the CRC32c standard library a good default choice when checksum performance matters.
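For reference, the stdlib API is tiny: `crc32c(data)` computes the checksum, and passing a previous checksum as the second argument lets you compute it incrementally over chunks:

```julia
using CRC32c

# CRC-32c check value for the standard test string "123456789"
checksum = crc32c("123456789")   # 0xe3069283

# incremental use: pass the running CRC as the second argument
c = crc32c("12345")
c = crc32c("6789", c)
c == checksum                    # true
```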
Although it looks like zlib also uses optimized code by Adler, it still appears to be quite a bit slower than CRC32c. My guess is that it is computing a different CRC-32 variant (a different polynomial) that is not fully hardware-accelerated?
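(For context, a `crc` function like the one benchmarked below can be sketched as a thin `ccall` into zlib; this is my guess at what @mkitti's interface looks like, not necessarily the exact code. It also shows that zlib's `crc32` computes the original CRC-32 polynomial, so its check value differs from CRC-32c's:)

```julia
using Zlib_jll   # provides libz, which ships with Julia

# Hypothetical sketch of a zlib-based `crc(start, buf)`, wrapping
# zlib's `uLong crc32(uLong crc, const Bytef *buf, uInt len)`:
crc(start::Integer, buf::Vector{UInt8}) =
    ccall((:crc32, libz), Culong, (Culong, Ptr{UInt8}, Cuint),
          start, buf, length(buf)) % UInt32

# zlib's CRC-32 and CRC-32c use different polynomials, so the
# standard check values for "123456789" differ:
crc(0, Vector{UInt8}(codeunits("123456789")))   # 0xcbf43926 (CRC-32)
# versus CRC32c.crc32c("123456789")             # 0xe3069283 (CRC-32c)
```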
On Intel x86_64 (Intel Core i7):
```julia
julia> using CRC32c, BenchmarkTools

julia> for k in (1,2,3,4,5)
           buf = rand(UInt8, 10^k)
           println("\nn = 10^$k:")
           @btime crc32c($buf)
           @btime crc(0, $buf)   # @mkitti’s zlib interface
       end

n = 10^1:
  7.106 ns (0 allocations: 0 bytes)
  12.351 ns (0 allocations: 0 bytes)

n = 10^2:
  10.708 ns (0 allocations: 0 bytes)
  121.290 ns (0 allocations: 0 bytes)

n = 10^3:
  48.388 ns (0 allocations: 0 bytes)
  337.500 ns (0 allocations: 0 bytes)

n = 10^4:
  452.394 ns (0 allocations: 0 bytes)
  2.878 μs (0 allocations: 0 bytes)

n = 10^5:
  3.889 μs (0 allocations: 0 bytes)
  27.342 μs (0 allocations: 0 bytes)
```
On ARM (Apple M1 Pro):
```
n = 10^1:
  4.250 ns (0 allocations: 0 bytes)
  7.215 ns (0 allocations: 0 bytes)

n = 10^2:
  8.291 ns (0 allocations: 0 bytes)
  81.608 ns (0 allocations: 0 bytes)

n = 10^3:
  50.109 ns (0 allocations: 0 bytes)
  239.023 ns (0 allocations: 0 bytes)

n = 10^4:
  454.736 ns (0 allocations: 0 bytes)
  1.712 μs (0 allocations: 0 bytes)

n = 10^5:
  3.896 μs (0 allocations: 0 bytes)
  16.625 μs (0 allocations: 0 bytes)
```
(Which is confusing, because CRC-32 does have hardware acceleration on ARM, and their implementation appears to use it much as our code does. Is this feature not enabled in the Zlib_jll build?)