Julia CodecZlib fails while Python succeeds

I figured out what the extra three bytes are: they are the first three bytes of the big-endian Adler-32 checksum that terminates a zlib stream. The fourth byte, 0x4a, was truncated. If we prepend the zlib header bytes 0x78 0x9c and append the missing 0x4a, the file becomes a proper zlib stream:

  using CodecZlib
  data = read("deflate_bug_demo.bin")
  zlibstream = vcat(UInt8[0x78, 0x9c], data, UInt8[0x4a])   # header + file + missing Adler byte
  out = transcode(ZlibDecompressor, zlibstream)             # 16572 bytes
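
To see why 0x78 0x9c works as a header, here is a minimal sketch that decodes the two bytes according to the zlib format (RFC 1950). The helper name `describe_zlib_header` is my own for illustration and is not part of CodecZlib.

```julia
# Sketch: decode the two-byte zlib header (CMF, FLG) as laid out in RFC 1950.
# `describe_zlib_header` is a hypothetical helper, not a CodecZlib function.
function describe_zlib_header(cmf::UInt8, flg::UInt8)
    cm     = cmf & 0x0f          # compression method; 8 means deflate
    cinfo  = cmf >> 4            # log2(window size) - 8; 7 means a 32 KiB window
    fdict  = (flg >> 5) & 0x01   # 1 if a preset dictionary id follows the header
    flevel = flg >> 6            # compression-level hint; 2 is the default level
    # The format requires CMF*256 + FLG to be a multiple of 31.
    valid  = (Int(cmf) * 256 + Int(flg)) % 31 == 0
    return (cm = cm, cinfo = cinfo, fdict = fdict, flevel = flevel, valid = valid)
end

hdr = describe_zlib_header(0x78, 0x9c)
println(hdr)
```

So 0x78 0x9c declares deflate with a 32 KiB window, no preset dictionary, and the default compression level, which is why the raw deflate data can follow it directly.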

We can confirm this by computing the Adler-32 checksum of the decompressed data, once in pure Julia and once via zlib:

  using Zlib_jll

  # Pure-Julia Adler-32 trailer: 4 bytes, big-endian, over the *uncompressed* data.
  function adler32_trailer(data)
      a, b = 1, 0
      for byte in data
          a = (a + byte) % 65521
          b = (b + a)    % 65521
      end
      checksum = (b << 16) | a
      return [UInt8(checksum >> s & 0xff) for s in (24, 16, 8, 0)]  # big-endian
  end

  # zlib-backed Adler-32 trailer: 4 bytes, big-endian, over the *uncompressed* data.
  function adler32_trailer_zlib(data)
      checksum = ccall((:adler32, libz), Culong,
                       (Culong, Ptr{UInt8}, Cuint), 1, data, length(data))
      return [UInt8(checksum >> s & 0xff) for s in (24, 16, 8, 0)]  # big-endian
  end

  # Read the decompressed output and compute the trailer with both implementations.
  out = read("deflate_bug_demo.out")
  t_julia = adler32_trailer(out)
  t_zlib  = adler32_trailer_zlib(out)

  println("input: deflate_bug_demo.out  (", length(out), " bytes)")
  println("adler32_trailer       (pure Julia) = ", bytes2hex(t_julia))
  println("adler32_trailer_zlib  (Zlib_jll)   = ", bytes2hex(t_zlib))
  println("match: ", t_julia == t_zlib)

Here is the output:

  input: deflate_bug_demo.out  (16572 bytes)
  adler32_trailer       (pure Julia) = f31b334a
  adler32_trailer_zlib  (Zlib_jll)   = f31b334a
  match: true
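
As an independent sanity check that does not need the demo file, the same loop can be run on a short string whose checksum is easy to work out by hand. This is only a sketch: the function name `adler32_value` is mine, and it returns the 32-bit value instead of the four trailer bytes.

```julia
# Same algorithm as adler32_trailer above, but returning the 32-bit value.
# `adler32_value` is a hypothetical name chosen for this sketch.
function adler32_value(data)
    a, b = 1, 0
    for byte in data
        a = (a + byte) % 65521   # running sum of bytes, starting at 1
        b = (b + a) % 65521      # running sum of the successive `a` values
    end
    return UInt32((b << 16) | a)
end

# For "Wikipedia": a ends at 920 (0x0398) and b at 4582 (0x11e6).
check = adler32_value(codeunits("Wikipedia"))
println(string(check, base = 16, pad = 8))   # 11e60398
```

Empty input yields 1, which is why the zlib-backed version above passes 1 as the starting value.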

Is this a bug in CodecZlib? Or is the real bug that the other implementations do not report the three trailing bytes?

Issue created: `DeflateDecompressor`/`ZlibDecompressor` error on trailing bytes after a complete stream (concatenated-stream policy), Issue #107 in JuliaIO/CodecZlib.jl