Can Blosc.jl only decompress content that it has compressed?

I have been using Blosc.jl and it’s quite awesome. I am trying to make passing compressed data between Python, Julia, and R easier, so I was looking for a compression format that can be used in all of them.

I tried to compress something using LZ4 in R:

xc <- sample(10:20, 1e6, replace = TRUE)
x <- writeBin(xc, raw())  # serialize the integer vector to raw bytes

xc1 <- qs::lz4_compress_raw(x, 1)  # LZ4-compress at level 1
writeBin(xc1, "c:/data/ok.io")

qs::lz4_decompress_raw(xc1)  # decompresses fine in R

As you can see, I can read the data back in R fine. However, I can’t read it using CodecLz4.jl, as this particular LZ4 format isn’t supported yet; see https://github.com/invenia/CodecLz4.jl/issues/25

So I tried Blosc.jl, as the comment there seems to suggest:

using Blosc
comp = open("c:/data/ok.io", "r") do io
	read(io)
end

Blosc.set_compressor("lz4")
Blosc.decompress(UInt8, comp) # returns 0 length array

Blosc.set_compressor("lz4hc")
Blosc.decompress(UInt8, comp) # returns 0 length array

So I want to know: can Blosc.jl only read content that it has compressed? How do I change the input content so that Blosc.jl can decompress the LZ4 content produced by other LZ4 compressors in other languages?

You could also look at https://github.com/bicycle1885/Snappy.jl. Not sure whether it supports the particular format you need, but in general it has worked well for parquet.

Even if you set a different compressor to use under the hood, IIRC Blosc expects the data to use its own header format.
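To make the header point concrete: as far as I know, a Blosc (version 1) buffer starts with a 16-byte header holding four single-byte fields (format version, codec version, flags, type size) followed by three little-endian 32-bit integers (uncompressed size, block size, compressed size). Below is a minimal Python sketch that parses those 16 bytes, assuming that layout. `parse_blosc_header` and `looks_like_blosc` are hypothetical helper names for illustration, not part of any Blosc binding, and the size check is only a heuristic.

```python
import struct
from typing import NamedTuple


class BloscHeader(NamedTuple):
    version: int    # Blosc format version
    versionlz: int  # version of the internal codec format
    flags: int      # shuffle / codec flags
    typesize: int   # element size in bytes
    nbytes: int     # uncompressed size
    blocksize: int  # internal block size
    cbytes: int     # total compressed size, header included


def parse_blosc_header(buf: bytes) -> BloscHeader:
    """Unpack the first 16 bytes of buf as an assumed Blosc v1 header."""
    if len(buf) < 16:
        raise ValueError("buffer shorter than the 16-byte Blosc header")
    # Four unsigned bytes, then three little-endian uint32 values.
    return BloscHeader(*struct.unpack("<BBBBIII", buf[:16]))


def looks_like_blosc(buf: bytes) -> bool:
    """Heuristic: the cbytes field should equal the buffer length."""
    if len(buf) < 16:
        return False
    return parse_blosc_header(buf).cbytes == len(buf)
```

Bytes written by a plain LZ4 compressor carry no such header, so a check like `looks_like_blosc(open("c:/data/ok.io", "rb").read())` would most likely come back `False`, which would fit with the decompress call returning nothing useful.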

Would appreciate pointers to the documentation, my google-fu isn’t turning up much on that.

I guess the solution is a Julia LZ4 implementation that works for all LZ4 formats.
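For what it’s worth, part of the difficulty is that "LZ4" covers more than one container: the frame format starts with the magic number 0x184D2204 (stored little-endian), the legacy frame format with 0x184C2102, while a raw block has no magic number at all. Here is a rough Python sketch for checking which one a file looks like. `guess_lz4_container` is a hypothetical helper name, and I have not verified which container `qs::lz4_compress_raw` writes, so treat the result as a hint only.

```python
import struct

LZ4_FRAME_MAGIC = 0x184D2204   # current LZ4 frame format
LZ4_LEGACY_MAGIC = 0x184C2102  # legacy frame format


def guess_lz4_container(buf: bytes) -> str:
    """Guess the LZ4 container from the first four bytes of buf."""
    if len(buf) >= 4:
        # The magic number is stored as a little-endian uint32.
        (magic,) = struct.unpack("<I", buf[:4])
        if magic == LZ4_FRAME_MAGIC:
            return "frame"
        if magic == LZ4_LEGACY_MAGIC:
            return "legacy frame"
        if 0x184D2A50 <= magic <= 0x184D2A5F:
            return "skippable frame"
    # Raw blocks carry no magic number, so this is only a guess.
    return "no frame magic (possibly a raw block)"
```

A reader that handles every case would need to branch on this before decoding, and for raw blocks it would also need to get the uncompressed size from somewhere, since the block itself does not store it.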