I am creating a lot of FlatBuffers, which I would like to store in S3 storage. Right now I use something akin to the following code to serialize the structure and put it into an S3 bucket.
fbStruct = ...
fbBytes = FlatBuffers.bytes(FlatBuffers.build!(fbStruct))
AWSS3.s3_put(awstoken, bucket, path, fbBytes)
Is there a way to compress the fbBytes vector using CodecZlib? I have found a way to do that with a temporary file, but I don't know how to do it all in memory.
Have you tried an IOBuffer?
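If not, here is a minimal sketch of what I mean (assuming fbBytes from your snippet above): write into a GzipCompressorStream that wraps an in-memory IOBuffer, finish the gzip stream explicitly, and then take! the compressed bytes.

using CodecZlib, TranscodingStreams

buffer = IOBuffer()
stream = GzipCompressorStream(buffer)
write(stream, fbBytes)                       # compress into the in-memory buffer
write(stream, TranscodingStreams.TOKEN_END)  # finish the gzip stream
flush(stream)
compressedBytes = take!(buffer)              # the gzip-compressed bytes
close(stream)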
I have tried the IOBuffer like this
stream = GzipCompressorStream(IOBuffer(fbBytes))
newBytes = read(stream)
close(stream)
however the results are different from what I get when using a temporary file and then reading it back:
open(GzipCompressorStream, filename, "w") do stream
    write(stream, fbBytes)
end
fileBytes = open("a.fb.gz", "r") do f
    read(f)
end
I am still getting my head around how TranscodingStreams, CodecZlib and the base IO work together, so maybe I am using it all wrong.
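One way I can sanity-check both results is to decompress them and compare with the original bytes (a quick sketch, assuming newBytes and fileBytes from the snippets above):

using CodecZlib

transcode(GzipDecompressor, newBytes) == fbBytes   # does the in-memory result round-trip?
transcode(GzipDecompressor, fileBytes) == fbBytes  # does the file-based result round-trip?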
It seems there is a direct Array API:
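For example (a sketch using the transcode function mentioned below; fbBytes is the vector from the question):

using CodecZlib

compressedBytes = transcode(GzipCompressor, fbBytes)          # compress a byte vector directly
originalBytes = transcode(GzipDecompressor, compressedBytes)  # and decompress it again
originalBytes == fbBytes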
Well, that was hiding in plain sight for me. Thanks for pointing that out. I somehow thought that the GzipCompressor was deprecated, however that was the case with the old GzipCompression types. Will try that straightaway.
transcode(GzipCompressor, data) is the simplest way to compress in-memory data. However, it allocates a working space every time you compress a chunk of data. If you need to compress a lot of data chunks, you can avoid these allocations by pre-allocating and reusing a compressor object, as described in this example: https://bicycle1885.github.io/TranscodingStreams.jl/stable/examples.html#Transcode-lots-of-strings-1.
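For illustration, here is a sketch of that reuse pattern from the linked example, adapted to gzip; chunks is a hypothetical iterable of byte vectors standing in for your FlatBuffers payloads.

using CodecZlib, TranscodingStreams

codec = GzipCompressor()
TranscodingStreams.initialize(codec)   # allocate the working space once
try
    for chunk in chunks                # chunks: hypothetical iterable of byte vectors
        compressedBytes = transcode(codec, chunk)
        # ... store or upload compressedBytes, e.g. to S3 ...
    end
finally
    TranscodingStreams.finalize(codec) # release the working space
end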
is the simplest way to compress in-memory data. However, it allocates a working space every time you try to compress a chunk of data. If you need to compress a lot of data chunks, you can avoid lots of allocations by reusing pre-allocating a compressor object as described in this example: https://bicycle1885.github.io/TranscodingStreams.jl/stable/examples.html#Transcode-lots-of-strings-1.