Compressing arbitrary Vector{UInt8} in memory

I am creating a lot of FlatBuffers, which I would like to store in S3. Right now I use something akin to the following code to serialize the structure and put it in an S3 bucket.

fbStruct = ...  # some FlatBuffers-serializable struct
fbBytes = FlatBuffers.bytes(FlatBuffers.build!(fbStruct))
AWSS3.s3_put(awstoken, bucket, path, fbBytes)

Is there a way to compress the fbBytes vector using CodecZlib? I have found a way to do it with a temporary file, but I don’t know how to do it all in memory.

Have you tried an IOBuffer?

I have tried an IOBuffer like this:

stream = GzipCompressorStream(IOBuffer(fbBytes))  # read mode: compresses while reading
newBytes = read(stream)
close(stream)

however the result differs from what I get when using a temporary file and then reading it back.

open(GzipCompressorStream, "a.fb.gz", "w") do stream
    write(stream, fbBytes)
end

fileBytes = open("a.fb.gz", "r") do f
    read(f)
end

I am still getting my head around how TranscodingStreams, CodecZlib, and base IO work together, so maybe I am using it all wrong. 🙂
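If you want to stay with streams, I believe the trick is to use the compressor in write mode over an IOBuffer and to explicitly end the stream before taking the bytes. A rough sketch based on the flushing example in the TranscodingStreams docs (untested here):

using CodecZlib, TranscodingStreams

buf = IOBuffer()
stream = GzipCompressorStream(buf)
write(stream, fbBytes, TranscodingStreams.TOKEN_END)  # TOKEN_END finishes the gzip stream
flush(stream)
compressedBytes = take!(buf)
close(stream)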

That said, it seems there is a direct Array API:
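Namely transcode, which works on byte vectors directly. A minimal sketch, with fbBytes being the vector from your first snippet:

using CodecZlib

compressedBytes = transcode(GzipCompressor, fbBytes)
plainBytes = transcode(GzipDecompressor, compressedBytes)  # round-trip check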

Well, that was hiding in plain sight for me. Thanks for pointing it out. I somehow thought GzipCompressor was deprecated, but that was the case with the old GzipCompression types. Will try that straightaway.

transcode(GzipCompressor, data) is the simplest way to compress in-memory data. However, it allocates a working space every time it compresses a chunk of data. If you need to compress a lot of data chunks, you can avoid those repeated allocations by pre-allocating a compressor object and reusing it, as described in this example: https://bicycle1885.github.io/TranscodingStreams.jl/stable/examples.html#Transcode-lots-of-strings-1.
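That pattern looks roughly like this (a sketch following the linked example; chunks stands in for whatever collection of byte vectors you need to compress):

using CodecZlib, TranscodingStreams

codec = GzipCompressor()
TranscodingStreams.initialize(codec)  # allocate the working space once
try
    for chunk in chunks
        compressedChunk = transcode(codec, chunk)  # reuses the codec's buffers
        # ... upload compressedChunk ...
    end
finally
    TranscodingStreams.finalize(codec)  # release the codec's resources
end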