What are the benefits of using IOBuffer()?

Are there potential benefits of writing to an IOBuffer() rather than printing to a file IO directly? For example, I see the following implementation,

function append(io::IO, data::SomeCompositeType)
    # append a mutable struct to the IO, then return `io`
    # (returning `io` is what lets the call below be chained into `take!`)
    return io
end

data_str = String(take!(append(IOBuffer(), element)))
open("data_file.txt", "w") do io
    write(io, data_str)
end

Is the IOBuffer step necessary in the above example? What does it buy us? Thanks.

1 Like

Mainly, IOBuffer is used in circumstances where you don’t just want to output to a file, e.g. if you want to output to a string, or you want to preprocess the data before it is written.
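For example, a minimal sketch of building a string entirely in memory (toy data, nothing from the thread):

io = IOBuffer()
for i in 1:3
    println(io, "row ", i)    # goes into memory, no file involved
end
s = String(take!(io))         # "row 1\nrow 2\nrow 3\n"

# sprint wraps the IOBuffer/take!/String pattern for you:
s == sprint(io -> foreach(i -> println(io, "row ", i), 1:3))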

I don’t see much point in using an intermediate IOBuffer if you’re just going to dump it straight into a file. (Presumably file I/O is already buffered internally.)

8 Likes

Thanks.

Maybe something like

using BufferedStreams
io = BufferedOutputStream(open("data_file.txt", "w"))
# do lots of output
close(io)

will be beneficial if you write lots of things to the file.

1 Like

I thought that the OS would take care of read-ahead and buffering pretty well when writing to a file directly.

Presumably you did some benchmarks to give this advice; do you mind sharing them?

3 Likes

Another use case might be that you don’t want to modify files on disk until you have all the content (printing might error, and you don’t want to leave a corrupted or half-written file on disk). E.g. here https://github.com/JuliaLang/Pkg.jl/blob/191b7174cf955311d27820d2d1cb2cd870fb690b/src/project.jl#L174-L176 we first print to an IOBuffer, and only if that is successful do we open the file on disk and print to that.
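A minimal sketch of that pattern, reusing the file name from the original post (render is a hypothetical stand-in for whatever produces the text, not the actual Pkg.jl code):

buf = IOBuffer()
print(buf, render(data))          # may throw; the file on disk is untouched
open("data_file.txt", "w") do io
    write(io, take!(buf))         # only reached if rendering succeeded
end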

5 Likes

using BufferedStreams

x = "id" .* string.(rand(UInt16, 100_000_000))

fn(x) = begin
	io = BufferedOutputStream(open("c:/data/bin.bin", "w"))
	write.(Ref(io), x)
	close(io)
end

gn(x) = begin
	io = open("c:/data/bin2.bin", "w")
	write.(Ref(io), x)
	close(io)
end

using BenchmarkTools

@btime fn($x)
@btime gn($x)

There you go:

[screenshot of the @btime output: the BufferedOutputStream version is roughly 2× faster than the plain write version]

3 Likes

See above. Hmmm, I heard from my Rust programmer friend that it’s still better to manage your own buffer with IO. He wrote this program for me that was 10x faster than anything on the market… so I think he knows his stuff.

4 Likes

Thanks, I did not know this. I can reproduce your timings on Linux.

1 Like

Writing to files in Rust is AFAIU completely unbuffered, so using a BufWriter is crucial, while Julia should use libuv for buffering. If Julia didn’t use any buffering, there would be a much larger difference than a factor of 2 here. It is interesting to see that there is a difference at all, though; perhaps the BufferedOutputStream buffering is more efficient than the libuv buffering in this case.

6 Likes

I can’t reproduce this in WSL on W10.

julia> include("test.jl")
  3.322 s (5226 allocations: 763.30 MiB)
  3.160 s (11 allocations: 762.94 MiB)

shell> cat test.jl
using BufferedStreams, BenchmarkTools

x = "id".*string.(rand(UInt16, 100_000_000))

f(x) = begin
    io = BufferedOutputStream(open("tmp.bin", "w"))
    write.(Ref(io), x)
    close(io)
end

g(x) = begin
    io = open("tmp2.bin", "w")
    write.(Ref(io), x)
    close(io)
end


@btime f($x)
@btime g($x)

julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 4

I am on 1.3-rc1. I notice there are more allocations on 1.2: 3.322 s (5226 allocations: 763.30 MiB).

I just thought BufferedOutputStream was always good, but I’m not so sure anymore…

True, but does that matter when both timing and used memory are about the same?

Just curious, because on 1.3 there are far fewer allocations. I assume each allocation incurs some cost.

Yeah, true! I’m more wondering why the additionally buffered version is 3 s faster than the regular version on your machine…

Ah, this is on 1.3… Small writes on 1.3 are slower because they are now thread safe (and thus need to lock for every write). BufferedStreams is not thread safe, so it avoids the overhead of locking.
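If you want to stay within Base, one way to sidestep the per-write lock (just a sketch of the idea, not what BufferedStreams does internally) is to batch the small writes into an IOBuffer and give the file a single large write:

hn(x) = begin
    buf = IOBuffer()                  # plain memory, no locking
    write.(Ref(buf), x)               # many small, cheap writes
    open("c:/data/bin3.bin", "w") do io
        write(io, take!(buf))         # one big write through the locked file stream
    end
end

The tradeoff is that the whole payload sits in memory before anything reaches the disk.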

3 Likes

In some circumstances, using an IOBuffer is a faster and more efficient way to build a string from an object. If you instead constructed the string by repeatedly concatenating String objects, you would create many intermediate String instances along the way. With an IOBuffer, the hypothetical append skips those intermediate constructors because you stream characters into a single buffer and call the String constructor once at the end, instead of calling it over and over to build up whatever String you are trying to get from the element object.
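A toy illustration of the difference (hypothetical functions, not from the thread):

# repeated concatenation: every *= allocates a brand-new String
function build_concat(n)
    s = ""
    for i in 1:n
        s *= string(i) * ","
    end
    return s
end

# IOBuffer: grows a single byte buffer, constructs one String at the end
function build_iobuffer(n)
    io = IOBuffer()
    for i in 1:n
        print(io, i, ',')
    end
    return String(take!(io))
end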

1 Like

This is still valid 2 years later. Inserting it here just for reference 🙂 My system is a MacBook Pro 2018: 2.5 GHz Intel i7, 16 GB 1600 DDR3 RAM, running Big Sur 11.6.3.

[screenshot of the updated benchmark timings]

I am very new to Julia, so please correct me if I am wrong or am using this other than as intended.
I use IOBuffer when I need to concatenate larger binary strings for parsing. I come from Python, and I use it in Julia much like BytesIO in Python. I work with readers for proprietary data files where the data can be split into chunks scattered across the whole file. Parsing such a file through the raw IOStream (obtained with open) is painful, particularly when there are a few more layers on top (chunked compression and encryption). Using an IOBuffer reduces the complexity: I concatenate the scattered chunks into a contiguous in-memory representation before moving on to the next layer of parsing.
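For what it’s worth, a rough sketch of that pattern (the offsets/lengths layout is hypothetical, just to show the BytesIO-style use of IOBuffer):

function reassemble(io::IO, offsets::Vector{Int}, lengths::Vector{Int})
    buf = IOBuffer()
    for (off, len) in zip(offsets, lengths)
        seek(io, off)                  # jump to the next chunk in the file
        write(buf, read(io, len))      # append its bytes to the in-memory buffer
    end
    seekstart(buf)                     # rewind so the next parsing layer can read it
    return buf                         # a contiguous, IO-like view of the chunked data
end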

1 Like