What are the benefits of using IOBuffer()?

Are there potential benefits of writing to an IOBuffer() rather than printing to a file IO directly? For example, I see the following implementation,

function append(io::IO, data::SomeCompositeType)
    # append a mutable struct to the IO, then return `io`
    # (returning `io` is what lets the call below be chained into `take!`)
    return io
end

data_str = String(take!(append(IOBuffer(), element)))
open("data_file.txt", "w") do io
    write(io, data_str)
end

Is the IOBuffer step necessary in the above example? What does it buy us? Thanks.

1 Like

Mainly, IOBuffer is used in circumstances where you don’t just want to output to a file, e.g. if you want to output to a string, or you want to preprocess the data before it is written.
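For example, a minimal sketch of building a string entirely in memory (toy data, nothing from the thread):

io = IOBuffer()
for i in 1:3
    println(io, "row ", i)    # goes into memory, no file involved
end
s = String(take!(io))         # "row 1\nrow 2\nrow 3\n"

# sprint wraps the IOBuffer/take!/String pattern for you:
s == sprint(io -> foreach(i -> println(io, "row ", i), 1:3))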

I don’t see much point in using an intermediate IOBuffer if you’re just going to dump it straight into a file. (Presumably file I/O is already buffered internally.)

8 Likes

Thanks.

Maybe something like

using BufferedStreams
io = BufferedOutputStream(open("data_file.txt", "w"))
# do lots of output
close(io)

will be beneficial if you write lots of things to the file.

1 Like

I thought that the OS would take care of read-ahead and buffering pretty well when writing to a file directly.

Presumably you did some benchmarks to give this advice; do you mind sharing them?

3 Likes

Another use case might be that you don’t want to modify files on disk until you have all the content (printing might error, and you don’t want to leave a corrupted or half-written file on disk). E.g. here https://github.com/JuliaLang/Pkg.jl/blob/191b7174cf955311d27820d2d1cb2cd870fb690b/src/project.jl#L174-L176 we first print to an IOBuffer, and only if that is successful do we open the file on disk and print to that.
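A minimal sketch of that pattern, reusing the file name from the original post (render is a hypothetical stand-in for whatever produces the text, not the actual Pkg.jl code):

buf = IOBuffer()
print(buf, render(data))          # may throw; the file on disk is untouched
open("data_file.txt", "w") do io
    write(io, take!(buf))         # only reached if rendering succeeded
end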

5 Likes

using BufferedStreams

x = "id" .* string.(rand(UInt16, 100_000_000))

fn(x) = begin
	io = BufferedOutputStream(open("c:/data/bin.bin", "w"))
	write.(Ref(io), x)
	close(io)
end

gn(x) = begin
	io = open("c:/data/bin2.bin", "w")
	write.(Ref(io), x)
	close(io)
end

using BenchmarkTools

@btime fn($x)
@btime gn($x)

There you go:

[screenshot of the @btime output: the BufferedOutputStream version is roughly 2× faster than the plain write version]

3 Likes

See above. Hmmm, I heard from my Rust programmer friend that it’s still better to manage your own buffer with IO. He wrote this program for me that was 10x faster than anything on the market… so I think he knows his stuff.

4 Likes

Thanks, I did not know this. I can reproduce your timings on Linux.

1 Like

Writing to files in Rust is AFAIU completely unbuffered, so using a BufWriter is crucial, while Julia should use libuv for buffering. If Julia didn’t use any buffering, there would be a much larger difference than a factor of 2 here. It is interesting to see that there is a difference at all, though; perhaps the BufferedOutputStream buffering is more efficient than the libuv buffering in this case.

6 Likes

I can’t reproduce this in WSL on W10.

julia> include("test.jl")
  3.322 s (5226 allocations: 763.30 MiB)
  3.160 s (11 allocations: 762.94 MiB)

shell> cat test.jl
using BufferedStreams, BenchmarkTools

x = "id".*string.(rand(UInt16, 100_000_000))

f(x) = begin
    io = BufferedOutputStream(open("tmp.bin", "w"))
    write.(Ref(io), x)
    close(io)
end

g(x) = begin
    io = open("tmp2.bin", "w")
    write.(Ref(io), x)
    close(io)
end


@btime f($x)
@btime g($x)

julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 4

I am on 1.3-rc1. I notice there are more allocations on 1.2: 3.322 s (5226 allocations: 763.30 MiB).

I just thought BufferedOutputStream was always good, but I’m not so sure anymore…

True, but does that matter when both timing and used memory are about the same?

Just curious, because on 1.3 there are far fewer allocations. I assume each allocation incurs some cost.

Yeah, true! I’m more wondering why the additionally buffered version is 3 s faster than the regular version on your machine…

Ah, this is on 1.3… Small writes on 1.3 are slower because they are now thread safe (and thus need to lock for every write). BufferedStreams is not thread safe, so it avoids the overhead of locking.
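If you want to stay within Base, one way to sidestep the per-write lock (just a sketch of the idea, not what BufferedStreams does internally) is to batch the small writes into an IOBuffer and give the file a single large write:

hn(x) = begin
    buf = IOBuffer()                  # plain memory, no locking
    write.(Ref(buf), x)               # many small, cheap writes
    open("c:/data/bin3.bin", "w") do io
        write(io, take!(buf))         # one big write through the locked file stream
    end
end

The tradeoff is that the whole payload sits in memory before anything reaches the disk.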

3 Likes

In some circumstances, using an IOBuffer is a faster and more efficient way to build a string from an object. If you instead constructed the string by repeatedly concatenating String objects, you would create many intermediate String instances along the way. With an IOBuffer, the hypothetical append skips those intermediate constructors because you stream characters into a single buffer and call the String constructor once at the end, instead of calling it over and over to build up whatever String you are trying to get from the element object.
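A toy illustration of the difference (hypothetical functions, not from the thread):

# repeated concatenation: every *= allocates a brand-new String
function build_concat(n)
    s = ""
    for i in 1:n
        s *= string(i) * ","
    end
    return s
end

# IOBuffer: grows a single byte buffer, constructs one String at the end
function build_iobuffer(n)
    io = IOBuffer()
    for i in 1:n
        print(io, i, ',')
    end
    return String(take!(io))
end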

1 Like

This is still valid 2 years later. Inserting it here just for reference 🙂 My system is a MacBook Pro 2018: 2.5 GHz Intel i7, 16 GB 1600 DDR3 RAM, running Big Sur 11.6.3.

[screenshot of the updated benchmark timings]

I am very new to Julia, so please correct me if I am wrong or am using this other than as intended.
I use IOBuffer when I need to concatenate larger binary strings for parsing. I come from Python, and I use it in Julia much like BytesIO in Python. I work with readers for proprietary data files where the data can be split into chunks scattered across the whole file. Parsing such a file through the raw IOStream (obtained with open) is painful, particularly when there are a few more layers on top (chunked compression and encryption). Using an IOBuffer reduces the complexity: I concatenate the scattered chunks into a contiguous in-memory representation before moving on to the next layer of parsing.
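For what it’s worth, a rough sketch of that pattern (the offsets/lengths layout is hypothetical, just to show the BytesIO-style use of IOBuffer):

function reassemble(io::IO, offsets::Vector{Int}, lengths::Vector{Int})
    buf = IOBuffer()
    for (off, len) in zip(offsets, lengths)
        seek(io, off)                  # jump to the next chunk in the file
        write(buf, read(io, len))      # append its bytes to the in-memory buffer
    end
    seekstart(buf)                     # rewind so the next parsing layer can read it
    return buf                         # a contiguous, IO-like view of the chunked data
end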

1 Like