Readbytes! or reading into an existing data block

lwabeke · August 22, 2017, 3:03pm

This must have been asked before, but I couldn’t find it.

How do I use readbytes! to read into a preallocated matrix? Or is there a better function?

Currently I’m using this:

    data = Array{Complex{Int16}}(undef, 10, 1000, 3, 1)   # allocate the correct size, done once

    dataBytes = reshape(data, length(data))
    dataBytes = reinterpret(UInt8, dataBytes)   # dataBytes is now an alias of data, reinterpreted as a byte array

    # do this many times in a processing loop (fid is a file opened earlier)
    bytesRead = readbytes!(fid, dataBytes, length(dataBytes))

But isn’t there a more direct way to read into data itself, instead of creating dataBytes as an alias of it?
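A minimal sketch, assuming Julia 1.x: Base’s read!(io, A) fills a preallocated array A in place when its element type is a plain bits type such as Complex{Int16}, so no byte alias is needed. The array sizes come from the question; the IOBuffer standing in for fid and the test pattern in src are assumptions made so the example runs on its own.

```julia
# Preallocate once (Julia 1.x needs `undef` for an uninitialized array).
data = Array{Complex{Int16}}(undef, 10, 1000, 3, 1)

# Stand-in for the file `fid`: an in-memory stream holding one block of samples.
src = reshape([Complex{Int16}(i % 128, -(i % 128)) for i in 1:length(data)], size(data))
fid = IOBuffer()
write(fid, src)
seekstart(fid)

# Inside the processing loop: read sizeof(data) bytes straight into `data`.
# read! throws an EOFError if the stream runs out before the array is full.
read!(fid, data)
```

Calling read!(fid, data) repeatedly in a loop reuses the same storage; each call consumes the next sizeof(data) bytes of the stream.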

stevengj · August 22, 2017, 4:53pm

Not that I know of (short of re-implementing readbytes! via low-level ccalls). What’s wrong with using reinterpret?

(Note that reading bytes directly into Int16 values in this way is endian-dependent, i.e. the binary files may not be portable to other architectures, although in practice right now almost everyone uses little-endian machines.)
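If portability matters, one option is to fix the byte order of the file format and convert explicitly. A minimal sketch, assuming the file stores little-endian Int16 real/imaginary pairs: Base’s htol and ltoh are no-ops on little-endian hosts and byte-swap on big-endian ones. Converting at the Int16 level through reinterpret keeps the swap on plain integers. The sample values and the in-memory IOBuffer are illustrative.

```julia
samples = [Complex{Int16}(1, -2), Complex{Int16}(300, 7)]

# Writing: convert each Int16 component to little-endian before it reaches the stream.
io = IOBuffer()
write(io, htol.(reinterpret(Int16, samples)))
seekstart(io)

# Reading: fill a preallocated buffer, then convert the components to host order in place.
buf = Vector{Complex{Int16}}(undef, length(samples))
read!(io, buf)
parts = reinterpret(Int16, buf)   # view of buf as Int16 components, no copy
parts .= ltoh.(parts)
```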

lwabeke · August 23, 2017, 7:57am

In principle nothing is wrong with using reinterpret; it’s just annoying that I need to reshape and reinterpret as separate steps.
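The two steps can be folded into a one-line helper. A sketch assuming Julia 1.x; the name byteview is hypothetical. In 1.x, reinterpret(UInt8, data) on a multidimensional array reinterprets along the first dimension (giving a 40×1000×3×1 array here), which is why vec is still applied first to get a flat byte vector.

```julia
# Hypothetical helper: a flat UInt8 view that shares memory with `a` (nothing is copied).
byteview(a::Array) = reinterpret(UInt8, vec(a))

data = zeros(Complex{Int16}, 10, 1000, 3, 1)
dataBytes = byteview(data)

dataBytes[1] = 0x05   # writing through the view modifies `data` itself
```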

So far, in the problems I mostly work on (trying to get high performance out of Julia in one particular domain), I have developed the hypothesis that the key to performance is being very careful about creating temporary variables. By now I’m a bit obsessive-compulsive about temporary variables of any size. At times my Julia coding starts to feel like a fight to keep the GC away: create temporary variables once and reuse them the whole time (except on the infrequent occasions when their size changes). Here I’m torn between wrapping functions in a let block to make the temporaries static, and passing them in as arguments, which clutters the calling interface.
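The two options can be sketched side by side. The function names and the scaling computation are hypothetical; only the pattern matters: Option A passes the scratch buffer explicitly, Option B captures it in a let block and defines a global function that closes over it.

```julia
# Option A: the caller owns the scratch buffer and passes it in explicitly.
function scaled_total!(buf::Vector{Float64}, x::AbstractVector{<:Real}, a::Real)
    length(buf) == length(x) || resize!(buf, length(x))  # resize only when the input length changes
    buf .= a .* x                                        # in-place broadcast: no new array per call
    return sum(buf)
end

# Option B: hide a persistent buffer in a let block; callers see a clean interface.
# Caveat: the shared buffer makes this version unsafe to call from several threads at once.
let buf = Float64[]
    global scaled_total(x::AbstractVector{<:Real}, a::Real) = scaled_total!(buf, x, a)
end
```

Option B keeps call sites tidy but hides state; Option A makes the reuse explicit and makes it easy to keep one buffer per task.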

Originally, the function calling readbytes! took the Complex{Int16} array (the user-facing interface) and created the Vector{UInt8} internally, which meant creating such a temporary every time the function was called. I have now settled on letting the user do the conversion once, outside, and having my function take the Vector{UInt8} directly.
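One way to keep the typed interface anyway is to offer both methods and let the typed one build the byte view on each call. A sketch; read_block! is a hypothetical name, and treating a short read as an error is an assumption about the file format. The per-call view is a small wrapper around the same memory, so the sample data itself is never copied.

```julia
# Byte-level method: the caller may build the byte view once and reuse it.
function read_block!(io::IO, bytes::AbstractVector{UInt8})
    n = readbytes!(io, bytes, length(bytes))
    n == length(bytes) || throw(EOFError())   # assumed: a partial block is an error
    return bytes
end

# Typed convenience method: wraps `data` in a flat byte view and forwards to the method above.
function read_block!(io::IO, data::Array{Complex{Int16}})
    read_block!(io, reinterpret(UInt8, vec(data)))
    return data
end
```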

I keep wanting to manually allocate a block of memory and then, at various times, instantiate arrays of different types that use that block as their data storage. I’m not sure how one would do it, but I guess something like that must be possible, given that Julia can interface with C function calls.
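A safe way to get close to this in Julia 1.x is to own the storage as a plain Vector{UInt8} and create typed views of it with reinterpret and reshape; the views share memory with the block, so reading into the block updates every view. A sketch with hypothetical small sizes and an IOBuffer in place of a file. For the C-interop case, Base also provides unsafe_wrap(Array, ptr, dims) to wrap a raw pointer as an Array, but then keeping the underlying memory alive is your responsibility.

```julia
# One raw block of bytes, allocated once: room for 2×3 Complex{Int16} values (24 bytes).
block = Vector{UInt8}(undef, 2 * 3 * sizeof(Complex{Int16}))

# Typed views of the same memory; no data is copied.
z = reshape(reinterpret(Complex{Int16}, block), 2, 3)   # 2×3 complex samples
w = reinterpret(UInt32, block)                           # the same 24 bytes as six 32-bit words

# Fill the block from a stream (an IOBuffer stands in for a file here).
src = [Complex{Int16}(i, -i) for i in 1:6]
io = IOBuffer()
write(io, src)
seekstart(io)
nread = readbytes!(io, block)   # reads into the block; every view sees the new data
```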

Tamas_Papp · August 23, 2017, 8:33am

Among other things. No need to form hypotheses, see the performance tips. Preallocating outputs is there, but other things are equally important. Profiling and benchmarking will give you specific information for particular cases.

stevengj · August 23, 2017, 2:55pm

In general, this is a mistake. For operations on a large (length n ≫ 1) array, any O(1) costs (e.g. creating a few small heap-allocated temporaries like reinterpret wrappers) will often be negligible. This is especially true if you are doing O(n) I/O, as with readbytes!.

Of course, ultimately you have to do profiling and benchmarking to be certain of where your performance is going, but as a general rule I wouldn’t worry about small allocations outside of innermost loops.
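To check this instead of guessing, Base’s @allocated reports the bytes allocated while evaluating an expression. A sketch using the array size from the question; byteview is a hypothetical helper. The exact count depends on the Julia version, but creating the view should cost a small, fixed number of bytes regardless of how large data is.

```julia
# Hypothetical helper: flat byte view of `a`, built from small wrapper objects only.
byteview(a::Array) = reinterpret(UInt8, vec(a))

data = zeros(Complex{Int16}, 10, 1000, 3, 1)   # 120_000 bytes of payload

byteview(data)                          # call once so compilation is not measured
view_cost = @allocated byteview(data)   # bytes allocated by creating the view

println("payload: ", sizeof(data), " bytes; view: ", view_cost, " bytes")
```

For finding where time actually goes in a larger program, the Profile standard library (using Profile; @profile f(); Profile.print()) is the usual next step.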

Hoare’s famous quote about premature optimization and “small efficiencies” comes to mind.
