Read binary data of arbitrary dims and type

Hi, I’m new to Julia (julia 1.1.1  1.0.4). I’m trying to read binary data and struggling to understand how dimensions are handled. There are a few other posts about this, but they only address 1-D cases. Typically our data is 2-D or 3-D, and each file holds a single element type. From other answers it almost seems that reading the data as a 1-D vector and then reshaping it is the only way to go. But that seems so… inefficient that it’s more likely I’m just missing something.

Btw, I do not have access to other packages. This is on a stand-alone system. New packages will take months to get approval for.

Here’s stevengj’s nicely concise solution with the addition of type as a parameter (works well):

read_bin(filename, dims, T) = read!(filename, Vector{T}(undef, dims))
data = read_bin("myfile.img", 260 * 251, Float32)
65260-element Array{Float32,1}

I’ve tried passing a list to dims in various ways and changing Vector to Array… but to no avail.
Ideas?

This works:

x = rand(Float32, 10, 10, 10)
write("test.bin", x)
y = Array{Float32}(undef, (10, 10, 10))
open("test.bin") do io
    read!(io, y)
end
y == x

I used to use a simple scheme for this: encode the element type in the filename and parse it back out when loading the file. Then you can get the file size in bytes and divide by the size of that type to get the number of elements.
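A minimal sketch of that scheme, using only Base. It assumes a made-up naming convention where the type name is the last underscore-separated token before the extension (e.g. `scene_Float32.img`). The names `KNOWN_TYPES`, `type_from_filename` and `read_typed` are hypothetical, not from any library:

```julia
# Hypothetical convention: "<name>_<Type>.<ext>", e.g. "scene_Float32.img".
# Only the types listed in this table are accepted (an assumption for the sketch).
const KNOWN_TYPES = Dict("UInt8" => UInt8, "Int16" => Int16, "UInt16" => UInt16,
                         "Int32" => Int32, "Float32" => Float32, "Float64" => Float64)

function type_from_filename(filename)
    stem = first(splitext(basename(filename)))    # "scene_Float32"
    tag = String(last(split(stem, '_')))          # "Float32"
    haskey(KNOWN_TYPES, tag) || error("no known type tag in $filename")
    return KNOWN_TYPES[tag]
end

function read_typed(filename)
    T = type_from_filename(filename)
    nbytes = filesize(filename)
    nbytes % sizeof(T) == 0 || error("file size is not a multiple of sizeof($T)")
    n = div(nbytes, sizeof(T))                    # number of elements in the file
    return read!(filename, Vector{T}(undef, n))   # flat vector; shape is not recovered
end
```

This only recovers the element type and count. The shape still has to come from somewhere else, such as a header file or the caller.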

As far as maintaining shape - well - not sure there. You could make a custom reading function to handle that?

Did you benchmark this? reshape should be very efficient.
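For a rough check without extra packages (only Base is assumed here), something like the following compares the two approaches. The file name and sizes are made up, and the helper names `read_direct` and `read_flat` are hypothetical. Single `@elapsed` timings are noisy and depend on disk caching, so treat the numbers as indicative only:

```julia
dims = (200, 300, 40)                     # made-up size for the test
path = joinpath(mktempdir(), "bench.bin")
write(path, rand(Float32, dims))

# Approach 1: read straight into an N-d array.
read_direct(path, dims, T) = read!(path, Array{T}(undef, dims))

# Approach 2: read a flat vector, then reshape it (reshape does not copy).
read_flat(path, dims, T) = reshape(read!(path, Vector{T}(undef, prod(dims))), dims)

# Call each once first so compilation time is not part of the measurement.
a = read_direct(path, dims, Float32)
b = read_flat(path, dims, Float32)

t_direct = @elapsed read_direct(path, dims, Float32)
t_flat   = @elapsed read_flat(path, dims, Float32)
println("direct: ", t_direct, " s   flat + reshape: ", t_flat, " s")
```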

No benchmark; it's just that, conceptually, reading data in as one shape and then rearranging it sounds inefficient. :)
Most of our images are a few hundred MB to a couple of GB in size. This isn’t for production so speed isn’t crucial, but it’s still a concern.

Note that output from reshape shares data with the input (see ?reshape), so it is rather fast.


To expand upon this, realize that a multidimensional array is stored as a consecutive sequence of numbers (a “1d array”) in memory — there’s no such thing as “multidimensional memory” in standard CPUs. All reshape does is to reinterpret the same data as a different dimensionality. There is no physical rearrangement.
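To make that concrete, here is a small sketch (the values are made up) showing that the reshaped array and the original vector are the same memory, and that Julia lays arrays out column-major, so the first index varies fastest:

```julia
v = collect(Float32, 1:6)      # flat data, as it would come out of a file
A = reshape(v, 2, 3)           # 2x3 array over the same memory, no copy

A[2, 1] = 100                  # write through the reshaped array...
v[2] == 100                    # ...and the original vector sees the change

# Column-major layout: each column is a contiguous run of the flat data.
A[:, 2] == v[3:4]              # second column is elements 3 and 4
vec(A) == v                    # flattening gives back the original order
```

A practical consequence is that the order of `dims` has to match how the file was written. If the result looks scrambled, the dimension order is the first thing to check.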


Sweet! Thank you. I was struggling for an embarrassing amount of time with this.

Since it’s a First Steps post, here’s the whole thing.

function read_dat(filename, dims, T)
    img = Array{T}(undef, dims)   # dims is a tuple such as (251, 260)
    open(filename) do io
        read!(io, img)            # fill img in place from the file
    end
    return img
end

> data = read_dat("myfilename.img", (251, 260), Float32)
251x260 Array{Float32,2}:...

Though I’m not sure how to pass the dims. This doesn’t work:

bands=250; lines=260; samples=440;
fn = "myfile.img";
dtype = Float32;
data = read_dat(fn, (bands,lines,samples), dtype)

Tried a few variations; ([b,l,s]), ((b,s,l))… ???
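A tuple of integers such as `(bands, lines, samples)` is a valid `dims` argument, so one thing worth ruling out (this is a guess, since the error message isn't shown) is a mismatch between the requested shape and the file size. `read!` throws an `EOFError` when the file holds fewer bytes than the array needs. A hypothetical checked variant, here called `read_dat_checked`, might look like this:

```julia
function read_dat_checked(filename, dims, T)
    expected = prod(dims) * sizeof(T)      # bytes needed for the requested shape
    actual = filesize(filename)            # bytes actually in the file
    expected == actual ||
        error("$filename has $actual bytes, but dims $dims of $T need $expected")
    return read!(filename, Array{T}(undef, dims))
end
```

With bands=250, lines=260, samples=440 and Float32, the file would have to be exactly 250 * 260 * 440 * 4 = 114,400,000 bytes for the read to succeed under this check.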


Ohh, ok. That’s good to know, not what I pictured. Thanks.