Efficient way to read large array from binary file in slices?

tlnagy · November 2, 2017, 7:20pm

I’ve written a TIFF parser (OMETIFF.jl, tlnagy/OMETIFF.jl on GitHub: I/O operations for OME-TIFF files in Julia) and there are still some inefficiencies that I would like to fix.

One problem is that I allocate a large array to hold my multidimensional image data, but I also have to allocate a separate array for each slice I read in and then copy the data from it into the large array. I thought this would be easy to fix by just passing a view of the large array to read!, but that doesn’t work:

julia> s = open("julia_memory_blowup.tif")
IOStream(<file julia_memory_blowup.tif>)

julia> a = Array{Float64}(10, 10);

julia> read!(s, view(a, 1, :))
ERROR: MethodError: no method matching read!(::IOStream, ::SubArray{Float64,1,Array{Float64,2},Tuple{Int64,Base.Slice{Base.OneTo{Int64}}},true})
Closest candidates are:
  read!(::IO, ::BitArray) at bitarray.jl:2010
  read!(::AbstractString, ::Any) at io.jl:161
  read!(::IO, ::Array{UInt8,N} where N) at io.jl:387
  ...

Any thoughts on how to avoid allocating for each slice?
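For reference, here is a minimal sketch of the allocate-and-copy pattern described above, in Julia 1.x syntax. The function name read_slices_copy is hypothetical, an IOBuffer stands in for the TIFF file, and each slice is assumed to be one column of raw Float64 values in host byte order.

```julia
# Hypothetical sketch of the allocate-then-copy pattern (Julia 1.x syntax).
# Assumes each slice is one column of raw Float64 values in host byte order.
function read_slices_copy(io::IO, nrows::Integer, ncols::Integer)
    a = Array{Float64}(undef, nrows, ncols)   # the large destination array
    for j in 1:ncols
        tmp = Vector{Float64}(undef, nrows)   # a fresh temporary for every slice...
        read!(io, tmp)
        a[:, j] = tmp                         # ...then a copy into the big array
    end
    return a
end

io = IOBuffer()
write(io, collect(1.0:6.0))   # six Float64 values as raw bytes
seekstart(io)
a = read_slices_copy(io, 3, 2)
```

Hoisting tmp out of the loop would remove the per-slice allocation, but the extra copy would remain; that copy is what reading directly into a view of a would avoid.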

Tamas_Papp · November 2, 2017, 7:37pm

I would just define a method for read! that works with SubArrays, and possibly submit it as a PR. I came across a similar problem with write and bits types, and did exactly that:
https://github.com/JuliaLang/julia/pull/24234
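A minimal sketch of that idea, written as a separate hypothetical function read_into! rather than a new Base.read! method (adding a Base method for Base types from outside Base would be type piracy). It assumes a bits element type and that the stream's byte order matches the host's. The contiguous branch reads straight into the parent array's memory; anything else falls back to one read per element.

```julia
# Hypothetical helper, not part of Base: read raw bits-type values from `io`
# directly into a vector (or vector view) without a temporary buffer.
# Assumes the stream's byte order matches the host's.
function read_into!(io::IO, v::AbstractVector{T}) where {T}
    isbitstype(T) || throw(ArgumentError("element type $T is not a bits type"))
    if v isa StridedVector && stride(v, 1) == 1
        # Contiguous memory (e.g. a column of a Matrix): one bulk read
        # straight into the parent array's memory.
        nbytes = UInt(length(v) * sizeof(T))
        GC.@preserve v unsafe_read(io, convert(Ptr{UInt8}, pointer(v)), nbytes)
    else
        # Non-contiguous (e.g. a row of a column-major Matrix): element by element.
        for i in eachindex(v)
            v[i] = read(io, T)
        end
    end
    return v
end

io = IOBuffer()
write(io, collect(1.0:20.0))   # 20 Float64 values as raw bytes
seekstart(io)
a = zeros(10, 10)
read_into!(io, view(a, :, 3))  # column view: contiguous, bulk read
read_into!(io, view(a, 2, :))  # row view: stride 10, element-wise read
```

The second call overwrites a[2, 3], since row 2 and column 3 share that element.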

ssfrr · November 2, 2017, 8:28pm

You could also use readbytes!, which lets you specify how many bytes you’d like to read. I’m not entirely sure why it is a separate function from read! rather than an optional third argument to read!, but perhaps someone else knows.
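A small sketch of how readbytes! behaves, using an IOBuffer in place of a file: it fills a UInt8 buffer with up to nb bytes and returns the number of bytes actually read, so numeric data has to be reinterpreted afterwards (assuming the data's byte order matches the host's).

```julia
io = IOBuffer()
write(io, [1.0, 2.0, 3.0])          # 24 raw bytes (three Float64 values)
seekstart(io)

buf = Vector{UInt8}(undef, 16)
n = readbytes!(io, buf, 16)          # reads at most 16 bytes, returns the count
vals = reinterpret(Float64, buf)     # view the bytes as Float64 without copying
```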

ssfrr · November 2, 2017, 9:02pm

BTW, after some digging I found here that Jeff has supported merging the two functions in the past, and it looks like @samoconnor once wrote a branch to do so.

tlnagy · November 3, 2017, 6:50pm

julia> readbytes!(s, view(a, 1, :))
ERROR: MethodError: no method matching readbytes!(::IOStream, ::SubArray{Float64,1,Array{Float64,2},Tuple{Int64,Base.Slice{Base.OneTo{Int64}}},true})
Closest candidates are:
  readbytes!(::IOStream, ::Array{UInt8,N} where N) at iostream.jl:278
  readbytes!(::IOStream, ::Array{UInt8,N} where N, ::Any; all) at iostream.jl:278
  readbytes!(::IO, ::AbstractArray{UInt8,N} where N) at io.jl:503
  ...

readbytes! doesn’t work here either, since it only accepts UInt8 arrays. In any case, read! already supports reading multiple bytes at a time. I was hoping for an easier solution, but @Tamas_Papp’s suggestion might be the best way forward. I’m not sure where to start, though.

Tamas_Papp · November 5, 2017, 9:04am

IMO, writing performant code for SubArrays will have to take the indexing into account, e.g. whether the view is contiguous in memory.
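To make the indexing point concrete: Julia arrays are column-major, so a column view occupies one contiguous block of memory and can be filled by a single bulk read, while a row view (like view(a, 1, :) in the original question) is strided and needs element-wise reads or a gather step. strides reports the memory step of each view:

```julia
a = zeros(Float64, 10, 10)   # column-major storage
col = view(a, :, 1)          # 10 adjacent elements
row = view(a, 1, :)          # elements 10 apart in memory
strides(col)                 # (1,)
strides(row)                 # (10,)
```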

y4lu · February 16, 2018, 7:10am

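How would the mutating functions compare against the non-mutating a[1,:] = read(s, Float64, 10);
or the possibly faster column-wise a[:,1] = read(s, Float64, 10); ? (Note that read(s, 10) on its own reads 10 bytes, not 10 Float64 values.)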
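In Julia 1.x the same comparison can be sketched as follows. read(s, T, n) is no longer available there, so the allocating form becomes read! into a fresh Vector. The in-place read! into a view assumes Julia 1.4 or later (see the end of this thread), and reading column-wise keeps each destination contiguous.

```julia
# Requires Julia ≥ 1.4 for read! into a view.
io = IOBuffer()
write(io, collect(1.0:20.0))                      # 20 Float64 values as raw bytes
seekstart(io)

a = zeros(10, 2)
a[:, 1] = read!(io, Vector{Float64}(undef, 10))  # allocates a temporary, then copies
read!(io, view(a, :, 2))                          # fills column 2 in place
```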

henry2004y · July 26, 2019, 9:06pm

When reading binary files with Julia 1.1, I still find that array slices do not work with read!. For example,

w  = Array{Float32,2}(undef,n1,nw)
read!(fileID, w[:,iw])

does not fill w with the values read. As a workaround, I have to allocate another intermediate array, read into it, and then copy the values into w.

Tamas_Papp · July 27, 2019, 5:50am

You may want to try something like

read!(fileID, @view w[:,iw])

since w[:,iw] makes a copy: your version reads into that copy, which is then no longer accessible.
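A self-contained way to see the difference, using an IOBuffer instead of a file and small illustrative values for n1 and nw. Because read! on a view needs Julia 1.4 or later (on 1.1 it still raises a MethodError, as noted later in the thread), this sketch assumes a recent Julia.

```julia
# Assumes Julia ≥ 1.4 (read! into a view); n1 and nw are illustrative sizes.
n1, nw = 4, 3
io = IOBuffer()
write(io, Float32[1, 2, 3, 4, 5, 6, 7, 8])   # two columns' worth of raw Float32 data
seekstart(io)

w = zeros(Float32, n1, nw)
read!(io, w[:, 1])         # w[:, 1] is a copy: values 1 to 4 land in it and are lost
read!(io, @view(w[:, 2]))  # writes values 5 to 8 directly into column 2 of w
```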

See the "Arrays" chapter of the Julia manual.

mgkuhn · July 31, 2019, 12:49pm

read!() still lacks a method for subarrays in Julia 1.1: https://github.com/JuliaLang/julia/issues/32524

tlnagy · August 11, 2019, 5:41pm

Thanks for creating that issue, @mgkuhn. I can’t preallocate a single temporary array because the TIFF format does not guarantee the same memory layout for each slice [1]. What I really need is a way to read into a SubArray whose shape I can change for each slice.

[1] This is my reading of the spec. I doubt there are many TIFFs out there that mix striped and non-striped images, but… you never know.

henry2004y · February 4, 2020, 9:30pm

Will this be supported in the upcoming v1.4? I’m kind of confused by the discussion threads linked above.

tlnagy · February 4, 2020, 9:43pm

Yup, looks like it: the commit widening the signature of read! is in 1.4.0-RC1.
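With that change, the original use case can be sketched roughly as below: a preallocated 3-D stack is filled plane by plane through views, with no per-plane temporary. read_planes! is a hypothetical name, and the sketch assumes every plane is stored as raw Float64 values in column-major order and host byte order, which a real TIFF reader would have to check (the striped and non-striped layouts mentioned above are not handled).

```julia
# Hypothetical sketch; requires Julia ≥ 1.4 for read! into a view.
# Assumes each plane is stored as raw Float64 values, column-major, host byte order.
function read_planes!(io::IO, stack::Array{Float64,3})
    for k in axes(stack, 3)
        read!(io, view(stack, :, :, k))   # each plane is a contiguous block of `stack`
    end
    return stack
end

io = IOBuffer()
write(io, collect(1.0:24.0))   # 4 planes of 2×3 values
seekstart(io)
stack = read_planes!(io, Array{Float64}(undef, 2, 3, 4))
```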
