Efficient way to read large array from binary file in slices?

I’ve written a TIFF parser (tlnagy/OMETIFF.jl: I/O operations for OME-TIFF files in Julia) and there are still some inefficiencies that I would like to fix.

One problem is that I allocate a large array to hold my multidimensional image data, but I also have to allocate a temporary array for each slice I read and then copy its contents into the large array. I thought it would be easy to fix by just passing a view of the larger array to read!, but that doesn’t work:

julia> s = open("julia_memory_blowup.tif")
IOStream(<file julia_memory_blowup.tif>)

julia> a = Array{Float64}(10, 10);

julia> read!(s, view(a, 1, :))
ERROR: MethodError: no method matching read!(::IOStream, ::SubArray{Float64,1,Array{Float64,2},Tuple{Int64,Base.Slice{Base.OneTo{Int64}}},true})
Closest candidates are:
  read!(::IO, ::BitArray) at bitarray.jl:2010
  read!(::AbstractString, ::Any) at io.jl:161
  read!(::IO, ::Array{UInt8,N} where N) at io.jl:387
  ...

Any thoughts on how to avoid allocating for each slice?

I would just define a method for read! which works with subarrays. Possibly submit it as a PR. I came across a similar problem with write & bits types, and did that:
https://github.com/JuliaLang/julia/pull/24234
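
A minimal sketch of what such a method could look like, assuming Julia 1.x and a bits element type. To avoid extending Base from user code, it is written as a standalone helper; the name read_into! is hypothetical and appears in neither Base nor the linked PR. It reads one element at a time, so it works for any view, at the cost of one read call per element:

```julia
# Hypothetical helper (not part of Base): fill any AbstractArray, including a
# SubArray, by reading one element at a time from the stream.
function read_into!(io::IO, dest::AbstractArray{T}) where {T}
    isbitstype(T) || throw(ArgumentError("element type $T is not a bits type"))
    for i in eachindex(dest)
        dest[i] = read(io, T)   # read(io, T) decodes a single value of type T
    end
    return dest
end

# Demo on an in-memory stream instead of a real TIFF file.
io = IOBuffer()
write(io, Float64[1.0, 2.0, 3.0])
seekstart(io)

a = zeros(Float64, 3, 3)
read_into!(io, view(a, 1, :))   # fills the first row of `a` in place
```

For contiguous views a single bulk read would likely be faster; this version trades speed for generality.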

You could also use readbytes!, which allows you to specify how many bytes you’d like to read. I’m not entirely sure why this is a separate function from read!, rather than just letting read! have an optional 3rd argument, but perhaps someone else knows.
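
For illustration, here is roughly how the byte-count argument of readbytes! can be used, assuming Julia 1.x. The destination has to be a byte array, so the sketch reads into a reusable UInt8 buffer and then reinterprets the bytes; an IOBuffer stands in for the real file:

```julia
# In-memory stand-in for a binary file holding three Float64 values.
io = IOBuffer()
write(io, Float64[1.0, 2.0, 3.0])
seekstart(io)

buf = Vector{UInt8}(undef, 2 * sizeof(Float64))  # reusable byte buffer
nread = readbytes!(io, buf, length(buf))         # returns the number of bytes read
vals = reinterpret(Float64, buf)                 # view the bytes as Float64, no copy
```

Note that this still needs the buffer to be a plain byte array, so by itself it does not read directly into a slice of a Float64 array.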

Btw, after doing some digging I found that Jeff has in the past supported merging the two functions, and it also looks like @samoconnor once wrote a branch to do so.

julia> readbytes!(s, view(a, 1, :))
ERROR: MethodError: no method matching readbytes!(::IOStream, ::SubArray{Float64,1,Array{Float64,2},Tuple{Int64,Base.Slice{Base.OneTo{Int64}}},true})
Closest candidates are:
  readbytes!(::IOStream, ::Array{UInt8,N} where N) at iostream.jl:278
  readbytes!(::IOStream, ::Array{UInt8,N} where N, ::Any; all) at iostream.jl:278
  readbytes!(::IO, ::AbstractArray{UInt8,N} where N) at io.jl:503
  ...

readbytes! similarly doesn’t work on this problem. read! already supports reading multiple bytes at a time. I was hoping there was an easier solution, but @Tamas_Papp’s suggestion might be the best way forward. Not sure where to start though.

IMO writing performant code for SubArray will have to take indexing into account.

How would the mutating functions compare against the non-mutating a[1,:] = read(s, 10);
or the possibly faster columnwise a[:,1] = read(s, 10);?
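
A rough sketch of a middle ground, assuming Julia 1.x and that every slice has the same size: allocate one scratch vector, read! into it (reading into a plain Array works on older versions too), and copy it into the target column in place. Two things to keep in mind: in Julia 1.x, read(s, 10) returns 10 raw bytes rather than 10 Float64 values, and the non-mutating form allocates a new vector on every call. The names scratch and the IOBuffer stand-in are only for illustration:

```julia
# In-memory stand-in for a binary file holding six Float64 values.
io = IOBuffer()
write(io, Float64.(1:6))
seekstart(io)

a = zeros(Float64, 3, 2)
scratch = Vector{Float64}(undef, 3)   # allocated once, reused for every slice

for j in axes(a, 2)
    read!(io, scratch)    # fill the scratch buffer from the stream
    a[:, j] .= scratch    # in-place copy into column j (contiguous in memory)
end
```

Writing columns rather than rows follows Julia's column-major layout, so each copy touches one contiguous block of memory.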

When reading binary files with Julia 1.1, I still found that read! does not work with array slices. For example,

w  = Array{Float32,2}(undef,n1,nw)
read!(fileID, w[:,iw])

does not store the values in w. As a workaround, I need to allocate an intermediate array, read into that, and then copy the values over.

You may want to try something like

read!(fileID, @view w[:,iw])

as your version just makes a copy, reads into that, and then the copy is no longer accessible.

See https://docs.julialang.org/en/v1/base/arrays/#Views-(SubArrays-and-other-view-types)-1
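
The copy-versus-view difference can be seen without any file I/O. A small sketch, assuming Julia 1.x, using fill! in place of read! so that it does not depend on which read! methods exist:

```julia
w = zeros(Float32, 3, 2)

fill!(w[:, 1], 1)                 # w[:, 1] is a fresh copy; the write is lost
copy_untouched = all(iszero, w)   # true: w itself was never modified

fill!(@view(w[:, 2]), 1)          # the view aliases w; the write lands in column 2
```

Any mutating function behaves the same way: it only changes w if it is handed w itself or a view of it.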

read!() still lacks a method for subarrays in Julia 1.1: https://github.com/JuliaLang/julia/issues/32524

Thanks for creating that issue @mgkuhn. I can’t preallocate a single temporary array because the TIFF file type does not guarantee the same layout in memory for each slice[^1]. I would really need a solution to read into a subarray where I could modify the shape of the array for each slice.

[^1]: This is my reading of the spec. I doubt there are many TIFFs out there that would mix striped and non-striped images, but…you never know.
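
One possible shape for this, sketched under the assumption that read! accepts views (so Julia 1.4 or later) and that each slice occupies a contiguous run of elements: keep one flat destination buffer, read each slice into a contiguous window of it, and reshape that window per slice. The function name read_slices! and the shapes list are hypothetical, and real TIFF strips or tiles would need their own offset logic:

```julia
# Hypothetical helper: read consecutive slices of differing shapes into
# windows of one flat buffer, returning a reshaped (non-copying) view per slice.
function read_slices!(io::IO, flat::Vector{T}, shapes) where {T}
    offset = 0
    slices = []
    for dims in shapes
        n = prod(dims)
        chunk = view(flat, offset+1:offset+n)   # contiguous window into `flat`
        read!(io, chunk)                        # assumes read! supports views
        push!(slices, reshape(chunk, dims))     # give the window this slice's shape
        offset += n
    end
    return slices
end

# Demo with an in-memory stream holding ten UInt16 values.
io = IOBuffer()
write(io, UInt16.(1:10))
seekstart(io)

shapes = [(2, 3), (4, 1)]                       # per-slice dimensions (made up)
flat = Vector{UInt16}(undef, sum(prod, shapes))
slices = read_slices!(io, flat, shapes)
```

The returned slices share memory with flat, so no per-slice temporary is allocated for the pixel data.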

Will this be supported in the upcoming v1.4? I’m kind of confused by the discussion threads linked above.

Yup, looks like it: the commit widening the signature for read! is in 1.4.0-RC1.
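
Assuming Julia 1.4 or later, the call from the original question should then work directly. A short sketch with an IOBuffer standing in for the file:

```julia
# In-memory stand-in for a binary file holding six Float64 values.
io = IOBuffer()
write(io, Float64.(1:6))
seekstart(io)

a = zeros(Float64, 3, 3)
read!(io, view(a, :, 1))   # column view: the next three values go into column 1
read!(io, view(a, 1, :))   # row view: the next three values go into row 1
```

The second call overwrites a[1, 1], since the row and the column overlap there.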
