Efficient way to read large array from binary file in slices?

question
performance

#1

I’ve written a TIFF parser (https://github.com/tlnagy/OMETIFF.jl) and there are still some inefficiencies that I would like to fix.

One problem is that I allocate a large array to hold my multidimensional image data, but then I also have to allocate a temporary array for each slice I read in and copy its data into the large array. I thought it would be easy to fix by just passing a view of the larger array to read!, but that doesn’t work:

julia> s = open("julia_memory_blowup.tif")
IOStream(<file julia_memory_blowup.tif>)

julia> a = Array{Float64}(10, 10);

julia> read!(s, view(a, 1, :))
ERROR: MethodError: no method matching read!(::IOStream, ::SubArray{Float64,1,Array{Float64,2},Tuple{Int64,Base.Slice{Base.OneTo{Int64}}},true})
Closest candidates are:
  read!(::IO, ::BitArray) at bitarray.jl:2010
  read!(::AbstractString, ::Any) at io.jl:161
  read!(::IO, ::Array{UInt8,N} where N) at io.jl:387
  ...

Any thoughts on how to avoid allocating for each slice?


#2

I would just define a method for read! that works with subarrays, and possibly submit it as a PR. I came across a similar problem with write and bits types, and did that:
https://github.com/JuliaLang/julia/pull/24234
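
A minimal sketch of what such a method could look like, assuming an element-by-element fallback is acceptable. The name read_into! is hypothetical (it is not part of Base, and it avoids adding methods to Base's read! from user code), and the IOBuffer merely stands in for the open TIFF stream. Newer Julia releases may already ship a generic read! method for AbstractArray, so it is worth checking methods(read!) first.

```julia
# Hypothetical helper, not part of Base: fill any array-like destination
# (including a SubArray) by reading one element at a time from the stream.
function read_into!(io::IO, dest::AbstractArray{T}) where {T}
    for i in eachindex(dest)
        dest[i] = read(io, T)   # reads sizeof(T) bytes and returns a T
    end
    return dest
end

# Stand-in for the file: an in-memory stream holding ten Float64 values.
io = IOBuffer()
write(io, collect(1.0:10.0))
seekstart(io)

a = zeros(10, 10)
read_into!(io, view(a, 1, :))   # fills the first row without a temporary array
```

This avoids the per-slice temporary, but it issues one small read per element, so it is unlikely to match a bulk read into contiguous memory.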


#3

You could also use readbytes!, which allows you to specify how many bytes you’d like to read. I’m not entirely sure why this is a separate function from read!, rather than just letting read! take an optional third argument, but perhaps someone else knows.


#4

Btw, after doing some digging I found that Jeff has in the past supported merging the two functions, and it also looks like @samoconnor once wrote a branch to do so.


#5

julia> readbytes!(s, view(a, 1, :))
ERROR: MethodError: no method matching readbytes!(::IOStream, ::SubArray{Float64,1,Array{Float64,2},Tuple{Int64,Base.Slice{Base.OneTo{Int64}}},true})
Closest candidates are:
  readbytes!(::IOStream, ::Array{UInt8,N} where N) at iostream.jl:278
  readbytes!(::IOStream, ::Array{UInt8,N} where N, ::Any; all) at iostream.jl:278
  readbytes!(::IO, ::AbstractArray{UInt8,N} where N) at io.jl:503
  ...

readbytes! similarly doesn’t work for this problem, and read! already supports reading multiple bytes at a time. I was hoping there was an easier solution, but @Tamas_Papp’s suggestion might be the best way forward. Not sure where to start though.


#6

IMO writing performant code for SubArray will have to take indexing into account.
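
To illustrate the indexing point: Julia arrays are column-major, so a column view is contiguous in memory while a row view is strided. A small sketch, where is_unit_stride is a hypothetical helper written only for this example:

```julia
a = zeros(10, 10)

row = view(a, 1, :)   # elements are 10 apart in memory
col = view(a, :, 1)   # elements are adjacent in memory

strides(row)   # (10,)
strides(col)   # (1,)

# Hypothetical helper: true when a 1-d view walks memory one element at a time.
is_unit_stride(v::AbstractVector) = first(strides(v)) == 1

is_unit_stride(row)   # false
is_unit_stride(col)   # true
```

Only the unit-stride case can be filled by one bulk read straight from the file; a strided slice has to be written element by element or go through a temporary buffer.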


#7

How would the mutating functions compare against a non-mutating assignment such as a[1,:] = read(s, 10);
or the possibly faster column-wise a[:,1] = read(s, 10);?
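
One thing to watch: as far as I know, read(s, 10) returns 10 bytes (a Vector{UInt8}), not 10 Float64 values, and it allocates a new array on every call. A middle ground, sketched here under the assumption that each slice is 10 Float64 values and fits one column, is to allocate a single scratch buffer once and reuse it with read!. The IOBuffer again stands in for the file, and the names buf and io are just for illustration.

```julia
# Stand-in for the file: twenty Float64 values, i.e. two slices of ten.
io = IOBuffer()
write(io, collect(1.0:20.0))
seekstart(io)

a = zeros(10, 2)
buf = zeros(Float64, 10)     # scratch buffer, allocated once and reused

for j in 1:size(a, 2)
    read!(io, buf)           # bulk read of the next 10 Float64 values
    a[:, j] = buf            # copy into the contiguous column j
end
```

This still copies every slice once, but it removes the per-slice allocation, and writing into columns rather than rows keeps the copy contiguous.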