Efficient way to read large array from binary file in slices?



I’ve written a TIFF parser (https://github.com/tlnagy/OMETIFF.jl) and there are still some inefficiencies that I would like to fix.

One problem is that I allocate a large array to hold my multidimensional image data, but then I also have to allocate a temporary array each time I read in a slice, and copy the data from the latter into the former. I thought it would be easy to fix by just passing a view of the larger array to read!, but that doesn’t work:

julia> s = open("julia_memory_blowup.tif")
IOStream(<file julia_memory_blowup.tif>)

julia> a = Array{Float64}(10, 10);

julia> read!(s, view(a, 1, :))
ERROR: MethodError: no method matching read!(::IOStream, ::SubArray{Float64,1,Array{Float64,2},Tuple{Int64,Base.Slice{Base.OneTo{Int64}}},true})
Closest candidates are:
  read!(::IO, ::BitArray) at bitarray.jl:2010
  read!(::AbstractString, ::Any) at io.jl:161
  read!(::IO, ::Array{UInt8,N} where N) at io.jl:387

Any thoughts of how to avoid allocating for each slice?


I would just define a method for read! which works with subarrays. Possibly submit it as a PR. I came across a similar problem with write & bits types, and did that:
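To make the suggestion concrete, here is a minimal sketch of the idea (not the code referred to above). It assumes a recent Julia 1.x and primitive element types such as Float64. `read_into!` is a hypothetical helper name rather than anything in Base, chosen so the sketch does not add methods to Base's `read!`, and an IOBuffer stands in for the file.

```julia
# Hypothetical helper (not part of Base): fill any AbstractArray, including a
# SubArray, element by element from a stream. Assumes T is a primitive type
# such as Float64 for which `read(io, T)` is defined.
function read_into!(io::IO, dest::AbstractArray{T}) where {T}
    for i in eachindex(dest)
        dest[i] = read(io, T)   # reads sizeof(T) bytes in native byte order
    end
    return dest
end

# An in-memory stream standing in for the opened file.
io = IOBuffer()
write(io, collect(1.0:10.0))
seekstart(io)

a = zeros(Float64, 10, 10)
read_into!(io, view(a, 1, :))   # fills the first row without a temporary array
```

This avoids the per-slice temporary, but it reads one element at a time, so it is probably not the fastest option for large slices.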


You could also use readbytes!, which allows you to specify how many bytes you’d like to read. I’m not entirely sure why this is a separate function from read!, rather than just letting read! take an optional third argument, but perhaps someone else knows.


Btw, after doing some digging I found here that Jeff has in the past supported merging the two functions, and it also looks like @samoconnor once wrote a branch to do so.

julia> readbytes!(s, view(a, 1, :))
ERROR: MethodError: no method matching readbytes!(::IOStream, ::SubArray{Float64,1,Array{Float64,2},Tuple{Int64,Base.Slice{Base.OneTo{Int64}}},true})
Closest candidates are:
  readbytes!(::IOStream, ::Array{UInt8,N} where N) at iostream.jl:278
  readbytes!(::IOStream, ::Array{UInt8,N} where N, ::Any; all) at iostream.jl:278
  readbytes!(::IO, ::AbstractArray{UInt8,N} where N) at io.jl:503

readbytes! similarly doesn’t work for this problem, and read! already supports reading multiple bytes at a time. I was hoping there was an easier solution, but @Tamas_Papp’s might be the best way forward. Not sure where to start though.


IMO writing performant code for SubArray will have to take indexing into account.
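As a rough illustration of what taking indexing into account could look like: a column view of a column-major array is contiguous in memory, so it can be filled with one bulk read, while a row view is strided and needs an element-wise loop. The sketch below assumes Julia 1.x and primitive element types. `read_slice!` is a hypothetical name, not a Base function.

```julia
# Hypothetical helper (not part of Base). Dispatches on strided vectors only,
# so `stride` and `pointer` are defined for `dest`. Assumes T is a primitive
# type such as Float64.
function read_slice!(io::IO, dest::StridedVector{T}) where {T}
    if stride(dest, 1) == 1
        # Contiguous memory (e.g. a column of a Matrix): one bulk read.
        GC.@preserve dest unsafe_read(io, pointer(dest), length(dest) * sizeof(T))
    else
        # Strided memory (e.g. a row of a Matrix): read element by element.
        for i in eachindex(dest)
            dest[i] = read(io, T)
        end
    end
    return dest
end

# An in-memory stream standing in for the opened file.
io = IOBuffer()
write(io, collect(1.0:7.0))
seekstart(io)

b = zeros(Float64, 4, 3)
read_slice!(io, view(b, :, 2))   # contiguous column: takes the bulk path
read_slice!(io, view(b, 1, :))   # strided row: takes the element-wise path
```

If the file layout allows it, arranging the big array so that each slice is a column (or another leading-dimension slice) keeps every read on the bulk path.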


How would the mutating functions compare against the non-mutating a[1,:] = read(s, 10),
or the possibly faster column-wise a[:,1] = read(s, 10)?
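One way to compare the two is to write both variants and time them on real data. The sketch below assumes Julia 1.x, where as far as I know read(s, 10) returns 10 raw bytes rather than 10 Float64 values, so it reads typed data with read! into a Vector{Float64} instead. The function names are made up for illustration, and an IOBuffer stands in for the file.

```julia
# Non-mutating style: a fresh temporary vector is allocated for every slice.
function fill_allocating!(a::Matrix{Float64}, io::IO)
    for j in axes(a, 2)
        a[:, j] = read!(io, Vector{Float64}(undef, size(a, 1)))
    end
    return a
end

# Mutating style: one buffer is allocated up front and reused for every slice.
function fill_reusing!(a::Matrix{Float64}, io::IO)
    buf = Vector{Float64}(undef, size(a, 1))
    for j in axes(a, 2)
        read!(io, buf)     # overwrite the buffer in place
        a[:, j] .= buf     # copy into column j
    end
    return a
end

# Build an in-memory stream holding 100 Float64 values (stand-in for a file).
function demo_stream()
    io = IOBuffer()
    write(io, collect(1.0:100.0))
    seekstart(io)
    return io
end

a1 = fill_allocating!(zeros(10, 10), demo_stream())
a2 = fill_reusing!(zeros(10, 10), demo_stream())
```

Both fill the array column by column and give the same result. The second still copies each slice once, but it avoids the per-slice allocation, which is the cost the original question was about.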