Access array elements in 'strided groups'

Sorry for the confusing terminology; an MWE probably makes it clearer:

function viewrows(data::Vector, start, stop, ncol)
    datamat = reshape(data, :, ncol)
    dataview = @view datamat[start:stop, :]
    return reshape(dataview, :)
end

julia> viewrows(collect(1:20), 2,5, 2)
8-element reshape(view(::Matrix{Int64}, 2:5, :), 8) with eltype Int64:
  2
  3
  4
  5
 12
 13
 14
 15

The problem is that the reshape prevents the array from being appended to (which of course makes sense, since it is not clear what should happen to datamat if someone holds on to it). I know this is a well-known issue; I just can't be bothered to dig up the links.

Copying the data is of course an option, but I don't want to do it because 1) the data is typically large and 2) viewrows is typically only called at a point when no more data will be appended. Calls to viewrows happen 'deep inside', to the point where it would be inconvenient for users to copy the data themselves before doing the work that eventually calls viewrows.

In case it matters, data is typically built by adding one "row" at a time, but I could not find a way to append rows or columns to a matrix, which is why I store it in "flat" format. If there is such a way, I'll happily do that instead.

I experimented with creating a GroupedRange, and it worked without too much hassle when GroupedRange <: AbstractVector{Int}, but it was about 6 times slower than the above viewrows. I couldn't figure out how to make it work with GroupedRange <: OrdinalRange. I'm not sure GroupedRange would even be compliant with OrdinalRange, or whether it can be expected to be faster than the AbstractArray version.
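For reference, an index type along the lines of the GroupedRange described above might look roughly like this. It is a hypothetical reconstruction, not the code that was benchmarked: the field names and the constructor argument order are assumptions. The `divrem` on every element access is one plausible reason such a type would be slower than the reshaped view.

```julia
# Hypothetical sketch: `ngroups` runs of `len` consecutive indices,
# where consecutive runs start `stride` elements apart.
struct GroupedRange <: AbstractVector{Int}
    start::Int    # first index of the first run
    len::Int      # number of indices in each run
    stride::Int   # distance between the starts of two runs
    ngroups::Int  # number of runs
end

Base.size(r::GroupedRange) = (r.len * r.ngroups,)
Base.IndexStyle(::Type{GroupedRange}) = IndexLinear()

function Base.getindex(r::GroupedRange, i::Int)
    @boundscheck checkbounds(r, i)
    # Split the linear position into (which run, offset inside the run).
    group, offset = divrem(i - 1, r.len)
    return r.start + group * r.stride + offset
end

data = collect(1:20)
rows = GroupedRange(2, 4, 10, 2)  # same selection as viewrows(data, 2, 5, 2)
v = view(data, rows)              # a plain view, no reshape involved
result = collect(v)

# No reshape has been taken of `data`, so appending is still possible.
push!(data, 21)
```

Because the parent vector is never reshaped, it stays appendable, at the cost of the index arithmetic in `getindex`.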

I also looked at Base.Cartesian.@nrefs, but it won't be of any help here since the best I can do is still to materialize an array with the 'strided groups', right?

Unfortunately I still don't understand what you want to do here.

From your post it seems you want to append to this data, but it is not clear how. Generally speaking, you can't append to views.

Maybe you can reformulate the problem so that you can use the excellent ElasticArrays.jl

Unfortunately I still don't understand what you want to do here.

Thanks for pointing this out. I constantly struggle with either writing a whole novel which seems to lead people to not want to read it or failing to produce enough context for the problem to be understandable.

What I am doing is decoding/parsing unstructured binary files with an ad-hoc protocol, trying to make things very fast (the goal is to take < 1 second to decode a 1 GB file) while allowing users to hook into the machinery to, e.g., deal with data too large to fit in memory.

For the latter, I allow users to inspect the current decoded state, which is just a Vector{UInt8} with some surrounding metadata (types, basically) on how to interpret it. A chunk of data has a header which tells the size of the next chunk and points to a line in a text spec saying what types of data are in the chunk (e.g. the following 22 bytes are a UInt16 followed by a Float64 followed by an Int32, etc.). The first step of decoding is basically appending the chunks for each unique line in the spec into a single Vector{UInt8} for that line.

viewrows is basically used in conjunction with the type metadata to select a slice of the raw data blob and reinterpret it as the right type, so that the strided groups of UInt8 get reinterpreted into e.g. a vector of the Nth type in that chunk type. And no, I did not invent this protocol and I don't have the means to change it.
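A minimal sketch of that select-and-reinterpret step, assuming a made-up chunk layout of one UInt16 followed by one Float64 (10 bytes per chunk). The layout and all variable names here are illustrative and not taken from the actual protocol:

```julia
# Assumed layout for illustration only: UInt16 (2 bytes) + Float64 (8 bytes).
chunksize = sizeof(UInt16) + sizeof(Float64)

# Build the flat byte blob by appending one chunk at a time.
data = UInt8[]
for i in 1:3
    append!(data, reinterpret(UInt8, [UInt16(i)]))
    append!(data, reinterpret(UInt8, [i / 2]))
end

# One chunk per column, as in viewrows.
chunks = reshape(data, chunksize, :)

# Select the byte rows belonging to each field and reinterpret them lazily.
# Each result is a 1 x nchunks array, flattened here with vec.
ids    = vec(reinterpret(UInt16,  @view chunks[1:2, :]))
floats = vec(reinterpret(Float64, @view chunks[3:10, :]))
```

Nothing is copied here: `ids` and `floats` are lazy wrappers around the same bytes as `data`, which is also why the reshape in the middle matters for later appends.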

In the current state, it is a bit of a thorn in the side that, should a user make use of the actual values (and not the metadata) to determine whether they want a piece of data, the program will fail: after you have viewed the values, it's no longer possible to append more chunks due to the reshape issue.

Maybe you can reformulate the problem so that you can use the excellent ElasticArrays.jl

This looks like it could be very useful as I could then store the data in matrix shape and append to it. Then there would be no need to reshape the data and therefore the seal is broken. Thank you very much! I will mark your post as the solution once I have verified that it works.
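A rough sketch of how that could look, based on my reading of the ElasticArrays.jl README (an `ElasticArray` can grow along its last dimension via `append!`). The 10-byte chunk size and the byte values are made up for illustration, and the package must be installed separately:

```julia
using ElasticArrays  # third-party package, assumed to be installed

chunksize = 10  # assumed number of bytes per chunk, for illustration

# Matrix with one chunk per column; the last dimension starts empty and can grow.
A = ElasticArray{UInt8}(undef, chunksize, 0)

append!(A, UInt8.(1:10))   # first chunk
append!(A, UInt8.(11:20))  # second chunk

# Same selection as viewrows(data, 2, 5, 2), but without reshaping a flat vector.
v = @view A[2:5, :]
before = collect(vec(v))

# Appending still works after the view has been taken.
append!(A, UInt8.(21:30))
after = collect(vec(@view A[2:5, :]))
```

Since the data is stored in matrix shape from the start, viewrows would no longer need the initial reshape of a flat Vector.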