Is it possible to reinterpret and reshape without allocating?

So I have data I’m getting over HTTP:

r = rand(UInt8, 847296)

It is arranged in 8-byte chunks grouped in threes, where the first chunk of each group is unwanted metadata, so every (3n-2)th chunk is unwanted.
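For concreteness, here is a toy version of that layout (the values and variable names are invented for illustration): each group of three 8-byte chunks starts with a metadata chunk to be dropped.

```julia
# Toy version of the layout: each group of three Float64s (3 × 8 bytes)
# starts with a metadata value we want to discard.
floats = [99.0, 1.0, 2.0,   # group 1: metadata, data, data
          98.0, 3.0, 4.0]   # group 2: metadata, data, data

# The raw form as it would arrive over HTTP: a flat byte vector.
bytes = collect(reinterpret(UInt8, floats))

length(bytes)  # 48 bytes = 6 chunks × 8 bytes each
```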
I have written a function that does this:

function parseRawData(data)
    data = @view reinterpret(Float64, data)[(3:length(data)÷8+2) .% 3 .!= 0]
    return reshape(data, 2, length(data)÷2)
end

julia> @time parseRawData(r)
  0.000224 seconds (8 allocations: 564.811 KiB)

But this allocates 0.55MB. Is it possible to do this while reusing all the memory of the original vector with minimal allocations?

I’ve heard that it’s bad to try to access non-contiguous memory like this, so is that why it can’t be done much better?

This allocates an array.

Yes, but only a small one compared to the 0.5MB one

julia> @time (3:length(r)÷8+2) .% 3 .!= 0
  0.000204 seconds (10 allocations: 13.194 KiB)

Is it possible to do this without allocating?

This code returns the same result but does no allocations:

function parseRawData2(data)
    d = reshape(reinterpret(Float64, data), 3, :)
    return @view d[2:3,:]
end
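As a quick sanity check that the range-slice view really selects the same elements, here is a sketch on a small synthetic buffer (the names and values are invented, not from the thread):

```julia
# Six Float64s packed as 48 UInt8 bytes: two groups of three 8-byte chunks,
# where the first chunk of each group (1.0 and 4.0 here) is metadata.
raw = collect(reinterpret(UInt8, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))

d = reshape(reinterpret(Float64, raw), 3, :)  # 3×2 matrix, no copy
kept = @view d[2:3, :]                        # drop row 1 (the metadata chunks)

kept == [2.0 5.0; 3.0 6.0]  # true: the metadata values 1.0 and 4.0 are gone
```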

Views involving logical indexing (arrays of booleans) are much less efficient to work with than range slices like 2:3. (I also think parseRawData2 is clearer.)


Wow! That’s surprising. Very cool.
So I should always try to avoid logical indexing in favor of finding a pure @view slice that does the same thing?
What about stacking views if your criteria get more complex?

Is there a point where this stops being worth it? (other than the code becoming unreadable)

Basically, logical indexing on the view is better than a view that’s created with logical indexing. The hard part is that a view has to behave as if it were contiguous, so if it’s constructed over disjoint indices it has to do extra work.
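A small sketch of that distinction (variable names are illustrative): a view built with a boolean mask must materialize an integer index vector at construction time, whereas a range view only stores the range itself.

```julia
x = rand(3, 1000)
mask = [false, true, true]      # keep chunks 2 and 3 of each group

v_logical = @view x[mask, :]    # constructing this collects the mask into a
                                # Vector of integer indices (an allocation)
v_range   = @view x[2:3, :]     # this only stores the UnitRange 2:3

v_logical == v_range            # true: identical contents either way
```

Running `@time` on each construction should show the allocation difference between the two forms.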

Can you recommend something to read on the subject? Seems really interesting.

You can look at the subarray.jl file which is the source for how views are implemented.

If you are not working with live data, then I think you may get better performance if you download the data first and then analyze it offline.

What do you mean by download? Isn’t that what I’m doing when I do an HTTP.get(url)?

Yes, you download using HTTP, but you connect and disconnect from the URL many times, and this may increase the time. Simply try my suggestion; it may increase speed. Earlier I was also working on online data, and in my case I saw a big speed :high_speed_train: improvement.

So you mean like opening a web-socket?