ReinterpretArray Performance (even worse on 1.8)

Basically, Julia “knows” that two different arrays contain unrelated content, so it may optimize by removing any code that doesn’t meet that expectation. It will only do so sometimes, in rare cases: usually just when you have started to rely on it in a larger application, or when you someday try to run the same code on a new computer. The penalty you see with ReinterpretArray (which is the same thing, except that it disables this optimization assumption) is exactly why we don’t make that the default.
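
As a rough illustration (none of this is code from the thread; the names are made up), here is the kind of aliasing the optimizer is allowed to assume never happens. The two arrays below share memory but have different element types, so the compiler may keep or reorder loads across the store:

bytes  = zeros(UInt8, 8)
floats = unsafe_wrap(Array, Ptr{Float64}(pointer(bytes)), 1)  # unsafely aliases `bytes`; real code would also need GC.@preserve

function read_store_read(bytes, floats)
    b = bytes[1]         # load a byte
    floats[1] = 1.0      # store through the aliased Float64 view
    return b + bytes[1]  # the compiler may assume bytes[1] is unchanged and reuse `b`
end

reinterpret(Float64, bytes) builds a ReinterpretArray over the same memory instead; that is safe because the compiler knows about the aliasing, but it pays the performance penalty discussed below.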

Thank you. My application is pretty large; this is a tiny but very important component.

So whatever approach I use, if I keep the original reference but never access it, then I’ll definitely be fine?

You might be fine sometimes; other times you might not. The original UInt8 array might not have the alignment required for the processor to load the values correctly, or the application might just sometimes suffer significant performance penalties on many processors.
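
For what it’s worth, you can check whether a particular buffer happens to be suitably aligned (a quick sketch, not from the thread; `buf` is a made-up name), but nothing in the container’s type guarantees it:

buf = Vector{UInt8}(undef, 64)
UInt(pointer(buf)) % sizeof(Float64) == 0  # true only if this particular allocation is 8-byte aligned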

Ok, I finally understand the problem. And is there no way to guarantee that the UInt8 array is contiguous in memory?

That is what declaring it Float64 is for.
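
For instance, a rough sketch of that idea (the struct `B` and the helper `scratch` are made-up names, not your code): hold the scratch space as a Vector{Float64}, which fixes the layout and alignment, and reinterpret it to whatever element type a given call needs:

mutable struct B
    tmp::Vector{Float64}   # Float64-backed scratch space: contiguous, aligned for 8-byte loads
end

function scratch(b::B, ::Type{T}, n::Integer) where {T}
    # assumes sizeof(T) divides 8 or is a multiple of 8 (true for the usual numeric types)
    nfloats = cld(n * sizeof(T), sizeof(Float64))
    length(b.tmp) < nfloats && resize!(b.tmp, nfloats)
    return view(reinterpret(T, b.tmp), 1:n)
end

Note that this still goes through a ReinterpretArray whenever T != Float64, so the performance caveat being discussed here still applies; the Float64 declaration only buys the layout and alignment guarantees.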

Concretely, the way I want to use this is to have temporary arrays for cases where I can predict their eltype at evaluation time, but not at the time I’m constructing the object that holds the temporary array. So, very crudely, I would do something like this:

mutable struct A
    tmp::Vector{UInt8} 
    # other stuff
end 

function evaluate(a::A, x::TX) where {TX}
    T = predict_eltype(TX)
    # resize a.tmp if necessary, or replace it with a larger array
    tmp = reinterpret(T, a.tmp)
    # ... do some computations in tmp
    # return the output
end

But then I noticed that the reinterpret sometimes causes a significant slow-down.

I should also acknowledge that the speedup I’m getting by not allocating is not at all clear: sometimes I seem to gain quite a bit, sometimes nothing. Maybe it depends on whether the GC can figure out that I want to reuse my arrays and keeps them around? (Basically, that’s the functionality I’m trying to re-implement here…)
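
One way to pin that down (again just a hedged sketch with made-up names, not your actual kernel) is to compare an allocating variant against a buffer-reusing one with BenchmarkTools and look at the reported allocations:

using BenchmarkTools

allocating(x)   = sum(abs2, copy(x))          # allocates a fresh temporary on every call
reusing(tmp, x) = sum(abs2, copyto!(tmp, x))  # reuses a preallocated buffer instead

x   = rand(1000)
tmp = similar(x)
@btime allocating($x)     # reports one allocation per call
@btime reusing($tmp, $x)  # should report 0 allocations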

Note that this will improve considerably with https://github.com/JuliaLang/julia/pull/44186

nightly:

julia> print("           Array: "); @btime cheb!($A, $x)
           Array:   225.605 ns (0 allocations: 0 bytes)

julia> print("ReinterpretArray: "); @btime cheb!($B, $x)
ReinterpretArray:   2.839 μs (0 allocations: 0 bytes)

julia> VERSION
v"1.9.0-DEV.411"

This PR:

julia> print("           Array: "); @btime cheb!($A, $x)
           Array:   226.522 ns (0 allocations: 0 bytes)

julia> print("ReinterpretArray: "); @btime cheb!($B, $x)
ReinterpretArray:   458.822 ns (0 allocations: 0 bytes)

julia> VERSION
v"1.8.0-DEV.1561"

Wish this PR would get reviewed
