Basically, Julia “knows” that two different arrays contain unrelated content, so it may optimize by reordering or removing code based on that assumption. The catch is that it only does so occasionally, typically once you start to rely upon the code in a larger application, or when you someday run the same code on a new machine. The penalty you see with ReinterpretArray (which is the same array, but with this optimization assumption disabled) is exactly why we don’t make it the default.
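To make the aliasing concrete, here is a small sketch (mine, not from the thread) showing that reinterpret produces a ReinterpretArray sharing memory with the original bytes, which is exactly the situation the optimizer would otherwise be allowed to assume cannot happen:

```julia
bytes = zeros(UInt8, 32)
floats = reinterpret(Float64, bytes)  # ReinterpretArray: same memory, no copy
floats[1] = 1.0
# A write through `floats` is visible through `bytes`: the two views alias
# one buffer, so the usual "different eltype implies different memory"
# assumption must be switched off for this wrapper.
any(!iszero, bytes[1:8])  # the bit pattern of 1.0 now lives in the bytes
```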
thank you. My application is pretty large - this is a tiny but very important component.
So whatever approach I use, if I keep the original reference but never access it, will I definitely be fine?
You might be sometimes; other times you might not. The original UInt8 array might not have the alignment the processor requires to load the values correctly, or the application might simply suffer significant performance penalties on many processors.
Ok, I finally understand the problem. And is there no way to guarantee that the UInt8 array is contiguous in memory? Is that what declaring it Float64 is for?
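For what it’s worth (my reading, not a statement from the thread): a Vector is always one contiguous block of memory, so contiguity is not the problem; the concern raised above is the alignment of that block. A quick sketch of how one might inspect both (the alignment check is illustrative, and its result is platform- and allocator-dependent):

```julia
bytes = Vector{UInt8}(undef, 64)
strides(bytes) == (1,)           # a Vector is contiguous by construction
addr = UInt(pointer(bytes))
iszero(addr % sizeof(Float64))   # true only if the buffer happens to be
                                 # 8-byte aligned; not guaranteed in general
```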
Concretely, the way I want to use this is to have temporary arrays for cases where I can predict their eltype at evaluation time, but not at the time I’m constructing the object that holds the temporary array. So very crudely, I would do something like this:
mutable struct A
    tmp::Vector{UInt8}
    # other stuff
end

function evaluate(a::A, x::TX) where {TX}
    T = predict_eltype(TX)
    # resize a.tmp if necessary, or replace it with a larger array
    tmp = reinterpret(T, a.tmp)
    # ... do some computations in tmp
    # return the output
end
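Filling in that crude sketch, a runnable version might look like the following (predict_eltype and the body computation are hypothetical stand-ins, not the actual application code):

```julia
mutable struct A
    tmp::Vector{UInt8}
end
A() = A(UInt8[])

predict_eltype(::Type{Vector{T}}) where {T} = T  # hypothetical stand-in

function evaluate(a::A, x::AbstractVector)
    T = predict_eltype(typeof(x))
    nbytes = length(x) * sizeof(T)
    # resize! grows or shrinks in place; shrinking keeps the capacity,
    # so repeated calls with the same size do not reallocate
    length(a.tmp) == nbytes || resize!(a.tmp, nbytes)
    tmp = reinterpret(T, a.tmp)
    tmp .= 2 .* x            # placeholder computation
    return sum(tmp)
end

evaluate(A(), [1.0, 2.0, 3.0])  # 12.0
```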
But then I noticed that the reinterpret sometimes causes a significant slow-down.
I should also acknowledge that the speedup I get from not allocating is not at all consistent: sometimes I seem to gain quite a bit, sometimes nothing. Maybe it depends on whether the GC can figure out that I want to reuse my arrays and keeps them around? (Basically, that’s the functionality I’m trying to re-implement here…)
Note that this will improve considerably with https://github.com/JuliaLang/julia/pull/44186
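(The cheb! kernel isn’t defined in the thread; a plausible stand-in, evaluating the Chebyshev recurrence T₀ = 1, T₁ = x, Tₖ₊₁(x) = 2x·Tₖ(x) − Tₖ₋₁(x) in place, which runs equally on a Vector or a ReinterpretArray, would be:)

```julia
# Hypothetical stand-in for the benchmarked cheb! kernel:
# fills `c` with Chebyshev polynomial values T_k(x) via the recurrence.
function cheb!(c::AbstractVector, x)
    c[1] = 1
    length(c) >= 2 && (c[2] = x)
    @inbounds for k in 3:length(c)
        c[k] = 2x * c[k-1] - c[k-2]
    end
    return c
end
```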
nightly:
julia> print(" Array: "); @btime cheb!($A, $x)
Array: 225.605 ns (0 allocations: 0 bytes)
julia> print("ReinterpretArray: "); @btime cheb!($B, $x)
ReinterpretArray: 2.839 μs (0 allocations: 0 bytes)
julia> VERSION
v"1.9.0-DEV.411"
This PR:
julia> print(" Array: "); @btime cheb!($A, $x)
Array: 226.522 ns (0 allocations: 0 bytes)
julia> print("ReinterpretArray: "); @btime cheb!($B, $x)
ReinterpretArray: 458.822 ns (0 allocations: 0 bytes)
julia> VERSION
v"1.8.0-DEV.1561"
Wish this PR would get reviewed.