Basically, Julia “knows” that two different arrays contain unrelated content, so it may optimize by reordering or removing code based on that assumption. The catch is that it only does so occasionally, typically once you start to rely upon the code in a larger application, or when you someday run the same code on a new machine. The penalty you see with ReinterpretArray (which is the same array, but with this optimization assumption disabled) is exactly why we don’t make it the default.
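To make the aliasing concrete, here is a small sketch (mine, not from the thread) showing that reinterpret produces a ReinterpretArray sharing memory with the original bytes, which is exactly the situation the optimizer would otherwise be allowed to assume cannot happen:

```julia
bytes = zeros(UInt8, 32)
floats = reinterpret(Float64, bytes)  # ReinterpretArray: same memory, no copy
floats[1] = 1.0
# A write through `floats` is visible through `bytes`: the two views alias
# one buffer, so the usual "different eltype implies different memory"
# assumption must be switched off for this wrapper.
any(!iszero, bytes[1:8])  # the bit pattern of 1.0 now lives in the bytes
```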
thank you. My application is pretty large - this is a tiny but very important component.
So whatever approach I use, if I keep the original reference but never access it, will I definitely be fine?
You might be sometimes; other times you might not. The original UInt8 array might not have the alignment the processor requires to load the values correctly, or the application might simply suffer significant performance penalties on many processors.
Ok, I finally understand the problem. And is there no way to guarantee that the UInt8 array is contiguous in memory? Is that what declaring it Float64 is for?
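For what it’s worth (my reading, not a statement from the thread): a Vector is always one contiguous block of memory, so contiguity is not the problem; the concern raised above is the alignment of that block. A quick sketch of how one might inspect both (the alignment check is illustrative, and its result is platform- and allocator-dependent):

```julia
bytes = Vector{UInt8}(undef, 64)
strides(bytes) == (1,)           # a Vector is contiguous by construction
addr = UInt(pointer(bytes))
iszero(addr % sizeof(Float64))   # true only if the buffer happens to be
                                 # 8-byte aligned; not guaranteed in general
```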
Concretely, the way I want to use this is to have temporary arrays for cases where I can predict their eltype at evaluation time, but not at the time I’m constructing the object that holds the temporary array. So very crudely, I would do something like this:
mutable struct A
    tmp::Vector{UInt8}
    # other stuff
end

function evaluate(a::A, x::TX) where {TX}
    T = predict_eltype(TX)
    # resize a.tmp if necessary, or replace it with a larger array
    tmp = reinterpret(T, a.tmp)
    # ... do some computations in tmp
    # return the output
end
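Filling in that crude sketch, a runnable version might look like the following (predict_eltype and the body computation are hypothetical stand-ins, not the actual application code):

```julia
mutable struct A
    tmp::Vector{UInt8}
end
A() = A(UInt8[])

predict_eltype(::Type{Vector{T}}) where {T} = T  # hypothetical stand-in

function evaluate(a::A, x::AbstractVector)
    T = predict_eltype(typeof(x))
    nbytes = length(x) * sizeof(T)
    # resize! grows or shrinks in place; shrinking keeps the capacity,
    # so repeated calls with the same size do not reallocate
    length(a.tmp) == nbytes || resize!(a.tmp, nbytes)
    tmp = reinterpret(T, a.tmp)
    tmp .= 2 .* x            # placeholder computation
    return sum(tmp)
end

evaluate(A(), [1.0, 2.0, 3.0])  # 12.0
```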
But then I noticed that the reinterpret sometimes causes a significant slow-down.
I should also acknowledge that the speedup I get from not allocating is not at all consistent: sometimes I seem to gain quite a bit, sometimes nothing. Maybe it depends on whether the GC can figure out that I want to reuse my arrays and keeps them around? (Basically, that’s the functionality I’m trying to re-implement here…)
Note that this will improve considerably with https://github.com/JuliaLang/julia/pull/44186
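(The cheb! kernel isn’t defined in the thread; a plausible stand-in, evaluating the Chebyshev recurrence T₀ = 1, T₁ = x, Tₖ₊₁(x) = 2x·Tₖ(x) − Tₖ₋₁(x) in place, which runs equally on a Vector or a ReinterpretArray, would be:)

```julia
# Hypothetical stand-in for the benchmarked cheb! kernel:
# fills `c` with Chebyshev polynomial values T_k(x) via the recurrence.
function cheb!(c::AbstractVector, x)
    c[1] = 1
    length(c) >= 2 && (c[2] = x)
    @inbounds for k in 3:length(c)
        c[k] = 2x * c[k-1] - c[k-2]
    end
    return c
end
```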
nightly:
julia> print(" Array: "); @btime cheb!($A, $x)
Array: 225.605 ns (0 allocations: 0 bytes)
julia> print("ReinterpretArray: "); @btime cheb!($B, $x)
ReinterpretArray: 2.839 μs (0 allocations: 0 bytes)
julia> VERSION
v"1.9.0-DEV.411"
This PR:
julia> print(" Array: "); @btime cheb!($A, $x)
Array: 226.522 ns (0 allocations: 0 bytes)
julia> print("ReinterpretArray: "); @btime cheb!($B, $x)
ReinterpretArray: 458.822 ns (0 allocations: 0 bytes)
julia> VERSION
v"1.8.0-DEV.1561"
Wish this PR would get reviewed.