ReinterpretedArray Performance (even worse on 1.8)

And then, still in the end: isn’t the incredibly poor performance of reinterpreted arrays on Julia 1.8 strange when bounds-checking is enabled?

I think as long as you keep the reference to the original Vector you are safe to use both; the GC will not free the memory precisely because you keep the original reference.
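A hedged sketch of that rooting idea, using the safe `reinterpret` as a stand-in for the unsafe cast discussed below. The key point is that a view stays valid only while its backing memory is reachable; `GC.@preserve` makes that explicit when you drop to raw pointers:

```julia
# Sketch: the cast view shares memory with the original allocation, so the
# original must stay reachable for the memory to stay alive.
bytes = rand(UInt8, 8 * sizeof(Float64))   # original allocation
floats = reinterpret(Float64, bytes)       # view over the same memory

# For pointer-level access, root the original explicitly:
GC.@preserve bytes begin
    p = pointer(bytes)                     # `bytes` is guaranteed alive here
    first_byte = unsafe_load(p)
end
```

`reinterpret` itself keeps a reference to its parent, so the view alone is enough to root the memory; the `GC.@preserve` block matters once only a raw `Ptr` remains.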

This is truly unsafe, but it gives the best performance because it hands you a native array:

julia> unsafe_arraycast(Float64, rand(UInt8, 64))
8-element Vector{Float64}:

@jling - thank you. And same principle - I need to keep the reference to the original array?

handles that

I think no, as the reference to the memory is the same. You can check it with

julia> function unsafe_arraycast(::Type{D}, ary::Vector{S}) where {S, D}
           l = sizeof(S) * length(ary) ÷ sizeof(D)
           res = ccall(:jl_reshape_array, Vector{D}, (Any, Any, Any), Vector{D}, ary, (l,))
           return res
       end
unsafe_arraycast (generic function with 1 method)

julia> A = zeros(UInt8, 10 * sizeof(Float64));

julia> B = unsafe_arraycast(Float64, A);

julia> pointer(A)
Ptr{UInt8} @0x00007f744d5eba28

julia> pointer(B)
Ptr{Float64} @0x00007f744d5eba28

That’s really nice - thanks for the suggestion

Why do you label it unsafe then?

Because it is super unsafe, and the Julia devs are strongly against even having this as an unsafe_* function in Base.

Note that this doesn’t work before 1.7, and it is likely to break again in the future when jl_reshape_array changes.

In what sense is it “super unsafe”? Is it because of something in the internals of Julia? Memory aliasing with different types? :thinking: :confused:

Weird, I’m testing on v1.7 and it looks to work just fine. :sweat_smile:

“before 1.7” means it doesn’t work on 1.6

we should ask @jameson I guess

Sorry, misread. :sweat_smile:

Oky, doky. :eyes:

Basically, julia “knows” that 2 different arrays contain unrelated content, so it might optimize by removing any code that doesn’t meet that expectation. But it will only do it sometimes in rare cases, and usually only when you start to rely upon it in a larger application, or if you end up someday trying to use the same code on a new computer. The penalty you see with using ReinterpretArray (which is the same, but disables this optimization assumption) is exactly why we don’t make this the default.

Thank you. My application is pretty large; this is a tiny but very important component.

So whatever approach I use, if I keep the original reference but never access it, then I’ll definitely be fine?

You might be sometimes; other times you might not. The original UInt8 array might not have the alignment required for the processor to load the values correctly, or the application might just sometimes suffer significant performance penalties on many processors.
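You can at least inspect the alignment of a given buffer. A sketch, assuming the usual convention that `Float64` requires 8-byte (its own size) alignment; note that Julia’s allocator typically over-aligns array data, but that is an implementation detail, not a guarantee for arbitrary buffers:

```julia
# Sketch: check whether a byte buffer happens to satisfy Float64 alignment.
A = rand(UInt8, 64)
p = UInt(pointer(A))
is_aligned = p % sizeof(Float64) == 0   # 8-byte alignment check
```

If `is_aligned` is false, loading `Float64`s from that address is undefined or slow depending on the processor, which is the hazard described above.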

Ok, I finally understand the problem. And is there no way to guarantee that the UInt8 array is properly aligned in memory?

That is what declaring it Float64 is for?

Concretely the way I want to use this is to have temporary arrays for cases where I can predict their eltype at evaluation time, but not at the time I’m constructing the object where the temporary array is held. So very crudely, I would do something like this:

mutable struct A
    tmp::Vector{UInt8}   # reusable scratch buffer
    # other stuff
end

function evaluate(a::A, x::TX) where {TX}
    T = predict_eltype(TX)
    # resize a.tmp if necessary, or replace with larger array
    tmp = reinterpret(T, a.tmp)
    # ... do some computations in tmp
    # return output
end

But then I noticed that the reinterpret sometimes causes a significant slow-down.
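For what it’s worth, a minimal runnable version of that pattern might look like the following. `Workspace`, `evaluate!`, and the fixed `Float64` (standing in for `predict_eltype`) are my own placeholder names, not from the snippet above:

```julia
# Hypothetical sketch of the "reusable byte buffer" pattern.
mutable struct Workspace
    tmp::Vector{UInt8}   # scratch space reused across calls
end

function evaluate!(w::Workspace, x, n)
    T = Float64                                       # stand-in for predict_eltype
    nbytes = n * sizeof(T)
    length(w.tmp) < nbytes && resize!(w.tmp, nbytes)  # grow buffer only if needed
    tmp = reinterpret(T, view(w.tmp, 1:nbytes))       # typed view of the scratch space
    tmp .= x                                          # ... do some computations in tmp
    return sum(tmp)                                   # return output
end
```

A second call with a smaller `n` reuses the existing buffer without allocating a new one, which is the point of holding the `UInt8` array in the struct.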

I should also acknowledge that the speedup I’m getting by not allocating is not at all clear, sometimes I seem to gain quite a bit, sometimes nothing. Maybe it depends on whether the GC can figure out that I want to reuse my arrays and keeps them around? (Basically that’s the functionality I’m trying to re-implement here…)

Note that this will improve considerably with Make `StridedReinterpretArray`'s `get/setindex` pointer based. by N5N3 · Pull Request #44186 · JuliaLang/julia · GitHub


julia> print("           Array: "); @btime cheb!($A, $x)
           Array:   225.605 ns (0 allocations: 0 bytes)

julia> print("ReinterpretArray: "); @btime cheb!($B, $x)
ReinterpretArray:   2.839 μs (0 allocations: 0 bytes)

julia> VERSION

This PR:

julia> print("           Array: "); @btime cheb!($A, $x)
           Array:   226.522 ns (0 allocations: 0 bytes)

julia> print("ReinterpretArray: "); @btime cheb!($B, $x)
ReinterpretArray:   458.822 ns (0 allocations: 0 bytes)

julia> VERSION

Wish this PR would get reviewed.