Big overhead with the new lazy reshape/reinterpret

No, I think array.c should do all the thinking for reinterpreting arrays in some Base.unsafe_reinterpret. That is, this kind of reinterpret is the job for someone who knows all the array flags, all the multi-threaded and multiprocessor special cases, who knows which array to lock when, etc. Think of the following:

julia> A=[1,2,3];
julia> B=reinterpret(UInt64, A);
julia> push!(A, 1);
ERROR: cannot resize array with shared data
julia> push!(B, 1);

I am sceptical that allowing push! on B is a good idea, but some author of array.c thought about this. Every end user resorting to pointer games is definitely worse than the core developers giving us the best and safest thing that can be done with zero overhead, plus all the segfaults we asked for when zero overhead is not possible.

The amount of code is not always a good indicator of how much happens at runtime. It has been confirmed that this all used to optimize away but now fails to do so.


Fair enough, but I remember coming to the conclusion that in many cases this would not be a no-op (it’s been quite a while since I’ve looked at it carefully, so maybe I don’t know what I’m talking about).

I don’t ever remember a time when there weren’t problems with this, though. At what point did it get broken?

Even if true, this does not make a Base.unsafe_reinterpret obsolete. The mere fact that
y ? reinterpret(some_T, x) : something::Array{some_T} is now type-unstable makes it necessary to sometimes really reinterpret memory regions, not Julia objects. And the old reinterpret code is perfectly fine for this job, and so much better than unsafe_wrap hacks that will break on first contact with arrays that have a nonzero offset.
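To make the instability concrete, here is a minimal sketch (assuming Julia ≥ 0.7, where reinterpret returns a lazy ReinterpretArray; the variable names are illustrative):

```julia
# On 0.7+, reinterpret no longer returns an Array, so the two
# branches of a conditional like `y ? reinterpret(UInt64, x) : b`
# have different concrete types and the expression infers to a Union.
x = UInt8[1, 2, 3, 4, 5, 6, 7, 8]
a = reinterpret(UInt64, x)   # Base.ReinterpretArray{UInt64, 1, UInt8, ...}
b = UInt64[0x2a]             # plain Vector{UInt64}

typeof(a) == typeof(b)       # false: same element type, different container types
```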

Having an unsafe_reinterpret that is basically just a safe copy of an unsafe_wrapped array does seem like it would be a useful thing. My current workaround for the reinterpret slowness is basically to do this by default, but also to allow views of unsafe_wrapped arrays when it is guaranteed safe.

After you unsafe_wrap an array it is indistinguishable from a “proper” array; that’s what the old reinterpret does. But a one-liner that grabs the pointer and unsafe_wraps it is not optimal: in principle you should handle offsets, lock arrays against resizing, and possibly handle other array flags. Views of unsafe_wrapped arrays are not safe by default. Consider:

julia> zz = [];
       for i = 1:50_000
           n = rand(1:30)
           A = zeros(UInt32, n*n + 1)
           deleteat!(A, 1)
           #B = unsafe_wrap(Array, pointer(A), (n, n))
           B = reinterpret(UInt32, A, (n, n))
           push!(zz, B)
           (i % 1000 == 0) && gc()
       end;

julia> all(iszero.(zz))
true

julia> zz = [];
       for i = 1:50_000
           n = rand(1:30)
           A = zeros(UInt32, n*n + 1)
           deleteat!(A, 1)
           B = unsafe_wrap(Array, pointer(A), (n, n))
           #B = reinterpret(UInt32, A, (n, n))
           push!(zz, B)
           (i % 1000 == 0) && gc()
       end;

julia> all(iszero.(zz))
false

edit: So we see that we corrupted memory in the second variant (all on 0.6.2). Expecting users to handle these subtleties is slightly harsh, imo.
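The corruption happens because nothing keeps A reachable once only B is stored, so the GC frees the buffer B points into. If one does reach for unsafe_wrap, one way to avoid this is to root the parent alongside the wrapper. A minimal sketch (the helper name wrap_rooted is mine, not any existing API):

```julia
# Hypothetical helper: bundle the unsafe_wrap result with its parent
# array, so that as long as the pair is reachable the GC cannot free
# the memory the wrapper points into.
function wrap_rooted(A::Vector{UInt32}, dims)
    B = unsafe_wrap(Array, pointer(A), dims)
    return (parent = A, wrapped = B)   # holding A keeps the buffer alive
end

A = zeros(UInt32, 10)
w = wrap_rooted(A, (2, 5))
w.wrapped[1, 1] = 0x1234
A[1] == 0x1234    # true: both names alias the same memory
```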

https://github.com/JuliaLang/julia/pull/27213


You are the man! Thank you so much Keno!

Some benchmarks:

using StaticArrays
m = rand(3, 1_000_000);
v = reinterpret(SVector{3,Float64}, m, (1_000_000,));
sum(v);

v0.6.2

julia> @time sum(v);
  0.002449 seconds (5 allocations: 192 bytes)

current master:

julia> @time sum(v);
  0.016592 seconds (9 allocations: 320 bytes)

Keno’s PR #27213:

julia> @time sum(v);
  0.002688 seconds (9 allocations: 320 bytes)

Nice!


Awesome!

And the new pointerref / pointerset intrinsics for controlling tbaa are also really cool, thanks a lot!


Wow, awesome @keno! All that work on the optimizer and then a reinterpret PR almost immediately, you are one prolific coder! I’m sure all of us will hugely appreciate whatever improvement this can offer.


Keno’s updated PR #28707 has just been merged! As of 8 hours ago, v1.1.0-DEV shows this:

julia> using StaticArrays, BenchmarkTools
julia> a = rand(3, 10^4); b = reshape(reinterpret(SVector{3, Float64}, a), (size(a, 2),))
julia> @btime $b[5][2];
  1.347 ns (0 allocations: 0 bytes)

How does it compare to a normal array?

I’m not sure I know what you mean. Are you thinking about reinterpreting an Array of Arrays? In that case I’d think you cannot use reinterpret, which only works on Arrays of isbits element types:

julia> isbitstype(eltype([[1,2],[3,4]]))
false
julia> isbitstype(eltype([SVector(1,2),SVector(3,4)]))
true

You might be looking for hcat and friends. Maybe I misunderstood?
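As a side note, a minimal sketch of the hcat route (shown with plain Vectors for self-containedness; the same pattern works for SVectors):

```julia
# reduce(hcat, ...) turns a vector of equal-length vectors into a
# matrix whose i-th column is the i-th inner vector.
vs = [[1.0, 2.0], [3.0, 4.0]]
m = reduce(hcat, vs)     # 2×2 matrix: columns are vs[1] and vs[2]
m[:, 1] == vs[1]         # true
```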

No, I mean compared to getindex for a normal array: b[5] where b::Vector{SVector{3, Float64}}. In v0.6 this would have been the output of the reshape(reinterpret(..)). So I am interested in knowing the cost of these two operations on v1.1.0-DEV.

I am asking because I don’t have access to a Linux machine currently and building v1.1.0-DEV on Windows didn’t work for me.

Ah, sorry, yes, they’re the same now. You cannot go much lower than this, I guess.

julia> b=[SVector{3}(rand(3)) for i in 1:10^4];
julia> @btime $b[5][2];
  1.347 ns (0 allocations: 0 bytes)

It’s interesting that, despite the above, the @code_native of the lazy and eager cases is quite different.
