I’m finding important performance regressions when indexing into the new reshape/reinterpret wrappers. (This is causing problems in trying to update NearestNeighbors.jl)
Current v0.7 master
julia> using StaticArrays
julia> a = rand(3, 10^4); b = reshape(reinterpret(SVector{3, Float64}, a), (size(a, 2),))
julia> @btime $b[5][2]
18.164 ns (0 allocations: 0 bytes)
v0.6.1
julia> using StaticArrays
julia> a = rand(3, 10^4); b = reinterpret(SVector{3, Float64}, a, (size(a, 2),));
julia> f(c) = c[5][2]
julia> @btime $b[5][2]
1.346 ns (0 allocations: 0 bytes)
What am I doing wrong? Is this temporary, perhaps, waiting for some other PR?
Thanks for the tip @foobar_lv2, this indeed works as fast as before! Is this equivalent to using the old version of reinterpret? If I do a @gc_preserve a, will anever be garbage-collected even if we exit the scope of b?
I will mark this as a solution to the question, although being somewhat of a hack I’m not sure it is really acceptable as a long term solution for the NearestNeighbor.jl package. I’ll ping @kristoffer.carlsson to see what he thinks…
Well, it circumvents the reason ReinterpretedArray was introduced at all (TBAA) so we will effectively lie to the compiler. I don’t know the internals well enough to predict what effect this will (might) have.
If I do a @gc_preserve a, will a never be garbage-collected even if we exit the scope of b?
b does not prevent a from being garbage collected. If you use b after a gets garbage collected, too bad.
@gc_preserve a begin ... end prevents a from being collected as long as the begin… end block runs.
Realistically, you probably can store a long-time reference to a somewhere (in the root-node of your tree?) and never expose the unsafe_wrapped b to users. Then you are totally fine.
Example (as far as I understood, anyone please correct me if I’m wrong here):
function foo_wrong(n)
a = rand(3, n)
b=unsafe_wrap(Array, reinterpret(Ptr{SVector{3, Float64}},pointer(a)), (size(a,2),), false)
return sum(b)
end
function foo_notwrong(n)
a = rand(3, n)
b=unsafe_wrap(Array, reinterpret(Ptr{SVector{3, Float64}},pointer(a)), (size(a,2),), false)
@Base.gc_preserve a begin s=sum(b) end
return s
end
Yes, that was my impression. I think Keno had good technical reasons (requirements for future progress with the compiler) to do the ReinterpretArray changes.
Garbage collection does not respect scopes. “Going out of scope” is the notion of visibility described in the docs; in reality, an object can be collected once the compiler believes that it will not be accessed anymore (and the compiler does reorder instructions!).
The compiler gets more and more clever, so all these pointer tricks are somewhat dangerous. I personally tend to stash away references in some very gc visible mutable place that is reachable from the user, in order to not have to think about what exactly the compiler will infer (or for very short-lived objects, @gc_preserve).
Also, the “false” for the unsafe_wrap is essential (you don’t want no double free).
I was wondering if there has been any further progress or investigation on this?
I’m getting some segfaults I can’t figure out so more than ever I’m itching to be able to use reinterpret. I’ve had a very hard time figuring out what about reinterpret is even slow. Sometimes accessing individual elements seems perfectly fine, but then I’ll run it through a function or something and it’s inexplicably slow.
I’m also interested to know. Just two days ago I checked out master to try whether there was any progress using the original example I posted, and I didn’t see any improvement yet.
One interesting thing, the allocations have gone way down, 84 → 9, 6.123Kib → 320 bytes, even though its still about 12x slower.
Have you been profiling this?
The allocations are spurious (due to partial compilation stuff?)
julia> @btime sum($v);#on 0.62
2.106 ms (0 allocations: 0 bytes)
I think the proper solution is a Base.unsafe_reinterpret that implements the old reinterpret (i.e. the return type does not know that the array is reinterpreted).
I think the reason for the new reinterpret is that some aliasing assumptions changed, which now makes it unsafe to use both A and unsafe_reinterpret(T,A,...) = unsafe_wrap(Array, convert(T, pointer(X)), ...) in the same loop. In 99% of the cases, you don’t do that and can impose a @noinline function boundary between the loops (hence the fact that we lie to llvm about aliasing does not matter).
At some point I looked carefully at the ReinterpretArray code, there is a lot going on during getindex calls. I suspect some of this will need to be rethought. getindex really needs to be a no-op.
I suspect unless someone wants to take this on in earnest now we will be stuck with unsafe_wrap for 1.0.