Big overhead with the new lazy reshape/reinterpret

I’m finding important performance regressions when indexing into the new reshape/reinterpret wrappers. (This is causing problems in trying to update NearestNeighbors.jl)

Current v0.7 master

julia> using StaticArrays
julia> a = rand(3, 10^4); b = reshape(reinterpret(SVector{3, Float64}, a), (size(a, 2),))
julia> @btime $b[5][2]
  18.164 ns (0 allocations: 0 bytes)

v0.6.1

julia> using StaticArrays
julia> a = rand(3, 10^4); b = reinterpret(SVector{3, Float64}, a, (size(a, 2),));
julia> @btime $b[5][2]
  1.346 ns (0 allocations: 0 bytes)

What am I doing wrong? Is this temporary, perhaps, waiting for some other PR?


This will hurt me too very soon.


I confess I gave up and resorted to copying instead of reinterpreting. Attempting to improve this issue is still way above my head.
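For reference, the copying fallback looks roughly like this (a sketch only; the exact construction used in the package may differ). It trades O(n) extra memory for plain Vector indexing with no reinterpret wrapper involved:

```julia
using StaticArrays

a = rand(3, 10^4)
# Copy each column into a fresh SVector; `b` is then an ordinary
# Vector{SVector{3,Float64}} with fast indexing, but the data is duplicated.
b = [SVector{3, Float64}(a[1, i], a[2, i], a[3, i]) for i in 1:size(a, 2)]
```

After this, b[5][2] == a[2, 5], and mutating a no longer affects b.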

Maybe it is worth opening an issue on GitHub.

Tim Holy already did so: https://github.com/JuliaLang/julia/issues/25014

Cheers,
Kevin


Have you tried the following:

@assert typeof(a) == Array{Float64, 2}
@assert size(a, 1) == 3
b = unsafe_wrap(Array, reinterpret(Ptr{SVector{3, Float64}}, pointer(a)), (size(a, 2),), false)

Of course this is a dirty workaround, and it possibly requires you to @gc_preserve a.

Thanks for the tip @foobar_lv2, this indeed works as fast as before! Is this equivalent to using the old version of reinterpret? If I do a @gc_preserve a, will a never be garbage-collected even if we exit the scope of b?

I will mark this as the solution to the question, although, being somewhat of a hack, I’m not sure it is really acceptable as a long-term solution for the NearestNeighbors.jl package. I’ll ping @kristoffer.carlsson to see what he thinks…

Well, it circumvents the reason ReinterpretArray was introduced at all (TBAA), so we are effectively lying to the compiler. I don’t know the internals well enough to predict what effect this might have.

If I do a @gc_preserve a, will a never be garbage-collected even if we exit the scope of b?

b does not prevent a from being garbage collected. If you use b after a gets garbage collected, too bad.

@gc_preserve a begin ... end prevents a from being collected as long as the begin… end block runs.

Realistically, you probably can store a long-time reference to a somewhere (in the root-node of your tree?) and never expose the unsafe_wrapped b to users. Then you are totally fine.

Example (as far as I understood, anyone please correct me if I’m wrong here):

function foo_wrong(n)
    a = rand(3, n)
    # Nothing roots `a` here: the compiler may free it before (or during) sum(b).
    b = unsafe_wrap(Array, reinterpret(Ptr{SVector{3, Float64}}, pointer(a)), (size(a, 2),), false)
    return sum(b)
end

function foo_notwrong(n)
    a = rand(3, n)
    b = unsafe_wrap(Array, reinterpret(Ptr{SVector{3, Float64}}, pointer(a)), (size(a, 2),), false)
    # `a` is guaranteed to stay alive for the duration of the begin…end block.
    Base.@gc_preserve a begin
        s = sum(b)
    end
    return s
end

Yes, that was my impression. I think Keno had good technical reasons (requirements for future progress with the compiler) to do the ReinterpretArray changes.

Thanks, understood!

To add even more details:

Garbage collection does not respect scopes. “Going out of scope” is a notion of visibility described in the docs; in reality, an object can be collected as soon as the compiler believes it will not be accessed anymore (and the compiler does reorder instructions!).

The compiler gets more and more clever, so all these pointer tricks are somewhat dangerous. I personally tend to stash away references in some very gc visible mutable place that is reachable from the user, in order to not have to think about what exactly the compiler will infer (or for very short-lived objects, @gc_preserve).
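A sketch of that stashing pattern, using the same 0.6-era unsafe_wrap call as the posts above (the type and field names here are hypothetical, not from any package): the wrapper holds a reference to the backing Matrix, so as long as the wrapper itself is reachable, the memory behind the view cannot be collected.

```julia
using StaticArrays

# Hypothetical wrapper: `parent` is a GC-visible rooted reference to the backing
# storage, `points` is the unsafe_wrap'ed view over the same memory.
struct PointData
    parent::Matrix{Float64}
    points::Vector{SVector{3, Float64}}
end

function PointData(a::Matrix{Float64})
    @assert size(a, 1) == 3
    b = unsafe_wrap(Array, reinterpret(Ptr{SVector{3, Float64}}, pointer(a)),
                    (size(a, 2),), false)
    PointData(a, b)
end
```

Only the wrapper (or the fast view internally, behind the @gc_preserve pattern) would ever be handed to users, never the raw unsafe_wrapped array on its own.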

Also, the “false” for the unsafe_wrap is essential (you don’t want a double free).


Still seeing this problem, although not quite as badly as in @pablosanjose’s example.

I’m getting a median time of 2.6 ns for accessing an array created with unsafe_wrap, and a median time of 10.0 ns for a reinterpreted array.

I was wondering if there has been any further progress or investigation on this?

I’m getting some segfaults I can’t figure out, so more than ever I’m itching to be able to use reinterpret. I’ve had a very hard time figuring out what about reinterpret is even slow. Sometimes accessing individual elements seems perfectly fine, but then I’ll run it through a function or something and it’s inexplicably slow.

I’m also interested to know. Just two days ago I checked out master to try whether there was any progress using the original example I posted, and I didn’t see any improvement yet.

+1

Still no improvements on master:

   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.5152 (2018-05-21 21:19 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit dc30e38 (0 days old master)
|__/                   |  x86_64-linux-gnu

julia> using StaticArrays

julia> m = rand(3, 1_000_000);

julia> v = reinterpret(SVector{3,Float64}, m, (1_000_000,));
┌ Warning: `reinterpret(::Type{T}, a::Array{S}, dims::NTuple{N, Int}) where {T, S, N}` is deprecated, use `reshape(reinterpret(T, vec(a)), dims)` instead.
│   caller = top-level scope
└ @ Core :0

julia> sum(v);

julia> @time sum(v);
  0.050470 seconds (9 allocations: 320 bytes)

julia> typeof(v)
Base.ReinterpretArray{SArray{Tuple{3},Float64,1,3},1,Float64,Array{Float64,1}}

versus v0.6.2

   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.2 (2017-12-13 18:08 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |  x86_64-linux-gnu

julia> using StaticArrays

julia> m = rand(3, 1_000_000);

julia> v = reinterpret(SVector{3,Float64}, m, (1_000_000,));

julia> sum(v);

julia> @time sum(v);
  0.004232 seconds (84 allocations: 6.123 KiB)

julia> typeof(v)
Array{StaticArrays.SArray{Tuple{3},Float64,1,3},1}

One interesting thing: the allocations have gone way down (84 → 9, 6.123 KiB → 320 bytes), even though it’s still about 12x slower.
Have you been profiling this?

The allocations are spurious (due to partial compilation stuff?)

julia> @btime sum($v); # on 0.6.2
  2.106 ms (0 allocations: 0 bytes)

I think the proper solution is a Base.unsafe_reinterpret that implements the old reinterpret (i.e. the return type does not know that the array is reinterpreted).

I think the reason for the new reinterpret is that some aliasing assumptions changed, which now makes it unsafe to use both A and unsafe_reinterpret(T, A, ...) = unsafe_wrap(Array, convert(Ptr{T}, pointer(A)), ...) in the same loop. In 99% of cases you don’t do that, and you can impose a @noinline function boundary between the loops (hence the fact that we lie to LLVM about aliasing does not matter).
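That boundary pattern might look like the following (a sketch under the assumptions above, reusing the 0.6-era unsafe_wrap call from earlier in the thread; process and total are hypothetical names):

```julia
using StaticArrays

# The hot loop lives behind a @noinline boundary, so code that "lies" to LLVM
# about aliasing is never inlined into a loop that also touches `a` directly.
@noinline process(b) = sum(b)

function total(a::Matrix{Float64})
    b = unsafe_wrap(Array, reinterpret(Ptr{SVector{3, Float64}}, pointer(a)),
                    (size(a, 2),), false)
    # Keep `a` rooted while the wrapped view is in use.
    Base.@gc_preserve a process(b)
end
```

Inside process, only b is visible, so the compiler never has to reason about a and b aliasing in the same loop body.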

At some point I looked carefully at the ReinterpretArray code, there is a lot going on during getindex calls. I suspect some of this will need to be rethought. getindex really needs to be a no-op.

I suspect unless someone wants to take this on in earnest now we will be stuck with unsafe_wrap for 1.0.