ReinterpretedArray Performance (even worse on 1.8)

I’d like to use reinterpreted arrays to write into and read from an array. Here is a small script that is a bit oversimplified but not too far from my real use case.

using BenchmarkTools

function cheb!(A, x)
   A[1] = 1 
   A[2] = x 
   for n = 3:length(A) 
      A[n] = 2 * x * A[n-1] - A[n-2]
   end
end

# Standard Array 
A = zeros(100)
# Reinterpreted Array 
B = reinterpret(Float64, zeros(UInt8, 100 * sizeof(Float64)))

# simple benchmark
x = rand()
print("           Array: "); @btime cheb!($A, $x)
print("ReinterpretArray: "); @btime cheb!($B, $x)
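As a quick sanity check (my addition, not part of the original post): cheb! fills A with the Chebyshev polynomials, i.e. A[k+1] == T_k(x), which can be compared against the closed form T_k(x) = cos(k * acos(x)) for |x| <= 1:

```julia
# same recurrence as in the post
function cheb!(A, x)
   A[1] = 1
   A[2] = x
   for n = 3:length(A)
      A[n] = 2 * x * A[n-1] - A[n-2]
   end
end

x = 0.3
A = zeros(10)
cheb!(A, x)
# closed form: T_k(x) = cos(k * acos(x)); A[k+1] should equal T_k(x)
@assert all(isapprox(A[k+1], cos(k * acos(x)); atol = 1e-12) for k in 0:9)
```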

I had always assumed that this abstraction would be free, but apparently not. It is OK (if not great) on Julia 1.7, but terrible on Julia 1.8:

Output:

> j17 chebtest.jl
           Array:    454.965 ns (0 allocations: 0 bytes)
ReinterpretArray:   690.476 ns (0 allocations: 0 bytes)
> j18 chebtest.jl
           Array:   198.605 ns (0 allocations: 0 bytes)
ReinterpretArray:   965.647 ns (0 allocations: 0 bytes)

Julia Versions:

julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635* (2022-02-06 15:21 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.2.0)
  CPU: Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, cyclone)

julia> versioninfo()
Julia Version 1.8.0-beta3
Commit 3e092a2521 (2022-03-29 15:42 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.2.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 1 on 8 virtual cores

Updated script, adding @inbounds and bringing two more variants, unsafe_wrap and UnsafeArrays, into the mix. This largely seems to resolve the problem?

using BenchmarkTools, UnsafeArrays

function cheb!(A, x)
   A[1] = 1 
   A[2] = x 
   for n = 3:length(A) 
      @inbounds A[n] = 2 * x * A[n-1] - A[n-2]
   end
end

# Standard Array 
A = zeros(100)
# Reinterpreted Array 
B = reinterpret(Float64, zeros(UInt8, 100 * sizeof(Float64)))
# unsafe_wrap 
_C = zeros(UInt8, 100 * sizeof(Float64))
ptr = Base.unsafe_convert(Ptr{Float64}, _C)
C = Base.unsafe_wrap(Array, ptr, 100)
# UnsafeArrays
D = UnsafeArray(ptr, (100,))

# simple benchmark
x = rand()
print("           Array: "); @btime cheb!($A, $x)
print("ReinterpretArray: "); @btime cheb!($B, $x)
print("          unsafe: "); @btime cheb!($C, $x)
print("     UnsafeArray: "); @btime cheb!($D, $x)

Results:

> j17 chebtest.jl                                    7s
           Array:   169.928 ns (0 allocations: 0 bytes)
ReinterpretArray:   426.822 ns (0 allocations: 0 bytes)
          unsafe:   170.455 ns (0 allocations: 0 bytes)
     UnsafeArray:   187.381 ns (0 allocations: 0 bytes)

> j18 chebtest.jl                                    8s
           Array:   185.108 ns (0 allocations: 0 bytes)
ReinterpretArray:   230.561 ns (0 allocations: 0 bytes)
          unsafe:   183.610 ns (0 allocations: 0 bytes)
     UnsafeArray:   189.774 ns (0 allocations: 0 bytes)

I thought it might be something weird about the M1, but I see the same on an AMD EPYC Rome:

j17 test_cheb.jl 
           Array:   373.473 ns (0 allocations: 0 bytes)
ReinterpretArray:   962.550 ns (0 allocations: 0 bytes)
          unsafe:   373.473 ns (0 allocations: 0 bytes)
     UnsafeArray:   161.749 ns (0 allocations: 0 bytes)

EDIT: I should add that this run is without @inbounds, so it seems that UnsafeArray doesn’t do bounds checks, which explains this behaviour…

I once again ask for Base to have this:

function unsafe_arraycast(::Type{D}, ary::Vector{S}) where {S, D}
    l = sizeof(S)*length(ary)÷sizeof(D)
    res = ccall(:jl_reshape_array, Vector{D}, (Any, Any, Any), Vector{D}, ary, (l,))
    return res
end

How is this related to Base.unsafe_wrap and to UnsafeArrays?

I think this is basically what you did with unsafe_wrap + unsafe_convert.

And maybe my most important question: is there a reason for me not to use unsafe_wrap or UnsafeArrays, as long as I always keep around the original reference? E.g. like this:

struct MyVector{T} <: AbstractVector{T}  # Vector is concrete, so subtype AbstractVector
   _A::Vector{UInt8}  # keep the original buffer alive
   A::Vector{T}
end
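Since Vector is a concrete type and cannot be subtyped, such a wrapper has to be an AbstractVector implementing the array interface. A minimal working sketch of the idea (hypothetical name BufferVector, my addition, not from the thread), keeping the byte buffer alive alongside an unsafe_wrap'ed view:

```julia
# Hypothetical sketch: pair the original UInt8 buffer with an
# unsafe_wrap'ed Float64 (etc.) view of the same memory. Keeping the
# buffer in the struct protects it from the GC.
struct BufferVector{T} <: AbstractVector{T}
    _A::Vector{UInt8}  # original owner of the memory
    A::Vector{T}       # unsafe_wrap'ed view into the same memory
end

function BufferVector{T}(n::Integer) where {T}
    buf = zeros(UInt8, n * sizeof(T))
    ptr = Base.unsafe_convert(Ptr{T}, buf)
    BufferVector{T}(buf, Base.unsafe_wrap(Array, ptr, n))
end

# minimal AbstractVector interface, forwarding to the view
Base.size(v::BufferVector) = size(v.A)
Base.@propagate_inbounds Base.getindex(v::BufferVector, i::Int) = v.A[i]
Base.@propagate_inbounds Base.setindex!(v::BufferVector, x, i::Int) = (v.A[i] = x)

v = BufferVector{Float64}(100)
v[1] = 1.5
```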

And then, still in the end: isn’t the incredibly poor performance of reinterpreted arrays on Julia 1.8 strange when bounds checking is enabled?

I think as long as you keep the reference to the original Vector you are safe to use both; the GC will not free the memory precisely because you keep the original reference.
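For completeness (my addition): if you only need the wrapped array temporarily and don’t want to carry the original reference around, GC.@preserve can pin the buffer for the duration of the use. A sketch:

```julia
buf = zeros(UInt8, 100 * sizeof(Float64))   # original owner of the memory
GC.@preserve buf begin
    ptr = Base.unsafe_convert(Ptr{Float64}, buf)
    C = Base.unsafe_wrap(Array, ptr, 100)   # valid only while buf is preserved
    C[1] = 1.0
    # ... use C here; do not let C escape this block
end
# the write landed in buf's memory, as reinterpret confirms
@assert reinterpret(Float64, buf)[1] == 1.0
```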

This is truly unsafe, but gives the best performance because it gives you a native Array:

julia> unsafe_arraycast(Float64, rand(UInt8, 64))
8-element Vector{Float64}:
 -6.079564859434036e242
  2.8652427427119243e252
 -7.940145865008032e-108
  8.320185091615792e-8
  5.427223515701773e188
  9.184586067914578e-204
  2.342950753478369e-31
  2.653381005394009e266

@jling - thank you. And the same principle applies here - I need to keep the reference to the original array?

Nope, jl_reshape_array handles that.

I think no, as the underlying memory is the same. You can check it with:

julia> function unsafe_arraycast(::Type{D}, ary::Vector{S}) where {S, D}
           l = sizeof(S)*length(ary)÷sizeof(D)
           res = ccall(:jl_reshape_array, Vector{D}, (Any, Any, Any), Vector{D}, ary, (l,))
           return res
       end
unsafe_arraycast (generic function with 1 method)

julia> A = zeros(UInt8, 10 * sizeof(Float64));

julia> B = unsafe_arraycast(Float64, A);

julia> pointer(A)
Ptr{UInt8} @0x00007f744d5eba28

julia> pointer(B)
Ptr{Float64} @0x00007f744d5eba28

That’s really nice - thanks for the suggestion

Why do you label it unsafe then?

Because it is super unsafe, and the Julia devs are strongly against even having this as an unsafe_* function in Base.

Notice that this doesn’t work before 1.7 and is likely to break again in the future when jl_reshape_array changes.

In what sense is it “super unsafe”? Is it because of some Julia internals? Memory aliasing with different types? :thinking: :confused:

Weird, I’m testing on v1.7 and it seems to work just fine. :sweat_smile:

“Before 1.7” means it doesn’t work on 1.6.

We should ask @jameson, I guess.

Sorry, misread. :sweat_smile:

Okey dokey. :eyes: