Slowdown with reinterpret

I hope I can be forgiven for being surprised at the stark difference between these implementations.

using BenchmarkTools
function myfunc1(n::Int)
  x = Vector{Int}(undef, 3*n)
  for i in eachindex(x)
    x[i] = i
  end 
  return x
end 

function myfunc2(n::Int)
  xdata = Vector{Tuple{Int,Int,Int}}(undef, n)
  x = reinterpret(Int, xdata)
  for i in eachindex(x)
    x[i] = i
  end 
  return xdata
end 

n = 1_000_000
@btime myfunc1($n);
@btime myfunc2($n);

Results

  3.068 ms (2 allocations: 22.89 MiB) # myfunc1
  16.018 ms (2 allocations: 22.89 MiB) # myfunc2 

Is there any way to reinterpret a vector of tuples as a linear array without such a slowdown? Is this a bug somewhere?
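One workaround that avoids `reinterpret` entirely is to write the tuples directly, computing the three flat indices per element. This is a sketch under the assumption that you control the fill loop (the name `myfunc_tuples` is my own); it returns the same `Vector{Tuple{Int,Int,Int}}` as `myfunc2` and should hit the fast `Vector` path:

    function myfunc_tuples(n::Int)
        xdata = Vector{Tuple{Int,Int,Int}}(undef, n)
        for i in 1:n
            # element i covers flat indices 3i-2, 3i-1, 3i
            xdata[i] = (3i - 2, 3i - 1, 3i)
        end
        return xdata
    end

Of course this only helps when the fill logic can be phrased per tuple rather than per flat index.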

Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.4.0)
  CPU: 16 × Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake-avx512)
  Threads: 1 on 8 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 

I think this slowdown might be resolved by

but that PR hasn’t seen a lot of attention in a while

This package may be able to address your case:


You can cast the memory region. However, note that this is somewhat unsafe, and you need to ensure that Julia's GC does not deallocate xdata while the wrapped array is in use:

function myfunc3(n::Int)
    xdata = Vector{Tuple{Int,Int,Int}}(undef, n)
    GC.@preserve xdata begin
        p = Base.unsafe_convert(Ptr{Int}, xdata)
        x = unsafe_wrap(Array, p, 3 * n)

        for i in eachindex(x)
          x[i] = i
        end
    end
    return xdata
end
julia> @btime myfunc1($n);
  2.484 ms (2 allocations: 22.89 MiB)

julia> @btime myfunc2($n);
  16.915 ms (2 allocations: 22.89 MiB)

julia> @btime myfunc3($n);
  2.487 ms (3 allocations: 22.89 MiB)
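If you use this pattern in more than one place, it may be worth factoring the unsafe part into a small helper so the `GC.@preserve` scope is enforced in one spot. This is a hypothetical helper (the name `with_flat_view` is my own); the callback must not let the flat view escape the preserve block:

    # Apply f to a flat Int view of a tuple vector. The view is only
    # valid inside the GC.@preserve block, so f must not store it.
    function with_flat_view(f, xdata::Vector{Tuple{Int,Int,Int}})
        GC.@preserve xdata begin
            p = Base.unsafe_convert(Ptr{Int}, xdata)
            x = unsafe_wrap(Array, p, 3 * length(xdata))
            f(x)
        end
        return xdata
    end

    # Usage: fill in place without reinterpret's overhead
    xdata = Vector{Tuple{Int,Int,Int}}(undef, 4)
    with_flat_view(x -> x .= eachindex(x), xdata)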

Thanks! I was hoping to avoid the unsafe conversion, but that seems like a pragmatic way to proceed, especially with the @preserve.
