Why does `reinterpret` cause an extra allocation?


#1

Hello all. I’ve recently taken to removing pointers from some code and investigating whether there is anything which would not be possible to do efficiently in Julia without them.

First exhibit, the behavior of reinterpret. Consider this function

function wrap(::Type{T}, A::Vector) where T
    ptr = convert(Ptr{T}, pointer(A))
    unsafe_wrap(Array, ptr, length(A))
end

const A = rand(Int64, 10^6)

(yes, I know this isn’t safe because of the length, it’s just for demonstration purposes) then

julia> @benchmark reinterpret(Float64, A)
BenchmarkTools.Trial: 
  memory estimate:  80 bytes
  allocs estimate:  2
  --------------
  minimum time:     30.747 ns (0.00% GC)
  median time:      32.407 ns (0.00% GC)
  mean time:        36.406 ns (6.97% GC)
  maximum time:     959.294 ns (88.39% GC)
  --------------
  samples:          10000
  evals/sample:     994
julia> @benchmark wrap(Float64, A)
BenchmarkTools.Trial: 
  memory estimate:  80 bytes
  allocs estimate:  1
  --------------
  minimum time:     18.565 ns (0.00% GC)
  median time:      23.097 ns (0.00% GC)
  mean time:        25.844 ns (7.53% GC)
  maximum time:     723.059 ns (90.19% GC)
  --------------
  samples:          10000
  evals/sample:     997

So, it appears that reinterpret causes an extra allocation that results in it being a bit slow. Any ideas as to why this is the case and what, if anything, can be done to make it as efficient as using a pointer?


#2

I just realized that I may have answered my own question, perhaps the extra allocation has something to do with ensuring that the data sizes are compatible?


#3

You need to create a new Julia Array which is allocated on the heap. In what application do you use this that makes 30 ns be considered slow?


#4

I don’t consider it slow, I consider it slower than it possibly could be (only because the pointers are faster). The case in which this type of performance might matter is if performing a huge number of such operations on a large in-memory data buffer.

So I guess I’m just missing something fundamental about what goes on here. My understanding was that both of these operations create Julia arrays that point to the same underlying data. In either case the only thing being allocated is the Julia array (just a reference) that is passed as a return value. I would have thought that there’d be a new Julia array allocated on the heap in both cases, what am I missing?


#5

The correct way of reinterpreting an array, is by using reinterpret. The reinterpreted array keeps the original array safe from being garbage collected for example.

The size of the buffer doesn’t matter so not sure what your point is here. Also why would you do a huge numbers if reinterprets? Presumably you do it because you want to operate on the data…

As a s general remark, there is a recent trend on discourse to quite quickly grab after unsafe methods, pointers, C-style casting of types etc. This is quite unfortunate. It is not how you program in Julia and by adopting that kind of workflow one is throwing away many of the things that make julia nice to program in.


#6

Note that things have changed dramatically on 0.7 with ReinterpretArray:

# 0.6
julia> @btime reinterpret(Float64, A)
  51.480 ns (2 allocations: 80 bytes)
1000000-element Array{Float64,1}:

# 0.7
julia> @btime reinterpret(Float64, A)
  10.995 ns (1 allocation: 16 bytes)
1000000-element reinterpret(Float64, ::Array{Int64,1}):

#7

I didn’t think there were rules telling people just how they should program in Julia.
There used to be a (poorly named) philosophy that Julia programmers should be considered responsible adults, and not be prevented from doing what they needed to do.


#8

When dealing with operations that are just a couple nanoseconds, that’s still way too slow to use.


#9

Oh give it a rest, please. We’re at 3x a single function call.

julia> @btime identity(A)
  3.543 ns (0 allocations: 0 bytes)
1000000-element Array{Int64,1}:

I imagine that in many cases the allocation for the wrapper will be elided if it doesn’t escape — and someday there will be a general solution there, too.


#10

What are you trying to reinterpret? Going between Vectors of static vectors to a matrix is the common case I am thinking about, that any operation on the matrix is huge compared to 3.5ns.


#11

If reinterpreting is so fast, why does the Base Julia code for strings use the same techniques that I do?

@inline function codeunit(s::String, i::Integer)
    @boundscheck checkbounds(s, i)
    GC.@preserve s unsafe_load(pointer(s, i))
end

write(io::IO, s::String) =
    GC.@preserve s unsafe_write(io, pointer(s), reinterpret(UInt, sizeof(s)))

#12

Strings - operations on strings are often just a few ns.


#13

Construction of this object just isn’t all that informative by itself. More informative is:

julia> A = rand(Int8, 100);

julia> @btime reinterpret(UInt8, $A);
  11.143 ns (1 allocation: 16 bytes)

julia> Ar = reinterpret(UInt8, A);
       @btime getindex($Ar, $1)
  3.041 ns (0 allocations: 0 bytes)

julia> fA(A, i) = reinterpret(UInt8, A)[i];
       @btime fA($A, $1);
  3.540 ns (0 allocations: 0 bytes)

# Strings are similar:
julia> S = randstring(100);

julia> @btime codeunits($S);
  11.139 ns (1 allocation: 16 bytes)

julia> Sr = codeunits(S);
       @btime getindex($Sr, $1);
  4.551 ns (0 allocations: 0 bytes)

julia> fS(S, i) = codeunits(S)[i]
       @btime fS($S, $1);
  5.031 ns (0 allocations: 0 bytes)

You’re always welcome to dig into internals yourself, but it’s not something I’d encourage on the general Usage board.


#14

Legacy, special-case, temporary, implementation details. It’s not uncommon for languages implementations to need to include small amounts of code that violates recommended practice in order to provide a stable, public API. So, yeah, it happens, but it usually doesn’t belong on a mailing list. You have to know your audience. The group of people who need to work on these internal features is small. Most users shouldn’t need to know about these details: there’s an enormous difference between being permitted to do unsafe operations, and providing general guidance on how to use the language well. </rant>

Note too that we don’t do anything like this for Arrays, just for the very limited case of Strings. The standard library makes a strong and clear distinction between the two (whenever possible). Benchmarks like ExpandingMan and mbauman are useful for confirming that the expected optimizations are being done on the preferred-style code which avoids unsafe functions.


#15

In this case, ExpandingMan started off stating that he was already using pointers.
I’ve also needed to use them, as nothing else has been fast enough for what I’ve been doing.
(My other option would be to simply stop using Julia, which I’d really hate to do!)

If at some wonderful future, the compiler is made smart enough that there is zero difference in performance,
not 10%, 44%, 2x, etc., then I’d start using “safer” alternatives.

So, to make things perfectly clear, for people who haven’t been dealing with low-level programming with pointers in C/C++/assembly language, they probably should not be messing around with them in Julia either
(if you have, then it’s great that the capability is there, to really solve the “two-language” problem).


#16

ExpandingMan started by stating that he was trying to rid his code of uses of pointer. Not the same.

This is the promise of C++ (“guaranteed zero cost abstractions”). It’s here, available now, if that is your goal. It is not a goal of Julia. As mbauman’s benchmarks are intended to show, this can lead to optimizing the wrong part of the program for a problem that does not exist.

And if so, they hopefully should have quickly learned what a terrible idea it is to mess with them and why assembly languages have completely lost to C and why there are many modern attempts now to eliminate C in turn, such as, for different target audiences, Rust, Julia, Swift, Javascript, etc., and each with different allocation schemes. But all start from a common premise that pointers should always be managed references.


#17

To be clear, I’m not claiming that the use of pointers in my code was appropriate. In fact, I think it was probably inappropriate, but my starting point was some existing code that made extensive use of them.

I agree. The other reason I was using pointers in the first place (apart from basing what I was doing on some existing code) was ignorance on my part about when exactly data gets copied in Julia. I was extremely paranoid that if I was not extremely careful, at some point some huge data buffer would get copied as an unfortunate side effect of some other code. I’ve done some testing and reading and I think I’m a bit less ignorant about this now. I don’t think it was ever a real problem in the first place.

Yes, I shouldn’t have mentioned the size of the buffer, that was clearly silly. The huge number of reinterprets is because potentially lots of computations would be done with this data, which shouldn’t necessarily involve it getting copied. For instance, if I have two vectors x::Vector{Float64} and y::Vector{Float64} both of which come from some underlying data buffer B::Vector{UInt8}, I don’t want to have to allocate two new arrays for x and y to do x .+ y. This is for a Julia implementation of Apache Arrow for which it is very hard to predict use cases, so I don’t want to bake some fundamental flaw or performance limitation into my code. I’m not going crazy optimizing it yet, but I want to make sure that optimizing it is at least possible. Looking into this stuff is part of the process, and it part of that has been a learning process for me (paranoia is often my downfall).

The good news is, it’s looking more and more to me like there is no need for pointers in any of this code. That is impressive for Julia’s sake!

As for the more general discussion of deprecating pointer, I at least partly agree with @ScottPJones sentiment in that I’d be extremely nervous that there would be some things which could not, even in principle, be done efficiently in Julia without them. In my opinion there ought to be some extremely solid reason for believing that is not the case before deprecating them. However, as should be apparent from the above discussion, I am certainly not qualified to make this determination, and at present I’m not able to come up with a specific example that demonstrates the necessity of pointer (which seems like a good thing).


#18

I agree that you shouldn’t complain about the creation speed; a jl_array is heavier than a pointer (a C++ vector is also heavier). The problem is:

versioninfo()
#Julia Version 0.7.0-DEV.3696
#Commit 9f8496c734* (2018-02-02 16:38 UTC)

A=rand(Int64, 10^6);
w1= wrap(Float64, A);

@btime sum(w1)
# 764.269 ��s (1 allocation: 16 bytes)
#NaN
@btime copy(w1);
#1.506 ms (2 allocations: 7.63 MiB)

w2 = reinterpret(Float64, A);
@btime sum(w2)
#  1.367 ms (7 allocations: 128 bytes)
#NaN
@btime copy(w2);
#  2.149 ms (2 allocations: 7.63 MiB)

w3=reinterpret(Int8, A);
@btime copy(w3);
#  26.967 ms (2 allocations: 7.63 MiB)

w4=reinterpret(Int128, A);
@btime copy(w4);
#11.420 ms (2 allocations: 7.63 MiB)

AA=rand(Int8, 8*length(A));
@btime copy(AA);
#  1.445 ms (2 allocations: 7.63 MiB)

(issue already exists https://github.com/JuliaLang/julia/issues/25014 )


#19

Wow, ok I was starting to think this was all solved an I could just use reinterpret, but now I am confused. I guess I will have to do more testing with more conversions.


#20

Hey, you can now do cool stuff with the new reinterpret like

A2=zeros(UInt8, 8,8);
Av=view(A2, 1, :);
reinterpret(Int64,Av)[1]=(1<<35) -8
A2
8��8 Array{UInt8,2}:
 0x00  0x00  0x00  0x00  0x00  0x00  0x00  0x00
 0x00  0x00  0x00  0x00  0x00  0x00  0x00  0x00
 0xf8  0xff  0xff  0xff  0x07  0x00  0x00  0x00
 0x00  0x00  0x00  0x00  0x00  0x00  0x00  0x00
 0x00  0x00  0x00  0x00  0x00  0x00  0x00  0x00
 0x00  0x00  0x00  0x00  0x00  0x00  0x00  0x00
 0x00  0x00  0x00  0x00  0x00  0x00  0x00  0x00
 0x00  0x00  0x00  0x00  0x00  0x00  0x00  0x00

I mean, reinterpret will now work regardless of whether your AbstractArray had contiguous memory and will emulate the system endianness for you!

(sorry for the snark; I know that this was not the reason for the reinterpret overhaul, but it amused me to no end)