FAQ: ReinterpretArray vs unsafe_wrap

foobar_lv2 · September 1, 2018, 4:21pm

There have been a lot of questions and discussions about reinterpret vs unsafe_wrap since the new reinterpret arrived in 0.7, so I thought I'd make a thread to collect some of the answers, and hopefully get some authoritative ones from @Keno archived here. I’ll start by describing how the current state looks to me. I hope to be corrected if I get something wrong, and I also hope for examples from others.

1. What is this about? We want to reinterpret an array’s buffer. The most common examples are (a) between T and SVector{T}, where T = Int64, Int32, Float64, Float32, etc., (b) real and complex, (c) various structs.
2. Why would you use unsafe_wrap instead of reinterpret?
3. Why and when is unsafe_wrap bad?
4. How do you safely use unsafe_wrap when you must?

Let’s expand on 1 with an example (Version 1.0.0 (2018-08-08)):

using StaticArrays
using Random
using LinearAlgebra
using BenchmarkTools

n=100;m=3000; 
M_sv=Vector{SVector{n,Float64}}(undef,m);
V_re = reinterpret(Float64,M_sv);
M_re=reshape(V_re, (n,m));
M_uw = unsafe_wrap(Array,reinterpret(Ptr{Float64},pointer(M_sv)), (n,m));
rand!(M_uw);

and now we can give a partial answer to 2: Certain operations are slow on ReinterpretArray between certain types on certain julia versions. Somehow I believed that the below had gotten patched in the meantime, but it is slow on the archlinux 1.0.0 version, so let’s see:

@btime sum(sum($M_sv)); #102.901 μs (3 allocations: 1.77 KiB)
@btime sum($V_re);#7.836 ms (0 allocations: 0 bytes)
@btime sum($M_re);#8.544 ms (0 allocations: 0 bytes)
@btime sum($M_uw);#103.828 μs (0 allocations: 0 bytes)
@btime *(M_re',M_re); 
res=Matrix{Float64}(undef,m,m);
@btime mul!(res,M_uw',M_uw);#51.057 ms (1 allocation: 16 bytes)
@btime mul!(res,M_re',M_re);#50.932 ms (1 allocation: 16 bytes)

So the answer to (2) is that reinterpret on arrays produces a ReinterpretArray (that you may need to reshape). This sometimes makes access slower (hopefully already/soon fixed in many cases), can sometimes give longer compile times, and may give you trouble if you or your dependencies dispatch on Array instead of more abstract array types. You or your dependencies might dispatch on Array because you want to hand your data over to C/Fortran/etc. code that expects a specific layout, or because you did not want to think about custom array types. We see that LinearAlgebra appears to be capable of passing the pointers of reinterpreted arrays through to julia’s BLAS, but you’ll need to check whether your dependencies get this right.
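The dispatch point can be seen with a small sketch that uses only Base. It uses example (b), complex to real, to avoid the StaticArrays dependency; `strict_total` and `generic_total` are hypothetical names made up for illustration, not functions from any package.

```julia
# Sketch: how a ReinterpretArray interacts with dispatch (Base only).
z = ComplexF64[1 + 2im, 3 + 4im]
r = reinterpret(Float64, z)          # lazy view over z's buffer, no copy

# Hypothetical functions, one per signature style.
strict_total(x::Array{Float64}) = sum(x)            # accepts only plain Arrays
generic_total(x::AbstractArray{Float64}) = sum(x)   # accepts any array type

r isa Array                          # false
r isa AbstractVector{Float64}        # true
applicable(strict_total, r)          # false: no method matches a ReinterpretArray
generic_total(r)                     # 10.0

c = collect(r)                       # materializes a plain Vector{Float64} (a copy)
strict_total(c)                      # 10.0
```

Calling collect is the safe way to get an Array when a dependency insists on one, at the cost of a copy; unsafe_wrap avoids the copy but brings the issues discussed in the rest of this thread.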

Now, why is unsafe_wrap problematic? I see three issues:

1. It is unstable. Internals may change, and with them the answers to all of these questions; your code may then produce wrong results.
2. Type-based alias analysis. This was the main reason for the new reinterpret (the old one used to be mostly equivalent to unsafe_wrap). An array is essentially a pointer to a buffer with some metadata, and the compiler assumes that the buffers of differently typed arrays cannot overlap (alias). This makes code faster, but may produce wrong results if you shuffle data inside the same buffer, once accessed via M_sv and once via M_uw. So, don’t do that!
3. Relocation and object lifetime. In the above example, M_sv believes that it owns the buffer and that nobody else holds pointers to it. If M_sv dies, the buffer will be freed and access to M_uw can corrupt memory. If you push! to M_sv, the buffer can be realloced, and access to M_uw can then corrupt memory (so don’t push to stuff that has living wraps!).

Now, how do you use unsafe_wrap safely? Well, don’t ever do anything that aliases in the same “context”, and make sure that the base lives at least one “context” longer than the unsafe_wrap. unsafe_wrap is very cheap: You can simply discard it after use and make a new wrap if you need it again (outside of inner loops). What does “context” mean? Well, as far as the compiler can see during optimization. I think that @noinline function boundaries are enough separation? That is, code like the following should be OK for aliasing if foo! and bar! are @noinline. Likewise, objects can be freed before execution reaches the line-number where they go out of scope, if the compiler infers that they are not used afterwards; but @noinline function boundaries should prevent the compiler from noticing this?

I’d like @Keno’s confirmation on these points before anyone trusts me on this. I am not sure how far IPO looks; is it necessary to jump through further hoops to prevent the compiler from discovering the juicy but poisonous no-alias information?

# foo! and bar! are assumed to be defined elsewhere and marked @noinline
@noinline function f(M_uw, M_sv)
    for i=1:10
        foo!(M_uw)
        bar!(M_sv)
    end
end

Keno · September 1, 2018, 5:13pm

This is pretty much correct. There are also alignment and data layout issues (e.g. you can’t unsafe_wrap an unaligned pointer such as a memory buffer as Float64). For the TBAA issue, you basically need to guarantee that you don’t access the pointer through any other array for as far as the compiler can see.

On @noinline function boundaries being enough separation: this is probably true enough for now, but I’m not comfortable guaranteeing that it will stay the case in the future. Inlining is mostly orthogonal to IPO, and in general @noinline would ideally be considered more of a hint.

The biggest problem with all of this code is that the lifetime of objects is unintuitive. Say that in your example M_sv owned the memory and M_uw was the reinterpreted one. Since the compiler can’t see the memory relationship between them, it would be perfectly legal for it to precompute the result of bar! (e.g. because it doesn’t depend on the exact data of M_sv) and then free M_sv (and thus the storage for M_uw) before it even got to the foo!. If you’re wrapping another julia-tracked value, the only safe thing to do is to preserve the memory owner using GC.@preserve.
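A minimal sketch of the GC.@preserve pattern, assuming a Vector{ComplexF64} as the memory owner (chosen so the example needs only Base). The helper name `with_real_view` is hypothetical. The wrap is created inside the preserved region, used there, and never escapes it; the owner is not touched through any other array while the wrap is in use.

```julia
# Hypothetical helper: run f on a 2 x n Float64 matrix that aliases z's buffer.
# z is kept alive by GC.@preserve for as long as the wrap is in use.
# f must not store or return the wrap, or it would outlive the protection.
function with_real_view(f, z::Vector{ComplexF64})
    GC.@preserve z begin
        p = reinterpret(Ptr{Float64}, pointer(z))
        w = unsafe_wrap(Array, p, (2, length(z)))
        f(w)   # value of the block, returned to the caller
    end
end

z = ComplexF64[1 + 2im, 3 + 4im]

# Read through the wrap: sums real and imaginary parts together.
total = with_real_view(sum, z)

# Write through the wrap: doubles every component of z in place.
with_real_view(z) do w
    w .*= 2
    nothing
end
```

After the second call, z holds the doubled values, since the wrap shares its buffer. The sketch only addresses lifetime; it does not remove the aliasing caveat, so the function passed in should access the data through the wrap alone.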

foobar_lv2 · September 4, 2018, 2:55pm

Thanks!

A more brutal way for aliasing could be to write

foo!(M_sv)
ccall(...) #does nothing, but don't tell the compiler
bar!(M_uw)

That way, we should prevent any aliasing shenanigans, also in future versions, since julia cannot know whether the ccall mutates or depends on M_sv or M_uw. Would I need to pass pointer(M_uw) and pointer(M_sv) to the ccall to achieve this effect?

A second question regarding garbage collection: It used to be possible to make it so that the new (wrapped) array keeps the buffer alive (the old reinterpret). Can this be recovered, e.g. by fiddling with the shared flags?

That way, the only remaining problem would be that the resulting unsafe_reinterpret carries wrong aliasing info; but this is irrelevant in most applications.
