Why does this pointer code seem to always work?


#1

I wrote some code expecting it to break. Here is an MWE:

struct PointerWrapper{T}
    ptr::Ptr{T}
end


struct BufferWrapper{T}
    pw::PointerWrapper{T}
    data::Vector{T}
end


function BufferWrapper(data::Vector{T}) where T
    pw = PointerWrapper{T}(pointer(data))
    BufferWrapper{T}(pw, data)
end

load(bw::BufferWrapper, i::Integer) = unsafe_load(bw.pw.ptr, i)


bw = BufferWrapper([2,3,5,7])
load(bw, 2)

I’ve tried several different variations of this, and it never seems to break: PointerWrapper always seems to get the appropriate pointer. The reason I find this surprising is that I thought that, since BufferWrapper is a struct, data might in general be copied in the last line of the BufferWrapper function.

My question is whether this safety is guaranteed. It would be really cool if it is, because it would simplify the design considerably. The only alternative I could see would be to make BufferWrapper a mutable struct and then create the contained PointerWrapper object only after I create the BufferWrapper with the appropriate data (which seems both ugly and inefficient).


#2

Being a struct has nothing to do with it. The data field of BufferWrapper is just a reference to a Vector{T} that is located elsewhere in memory, so the constructor does not make a copy of the array.

(A struct (immutable) in Julia can still have fields that are references to mutable data.)
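A minimal sketch of how one could check this, reusing the definitions from the first post (the only assumption is that `pw` is typed as `PointerWrapper{T}`; the variable names `same_object`, `same_address`, and `first_via_wrapper` are purely illustrative):

```julia
struct PointerWrapper{T}
    ptr::Ptr{T}
end

struct BufferWrapper{T}
    pw::PointerWrapper{T}
    data::Vector{T}
end

function BufferWrapper(data::Vector{T}) where T
    pw = PointerWrapper{T}(pointer(data))
    BufferWrapper{T}(pw, data)
end

data = [2, 3, 5, 7]
bw = BufferWrapper(data)

# The field refers to the very same array object, not a copy.
same_object = bw.data === data

# So the stored pointer still addresses that array's memory.
same_address = bw.pw.ptr == pointer(bw.data)

# A mutation through the original binding is visible through the wrapper.
data[1] = 11
first_via_wrapper = bw.data[1]
```

`===` tests object identity, so `same_object` being `true` is exactly the statement that the constructor stored a reference rather than a copy.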


#3

Ah, that makes sense. So is this code predictably safe as long as the BufferWrapper exists?

In fact, let me ask a more general question. I have a use case where I have some (potentially huge and memory-mapped) buffer of data, let’s call it B::Vector{UInt8}, and then some objects which basically reinterpret portions of the data B and provide an AbstractVector interface. Right now, I have these objects as essentially wrappers of pointers, a bit like my PointerWrapper example above. Alternatively, I could have made them objects which simply hold references to B. The reason I didn’t do this was that I didn’t know what implications that would have for B itself. I didn’t ever want to risk chunks of it being “copied” somehow, or to destroy the memory-mapped nature of B. I’m realizing now that I probably should have done more research first. So, my question is, can I be confident that things won’t get screwed up if I do away with the pointers and just carry around B directly?


#4

Carrying around a reference to an array B will never make copies automatically. It is generally the right thing to do rather than messing with pointers.

(Assignments A = B don’t make copies. Passing parameters f(B) doesn’t make a copy. Creating a data structure Foo(B) with a B member doesn’t make a copy.)
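The three cases in parentheses can be checked with `===`. This is only a sketch: `Foo` and `touch!` are hypothetical names made up for the illustration, and `B` is a tiny stand-in for the real buffer.

```julia
# Hypothetical wrapper type with a B member.
struct Foo
    B::Vector{UInt8}
end

# Hypothetical helper that mutates its argument in place.
touch!(v) = (v[1] = 0xff; v)

B = zeros(UInt8, 4)

A = B            # assignment: a second name for the same array
touch!(B)        # argument passing: the function sees the same array
foo = Foo(B)     # wrapping: the field refers to the same array

assignment_shares = A === B
field_shares = foo.B === B
mutation_visible = A[1] == 0xff && foo.B[1] == 0xff

# Only an explicit copy produces a distinct array.
C = copy(B)
copy_is_distinct = C !== B
```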


#5

Hm, most of that I knew, but I was being extra paranoid. I think part of why I hadn’t thought more carefully about this is that my starting point was some existing code that made extensive use of pointers. My other concern is under what circumstances reinterpret is slower than unsafe_wrap. Another fear is that if I do, for instance, reinterpret(T, B[a:b]), I’ll basically be allocating B[a:b] twice, once to create the array and once to create a reinterpreted version of that array, whereas if I do unsafe_wrap I think I’m only doing the latter. It looks like some benchmarking is in order.

Thanks!
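For reference, a small sketch of the copy concern above. It assumes a Julia version (0.7 / 1.x or later) in which `reinterpret` accepts array views; the 16-byte `B` and the range `a:b` are made-up stand-ins for the real buffer. Slicing with `B[a:b]` allocates a new array, while `@view B[a:b]` does not, so reinterpreting the view keeps working on B's own memory:

```julia
B = collect(0x01:0x10)      # 16 bytes standing in for the large buffer
a, b = 1, 8

# B[a:b] allocates a fresh Vector{UInt8}; reinterpret then wraps that copy.
copied = reinterpret(UInt32, B[a:b])

# @view creates no copy; reinterpret wraps the view over B's memory.
viewed = reinterpret(UInt32, @view B[a:b])

# Writing to B shows which of the two still shares memory with it.
B[1] = 0xff
shares_memory = viewed[1] != copied[1]

# The view-based version can be traced back to B itself.
traces_to_B = parent(parent(viewed)) === B
```

Whether this is faster or slower than `unsafe_wrap` for a given access pattern is not something the sketch settles; that still needs the benchmarking mentioned above.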


#6

For this to work, and be future-proof, you would need to force the compiler to allocate this struct and to always use the pointer indirectly:

function load(bw::BufferWrapper, i::Integer)
    # Keep bw (and therefore bw.data) alive while the raw pointer is used.
    GC.@preserve bw begin
        return unsafe_load(bw.pw.ptr, i)
    end
end

But in that case, bw.data[i] may actually be faster / simpler / safer. Specifically, it is not the fastest possible implementation, but the fastest possible correct implementation.
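A side-by-side sketch of the two approaches, under some assumptions: the preserve macro is spelled `GC.@preserve` (its name in current Julia releases), `pw` is typed as `PointerWrapper{T}`, and `load_ptr` / `load_idx` are hypothetical names chosen here to tell the two versions apart.

```julia
struct PointerWrapper{T}
    ptr::Ptr{T}
end

struct BufferWrapper{T}
    pw::PointerWrapper{T}
    data::Vector{T}
end

BufferWrapper(data::Vector{T}) where T =
    BufferWrapper{T}(PointerWrapper{T}(pointer(data)), data)

# Pointer-based load: bw (and hence bw.data) is kept rooted for the GC
# while the raw pointer is dereferenced. No bounds check is performed.
load_ptr(bw::BufferWrapper, i::Integer) =
    GC.@preserve bw unsafe_load(bw.pw.ptr, i)

# Index-based load: bounds-checked, no raw pointer involved.
load_idx(bw::BufferWrapper, i::Integer) = bw.data[i]

bw = BufferWrapper([2, 3, 5, 7])
results_agree = all(load_ptr(bw, i) == load_idx(bw, i) for i in 1:4)
```

The index-based version throws a `BoundsError` for an out-of-range `i`, whereas the pointer-based one would read arbitrary memory.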