When to use or not use `reinterpret`?

I reinterpret an array of bytes (UInt8) as Int64 like this

dict = reinterpret(Int64, uncompressed_data)

and then access dict at a bunch of essentially random indices, like so

for i in indices
  do_something(dict[i])
end

So I am accessing dict[i] like a random-access array. Is this a bad way to use reinterpret? I think using unsafe_wrap to create dict results in faster random access overall, although the code is complex, so I need to write a simple MWE to confirm.
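
A minimal sketch of what such an MWE could look like, assuming a small made-up byte buffer in place of uncompressed_data and made-up indices. In Julia 1.x, reinterpret on an Array returns a lazy wrapper over the original bytes; collect turns it into a plain Vector{Int64} with its own memory, which gives a baseline to time against. Whether the two differ measurably depends on the Julia version and the workload, so this is only a starting point for benchmarking.

```julia
# Hypothetical stand-in for `uncompressed_data`: 8 bytes per Int64 value.
uncompressed_data = collect(reinterpret(UInt8, Int64[10, 20, 30, 40]))

# Lazy view: no copy is made, the bytes are decoded on each access.
vals_view = reinterpret(Int64, uncompressed_data)

# Materialized copy: a plain Vector{Int64} that owns its own memory.
vals_copy = collect(vals_view)

# Hypothetical "essentially random" indices.
indices = [4, 1, 3, 1]

# Both containers return the same values for the same indices.
sum_view = sum(vals_view[i] for i in indices)
sum_copy = sum(vals_copy[i] for i in indices)
```

Timing the two loops (for example with BenchmarkTools, if available) on realistic data sizes would show whether the wrapper costs anything in practice.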

… naming an array dict isn’t a particularly good choice IMHO…


Anyway, if you get the data directly as a pointer or read it from somewhere else, you should create the array with the right type from the start.

If you already have the data in a normal Array, then creating another array from it using unsafe_wrap is illegal. You must not pass a pointer obtained from any Julia object to unsafe_wrap. Doing so can crash your program, and you are just lucky that it didn’t.
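
As a sketch of the alternative for data that already lives in a Julia Array: reinterpret gives a second view of the same memory without any pointer handling. The buffer and names below are made up for illustration, and the check on which bytes change assumes 8-byte Int64 elements.

```julia
# Hypothetical buffer owned by Julia: 16 zero bytes, i.e. room for two Int64 values.
bytes = zeros(UInt8, 16)

# View the same memory as Int64 without copying and without raw pointers.
ints = reinterpret(Int64, bytes)

# Writing through the view changes the underlying bytes of the second element.
ints[2] = 1

first_half_untouched = all(iszero, bytes[1:8])
second_half_changed = any(!iszero, bytes[9:16])
```

Because the view keeps a reference to bytes, the garbage collector knows the memory is still in use, which is not the case for an array built from a raw pointer.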

That’s a good point. Parquet calls the array a “dictionary” though, so the name comes from the Parquet reader. Maybe I’ll call it vec_dict or something.

Awesome to know.

Sorry for digging out this old topic, but I just stumbled across this in another context. It seems that this is exactly what’s done in RealFFTs.jl:

@yuyichao, does that mean that this shouldn’t be implemented in this way in RealFFTs.jl, or was that more of a simplified/exaggerated statement for newcomers?

Same question: what, then, is the correct usage of unsafe_wrap?

This. As documented.

For wrapping foreign, non-Julia memory. As discussed above.
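
A minimal sketch of that use case, assuming the foreign memory comes from C's malloc (reached here through Libc.malloc); in real code the pointer would more typically be returned by a C library via ccall. The size and values are made up.

```julia
n = 4

# Memory allocated by C's malloc; Julia's garbage collector does not manage it.
raw = Libc.malloc(n * sizeof(Int64))
raw == C_NULL && error("malloc failed")
ptr = Ptr{Int64}(raw)

# Wrap the foreign buffer as a Vector{Int64} without copying.
# own = true asks Julia to call free on the pointer once `arr` is no longer referenced.
arr = unsafe_wrap(Array, ptr, n; own = true)

# malloc does not initialize memory, so fill it before reading.
fill!(arr, 0)
arr[2] = 42
```

With own = false (the default), freeing the buffer stays the caller's responsibility, and the array must not be used after the memory has been freed.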

And what’s the correct way to implement this, if unsafe_wrap cannot be used?

Where?

help?> unsafe_wrap
search: unsafe_wrap unsafe_write unsafe_swap!

  unsafe_wrap(Array, pointer::Ptr{T}, dims; own = false)

  Wrap a Julia Array object around the data at the address given by pointer, without making a copy. The pointer element type T determines
  the array element type. dims is either an integer (for a 1d array) or a tuple of the array dimensions. own optionally specifies whether
  Julia should take ownership of the memory, calling free on the pointer when the array is no longer referenced.

  This function is labeled "unsafe" because it will crash if pointer is not a valid memory address to data of the requested length. Unlike
  unsafe_load and unsafe_store!, the programmer is responsible also for ensuring that the underlying data is not accessed through two arrays
  of different element type, similar to the strict aliasing rule in C.