I reinterpret an array of bytes (UInt8) as Int64 like this:
dict = reinterpret(Int64, uncompressed_data)
and then I index dict with a bunch of essentially random indices, like so:
for i in indices
    do_something(dict[i])
end
So I am accessing dict[i] like a random-access array. Is this a bad way to use reinterpret? I think using unsafe_wrap to create the dict results in faster random access overall. The code is complex, though, so I need to write a simple MWE to confirm.
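A minimal sketch of what such an MWE could look like, assuming the comparison of interest is between indexing the reinterpreted view and indexing a plain Vector{Int64} copy of the same data. The names viewed, copied, and sum_at are hypothetical, and the random bytes and indices are stand-ins for the real Parquet data:

```julia
# Hypothetical stand-ins for the data in the question.
uncompressed_data = rand(UInt8, 8 * 1_000)   # raw bytes; length must be a multiple of 8
indices = rand(1:1_000, 10_000)              # essentially random access pattern

# View the bytes as Int64 without copying (this is a ReinterpretArray).
viewed = reinterpret(Int64, uncompressed_data)

# Copy once into an ordinary Vector{Int64}.
copied = collect(viewed)

# Sum the elements at the given indices; works for any vector of Int64.
function sum_at(v, idx)
    s = zero(Int64)
    for i in idx
        s += v[i]   # Int64 addition wraps on overflow, so this never throws
    end
    return s
end

# Both access paths must see the same values.
@assert sum_at(viewed, indices) == sum_at(copied, indices)
```

Timing sum_at(viewed, indices) against sum_at(copied, indices), for instance with @time after a warm-up call, would then show whether the view is actually slower for this access pattern.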
… naming an array dict isn’t a particularly good choice IMHO…
Anyway, if you get the data directly as a pointer or read it from somewhere else, you should create the array with the right type from the start.
If you already have the data in a normal Array, then creating another array from it using unsafe_wrap is illegal. You must not pass a pointer obtained from any Julia object to unsafe_wrap. Doing so can crash your program, and you are just lucky that it didn’t.
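As a rough illustration of "the right type from the start", here is a sketch that assumes the bytes arrive through an IO stream and that the element count is known. The IOBuffer and the names expected and typed are hypothetical stand-ins for a real file or decompressed stream:

```julia
# Hypothetical byte source standing in for a file or decompressed stream.
expected = Int64[10, 20, 30, 40]
io = IOBuffer()
write(io, expected)   # writes the raw bytes of the array in native byte order
seekstart(io)         # rewind so the bytes can be read back

# Allocate the destination with its final element type and fill it in place,
# so no UInt8 array and no reinterpret or unsafe_wrap step is needed.
typed = Vector{Int64}(undef, length(expected))
read!(io, typed)

@assert typed == expected
```

Because typed is an ordinary Vector{Int64} that owns its own memory, there is no second array aliasing the same bytes.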
That’s a good point. Parquet calls the array a “dictionary”, though, so the name comes from the Parquet reader. Maybe I’ll call it vec_dict or something.
Sorry for digging out this old topic, but I just stumbled across this in another context. It seems that this is exactly what’s done in RealFFTs.jl:
@yuyichao, does that mean that this shouldn’t be implemented in this way in RealFFTs.jl, or was that more of a simplified/exaggerated statement for newcomers?
help?> unsafe_wrap
search: unsafe_wrap unsafe_write unsafe_swap!
unsafe_wrap(Array, pointer::Ptr{T}, dims; own = false)
Wrap a Julia Array object around the data at the address given by pointer, without making a copy. The pointer element type T determines
the array element type. dims is either an integer (for a 1d array) or a tuple of the array dimensions. own optionally specifies whether
Julia should take ownership of the memory, calling free on the pointer when the array is no longer referenced.
This function is labeled "unsafe" because it will crash if pointer is not a valid memory address to data of the requested length. Unlike
unsafe_load and unsafe_store!, the programmer is responsible also for ensuring that the underlying data is not accessed through two arrays
of different element type, similar to the strict aliasing rule in C.
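For contrast, a small sketch of the case the docstring describes, assuming the memory comes from outside Julia's own objects. Libc.malloc stands in here for a buffer returned by a C library, and the names ptr and arr are hypothetical:

```julia
n = 4

# Memory allocated outside any Julia object; a stand-in for a C library buffer.
ptr = Ptr{Int64}(Libc.malloc(n * sizeof(Int64)))
ptr == C_NULL && error("malloc failed")

# Wrap it without copying. With own = true, Julia calls free on the pointer
# once the array is no longer referenced, so it must not be freed manually.
arr = unsafe_wrap(Array, ptr, n; own = true)

# malloc does not initialize the memory, so fill it before reading.
arr .= 1:n

@assert sum(arr) == 10
```

Here the pointer is only ever accessed through one array of one element type, which is the condition the last sentence of the docstring asks the programmer to guarantee.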