Hi! There is a dataset loaded by a 3rd-party library in the form of a raw float array. I want to view this data as a vector of structs A, each having a layout like this: <F32><F32><F32>(B*8*60*3*4)<F32>, i.e. three floats, then an 8*60*3*4 array of structs B, and then another float. Struct B is simply three floats.
However, I cannot find a reasonable way to do this using reinterpret or otherwise. reinterpret requires a bitstype, so something like StaticArrays is needed for the inner arrays. However, StaticArrays has poor performance, especially in compilation, for arrays that large.
Is there any other way to view a raw buffer array as an array of large nested structs?
I am not sure I understand the specs, but what about
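# one record: 3 header floats, 8*60*3*4 structs of 3 floats each, 1 trailing float
data = Float64.(1:(3 + 3*8*60*3*4 + 1))
struct B{T}
    a::T
    b::T
    c::T
end
header = data[1:3]                                      # copies the three leading floats
Bs = reinterpret(B{Float64}, @view data[4:(end - 1)])   # no copy, shares memory with data
footer = data[end]
That said, a static array type instead of B{Float64} should be fine too.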
I mean something more along these lines:
struct B
    a::Float32
    b::Float32
    c::Float32
end
struct A
    a::Float32
    b::Float32
    c::Float32
    d::StaticArray{Tuple{3,8,60,3,4}, B}
    e::Float32
end
reinterpret(A, data)
This basically works, but many operations are slow due to the large staticarray.
I don’t understand the motivation to use static arrays like this; this is known to be suboptimal.
Sure, this was just to explain what I need. Replacing StaticArray with Array does not work here, because reinterpret requires a bits type.
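To make the bits-type restriction concrete, here is a minimal sketch. The names Point3 and WithVector are made up for this illustration and do not come from the thread; the sizes are tiny on purpose.

```julia
# Hypothetical types, for illustration only.
struct Point3
    a::Float32
    b::Float32
    c::Float32
end

struct WithVector
    a::Float32
    d::Vector{Point3}   # heap-allocated array, stored in the struct as a reference
end

data = Float32.(1:6)

# Point3 is a bits type, so the buffer can be viewed as 2 Point3 values without copying.
pts = reinterpret(Point3, data)

isbitstype(Point3)       # true
isbitstype(WithVector)   # false, so reinterpret(WithVector, data) throws an error
```

A struct with an Array field only holds a reference to separately allocated memory, which is why its layout cannot match a flat float buffer.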
Is there a reason you can’t do what I suggested above?
I would really like to use the resulting array x as x[i].a, x[i].d[a, b, c, d, e] and so on. As I understand it, your code handles a single element of this (a single A struct) and cannot be applied to the whole array of them.
I don’t think what you’re asking for is possible without gigantic static arrays (or an equivalent giant bitstype), because the memory layout of the natural non-bitstype Julia structs would simply be different from the layout of your vector.
But there may still be a way to get what you want. If your goal is just to avoid expensive copying, then perhaps you can construct each d field from a reinterpret(B, view(data, i:j)), where i:j is the relevant index range for each A’s data. You’d still need to copy the extra floats in each A, but that should be very cheap.
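A rough sketch of that suggestion, under some assumptions: the names ARecord, records, NB and RECLEN are invented here, and the inner array is shrunk to 2×3 so the example runs instantly (the real case would use 8*60*3*4 elements).

```julia
# Hypothetical names; small sizes so the example runs quickly.
struct B
    a::Float32
    b::Float32
    c::Float32
end

const NB = 2 * 3                 # stand-in for the 8*60*3*4 inner B structs
const RECLEN = 3 + 3 * NB + 1    # floats per record: header, Bs, trailing float

struct ARecord{V<:AbstractArray{B}}
    a::Float32
    b::Float32
    c::Float32
    d::V          # view into the original buffer, not a copy
    e::Float32
end

function records(data::Vector{Float32})
    n = div(length(data), RECLEN)
    map(1:n) do k
        off = (k - 1) * RECLEN
        # reinterpret the float range that holds the Bs, then give it its shape
        inner = reinterpret(B, view(data, (off + 4):(off + 3 + 3 * NB)))
        d = reshape(inner, 2, 3)
        # the four scalar floats are copied into the struct
        ARecord(data[off + 1], data[off + 2], data[off + 3], d, data[off + RECLEN])
    end
end

data = Float32.(1:(2 * RECLEN))
x = records(data)
x[2].a          # 23.0f0
x[1].d[1, 1]    # B(4.0f0, 5.0f0, 6.0f0)
```

Writes into x[i].d go through to data, while a, b, c and e are plain copies.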
Thanks! This is probably the closest possible solution. However, copying the floats means that when they are modified by assigning to a field in the resulting array (of mutable structs), these changes do not propagate to the original data buffer. Currently this is good enough for me, but I’m still interested if someone can think of a way to make a true view of a raw buffer array as an array of complex structs like in this example.
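One pointer-free way to get write-through fields might be a thin wrapper that stores the parent array plus an offset and forwards property access to it. This is only a sketch: RecView, RecVector, DIMS and NFLOAT are hypothetical names, the inner array is treated as plain floats, and its size is shrunk to 2×3 for the example.

```julia
# Hypothetical names and deliberately small sizes.
const DIMS = (2, 3)                    # stand-in for (8, 60, 3, 4)
const NFLOAT = 3 + prod(DIMS) + 1      # floats per record

struct RecView{P<:AbstractVector{Float32}}
    parent::P
    offset::Int      # number of floats that precede this record in the buffer
end

function Base.getproperty(r::RecView, s::Symbol)
    p, off = getfield(r, :parent), getfield(r, :offset)
    s === :a && return p[off + 1]
    s === :b && return p[off + 2]
    s === :c && return p[off + 3]
    s === :d && return reshape(view(p, (off + 4):(off + 3 + prod(DIMS))), DIMS)
    s === :e && return p[off + NFLOAT]
    error("no property $s")
end

function Base.setproperty!(r::RecView, s::Symbol, v)
    p, off = getfield(r, :parent), getfield(r, :offset)
    i = s === :a ? 1 : s === :b ? 2 : s === :c ? 3 :
        s === :e ? NFLOAT : error("cannot set $s")
    p[off + i] = v    # writes straight into the original buffer
end

struct RecVector{P<:AbstractVector{Float32}} <: AbstractVector{RecView{P}}
    parent::P
end
Base.size(rv::RecVector) = (div(length(rv.parent), NFLOAT),)
function Base.getindex(rv::RecVector, i::Int)
    @boundscheck checkbounds(rv, i)
    RecView(rv.parent, (i - 1) * NFLOAT)
end

data = Float32.(1:(2 * NFLOAT))
x = RecVector(data)
x[1].e = -1          # data[10] is now -1
x[2].d[1, 2] = 0     # data[16] is now 0
```

Because every access goes through ordinary array indexing, bounds checks and garbage-collector safety come for free, at the cost of working only on Julia arrays rather than arbitrary memory.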
The most realistic variant is something like
julia> struct arrPtr <: AbstractArray{Float32, 4}
           ptr::Ptr{Float32}
       end
julia> Base.size(::arrPtr) = (8, 60, 3, 4)
julia> Base.getindex(a::arrPtr, i::Int) = unsafe_load(a.ptr, i)
julia> Base.setindex!(a::arrPtr, x, i::Int) = unsafe_store!(a.ptr, x, i)
julia> Base.IndexStyle(::Type{arrPtr}) = Base.IndexLinear()
julia> struct Bptr
ptr::Ptr{Float32}
end
julia> function Base.getproperty(b::Bptr, s::Symbol)
           ptr = getfield(b, 1)
           if s == :a
               return unsafe_load(ptr, 1)
           elseif s == :b
               return unsafe_load(ptr, 2)
           elseif s == :c
               return unsafe_load(ptr, 3)
           elseif s == :d
               return arrPtr(ptr + 12)  # d starts 12 bytes (3 floats) into the record
           elseif s == :e
               # d occupies elements 4 through 3 + 5760, so e is the element after it
               return unsafe_load(ptr, 4 + 5760)
           else
               error()
           end
       end
julia> function Base.setproperty!(b::Bptr, s::Symbol, v::Float32)
           ptr = getfield(b, 1)
           if s == :a
               return unsafe_store!(ptr, v, 1)
           elseif s == :b
               return unsafe_store!(ptr, v, 2)
           elseif s == :c
               return unsafe_store!(ptr, v, 3)
           elseif s == :e
               return unsafe_store!(ptr, v, 4 + 5760)
           else
               error()
           end
       end
julia> struct asBptr <: AbstractVector{Bptr}
           ptr::Ptr{Float32}
           len::Int
           keepalive::Any
       end
julia> # record size is 16 + 4*5760 bytes, so divide the byte size of arr, not its element count
julia> asBptr(arr::Array{Float32}) = asBptr(pointer(arr), div(sizeof(arr), 16 + 4*5760), arr)
julia> Base.size(buf::asBptr) = (buf.len,)
julia> Base.getindex(buf::asBptr, i::Int) = Bptr(buf.ptr + (16 + 4*5760)*(i - 1))
julia> Base.IndexStyle(::Type{asBptr}) = Base.IndexLinear()
This can then be used like this:
julia> ab = asBptr(arr)
43-element asBptr:
Bptr(Ptr{Float32} @0x00007f3acb016040)
Bptr(Ptr{Float32} @0x00007f3acb01ba50)
...
julia> ab[14].a
0.4085027f0
julia> ab[14].d[:, 2, 2, 2]
8-element Array{Float32,1}:
0.15881789
0.41884196
0.07280159
0.8976873
0.97513235
0.2145493
0.5796931
0.08459842
julia> ab[14].d[:, 2, 2, 2].=0;
This kind of approach is effectively the only option if you are dealing with a memory-mapped file and mutations need to be visible to other processes.
Note that there are no bounds checks. asBptr keeps the underlying storage alive, but Bptr and arrPtr don’t (so make sure that the garbage collector does not steal your storage away!).
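One way to guard against the garbage-collector problem when working with raw pointers is GC.@preserve from Base. A minimal sketch follows; the function name first_via_pointer is made up for this example.

```julia
arr = rand(Float32, 8)

# Hypothetical helper: reads the first element through a raw pointer.
function first_via_pointer(v::Vector{Float32})
    # GC.@preserve keeps v alive for the duration of the block,
    # so the pointer taken from it stays valid while it is used.
    GC.@preserve v begin
        p = pointer(v)
        unsafe_load(p, 1)
    end
end

first_via_pointer(arr) == arr[1]   # true
```

The same pattern would apply to code that holds a Bptr or arrPtr: wrap the region that uses them in GC.@preserve on the array that owns the memory.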
However, this is not type stable, right? Due to the if block in getproperty returning different types that depend on the value of s passed.
If a function like that getproperty method above is called with a constant (as is the case when you do foo.bar), the compiler can propagate that constant through the function and figure out the return type, even in cases that look type-unstable. For example:
julia> function looks_type_unstable(s::Symbol)
           if s == :a
               1.0
           else
               "hello"
           end
       end
looks_type_unstable (generic function with 1 method)
julia> function passes_a_constant()
           looks_type_unstable(:a)
       end
passes_a_constant (generic function with 1 method)
julia> @code_warntype(passes_a_constant())
Body::Float64
1 ─ return 1.0
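The same check can be done for a custom getproperty method with @inferred from the Test standard library, which throws if the inferred return type is not concrete. This is a sketch with a made-up type Rec standing in for Bptr.

```julia
using Test

# Hypothetical wrapper type, stands in for Bptr.
struct Rec
    x::Float32
    name::String
end

function Base.getproperty(r::Rec, s::Symbol)
    if s === :a
        return getfield(r, :x)       # Float32
    elseif s === :label
        return getfield(r, :name)    # String
    else
        error("unknown property $s")
    end
end

# r.a lowers to getproperty(r, :a), so the symbol is a compile-time constant here.
get_a(r) = r.a
get_label(r) = r.label

r = Rec(1.5f0, "demo")
@inferred get_a(r)        # passes: inferred as Float32
@inferred get_label(r)    # passes: inferred as String
```

If the symbol is only known at run time, for example getproperty(r, s) with s read from user input, the compiler can no longer pick a branch and the result is inferred as a union of the possible types.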