Originally we had

```julia
real_data = ntoh.(reinterpret(T, rawdata))
```

where `T` is some element type such as `Float64`, and `rawdata` is always a `UInt8` vector.
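For context, with made-up example values this looks like:

```julia
# Example values only (the real data comes from elsewhere):
rawdata = rand(UInt8, 8 * 10_000)            # raw big-endian bytes, always a Vector{UInt8}
T = Float64                                  # element type known ahead of time
real_data = ntoh.(reinterpret(T, rawdata))   # new array with host byte order
```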
After some help from Slack, we've moved to:

```julia
real_data = GC.@preserve rawdata ntoh.(unsafe_wrap(Array, Ptr{_eltype}(pointer(rawdata)), dp ÷ _size))
```

where `dp = length(rawdata)` and `_size = sizeof(_eltype)`.
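Wrapped up as a function (the name `decode_be` and the signature are just for illustration), the current version is roughly:

```julia
# Rough sketch of the current approach; decode_be is a placeholder name.
function decode_be(rawdata::Vector{UInt8}, ::Type{_eltype}) where _eltype
    dp = length(rawdata)
    _size = sizeof(_eltype)
    GC.@preserve rawdata begin
        # reinterpret the buffer without copying, then byte-swap into a new array
        wrapped = unsafe_wrap(Array, Ptr{_eltype}(pointer(rawdata)), dp ÷ _size)
        ntoh.(wrapped)
    end
end
```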
The latest attempt is to perform `ntoh` manually before `unsafe_wrap`, hoping to get SIMD since we know the byte-group size (every N bytes to swap) ahead of time:
```julia
@inline function fast_ntoh!(rawdata, ::Type{T}) where T
    _size = sizeof(T)
    # reverse each sizeof(T)-byte chunk in place (big-endian -> host order);
    # note the + 1 so the final chunk is included in the loop range
    @inbounds @views @simd for i in 1:_size:length(rawdata) - _size + 1
        r = i:i+_size-1
        rawdata[r] .= rawdata[reverse(r)]
    end
end
```
which doesn't seem to be fast. I'm wondering if there's anything low-level we can do to speed up `ntoh`, since we know the byte grouping ahead of time.
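For concreteness, a sketch of the kind of thing I mean (the helper name `inplace_ntoh!` is made up, and it assumes `length(rawdata)` is a multiple of `sizeof(T)`):

```julia
# Illustrative sketch only: swap bytes in place through a reinterpreted
# view, so each element becomes a single bswap rather than a byte-wise loop.
function inplace_ntoh!(rawdata::Vector{UInt8}, ::Type{T}) where T
    v = reinterpret(T, rawdata)  # zero-copy view with element type T
    v .= ntoh.(v)                # elementwise byte swap, in place
    return v
end
```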