Originally we had

```julia
real_data = ntoh.(reinterpret(T, rawdata))
```

where `T` is some element type such as `Float64` and `rawdata` is always a `UInt8` vector.
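For concreteness, here is a minimal, self-contained sketch of that approach on a hand-made big-endian buffer (the sample values are ours):

```julia
# 16 big-endian bytes encoding the Float64 values 1.0 and 2.0
rawdata = collect(reinterpret(UInt8, hton.([1.0, 2.0])))
T = Float64
real_data = ntoh.(reinterpret(T, rawdata))   # == [1.0, 2.0]
```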
After some help from Slack, we’ve moved to:

```julia
real_data = GC.@preserve rawdata ntoh.(unsafe_wrap(Array, Ptr{_eltype}(pointer(rawdata)), dp ÷ _size))
```

where `dp = length(rawdata)` and `_size = sizeof(_eltype)`.
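Wrapped up as a self-contained helper (the name `decode_unsafe` is ours), that looks like:

```julia
function decode_unsafe(rawdata::Vector{UInt8}, ::Type{_eltype}) where _eltype
    dp    = length(rawdata)
    _size = sizeof(_eltype)
    # keep rawdata rooted while we alias its memory through a raw pointer
    GC.@preserve rawdata begin
        wrapped = unsafe_wrap(Array, Ptr{_eltype}(pointer(rawdata)), dp ÷ _size)
        ntoh.(wrapped)   # the broadcast copies, so the result owns its memory
    end
end
```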
The latest attempt is to perform the ntoh manually before unsafe_wrap, hopefully getting SIMD since we know ahead of time that every N bytes form one group to swap:
```julia
@inline function fast_ntoh!(rawdata, ::Type{T}) where T
    _size = sizeof(T)
    # reverse each _size-byte group in place; the range has to reach the last
    # group, and reverse! avoids aliasing between source and destination
    @inbounds for i in 1:_size:length(rawdata) - _size + 1
        reverse!(rawdata, i, i + _size - 1)
    end
    return rawdata
end
```
which doesn’t seem to be fast. I’m wondering if there’s some low-level trick we can use to speed up ntoh, since we know the grouping ahead of time.
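One option (not from the code above, just a hedged sketch) is to byte-swap whole machine words with `bswap`, which LLVM usually lowers to SIMD byte-shuffle instructions. The helper `uint_for` below is ours, mapping an element size to the matching unsigned type; for primitive types such as `Float64`, swapping the underlying word is the same as applying `ntoh` to the element:

```julia
# hypothetical helper: unsigned word type with the same width as T
uint_for(::Type{T}) where T =
    sizeof(T) == 8 ? UInt64 :
    sizeof(T) == 4 ? UInt32 :
    sizeof(T) == 2 ? UInt16 : UInt8

function bswap_groups!(rawdata::Vector{UInt8}, ::Type{T}) where T
    U = uint_for(T)
    n = length(rawdata) ÷ sizeof(T)
    GC.@preserve rawdata begin
        # alias the same bytes as unsigned words and swap each word in place
        words = unsafe_wrap(Array, Ptr{U}(pointer(rawdata)), n)
        map!(bswap, words, words)
    end
    return rawdata
end
```

After `bswap_groups!(rawdata, Float64)`, the `unsafe_wrap` line from above (without the `ntoh.` broadcast) would give host-order values directly.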