Fastest way to swap byte order in-place (ntoh)

Originally we had

real_data = ntoh.(reinterpret(T, rawdata))

where T is some element type such as Float64, and rawdata is always a UInt8 vector.
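To make the starting point concrete, here is a small sketch of what that one-liner does. The sample bytes and the choice of UInt32 are only for illustration; they are not from our real data.

```julia
# Eight bytes holding two big-endian (network order) 32-bit integers: 1 and 256.
rawdata = UInt8[0x00, 0x00, 0x00, 0x01,
                0x00, 0x00, 0x01, 0x00]

# reinterpret views the bytes as UInt32 in host byte order;
# ntoh then converts each element from network order to host order.
real_data = ntoh.(reinterpret(UInt32, rawdata))

# The broadcast allocates a fresh Vector{UInt32}; rawdata itself is unchanged.
println(real_data)
```

The result is the same on little- and big-endian hosts, because ntoh is a no-op where the host order already matches network order.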

After some help from Slack, we’ve moved to:

real_data = GC.@preserve rawdata ntoh.(unsafe_wrap(Array, Ptr{_eltype}(pointer(rawdata)), dp÷_size))

where dp = length(rawdata), _eltype is the element type (T above), and _size = sizeof(_eltype).

The latest attempt is to perform ntoh manually before unsafe_wrap, hoping to get SIMD since we know ahead of time that every group of N bytes has to be swapped:

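@inline function fast_ntoh!(rawdata, ::Type{T}) where T
    _size = sizeof(T)
    # Visit the first byte of every group; the `+ 1` keeps the last group in range.
    @inbounds @views @simd for i in 1:_size:length(rawdata) - _size + 1
        r = i:i+_size-1
        # Reverse the bytes of this group in place.
        rawdata[r] .= rawdata[reverse(r)]
    end
end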

which doesn’t seem to be fast. I’m wondering whether there are some low-level things we can do to speed up ntoh, since we know the grouping.
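For comparison, one direction that avoids slicing byte ranges is to loop over whole elements through a reinterpret view and write the swapped values back into the same buffer. This is only a minimal sketch: the name swap_groups! is made up for illustration, it assumes length(rawdata) is a multiple of sizeof(T), and I have not measured whether the loop actually vectorizes.

```julia
# Hypothetical helper: convert every sizeof(T)-byte group of `rawdata`
# from network order to host order, in place.
function swap_groups!(rawdata::Vector{UInt8}, ::Type{T}) where {T}
    # A view of the same memory with element type T (no copy is made).
    v = reinterpret(T, rawdata)
    @inbounds @simd for i in eachindex(v)
        # ntoh swaps the bytes on little-endian hosts, no-op on big-endian ones.
        v[i] = ntoh(v[i])
    end
    return rawdata
end

# Illustrative input: two big-endian UInt32 values.
raw = UInt8[0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08]
swap_groups!(raw, UInt32)
vals = collect(reinterpret(UInt32, raw))
# vals == UInt32[0x01020304, 0x05060708]
```

After the call, the buffer can be wrapped or reinterpreted as T directly, without a further ntoh broadcast.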


https://github.com/JuliaLang/julia/issues/42227
