Custom IO performance - loading variables from byte array

I want to load several variables of different bitstypes from a packed byte vector.
For example, an Int16, an Int32, and an Int16 from an 8-byte chunk of memory.

I’ve tried several approaches to find the most efficient way:

  1. Using IOBuffer.
  2. Using raw pointers with manual unrolled loop.
  3. Using map with raw pointers.
  4. Using a generated function that produces code with raw pointers.

So, I have a couple of questions:

  1. Why can’t a local IOBuffer be optimized out, and why does it still allocate?
  2. Why does map box the ind variable captured from the outer scope, and how can I work around that?
  3. Is it possible to rely on type inference to write a fast function without @generated code?

Source code:

function load_io(vect::Vector{UInt8})
    io = IOBuffer(vect)
    x1 = read(io, Int16)
    x2 = read(io, Int32)
    x3 = read(io, Int16)
    x1, x2, x3
end

function load_unroll(vect::Vector{UInt8})
    GC.@preserve vect begin
        ind::Int = 1

        ptr1::Ptr{Int16} = pointer(vect, ind)
        x1 = unsafe_load(ptr1)
        ind += sizeof(Int16)

        ptr2::Ptr{Int32} = pointer(vect, ind)
        x2 = unsafe_load(ptr2)
        ind += sizeof(Int32)

        ptr3::Ptr{Int16} = pointer(vect, ind)
        x3 = unsafe_load(ptr3)
        ind += sizeof(Int16)

    end
    x1, x2, x3
end

function load_map(vect::Vector{UInt8})
    GC.@preserve vect begin
        ind::Int = 1
        out = map((Int16, Int32, Int16)) do T
            p::Ptr{T} = pointer(vect, ind)
            x = unsafe_load(p)
            ind += sizeof(T)
            x
        end
    end
    out
end

@generated function load_gen(vect::Vector{UInt8})
    exprs = Expr[]
    for T in (Int16, Int32, Int16)
        ex = quote
            let
                p::Ptr{$T} = pointer(vect, ind)
                x = unsafe_load(p)
                ind += sizeof($T)
                x
            end
        end
        push!(exprs, ex)
    end
    loop_unroll = :(tuple($(exprs...)))

    out_expr = quote
        GC.@preserve vect begin
            ind::Int = 1
            $loop_unroll
        end
    end
    return out_expr
end

using BenchmarkTools

bytes = UInt8[0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8]

load_io(bytes) == load_unroll(bytes) == load_map(bytes) == load_gen(bytes)

@btime load_io($bytes)      # 14.227 ns (1 allocation: 64 bytes)
@btime load_unroll($bytes)  # 1.399 ns (0 allocations: 0 bytes)
@btime load_map($bytes)     # 866.129 ns (13 allocations: 256 bytes)
@btime load_gen($bytes)     # 1.399 ns (0 allocations: 0 bytes)

@code_warntype load_io(bytes)
@code_warntype load_unroll(bytes)
@code_warntype load_map(bytes) # why ind is Core.Box?
@code_warntype load_gen(bytes)
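On question 2, my understanding is that the box appears because the do-block closure reassigns a captured variable (ind), which hits Julia's known closure-capture lowering limitation. A sketch (not benchmarked) that precomputes the offsets so nothing captured is mutated:

```julia
# Sketch: avoid the Core.Box by never reassigning a captured variable.
# Offsets are precomputed, so the closure only reads `vect`.
function load_map_nobox(vect::Vector{UInt8})
    Ts = (Int16, Int32, Int16)
    offs = (1, 1 + sizeof(Int16), 1 + sizeof(Int16) + sizeof(Int32))
    GC.@preserve vect begin
        map(Ts, offs) do T, off
            unsafe_load(Ptr{T}(pointer(vect, off)))
        end
    end
end

bytes = UInt8[0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8]
load_map_nobox(bytes)  # (513, 100992003, 2055)
```

The name load_map_nobox is mine; check @code_warntype on your Julia version to confirm the box is gone and the tuple element types are inferred.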

I’m kind of surprised unsafe_load works. I thought an Int32 load needed to be on a 4-byte boundary. But I haven’t touched assembly in years.
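A quick sanity check suggests it does work at an unaligned address, at least here; this is architecture-dependent (x86-64 permits unaligned scalar loads, while other ISAs may be slower or trap):

```julia
# Check that unsafe_load tolerates an address that is not 4-byte aligned.
bytes = UInt8[0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8]
x = GC.@preserve bytes unsafe_load(Ptr{Int32}(pointer(bytes, 2)))  # odd offset
x == reinterpret(Int32, bytes[2:5])[1]  # little-endian 0x05040302 -> true
```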

As for question 1, my guess is that IOBuffer is a mutable struct and therefore causes the allocation when it’s created.
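If that's right, sidestepping IOBuffer entirely might be enough. A pointer-free sketch using reinterpret over views, which stays bounds-checked and should be allocation-free on recent Julia versions (the function name is mine, and I haven't benchmarked it):

```julia
# Sketch: no IOBuffer, no raw pointers -- reinterpret byte views directly.
function load_reinterpret(v::Vector{UInt8})
    x1 = first(reinterpret(Int16, @view v[1:2]))
    x2 = first(reinterpret(Int32, @view v[3:6]))
    x3 = first(reinterpret(Int16, @view v[7:8]))
    x1, x2, x3
end

bytes = UInt8[0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8]
load_reinterpret(bytes)  # (513, 100992003, 2055)
```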

Yes, I’m reading the Int32 from the 4-byte chunk in the middle ([0x3, 0x4, 0x5, 0x6]), and 2 bytes for each Int16 on the edges.
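Regarding question 3: a plain recursive function over a tuple of types may get you there without @generated. The compiler specializes on the tuple's length and element types, so the recursion should unroll into straight-line loads, much like load_gen. A sketch (names are mine; verify inference with @code_warntype on your version):

```julia
# Sketch without @generated: recurse on a tuple of types; inference
# specializes on the tuple and unrolls the recursion.
load_rec(vect::Vector{UInt8}, ::Tuple{}, ind::Int) = ()
function load_rec(vect::Vector{UInt8}, Ts::Tuple, ind::Int)
    T = first(Ts)
    x = GC.@preserve vect unsafe_load(Ptr{T}(pointer(vect, ind)))
    (x, load_rec(vect, Base.tail(Ts), ind + sizeof(T))...)
end
load_rec(vect::Vector{UInt8}) = load_rec(vect, (Int16, Int32, Int16), 1)

bytes = UInt8[0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8]
load_rec(bytes)  # (513, 100992003, 2055)
```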