Would a search directly calling memchr have disadvantages?

Just looking at the ScanByte package by @jakobnissen (thanks for giving me ideas! :slight_smile: ). It’s written to find the first occurrence of a given byte in a byte array. Just as a test I thought I can also call memchr iteratively to find all matches in an array like so:

(P.s. never used ccall before this so let me know if I messed something up)

@inline function byte_scan(mem::Vector{UInt8}, byte::UInt8)
    c = 0
    @GC.preserve begin
        mem_start::Ptr{UInt8} = pointer(mem)
        mem_length = length(mem)
        actual_index::Int64 = 0
        while mem_length > 1
            pos = @ccall memchr(mem_start::Ptr{UInt8}, byte::Cint, mem_length::Csize_t)::Ptr{Cchar}
            if pos == C_NULL
                return c
                mem_start = pos + 1 # not sure how to fix this type instability)
                actual_index = ((pos - pointer(mem)) + 1) % Int64
                mem_length = length(mem) - actual_index 
                c += actual_index
    return c

(Summing them doesn’t make much sense but just to do something with the returned indexes)

Which would be similar to searching it with a loop in base julia:

@inline function base_scan(mem::Vector{UInt8}, byte::UInt8)
    c = 0
    @inbounds for i in eachindex(mem)
        if mem[i] == byte
            c += i
    return c

Using memchr is much faster on my PC:

function test()
    target = 0x41
    arr = rand(UInt8, 10_000_000)
    # Search 'A' in the random char array
    @btime byte_scan($arr, $target)
    @btime base_scan($arr, $target)
    @assert byte_scan(arr, target) == base_scan(arr, target)

  1.161 ms (0 allocations: 0 bytes)
  13.498 ms (0 allocations: 0 bytes)

Will this have any major drawbacks that I’m missing or could I just swap this memchr in?

You could swap memchr in. But you might want to check out Base.findnext, which I believe is also very fast because it calls memchr under the hood.

Is this what you mean:

@inline function next_scan(mem::Vector{UInt8}, byte::UInt8)
    c = 0
    index = 1
    @inbounds while index < length(mem)
        index = findnext(x -> x == byte , mem, index+1)
        if isnothing(index)
            return c 
            c += index 
    return c

That times at:

8.706 ms (0 allocations: 0 bytes)

Looks like base uses memchr in _search and that indeed gets called by findnext. Maybe I’m missing something here :thinking:

It needs to be findnext(isequal(my_byte), my_arr, my_ind). If you use an anoymous function, it will not dispatch to memchr.


Aaah that comes much closer indeed:

  1.115 ms (0 allocations: 0 bytes) # Direct
  1.622 ms (0 allocations: 0 bytes) # findnext
  4.140 ms (0 allocations: 0 bytes) # loop

Thanks :slight_smile: