Finding position of a sequence in a UInt8 array?

Ahmed_Salih · August 29, 2020, 5:23pm

Hello!

Suppose I have:

using Random
rand(MersenneTwister(0),UInt8,10)

10-element Array{UInt8,1}:
 0x40
 0x52
 0x99
 0x35
 0x90
 0xea
 0x00
 0x13
 0xdf
 0x18

Say I want to find the position of sequence “0x90 0xea 0x00”, how would I go about this?

Finding the position of “0x90” is not suitable, this is just a simplified example.

Kind regards

Ahmed_Salih · August 29, 2020, 5:30pm

I have tried to use findfirst, skipchars etc. but only been able to use one char at a time currently

Vasily_Pisarev · August 29, 2020, 5:57pm

You can try the KMP algorithm.

Ahmed_Salih · August 29, 2020, 6:05pm

Thanks for the suggestion, that might be what I have to do

Is what I am wanting “unreasonable” though? I thought that perhaps one would already be able to do this in native Julia

Kind regards

Vasily_Pisarev · August 29, 2020, 6:22pm

Certainly not unreasonable, just mostly used for strings.

A slightly perverse way is using findfirst with arrays converted to strings:

julia> a = String(UInt8[
       0x40
        0x52
        0x99
        0x35
        0x90
        0xea
        0x00
        0x13
        0xdf
        0x18
       ])
"@R\x995\x90\xea\0\x13\xdf\x18"

julia> findfirst(String(UInt8[0x90, 0xea, 0x00]), a)
5:7

stevengj · August 29, 2020, 6:23pm

I think @Vasily_Pisarev was suggesting implementing KMP in native Julia. But probably what you really want is something that is already implemented.

As it turns out, searching for subsequences of byte arrays is already implemented in Julia Base in order to implement substring searching. You can call it via the undocumented function:

Base._searchindex(a, [0x90,0xea,0x00], 1)

to search for the starting index of the subsequence 0x90,0xea,0x00 in a byte array a starting at the beginning of the array, for example. It returns 0 if the subsequence was not found.

It might be nice to add a high-level interface for this via findfirst and findnext — should be a pretty easy PR since the main code is already implemented.

Ahmed_Salih · August 29, 2020, 7:51pm

Thanks for the try, this was what I hoped to avoid, but of course I did not specify that in my question, so that was my fault

Kind regards

EDIT: Included a quote by mistake

Ahmed_Salih · August 29, 2020, 7:54pm

This is so awesome thanks!

I just tested it for my purposes and before I was using ‘readuntil’ which allocated more and took longer time, so using this gave me a 10x improvement in timing (in my initial fast test).

I think your suggestions of making this more accessible is good and thanks for putting a PR up or what it is called

Lastly could you explain to me what the last “1” does in your example? I tried changing it, since I thought it would find an occurence two times if say it was “2” but I couldn’t figure it out. Maybe it is axis of search direction?

Kind regards

stevengj · August 29, 2020, 8:15pm

It’s the starting index for the search, the same as the start argument of findnext — it says to search starting at byte 1. Passing 2 means to search starting at byte 2.

In this way, you can find multiple occurrences:

i = Base._searchindex(a, [0x90,0xea,0x00], 1)
Base._searchindex(a, [0x90,0xea,0x00], i+1) # finds the second occurrence, if any

Ahmed_Salih · August 29, 2020, 8:26pm

Makes total sense, thanks!

stevengj · August 30, 2020, 1:01am

I filed an issue suggesting the feature, but I didn’t create a pull request (PR) — a PR occurs when someone actually implements the feature and requests for it to be merged.

jling · August 30, 2020, 2:18am

behold, there exists an actual PR now

Topic		Replies	Views
Compare patterns in byte array New to Julia	4	509	February 19, 2021
Finding Position of Element in an Array General Usage question	46	105219	October 9, 2019
Find position of Array elements in another Array General Usage indexing , arrays	16	3593	February 1, 2023
Find position on list (fast) General Usage	13	775	November 22, 2021
Find index of all occurences of a string New to Julia	7	2844	April 15, 2019

Finding position of a sequence in a UInt8 array?

Related topics