# Finding position of a sequence in a UInt8 array?

Hello!

Suppose I have:

``````using Random
rand(MersenneTwister(0),UInt8,10)

10-element Array{UInt8,1}:
0x40
0x52
0x99
0x35
0x90
0xea
0x00
0x13
0xdf
0x18

``````

Say I want to find the position of sequence “0x90 0xea 0x00”, how would I go about this?

Finding the position of “0x90” is not suitable, this is just a simplified example.

Kind regards

I have tried to use findfirst, skipchars etc. but only been able to use one char at a time currently

You can try the KMP algorithm.

Thanks for the suggestion, that might be what I have to do

Is what I am wanting “unreasonable” though? I thought that perhaps one would already be able to do this in native Julia

Kind regards

Certainly not unreasonable, just mostly used for strings.

A slightly perverse way is using `findfirst` with arrays converted to strings:

``````julia> a = String(UInt8[
0x40
0x52
0x99
0x35
0x90
0xea
0x00
0x13
0xdf
0x18
])
"@R\x995\x90\xea\0\x13\xdf\x18"

julia> findfirst(String(UInt8[0x90, 0xea, 0x00]), a)
5:7
``````

I think @Vasily_Pisarev was suggesting implementing KMP in native Julia. But probably what you really want is something that is already implemented.

As it turns out, searching for subsequences of byte arrays is already implemented in Julia `Base` in order to implement substring searching. You can call it via the undocumented function:

``````Base._searchindex(a, [0x90,0xea,0x00], 1)
``````

to search for the starting index of the subsequence `0x90,0xea,0x00` in a byte array `a` starting at the beginning of the array, for example. It returns `0` if the subsequence was not found.

It might be nice to add a high-level interface for this via `findfirst` and `findnext` — should be a pretty easy PR since the main code is already implemented.

6 Likes

Thanks for the try, this was what I hoped to avoid, but of course I did not specify that in my question, so that was my fault

Kind regards

EDIT: Included a quote by mistake

This is so awesome thanks!

I just tested it for my purposes and before I was using ‘readuntil’ which allocated more and took longer time, so using this gave me a 10x improvement in timing (in my initial fast test).

I think your suggestions of making this more accessible is good and thanks for putting a PR up or what it is called

Lastly could you explain to me what the last “1” does in your example? I tried changing it, since I thought it would find an occurence two times if say it was “2” but I couldn’t figure it out. Maybe it is axis of search direction?

Kind regards

It’s the starting index for the search, the same as the `start` argument of `findnext` — it says to search starting at byte `1`. Passing `2` means to search starting at byte `2`.

In this way, you can find multiple occurrences:

``````i = Base._searchindex(a, [0x90,0xea,0x00], 1)
Base._searchindex(a, [0x90,0xea,0x00], i+1) # finds the second occurrence, if any
``````
1 Like

Makes total sense, thanks!

I filed an issue suggesting the feature, but I didn’t create a pull request (PR) — a PR occurs when someone actually implements the feature and requests for it to be merged.

2 Likes

behold, there exists an actual PR now

4 Likes