Suppose I have:
Say I want to find the position of sequence “0x90 0xea 0x00”, how would I go about this?
Finding the position of “0x90” alone is not suitable; this is just a simplified example.
I have tried to use findfirst, skipchars etc., but so far I have only been able to search for one char at a time.
You can try the KMP algorithm.
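In case it helps, here is a minimal sketch of what a KMP search over bytes could look like in Julia. The function name `kmp_search`, its `start` argument, and the sample array are illustrative assumptions, not part of any library, and the code assumes ordinary 1-based vectors.

```julia
# Knuth-Morris-Pratt search for the byte sequence `needle` inside `haystack`.
# Returns the index where the first match at or after `start` begins, or 0 if there is none.
function kmp_search(haystack::Vector{UInt8}, needle::Vector{UInt8}, start::Integer=1)
    n, m = length(haystack), length(needle)
    m == 0 && return Int(start)

    # fail[q] = length of the longest proper prefix of needle[1:q]
    # that is also a suffix of needle[1:q].
    fail = zeros(Int, m)
    k = 0
    for q in 2:m
        while k > 0 && needle[k + 1] != needle[q]
            k = fail[k]
        end
        if needle[k + 1] == needle[q]
            k += 1
        end
        fail[q] = k
    end

    k = 0  # number of needle bytes matched so far
    for i in start:n
        while k > 0 && needle[k + 1] != haystack[i]
            k = fail[k]  # fall back without re-reading haystack bytes
        end
        if needle[k + 1] == haystack[i]
            k += 1
        end
        if k == m
            return i - m + 1  # index where the match begins
        end
    end
    return 0
end

# Hypothetical sample data, made up for illustration.
sample = UInt8[0x01, 0x90, 0xea, 0x00, 0x02, 0x90, 0xea, 0x00]
first_hit = kmp_search(sample, UInt8[0x90, 0xea, 0x00])
```

The failure table is what lets the search continue after a mismatch without backing up in the haystack, so the whole scan stays linear in the length of the input.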
Thanks for the suggestion, that might be what I have to do
Is what I am wanting “unreasonable” though? I thought that perhaps one would already be able to do this in native Julia
Certainly not unreasonable, just mostly used for strings.
A slightly perverse way is using `findfirst` with arrays converted to strings:
julia> a = String(UInt8[
julia> findfirst(String(UInt8[0x90, 0xea, 0x00]), a)
I think @Vasily_Pisarev was suggesting implementing KMP in native Julia. But probably what you really want is something that is already implemented.
As it turns out, searching for subsequences of byte arrays is already implemented in Julia `Base` in order to implement substring searching. You can call it via the undocumented function:
Base._searchindex(a, [0x90,0xea,0x00], 1)
to search for the starting index of the subsequence `0x90,0xea,0x00` in a byte array `a` starting at the beginning of the array, for example. It returns `0` if the subsequence was not found.
It might be nice to add a high-level interface for this via `findnext`; it should be a pretty easy PR since the main code is already implemented.
Thanks for the try, this was what I hoped to avoid, but of course I did not specify that in my question, so that was my fault
EDIT: Included a quote by mistake
This is so awesome thanks!
I just tested it for my purposes. Before, I was using ‘readuntil’, which allocated more and took longer, so using this gave me a 10x improvement in timing (in my initial quick test).
I think your suggestion of making this more accessible is good, and thanks for putting a PR up, or whatever it is called.
Lastly, could you explain to me what the last “1” does in your example? I tried changing it, since I thought it would find the second occurrence if, say, it was “2”, but I couldn’t figure it out. Maybe it is the axis or the search direction?
It’s the starting index for the search, the same as the `start` argument of `findnext`: `1` says to search starting at byte 1, and `2` means to search starting at byte 2.
In this way, you can find multiple occurrences:
i = Base._searchindex(a, [0x90,0xea,0x00], 1)
Base._searchindex(a, [0x90,0xea,0x00], i+1) # finds the second occurrence, if any
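The stepping pattern above can be wrapped in a loop to collect every occurrence. This is only a sketch: `all_occurrences` is a made-up helper name, the sample array is hypothetical, and `Base._searchindex` is undocumented and internal, so its behavior could change between Julia versions.

```julia
# Collect the starting index of every occurrence of `pattern` in the byte array `a`.
# Relies on the undocumented Base._searchindex, which returns 0 when nothing is found.
function all_occurrences(a::Vector{UInt8}, pattern::Vector{UInt8})
    hits = Int[]
    isempty(pattern) && return hits  # an empty pattern would match everywhere
    i = Base._searchindex(a, pattern, 1)
    while i != 0
        push!(hits, i)
        # Restart one byte past the previous hit, so overlapping matches are found too.
        i = Base._searchindex(a, pattern, i + 1)
    end
    return hits
end

# Hypothetical sample data, made up for illustration.
bytes = UInt8[0x01, 0x90, 0xea, 0x00, 0x02, 0x90, 0xea, 0x00]
hits = all_occurrences(bytes, UInt8[0x90, 0xea, 0x00])
```

To skip overlapping matches instead, the restart index could be `i + length(pattern)`, with a check that it does not run more than one position past the end of the array.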
Makes total sense, thanks!
I filed an issue suggesting the feature, but I didn’t create a pull request (PR) — a PR occurs when someone actually implements the feature and requests for it to be merged.
Behold, there exists an actual PR now.