Find index of all occurrences of a string

#1

Hey guys!

Say I have a string like:

“aaabbbaaabbbaaabbb”

I want to find the indices of every occurrence of “aaa”, so I would expect to get something like:

index = [1:3,7:9,13:15]

I know that “findfirst” exists and I could write a helper function around it, but I wondered whether there is an easier approach.

Kind regards

#2

One option would be to use a regular expression. You can provide a third argument to match() instructing the search to start from a particular position in the string, which you can use to find sequential matches like so:

julia> pattern = r"aaa" # the r"" prefix makes this a regular expression
r"aaa"

julia> target = "aaabbbaaabbbaaabbb"
"aaabbbaaabbbaaabbb"

julia> m = match(pattern, target)
RegexMatch("aaa")

julia> m.offset
1

julia> m = match(pattern, target, m.offset + 1)
RegexMatch("aaa")

julia> m.offset
7

julia> m = match(pattern, target, m.offset + 1)
RegexMatch("aaa")

julia> m.offset
13
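
The repeated calls above can be wrapped in a loop. This is only a sketch: allmatchranges is a made-up helper name, not something in Base. It restarts the search one character after the start of each match, so overlapping matches would be reported too.

```julia
# Hypothetical helper (not part of Base): collect the index range of every match.
function allmatchranges(pattern::Regex, target::AbstractString)
    ranges = UnitRange{Int}[]
    m = match(pattern, target)
    while m !== nothing
        # lastindex rather than length, so multi-byte characters give valid indices
        push!(ranges, m.offset:(m.offset + lastindex(m.match) - 1))
        # an empty match at the very end of the string leaves nothing more to search
        m.offset > ncodeunits(target) && break
        # restart the search one character after the start of this match
        m = match(pattern, target, nextind(target, m.offset))
    end
    return ranges
end

allmatchranges(r"aaa", "aaabbbaaabbbaaabbb")
```

The result is a vector of ranges, which is the shape asked for in the first post.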
#3

Thanks. Now I can try benchmarking both approaches. It seems I will need a for loop if my keyword appears multiple times.

Kind regards

#4

Yup, exactly 🙂

#5

In theory findall("aaa", "aaabbbaaabbbaaabbb") should probably do what you request. That would be consistent with findfirst and friends. Feel free to file a feature request.
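
For readers on newer Julia versions: as far as I know, a findall method taking a string pattern was added in Julia 1.3, so on 1.3 or later the call below should work directly. Treat it as a sketch to try on your own version rather than a guarantee.

```julia
# Assumes Julia 1.3 or newer, where findall accepts a string (or regex) pattern.
target = "aaabbbaaabbbaaabbb"

# Returns a vector with the index range of each non-overlapping occurrence.
ranges = findall("aaa", target)

# Overlapping occurrences can be requested with the overlap keyword.
overlapping = findall("aa", "aaaa"; overlap=true)
```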

#6

Will do so tomorrow. I also wondered why it didn’t.

Kind regards

#7

I think it used to, or there was some similar function that did. There was a major refactor of the search and find* functions before the release of 1.0; you’ll probably find something about this in that issue or related PRs.

#8

How about using eachmatch? (I guess it’s a natural next step from @rdeits’s suggestion.)

julia> s = "aaabbbaaabbbaaabbb";

julia> matchrange(m::RegexMatch) = m.offset .+ (0:length(m.match)-1); # named matchrange so it does not clash with Base.range

julia> [matchrange(e) for e ∈ eachmatch(r"aaa", s)]
 1:3
 7:9
 13:15

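Edit: if the match may contain multi-byte characters, length(m.match) counts characters rather than string indices, so the end of the range has to come from lastindex instead: replace 0:length(m.match)-1 with something like 0:lastindex(m.match)-1.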
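
To make the multi-byte case concrete, here is a small sketch with Greek letters, which take two code units each in UTF-8. byterange is a made-up name for illustration, and the version below assumes the range should end at the index of the last character of the match, which is what lastindex gives.

```julia
# Hypothetical helper: index range of a match, valid for multi-byte characters too.
byterange(m::RegexMatch) = m.offset .+ (0:lastindex(m.match)-1)

s = "αααβββααα"  # every character here occupies two code units

ranges = [byterange(m) for m in eachmatch(r"ααα", s)]

# Each range can be used to slice the original string back out.
pieces = [s[r] for r in ranges]
```

Using length(m.match) here would give ranges that end in the middle of the match, since it counts three characters while the match spans indices 1 through 5.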
