Find all indexes of string in string array

Hello,

Can someone help me with this? It seems like it should be easy but I can’t quite figure it out.

I’m trying to find all of the array indices in this String array that match the search string.

Something like the following

data = ["abc","bcd","def","GHF"]
findall( x -> x == "b", data)

I was hoping to get that this string is found in the array at indexes 1,2 but it returns nothing. Is there some syntax that I’m not getting right here?

If you want to find all array items that contain some substring, you should use either occursin or, as a more general solution, regular expressions.

data = ["abc","bcd","def","GHF"]
findall( x -> occursin("b", x), data)

# 2-element Array{Int64,1}:
# 1
# 2
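The regular-expression route mentioned above could look roughly like this. It is only a sketch: the patterns and the variable names are illustrative choices, not something from the original question.

```julia
data = ["abc", "bcd", "def", "GHF"]

# occursin also accepts a Regex as its first argument.
# r"^b" matches only strings that begin with "b".
starts_with_b = findall(x -> occursin(r"^b", x), data)   # [2]

# The `i` flag makes the match case-insensitive, so both "def" and "GHF" match.
contains_f = findall(x -> occursin(r"f"i, x), data)      # [3, 4]
```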

If you instead want to find where this character occurs within each string, you can broadcast findall over the array:

findall.( x -> x .== 'b', data)

# 4-element Array{Array{Int64,1},1}:
#  [2]
#  [1]
#  [] 
#  [] 

Thanks, this works!

I tried to look through the online documentation, but this solution wasn’t clear from there.

You are welcome.
Just in case: if you want to find all substring indices in each element of the String array, it can be done even more easily:

findall.("b", data)

# 4-element Array{Array{UnitRange{Int64},1},1}:
# [2:2]
# [1:1]
# []   
# []   

This gives you, for each element, the range of every occurrence of the substring.
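If only the starting position of each occurrence is needed, the ranges can be reduced with first. A small sketch (the test string "abcb" and the variable names are made up for illustration):

```julia
data = ["abc", "bcd", "def", "GHF"]

# With a string pattern, findall returns one range per occurrence.
ranges = findall("b", "abcb")                      # [2:2, 4:4]

# Keep only the first index of each range, element by element.
starts = [first.(findall("b", s)) for s in data]   # [[2], [1], [], []]
```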

I was trying different variants and I was expecting this to return only index 2, but it doesn’t. Is this the right usage?

julia> findall(x -> startswith("b",x), data)
0-element Array{Int64,1}

The prefix is the second argument:

findall(x -> startswith(x,"b"), data)
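To make the difference concrete, here are both argument orders side by side (variable names are only for illustration):

```julia
data = ["abc", "bcd", "def", "GHF"]

# startswith(s, prefix): the string to test comes first, the prefix second.
correct = findall(x -> startswith(x, "b"), data)   # [2]

# Swapped, this asks whether "b" starts with each element,
# which is never true for this data.
swapped = findall(x -> startswith("b", x), data)   # Int64[]
```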

Thanks.

I’m sure there is some logical reason, but does anyone know why the search term is the first argument of occursin() but the second argument of startswith()? Doesn’t that make it hard to remember?

Yes, I agree that it is not ideal. There is a GitHub issue that addresses this exact point: https://github.com/JuliaLang/julia/issues/35031
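As a possible workaround, and assuming Julia 1.5 or later, contains takes its arguments in the same order as startswith, and both have a one-argument form that returns a predicate. A minimal sketch:

```julia
data = ["abc", "bcd", "def", "GHF"]

# contains(haystack, needle) mirrors the argument order of startswith.
idx = findall(x -> contains(x, "b"), data)      # [1, 2]

# The one-argument forms return a predicate, so no anonymous function is needed.
idx_curried = findall(contains("b"), data)      # [1, 2]
idx_prefix  = findall(startswith("b"), data)    # [2]
```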

Well, the documentation cannot address every possible use case. But you can reason about your code, and think about what it’s doing, because the behavior you saw does make sense.

For every element in data it checks, is “abc” equal to “b”, is “bcd” equal to “b”, etc. And of course the answer each time is ‘no’.
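In code, the comparison behaves like this (the second search string is just an example of an exact match):

```julia
data = ["abc", "bcd", "def", "GHF"]

# == compares whole strings, so no element is equal to "b".
no_match = findall(x -> x == "b", data)   # Int64[]

# An exact match against a full element does return its index.
exact = findall(x -> x == "bcd", data)    # [2]
```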

Yes, I agree that, in retrospect, it makes sense and I should have been able to figure it out. But logic and intuition alone haven’t been much help in working out Julia’s syntax. Going by intuition, I think the following would make far more sense:

findall( "b", occursin(data))

or more simply

occursin("b", data)

rather than

findall( x -> occursin("b", x), data)

But, either way, it’s still better than Python so I’ll keep trying :slight_smile:

You can use the vectorized version of occursin and avoid creating an intermediate function:

occursin.("b", data)

which gives you a Boolean vector. Wrapping it in findall converts the true entries to indices:

findall(occursin.("b", data))

The same applies to startswith and any other function. You can read more about dot syntax and broadcasting in the Functions chapter of the Julia manual.
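For instance, the same pattern with startswith might look like this (a sketch; mask and idx are illustrative names):

```julia
data = ["abc", "bcd", "def", "GHF"]

# The dot applies startswith to every element; "b" is treated as a scalar.
mask = startswith.(data, "b")   # [false, true, false, false]

# findall on a Boolean vector returns the indices of the true entries.
idx = findall(mask)             # [2]
```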

Well, I respectfully disagree :wink: I don’t think these make sense at all. The first one is unclear, but the second obviously checks if "b" is one of the elements of data, which it isn’t. I mean, what else could it mean?

occursin surely has to work on each element of data, not on data as a whole.

OK, now this syntax does make sense to me. It’s clear and simple…

findall(occursin.("b", data))

Thanks for the insight. I’ll have to study those vectorizations in more detail.
