Searching for a regular expression inside an array


#1

Hello,
I’ve been banging my head against the wall trying to figure out how to search for a regular expression inside an array. I know from the documentation that it is possible to search for a regular expression in a string, but I know of no way to search for one within an array. I know that there are many search functions in Julia. I’ve also read that the functionality changed a little in 0.7 (I currently use 0.6, but I plan on upgrading). How would I do this task, for instance:

Search for the last floating point number in an array that looks like this:
82 Bbb5 tu5 A#5 li5 Bb5 te5 932.2364186620536;

Many thanks!
Nakul


#2

Hi Nakul,

Welcome to Julia! In your previous post, there was some confusion distinguishing arrays from strings.

To make sure things are clear here, is

82 Bbb5 tu5 A#5 li5 Bb5 te5 932.2364186620536;

a string, or is it stored as an array, i.e.,

julia> notes
8-element Array{SubString{String},1}:
 "82"
 "Bbb5"
 "tu5"
 "A#5"
 "li5"
 "Bb5"
 "te5"
 "932.2364186620536;"

Cheers,
Kevin


#3

Hi Kevin,
Thanks for the reply. It is stored as an array. In fact it is just one row of a multidimensional array.
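Since the data is an array of strings, one way to get "the last floating point number" is to search from the end for the last element that matches a number pattern and then parse it. A minimal sketch, assuming Julia 1.0 or later and assuming a number always looks like digits, a decimal point, digits, and an optional trailing `;` (no signs or exponents):

```julia
# The row from the question, stored as an array of strings
row = ["82", "Bbb5", "tu5", "A#5", "li5", "Bb5", "te5", "932.2364186620536;"]

# Assumed pattern: digits, a decimal point, digits, and an optional trailing ';'
float_re = r"^\d+\.\d+;?$"

# Index of the last element that looks like a float, or `nothing` if none does
idx = findlast(s -> occursin(float_re, s), row)

if idx !== nothing
    value = parse(Float64, rstrip(row[idx], ';'))  # drop the ';' before parsing
    println(value)
end
```

Note that `"82"` is skipped because the assumed pattern requires a decimal point.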

Thanks,
Nakul


#4

You can use the function map to run code over all elements in a collection. If you map a function that searches a string over an array, you get an array of results back. You can use either match or occursin to find matches using regexes.

julia> arr = ["lkjasfd", "kjsgfoij"]
2-element Array{String,1}:
 "lkjasfd"
 "kjsgfoij"

julia> map(arr) do str
           match(r"sg", str)
       end

2-element Array{Union{Nothing, RegexMatch},1}:
 nothing
 RegexMatch("sg")

julia> map(arr) do str
           occursin(r"sg", str)
       end
2-element Array{Bool,1}:
 false
  true
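If you only need the matching elements or their positions rather than one result per element, `filter` and `findall` take the same kind of predicate. A small sketch, assuming Julia 0.7/1.0 or later (`occursin` does not exist in 0.6, where the equivalent test was `ismatch(regex, str)`):

```julia
arr = ["lkjasfd", "kjsgfoij"]

# Keep only the elements that contain a match
hits = filter(s -> occursin(r"sg", s), arr)

# Or get the indices of the matching elements instead
idxs = findall(s -> occursin(r"sg", s), arr)
```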

#5

An MWE would be useful, so that we know exactly what you’re asking. Do you need to find a single number, or one number in each entry? Can you loop over all entries? If they look like your example, do you even need a regex, or will the number always be last in the string?
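If the number is always the last entry, no regex is needed at all. A sketch, assuming Julia 1.0 or later and a row shaped like the one in the question:

```julia
row = ["82", "Bbb5", "tu5", "A#5", "li5", "Bb5", "te5", "932.2364186620536;"]

# No regex: take the final entry and remove the trailing ';'
value = parse(Float64, rstrip(last(row), ';'))

# tryparse returns `nothing` instead of throwing when the text is not a number
checked = tryparse(Float64, rstrip(last(row), ';'))
bad = tryparse(Float64, "Bbb5")
```

`tryparse` is the safer choice when some rows might not end in a number.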

I would also advise upgrading to Julia 0.7 (then 1.0) before you do any further development, since any code you add now will add to the amount of code you have to port later.


#6

Thanks for the help. I am still having a lot of trouble.

  • The method of mapping and matching outlined by @baggepinnen works for me on a 1x3 array, but it does not work on a 128x13 array.
    Here is my minimal working example. How do I expand this to accommodate arrays of higher dimensionality?
a = ["fa" "so" "la"]
map(a) do str
    match(r"fa", str)
end

  • When I try to access the fields of the resulting RegexMatch, I can’t. I tried setting m = ans in the above MWE and then trying the things below, but that returned an error. How do I access the value of the matched item so that I can use it later?

  • the entire substring matched: m.match
  • the captured substrings as an array of strings: m.captures
  • the offset at which the whole match begins: m.offset
  • the offsets of the captured substrings as a vector: m.offsets
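The fields listed above belong to a single RegexMatch object, not to the array that map returns, which is why setting m = ans (an array) gives no field captures. A minimal sketch on one match, assuming Julia 1.0 or later; the regex (a note name followed by an octave digit) is a hypothetical example chosen for illustration:

```julia
# Hypothetical regex with two capture groups: note name, then octave digit
m = match(r"([A-G][#b]*)(\d)", "Bbb5")

if m !== nothing           # `match` returns `nothing` when there is no match
    @show m.match          # the entire matched substring
    @show m.captures       # the captured substrings
    @show m.offset         # byte offset where the whole match begins
    @show m.offsets        # byte offsets of each capture
end
```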

@bennedich, it later occurred to me that I can access the last element of the array for that particular task, but I keep finding myself needing to search for specific strings again and again, as in the above MWE. I will upgrade to Julia 1.0 soon, but the problem is that I use VSCode and the two haven’t been made compatible yet.

Many thanks!

My ideal multidimensional array looks like this, by the way:
000 C . Dbb B# do . ru ty S . sa .;
100 C# Db . . di ra . . R1 . ri .;
200 D . Ebb Cx re . mu dy R2 G1 ri ga;
300 D# Eb . . ri me . . R3 G2 ri ga;
400 E . Fb Dx mi . fu ry . G3 . ga;
500 F . Gbb E# fa . su my M1 . ma .;
600 F# Gb . . fi se . . M2 . ma .;
700 G . Abb Fx so . lu fy P . pa .;
800 G# Ab . . si le . . D1 . da .;
900 A . Bbb Gx la . tu sy D2 N1 da ni;
1000 A# Bb . . li te . . D3 N2 da ni;
1100 B . Cb Ax ti . du ly . N3 . ni;
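Each row above has 13 whitespace-separated fields, which matches the 128x13 shape mentioned earlier. One way to load such lines into a matrix where every cell is a String (so regexes apply to every cell) is sketched below, assuming Julia 1.0 or later and using only two of the rows for brevity:

```julia
# Two rows copied from the table above; the real data would have more
lines = ["000 C . Dbb B# do . ru ty S . sa .;",
         "100 C# Db . . di ra . . R1 . ri .;"]

# Drop the trailing ';' and split on whitespace. Keeping every field as a
# String (including "000") avoids mixing Int and String in one array.
rows = [String.(split(rstrip(l, ';'))) for l in lines]

# Stack the rows into a 2×13 Matrix{String}
corpus = permutedims(reduce(hcat, rows))

# Regex tests now work on every cell, e.g. cells like "R1", "R2", ...
mask = occursin.(Ref(r"^R\d$"), corpus)
hits = findall(mask)
```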


#7

It should work equally well on arrays of any dimension. I can’t tell what your array pitchcorpus[1,] is, and I’m not sure what the comma is doing in the indexing of it.

You must first check that m != nothing, since match returns nothing if no match is found. If you want to access the first capture, do result = m.captures[1], etc.
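The check-then-access pattern can be wrapped in a small function so callers never touch a `nothing`. A sketch, assuming Julia 1.0 or later; `first_capture` is a hypothetical helper name, not a Base function. It uses `=== nothing`, which is the usual idiom, though `!= nothing` as written above also works:

```julia
# Hypothetical helper: first capture of `re` in `str`, or `default` if no match
function first_capture(re::Regex, str::AbstractString, default=nothing)
    m = match(re, str)
    m === nothing && return default   # no match: there are no fields to access
    return m.captures[1]
end

first_capture(r"(\d)$", "Bbb5")       # "5"
first_capture(r"(\d)$", "do")         # nothing
first_capture(r"(\d)$", "do", "")     # ""
```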


#8

Thank you for the reply. My bad, that was my failed attempt. I just edited out the pitchcorpus[1,] and replaced it with a, which was the actual MWE.

The first 4 lines work. Line 5 gives me:

a = ["fa" "so" "la"]
map(a) do str
    m = match(r"fa", str)
end
result = m.captures[1]
ERROR: type Array has no field captures

In Julia 1.0 I get:
ERROR: UndefVarError: m not defined

When I try it on my multidimensional array, I get this error:

map(pitchcorpus) do str
    m = match(r"fa", str)
end
ERROR: MethodError: no method matching match(::Regex, ::Int64)

Thanks,
Nakul


#9

ERROR: type Array has no field captures

You are overwriting m in each iteration.

ERROR: UndefVarError: m not defined

Read this section of the manual: https://docs.julialang.org/en/stable/manual/variables-and-scoping/#Local-Scope-1
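In short, a `do` block defines an anonymous function, so `m` assigned inside it is local to that function and is gone once `map` returns. The usual fix is to return the value from the block and keep what `map` collects. A sketch, assuming Julia 1.0 or later:

```julia
a = ["fa" "so" "la"]

# `m` is local to the anonymous function created by `do`;
# the last expression of the block is what `map` collects.
ms = map(a) do str
    m = match(r"fa", str)
    m
end

first_hit = ms[1]       # a single RegexMatch, not the whole array
first_hit.match         # "fa"
```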

ERROR: MethodError: no method matching match(::Regex, ::Int64)

The error suggests that pitchcorpus contains integers.
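If the array really mixes integers and strings (for instance, because whatever built pitchcorpus turned numeric-looking fields such as 000 into numbers; that is an assumption, since the thread does not say how it was created), there are two simple ways around the MethodError. A sketch on a small hypothetical mixed array, assuming Julia 1.0 or later:

```julia
# Hypothetical mixed array: Ints and Strings side by side
mixed = Any[0 "C" "Dbb"; 100 "C#" "Db"]

# Option 1: convert every cell to a String before matching
strs = string.(mixed)
hits1 = occursin.(Ref(r"^C"), strs)

# Option 2: only apply the regex to cells that actually are strings
hits2 = map(x -> x isa AbstractString && occursin(r"^C", x), mixed)
```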

Based on my understanding of your problem, it doesn’t sound like a good approach to represent it all as strings and parse it with regexes. A better approach is likely to use a struct with individual fields for each part.
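As one illustration of the struct idea, here is a sketch that parses one line of the table shown earlier into a typed value. It assumes Julia 1.0 or later; the type name `PitchRow`, the helper `parse_row`, and the field names are hypothetical choices for this example, and a real design would likely give each column its own named field:

```julia
# Hypothetical type: names are illustrative, not taken from the thread
struct PitchRow
    number::Int               # the leading numeric column (000, 100, ...)
    labels::Vector{String}    # the remaining columns; "." marks an empty slot
end

# Hypothetical parser for one line of the table
function parse_row(line::AbstractString)
    fields = split(rstrip(line, ';'))
    PitchRow(parse(Int, fields[1]), String.(fields[2:end]))
end

row = parse_row("900 A . Bbb Gx la . tu sy D2 N1 da ni;")
row.number              # 900
"la" in row.labels      # true
```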

With all due respect, unless this is a one-off project that you’re trying to get done as quickly as possible, may I suggest investing some time in reading the manual, and in reading and trying to understand as much sample code as you can. It will not only answer most of your questions, but also guide you on the right path so that you ask the right questions.


#10

The map operation returns an array of matches (a RegexMatch or nothing for each element). You need to access them individually, like this:

julia> a = ["fa" "so" "la"]
1×3 Array{String,2}:
 "fa"  "so"  "la"

julia> ms = map(a) do str
           match(r"fa", str)
       end
1×3 Array{Union{Nothing, RegexMatch},2}:
 RegexMatch("fa")  nothing  nothing

julia> for m in ms
           if m != nothing
               @show m.match
           end
       end

# Output
m.match = "fa"
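If you want the matched strings collected rather than printed, a comprehension with a filter clause does the same skipping of `nothing` in one line. A sketch, assuming Julia 1.0 or later and a slightly larger hypothetical input with two hits:

```julia
a = ["fa" "so" "la"; "do" "fa" "mi"]
ms = map(str -> match(r"fa", str), a)

# Keep the matched text, skipping cells where `match` returned `nothing`
found = [m.match for m in ms if m !== nothing]

# Or just count the matching cells
n = count(m -> m !== nothing, ms)
```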

#11

Before you analyze this too much, you could use the global keyword in this code:

a = ["fa" "so" "la"]
map(a) do str
    global m = match(r"fa", str)
end
result = m.captures[1]

This makes it work under Julia 1.0 as you expected.


#12

Using it on a multidimensional array is not so difficult in Julia (once you understand it a little more deeply):

julia> a = ["fa" "so" "la"; "do" "re" "mi"]
2×3 Array{String,2}:
 "fa"  "so"  "la"
 "do"  "re"  "mi"

julia> ms = map(a) do str  # `ms` array has same dimensionality and size here
           match(r"fa", str)
       end
2×3 Array{Union{Nothing, RegexMatch},2}:
 RegexMatch("fa")  nothing  nothing
 nothing           nothing  nothing

julia> for i in CartesianIndices(a)  # loop over all indices of the `a` array (and the `ms` array as well)
         if ms[i] != nothing  # where it is matched
            a[i] = ">>si<<"  # for example: change string to this
         end
       end

julia> a
2×3 Array{String,2}:
 ">>si<<"  "so"  "la"
 "do"      "re"  "mi"

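The same replacement can be done without writing the loop by hand: `findall` with a predicate returns the CartesianIndex of every matched cell, which can then be used to index `a` directly. A sketch, assuming Julia 1.0 or later:

```julia
a = ["fa" "so" "la"; "do" "re" "mi"]
ms = map(str -> match(r"fa", str), a)

# CartesianIndex of every cell where `match` found something
idx = findall(m -> m !== nothing, ms)

# Same replacement as the loop above
a[idx] .= ">>si<<"
```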

#13

julia> a = ["fa" "so" "la"; "do" "re" "mi"]

julia> match.(Ref(r"fa"), a)
2×3 Array{Union{Nothing, RegexMatch},2}:
 RegexMatch("fa")  nothing  nothing
 nothing           nothing  nothing

seems to work also.


#14

You could do this trick too:

julia> match.([r"do"], a)
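Both tricks exist to make broadcasting treat the single regex as one value applied to every cell: `Ref(r)` wraps it as a scalar, and `[r]` is a one-element vector that broadcasting stretches across the array. A plain comprehension gives the same result and keeps the shape of `a`. A sketch comparing the three, assuming Julia 1.0 or later:

```julia
a = ["fa" "so" "la"; "do" "re" "mi"]

m1 = match.(Ref(r"do"), a)          # Ref makes the regex act like a scalar
m2 = match.([r"do"], a)             # a 1-element vector is broadcast across `a`
m3 = [match(r"do", s) for s in a]   # comprehension over a matrix keeps its shape
```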

#15

No, it won’t:

julia> result = m.captures[1]
ERROR: type Nothing has no field captures

OP’s problem is not multidimensional arrays, but mixing different data types in the same array (ints and strings, it seems). Btw, a more concise way of matching / replacing would be this:

julia> a = ["fa" "so" "la"; "do" "re" "mi"];

julia> mask = occursin.([r"fa"], a);

julia> a[mask] .= ">>si<<";

julia> a
2×3 Array{String,2}:
 ">>si<<"  "so"  "la"
 "do"      "re"  "mi"

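To do the same replacement in place without allocating a Bool mask, you can either loop over the elements or use `replace!` with a function, which applies it to every element of the array in place. A sketch of both, assuming Julia 1.0 or later:

```julia
a = ["fa" "so" "la"; "do" "re" "mi"]
b = copy(a)

# Explicit loop: test each cell and overwrite it in place if it matches
for i in eachindex(a)
    if occursin(r"fa", a[i])
        a[i] = ">>si<<"
    end
end

# One-liner: replace! calls the function on every element and stores the result
replace!(s -> occursin(r"fa", s) ? ">>si<<" : s, b)
```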
#16

Cool! 🙂 (Could we do it with similar simplicity without making a temporary (mask) array?)

You are right! I was just trying to help the OP get the same functionality in 1.0 as in 0.6, but I should have explained it better.

Regarding multidimensionality, I wrote it because of the OP's comment that the method works on a 1x3 array but not on a 128x13 array.

I was just thinking that a working example could help with analysis and understanding (but maybe starting with a tutorial would be better for the OP).


#17

Thanks for the help everyone!
I am very pleased with the outcome. There are numerous reasons why I think this task was difficult for me, but the main one is that I am using 0.6, where occursin and CartesianIndices don’t exist, and the identical map code behaved differently in 0.6. Miller Puckette, the creator of Pure Data and my academic advisor, has created a preliminary PD external that receives/sends Julia code, but it currently only works with Julia 0.6, hence my use of it.

Everything works great on 1.0, though! Another problem is that I often search the documentation for specific Julia errors and find no mention of them at all (for example, I couldn’t find “field captures” once in the documentation and found myself agonizing over various tangents thereafter). The documentation is huge, to say the least, and especially daunting for a relatively new programmer like me, but I will keep working through it as fast as I can!