Broadcasting Regex in DataFrame to create new column

I’m quite new to Julia, having used Python and R before. I want to populate a new column in a DataFrame with data extracted from an existing column using a regex. I broadcast the function match with a regex to extract the information, but so far I can only populate the new column with RegexMatch objects. What I want is the matched string from each RegexMatch object. However, the field access m.match (https://docs.julialang.org/en/v1/manual/strings/#Regular-Expressions-1) doesn’t work in combination with broadcasting.

using DataFrames
using Random
series = rand(["110-bla","599-tag"],6)
test = DataFrame(s1 = series)
test.s2 = match.(r"[0-9]*",test.s1)

Result:

│ Row │ s1      │ s2                │
│     │ String  │ RegexMatch        │
├─────┼─────────┼───────────────────┤
│ 1   │ 110-bla │ RegexMatch("110") │
│ 2   │ 599-tag │ RegexMatch("599") │
│ 3   │ 110-bla │ RegexMatch("110") │
│ 4   │ 599-tag │ RegexMatch("599") │
│ 5   │ 110-bla │ RegexMatch("110") │
│ 6   │ 110-bla │ RegexMatch("110") │

For a single value I can extract the string (here “110”) correctly:

match(r"[0-9]*","110-bla").match

But when I try to broadcast, I get the error message “type Array has no field match”:

test.s2 = match.(r"[0-9]*",test.s1).match

Thanks a lot in advance for your help!

You can broadcast getproperty like so

getproperty.(match.(r"[0-9]*",test.s1), :match)

Another alternative is to define a function

getmatch(x) = x.match
getmatch.(match.(r"[0-9]*",test.s1))
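For context, the original attempt fails because the trailing .match is applied to the array returned by match.(...), not to each element. Below is a minimal sketch of both approaches on a plain vector, so it runs without DataFrames. The variable names (s1, matches, via_getproperty, via_helper, as_strings) are made up for illustration.

```julia
# Hypothetical sample data standing in for the test.s1 column
s1 = ["110-bla", "599-tag", "110-bla"]

# Broadcasting match yields a vector of RegexMatch objects
matches = match.(r"[0-9]*", s1)

# Option 1: broadcast getproperty over the vector to read the field on each element
via_getproperty = getproperty.(matches, :match)

# Option 2: a small helper function, broadcast the same way
getmatch(m) = m.match
via_helper = getmatch.(matches)

# The match field holds a SubString; convert if plain Strings are wanted
as_strings = String.(via_getproperty)
```

Note that match returns nothing when a pattern does not match at all. That cannot happen with [0-9]* (it can match the empty string), but a stricter pattern such as [0-9]+ would need a check before accessing .match.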

Thanks very much for the swift reply. Your solution works perfectly! Petra


Just to offer another solution that’s not broadcasting:

test.s2 = map(test.s1) do s
    m = match(r"[0-9]*",s)
    m.match
end

I offer this because, as neat as broadcasting is, I find I’m easily tripped up when things get complicated, or when I’m using a container that shouldn’t be broadcast over (e.g. in.(xs, some_set) will try to broadcast through the set as well, when I usually want to check whether each element of xs is in the set).
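As a rough illustration of the set case: a common way to keep a container from being broadcast over is to wrap it in Ref, which makes broadcasting treat it as a single value. The sketch below uses made-up sample values for xs and some_set, and also shows the equivalent map form.

```julia
# Hypothetical sample data
xs = [1, 2, 3, 4]
some_set = Set([2, 4])

# Ref(some_set) is treated as a scalar, so only xs is broadcast over
membership = in.(xs, Ref(some_set))

# The same check written with map, which never iterates over the set
membership_map = map(x -> x in some_set, xs)
```

Both forms should give one Bool per element of xs. Which one reads better is largely a matter of taste.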
