Transforming string columns in DataFrame with (regex) match

using DataFrames
using DataFramesMeta

# This is the data
df = DataFrame(A = 1:5, B = ["Th32is","i35s","124my","test775","var23ia5ble"])

# This is what the output should be
@transform!(df, :B = match.(r"\d+", :B))
@transform(df, :B = [x.match for x in :B])

# But I would like to do it in one transform call

# I tried to just use getproperty
df = DataFrame(A = 1:5, B = ["Th32is","i35s","124my","test775","var23ia5ble"])
@transform(df, :B = match.(r"\d+", :B).match)

# trying to broadcast explicitly
df = DataFrame(A = 1:5, B = ["Th32is","i35s","124my","test775","var23ia5ble"])
@transform(df, :B = getproperty.(match.(r"\d+", :B), :match))

# but neither of these works

Is there a (not too convoluted) way to do this in one transform call?

If DataFramesMeta and transform! aren’t absolutely necessary, you could just do:

julia> df = DataFrame(A = 1:5, B = ["Th32is","i35s","124my","test775","var23ia5ble"])
5×2 DataFrame
 Row │ A      B
     │ Int64  String
─────┼────────────────────
   1 │     1  Th32is
   2 │     2  i35s
   3 │     3  124my
   4 │     4  test775
   5 │     5  var23ia5ble

julia> df.B = [x.match for x in match.(r"\d+", df.B)]; df
5×2 DataFrame
 Row │ A      B
     │ Int64  SubStrin…
─────┼──────────────────
   1 │     1  32
   2 │     2  35
   3 │     3  124
   4 │     4  775
   5 │     5  23
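
If you would rather stay inside a single transform call without DataFramesMeta, the same extraction can probably be written with plain DataFrames.jl using ByRow. This is only a sketch: the variable name `out` and the anonymous function are illustrative, and it assumes every row contains at least one digit.

```julia
using DataFrames

df = DataFrame(A = 1:5, B = ["Th32is", "i35s", "124my", "test775", "var23ia5ble"])

# ByRow applies the function to each element of :B instead of to the whole column.
# `match` returns a RegexMatch here, and `.match` is the matched substring.
out = transform(df, :B => ByRow(s -> match(r"\d+", s).match) => :B)
```

`transform` returns a new DataFrame; `transform!` with the same arguments should update `df` in place.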

Use @rtransform, the row-wise version of @transform.

julia> @rtransform(df, :B = match(r"\d+", :B).match)
5×2 DataFrame
 Row │ A      B
     │ Int64  SubStrin…
─────┼──────────────────
   1 │     1  32
   2 │     2  35
   3 │     3  124
   4 │     4  775
   5 │     5  23
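
One caveat with the row-wise version: `match` returns `nothing` when a string contains no digits, and `.match` on `nothing` throws. A minimal sketch of a guard, using only Base Julia, could look like this. The helper name `first_digits` and the choice of `missing` as the fallback are assumptions for illustration, not something from the thread.

```julia
# Hypothetical helper: return the first run of digits in `s`,
# or `missing` when `s` contains no digits at all.
function first_digits(s::AbstractString)
    m = match(r"\d+", s)
    return m === nothing ? missing : m.match
end

inputs = ["Th32is", "i35s", "no digits here"]
results = first_digits.(inputs)
```

With such a helper defined, something like `@rtransform(df, :B = first_digits(:B))` should keep the call to a single transform while tolerating rows without a match.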

As for your problem with getproperty: @transform accepts full columns and treats every Symbol in the expression as a column reference, so :match is read as a column name. Check out the DataFramesMeta docs about "Working with Symbols without referring to columns". You need to escape the Symbol with ^.

julia> @transform(df, :B = getproperty.(match.(r"\d+", :B), ^(:match)))
5×2 DataFrame
 Row │ A      B
     │ Int64  SubStrin…
─────┼──────────────────
   1 │     1  32
   2 │     2  35
   3 │     3  124
   4 │     4  775
   5 │     5  23
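
The first attempt, `match.(r"\d+", :B).match`, most likely fails for a reason unrelated to the macro: `.match` is applied to the whole vector of matches, not to each element. A small Base-only sketch of the difference (the variable names are illustrative):

```julia
strs = ["Th32is", "i35s", "124my", "test775", "var23ia5ble"]
matches = match.(r"\d+", strs)   # a Vector of RegexMatch objects

# `matches.match` looks for a field called `match` on the Vector itself,
# which does not exist, so it throws.
field_access_fails = try
    matches.match
    false
catch
    true
end

# Broadcasting getproperty reads the field from each element instead.
digits = getproperty.(matches, :match)
```

Outside of @transform the plain `:match` Symbol works as written; the `^(:match)` escape is only needed inside the macro.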

Thanks for both of these excellent answers, much appreciated! :slight_smile: