String column dataframe: replace for another string built from a substring from each row

I have a dataframe of String columns, such as (in fact, it has many more rows and columns):

julia> using DataFrames
julia> df = DataFrame(DRE=["119018524", "120034962"], Q1=["0.50", "0.00"])
2×2 DataFrame
│ Row │ DRE       │ Q1     │
│     │ String    │ String │
├─────┼───────────┼────────┤
│ 1   │ 119018524 │ 0.50   │
│ 2   │ 120034962 │ 0.00   │

I would like to transform the DRE column by retaining its second and third characters, prepending to them the string “20”, and discarding the remaining characters, so as to get the final dataframe:

julia> df
2×2 DataFrame
│ Row │ DRE     │ Q1     │
│     │ String  │ String │
├─────┼─────────┼────────┤
│ 1   │ 2019    │ 0.50   │
│ 2   │ 2020    │ 0.00   │

How do I do that? I guess I should use transform and regular expressions, right? I could not manage, however, to devise a proper command… Any help with a preferably concise and efficient version is really appreciated, with pointers to explaining docs. Thanks

1 Like

First, please update DataFrames.jl to 0.22 release to get the latest and bug free functionality.

Now regarding your code the simple transformation is (it assumes that DRE contains ONLY ASCII digits):

transform!(df, :DRE => ByRow(x -> "20"*x[2:3]) => :DRE)

or a general solution that correctly handles all UNICODE characters but is a bit verbose:

transform!(df, :DRE => ByRow(x -> "20"*last(first(x, 3), 2)) => :DRE)
4 Likes