Renaming multiple columns in Julia DataFrames

I have the following Julia DataFrame:

data_test = DataFrame(Time = 1:2, X1 = [0,1], X2 = [10,5])

And I have a list of names as follows:

technology = ["oil", "gas"]

How do I rename columns X1 and X2 using the list (excluding the Time column)? I can do it manually, but that is not an efficient option for renaming hundreds of columns. So, essentially, what I’m looking for is a way to map the list of names to the columns. Any efficient solution is highly appreciated.

Thanks

One possible method is to use the high-level transforms of TableTransforms.jl. For example, you can Select the columns of interest with a regular expression, and then Rename them:

pipeline = Select(r"^X") → Rename("X1" => "oil", "X2" => "gas", ...)

newdf = data_test |> pipeline

The transforms are implemented in terms of the Tables.jl API; we haven’t yet benchmarked them against a DataFrames.jl-specific solution. Others may be able to suggest alternatives using different packages.
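Typing every pair inside Rename(...) by hand does not scale to hundreds of columns. A minimal sketch in plain Julia that generates the pairs instead, assuming the columns are named X1, X2, … in the same order as technology. The name rename_pairs is hypothetical, and passing the pairs to Rename by splatting is an assumption based on the call above, not something I have checked against the TableTransforms.jl docs:

```julia
technology = ["oil", "gas"]

# Build "X1" => "oil", "X2" => "gas", ... instead of typing each pair by hand.
# Assumes the columns to rename are named X1, X2, ... in the same order as `technology`.
rename_pairs = ["X$i" => tech for (i, tech) in enumerate(technology)]

# Assumption: Rename accepts the pairs as separate arguments, as in the call above,
# so the generated vector would be splatted:
# pipeline = Select(r"^X") → Rename(rename_pairs...)
```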

1 Like

Generalizing to an arbitrary number of columns using the built-in rename! function from DataFrames.jl:

julia> rename!(data_test, ["X$i" => tech for (i, tech) in enumerate(technology)])
2×3 DataFrame
 Row │ Time   oil    gas
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      0     10
   2 │     2      1      5
5 Likes
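The comprehension above relies on the columns being literally named X1, X2, …. If the names are arbitrary and the only rule is “rename everything except Time, in order”, a sketch along these lines should work (assuming DataFrames.jl 1.x, where names accepts a column selector such as Not(:Time)). The length check is an addition of mine so that a wrong-sized list produces a clear message:

```julia
using DataFrames

data_test = DataFrame(Time = 1:2, X1 = [0, 1], X2 = [10, 5])
technology = ["oil", "gas"]

# All column names except Time, in column order: ["X1", "X2"]
old_names = names(data_test, Not(:Time))

# `.=>` pairs old and new names positionally; check the counts first so a
# mismatch gives a clear error message instead of a broadcasting error.
length(old_names) == length(technology) ||
    throw(ArgumentError("got $(length(technology)) new names for $(length(old_names)) columns"))

rename!(data_test, old_names .=> technology)
```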

Thanks @stillyslalom, this is what I was looking for.

One more alternative, using the @pipe macro from Pipe.jl:

@pipe replace.(names(data_test), names(data_test) .=> ["Time"; technology]) |>
      rename!(data_test, _)

This method can come in handy if you need to use a regular expression. For example, to change the names from X1, X2, X3, … to Y1, Y2, Y3, …, you can do:

@pipe replace.(names(data_test), r"X" => "Y") |>
      rename!(data_test,_)
1 Like
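The first step of this approach is plain Base Julia, so it can be checked on its own before anything is passed to rename!. In the sketch below, the hypothetical vector old_names stands in for names(data_test). I used the anchored pattern r"^X" rather than r"X" so that an X appearing later in a name is not rewritten:

```julia
# Hypothetical column list standing in for names(data_test)
old_names = ["Time", "X1", "X2", "X10", "MaxX"]

# `replace.` broadcasts over the names; the Regex => String pair is treated as a scalar.
# Anchoring with ^ only rewrites a leading X, so "MaxX" is left alone.
new_names = replace.(old_names, r"^X" => "Y")
```

The resulting vector has one entry per column, in the original order, which is the form rename!(data_test, new_names) expects.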

Use:

rename!(data_test, names(data_test, r"X") .=> technology)
2 Likes
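One caveat with r"X": it matches any column whose name contains an X anywhere, so an unrelated column can be swept in and shift the pairing. A sketch with a hypothetical extra column MaxX shows the difference and uses a stricter pattern that only accepts X followed by digits:

```julia
using DataFrames

# Hypothetical extra column "MaxX" whose name also contains an X
data_test = DataFrame(Time = 1:2, X1 = [0, 1], X2 = [10, 5], MaxX = [10, 5])
technology = ["oil", "gas"]

loose = names(data_test, r"X")           # ["X1", "X2", "MaxX"]: X anywhere in the name
strict = names(data_test, r"^X\d+$")     # ["X1", "X2"]: only X followed by digits

rename!(data_test, strict .=> technology)
```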

Wouldn’t it be “better”/simpler to write

EDITED based on @alfaromartino’s correction of my omission of the function argument:

rename!(s -> replace(s, r"X" => "Y"), df)

?
(I’m still learning myself.)

Yes, that’s an alternative too. I just find splitting the two operations more readable, since the first line shows which names you change and the second line uses those names to rename the DataFrame (just a matter of preference).

Note that to use your version, you actually need to write:
rename!(s -> replace(s, r"X" => "Y"), df)

(note the s argument passed to replace)

1 Like
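The function form of rename! also fits the original question of mapping a list of names onto columns. A sketch, assuming DataFrames.jl 1.x (where the function receives each column name as a String). The Dict-based lookup is my own choice, not something proposed in the thread. Columns without an entry, such as Time, keep their names:

```julia
using DataFrames

data_test = DataFrame(Time = 1:2, X1 = [0, 1], X2 = [10, 5])
technology = ["oil", "gas"]

# Lookup table from old to new names, built from the two lists:
# Dict("X1" => "oil", "X2" => "gas")
mapping = Dict(names(data_test, Not(:Time)) .=> technology)

# rename!(f, df) calls f on every column name; get(mapping, s, s)
# returns the new name when there is one and s unchanged otherwise.
rename!(s -> get(mapping, s, s), data_test)
```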