Way to give DataFrame columns nicknames?

I am importing data from Excel into a DataFrame. The column headers are long specific names with spaces and units. They are a pain to work with because every time I need to access that column, I have to type df."Long Specific Name With Spaces (Units)". Is there a good way to give the column a nickname like :x, so I can access it with df.x instead? The method would need to preserve the original headers, so I can transform them back for printing.

I think the correct tool is a dictionary, but I haven’t used them much. I also know that DataFrames.jl uses => for some of its β€œsplit-apply-combine” syntax, and I don’t want to accidently interfere with that.

How are you importing your data? If using CSV.jl, you can read using the normalizenames argument to get rid of the spaces and add underscores. Something like

 CSV.File(file; normalizenames=true)

Yes, I am using CSV.jl. That is good to know, but it is really the length of the name that is the problem.

There’s no way to do this afaik, but it seems it would be quite easy to do

long_names = names(df)

short_names = [:x, :y, :z] # whatever you need

and then rename!(df, short_names) to work with the data with nicknames and rename!(df, long_names) again when producing outputs?

1 Like

Adding to this, you can also provide change individual columns one at a time, for example rename!(df,"Long Name"=>"longname"), or through a Dictionary if you want to change a series of column labels with rename!(df,Dict("Long Name"=>"longname","Other Odd Name"=>"oddname")).

1 Like

Can you reverse the Dictionary to get the original names back?

Or if you are not sure whether the order or the number of columns might change, you can try the following approach:

short_names = string.(range('a', step = 1, length = size(d,2)))
dict1 = Dict(zip(Symbol.(names(d)), short_names))
dict2 = Dict(zip(Symbol.(short_names), names(d)))
rename!(d, dict1)
rename!(d, dict2)

where d is the DataFrame. Note that this might causes problems if you have more than 26 columns.

1 Like

You can reverse the Dictionary you used to rename everything, for example reverse_dict = Dict(val=>key for (key,val) in dict).

1 Like

You don’t need a Dict at all, just put the long and short names into two equal-length vectors and broadcast the pair operator in rename! to transparently go back between long and short names for the desired columns:

julia> df = DataFrame(["Long Name $i" => rand(2) for i ∈ 1:5]...)
2Γ—5 DataFrame
 Row β”‚ Long Name 1  Long Name 2  Long Name 3  Long Name 4  Long Name 5 
     β”‚ Float64      Float64      Float64      Float64      Float64     
─────┼─────────────────────────────────────────────────────────────────
   1 β”‚    0.573031    0.0743141     0.187922     0.182269     0.251157
   2 β”‚    0.348444    0.986702      0.620114     0.881127     0.434446

julia> long_names = ["Long Name 1", "Long Name 3"];

julia> short_names = ["SN1", "SN3"]
2-element Vector{String}:
 "SN1"
 "SN3"

julia> rename!(df, long_names .=> short_names)
2Γ—5 DataFrame
 Row β”‚ SN1       Long Name 2  SN3       Long Name 4  Long Name 5 
     β”‚ Float64   Float64      Float64   Float64      Float64     
─────┼───────────────────────────────────────────────────────────
   1 β”‚ 0.573031    0.0743141  0.187922     0.182269     0.251157
   2 β”‚ 0.348444    0.986702   0.620114     0.881127     0.434446

julia> rename!(df, short_names .=> long_names)
2Γ—5 DataFrame
 Row β”‚ Long Name 1  Long Name 2  Long Name 3  Long Name 4  Long Name 5 
     β”‚ Float64      Float64      Float64      Float64      Float64     
─────┼─────────────────────────────────────────────────────────────────
   1 β”‚    0.573031    0.0743141     0.187922     0.182269     0.251157
   2 β”‚    0.348444    0.986702      0.620114     0.881127     0.434446
3 Likes