I am importing data from Excel into a DataFrame. The column headers are long specific names with spaces and units. They are a pain to work with because every time I need to access that column, I have to type df."Long Specific Name With Spaces (Units)". Is there a good way to give the column a nickname like :x, so I can access it with df.x instead? The method would need to preserve the original headers, so I can transform them back for printing.
I think the correct tool is a dictionary, but I havenβt used them much. I also know that DataFrames.jl uses => for some of its βsplit-apply-combineβ syntax, and I donβt want to accidently interfere with that.
How are you importing your data? If using CSV.jl, you can read using the normalizenames argument to get rid of the spaces and add underscores. Something like
Adding to this, you can also provide change individual columns one at a time, for example rename!(df,"Long Name"=>"longname"), or through a Dictionary if you want to change a series of column labels with rename!(df,Dict("Long Name"=>"longname","Other Odd Name"=>"oddname")).
You donβt need a Dict at all, just put the long and short names into two equal-length vectors and broadcast the pair operator in rename! to transparently go back between long and short names for the desired columns:
julia> df = DataFrame(["Long Name $i" => rand(2) for i β 1:5]...)
2Γ5 DataFrame
Row β Long Name 1 Long Name 2 Long Name 3 Long Name 4 Long Name 5
β Float64 Float64 Float64 Float64 Float64
ββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1 β 0.573031 0.0743141 0.187922 0.182269 0.251157
2 β 0.348444 0.986702 0.620114 0.881127 0.434446
julia> long_names = ["Long Name 1", "Long Name 3"];
julia> short_names = ["SN1", "SN3"]
2-element Vector{String}:
"SN1"
"SN3"
julia> rename!(df, long_names .=> short_names)
2Γ5 DataFrame
Row β SN1 Long Name 2 SN3 Long Name 4 Long Name 5
β Float64 Float64 Float64 Float64 Float64
ββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1 β 0.573031 0.0743141 0.187922 0.182269 0.251157
2 β 0.348444 0.986702 0.620114 0.881127 0.434446
julia> rename!(df, short_names .=> long_names)
2Γ5 DataFrame
Row β Long Name 1 Long Name 2 Long Name 3 Long Name 4 Long Name 5
β Float64 Float64 Float64 Float64 Float64
ββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1 β 0.573031 0.0743141 0.187922 0.182269 0.251157
2 β 0.348444 0.986702 0.620114 0.881127 0.434446