Remove spaces and units in DataFrame header?

Hello!

I am using DataFrames on a CSV file whose column headers contain a space and a unit declaration next to the variable name, for example:

time [s]

So when I import it, it looks like:

[image: screenshot of the imported DataFrame]

How would I go about removing " [s]", so that it becomes "time"?

Kind regards

This is my hardcoded approach:

using CSV, DataFrames

df = DataFrame(CSV.File("GaugesSWL_Swl_WaveLength.csv", delim=";"))
# Strip the "[m]" and "[s]" unit tags, then remove all remaining whitespace
rename!(df, Symbol.(replace.(string.(names(df)), Ref(r"\[m\]"=>""))))
rename!(df, Symbol.(replace.(string.(names(df)), Ref(r"\[s\]"=>""))))
rename!(df, Symbol.(replace.(string.(names(df)), Ref(r"\s"=>""))))

Yeah, that seems about right. You could always tweak the regex to capture anything of the pattern [*], i.e. any text inside square brackets, instead of listing each unit separately.
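For what it's worth, here is a minimal sketch of such a generalized regex. The helper name `clean_name` and the sample headers are made up for illustration; it assumes every unit is written in square brackets and that, as in the hardcoded version, all remaining whitespace should be dropped.

```julia
# Hypothetical helper: drop any bracketed unit, then remove remaining whitespace.
function clean_name(name::AbstractString)
    # \s*        optional whitespace before the bracket
    # \[[^\]]*\] an opening bracket, anything that is not "]", a closing bracket
    no_unit = replace(name, r"\s*\[[^\]]*\]" => "")
    # Remove whatever whitespace is left, as the original third rename! did
    return replace(no_unit, r"\s+" => "")
end

# Made-up headers standing in for names(df)
headers = ["time [s]", "wave length [m]", "gauge 1 [m]"]
cleaned = clean_name.(headers)
```

With a DataFrame, something like `rename!(df, clean_name.(names(df)))` should then apply it to every column, assuming a DataFrames version in which `names(df)` returns strings.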

You shouldn’t need to use Symbol at all. In the up-to-date version of DataFrames it should just work with strings.

You could also use DataConvenience.cleannames! from DataConvenience.jl.

I am not very good at regex; I tried to do what you suggested, but it didn't work for me. You are right about strings though, that worked well.

Kind regards

@xiaodai I will keep this in mind if I need to do this a lot more

You could also do the following:

julia> x = "a string [m]";

julia> t = findfirst('[', x);

julia> x2 = x[1:(t-1)] |> strip
"a string"

Just remember that this works when you know that there is a space before the [. A space is ASCII (a single byte), so you can safely subtract 1 from the index; otherwise prevind has to be used to get the previous valid string index.
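As a rough sketch of the prevind variant (the function name `strip_unit` is hypothetical, and returning the name unchanged when there is no bracket is my own assumption):

```julia
# Hypothetical helper: cut a name at the first '[' without assuming
# that the character before it is a single-byte (ASCII) character.
function strip_unit(name::AbstractString)
    t = findfirst('[', name)
    # No bracket found: just trim surrounding whitespace
    t === nothing && return String(strip(name))
    # prevind gives the index of the character before '[', whatever its byte width
    return String(strip(name[1:prevind(name, t)]))
end

a = strip_unit("a string [m]")
b = strip_unit("Δ[m]")   # "Δ" is two bytes, so indexing with t-1 would throw here
c = strip_unit("time")
```

The design choice is only about indexing: `strip` still removes the space before the bracket, so the result for ASCII headers is the same as with `t-1`.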