How to change field names and types of a DataFrame

I imported a xlsx file to an array a using XLSX.jl. But now I have no real names for the fields in df and the types are all Any.
I changed names using rename! but I don’t know how to change types respectively to String, Int64 and Float64. Is it possible to import a vector of element types?
Is there a better method to go from xlsx to a DataFrame importing the row heading and types?

using DataFrames
# using an array
a = ["a" 2 4.2; "b" 5 6.7; "c" 3 8.9]
df = DataFrame(a)
# How to change names and types of DataFrame?
vec_of_names = ["name", "2004", "2020"]
rename!(df, vec_of_your_new_names)
1 Like

Someting like

df[!, :x2] = convert.(Int, df[:, :x2])

will change types types. Note the ! and :. See

1 Like

The newest version of XLSX.jl should allow very easy conversion to a data frame. See the docs here.

2 Likes

It’s strange. The first one (name) works but not the other two.

df[!, :name] = convert.(String, df[:, :name])
df[!, :2004] = convert.(Int, df[:, :2004])
df[!, :2020] = convert.(Float, df[:, :2020])

I don’t think :2004 is a valid column name. I would rename it (or you can probably access it as a string name).

2 Likes

You’re right. I had to change column names. Now it works.

df[!, :name] = convert.(String, df[:, :name])
df[!, :y2004] = convert.(Int, df[:, :y2004])
df[!, :y2020] = convert.(Float64, df[:, :y2020])

But I can access it as a string name:
df[!, "2020"] = convert.(Float64, df[:, "2020"])
Maybe there’s a way to put this on a line :slight_smile:

You should file an issue about this in DataFrames. a narrow_types! function would be nice to have, since this comes up a decent amount.

1 Like

Do you mean something like

v1 = [:name :y2004 "2020"]
v2 = [String Int Float64]

df[!, v1] = convert.(v2, df[:, v1])

Note: this doesn’t work.

  1. You should learn the difference between vectors and matrices. You should be working with vectors here, [:name, :y2004, "2020"], note, the commas instead of spaces.
  2. It’s not clear what your line is trying to accomplish.

I mean something like the following:

julia> df = DataFrame(x = Any[rand() for i in 1:10], y = Union{Float64, Missing}[rand() for i in 1:10]);

julia> function narrow_types!(df)
           for c in names(df)
               df[!, c] = identity.(df[!, c])
           end
           return df
       end
narrow_types! (generic function with 2 methods)

julia> narrow_types!(df)
10Γ—2 DataFrame
β”‚ Row β”‚ x         β”‚ y         β”‚
β”‚     β”‚ Float64   β”‚ Float64   β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1   β”‚ 0.234185  β”‚ 0.0427855 β”‚
β”‚ 2   β”‚ 0.519789  β”‚ 0.123238  β”‚
β”‚ 3   β”‚ 0.838223  β”‚ 0.414821  β”‚
β”‚ 4   β”‚ 0.436736  β”‚ 0.0125951 β”‚
β”‚ 5   β”‚ 0.981439  β”‚ 0.867722  β”‚
β”‚ 6   β”‚ 0.0135416 β”‚ 0.547298  β”‚
β”‚ 7   β”‚ 0.9429    β”‚ 0.966585  β”‚
β”‚ 8   β”‚ 0.333247  β”‚ 0.379245  β”‚
β”‚ 9   β”‚ 0.567072  β”‚ 0.208785  β”‚
β”‚ 10  β”‚ 0.114606  β”‚ 0.256162  β”‚
  1. Sorry. Thank you;
  2. I was trying to use a vector of names and a vector of types to change the names and types of a DataFrame.

No problem.

I think ultimately you want Julia to take care of the types of columns, using the function above.

If you want to specify the types themselves, try this:

julia> function convert_types(df, colstypes)
           for (c, t) in colstypes
               df[!, c] = convert.(t, df[!, c])
           end
           df
       end
convert_types (generic function with 1 method)

julia> convert_types(df, [:x => Float64, :y => Float64])
10Γ—2 DataFrame
β”‚ Row β”‚ x        β”‚ y        β”‚
β”‚     β”‚ Float64  β”‚ Float64  β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1   β”‚ 0.834294 β”‚ 0.463568 β”‚
β”‚ 2   β”‚ 0.60386  β”‚ 0.517164 β”‚
β”‚ 3   β”‚ 0.820777 β”‚ 0.994759 β”‚
β”‚ 4   β”‚ 0.372805 β”‚ 0.501976 β”‚
β”‚ 5   β”‚ 0.70668  β”‚ 0.707915 β”‚
β”‚ 6   β”‚ 0.6946   β”‚ 0.377697 β”‚
β”‚ 7   β”‚ 0.182847 β”‚ 0.958645 β”‚
β”‚ 8   β”‚ 0.116744 β”‚ 0.53444  β”‚
β”‚ 9   β”‚ 0.393341 β”‚ 0.39792  β”‚
β”‚ 10  β”‚ 0.862753 β”‚ 0.17667  β”‚
1 Like

I did this same mistake the other day, it is a little misleading but:

julia> typeof(:1000)
Int64

A colon followed by a number is not a Symbol, it is just the number. To get a symbol you would need Symbol("1000").

Yes. It is similar to what I wanted. It works in my case:
convert_types(df, [:name => String, :y2004 => Int, "2020" => Float64])

See above. When I use a string β€œ2020” instead of :y2020 also works.

For future reference, XLSX.readtable also has a infer_eltypes kwarg that should give back properly typed columns for your DataFrame: https://felipenoris.github.io/XLSX.jl/stable/api/#XLSX.readtable

2 Likes