How to change field names and types of a DataFrame

I imported a xlsx file to an array a using XLSX.jl. But now I have no real names for the fields in df and the types are all Any.
I changed names using rename! but I don’t know how to change types respectively to String, Int64 and Float64. Is it possible to import a vector of element types?
Is there a better method to go from xlsx to a DataFrame importing the row heading and types?

using DataFrames
# using an array
a = ["a" 2 4.2; "b" 5 6.7; "c" 3 8.9]
df = DataFrame(a)
# How to change names and types of DataFrame?
vec_of_names = ["name", "2004", "2020"]
rename!(df, vec_of_your_new_names)
2 Likes

Someting like

df[!, :x2] = convert.(Int, df[:, :x2])

will change types types. Note the ! and :. See

3 Likes

The newest version of XLSX.jl should allow very easy conversion to a data frame. See the docs here.

2 Likes

It’s strange. The first one (name) works but not the other two.

df[!, :name] = convert.(String, df[:, :name])
df[!, :2004] = convert.(Int, df[:, :2004])
df[!, :2020] = convert.(Float, df[:, :2020])

I don’t think :2004 is a valid column name. I would rename it (or you can probably access it as a string name).

2 Likes

You’re right. I had to change column names. Now it works.

df[!, :name] = convert.(String, df[:, :name])
df[!, :y2004] = convert.(Int, df[:, :y2004])
df[!, :y2020] = convert.(Float64, df[:, :y2020])

But I can access it as a string name:
df[!, "2020"] = convert.(Float64, df[:, "2020"])
Maybe there’s a way to put this on a line :slight_smile:

You should file an issue about this in DataFrames. a narrow_types! function would be nice to have, since this comes up a decent amount.

1 Like

Do you mean something like

v1 = [:name :y2004 "2020"]
v2 = [String Int Float64]

df[!, v1] = convert.(v2, df[:, v1])

Note: this doesn’t work.

  1. You should learn the difference between vectors and matrices. You should be working with vectors here, [:name, :y2004, "2020"], note, the commas instead of spaces.
  2. It’s not clear what your line is trying to accomplish.

I mean something like the following:

julia> df = DataFrame(x = Any[rand() for i in 1:10], y = Union{Float64, Missing}[rand() for i in 1:10]);

julia> function narrow_types!(df)
           for c in names(df)
               df[!, c] = identity.(df[!, c])
           end
           return df
       end
narrow_types! (generic function with 2 methods)

julia> narrow_types!(df)
10×2 DataFrame
│ Row │ x         │ y         │
│     │ Float64   │ Float64   │
├─────┼───────────┼───────────┤
│ 1   │ 0.234185  │ 0.0427855 │
│ 2   │ 0.519789  │ 0.123238  │
│ 3   │ 0.838223  │ 0.414821  │
│ 4   │ 0.436736  │ 0.0125951 │
│ 5   │ 0.981439  │ 0.867722  │
│ 6   │ 0.0135416 │ 0.547298  │
│ 7   │ 0.9429    │ 0.966585  │
│ 8   │ 0.333247  │ 0.379245  │
│ 9   │ 0.567072  │ 0.208785  │
│ 10  │ 0.114606  │ 0.256162  │
  1. Sorry. Thank you;
  2. I was trying to use a vector of names and a vector of types to change the names and types of a DataFrame.

No problem.

I think ultimately you want Julia to take care of the types of columns, using the function above.

If you want to specify the types themselves, try this:

julia> function convert_types(df, colstypes)
           for (c, t) in colstypes
               df[!, c] = convert.(t, df[!, c])
           end
           df
       end
convert_types (generic function with 1 method)

julia> convert_types(df, [:x => Float64, :y => Float64])
10×2 DataFrame
│ Row │ x        │ y        │
│     │ Float64  │ Float64  │
├─────┼──────────┼──────────┤
│ 1   │ 0.834294 │ 0.463568 │
│ 2   │ 0.60386  │ 0.517164 │
│ 3   │ 0.820777 │ 0.994759 │
│ 4   │ 0.372805 │ 0.501976 │
│ 5   │ 0.70668  │ 0.707915 │
│ 6   │ 0.6946   │ 0.377697 │
│ 7   │ 0.182847 │ 0.958645 │
│ 8   │ 0.116744 │ 0.53444  │
│ 9   │ 0.393341 │ 0.39792  │
│ 10  │ 0.862753 │ 0.17667  │
1 Like

I did this same mistake the other day, it is a little misleading but:

julia> typeof(:1000)
Int64

A colon followed by a number is not a Symbol, it is just the number. To get a symbol you would need Symbol("1000").

1 Like

Yes. It is similar to what I wanted. It works in my case:
convert_types(df, [:name => String, :y2004 => Int, "2020" => Float64])

See above. When I use a string “2020” instead of :y2020 also works.

For future reference, XLSX.readtable also has a infer_eltypes kwarg that should give back properly typed columns for your DataFrame: https://felipenoris.github.io/XLSX.jl/stable/api/#XLSX.readtable

3 Likes