Change datatype for subset of DataFrame columns

Suppose I have this dataframe:
df = DataFrame(A = Any[rand() for i in 1:10], B = ["a","b","c","d","e","f","g","h","i","j"], C = Any[rand() for i in 1:10])

I want to change the datatype of columns A and C programmatically (the actual dataframe has >20 columns!).

So far I have identified the columns with datatype “Any” with:
cols = names(df, eltype.(eachcol(df)) .== Any)

Now, how do I iterate through the columns to change the datatype?

(I searched everywhere and I only found ways to change a single column)

julia> using DataFrames
[ Info: Precompiling DataFrames [a93c6f00-e57d-5684-b7b6-d8193f3e46c0]

julia> df = DataFrame(A = Any[rand() for i in 1:10], B = string.('a':'j'), C = Any[rand() for i in 1:10])
10×3 DataFrame
 Row │ A          B       C        
     │ Any        String  Any      
─────┼─────────────────────────────
   1 │ 0.906792   a       0.674568
   2 │ 0.410943   b       0.370189
   3 │ 0.184457   c       0.759076
   4 │ 0.889315   d       0.332498
   5 │ 0.70852    e       0.839533
   6 │ 0.0455212  f       0.27601
   7 │ 0.955367   g       0.295744
   8 │ 0.435959   h       0.16552
   9 │ 0.186456   i       0.156159
  10 │ 0.529786   j       0.717429

julia> for name in names(df); df[!, name] = identity.(df[!, name]); end

julia> df
10×3 DataFrame
 Row │ A          B       C        
     │ Float64    String  Float64  
─────┼─────────────────────────────
   1 │ 0.906792   a       0.674568
   2 │ 0.410943   b       0.370189
   3 │ 0.184457   c       0.759076
   4 │ 0.889315   d       0.332498
   5 │ 0.70852    e       0.839533
   6 │ 0.0455212  f       0.27601
   7 │ 0.955367   g       0.295744
   8 │ 0.435959   h       0.16552
   9 │ 0.186456   i       0.156159
  10 │ 0.529786   j       0.717429
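
The loop works because broadcasting `identity` allocates a fresh vector whose element type is inferred from the values actually stored, not from the declared `Any`. A minimal sketch in plain Julia (no DataFrames needed; the variable names `v`, `narrowed` and `mixed` are illustrative):

```julia
# Broadcasting identity rebuilds the vector with the narrowest element type
# that fits the values it actually holds.
v = Any[0.5, 1.5, 2.5]
narrowed = identity.(v)
@show eltype(v)         # Any
@show eltype(narrowed)  # Float64

# A genuinely mixed vector cannot be narrowed and stays Any.
mixed = identity.(Any[1, "a"])
@show eltype(mixed)     # Any
```

So a column that really holds heterogeneous values is left as `Any` rather than raising an error.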
transform!(df, Cols(:) .=> ByRow(identity), renamecols = false)

I think identity.(df) works too.

Edit: though the solution which uses transform! seems faster for larger dfs.
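
One practical difference between the two, apart from speed: as far as I can tell, broadcasting over a DataFrame returns a new DataFrame, while `transform!` modifies the existing one. A small sketch with a made-up two-column frame (the data and the name `narrowed` are assumptions for illustration only):

```julia
using DataFrames

df = DataFrame(A = Any[1.0, 2.0], B = ["a", "b"])

# Broadcasting returns a new DataFrame; the original keeps its Any column.
narrowed = identity.(df)
@show eltype(narrowed.A)
@show eltype(df.A)

# transform! narrows the columns of df itself.
transform!(df, Cols(:) .=> ByRow(identity), renamecols = false)
@show eltype(df.A)
```

With the broadcast form you would need to assign the result back (`df = identity.(df)`) to get the same effect.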

@DataFrames I marked your answer as the “solution”. On the other hand, if the dataframe has many columns and not all of them need their type narrowed, wouldn’t this solution be sub-optimal (it processes all columns rather than only the ones that really need to be processed)?

Coming from R I’m surprised it is not possible to just pass a vector with the column names that require processing…

What do you mean? Isn’t it possible to simply replace Cols(:) with a Vector of column names?

julia> transform!(df, ["A", "C"] .=> ByRow(identity), renamecols = false)
10×3 DataFrame
 Row │ A          B       C        
     │ Float64    String  Float64  
─────┼─────────────────────────────
   1 │ 0.104016   a       0.551771
   2 │ 0.609946   b       0.211459
   3 │ 0.136688   c       0.595575
   4 │ 0.217122   d       0.543388
   5 │ 0.916172   e       0.514883
   6 │ 0.655476   f       0.502752
   7 │ 0.261034   g       0.543343
   8 │ 0.0641058  h       0.765185
   9 │ 0.817648   i       0.414756
  10 │ 0.413503   j       0.381042
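
Putting this together with the column selection from the original question, the names do not have to be typed by hand. A minimal sketch, assuming the same example frame as above (`any_cols` is just an illustrative name):

```julia
using DataFrames

df = DataFrame(A = Any[rand() for i in 1:10],
               B = string.('a':'j'),
               C = Any[rand() for i in 1:10])

# Names of the columns whose element type is exactly Any.
any_cols = names(df, eltype.(eachcol(df)) .== Any)

# Narrow only those columns, in place; the other columns are not processed.
transform!(df, any_cols .=> ByRow(identity), renamecols = false)
@show eltype.(eachcol(df))
```

Only `A` and `C` are passed to `transform!` here, so `B` is left untouched.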

That’s great!
I thought it wasn’t possible. Newbie here 🙂