Suppose I have this dataframe:
df = DataFrame(A = Any[rand() for i in 1:10], B = ["a","b","c","d","e","f","g","h","i","j"], C = Any[rand() for i in 1:10])
I want to change the datatype of columns A and C programmatically (the actual dataframe has >20 columns!).
So far I have identified the columns with datatype “Any” with:
cols = names(df, eltype.(eachcol(df)) .== Any)
Now, how do I iterate through the columns to change the datatype?
(I searched everywhere and I only found ways to change a single column)
julia> using DataFrames
[ Info: Precompiling DataFrames [a93c6f00-e57d-5684-b7b6-d8193f3e46c0]
julia> df = DataFrame(A = Any[rand() for i in 1:10], B = string.('a':'j'), C = Any[rand() for i in 1:10])
10×3 DataFrame
 Row │ A          B       C
     │ Any        String  Any
─────┼─────────────────────────────
   1 │ 0.906792   a       0.674568
   2 │ 0.410943   b       0.370189
   3 │ 0.184457   c       0.759076
   4 │ 0.889315   d       0.332498
   5 │ 0.70852    e       0.839533
   6 │ 0.0455212  f       0.27601
   7 │ 0.955367   g       0.295744
   8 │ 0.435959   h       0.16552
   9 │ 0.186456   i       0.156159
  10 │ 0.529786   j       0.717429
julia> for name in names(df); df[!, name] = identity.(df[!, name]); end
julia> df
10×3 DataFrame
 Row │ A          B       C
     │ Float64    String  Float64
─────┼─────────────────────────────
   1 │ 0.906792   a       0.674568
   2 │ 0.410943   b       0.370189
   3 │ 0.184457   c       0.759076
   4 │ 0.889315   d       0.332498
   5 │ 0.70852    e       0.839533
   6 │ 0.0455212  f       0.27601
   7 │ 0.955367   g       0.295744
   8 │ 0.435959   h       0.16552
   9 │ 0.186456   i       0.156159
  10 │ 0.529786   j       0.717429
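For anyone wondering why `identity.(...)` changes the column type: broadcasting builds a new vector and picks its element type from the values it actually produces. A minimal sketch on plain vectors (the names `v`, `w`, `mixed` and `still_any` are made up for illustration):

```julia
# A vector declared as Any that in fact holds only Float64 values
v = Any[1.0, 2.0, 3.0]

# Broadcasting identity allocates a new vector with the narrowest common element type
w = identity.(v)

println(eltype(v))  # element type of the original vector
println(eltype(w))  # element type after narrowing

# If the values really are of unrelated types, the result stays Any
mixed = Any[1, "a"]
still_any = identity.(mixed)
println(eltype(still_any))
```

So the loop above is harmless for columns that are already concretely typed, such as B: they simply come back with the same element type.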
@DataFrames I marked your answer as the “solution”. On the other hand, if the dataframe has many columns and not all of them need their type narrowed, wouldn’t this solution be sub-optimal (it processes all columns rather than only the ones that really need it)?
Coming from R I’m surprised it is not possible to just pass a vector with the column names that require processing…
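One possible way to restrict the work to the columns that need it is to loop over a vector of selected names instead of `names(df)`. This is only a sketch, assuming the same example dataframe as above; `cols` is built here with a comprehension, which should select the same columns as the `eltype.(eachcol(df)) .== Any` expression from the question:

```julia
using DataFrames

df = DataFrame(A = Any[rand() for i in 1:10],
               B = string.('a':'j'),
               C = Any[rand() for i in 1:10])

# Collect only the names of columns whose element type is exactly Any
cols = [name for name in names(df) if eltype(df[!, name]) === Any]

# Narrow just those columns; the remaining columns are never touched
for name in cols
    df[!, name] = identity.(df[!, name])
end

println(cols)
println(eltype.(eachcol(df)))
```

Any vector of column names can be used in place of `cols`, so a hand-written list such as `["A", "C"]` would work the same way.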