DataFrames: convert column data type

I have some columns that contain floats. For the analysis I want to use the Measurement data type, because it also gives me uncertainties. I tried the following, but it doesn’t work, because I want the values to be written back into the same columns. How can I do something like this?

float_cols = select(df, eltype.(eachcol(df)) .<: Float64)
float_cols .= convert.(Measurement,float_cols)

This raises the error no method matching Float64(::Measurement{Float64}), because it tries to write back into the columns, which still have element type Float64.

You can’t. The point of .= is to reuse existing memory to save allocations, but different data types have different memory layouts, so the memory can’t be reused.
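To make that concrete, here is a minimal sketch with plain Base vectors instead of a DataFrame (the variable names are just for illustration). It uses Int64 and Float64 rather than Measurement, so the exception type differs from the MethodError in the question, but the cause is the same: .= has to convert every value to the element type of the existing storage.

```julia
# Illustration with plain Base vectors (no DataFrames or Measurements needed).
x = [1, 2]                       # Vector{Int64}

# `.=` writes into the existing Int64 storage, so each value must be
# converted to Int64 first. 1.5 has no exact Int64 representation.
err = try
    x .= [1.5, 2.5]
    nothing
catch e
    e
end
@show err isa InexactError

# Plain `=` with a broadcast allocates a new vector and rebinds the name,
# so the element type is free to change.
x = float.(x)
@show eltype(x)
```

The same distinction applies to DataFrame columns: replacing a column with a newly allocated one can change its element type, writing into the existing one cannot.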

Yes, that makes sense. This gives me back a dataframe in which the columns that were floats are now converted to measurements. But I would somehow need to write it back to the original dataframe, because that one remains unchanged.

float_cols = select(df, eltype.(eachcol(df)) .<: Float64)
float_cols = convert.(Measurement,float_cols)

That’s because select creates a copy. You want something like

In [15]: df = DataFrame(x = 1:2, y = rand(2), z = 1:2)
2×3 DataFrame
 Row │ x      y         z
     │ Int64  Float64   Int64
─────┼────────────────────────
   1 │     1  0.614221      1
   2 │     2  0.906434      2

In [16]: df[!, names(df, Int)] = float.(df[!, names(df, Int)])
2×2 DataFrame
 Row │ x        z
     │ Float64  Float64
─────┼──────────────────
   1 │     1.0      1.0
   2 │     2.0      2.0

In [17]: df
2×3 DataFrame
 Row │ x        y         z
     │ Float64  Float64   Float64
─────┼────────────────────────────
   1 │     1.0  0.614221      1.0
   2 │     2.0  0.906434      2.0

Absolutely bonkers that I never knew about this handy dandy shortcut

EDIT: Also, given this shortcut, something like this should probably work too:

select(df, :, names(df,Float64) .=> ByRow(measurement) .=> names(df,Float64))

Or even renamecols = false instead of .=> names(df, Float64). But again, select will create a copy here. select! will work in place, but drops the other columns unless you also pass : as in the call above. If you want to use the minilanguage, you want transform!:

In [29]: transform!(df, names(df, Int) .=> float; renamecols = false)
2×3 DataFrame
 Row │ x        y          z
     │ Float64  Float64    Float64
─────┼─────────────────────────────
   1 │     1.0  0.0932927      1.0
   2 │     2.0  0.112173       2.0
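Putting this together for the original question, a sketch along these lines should convert the Float64 columns to measurements in place. It assumes DataFrames.jl and Measurements.jl are installed. The example data is made up, and the one-argument measurement(v) attaches zero uncertainty, so real uncertainties would still have to be supplied some other way.

```julia
using DataFrames, Measurements

# Made-up example data: one integer column and two Float64 columns.
df = DataFrame(x = 1:2, y = [0.5, 1.5], z = [2.0, 4.0])

# Names of the columns whose element type is a subtype of Float64 (y and z).
float_cols = names(df, Float64)

# Replace those columns in the original data frame. ByRow applies
# `measurement` to each value; renamecols = false keeps the column names.
transform!(df, float_cols .=> ByRow(measurement); renamecols = false)

# Indexing form from earlier in the thread, which should be equivalent:
# df[!, float_cols] = measurement.(df[!, float_cols])

@show eltype.(eachcol(df))
```

Both forms replace the columns with newly allocated ones, which is why the element type can change here while .= could not do it.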