Replacing some DataFrame values based on their type, for multiple columns - limits of the df.colname syntax

I have a DataFrame where missing values are indicated by somewhat random strings instead of real numbers, so the only way to know that a value is missing is to test whether it is not a Float64. I am trying to replace these random strings by the “proper” missing indicator.
I found a satisfactory (to me!) method to do this one column at a time, but I find the resulting code really not easy to read when I try to apply it to all variables in the DataFrame.
Here is an example

df = DataFrame(a = [1.0, 2.0, "N/A", 3.0, "bad"], b = [4.0, "error", 5.0, 6.0, 7.0])
# One column at a time (satisfactorily readable to me)
df.a[typeof.(df.a) .!= Float64] .= missing
# For all columns (not very readable to me, because of the df[:,colname] syntax)
for colname in names(df)
    df[typeof.(df[:, varname]) .!= Float64, colname] .= missing

I wish I could use the df.colname syntax in the second version, when colname is in a string variable, but I can’t figure out how to do that. And writing df[:, colname] leads to the above code, which I, for one, do not find very readable. In Matlab, I could have written df.(colname), adding parentheses to indicate that colname is not the actual name of the column but a string variable containing that column name.

I am opened to solutions with transform() or @transform but, to me (coming from Matlab), what I have found so far is even less readable.


Where is your data coming from? You should almost certainly move this tidyihg step further up your pipeline, e.g., when parsing a CSV, than to have a DataFrame with columns of eltype Any.

(Sorry for giving advice instead of answering the question!)

I would write it like this:

julia> replace_non_floats(x) = [xᵢ isa Number ? xᵢ : missing for xᵢ ∈ x]
replace_non_floats (generic function with 1 method)

julia> mapcols!(replace_non_floats, df)
5×2 DataFrame
 Row │ a          b
     │ Float64?   Float64?
   1 │       1.0        4.0
   2 │       2.0  missing
   3 │ missing          5.0
   4 │       3.0        6.0
   5 │ missing          7.0

Which has the additional benefit that it creates a narrower vector type if possible (your code keeps the columns Any which as Joseph says is usually a bad idea).

Try this sintax(not tested I am on phone)


But nilshg advice is a better way to follows

Alternative solution with TableTransforms.jl:

df |> Replace(x -> !(x isa Float64) => missing)

julia> df=DataFrame((c1=[1,"b",2],c2=[21, 22, "cc"]))
3×2 DataFrame
 Row │ c1   c2  
     │ Any  Any
   1 │ 1    21
   2 │ b    22
   3 │ 2    cc

julia> transform!(df, Cols(:).=>(c->[r isa Number ? r : missing for r ∈ c]).=>identity)
3×2 DataFrame
 Row │ c1       c2      
     │ Int64?   Int64?
   1 │       1       21
   2 │ missing       22
   3 │       2  missing
julia> transform!(df, Cols(:).=>(c->[r isa Number ? r : missing for r ∈ c]),renamecols=false)
3×2 DataFrame
 Row │ c1       c2      
     │ Int64?   Int64?
   1 │       1       21
   2 │ missing       22
   3 │       2  missing