How to convert the String15 datatype in DataFrames.jl to Float64

I have a DataFrame with two columns whose element types are String15 and String31, respectively.
I want to convert the element type of both columns to Float64, but I can't find any documentation or example for this in the DataFrames.jl docs.

My first try:
removed_missing_df[:, "Area"] = map(Float64, removed_missing_df[:, "Area"])

It returns this error:

MethodError: no method matching Float64(::InlineStrings.String15)

Closest candidates are:
  (::Type{T})(!Matched::AbstractChar) where T<:Union{AbstractChar, Number} at char.jl:50
  (::Type{T})(!Matched::Base.TwicePrecision) where T<:Number at twiceprecision.jl:266
  (::Type{T})(!Matched::Complex) where T<:Real at complex.jl:44
  ...

    iterate@generator.jl:47[inlined]
    _collect@array.jl:807[inlined]
    collect_similar(::Vector{InlineStrings.String15}, ::Base.Generator{Vector{InlineStrings.String15}, Type{Float64}})@array.jl:716
    map(::Type, ::Vector{InlineStrings.String15})@abstractarray.jl:2933
    top-level scope@Local: 1[inlined]

Look into the transform function and the transformation x -> parse(Float64, x).

Also, if you are using CSV.jl to read in the data, look into specifying the type when you call the read function.
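For context on the MethodError above: Float64(x) converts between numeric types, whereas parse reads a number out of text. Here is a minimal sketch using only Base; the sample strings are made up, and plain String values stand in for String15, since parse accepts any AbstractString.

```julia
# Hypothetical sample values; plain Strings stand in for String15 here.
areas = ["12.5", "7.25", "100"]

# parse reads a number out of text; the dot broadcasts it over the vector.
parsed = parse.(Float64, areas)

# tryparse returns nothing instead of throwing when the text is not a number.
bad = tryparse(Float64, "NA")
```

The tryparse form may be handy when some entries are placeholders such as "NA".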

Can you give me some sample code for the transform function?

Say you’ve downloaded the file StarWars.csv.

Then reading it in using CSV.jl:

df = CSV.read("StarWars.csv", DataFrame)

we find that the :Weight column has element type String7, since Jabba's weight is given as "NA".
To fix that entry, we might do

df[15, :Weight] = "1358" 

but now we want to interpret this :Weight column as containing Float64 data.
So we do:

transform!(df, :Weight => ( x -> parse.(Float64,x) ) => :Weight)

to convert the column. (See First Steps with DataFrames.jl for more on this.)

If you give the types at the time you do CSV.read, the String7 type becomes a Union{Missing, Float64} instead:

df = CSV.read("StarWars.csv", DataFrame; types=Dict(:Weight => Float64))
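Instead of editing the "NA" entry by hand, one could also map it to missing during the conversion. The sketch below assumes a small made-up table in place of StarWars.csv, and the helper name to_float is hypothetical.

```julia
using DataFrames

# Hypothetical stand-in for the StarWars data; names and weights are made up.
df = DataFrame(Name = ["A", "B", "C"], Weight = ["77", "NA", "32.5"])

# Hypothetical helper: "NA" becomes missing, anything else is parsed.
to_float(x) = x == "NA" ? missing : parse(Float64, x)

# ByRow applies the helper to each element and overwrites :Weight.
transform!(df, :Weight => ByRow(to_float) => :Weight)
```

After this, the column should hold Float64 values with missing where the placeholder was.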

DataFramesMeta.jl can simplify some of this syntax; check out the examples for Propagating missing values with @passmissing


Seems you’re looking for an element-wise parse:

removed_missing_df[:, "Area"] = parse.(Float64, removed_missing_df[:, "Area"])


I guess (like @jd-foster) that the issue stems from missing values in the input. If you’re using CSV it’s also possible to tell CSV.read directly which token to interpret as a missing value:

julia> df = CSV.read("StarWars.csv", DataFrame; missingstring="NA")
julia> eltype(df.Weight)
Union{Missing, Float64}
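To try this without the file, the same keyword can be used on an in-memory buffer. This is only a sketch: the CSV text below is made up and is not the real StarWars.csv.

```julia
using CSV, DataFrames

# Hypothetical CSV content held in memory instead of StarWars.csv.
data = """
Name,Weight
A,77.5
B,NA
C,32.0
"""

# Treat the token "NA" as missing while reading.
df = CSV.read(IOBuffer(data), DataFrame; missingstring="NA")

# Optionally also pin the column type, so it does not depend on type detection.
df2 = CSV.read(IOBuffer(data), DataFrame;
               types=Dict(:Weight => Float64), missingstring="NA")
```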

The : syntax does not seem to work in this situation, so I changed : to !
removed_missing_df[!, "Area"] = parse.(Float64, removed_missing_df[!, "Area"])

Now I have a different question: why does ! work while : throws an error?

The error message here is also odd; I would expect it to be
Cannot `convert` an object of type InlineStrings.String15 to an object of type Float64

Is it a bug?

Oops, my bad. One should use ! here.

DataFrames.jl doesn't always follow Julia's rules and conventions for assignment, indexing, type promotion, concatenation, or broadcasting. Instead it has its own set of rules, described in the Indexing section of the DataFrames.jl documentation.

As to your question “Why?”. No particular reason afaik.
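For what it's worth, the indexing rules do distinguish the two forms: for an existing column, df[:, col] = v writes the values in place into the stored vector, while df[!, col] = v replaces the stored vector with v. A small sketch of the difference, with a made-up column and plain String standing in for String15:

```julia
using DataFrames

# Hypothetical column; String stands in for String15.
df = DataFrame(Area = ["1.5", "2.5"])

# `:` writes in place, so each Float64 would have to be stored in the
# existing string vector, which fails.
inplace_failed = try
    df[:, "Area"] = parse.(Float64, df[:, "Area"])
    false
catch
    true
end

# `!` replaces the column vector, so the element type is free to change.
df[!, "Area"] = parse.(Float64, df[!, "Area"])
```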

I don't think so. Consider this:

julia> vs=["1.2","3.4","2.0"]
3-element Vector{String}:
 "1.2"
 "3.4"
 "2.0"

julia> df=DataFrame(;vs)
3×1 DataFrame
 Row │ vs     
     │ String
─────┼────────
   1 │ 1.2
   2 │ 3.4
   3 │ 2.0

julia> eltype(df.vs)
String

julia> df.vs[1]=99.9
ERROR: MethodError: Cannot `convert` an object of type Float64 to an object of type String


julia> df.vs[1]=parse(Float64,"99.9")
ERROR: MethodError: Cannot `convert` an object of type Float64 to an object of type String
Closest candidates are:
  

and this:

julia> vs[1]=9.99
ERROR: MethodError: Cannot `convert` an object of type Float64 to an object of type String
Closest candidates are:

So the "culprit" is Julia itself, not DataFrames.


You can use parse inside the transform function:

julia> df
3×1 DataFrame
 Row │ vs     
     │ String
─────┼────────
   1 │ 1.2
   2 │ 3.4
   3 │ 2.0

julia> transform(df, :vs=>ByRow(x->parse(Float64, x))=>:fls)      
3×2 DataFrame
 Row │ vs      fls     
     │ String  Float64
─────┼─────────────────
   1 │ 1.2         1.2
   2 │ 3.4         3.4
   3 │ 2.0         2.0
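Since the original question involves two string columns, the same idea can be applied to several columns at once. A sketch under assumptions: the column names and values below are made up, and renamecols=false is used so the parsed values overwrite the original columns instead of creating new ones.

```julia
using DataFrames

# Hypothetical frame with two string columns, mirroring the original question.
df = DataFrame(Area = ["1.5", "2.5"], Price = ["10", "20.5"])

# Broadcasting .=> builds one transformation per column;
# renamecols=false keeps the original names, so each column is overwritten.
cols = ["Area", "Price"]
transform!(df, cols .=> ByRow(x -> parse(Float64, x)); renamecols=false)
```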