I have a DataFrame with two columns of type String15 and String31, respectively.
Now I want to convert the element type of both columns to Float64, but I can’t find any documentation or example for this in the DataFrames.jl docs.
My first try:
removed_missing_df[:, "Area"] = map(Float64, removed_missing_df[:, "Area"])
returns an error:
MethodError: no method matching Float64(::InlineStrings.String15)
Closest candidates are:
(::Type{T})(!Matched::AbstractChar) where T<:Union{AbstractChar, Number} at char.jl:50
(::Type{T})(!Matched::Base.TwicePrecision) where T<:Number at twiceprecision.jl:266
(::Type{T})(!Matched::Complex) where T<:Real at complex.jl:44
...
iterate@generator.jl:47[inlined]
_collect@array.jl:807[inlined]
collect_similar(::Vector{InlineStrings.String15}, ::Base.Generator{Vector{InlineStrings.String15}, Type{Float64}})@array.jl:716
map(::Type, ::Vector{InlineStrings.String15})@abstractarray.jl:2933
top-level scope@Local: 1[inlined]
Look into the transform function and the transformation x -> parse(Float64, x).
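To make the difference concrete, here is a minimal Base-only sketch (no DataFrames needed) of why the Float64 constructor fails on text while parse works. The variable names are just for illustration.

```julia
s = "3.14"

# parse reads a number out of text.
x = parse(Float64, s)

# The Float64 constructor converts between numeric types; it has no
# method for strings, so this call throws a MethodError.
constructor_failed = try
    Float64(s)
    false
catch err
    err isa MethodError
end

# tryparse returns `nothing` instead of throwing when the text is not a number.
maybe_na = tryparse(Float64, "NA")

# The dot broadcasts parse over every element of a vector.
areas = parse.(Float64, ["1.2", "3.4", "2.0"])
```

The same broadcasted call is what ends up inside a transform on a DataFrame column.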
Also, if you are using CSV.jl to read in the data, look into specifying the column types when you call the read function.
Can you give me some sample code for the transform function?
Say you’ve downloaded the file StarWars.csv.
Then, reading it in using CSV.jl:
df = CSV.read("StarWars.csv", DataFrame)
we find that the :Weight column has type String7, since Jabba’s weight is given as “NA”.
Updating from here, we might do
df[15, :Weight] = "1358"
but now we want to interpret this :Weight column as containing Float64 data.
So we do:
transform!(df, :Weight => ( x -> parse.(Float64,x) ) => :Weight)
to convert the column. (See First Steps with DataFrames.jl for more on this.)
If you give the types at the time you call CSV.read, the :Weight column gets element type Union{Missing, Float64} instead of String7:
df = CSV.read("StarWars.csv", DataFrame; types=Dict(:Weight => Float64))
DataFramesMeta.jl can simplify some of this syntax; check out the examples for Propagating missing values with @passmissing
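For comparison, here is a rough sketch of the same idea without the macro, using the passmissing function from Missings.jl. It assumes that `using DataFrames` makes passmissing available (I believe DataFrames.jl re-exports Missings.jl; if not, add `using Missings`). The small DataFrame is made up for illustration.

```julia
using DataFrames  # assumption: re-exports `passmissing` from Missings.jl

# Hypothetical data: a string column with one missing entry.
df = DataFrame(Area = ["1.5", missing, "3.0"])

# passmissing(parse) returns `missing` whenever an argument is missing
# and otherwise calls parse as usual; the dot broadcasts it over the column.
df[!, :Area] = passmissing(parse).(Float64, df.Area)
```

After this the column should hold Float64 values with the missing entry preserved.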
Seems you’re looking for an element-wise parse:
removed_missing_df[:, "Area"] = parse.(Float64, removed_missing_df[:, "Area"])
I guess (like @jd-foster) that the issue stems from missing values in the input. If you’re using CSV.jl, it’s also possible to tell CSV.read directly which token to interpret as a missing value:
julia> df = CSV.read("StarWars.csv", DataFrame; missingstring="NA")
julia> eltype(df.Weight)
Union{Missing, Float64}
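If you don’t have the file at hand, something like the following sketch should reproduce this. The inline data is a made-up stand-in for StarWars.csv (not the real file), read through an IOBuffer. I pass both missingstring and types here so the result does not depend on type detection.

```julia
using CSV, DataFrames

# Hypothetical stand-in for StarWars.csv; values are for illustration only.
data = """
Name,Weight
Luke Skywalker,77
Jabba,NA
Yoda,17
"""

# missingstring marks "NA" as missing; types forces :Weight to Float64
# instead of leaving it to type detection.
df = CSV.read(IOBuffer(data), DataFrame;
              missingstring = "NA",
              types = Dict(:Weight => Float64))
```

With this, no conversion step is needed after reading.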
The : syntax doesn’t seem to work in this situation, so I changed : to !
removed_missing_df[!, "Area"] = parse.(Float64, removed_missing_df[!, "Area"])
Now I have a different question: why does ! work while : throws an error?
The error message here is also weird. It should be
Cannot `convert` an object of type InlineStrings.String15 to an object of type Float64
Is it a bug?
Oops, my bad. One should use ! here.
DataFrames.jl doesn’t always follow Julia rules and conventions, neither for assignment, indexing, type promotion, concatenation nor broadcasting. Instead it has its own set of rules: Indexing · DataFrames.jl
As to your question “Why?”. No particular reason afaik.
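For what it’s worth, my reading of the indexing rules is that df[:, col] = v writes the new values into the existing column in place (so they have to be convertible to its element type), while df[!, col] = v replaces the column with v. A small sketch on a made-up DataFrame:

```julia
using DataFrames

# Hypothetical data standing in for the "Area" column.
df = DataFrame(Area = ["1.5", "2.25", "3.0"])

# `:` assigns into the existing String column, so storing Float64 values
# fails because a Float64 cannot be converted to a String.
in_place_failed = try
    df[:, :Area] = parse.(Float64, df[:, :Area])
    false
catch
    true
end

# `!` swaps in the new vector as the column, so the element type can change.
df[!, :Area] = parse.(Float64, df[!, :Area])
```

So ! is the one to use whenever the element type of the column changes.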
I don’t think so. Consider this:
julia> vs=["1.2","3.4","2.0"]
3-element Vector{String}:
"1.2"
"3.4"
"2.0"
julia> df=DataFrame(;vs)
3×1 DataFrame
 Row │ vs
     │ String
─────┼────────
   1 │ 1.2
   2 │ 3.4
   3 │ 2.0
julia> eltype(df.vs)
String
julia> df.vs[1]=99.9
ERROR: MethodError: Cannot `convert` an object of type Float64 to an object of type String
julia> df.vs[1]=parse(Float64,"99.9")
ERROR: MethodError: Cannot `convert` an object of type Float64 to an object of type String
Closest candidates are:
and this
julia> vs[1]=9.99
ERROR: MethodError: Cannot `convert` an object of type Float64 to an object of type String
Closest candidates are:
so the one “responsible” is Julia, not DataFrames.
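Put differently, a Vector{String} can never hold a Float64, so the way out is to build a new vector with the new element type. A minimal Base-only sketch:

```julia
vs = ["1.2", "3.4", "2.0"]

# Assigning a Float64 into a Vector{String} fails: Julia tries to
# convert 9.99 to String and no such conversion exists.
assignment_failed = try
    vs[1] = 9.99
    false
catch err
    err isa MethodError
end

# Parsing creates a new Vector{Float64}; `vs` itself is left unchanged.
fs = parse.(Float64, vs)
fs[1] = 9.99   # fine, the element type matches
```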
You can use parse inside the transform function:
julia> df
3×1 DataFrame
 Row │ vs
     │ String
─────┼────────
   1 │ 1.2
   2 │ 3.4
   3 │ 2.0
julia> transform(df, :vs=>ByRow(x->parse(Float64, x))=>:fls)
3×2 DataFrame
 Row │ vs      fls
     │ String  Float64
─────┼─────────────────
   1 │ 1.2         1.2
   2 │ 3.4         3.4
   3 │ 2.0         2.0
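As a side note, here is a sketch of how the ByRow form relates to the broadcasted form used earlier in the thread, plus the in-place variant that overwrites the source column. The names a and b are just for illustration.

```julia
using DataFrames

df = DataFrame(vs = ["1.2", "3.4", "2.0"])

# ByRow applies the function to one element at a time.
a = transform(df, :vs => ByRow(x -> parse(Float64, x)) => :fls)

# Without ByRow the function receives the whole column, so it broadcasts itself.
b = transform(df, :vs => (col -> parse.(Float64, col)) => :fls)

# transform! mutates df; reusing the source name replaces the String
# column with the parsed Float64 column.
transform!(df, :vs => ByRow(x -> parse(Float64, x)) => :vs)
```

Both forms should give the same result; which one reads better is mostly a matter of taste.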