I am a beginner in Julia. I am trying to get data from a single row and perform statistics on it like taking the mean, median, etc. However, the error I am getting is telling me that I am trying to perform calculations on an incompatible data type. I tried to use a for-if nest to see and convert the non-float data to float:
begin
    algeria = df[df."Country/Region" .== "Algeria", 4:end]
    for i = 4:size(algeria, 2)
        if eltype(algeria[!, i]) .!= Float64
            algeria[!, i] = parse.(Float64, algeria[!, i])
        end
    end
    Statistics.mean(eachcol(algeria))
end
But the error persists.
This is my full error:
MethodError: no method matching parse(::Type{Float64}, ::Array{Union{Missing, Int64},1})
Closest candidates are:
parse(::Type{T}, !Matched::AbstractString; kwargs...) where T<:Real at parse.jl:376
I think that the code you are running and the code in this snippet are not the same.
The line
if eltype(algeria[!, i]) .!= Float64
should actually error, I think. And the fact that it doesn’t is odd.
On the other hand, the call parse.(Float64, algeria[!, i]) should not error, but is actually throwing the error that you just showed.
Here is an MWE that does what you want
julia> using DataFrames, Statistics

julia> begin
           df = DataFrame()
           N = 100
           df."Country/Region" = fill("Algeria", N)
           df.x_float = rand(N)
           df.x_string = [string.(rand(N-1)); missing]
           # convert every non-Float64 column after the first one
           for i in 2:size(df, 2)
               v = df[!, i]
               if eltype(v) != Float64
                   df[!, i] = passmissing(parse).(Float64, v)
               end
           end
           mean(eachcol(df[:, 2:end]))
       end
Note that you want to use passmissing(parse) instead of just parse to deal with missing values properly.
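To illustrate what passmissing(parse) does, here is a minimal sketch using only Base. The helper name parse_or_missing is hypothetical; it only mimics the behaviour of passmissing(parse) from Missings.jl for the call pattern used above, and the sample values are made up.

```julia
# Hypothetical stand-in for passmissing(parse): return missing untouched,
# otherwise parse the string as the requested type.
parse_or_missing(T, x) = ismissing(x) ? missing : parse(T, x)

raw = ["1.5", "2.25", missing]            # a column of type Union{Missing, String}
parsed = parse_or_missing.(Float64, raw)  # broadcast over the elements

# plain parse.(Float64, raw) would throw a MethodError on the missing entry
```

The result keeps the missing entry in place, so the column ends up with element type Union{Missing, Float64}.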
No, dot operators are equivalent to scalar operators if all of the arguments are scalars. That is, it works for the same reason that sqrt.(4) is 2.0 (and sqrt.(4) .== 2.0 is true).
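A short sketch of that point, using only Base. The vector v is an arbitrary example; the comparison mirrors the eltype check from the original snippet.

```julia
# Broadcasting over scalars gives the same result as the plain call.
a = sqrt.(4)          # same as sqrt(4)
b = sqrt.(4) .== 2.0  # same as sqrt(4) == 2.0

# Types are treated as scalars too, so both comparisons yield a single Bool.
v = [1, 2, 3]
c = eltype(v) .!= Float64
d = eltype(v) != Float64
```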
My DataFrame is Parquet in Arrow so idk if df = DataFrame() is something I want.
Just to be clear, and I think this has been mentioned in other threads, once you read something into memory, it is all the same. A DataFrame is a DataFrame, and it doesn’t matter whether it came from Parquet, CSV, or was created in the code like I did above. Parquet in Arrow does not mean anything once you have a DataFrame.
My code creates a DataFrame with 100 rows, which is why N is 100.
rand(N) just creates a vector of length N of random numbers.
The error is telling you that you can’t parse an Int64. parse is used to convert text (String) to a number (float or int). In this case you probably want:
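A minimal sketch of the distinction between parse and float, assuming a column of type Union{Missing, Int64} as in the error message above. The sample values are made up.

```julia
using Statistics

col = Union{Missing, Int64}[3, missing, 5]   # made-up sample values

# parse is for text: parse(Float64, "3") works, parse(Float64, 3) does not.
from_text = parse(Float64, "3")

# float converts values that are already numeric and passes missing through.
as_float = float.(col)

# skipmissing drops the missing entries before taking the mean.
m = mean(skipmissing(as_float))
```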
ok, so you must be including some columns you shouldn’t be including by doing
for i = 4:size(algeria, 2)
The issue is, it looks like you’re parsing columns that are not meant to be numbers. You can try this:
if eltype(algeria[!, i]) .!= Float64
    try
        algeria[!, i] = float.(algeria[!, i])
    catch
        println(i)
    end
end
Then, once you find that the ith column is the offending one, look at df[!, i] to see what the column contains and whether it is actually meant to be understood as numbers.
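A self-contained sketch of that inspection step. It uses a NamedTuple of vectors as a stand-in for the DataFrame columns; the column names and values are invented for illustration, and is_numeric is a hypothetical helper.

```julia
# Stand-in for a DataFrame: a NamedTuple of columns (names and values are made up).
cols = (
    country = ["Algeria", "Algeria"],
    day1    = Union{Missing, Int64}[1, missing],
    day2    = [0.5, 1.5],
)

# Hypothetical helper: a column counts as numeric if its element type,
# ignoring Missing, is a subtype of Real.
is_numeric(v) = nonmissingtype(eltype(v)) <: Real

# Names of the columns that cannot be treated as numbers.
offending = [name for (name, v) in pairs(cols) if !is_numeric(v)]
```

With an actual DataFrame, the same check could be applied to each column yielded by eachcol(df).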
The only columns that contain String are: Symbol("Province/State") and Symbol("Country/Region"), which I don’t think can be understood as numbers to begin with.
begin
    algeria = df[df."Country/Region" .== "Algeria", 4:end]
    findall(eltype.(df[!, i] for i = 1:size(df, 2)) .!= Float64)
    # for i in 2:size(df, 2)
    #     v = df[!, i]
    #     if eltype(v) != Float64
    #         df[!, i] = passmissing(parse).(Float64, v)
    #     end
    # end
    for i = 4:size(algeria, 2)
        if eltype(algeria[!, i]) .!= Float64
            try
                algeria[!, i] = float.(algeria[!, i])
            catch
                println(i)
            end
        end
    end
    Statistics.mean(eachcol(algeria))
end
It’s throwing this error:
MethodError: no method matching +(::Float64, ::String)
Closest candidates are:
+(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:538
+(::Float64, !Matched::Float64) at float.jl:401
+(!Matched::ChainRulesCore.One, ::Any) at /home/onur/.julia/packages/ChainRulesCore/7d1hl/src/differential_arithmetic.jl:94
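The message indicates that a Float64 and a String are being added, so at least one String value is still reaching the arithmetic. One possible way to guard against that, sketched here with only Base and Statistics, is to collect the values of the row, keep only the real numbers, and then compute the statistics. The row contents below are invented, and numeric_values is a hypothetical helper name, not something from DataFrames.

```julia
using Statistics

# Made-up row mixing text, missing values, and numbers.
row = Any["", "Algeria", 28.0, 2.0, missing, 3, 5.0]

# Hypothetical helper: keep only real numbers and convert them to Float64.
numeric_values(xs) = Float64[x for x in xs if x isa Real]

vals = numeric_values(row)
row_mean = mean(vals)
row_median = median(vals)
```

Filtering on the values themselves, rather than on column positions such as 4:end, avoids depending on where the text columns happen to sit.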