You only "fixed" columns 4:end in your loop, but you are calling eachcol on all the columns in your dataset.
Are you sure? With DataFrames loaded, passmissing should be defined.
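A minimal sketch of both points, using a hypothetical toy table rather than the COVID data: the conversion is applied only to the columns that are meant to be numeric, and passmissing (from Missings.jl, which DataFrames re-exports) wraps parse so that missing entries pass through instead of erroring. The column names and values are made up for illustration.

```julia
using DataFrames  # DataFrames re-exports passmissing from Missings.jl

# Hypothetical toy table: one text column, one numbers-as-text column with a missing
df = DataFrame(country = ["Algeria", "Algeria", "Algeria"],
               cases = ["1", missing, "3"])

# passmissing(f) returns a function that yields missing for a missing input
# and calls f on everything else
tofloat = passmissing(s -> parse(Float64, s))

# Convert only the columns that should be numeric; "country" is left alone
for col in ["cases"]
    df[!, col] = tofloat.(df[!, col])
end
```

Running the same conversion over every column (as eachcol over the whole table would) would hit "Algeria" and fail, which is the point about restricting the loop.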
Not sure what's screwed up, because it works for me. Here is the complete example, including downloading the data:
julia> using CSV, DataFrames
julia> df = DataFrame(CSV.File(download("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")))
julia> algeria = df[df."Country/Region" .== "Algeria", 4:end]
julia> unique(describe(algeria).eltype)
2-element Vector{Type}:
Union{Missing, Float64}
Int64
julia> for i = 1:size(algeria, 2)
           if eltype(algeria[!, i]) != Float64
               algeria[!, i] = float.(algeria[!, i])
           end
       end
julia> unique(describe(algeria).eltype)
1-element Vector{DataType}:
Float64
julia> algeria
1×422 DataFrame
 Row │ Long     1/22/20  1/23/20  1/24/20  1/25/20  1/26/20  1/27/20  1/28/20  1/29/20  1/30/20  1/31/20  2/1/20   2/2/20   2/3/20   2/4/20   2/5/20   2/6/20   2/7/20   2/8/20   2/9/20   2/10/20  2/11/20  2/12/20  2/13/20  2/14/20  2/15/20  2/16/20  2/17/20  2/ ⋯
     │ Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Fl ⋯
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1.6596   0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      ⋯
394 columns omitted
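For reference, a sketch of an equivalent without the index loop, assuming every column in the slice is numeric: mapcols applies the function to each column and returns a new DataFrame. The toy table only mimics the first few columns of the Algeria slice above, with placeholder values.

```julia
using DataFrames

# Hypothetical one-row table with mixed numeric column types, like the
# Algeria slice above (placeholder values)
algeria = DataFrame("Long" => [1.6596], "1/22/20" => [0],
                    "1/23/20" => Union{Missing,Float64}[0.0])

# mapcols applies the function to every column and returns a new DataFrame.
# This only works if every column is numeric, since float("text") errors.
algeria = mapcols(col -> float.(col), algeria)
```

Like the loop, this converts whatever columns the slice contains, so it is only as safe as the column selection that produced the slice.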
This is what I did:
begin
    algeria = df[df."Country/Region" .== "Algeria", 4:end]
    for i = 1:size(algeria, 2)
        if eltype(algeria[!, i]) != Float64
            algeria[!, i] = float.(algeria[!, i])
        end
    end
end
This is the error:
MethodError: no method matching AbstractFloat(::String)
Closest candidates are:
AbstractFloat(!Matched::Bool) at float.jl:258
AbstractFloat(!Matched::Int8) at float.jl:259
AbstractFloat(!Matched::Int16) at float.jl:260
...
float(::String)@float.jl:277
_broadcast_getindex_evalf@broadcast.jl:648[inlined]
_broadcast_getindex@broadcast.jl:621[inlined]
getindex@broadcast.jl:575[inlined]
macro expansion@broadcast.jl:932[inlined]
macro expansion@simdloop.jl:77[inlined]
copyto!@broadcast.jl:931[inlined]
copyto!@broadcast.jl:886[inlined]
copy@broadcast.jl:862[inlined]
materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(float),Tuple{Array{Union{Missing, String},1}}})@broadcast.jl:837
top-level scope@Local: 7
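The trace boils down to float being broadcast over a column of strings (note the Array{Union{Missing, String},1} in the materialize frame). A minimal Base-only reproduction, where the vector is a hypothetical stand-in for a text column such as Country/Region that ended up inside the selected range:

```julia
# Hypothetical stand-in for a text column that slipped into the selection
col = Union{Missing,String}["Algeria", missing]

# float("Algeria") falls back to AbstractFloat(::String), which has no method
err = try
    float.(col)
    nothing
catch e
    e
end
println(err isa MethodError)  # true

# Purely numeric columns convert without complaint
println(float.([1, 2, 3]))
```

So the loop itself is fine; the question is which columns end up in the selection.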
I am trying to get the columns for Algeria from the 4th column onwards…
Your algeria is already the 4th column onward of the original DataFrame; I suspect you have a typo or something similar.
Can you try the whole snippet I pasted? They should do the same thing.
Yeah, it works in the terminal but not in Pluto. Let me send you my entire file:
begin
    using Pkg
    Pkg.add("PlutoUI")
    Pkg.add("Parquet")
    Pkg.add("StatsModels")
    Pkg.add("Missings")
    Pkg.add("Arrow")
    Pkg.activate(".")
    import CSV, DataFrames, Dates, StatsPlots, StatsModels, Statistics, Missings
    import DataFrames.DataFrame
    using Plots, PlutoUI, DelimitedFiles, Parquet, Arrow
end
begin
    df = CSV.read("temp.csv", DataFrame)
    write_parquet("data_file.parquet", df)
    df = DataFrame(read_parquet("data_file.parquet"))
    Arrow.write("data_file.arrow", df)
    df = DataFrame(Arrow.Table("data_file.arrow"))
end
begin
    dates = names(df)[5:end]
    countries = unique(df[:, :"Country/Region"])
end
begin
    algeria = df[df."Country/Region" .== "Algeria", 4:end]
    for i = 1:size(algeria, 2)
        if eltype(algeria[!, i]) != Float64
            algeria[!, i] = float.(algeria[!, i])
        end
    end
end
Please make up your mind about which format to use; this round trip through Parquet and Arrow isn't doing anything useful. Considering the data is so small, I'd just use CSV + DataFrames and forget about Arrow and Parquet.
I believe this works if you remove Arrow and Parquet: in the begin block that loads the data, only keep df = CSV.read("temp.csv", DataFrame).
The root of your error is that Parquet rearranged your columns: the Province/State and Country/Region columns (along with the other metadata columns) are now at the end, so when you do 4:end you are still including them.
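A quick way to confirm this, sketched on a hypothetical toy table laid out the way described above (date columns first, metadata columns last, placeholder values): look the column position up by name instead of assuming it.

```julia
using DataFrames

# Hypothetical toy table mimicking the reordered layout: dates first,
# metadata columns moved to the end (placeholder values)
df = DataFrame("1/22/20" => [0], "1/23/20" => [0],
               "Province/State" => [missing], "Country/Region" => ["Algeria"],
               "Lat" => [0.0], "Long" => [0.0])

# Where did "Country/Region" end up?
pos = findfirst(==("Country/Region"), names(df))
println(pos)               # 4

# With this layout, 4:end now picks up the text column
println(names(df)[4:end])  # ["Country/Region", "Lat", "Long"]
```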
My friend, I have to do this chunk as part of an assignment, so this code has to exist whether I like it or not.
So if they're at the end, do I do begin:4?
Well, it would be begin:end-4.
But in any case, if you need to do this dance, please inspect the final df that you're actually operating on (or, better yet, don't rely on the order of columns at all).
begin
    algeria = df[df."Country/Region" .== "Algeria", begin:end-4]
    for i = 1:size(algeria, 2)
        if eltype(algeria[!, i]) != Float64
            algeria[!, i] = float.(algeria[!, i])
        end
    end
end
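A sketch of the order-independent version, assuming the four metadata columns keep their original names from the CSV header (Province/State, Country/Region, Lat, Long); the toy table and its values are made up for illustration. Not(...) excludes columns by name, so the selection is the same whether Parquet puts them first or last.

```julia
using DataFrames

# Hypothetical toy table in the reordered layout (placeholder values)
df = DataFrame("1/22/20" => [0, 5], "1/23/20" => Union{Missing,Float64}[0.0, 7.0],
               "Province/State" => [missing, missing],
               "Country/Region" => ["Algeria", "Andorra"],
               "Lat" => [0.0, 0.0], "Long" => [0.0, 0.0])

# Metadata columns listed by name (assumed from the CSV header)
meta = ["Province/State", "Country/Region", "Lat", "Long"]

# Algeria row, every column except the metadata ones, wherever they sit
algeria = df[df."Country/Region" .== "Algeria", Not(meta)]

# Convert the remaining (date) columns to floating point, by name
for col in names(algeria)
    algeria[!, col] = float.(algeria[!, col])
end
```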
I would again strongly encourage you to work your way through https://github.com/bkamins/Julia-DataFrames-Tutorial
While it might take you a day or two, understanding how things work and why they error will still save you loads of time compared to trying to debug a DataFrames analysis line by line on Discourse.