How to parse/convert integers in DataFrame to float numbers

You only “fixed” columns 4:end in your loop, but you are calling eachcol on all columns in your dataset.

Are you sure? With DataFrames loaded, passmissing should be defined.
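For reference, a minimal sketch of how passmissing behaves once DataFrames is loaded. The vectors `raw` and `counts` and the helper name `to_float` are made up for illustration; they are not from the dataset in this thread.

```julia
using DataFrames  # re-exports passmissing from Missings.jl

# Hypothetical column of numeric strings with a gap
raw = ["1.5", missing, "2"]

# passmissing wraps a function so that a missing argument yields missing
# instead of throwing
to_float = passmissing(x -> parse(Float64, x))
parsed = to_float.(raw)

# Integer columns do not need the wrapper: float already propagates missing
counts = [1, missing, 3]
converted = float.(counts)
```

Strings need parse (hence the wrapper), while integers only need float.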

Not sure what is going wrong, because it works for me. I will paste the complete example, including downloading the data, below:

julia> using CSV, DataFrames

julia> df = DataFrame(CSV.File(download("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")))

julia> algeria = df[df."Country/Region" .== "Algeria", 4:end]

julia> unique(describe(algeria).eltype)
2-element Vector{Type}:
 Union{Missing, Float64}
 Int64

julia> for i = 1:size(algeria, 2)
              if eltype(algeria[!, i]) .!= Float64
                  algeria[!, i] = float.(algeria[!, i])
              end
         end

julia> unique(describe(algeria).eltype)
1-element Vector{DataType}:
 Float64


julia> algeria
1Γ—422 DataFrame
 Row β”‚ Long     1/22/20  1/23/20  1/24/20  1/25/20  1/26/20  1/27/20  1/28/20  1/29/20  1/30/20  1/31/20  2/1/20   2/2/20   2/3/20   2/4/20   2/5/20   2/6/20   2/7/20   2/8/20   2/9/20   2/10/20  2/11/20  2/12/20  2/13/20  2/14/20  2/15/20  2/16/20  2/17/20  2/ β‹―
     β”‚ Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Fl β‹―
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 β”‚  1.6596      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0     β‹―
                                                                                                                                                                                                                                                    394 columns omitted

This is what I did:

begin
	algeria = df[df."Country/Region" .== "Algeria", 4:end]
	
	for i = 1:size(algeria, 2)
		if eltype(algeria[!, i]) .!= Float64
			algeria[!, i] = float.(algeria[!, i])
		end
	end
end

This is the error:

MethodError: no method matching AbstractFloat(::String)

Closest candidates are:

AbstractFloat(!Matched::Bool) at float.jl:258

AbstractFloat(!Matched::Int8) at float.jl:259

AbstractFloat(!Matched::Int16) at float.jl:260

...

float(::String)@float.jl:277
_broadcast_getindex_evalf@broadcast.jl:648[inlined]
_broadcast_getindex@broadcast.jl:621[inlined]
getindex@broadcast.jl:575[inlined]
macro expansion@broadcast.jl:932[inlined]
macro expansion@simdloop.jl:77[inlined]
copyto!@broadcast.jl:931[inlined]
copyto!@broadcast.jl:886[inlined]
copy@broadcast.jl:862[inlined]
materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(float),Tuple{Array{Union{Missing, String},1}}})@broadcast.jl:837
top-level scope@Local: 7

I am trying to get the columns for Algeria from the 4th column onwards…

Your algeria already holds the 4th column onward of the original DataFrame, so I suspect you have a spelling error or something similar.

Can you try the whole snippet I pasted? They should do the same thing.

Yeah, it works in the terminal but does not work in Pluto. Let me send you my entire file:

begin
	using Pkg
	Pkg.add("PlutoUI")
	Pkg.add("Parquet")
	Pkg.add("StatsModels")
	Pkg.add("Missings")
	Pkg.add("Arrow")
	Pkg.activate(".")
	import CSV, DataFrames, Dates, StatsPlots, StatsModels, Statistics, Missings
	import DataFrames.DataFrame
	using Plots, PlutoUI, DelimitedFiles, Parquet, Arrow
end

begin
	df = CSV.read("temp.csv", DataFrame)
	write_parquet("data_file.parquet", df)
	df = DataFrame(read_parquet("data_file.parquet"))
	Arrow.write("data_file.arrow", df)
	df = DataFrame(Arrow.Table("data_file.arrow"))
end

begin
	dates = names(df)[5:end]
	countries = unique(df[:, :"Country/Region"])
end

begin
	algeria = df[df."Country/Region" .== "Algeria", 4:end]
	
	for i = 1:size(algeria, 2)
		if eltype(algeria[!, i]) .!= Float64
			algeria[!, i] = float.(algeria[!, i])
		end
	end
end

Please make up your mind as to what to use; writing the data out and reading it back like this is not doing anything useful. Considering the data is so small, I’d just use CSV + DataFrames and forget about Arrow and Parquet.

I believe this works if you remove Arrow and Parquet: in the begin block that loads the data, keep only df = CSV.read("temp.csv", DataFrame)


The root of your error is that Parquet rearranged your columns, so the country/region and province/state columns are now at the end. When you select 4:end, you are therefore still including them, and float fails on the string columns.
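One way to check this yourself is to print each column's name and element type, then convert only the columns that actually hold integers. This is a rough sketch on a made-up three-column table (the values and the reduced column set are assumptions, not the real file), not a drop-in replacement for the assignment code.

```julia
using DataFrames

# Hypothetical miniature of the table after the string columns moved to the end
df = DataFrame(
    "1/22/20" => [0, 1],
    "1/23/20" => Union{Missing, Int}[2, missing],
    "Country/Region" => ["Algeria", "Angola"],
)

# Inspect what is actually in the table, in its current column order
for (name, col) in zip(names(df), eachcol(df))
    println(name, " => ", eltype(col))
end

# Convert only columns whose non-missing element type is an integer type,
# so string columns are skipped wherever they ended up
for name in names(df)
    col = df[!, name]
    if nonmissingtype(eltype(col)) <: Integer
        df[!, name] = float.(col)
    end
end
```

Because the check is on the element type rather than the position, the loop no longer cares where the string columns sit.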

My friend, I have to do this chunk as part of an assignment. So this chunk of code has to exist whether I like it or not.

So if they’re at the end, do I do begin:4?

Well, it would be begin:end-4.

But in any case, if you need to do this dance, please inspect the final df that you’re actually operating on (or, better yet, don’t rely on the order of the columns at all).
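As a sketch of what not relying on column order could look like: select the date columns by name pattern instead of by position. The small table below and the regex are assumptions for illustration (the pattern assumes date headers of the form 1/22/20); adjust both to the real data.

```julia
using DataFrames

# Hypothetical stand-in for the table after a round trip that moved
# the identifying columns to the end
df = DataFrame(
    "1/22/20" => [0, 1],
    "1/23/20" => [0, 3],
    "Lat" => [28.0339, -11.2027],
    "Long" => [1.6596, 17.8739],
    "Province/State" => [missing, missing],
    "Country/Region" => ["Algeria", "Angola"],
)

# Pick columns whose names look like dates, wherever they are in the table
date_cols = names(df, r"^\d+/\d+/\d+$")

# Filter the rows, keep only the date columns, then convert every column to float
algeria = df[df."Country/Region" .== "Algeria", date_cols]
algeria = mapcols(col -> float.(col), algeria)
```

This gives the same result whether the string columns come first or last, which is the failure mode that bit you here.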

begin
	algeria = df[df."Country/Region" .== "Algeria", begin:end-4]
	
	for i = 1:size(algeria, 2)
		if eltype(algeria[!, i]) .!= Float64
			algeria[!, i] = float.(algeria[!, i])
		end
	end
end

I would again strongly encourage you to work your way through https://github.com/bkamins/Julia-DataFrames-Tutorial

While it might take you a day or two, understanding how things work and why they error will still save you loads of time compared to trying to debug a DataFrames analysis line by line on Discourse.
