How to parse/convert integers in DataFrame to float numbers

pdeffebach · March 19, 2021, 1:57am

You only “fixed” columns 4:end in your loop, but you are calling eachcol on all columns in your dataset.

pdeffebach · March 19, 2021, 1:58am

Are you sure? With DataFrames loaded, passmissing should be defined.
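For readers unsure where passmissing fits in, here is a minimal sketch. It assumes a column of numeric text with missing entries; the vector `raw` and the helper name `parse_float` are made up for illustration and are not from the thread's data.

```julia
using DataFrames  # re-exports passmissing from Missings.jl

# Hypothetical column of numeric strings with a missing entry
raw = Union{Missing, String}["1.5", missing, "3"]

# passmissing wraps a function so that a missing input yields missing
# instead of throwing a MethodError
parse_float = passmissing(s -> parse(Float64, s))

parsed = parse_float.(raw)
```

Note that `float` already propagates `missing` on its own, so the wrapper mainly matters for functions such as `parse` that do not accept `missing`.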

jling · March 19, 2021, 1:58am

I'm not sure what went wrong, because it works for me. Here is the complete example, including downloading the data:

julia> using CSV, DataFrames

julia> df = DataFrame(CSV.File(download("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")))

julia> algeria = df[df."Country/Region" .== "Algeria", 4:end]

julia> unique(describe(algeria).eltype)
2-element Vector{Type}:
 Union{Missing, Float64}
 Int64

julia> for i = 1:size(algeria, 2)
              if eltype(algeria[!, i]) .!= Float64
                  algeria[!, i] = float.(algeria[!, i])
              end
         end

julia> unique(describe(algeria).eltype)
1-element Vector{DataType}:
 Float64


julia> algeria
1×422 DataFrame
 Row │ Long     1/22/20  1/23/20  1/24/20  1/25/20  1/26/20  1/27/20  1/28/20  1/29/20  1/30/20  1/31/20  2/1/20   2/2/20   2/3/20   2/4/20   2/5/20   2/6/20   2/7/20   2/8/20   2/9/20   2/10/20  2/11/20  2/12/20  2/13/20  2/14/20  2/15/20  2/16/20  2/17/20  2/ ⋯
     │ Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Fl ⋯
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │  1.6596      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0     ⋯
                                                                                                                                                                                                                                                    394 columns omitted

oo92 · March 19, 2021, 2:10am

This is what I did:

begin
	algeria = df[df."Country/Region" .== "Algeria", 4:end]
	
	for i = 1:size(algeria, 2)
		if eltype(algeria[!, i]) .!= Float64
			algeria[!, i] = float.(algeria[!, i])
		end
	end
end

This is the error:

MethodError: no method matching AbstractFloat(::String)

Closest candidates are:

AbstractFloat(!Matched::Bool) at float.jl:258

AbstractFloat(!Matched::Int8) at float.jl:259

AbstractFloat(!Matched::Int16) at float.jl:260

...

float(::String)@float.jl:277
_broadcast_getindex_evalf@broadcast.jl:648[inlined]
_broadcast_getindex@broadcast.jl:621[inlined]
getindex@broadcast.jl:575[inlined]
macro expansion@broadcast.jl:932[inlined]
macro expansion@simdloop.jl:77[inlined]
copyto!@broadcast.jl:931[inlined]
copyto!@broadcast.jl:886[inlined]
copy@broadcast.jl:862[inlined]
materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(float),Tuple{Array{Union{Missing, String},1}}})@broadcast.jl:837
top-level scope@Local: 7

I am trying to get the columns for Algeria from the 4th column onwards…

jling · March 19, 2021, 2:12am

Your algeria already holds only the 4th column onward of the original DataFrame, so I suspect you have a spelling error or something similar.

Can you try the whole snippet I pasted? The two should do the same thing.

oo92 · March 19, 2021, 2:20am

Yeah, it works in the terminal but not in Pluto. Here is my entire file:

begin
	using Pkg
	Pkg.add("PlutoUI")
	Pkg.add("Parquet")
	Pkg.add("StatsModels")
	Pkg.add("Missings")
	Pkg.add("Arrow")
	Pkg.activate(".")
	import CSV, DataFrames, Dates, StatsPlots, StatsModels, Statistics, Missings
	import DataFrames.DataFrame
	using Plots, PlutoUI, DelimitedFiles, Parquet, Arrow
end

begin
	df = CSV.read("temp.csv", DataFrame)
	write_parquet("data_file.parquet", df)
	df = DataFrame(read_parquet("data_file.parquet"))
	Arrow.write("data_file.arrow", df)
	df = DataFrame(Arrow.Table("data_file.arrow"))
end

begin
	dates = names(df)[5:end]
	countries = unique(df[:, :"Country/Region"])
end

begin
	algeria = df[df."Country/Region" .== "Algeria", 4:end]
	
	for i = 1:size(algeria, 2)
		if eltype(algeria[!, i]) .!= Float64
			algeria[!, i] = float.(algeria[!, i])
		end
	end
end

jling · March 19, 2021, 2:26am

oo92:

	df = CSV.read("temp.csv", DataFrame)
	write_parquet("data_file.parquet", df)
	df = DataFrame(read_parquet("data_file.parquet"))
	Arrow.write("data_file.arrow", df)
	df = DataFrame(Arrow.Table("data_file.arrow"))

Please make up your mind about which format to use; this round trip through CSV, Parquet, and Arrow is not doing anything useful. Considering the data is so small, I’d just use CSV + DataFrames and forget about Arrow and Parquet.

I believe this works if you remove Arrow and Parquet. Basically, in the data-loading begin block, keep only df = CSV.read("temp.csv", DataFrame)

jling · March 19, 2021, 2:29am

The root of your error is that Parquet rearranged your columns. Your country/region/state columns are now at the end, so when you select 4:end you are still including them, and float fails on the String values.
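One way to confirm a reordering like this is to print each column's name and element type before slicing by position. The sketch below uses a small made-up table in place of the real data, with the string column deliberately placed last:

```julia
using DataFrames

# Toy stand-in for the round-tripped table (values are illustrative only)
df = DataFrame(
    "1/22/20" => [0],
    "Long" => [1.6596],
    "Country/Region" => ["Algeria"],
)

# Print name and element type of every column, in storage order
for (name, col) in pairs(eachcol(df))
    println(rpad(string(name), 16), eltype(col))
end

# Names of the columns that hold strings (with or without missing values)
string_cols = names(df, Union{Missing, AbstractString})
```

If any name in `string_cols` falls inside the positional range you plan to convert, `float.` will throw the same MethodError shown earlier in the thread.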

oo92 · March 19, 2021, 2:33am

My friend, I have to do this chunk as part of an assignment. So this chunk of code has to exist whether I like it or not.

So if they’re at the end, do I do begin:4?

jling · March 19, 2021, 2:39am

Well, it would be begin:end-4.

But in any case, if you need to do this dance, please inspect the final df that you’re actually operating on (or better yet, don’t rely on the order of the columns at all).

begin
	algeria = df[df."Country/Region" .== "Algeria", begin:end-4]
	
	for i = 1:size(algeria, 2)
		if eltype(algeria[!, i]) .!= Float64
			algeria[!, i] = float.(algeria[!, i])
		end
	end
end
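As a sketch of the "don’t rely on the order of columns" suggestion, the numeric columns can be selected by element type rather than by position. This is one possible approach under stated assumptions, not necessarily what the assignment expects; the toy table and the name `numeric_cols` are made up for illustration.

```julia
using DataFrames

# Toy stand-in for the COVID table, with the string column deliberately last
df = DataFrame(
    "1/22/20" => [0, 1],
    "1/23/20" => [2, 3],
    "Lat" => [28.0339, 41.1533],
    "Country/Region" => ["Algeria", "Albania"],
)

# Pick columns whose element type is numeric (optionally with missing),
# wherever they sit in the table
numeric_cols = names(df, Union{Missing, Real})

# Row filter by country, column selection by name instead of 4:end
algeria = df[df."Country/Region" .== "Algeria", numeric_cols]

# Convert every selected column to floating point
for col in names(algeria)
    algeria[!, col] = float.(algeria[!, col])
end
```

Because the selection is by type, the same code gives the same result whether the string columns come first, last, or in between.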

nilshg · March 19, 2021, 6:54am

I would again strongly encourage you to work your way through https://github.com/bkamins/Julia-DataFrames-Tutorial

While it might take you a day or two, understanding how things work and why they error will still save you loads of time compared to trying to debug a DataFrames analysis line by line on Discourse.
