Julia not able to convert Array to a series data for plotting

oo92 · March 21, 2021, 8:43pm

I am trying to plot a dataframe I created from an existing dataframe. Basically I just took about 5 rows and all N-4 columns and created a new dataframe. I wanted to plot this but Julia (on Pluto.jl) is throwing this error:

Cannot convert Array{Any,2} to series data for plotting

error(::String)@error.jl:33
_prepare_series_data(::Array{Any,2})@series.jl:8
_series_data_vector(::Array{Any,2}, ::Dict{Symbol,Any})@series.jl:27
macro expansion@series.jl:144[inlined]
apply_recipe(::AbstractDict{Symbol,Any}, ::Type{RecipesPipeline.SliceIt}, ::Any, ::Any, ::Any)@RecipesBase.jl:282
_process_userrecipes!(::Any, ::Any, ::Any)@user_recipe.jl:36
recipe_pipeline!(::Any, ::Any, ::Any)@RecipesPipeline.jl:70
_plot!(::Plots.Plot, ::Any, ::Any)@plot.jl:172
#plot#129@plot.jl:58[inlined]
#add_label#17(::Base.Iterators.Pairs{Symbol,Any,NTuple{13,Symbol},NamedTuple{(:label, :xlabel, :ylabel, :xticks, :xrotation, :marker, :line, :legend, :grid, :framestyle, :legendfontsize, :tickfontsize, :formatter),Tuple{Array{String,2},String,String,Array{Dates.Date,1},Int64,Tuple{Symbol,Int64},Tuple{Symbol,String},Symbol,Bool,Symbol,Int64,Int64,Symbol}}}, ::typeof(StatsPlots.add_label), ::Array{Any,1}, ::Function, ::Array{Dates.Date,1}, ::Vararg{Any,N} where N)@df.jl:155
(::Main.workspace140.var"#1#3")(::DataFrames.DataFrame)@range.jl:0
top-level scope@Local: 17

This is the cell that is throwing this error:

begin
    
    countries = ["Italy", "Germany", "India", "United Kingdom"];
    y = DataFrame() # empty dataframe

    for country in countries    
        data_dfr = get_country(df,country); # returns a dataframe row 
        data_dfr = DataFrame(data_dfr);           # convert dataframe row back to a dataframe
        df_rows, df_cols = size(data_dfr);
        data_dfl = stack(data_dfr, 5:df_cols);       # convert dataframe into long format
        y[!,Symbol("$country")] = data_dfl[!,:value]
    end

    rows,cols = size(y)
    
    gr(size=(900,600))
    @df y plot(x_axis, cols(1:cols), 
        label =  reshape(names(y),(1,length(names(y)))),
        xlabel = "Time",
        ylabel = "Total number of reported cases",
        xticks = x_axis[1:7:end],
        xrotation = 45,
        marker = (:diamond,4),
        line = (:line, "gray"),
        legend = :topleft,
        grid = false,
        framestyle = :semi,
        legendfontsize = 9,
        tickfontsize = 9,
        formatter = :plain
        )
    
    y.One_million = Array{Union{Missing,Float64},1}(missing,size(y,1));
    y.One_million .= 10^6.0;
    
    display(@df y plot!(x_axis, y[!,cols+1],
           linestyle = :dot,
           linewidth = 5,
           color = :red,
           label = names(y)[cols+1]))
    
    y = select!(y, Not([:One_million])); 
end

Here are the functions and variables used in the above code:

begin
    dates = names(df)[begin:end-4]
    date_format = Dates.DateFormat("m/d/y")
    x_axis = parse.(Date, dates, date_format) .+ Year(2000)
end

function get_country(dataframe, country::String)
    df_country = dataframe[ismissing.(dataframe[!, Symbol("Province/State")]), :]
    indx = findfirst(df_country[!, Symbol("Country/Region")] .== country)
    return df_country[indx, :]
end

begin
	dataset = download("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv","covid_19_global_data.csv")
	
	df = CSV.read("covid_19_global_data.csv", DataFrame)
	write_parquet("covid_19_global_data.parquet", df)
	df = DataFrame(read_parquet("covid_19_global_data.parquet"))
	Arrow.write("covid_19_global_data.arrow", df)
	df = DataFrame(Arrow.Table("covid_19_global_data.arrow"))
		
end

aramirezreyes · March 21, 2021, 9:01pm

I couldn’t run your code because df is not defined. Could you try and define the rest of the variables?

oo92 · March 21, 2021, 9:03pm

Done. Check the edits.

aramirezreyes · March 21, 2021, 9:36pm

Hmmm. I tried for 20 minutes to replicate your problem but I couldn’t. Part of the reason is that you use many packages that are not in your snippet, so I had to work out which packages I needed to install to be able to run every line. After discovering that I had to install and use Arrow.jl, CSV.jl, and Parquet.jl, I stopped looking because there are further packages I need to install to get to the point of your error. I advise you to take a look at “PSA: make it easier to help you” for advice on how to paste a minimal working example; that would make it easier to reproduce your error and help.
Ideally we should be able to copy and paste your snippet and it should throw the error.

oo92 · March 21, 2021, 10:03pm

Well, did you get a graph output? What did you end up getting?

oo92 · March 21, 2021, 11:13pm

I do not want the last 4 columns. Your code seems to avoid the 1st 4 columns.

rafael.guerra · March 21, 2021, 11:25pm

The last 4 rows seem to have non-numerical data. Could you double-check?

julia> y
424×4 DataFrame
 Row │ Italy    Germany  India     United Kingdom 
     │ Any      Any      Any       Any
─────┼────────────────────────────────────────────
   1 │ 2319036  1993892  10512093  3211576        
   2 │ 2336279  2015235  10527683  3260258        
   3 │ 2352423  2023828  10542841  3316019
   4 │ 2368733  2038645  10557985  3357361
   5 │ 2381277  2050129  10571773  3395959
   6 │ 2390102  2059382  10581823  3433494
  ⋮  │    ⋮        ⋮        ⋮            ⋮
 420 │ 281583   256433   4465863   355219
 421 │ Italy    Germany  India     United Kingdom
 422 │ 41.8719  51.1657  20.5937   55.3781
 423 │ 12.5674  10.4515  78.9629   -3.436
 424 │ missing  missing  missing   missing

oo92 · March 21, 2021, 11:30pm

Yep. That’s why I don’t want them. This df was created from an arrow file, at the end of the day. It reversed the ordering of the columns from when it was a CSV.

How can I use, from begin:end, only the numerical data?
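Since the column order changed after the Arrow round trip, one option is to pick the date columns by name instead of by position. The sketch below uses only Base Julia; the `column_names` vector and the `is_date_column` helper are hypothetical stand-ins, and the pattern assumes the date headers look like `1/22/20`, as the `"m/d/y"` date format above suggests.

```julia
# Hypothetical stand-in for names(df); the real header list is much longer.
column_names = ["1/22/20", "1/23/20", "Province/State", "Country/Region", "Lat", "Long"]

# Assumption: date columns are named month/day/two-digit-year.
is_date_column(name) = occursin(r"^\d{1,2}/\d{1,2}/\d{2}$", name)

# Keep only the names that look like dates, wherever they sit in the table.
date_columns = filter(is_date_column, column_names)
```

A list like `date_columns` could then replace a positional range such as `5:df_cols`, so the result no longer depends on where the metadata columns ended up.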

kmundnic · March 21, 2021, 11:37pm

For the dataframe posted above:

plot(filter(x -> isa(x, Number), y.Italy))

would plot the Italy column.
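A minimal sketch of what that filter does, using a small made-up vector in place of `y.Italy`. Note that the filtered result is still a `Vector{Any}`, and that any numeric metadata (such as a latitude) passes the `Number` test too.

```julia
# Made-up stand-in for y.Italy: counts, then a name, a latitude, and a missing value.
italy = Any[2319036, 2336279, "Italy", 41.8719, missing]

# Keeps every element that is a Number; strings and missing are dropped.
numeric = filter(x -> x isa Number, italy)

# Converting gives a concretely typed vector, which plotting code tends to handle better.
typed = Float64.(numeric)
```

Here `typed` still contains `41.8719`, so filtering on type alone may not be enough if the column mixes counts with other numeric fields.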

oo92 · March 21, 2021, 11:39pm

What if I want to plot all of them?

kmundnic · March 21, 2021, 11:43pm

One way would be to iterate over all the columns:

plot()
for colname in names(y)
    plot!(filter(x -> isa(x, Number), y[!, Symbol(colname)]), label = colname, show = true)
end
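Filtering each column separately can leave the series with different lengths, or out of step with the x axis. A possible alternative is to build one Boolean mask over the rows and apply it to every column and to the axis. The sketch below uses plain vectors in a `Dict` as a stand-in for the dataframe; `columns`, `days`, and `keep` are illustrative names, not from the original code.

```julia
# Stand-in data: each column is a Vector{Any}, like the columns of y above.
columns = Dict(
    "Italy"   => Any[2319036, 2336279, "Italy", missing],
    "Germany" => Any[1993892, 2015235, "Germany", missing],
)
days = collect(1:4)   # stand-in for x_axis

# A row is kept only if every column holds a number at that position.
keep = [all(col[i] isa Number for col in values(columns)) for i in 1:length(days)]

# Apply the same mask everywhere so the series stay aligned with the axis.
cleaned   = Dict(name => Float64.(col[keep]) for (name, col) in columns)
kept_days = days[keep]
```

With a real dataframe, the same mask could presumably be used to index rows, for example `y[keep, :]`, before converting each column to `Float64`.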

oo92 · March 21, 2021, 11:44pm

Sure but my current code is good. How can I improve my existing code?

rafael.guerra · March 22, 2021, 12:21am

Check if this helps:

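i = 1; n = rows
while i <= n
    # drop any row that contains a non-numeric entry
    if !all(x -> x isa Number, collect(y[i, :]))
       delete!(y,i)
       x_axis = x_axis[1:end .!= i]
       n -= 1     # the next row has shifted into position i, so do not advance
    else
       i += 1
    end
end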

PS: the plot does not look very nice though:

oo92 · March 22, 2021, 12:24am

But where in the code does this go? Or rather, what do I replace with this?

rafael.guerra · March 22, 2021, 12:26am

You should place it between these two lines:

rows, cols = size(y)
....
gr(size = (900, 600))

oo92 · March 22, 2021, 12:27am

What about my existing plot code that starts here:

@df y plot(x_axis, cols(1:cols),

rafael.guerra · March 22, 2021, 12:33am

If you had tried the code you would have gotten for the second plot:

PS: most of us do not use discourse as a chat tool. There are dedicated tools for that purpose.

oo92 · March 22, 2021, 12:35am

UndefVarError: i not defined

Which is weird because you’re clearly defining i

rafael.guerra · March 22, 2021, 1:15am

Sounds like a problem of local vs. global scope. You may need to declare the variable as global inside the while loop.

oo92 · March 22, 2021, 1:23am

I did this:

	global i = 1 
	
	n = rows
	while i <= n
    	if prod(isa.(collect((y)[i,:]),Number))==0
       		delete!(y,i)
       		x_axis = x_axis[1:end .!= i]
       		n -= 1
    	end
    	i += 1
	end

Same error. When I did while global i <= n, it threw a syntax error.
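One way to sidestep the global versus local question is to put the loop inside a function, where `i` is an ordinary local variable and no `global` annotation is needed. This is only a sketch: `drop_non_numeric!` is a hypothetical helper, and plain vectors stand in for a dataframe column and `x_axis`.

```julia
# Hypothetical helper: removes non-numeric entries from `values` in place,
# and removes the matching positions from `axis` so the two stay aligned.
function drop_non_numeric!(values::Vector, axis::Vector)
    i = 1                          # local to the function
    while i <= length(values)
        if values[i] isa Number
            i += 1                 # advance only when nothing was deleted
        else
            deleteat!(values, i)   # the next element shifts into position i
            deleteat!(axis, i)
        end
    end
    return values, axis
end

vals = Any[1, 2, "Italy", 3.5, missing]
days = [1, 2, 3, 4, 5]
drop_non_numeric!(vals, days)
```

If the loop has to stay at the top level, the `global` keyword would go on the assignments inside the loop body (for example `global i += 1`), not in the loop condition.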
