Julia not able to convert Array to a series data for plotting

I am trying to plot a dataframe I created from an existing dataframe. Basically I just took about 5 rows and all N-4 columns and created a new dataframe. I wanted to plot this but Julia (on Pluto.jl) is throwing this error:

Cannot convert Array{Any,2} to series data for plotting

error(::String)@error.jl:33
_prepare_series_data(::Array{Any,2})@series.jl:8
_series_data_vector(::Array{Any,2}, ::Dict{Symbol,Any})@series.jl:27
macro expansion@series.jl:144[inlined]
apply_recipe(::AbstractDict{Symbol,Any}, ::Type{RecipesPipeline.SliceIt}, ::Any, ::Any, ::Any)@RecipesBase.jl:282
_process_userrecipes!(::Any, ::Any, ::Any)@user_recipe.jl:36
recipe_pipeline!(::Any, ::Any, ::Any)@RecipesPipeline.jl:70
_plot!(::Plots.Plot, ::Any, ::Any)@plot.jl:172
#plot#129@plot.jl:58[inlined]
#add_label#17(::Base.Iterators.Pairs{Symbol,Any,NTuple{13,Symbol},NamedTuple{(:label, :xlabel, :ylabel, :xticks, :xrotation, :marker, :line, :legend, :grid, :framestyle, :legendfontsize, :tickfontsize, :formatter),Tuple{Array{String,2},String,String,Array{Dates.Date,1},Int64,Tuple{Symbol,Int64},Tuple{Symbol,String},Symbol,Bool,Symbol,Int64,Int64,Symbol}}}, ::typeof(StatsPlots.add_label), ::Array{Any,1}, ::Function, ::Array{Dates.Date,1}, ::Vararg{Any,N} where N)@df.jl:155
(::Main.workspace140.var"#1#3")(::DataFrames.DataFrame)@range.jl:0
top-level scope@Local: 17

This is the cell that is throwing this error:

begin
    
    countries = ["Italy", "Germany", "India", "United Kingdom"];
    y = DataFrame() # empty dataframe

    for country in countries    
        data_dfr = get_country(df,country); # returns a dataframe row 
        data_dfr = DataFrame(data_dfr);           # convert dataframe row back to a dataframe
        df_rows, df_cols = size(data_dfr);
        data_dfl = stack(data_dfr, 5:df_cols);       # convert dataframe into long format
        y[!,Symbol("$country")] = data_dfl[!,:value]
    end

    rows,cols = size(y)
    
    gr(size=(900,600))
    @df y plot(x_axis, cols(1:cols), 
        label =  reshape(names(y),(1,length(names(y)))),
        xlabel = "Time",
        ylabel = "Total number of reported cases",
        xticks = x_axis[1:7:end],
        xrotation = 45,
        marker = (:diamond,4),
        line = (:line, "gray"),
        legend = :topleft,
        grid = false,
        framestyle = :semi,
        legendfontsize = 9,
        tickfontsize = 9,
        formatter = :plain
        )
    
    y.One_million = Array{Union{Missing,Float64},1}(missing,size(y,1));
    y.One_million .= 10^6.0;
    
    display(@df y plot!(x_axis, y[!,cols+1],
           linestyle = :dot,
           linewidth = 5,
           color = :red,
           label = names(y)[cols+1]))
    
    y = select!(y, Not([:One_million])); 
end

Here are the functions and variables used in the above code:

begin
    dates = names(df)[begin:end-4]
    date_format = Dates.DateFormat("m/d/y")
    x_axis = parse.(Date, dates, date_format) .+ Year(2000)
end
function get_country(dataframe, country::String)
    df_country = dataframe[ismissing.(dataframe[!, Symbol("Province/State")]), :]
    indx = findfirst(df_country[!, Symbol("Country/Region")] .== country)
    return df_country[indx, :]
end
begin
	dataset = download("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv","covid_19_global_data.csv")
	
	df = CSV.read("covid_19_global_data.csv", DataFrame)
	write_parquet("covid_19_global_data.parquet", df)
	df = DataFrame(read_parquet("covid_19_global_data.parquet"))
	Arrow.write("covid_19_global_data.arrow", df)
	df = DataFrame(Arrow.Table("covid_19_global_data.arrow"))
		
end

I couldn’t run your code because df is not defined. Could you try and define the rest of the variables?

Done. Check the edits.

Hmmm. I tried for 20 minutes to replicate your problem but I couldn’t. Part of the reason is that your snippet uses many packages that it does not load, so I had to work out which packages to install just to run each line. After discovering that I needed Arrow.jl, CSV.jl, and Parquet.jl, I stopped, because still more packages are required to reach the point of your error. I advise taking a look at "PSA: make it easier to help you" for advice on how to post a minimal working example; that would make it easier to reproduce your error and help.
Ideally we should be able to copy and paste your snippet and it should throw the error.

Well, did you get a graph output? What did you end up getting?

I do not want the last 4 columns. Your code seems to avoid the 1st 4 columns.

The last 4 rows seem to have non-numerical data. Could you double-check?

julia> y
424×4 DataFrame
 Row │ Italy    Germany  India     United Kingdom 
     │ Any      Any      Any       Any
─────┼────────────────────────────────────────────
   1 │ 2319036  1993892  10512093  3211576        
   2 │ 2336279  2015235  10527683  3260258        
   3 │ 2352423  2023828  10542841  3316019
   4 │ 2368733  2038645  10557985  3357361
   5 │ 2381277  2050129  10571773  3395959
   6 │ 2390102  2059382  10581823  3433494
  ⋮  │    ⋮        ⋮        ⋮            ⋮
 420 │ 281583   256433   4465863   355219
 421 │ Italy    Germany  India     United Kingdom
 422 │ 41.8719  51.1657  20.5937   55.3781
 423 │ 12.5674  10.4515  78.9629   -3.436
 424 │ missing  missing  missing   missing   

Yep. That’s why I don’t want them. This df was created from an arrow file, at the end of the day. It reversed the ordering of the columns from when it was a CSV.

How can I use, from begin:end, only the numerical data?
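One possibility, sketched here with made-up column names rather than the real file, is to pick the date columns by whether their name parses as a date instead of by position. That way the result should not depend on whether the four metadata columns come first or last. `colnames` and `is_date` are hypothetical names used only for illustration.

```julia
using Dates

date_format = DateFormat("m/d/y")

# Hypothetical column names in an arbitrary order (not read from the real file)
colnames = ["1/23/20", "1/22/20", "Long", "Lat", "Country/Region", "Province/State"]

# A name counts as a date column if it parses with the given format
is_date(name) = tryparse(Date, name, date_format) !== nothing

date_cols = filter(is_date, colnames)

# Two-digit years parse as year 20, so shift them into the 2000s
x_axis = [parse(Date, name, date_format) + Year(2000) for name in date_cols]
```

The same `date_cols` list could then be used wherever the code currently relies on `begin:end-4` or `5:df_cols`.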

For the dataframe posted above:

plot(filter(x -> isa(x, Number), y.Italy))

would plot the Italy column.
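As a rough sketch of what that filter does, here is the same idea on a plain vector standing in for one column of `y` (the values are copied from the table above, the variable names are made up). Note that the latitude and longitude entries are numbers too, so they pass an `isa Number` check. The stricter variant assumes the case counts are stored as integers.

```julia
# Hypothetical stand-in for one column of `y`, whose element type is Any
italy = Any[2319036, 2336279, "Italy", 41.8719, 12.5674, missing]

# Keep numeric entries only and store them in a concretely typed vector
numeric = Float64[x for x in italy if x isa Number]

# Stricter variant: keep integers only, which also drops the latitude/longitude values
counts = Int[x for x in italy if x isa Integer]
```

A concretely typed vector such as `counts` is also something the plotting functions can consume directly, unlike a container of `Any`.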

What if I want to plot all of them?

One way would be to iterate over all the columns:

plot()
for colname in names(y)
    plot!(filter(x -> isa(x, Number), y[!, Symbol(colname)]), label = colname, show = true)
end
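An alternative to the loop, sketched with toy data in place of `y`, is to collect the cleaned columns into one `Float64` matrix and build the labels as a 1×n row, which is the shape the original `reshape(names(y), ...)` call produces. `columns` and `numeric` are hypothetical names, and the sketch assumes every column has the same number of numeric entries.

```julia
# Hypothetical stand-in for the columns of `y`, in plotting order
columns = [
    "Italy"   => Any[10, 20, "Italy", missing],
    "Germany" => Any[11, 21, "Germany", missing],
]

# Keep only numeric entries of a column, as a Vector{Float64}
numeric(col) = Float64[x for x in col if x isa Number]

labels = permutedims(first.(columns))                     # 1×2 row of series labels
data = reduce(hcat, [numeric(last(p)) for p in columns])  # 2×2 Matrix{Float64}
```

With Plots loaded, something like `plot(data, label = labels)` should then receive a `Float64` matrix rather than an `Array{Any,2}`.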

Sure but my current code is good. How can I improve my existing code?

Check if this helps:

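i = 1; n = rows
while i <= n
    # Drop any row (and its matching date) that contains a non-numeric entry
    if !all(x -> x isa Number, collect(y[i, :]))
        delete!(y, i)   # newer DataFrames.jl versions use deleteat! instead
        x_axis = x_axis[1:end .!= i]
        n -= 1          # the next row has shifted into position i, so do not advance
    else
        i += 1
    end
end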

PS: the plot does not look very nice though:

But where in the code does this go? Or rather, what do I replace with this?

You should place it between these two lines:

rows, cols = size(y)
....
gr(size = (900, 600))

What about my existing plot code that starts here:

@df y plot(x_axis, cols(1:cols), 

If you had tried the code you would have gotten for the second plot:

PS: most of us do not use Discourse as a chat tool. There are dedicated tools for that purpose.

UndefVarError: i not defined

Which is weird because you’re clearly defining i

Sounds like a problem of local vs. global scope. You may need to declare i as global inside the while loop.
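To illustrate the scoping point: assuming the error comes from Julia's rule that, in non-interactive top-level code, assigning to a name inside a loop body creates a new local unless it is declared `global` inside that body, writing `global i = 1` before the loop would not be enough. One way around it is to put the loop in a function, where `i` and `n` are ordinary locals. The sketch below uses a plain matrix as a stand-in for `y`, and `numeric_row_indices` is a hypothetical helper, not part of any package.

```julia
# Hypothetical helper: returns the indices of rows whose entries are all numbers
function numeric_row_indices(table::AbstractMatrix)
    keep = Int[]
    i = 1                 # local to the function, so no global/local ambiguity
    n = size(table, 1)
    while i <= n
        if all(x -> x isa Number, table[i, :])
            push!(keep, i)
        end
        i += 1
    end
    return keep
end

# Toy data standing in for `y`: rows 2 and 4 contain non-numeric entries
table = Any[1 2; "Italy" "Germany"; 3.5 4.5; missing missing]
keep = numeric_row_indices(table)
```

The returned indices could then be applied to both objects at once, for example `y[keep, :]` and `x_axis[keep]`, without reassigning anything inside the loop.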

I did this:

	global i = 1 
	
	n = rows
	while i <= n
    	if prod(isa.(collect((y)[i,:]),Number))==0
       		delete!(y,i)
       		x_axis = x_axis[1:end .!= i]
       		n -= 1
    	end
    	i += 1
	end

Same error. When I did while global i <= n, it threw a syntax error.