Julia not able to convert Array to a series data for plotting

I am trying to plot a dataframe I created from an existing dataframe. Basically I just took about 5 rows and all N-4 columns and created a new dataframe. I wanted to plot this but Julia (on Pluto.jl) is throwing this error:

Cannot convert Array{Any,2} to series data for plotting

error(::String)@error.jl:33
_prepare_series_data(::Array{Any,2})@series.jl:8
_series_data_vector(::Array{Any,2}, ::Dict{Symbol,Any})@series.jl:27
macro expansion@series.jl:144[inlined]
apply_recipe(::AbstractDict{Symbol,Any}, ::Type{RecipesPipeline.SliceIt}, ::Any, ::Any, ::Any)@RecipesBase.jl:282
_process_userrecipes!(::Any, ::Any, ::Any)@user_recipe.jl:36
recipe_pipeline!(::Any, ::Any, ::Any)@RecipesPipeline.jl:70
_plot!(::Plots.Plot, ::Any, ::Any)@plot.jl:172
#plot#129@plot.jl:58[inlined]
#add_label#17(::Base.Iterators.Pairs{Symbol,Any,NTuple{13,Symbol},NamedTuple{(:label, :xlabel, :ylabel, :xticks, :xrotation, :marker, :line, :legend, :grid, :framestyle, :legendfontsize, :tickfontsize, :formatter),Tuple{Array{String,2},String,String,Array{Dates.Date,1},Int64,Tuple{Symbol,Int64},Tuple{Symbol,String},Symbol,Bool,Symbol,Int64,Int64,Symbol}}}, ::typeof(StatsPlots.add_label), ::Array{Any,1}, ::Function, ::Array{Dates.Date,1}, ::Vararg{Any,N} where N)@df.jl:155
(::Main.workspace140.var"#1#3")(::DataFrames.DataFrame)@range.jl:0
top-level scope@Local: 17

This is the cell that is throwing this error:

begin
    
    countries = ["Italy", "Germany", "India", "United Kingdom"];
    y = DataFrame() # empty dataframe

    for country in countries    
        data_dfr = get_country(df,country); # returns a dataframe row 
        data_dfr = DataFrame(data_dfr);          # convert dataframe row back to a dataframe
        df_rows, df_cols = size(data_dfr);
        data_dfl = stack(data_dfr, 5:df_cols);   # convert dataframe into long format
        y[!,Symbol("$country")] = data_dfl[!,:value]
    end

    rows,cols = size(y)
    
    gr(size=(900,600))
    @df y plot(x_axis, cols(1:cols), 
        label =  reshape(names(y),(1,length(names(y)))),
        xlabel = "Time",
        ylabel = "Total number of reported cases",
        xticks = x_axis[1:7:end],
        xrotation = 45,
        marker = (:diamond,4),
        line = (:line, "gray"),
        legend = :topleft,
        grid = false,
        framestyle = :semi,
        legendfontsize = 9,
        tickfontsize = 9,
        formatter = :plain
        )
    
    y.One_million = Array{Union{Missing,Float64},1}(missing,size(y,1));
    y.One_million .= 10^6.0;
    
    display(@df y plot!(x_axis, y[!,cols+1],
           linestyle = :dot,
           linewidth = 5,
           color = :red,
           label = names(y)[cols+1]))
    
    y = select!(y, Not([:One_million])); 
end

Here are the functions and variables used in the above code:

begin
    dates = names(df)[begin:end-4]
    date_format = Dates.DateFormat("m/d/y")
    x_axis = parse.(Date, dates, date_format) .+ Year(2000)
end

function get_country(dataframe, country::String)
    df_country = dataframe[ismissing.(dataframe[!, Symbol("Province/State")]), :]
    indx = findfirst(df_country[!, Symbol("Country/Region")] .== country)
    return df_country[indx, :]
end

begin
	dataset = download("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv","covid_19_global_data.csv")
	
	df = CSV.read("covid_19_global_data.csv", DataFrame)
	write_parquet("covid_19_global_data.parquet", df)
	df = DataFrame(read_parquet("covid_19_global_data.parquet"))
	Arrow.write("covid_19_global_data.arrow", df)
	df = DataFrame(Arrow.Table("covid_19_global_data.arrow"))
		
end

I couldn't run your code because df is not defined. Could you try and define the rest of the variables?

Done. Check the edits.

Hmmm. I tried for 20 minutes to replicate your problem but I couldn't. Part of it is because you use many packages that are not in your snippet, so I had to work out which packages I needed to install to be able to run every line. After discovering that I had to install and use Arrow.jl, CSV.jl, and Parquet.jl, I stopped looking because there are further packages I need to install to get to the point of your error. I advise taking a look at "PSA: make it easier to help you" for advice on how to post a minimal working example; that would make it easier to reproduce your error and help.
Ideally we should be able to copy and paste your snippet and it should throw the error.

Well, did you get a graph output? What did you end up getting?

I do not want the last 4 columns. Your code seems to avoid the 1st 4 columns.

The last 4 rows seem to have non-numerical data. Could you double-check?

julia> y
424×4 DataFrame
 Row │ Italy    Germany  India     United Kingdom
     │ Any      Any      Any       Any
─────┼────────────────────────────────────────────
   1 │ 2319036  1993892  10512093  3211576
   2 │ 2336279  2015235  10527683  3260258
   3 │ 2352423  2023828  10542841  3316019
   4 │ 2368733  2038645  10557985  3357361
   5 │ 2381277  2050129  10571773  3395959
   6 │ 2390102  2059382  10581823  3433494
  ⋮  │    ⋮        ⋮        ⋮            ⋮
 420 │ 281583   256433   4465863   355219
 421 │ Italy    Germany  India     United Kingdom
 422 │ 41.8719  51.1657  20.5937   55.3781
 423 │ 12.5674  10.4515  78.9629   -3.436
 424 │ missing  missing  missing   missing

Yep. That's why I don't want them. This df was created from an arrow file, at the end of the day. It reversed the ordering of the columns from when it was a CSV.
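
Because the column order changed after the file round trip, picking the date columns by position (5:df_cols, or begin:end-4) is fragile. A minimal sketch of picking them by name instead, assuming the date columns are named in m/d/y form as the date_format in the question suggests. The colnames vector and the is_date_name helper are made up for illustration and stand in for names(df):

```julia
using Dates

# Made-up stand-in for names(df): date columns plus the four metadata columns,
# in an arbitrary order.
colnames = ["Lat", "1/22/20", "Country/Region", "1/23/20", "Long", "Province/State"]

# Hypothetical helper: true for names that look like m/d/y dates.
is_date_name(name) = occursin(r"^\d{1,2}/\d{1,2}/\d{2}$", name)

date_cols = filter(is_date_name, colnames)

# Same parsing idea as in the question: two-digit years, shifted by 2000.
fmt = DateFormat("m/d/y")
x_axis_demo = [Date(s, fmt) + Year(2000) for s in date_cols]

println(date_cols)
println(x_axis_demo)
```

The resulting date_cols could then be used to select the columns to stack, so the metadata columns never end up in y regardless of where they sit.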

How can I use, from begin:end, only the numerical data?

For the dataframe posted above:

plot(filter(x -> isa(x, Number), y.Italy))

would plot the Italy column.
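
Two things are worth keeping in mind with this approach. First, filter on a column with element type Any returns another Vector{Any}. Second, rows 422 and 423 in the table above (presumably latitude and longitude) are numbers too, so the isa Number test keeps them. A small sketch with a made-up column, using values copied from the Italy column shown above, that illustrates both points and converts the result to Float64:

```julia
# Made-up column mimicking y.Italy: case counts, then the non-date entries.
col = Any[2319036, 2336279, 281583, "Italy", 41.8719, 12.5674, missing]

# Drops the String and the missing, but keeps the two trailing floats.
numeric = filter(x -> x isa Number, col)

# Convert to a concrete element type.
vals = Float64.(numeric)

println(eltype(numeric))  # Any
println(eltype(vals))     # Float64
println(length(vals))     # 5
```

So the filter alone is not enough to get only the case counts; the non-date rows still have to be excluded by position or by name.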

What if I want to plot all of them?

One way would be to iterate over all the columns:

plot()
for colname in names(y)
    plot!(filter(x -> isa(x, Number), y[!, Symbol(colname)]), label = colname, show = true)
end

Sure but my current code is good. How can I improve my existing code?

Check if this helps:

i = 1; n = rows
while i <= n
    if prod(isa.(collect((y)[i,:]),Number))==0
       delete!(y,i)
       x_axis = x_axis[1:end .!= i]
       n -= 1
    end
    i += 1
end

PS: the plot does not look very nice though:

But where in the code does this go? Or rather, what do I replace with it?

You should place it between these two lines:

rows, cols = size(y)
....
gr(size = (900, 600))

What about my existing plot code that starts here:

@df y plot(x_axis, cols(1:cols), 

If you had tried the code you would have gotten for the second plot:

PS: most of us do not use Discourse as a chat tool. There are dedicated tools for that purpose.

UndefVarError: i not defined

Which is weird because you're clearly defining i

Sounds like a local vs. global scope problem. You may need to declare i as global inside the while loop.

I did this:

	global i = 1 
	
	n = rows
	while i <= n
    	if prod(isa.(collect((y)[i,:]),Number))==0
       		delete!(y,i)
       		x_axis = x_axis[1:end .!= i]
       		n -= 1
    	end
    	i += 1
	end

Same error. When I did while global i <= n, it threw a syntax error.
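
For later readers: the UndefVarError most likely comes from Julia's scope rules. Outside the REPL, assigning to i, n or x_axis inside a top-level while loop creates new locals unless each is declared global inside the loop body, so writing global i = 1 before the loop does not help. The loop also skips the row that follows each deleted row, because i is incremented after a deletion. One way to sidestep both issues is to compute a row mask inside a function, where everything is an ordinary local. A minimal sketch using plain vectors; numeric_rows and cols_demo are hypothetical names, not part of DataFrames.jl:

```julia
# Hypothetical helper: given an iterable of equally long columns, return a
# Bool vector marking the rows in which every entry is a Number.
function numeric_rows(columns)
    nrows = length(first(columns))
    return [all(col[i] isa Number for col in columns) for i in 1:nrows]
end

# Made-up columns standing in for the columns of y.
cols_demo = [
    Any[2319036, 2336279, "Italy", missing],
    Any[10512093, 10527683, "India", missing],
]

keep = numeric_rows(cols_demo)
println(keep)  # Bool[1, 1, 0, 0]
```

With a DataFrame, something like keep = numeric_rows(eachcol(y)) followed by y = y[keep, :] should work, assuming eachcol is available in the installed DataFrames.jl version. The same mask can be applied to x_axis only if it has one entry per row of y, which is not the case in this thread.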