Can't get a basic plot to work (Gadfly

I think I’m using too many libraries and I can’t get two lines on the same plot.

using Pkg
Pkg.add("DataFrames")
Pkg.add("Gadfly")
Pkg.add("Dates")
Pkg.add("StatsBase")
Pkg.add("Query")
Pkg.add("Polynomials")
Pkg.add("Missings")

using DataFrames, Gadfly, Dates, LibPQ, StatsBase, Query, Polynomials

using Missings

# Connect to the database
conn = LibPQ.Connection("postgresql://postgres:mypw@localhost:5433/financials")

# Execute the SQL query to retrieve the data
df = DataFrames.DataFrame(execute(conn, """
    SELECT "Date", "Amount", "Cat2"
    FROM "2022_finances"
    WHERE "Cat1" = 'Income'
"""))

# Close the connection
close(conn)

# Convert the 'Date' column to a DateTime object
df.Date = Dates.DateTime.(df.Date)

grouped =  df |>
    @groupby(month(_.Date)) |>
    @map({Month=key(_), Amount=sum(_.Amount)}) |>
    DataFrame


# Generate the extrapolation
grouped.Month = convert(Array{Float64,1}, grouped.Month)
x = coalesce.(grouped.Month)
y = coalesce.(grouped.Amount)
p = Polynomials.fit(x, y, 3) # can't get this work without the parent class

x_float = convert(Array{Float64, 1}, collect(13:24))
y_float = p.(x_float)

# Plot the data
df_extrapolation = DataFrame(Month=x_float, Amount=y_float)
x = df_extrapolation[!, :Month]
y = df_extrapolation[!, :Amount]

p = plot()
# Plot the original data
plot(grouped, x=:Month, y=coalesce.(:Amount), Geom.line(),
Guide.xlabel("Month"), Guide.ylabel("Amount (\$)"))

# Plot the extrapolation
plot!(df_extrapolation, x=:Month, y=:Amount, Geom.line(), label="Extrapolation")

ERROR: Cannot convert DataFrame to series data for plotting
I’ve tried so many different inputs and can’t the plots to combine.

Could you provide a minimal set of data for the dataframe df ?

Yes.
df is:
246×3 DataFrame
Row │ Date Amount Cat2
│ DateTime Float64? String?
─────┼──────────────────────────────────────────────
1 │ 2022-01-27T00:00:00 1036.61 Tim
2 │ 2022-03-31T00:00:00 1500.0 John
3 │ 2022-04-28T00:00:00 1500.0 John
4 │ 2022-05-31T00:00:00 1500.0 John

Grouped is something like:
12×2 DataFrame
Row │ Month Amount
│ Float64 Float64?
─────┼────────────────────────
1 │ 1.0 146.3
2 │ 3.0 134.2
3 │ 4.0 240.7
4 │ 5.0 348.7

Super appreciative of your help.

I’m wondering if something like this would be a better approach:

using DataFrames, Dates, TimeSeries

# Load the data set into a DataFrame
df = DataFrame(Date = DateTime[DateTime("2022-01-27T00:00:00"),
                                DateTime("2022-03-31T00:00:00"),
                                DateTime("2022-04-28T00:00:00"),
                                DateTime("2022-05-31T00:00:00")],
               Amount = [1036.61, 1500.0, 1500.0, 1500.0],
               Cat2 = ["Tim", "Rentals", "Rentals", "Rentals"])

# Convert the Date column to a TimeArray
t = TimeArray(df.Date, df.Amount)

# Fit an ARIMA model to the time series
model = fit(ARIMA, t)

# Generate forecasts for the next 6 months
forecast = forecast(model, 6)

# Plot the historical data and the forecast
using Plots
plot(t, label = "Historical Data")
plot!(forecast, label = "Forecast")

# Add uncertainty bands to the forecast
fcast_mean, fcast_lower, fcast_upper = fcast(model, 6, 0.95)
fill!(fcast_mean, fcast_lower, fcast_upper, alpha = 0.3)
plot!(fcast_mean, fill = (0, :gray), label = "Uncertainty Band")

Just on your first issue, I think you are mixing up some plotting packages, e.g. plot! is not part of Gadfly (you probably have a using Plots in your REPLs history).

Be sure, that your REPL is clean, by starting a new one.

Composing a plot with several data in Gadfly you use layers, see Compositing · Gadfly.jl

You original code with the example data you provided would look like:

using DataFrames, Gadfly, Dates, LibPQ, StatsBase, Query, Polynomials

using Missings

# Connect to the database
#conn = LibPQ.Connection("postgresql://postgres:mypw@localhost:5433/financials")
# Execute the SQL query to retrieve the data
#df = DataFrames.DataFrame(execute(conn, """
#   SELECT "Date", "Amount", "Cat2"
#    FROM "2022_finances"
#    WHERE "Cat1" = 'Income'
#"""))
# Close the connection
#close(conn)
# Convert the 'Date' column to a DateTime object
#df.Date = Dates.DateTime.(df.Date)

df=DataFrame(
    "Date" => Dates.DateTime.(["2022-01-27T00:00:00", "2022-03-31T00:00:00", "2022-04-28T00:00:00", "2022-05-31T00:00:00"]), 
    "Amount" => [1036.61, 1500.0, 1500.0, 1500.0], 
    "Cat2" => ["Tim", "John", "John", "John"])

grouped =  df |>
    @groupby(month(_.Date)) |>
    @map({Month=key(_), Amount=sum(_.Amount)}) |>
    DataFrame


# Generate the extrapolation
grouped.Month = convert(Array{Float64,1}, grouped.Month)
x = coalesce.(grouped.Month)
y = coalesce.(grouped.Amount)
p = Polynomials.fit(x, y, 3) # can't get this work without the parent class

x_float = convert(Array{Float64, 1}, collect(13:24))
y_float = p.(x_float)

# Plot the data
df_extrapolation = DataFrame(Month=x_float, Amount=y_float)
x = df_extrapolation[!, :Month]
y = df_extrapolation[!, :Amount]

p1=layer(grouped, x=:Month, y=coalesce.(:Amount), Geom.line())
p2=layer(df_extrapolation, x=:Month, y=:Amount, Geom.line())
plot(p1,p2,Guide.xlabel("Month"), Guide.ylabel("Amount (\$)"))

It’s not so nice because of the limited number of samples in the example and you would use some colors for the different layers.

I am ignoring that new issue as it overloads me for now, but again it looks like the Plots package and not Gadfly. Same advice: clean REPL and decide on one plotting package. Both are great!

That is super helpful. Thanks. That makes sense now and explains why I was so confused.