Plot a column of 1D arrays from DataFrame using Gadfly

I have a DataFrame with columns df.x and df.y, where every row of each column holds a 1D array.

Can I use Gadfly.jl to plot this data simply? It seems I need to extract each row one at a time first, as in:

x = df.x[1]
y = df.y[1]
plot(x=x, y=y, Geom.line)

Is there something more concise, similar to the following, that would plot every row?

plot(df, x=:x, y=:y, Geom.line)

That's not an idiomatic use of data frames, so no, Gadfly does not support that. Try flattening the data frame by putting the vector elements into separate rows.


Thank you! It seems like that helps. I also have a column of ID codes for each row (:id). So the following works:

plot(flatten(df,[:x,:y]), x=:x, y=:y, color=:id, Geom.line)
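For reference, here is a minimal sketch of what flatten does to such a table, assuming DataFrames.jl is installed. The toy df below is hypothetical, not the poster's data: each vector element becomes its own row, and scalar columns like :id are repeated.

```julia
using DataFrames

# Hypothetical toy data: each row stores whole vectors in :x and :y
df = DataFrame(id = ["a", "b"],
               x  = [[1, 2, 3], [1, 2]],
               y  = [[10, 20, 30], [5, 6]])

# flatten expands the listed vector columns into one row per element.
# Within a row, the vectors in :x and :y must have equal lengths.
# Scalar columns such as :id are copied onto every new row.
long = flatten(df, [:x, :y])

println(long)
```

Because :id survives on every flattened row, it can then drive aesthetics such as color or group.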

Follow-up question:
If I don't specify color, the result is a mess: a single line is traced through all of the data.

How can I get a separate line for each ID but all the same color?

Try Scale.color_discrete_hue (see the Scales page of the Gadfly.jl docs).

How can I get a separate line for each ID but all the same color?

Try e.g. group=:id, color=[colorant"black"]
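A small self-contained sketch of that suggestion, assuming DataFrames.jl, Gadfly.jl and Colors.jl (for the colorant"..." macro) are installed. The already-flattened toy data is hypothetical. group=:id splits the data into one line per ID, while the one-element color vector gives every line the same color.

```julia
using DataFrames, Gadfly
using Colors  # provides the colorant"..." string macro

# Hypothetical toy data, already in long (flattened) form
long = DataFrame(id = ["a", "a", "a", "b", "b", "b"],
                 x  = [1, 2, 3, 1, 2, 3],
                 y  = [1.0, 2.0, 1.5, 3.0, 2.5, 3.5])

# group=:id: one separate line per ID (no line joining "a" to "b")
# color=[...]: a single literal color applied to all lines
p = plot(long, x=:x, y=:y, group=:id, color=[colorant"black"], Geom.line)
```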


That’s just what I needed, thanks!

I'll also add that the color palettes from Plots.jl work with this. It seems these palettes (e.g. the Crameri color maps) aren't natively part of Gadfly.jl? The commands that work best for me, when I have 3 columns to plot as separate y axes, are:

import Plots:palette
using Gadfly

pal = palette(:batlow,3)
cols = [:y1, :y2, :y3]
plot_stack = 
    [plot(flatten(df, [:x, cols[idx]]), x=:x, y=cols[idx], group=:id, color=[pal[idx]], Geom.line) for idx in 1:3]
title(vstack(plot_stack),"Title",fill(colorant"black"))

In Julia, color schemes come from the package ColorSchemes.jl.
In Gadfly, we don't add ColorSchemes as a dependency.
ColorSchemes.jl doesn't have a Plots-like palette function, but you can roll your own:

import ColorSchemes as cschemes
palette(cs, n) = cschemes.colorschemes[cs][range(0,1,n)]
# palette(:batlow, 3)
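Combining that helper with the vstack approach above gives a Plots-free version of the same figure. This is a sketch assuming DataFrames.jl, Gadfly.jl and ColorSchemes.jl are installed. The df is hypothetical toy data in the poster's layout (:id, a vector column :x, and vector columns :y1 to :y3). Colors are sampled with ColorSchemes' scalar get(scheme, t) for t in [0, 1], a conservative variant of the one-liner above. The title(...) wrapper is left out here.

```julia
using DataFrames, Gadfly, ColorSchemes

# Hypothetical toy data: per-ID vectors for x and three y series
df = DataFrame(id = ["a", "b"],
               x  = [[1, 2, 3], [1, 2, 3]],
               y1 = [[1, 2, 3], [2, 3, 4]],
               y2 = [[3, 2, 1], [4, 3, 2]],
               y3 = [[1, 3, 2], [2, 4, 3]])

# Hypothetical helper: n evenly spaced colors from a named scheme
palette(name, n) = [get(colorschemes[name], t) for t in range(0, 1; length=n)]

cols = [:y1, :y2, :y3]
pal = palette(:batlow, length(cols))

# One plot per y column; every ID gets its own line, all in that column's color
plot_stack = [plot(flatten(df, [:x, c]), x=:x, y=c, group=:id,
                   color=[pal[i]], Geom.line)
              for (i, c) in enumerate(cols)]

# Stack the plots vertically into one figure
fig = vstack(plot_stack...)
```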

If the plots in your plot_stack share the same x-axis, you could also use Geom.subplot_grid, e.g.:

using DataFrames, Gadfly, RDatasets  # palette() is the helper defined above

iris = dataset("datasets", "iris")[:,Not(4)]
iris[!,:x] = 10*rand(150)
df = stack(iris, Not([:x, :Species]))

plot(df, x=:x, y=:value, group=:Species, ygroup=:variable,color=:variable,
    Geom.subplot_grid(Geom.line, free_y_axis=true),
    Scale.color_discrete_manual(palette(:batlow, 3)...)
)


When I was browsing the docs, I came across the title method applied to vstack/hstack compositions before I saw the title guide (Guide.title). That method works well with the array comprehension over the flattened columns. subplot_grid looks good, too, but I'm not sure how to use it succinctly with my df, which needs to be flattened over the columns to plot properly. Do you have a suggestion on how to use subplot_grid with this, so that only the bottom x-axis is shown? It seems to me that I'd need to put all the flattened data from each of my y-columns into the same column, and label each row with its previous column name for ygroup.

I'm also interested in applying transparency to the lines. I saw in an older post that there was no alpha aesthetic for Geom.line. Is this still the case? Is there a convenient way you'd recommend to add it when using the named color scheme?

Seems to me that I’d need to add all the flattened data from each of my y-columns to the same column, and label each row by its previous column name for ygroup.

Yes, and that’s what stack (in DataFrames.jl) does (also used in my example above).
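Applied to the original wide layout, a sketch of that reshaping might look like the following. It assumes DataFrames.jl, Gadfly.jl and ColorSchemes.jl are installed, and the df is hypothetical toy data. First flatten all vector columns in one call, then stack the y columns. :variable and :value are the default column names that stack produces. The result feeds straight into the subplot_grid call from the iris example.

```julia
using DataFrames, Gadfly, ColorSchemes

# Hypothetical toy data in the wide layout: one vector column per y series
df = DataFrame(id = ["a", "b"],
               x  = [[1, 2, 3], [1, 2, 3]],
               y1 = [[1, 2, 3], [2, 3, 4]],
               y2 = [[3, 2, 1], [4, 3, 2]],
               y3 = [[1, 3, 2], [2, 4, 3]])

cols = [:y1, :y2, :y3]

# 1. flatten every vector column at once: one row per (id, point)
# 2. stack the y columns: :variable holds the old column name,
#    :value holds the number; :id and :x are kept as id columns
long = stack(flatten(df, [:x; cols]), cols)

pal = [get(colorschemes[:batlow], t) for t in range(0, 1; length=length(cols))]

# One panel per former y column (ygroup), one line per ID (group)
p = plot(long, x=:x, y=:value, group=:id, ygroup=:variable, color=:variable,
         Geom.subplot_grid(Geom.line, free_y_axis=true),
         Scale.color_discrete_manual(pal...))
```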

Unfortunately the alpha aesthetic has not yet been implemented for Geom.line. The issue lies not in Gadfly, but in the graphics package that Gadfly is built on (Compose.jl). See this related post.

Is there a convenient way you’d recommend adding that with the named color scheme?

If you mean to use with Geom.line, that wouldn’t work until strokeopacity in Compose.jl is fixed.
