Good ways to plot DataFrames content?

I have never really gotten along with DataFrames very well (and, to be fair, I have not used it that much). I figured I should ask whether there are any straightforward recipes (or methods) for plotting the content of a DataFrame.
(I am asking because I saw someone write the equivalent code in pandas, and it looked relatively straightforward.)

Right now, when I want to plot columns c2, c3, and c4 against c1, I do:

using DataFrames, CSV, Plots
data = DataFrame(CSV.File("data.csv"))
plot(data.c1, [data.c2, data.c3, data.c4])

It works perfectly fine, but I figured there might be a nicer way to do this. Trying to plot DataFrame columns directly seems to yield errors.

Check this other post for one option.


Looks good.

The plotting is not the actual problem, i.e. I can create the plot just fine. However, it feels like something that could well be built into DataFrames/Plots already. If that is the case, I would want to use it (especially in tutorials and other material that might be published, which is the case here).

AlgebraOfGraphics.jl is your friend if you are seeking plots of tables that read off the columns automatically with some predefined meaning. Are you aware of it?


Wasn’t aware, thanks a lot!

I always found Gadfly.jl to be well-suited for plotting dataframes. There has not been much active development according to GitHub, but that doesn’t mean it won’t work for your purposes. To be honest though, the last time I used it was probably at least 3 years ago.

DataFramesMeta

The @with macro from DataFramesMeta.jl can make your Plots.jl command slightly shorter:

using DataFramesMeta, Plots  # `data` is the DataFrame loaded from the CSV above
@with data plot(:c1, [:c2, :c3, :c4])

or with labels for a fair comparison to the other options below:

@with data plot(:c1, [:c2, :c3, :c4], labels=["c2" "c3" "c4"])  # no commas

AlgebraOfGraphics

AlgebraOfGraphics.jl is great for analyzing complicated tabular data, but you will quickly find it wasn’t built for simple pre-grouped wide data like yours. There is a lot of extra boilerplate:

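using AlgebraOfGraphics, CairoMakie  # CairoMakie is one choice; any Makie backend should work
# df is the DataFrame loaded from the CSV
draw(
    data(df) *  # can't use data as variable name since it is an AoG function
    mapping(
        :c1,
        [:c2, :c3, :c4],
        color = dims(1) => renamer(["c2", "c3", "c4"]),
    ) *
    visual(Lines)
)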

Here are some references on how to plot wide data with AoG:


Gadfly

I think the output plots from Gadfly.jl are nice and interactive, but I’ve found it somewhat buggy and slow to respond to issues. It also requires some boilerplate to handle wide data. (These DataFrame plotting packages tend to expect you want statistical plots of tall data.)

using Gadfly  # use a fresh session: Gadfly and Plots both export `plot`
plot(
    data,
    x=:c1,
    y=Col.value(:c2, :c3, :c4),
    color=Col.index(:c2, :c3, :c4),
    Geom.line,
)
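
To make the remark about tall data concrete, here is a minimal sketch of reshaping the wide table into tall form with `stack` from DataFrames.jl and then letting Plots.jl split the series with its `group` keyword. The sample `data` is made up as a stand-in for the CSV from the question, and the name `tall` is only illustrative; treat this as one possible approach, not the recommended one.

```julia
using DataFrames, Plots

# Stand-in for the CSV data from the original question (values are made up).
data = DataFrame(c1 = 1:5, c2 = rand(5), c3 = rand(5), c4 = rand(5))

# Reshape wide -> tall: one row per (c1, column name) pair.
# `stack` names the new columns :variable and :value by default.
tall = stack(data, [:c2, :c3, :c4])

# With tall data, the `group` keyword splits the series,
# and the group values become the legend entries.
plot(tall.c1, tall.value; group = tall.variable, xlabel = "c1")
```

The cost is an extra reshaping step, but the legend entries then come from the data rather than from a hand-written label list.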

Gadfly documentation on plotting wide data:


TidierPlots

TidierPlots.jl would be the most R-like, but I don’t have any experience with it. If you already know R, then I’d try that.


Or TidierPlots:

:slight_smile:


StatsPlots is now part of Plots.jl, and has some convenient handling of columns, groups, etc.


StatsPlots’s @df seems to be no better than DataFramesMeta’s @with since you still have to manually provide the legend entries. The syntax and output are exactly the same:

@df data plot(:c1, [:c2, :c3, :c4], labels=["c2" "c3" "c4"])

Why can none of these packages automatically find the column names for the legend?
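
In the meantime, a small wrapper can build the labels from the column names. This is only a sketch: `plotcols` is a hypothetical helper, not part of Plots.jl or DataFrames.jl, and the sample `data` is made up as a stand-in for the CSV from the question.

```julia
using DataFrames, Plots

# Hypothetical helper (not part of Plots.jl or DataFrames.jl):
# plot several columns of `df` against `xcol`, labelling each
# series with its column name.
function plotcols(df::AbstractDataFrame, xcol::Symbol, ycols::Vector{Symbol}; kwargs...)
    # Plots.jl expects one label per series as a 1×n row matrix.
    labels = permutedims(string.(ycols))
    plot(df[!, xcol], Matrix(df[!, ycols]); label = labels, xlabel = string(xcol), kwargs...)
end

# Stand-in for the CSV data from the original question (values are made up).
data = DataFrame(c1 = 1:5, c2 = rand(5), c3 = rand(5), c4 = rand(5))
p = plotcols(data, :c1, [:c2, :c3, :c4])
```

Any extra keyword arguments are forwarded to `plot`, so the usual Plots.jl attributes can still be set at the call site.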


I have an issue tracking that on AlgebraOfGraphics here.

@df df plot(:c1, cols([:c2, :c3, :c4]))

I found this pretty much by accident, but passing them through a cols call seems to turn that feature on.
