AlgebraOfGraphics documentation frustrations

After one more attempt to wrap my head around AlgebraOfGraphics, I started to suspect the problem is not only me being stupid, lazy, and incapable of abstract logical thinking.

OK, please help me make some simple plots. First, let's produce some sample data:

using AlgebraOfGraphics, CairoMakie, DataFrames  # CairoMakie (or another Makie backend) is needed to display the plots

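# sample data: column i of the result holds v .^ (0.4 + 0.1i),
# i.e. exponents 0.5, 0.6, ... up to 0.4 + 0.1n
function make_y(v, n)
    m = Matrix{Float64}(undef, length(v), n)
    for i in 1:n
        e = 0.4 + i * 0.1   # exponent of the i-th curve
        m[:, i] = v .^ e
    end
    return m
end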

xs = 0.0:10
m = hcat(xs, make_y(xs, 5))
nms = vcat("x", ["y$n" for n in 1:5])
df = DataFrame(m, nms)

I’d like to have

  • Lines of the same color
  • Lines differentiated in color using the default palette
  • Lines differentiated in color using a given palette from ColorSchemes.jl (if supported)
  • Lines differentiated in line style
  • Scatter differentiated in color
  • Scatter differentiated by marker shape

OK, the first is easy and kind of logical:

plt = data(df) * mapping(:x, names(df)[2:end] .=> :y) * visual(Lines) |> draw
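In this first example, the second argument names(df)[2:end] .=> :y is plain Julia broadcasting rather than AlgebraOfGraphics-specific syntax. A minimal check with a small made-up table (DataFrames only, no plotting) shows that it evaluates to a vector of Pairs, one per y column. Passing a vector of columns like this is what makes the mapping "wide" (one-dimensional).

```julia
using DataFrames

# Hypothetical small table with the same column layout as df above
df = DataFrame(x = 0.0:2.0, y1 = [0.0, 1.0, 1.41], y2 = [0.0, 1.0, 1.52])

# `.=>` broadcasts the Pair constructor: each y column name is paired with :y
pairs_y = names(df)[2:end] .=> :y
println(pairs_y)
```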

You're starting out with a wide-format dataframe, which is not the most convenient format for AlgebraOfGraphics. It does have some convenience built in for that case as well, but generally it's a bit easier to start with long format, because then you don't have to wrangle multi-dimensional mappings.
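To make the wide/long distinction concrete, here is a tiny hypothetical table in both shapes, converted with DataFrames' stack (wide to long) and unstack (long back to wide). The values are made up, and the column names group and y are chosen to match the rename! used for the long-format examples.

```julia
using DataFrames

# Wide: one row per x value, one column per series
wide = DataFrame(x = [0.0, 1.0, 2.0], y1 = [0.0, 1.0, 1.41], y2 = [0.0, 1.0, 1.52])

# Long ("tidy"): one row per observation; the series name becomes data in :group
long = stack(wide, [:y1, :y2]; variable_name = :group, value_name = :y)

# unstack reverses it: row key :x, new column names from :group, values from :y
back = unstack(long, :x, :group, :y)
```

Both shapes hold the same numbers; in long format the series identity is an ordinary categorical column, which is why it can be passed directly to color, marker, linestyle, and so on.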

Let me try to put it into a high-level perspective: in AlgebraOfGraphics you use a tabular data source and specify the columns you want to plot. These columns are split into groups by specifying categorical columns in mapping. Each group then becomes one "trace" or separate plot, possibly split across facets if you use layout, row, or col. But there's another, higher level of grouping, and that's the multidimensional or "wide" case. This means that you basically define a "tensor" of mappings, and for each element in this tensor the whole pipeline of grouping by categorical columns etc. is run. Here it's just a one-dimensional tensor (the vector of y columns), but in principle it can have arbitrarily many dimensions. The dimensions don't matter much, except for the dims mapping helper, which is special because it creates a faux categorical variable along one (or more, but usually one) dimension of the mapping tensor. This is cool and all, but the zero-dimensional case (where all mappings are just symbols), i.e. long format, is the simplest and should be what you start with.
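As a rough analogy for the grouping step described above, DataFrames' groupby splits a long table into one subset per value of a categorical column. This is only a conceptual sketch with made-up data; AlgebraOfGraphics' internals are not literally this code, but each subset corresponds to one "trace" when, say, color = :group is mapped.

```julia
using DataFrames

# Hypothetical long-format table with one categorical column, :group
dfl = DataFrame(x = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
                y = [0.0, 1.0, 1.41, 0.0, 1.0, 1.52],
                group = ["y1", "y1", "y1", "y2", "y2", "y2"])

# One subset of rows per distinct :group value; each would become one line
traces = groupby(dfl, :group)
for t in traces
    println(first(t.group), ": ", nrow(t), " points")
end
```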

So here are the examples you wanted in wide format. Note the need for => renamer(ys), because the dimensions of the multidimensional input don't automatically have names (maybe they could have them in simple cases, but in general different mappings could contribute to the same dimensions). I factored out the ys variable to make it less verbose.

xs = 0.0:10
m = hcat(xs, make_y(xs, 5))
ys = ["y$n" for n in 1:5]
nms = vcat("x", ys)
df = DataFrame(m, nms)

data(df) * mapping(:x, ys) * visual(Lines) |> draw
data(df) * mapping(:x, ys, color = dims(1) => renamer(ys)) * visual(Lines) |> draw
data(df) * mapping(:x, ys, color = dims(1) => renamer(ys)) * visual(Lines) |> draw(scales(Color = (; palette = :Set1_5)))
data(df) * mapping(:x, ys, linestyle = dims(1) => renamer(ys)) * visual(Lines) |> draw
data(df) * mapping(:x, ys, color = dims(1) => renamer(ys)) * visual(Scatter) |> draw
data(df) * mapping(:x, ys, marker = dims(1) => renamer(ys)) * visual(Scatter) |> draw

And here are the same ones in long format. Most are simpler; only the first one needs the additional group mapping, because otherwise there's just one zigzagging line:

dfl = stack(df, ys)
rename!(dfl, :value => :y, :variable => :group)

data(dfl) * mapping(:x, :y, group = :group) * visual(Lines) |> draw
data(dfl) * mapping(:x, :y, color = :group) * visual(Lines) |> draw
data(dfl) * mapping(:x, :y, color = :group) * visual(Lines) |> draw(scales(Color = (; palette = :Set1_5)))
data(dfl) * mapping(:x, :y, linestyle = :group) * visual(Lines) |> draw
data(dfl) * mapping(:x, :y, color = :group) * visual(Scatter) |> draw
data(dfl) * mapping(:x, :y, marker = :group) * visual(Scatter) |> draw
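The "algebra" in the name also lets these specs be combined. As a sketch (with a hypothetical two-series stand-in for dfl, and CairoMakie assumed as the backend), * distributes over +, so lines and markers can share one data/mapping spec and be drawn in the same axis:

```julia
using AlgebraOfGraphics, CairoMakie, DataFrames

# Hypothetical long-format data with the same column names as dfl above
xs = 0.0:10
dfl = DataFrame(x = repeat(collect(xs), 2),
                y = vcat(xs .^ 0.5, xs .^ 0.6),
                group = repeat(["y1", "y2"], inner = length(xs)))

# (Lines + Scatter) distributes over *: two layers sharing data and mapping
spec = data(dfl) * mapping(:x, :y, color = :group) * (visual(Lines) + visual(Scatter))

fg = draw(spec)  # both layers in one axis, colored per group
```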

@jules, thank you for your explanations.

My questions were just one side of my post.

Another side is the package documentation, which, while it looks beautiful, is apparently of little help to a user like me.

I could probably help improve it by continuing to ask naive questions, and maybe by providing some specific suggestions, but I certainly can't rewrite the docs on my own.

Now some specific notes.


Knowledge of long and wide formats is taken for granted. At the very beginning, the docs just say:

…“tidy” (long format) tables as input … “Tidy” tables are the most common input type, but wide data, pregrouped arrays, and other input types are also supported.

As of this evening, I understand what long vs. wide format means, but this is actually the first time I've been confronted with it, despite decades of experience in science and engineering. I'm just not a statistician.

As I wasn't clear about the data format the package expects, I couldn't really proceed any further.


The mapping reference doesn't really explain what the positional arguments are, which named arguments are accepted, what data types they take, and how they are processed further.
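For reference, here is a sketch of the call shapes used in this thread, assembled from the examples above plus the column => transformation => label pair syntax. It is not a complete list of what mapping accepts. Positional arguments map to the x and y positions; keyword arguments (color, marker, linestyle, group, layout, row, col, ...) name the other aesthetics; each value is a column name (Symbol or string), optionally followed by => a transformation such as renamer and/or => an axis label. The table is made up.

```julia
using AlgebraOfGraphics, DataFrames

# Hypothetical long-format table with the column names used in this thread
dfl = DataFrame(x = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
                y = [0.0, 1.0, 1.41, 0.0, 1.0, 1.52],
                group = ["y1", "y1", "y1", "y2", "y2", "y2"])

spec = data(dfl) * mapping(
    :x,              # 1st positional argument: x position
    :y => "value",   # 2nd positional argument: y position, with a custom axis label
    color = :group => renamer("y1" => "first", "y2" => "second"),  # aesthetic = column => transformation
)
```

Nothing is plotted until spec is multiplied by a visual and passed to draw, as in the examples above.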


Should we continue? I understand that reworking the documentation is a substantial effort, assuming "somebody" has the capacity and desire.

Or maybe I'm just the wrong type of user, had the wrong expectations, and AoG is not for me. Then please just replace the phrase in the introduction

No familiarity with plotting or data analysis in Julia is required

by

You are expected to have some prior knowledge of R, tidy, you name it…

P.S. I definitely highly value the work of the Makie creators. Sorry for sounding frustrated; it's just because I am. 🙁