Stacked bar graphs


#1

Hello all, I’m quite new to Julia and was a bit lost on how to create stacked bar graphs. Ideally I’d like to have 8 bars, grouped in pairs. Each bar would have 5 stacks that add up to 100%; this is basically plotting a variance decomposition. So it would look like 4 sets of 2 bars each. Each pair of bars would have the same repeating labels, “Conditional” and “Unconditional Variance”, and the stacks would represent shocks/errors that have their own titles. Each pair would represent testing a certain model, i.e. 4 models being tested in total.

I’d appreciate any and all help, thank you!


#2

In Gadfly:

import Cairo
using DataFrames, Gadfly, RDatasets
D = dataset("datasets","HairEyeColor")
palette = ["brown","blue","tan","green"]

p = plot(D, x=:Sex, y=:Freq, color=:Eye, xgroup=:Hair,
    Geom.subplot_grid(Geom.bar(position=:stack)),
    Scale.color_discrete_manual(palette...), 
    Guide.xlabel("Hair color"))

# draw(PNG("haireyecolor.png", 6.6inch, 4inch), p)
draw(PNG(6.6inch, 4inch), p)

[Image: haireyecolor]

For more Gadfly, see the plot gallery.


#3

An alternative is to use StatPlots: https://github.com/JuliaPlots/StatPlots.jl#grouped-bar-plots


#4

I’ve seen this and attempted it with StatPlots, but I’m not sure where to input the values (manually) for the stacks? And to group them in pairs?


#5

great, thank you! If I wanted to input the values manually, how would this be arranged? I.e. in what order?


#6

I was looking at attempting what you did in this: https://discourse.julialang.org/t/plots-bars-side-by-side but with manually adding stacks and putting in these values for each bar having 5 stacks.


#7

Interestingly enough, I think StatPlots and StatsMakie are both a bit ill-equipped to stack and dodge bars at the same time. It should be possible to add this feature to StatsMakie with the new grouping API though.


#8

@svpillai The easiest way to enter data is to make a DataFrame. e.g.

using DataFrames

# Set up the DataFrame: one row per (Variance, Model, Prop) combination
D = [[x model prop] for x in ["Conditional", "Unconditional"],
        model in "Model".*["1","2","3","4"], prop in 1:5]
D = DataFrame(vcat(D...), [:Variance, :Model, :Prop])

# Now enter the corresponding values in a new column like this
# (one value per row; only the first two are shown here):
D[:values] = [20, 40, ]

For more DataFrames, see the DataFrames docs.


#9

Yes, but you can do it manually by defining the matrix of values for groupedbar and passing the x positions to x. In this case I find the Gadfly way nicer, though.
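A rough sketch of that manual layout, using only Base Julia to build the inputs. The names (`xpos`, `shares`, `labels`, `gap`) and the share values are placeholders made up for illustration, and the plotting call in the final comment is an assumption about how `groupedbar` would be invoked, not something tested here:

```julia
# Hypothetical layout: 4 models, 2 bars per model, 5 shocks per bar.
nmodels, nbars, nshocks = 4, 2, 5

# x positions: bars within a pair sit 1 unit apart; consecutive pairs start `gap` units apart.
gap = 3.0  # arbitrary spacing choice
xpos = [gap * (m - 1) + (b - 1) for m in 1:nmodels for b in 1:nbars]

# One row per bar, one column per shock. Placeholder shares that sum to 100 in every row.
shares = repeat([20.0 30.0 10.0 25.0 15.0], nmodels * nbars, 1)

# Tick labels repeat for every pair of bars.
labels = repeat(["Conditional", "Unconditional"], nmodels)

# Assumed plotting call (needs the plotting package loaded; not executed here):
# groupedbar(xpos, shares; bar_position = :stack, bar_width = 0.8, xticks = (xpos, labels))
```

Setting the bar width explicitly matters in this sketch because the x positions are unevenly spaced, so a width derived from the average spacing could make the two bars of a pair overlap.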


#10

Here is the VegaLite.jl way:

using VegaLite, RDatasets

df = dataset("datasets", "HairEyeColor")

df |> @vlplot(:bar, x=:Sex, y=:Freq, color=:Eye, column=:Hair)

and you get:
[Image: visualization]


#11

this is perfect, thank you!


#12

Hi @davidanthoff, I tried this exact same code (all packages added), and it ran without error messages, but Atom/Juno’s Plots box just displayed an empty black screen. I tried creating my own data frame as well, but it showed an empty box on the Plots screen and displayed messages saying “Invalid field type "undefined" for channel x”, and the same for y, color, and column. I’d appreciate any help on this.


#13

So when I run exactly the code I posted above in Juno/Atom, I get a plot with a black background that is hard to read. When I change things to df |> @vlplot(:bar, x=:Sex, y=:Freq, color=:Eye, column=:Hair, background=:white), all looks as it should. Does that work for you? Or are you seeing a completely black plot pane?


#15

This worked perfectly somehow once I restarted Juno. However, when I use my own data frame with @Mattriks’s method above, I get this warning (and then an empty plot):
WARN Invalid field type “undefined” for channel “x”, using “quantitative” instead.
WARN Invalid field type “undefined” for channel “color”, using “nominal” instead.
WARN Invalid field type “undefined” for channel “y”, using “quantitative” instead.

What am I doing wrong here?


#16

@Mattriks every time I try this code, and/or versions of it with other datasets, everything runs fine until the very end, where the plot command gives the following error:
ERROR: UndefVarError: plot not defined
Stacktrace:
[1] top-level scope at none:0

Is this a bug? I’m using Juno/Atom Julia v 1.0.2.


#17

The code from @Mattriks creates a DataFrame where the columns have an element type of Any. That is not ideal, because it means that VegaLite.jl can’t determine what type of values are in each column, and the warning you are seeing essentially means “I’m going to guess a type for each column”. There is in general nothing wrong with that, i.e. it seems to guess the right column formats here.
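To see where the Any comes from, here is a small Base Julia illustration (the variable names `row` and `mat` are just for this sketch). Each row in that comprehension mixes Strings and an Int in one array, and Julia has no common concrete type for the two:

```julia
# One row built the same way as in the comprehension: two Strings and an Int.
row = ["Conditional" "Model1" 1]
println(eltype(row))            # Any, since String and Int share no concrete type

# Stacking such rows keeps the Any element type for the whole matrix.
mat = vcat(row, ["Unconditional" "Model1" 1])
println(eltype(mat[:, 1]))      # Any, even though this column holds only Strings
```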

If you want to get rid of these messages, there are two ways: 1) create a DataFrame with typed columns, or 2) manually tell VegaLite.jl what type of encoding you want to use for each channel. That would look like this:

df |>
  @vlplot(:bar, 
    x="col1:q", 
    y="col2:q", 
    color="col3:n", 
    column="col4:n")

Note how I’m now passing strings instead of symbols, and in addition to the name of the column, I’m also passing information about what type of encoding this should be, after the colon :. The details of that are documented here. So I think you need to make sure that you pick the right type there for each encoding.

But overall, I would try to go with option 1). A DataFrame with Any columns is generally very inefficient.
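A minimal sketch of option 1, assuming a long layout with one row per (model, variance, shock) combination. The label strings, the column names, and the share values are placeholders invented for illustration; real data would have different shares for each bar. The vectors are built in Base Julia, and the DataFrame call is left as a comment:

```julia
# Hypothetical labels; replace with your own model and shock names.
models    = ["Model 1", "Model 2", "Model 3", "Model 4"]
variances = ["Conditional", "Unconditional"]
shocks    = ["Shock 1", "Shock 2", "Shock 3", "Shock 4", "Shock 5"]

# Placeholder shares for one bar (they sum to 100).
placeholder = [20.0, 30.0, 10.0, 25.0, 15.0]

# Typed, initially empty columns: the element type is fixed up front.
model    = String[]
variance = String[]
shock    = String[]
share    = Float64[]

# The last iterator varies fastest, so the 5 shocks of a bar are stored consecutively.
for m in models, v in variances, (s, val) in zip(shocks, placeholder)
    push!(model, m)
    push!(variance, v)
    push!(shock, s)
    push!(share, val)
end

# With DataFrames loaded, these vectors can become typed columns:
# df = DataFrame(Model = model, Variance = variance, Shock = shock, Share = share)
```

This also fixes the order of manually entered values: all five shocks of the first bar, then the five shocks of the next bar, and so on.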


#18

Should VegaLite.jl do something like identity.(x) as an intermediate step so the eltype becomes narrower?
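For reference, a small Base Julia illustration of what that broadcast does (the vector `col` is a made-up example): it builds a new array whose element type is inferred from the values actually present.

```julia
col = Any[20, 40, 15]        # declared element type is Any
narrowed = identity.(col)    # broadcasting allocates a new array with a narrower eltype

println(eltype(col))         # Any
println(eltype(narrowed))    # Int (Int64 on a 64-bit system)
```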


#19

That would add another pass over all the input data, and in some cases that might end up being very inefficient… There are probably some things one could do that would be safe and would help a bit, but my gut feeling is that really we should steer folks towards not using Any columns, as really everything will be inefficient with them…


#20

@davidanthoff This worked perfectly, thank you so much!


#22

Like this:

df |>
  @vlplot(:bar,
    x={:Sex, title=nothing},
    y={:Freq, title=nothing}, 
    color={:Eye, title=nothing}, 
    column={:Hair, title=nothing}
  )

The trick is that you now pass a composite value in curly brackets {} to x etc., and then you can configure more details there. All the config options for axis title are described here.

I think you need to configure your scales a bit for that. It might be enough to configure the scale to not try to come up with nice numbers on the axis like this (documented here):

df |>
  @vlplot(:bar, x=:Sex, y={:Freq, scale={nice=false}}, color=:Eye, column=:Hair)

But you might also have to additionally configure your domain:

df |>
  @vlplot(:bar,
    x=:Sex,
    y={:Freq, scale={domain=[0,100], nice=false}},
    color=:Eye, column=:Hair
  )

I’m not entirely sure why this doesn’t work automatically. I suspect that maybe there is a small numerical rounding error that pushes some value just slightly above 100, and then vega-lite thinks it needs to extend the axis or something like that…

You can specify a sort property for the encoding, documented here:

df |>
  @vlplot(:bar,
    x=:Sex,
    y=:Freq,
    color=:Eye,
    column={:Hair, sort=["Red", "Blond", "Brown", "Black"]}
  )