Apparently inconsistent plotting of histograms in Gadfly

Why is the histogram for the WildType data normalised differently when it is plotted with other data compared to being plotted alone?
This plot command

plot(df[df.genotype .== "WildType", :], x=:level, Geom.histogram(bincount=200), Scale.x_log10, color=:genotype, Coord.cartesian(xmin=-2, xmax=1, ymin=0, ymax=8000))

produces this plot

while this plot

plot(df, x=:level, Geom.histogram(bincount=200), Scale.x_log10, color=:genotype, Coord.cartesian(xmin=-2, xmax=1, ymin=0, ymax=8000))

produces

The WildType data are roughly 3x larger in the second plot. This does not happen to the KO data if I plot it by itself. Am I missing something about how Gadfly histograms work?

Here is a usable example that reproduces the behaviour.

using Distributions, Gadfly, DataFrames

d1 = Normal(0, 1)
d2 = Normal(1, 0.5)

x = rand(d1, 10000)
df = DataFrame(x=x, class="Class 1")

x = 0.5 .* rand(d2, 10000)

append!(df, DataFrame(x=x, class="Class 2"))

spike = -2.0 .* ones(500)
append!(df, DataFrame(x=spike, class="Class 2"))

h = plot(df[df.class .== "Class 1", :], 
  x=:x, 
  color=:class, 
  Geom.histogram(bincount=200), 
  Coord.cartesian(xmin=-4, xmax=4, ymin=0, ymax=1000)
  )
display(h)

h = plot(df,
  x=:x, 
  color=:class, 
  Geom.histogram(bincount=200), 
  Coord.cartesian(xmin=-4, xmax=4, ymin=0, ymax=1000)
  )
display(h)

This produces:


It looks like it has plotted Class 1 as the sum of Class 1 and Class 2. Is this to be expected? Is there a way to turn this behaviour off?

@evan-wehi try to reach the Gadfly.jl maintainers in their community channels. They use Gitter instead of Discourse.


My bad - I missed it in the documentation. This behaviour is intended and is controlled by the position argument in Geom.histogram. I need to use position=:dodge.
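
For anyone landing here later, this is a minimal sketch of the fix applied to the reproducible example above. It assumes the `position` keyword of Geom.histogram accepts `:dodge` as described, and that stacking is the default; I have not checked it against a specific Gadfly version.

```julia
using Distributions, Gadfly, DataFrames

# Same synthetic data as in the reproducible example above.
df = DataFrame(x=rand(Normal(0, 1), 10000), class="Class 1")
append!(df, DataFrame(x=0.5 .* rand(Normal(1, 0.5), 10000), class="Class 2"))
append!(df, DataFrame(x=fill(-2.0, 500), class="Class 2"))

# position=:dodge draws the bars of each class side by side, so every bar
# starts at zero instead of being stacked on top of the other class.
h = plot(df,
  x=:x,
  color=:class,
  Geom.histogram(bincount=200, position=:dodge),
  Coord.cartesian(xmin=-4, xmax=4, ymin=0, ymax=1000)
)
```

Calling display(h) then shows the plot. With dodged bars, the Class 1 heights should match what you get when plotting Class 1 on its own, rather than the sum of both classes.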
