Why is the histogram for the WildType data normalised differently when plotted with other data than when plotted alone?
This plot command
plot(df[df.genotype .== "WildType", :], x=:level, Geom.histogram(bincount=200), Scale.x_log10, color=:genotype, Coord.cartesian(xmin=-2, xmax=1, ymin=0, ymax=8000))
produces this plot
while this plot command
plot(df, x=:level, Geom.histogram(bincount=200), Scale.x_log10, color=:genotype, Coord.cartesian(xmin=-2, xmax=1, ymin=0, ymax=8000))
produces
The WildType data are roughly 3x larger in the second plot. This does not happen to the KO data if I plot it by itself. Am I missing something about how Gadfly histograms work?
Here is a usable example that reproduces the behaviour.
using Distributions, Gadfly, DataFrames

# Class 1: 10000 draws from a standard normal
d1 = Normal(0, 1)
d2 = Normal(1, 0.5)
x = rand(d1, 10000)
df = DataFrame(x=x, class="Class 1")

# Class 2: 10000 scaled draws from d2, plus a spike of 500 points at -2
x = 0.5 .* rand(d2, 10000)
append!(df, DataFrame(x=x, class="Class 2"))
spike = -2.0 .* ones(500)
append!(df, DataFrame(x=spike, class="Class 2"))

# Plot Class 1 on its own
h = plot(df[df.class .== "Class 1", :],
    x=:x,
    color=:class,
    Geom.histogram(bincount=200),
    Coord.cartesian(xmin=-4, xmax=4, ymin=0, ymax=1000)
)
display(h)

# Plot both classes together
h = plot(df,
    x=:x,
    color=:class,
    Geom.histogram(bincount=200),
    Coord.cartesian(xmin=-4, xmax=4, ymin=0, ymax=1000)
)
display(h)
This produces:
It looks like it has plotted Class 1 as the sum of Class 1 and Class 2. Is this to be expected? Is there a way to turn this behaviour off?
@evan-wehi try to reach the Gadfly.jl maintainers in their community channels. They use Gitter instead of Discourse.
My bad - I missed it in the documentation. This behaviour is intended and is controlled by the position argument in Geom.histogram. I need to use position=:dodge.
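For anyone who finds this later, here is a minimal sketch of that fix applied to the reproducible example from the question. It assumes the default is position=:stack, which would explain why Class 1 looked like the sum of both classes: in each bin its bar is drawn on top of the Class 2 bar. The data setup is condensed from the original, and only the position keyword is new.

```julia
using Distributions, Gadfly, DataFrames

# Same synthetic data as in the question, written more compactly
df = DataFrame(x=rand(Normal(0, 1), 10000), class="Class 1")
append!(df, DataFrame(x=0.5 .* rand(Normal(1, 0.5), 10000), class="Class 2"))
append!(df, DataFrame(x=fill(-2.0, 500), class="Class 2"))

# position=:dodge draws the bars of each class side by side within a bin
# instead of stacking them, so each bar height is that class's own count
h = plot(df,
    x=:x,
    color=:class,
    Geom.histogram(bincount=200, position=:dodge),
    Coord.cartesian(xmin=-4, xmax=4, ymin=0, ymax=1000)
)
```

Calling display(h), as in the question, renders the plot. With dodging, the Class 1 bars should match the heights seen when Class 1 is plotted alone, as long as the bin edges are the same in both plots.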