Scatterplot with marginal histograms

Any idea how to make a scatterplot with marginal histograms in Gadfly or Plots? My preference is Gadfly. I found a similar post in the forum below, but it was done in R/ggplot2. Optionally, each histogram may also carry a (normalized) cumulative distribution line (not shown in the image below).
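To make the "normalized cumulative distribution line" concrete, here is a minimal sketch in plain Julia with no plotting package. The helper name `hist_with_cdf` and the equal-width binning are assumptions made for illustration; they are not part of Gadfly or Plots.

```julia
# Hypothetical helper (not from any plotting package): bin `values` into
# `nbins` equal-width bins and return the bin edges, the counts, and the
# normalized cumulative distribution at each bin's right edge.
function hist_with_cdf(values::AbstractVector{<:Real}, nbins::Integer)
    lo, hi = extrema(values)
    width = (hi - lo) / nbins
    width > 0 || throw(ArgumentError("values must not all be equal"))
    counts = zeros(Int, nbins)
    for v in values
        # Clamp so that the maximum value falls into the last bin.
        i = clamp(floor(Int, (v - lo) / width) + 1, 1, nbins)
        counts[i] += 1
    end
    edges = [lo + k * width for k in 0:nbins]
    cdf = cumsum(counts) ./ length(values)  # last entry is 1.0
    return (edges = edges, counts = counts, cdf = cdf)
end

x = [0.1, 0.4, 0.5, 0.9, 1.3, 1.7, 2.0, 2.2, 3.1, 4.0]
h = hist_with_cdf(x, 4)
println(h.counts)
println(h.cdf)
```

The `cdf` vector is what would be drawn as a line over each marginal histogram, usually on a secondary axis running from 0 to 1.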

In Gadfly, something (less) similar can be done with Guide.xrug and Guide.yrug. It is possible that the subplot_grid or gridstack functions might do it. Histograms would make it perfect.
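A rough sketch of both ideas, assuming random data: rug marks via Guide.xrug and Guide.yrug, and three independent plots arranged with gridstack. This is only an approximation of a marginal-histogram figure, since the panels have their own axes and nothing here links them to the scatterplot.

```julia
using Gadfly, Compose

x, y = randn(300), randn(300)

# Rug marks along both axes (the "less similar" option).
p_rug = plot(x = x, y = y, Geom.point, Guide.xrug, Guide.yrug)

# Three separate plots for a 2x2 grid; the top-right cell stays empty.
p_top   = plot(x = x, Geom.histogram(bincount = 20))
p_main  = plot(x = x, y = y, Geom.point)
p_right = plot(y = y, Geom.histogram(bincount = 20, orientation = :horizontal))

# gridstack takes a matrix of plots and/or Compose contexts.
fig = gridstack(Union{Plot,Compose.Context}[p_top  Compose.context();
                                            p_main p_right])
```

Getting the panel sizes and axis ranges to line up would need further tweaking (for example with Coord.cartesian limits), which is not shown here.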



Image from the forum.

Thanks.

Check out cornerplot and corrplot from StatsPlots.jl

Thanks for the pointer. I will give StatsPlots.jl a try. It seems “marginalhist” is what I need.
Any idea if this is doable in Gadfly?
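For reference, a minimal sketch of the marginalhist route, assuming random data. Note that marginalhist draws a 2D histogram in the main panel rather than a scatterplot, so it is close to, but not exactly, the figure asked for.

```julia
using StatsPlots

x = randn(1000)
y = 0.5 .* x .+ randn(1000)

# 2D histogram of (x, y) with 1D histograms of x and y in the margins.
p = marginalhist(x, y, bins = 40)
```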

Sorry, have never used gadfly.

In Gadfly, it’s possible to write custom Guides. There’s an example here: https://github.com/Mattriks/Heatmap.jl. There’s plenty of scope for adding such functionality. The current roadmap is here.

And a new example: marginal histograms in Gadfly.

I was able to do it with VegaLite.jl (which has a grammar-of-graphics-like environment). It is pretty verbose, though:

using VegaLite, VegaDatasets

p_hist1 = @vlplot(
    mark={:bar, color=:lightgray, stroke=:black},
    width=400, height=40,
    x={:IMDB_Rating, bin=true, axis=nothing},
    y={"count()", title=nothing, axis=nothing},
    view={stroke=:transparent}
)

p_scatter = @vlplot(
    mark={:point, color=:black},
    x={:IMDB_Rating, axis={grid=false}},
    y={:Rotten_Tomatoes_Rating, axis={grid=false}},
    view={stroke=:black},
    width=400, height=400
)

p_hist2 = @vlplot(
    mark={:bar, color=:lightgray, stroke=:black},
    width=40, height=400,
    y={"Rotten_Tomatoes_Rating:q", bin=true, axis=nothing},
    x={"count()", title=nothing, axis=nothing},
    view={stroke=:transparent}
)

dataset("movies") |>
@vlplot(spacing=15, bounds=:flush, background=:white) + [p_hist1; @vlplot(spacing=15, bounds=:flush) + [p_scatter p_hist2]]

It looks like this:

My current plan is to add shortcuts for plots like this to QuickVega.jl, so that one can create this kind of plot with a simple function call. Just need to noodle a bit about the API.

In Plots you could do this in several ways. One way is to define the layout first and then plot into it. That would be something like

using Distributions, Plots
x, y = rand(Normal(), 300), rand(TDist(2), 300)

layout = @layout [a            _
                  b{0.8w,0.8h} c]

default(fillcolor = :lightgrey, markercolor = :white, grid = false, legend = false)
plot(layout = layout, link = :both, size = (500, 500), margin = -10Plots.px)
scatter!(x,y, subplot = 2, framestyle = :box)
histogram!([x y], subplot = [1 3], orientation = [:v :h], framestyle = :none)

I couldn’t see anything related to marginal histograms on the roadmap, but Heatmap.jl is pretty interesting for other uses. Thanks.

Cool! Thanks @davidanthoff for sharing this. It seems a pretty mature product. I will investigate and experiment with VegaLite.jl. Yesterday I also learned that there is Gnuplot.jl, which links to Gnuplot, which I’ve used in the past. I just realized that Julia has many choices in the visualization space: Plots, Gaston, Gadfly, VegaLite, Gnuplot, and more.

Thanks @mkborregaard for sharing this. This is a neat solution, as it uses pure Julia.

On a slightly different topic: Gadfly, like ggplot2 in R, produces beautiful (or colorful) plots by default. Other plotting packages with equal or more capabilities and features, e.g. VegaLite.jl (@davidanthoff above), Gnuplot.jl and Plots.jl, output dull (black and white) plots by default. Of course, with a bit of tweaking the plots become colorful, but they still feel like they are missing anti-aliasing. These are the differentiators for me, or they could just be my biases.

The reason the plot produced by Michael’s example is dull is that he chose the parameters to match the appearance of the plot you provided in the first post. If you omit the quoted line above (the default(fillcolor = :lightgrey, ...) call in his example), the plot will be colorful.

I did change the color with that line, but didn’t know that it could be commented out. I will try Plots.jl in my next plotting job. Thanks.

Having explored other visualization tools and packages today, following some of the recommendations above, I found that much more can be done with visualization without compromising quality. :innocent:

Exactly, most of that code was formatting code to achieve the look of your image. The plot really only needs the link attribute (linking the axes of the subplots) and the orientation attribute (to flip the last histogram). The layout and the other things are just for prettiness. The simplest way would be a one-liner:

using Distributions, Plots
x, y = rand(Normal(), 300), rand(TDist(2), 300)

plot(    histogram(x),     plot(framestyle = :none),
         scatter(x,y),     histogram(y, orientation = :horizontal),
     link = :both)


Of course you can tweak this in all kinds of ways to get the look you want.

For the VegaLite.jl example I actually had to manually tweak some of the parameters to make it match the example you had originally posted. The default is nice and colorful, not black and white :slight_smile: The same goes for Plots.jl, I believe.

The only format one can upload here to the forum seems to be PNG, which is really not ideal for plots because things can look a bit pixelated. You can export VegaLite.jl figures to png, svg, pdf, vegalite or html, and every format other than PNG will actually look much better because it is a vector format.
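A minimal sketch of such an export, assuming the same movies dataset as in the example above. The file names are arbitrary; the output format is picked from the file extension.

```julia
using VegaLite, VegaDatasets

p = dataset("movies") |>
    @vlplot(:point, x = :IMDB_Rating, y = :Rotten_Tomatoes_Rating)

# Vector formats stay sharp at any zoom level.
p |> save("scatter.svg")
p |> save("scatter.pdf")
```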

And there are even more: Makie.jl, Plotly.jl, Winston.jl, at least one PGF wrapper (if not two), Vega.jl, PyPlot.jl and I’m sure I still forgot some. Many of them are really very useful and feature rich at this point, I think plotting is actually in pretty good shape on Julia now.

I think the only package that can truly claim to be pure Julia is probably UnicodePlots.jl; pretty much every other plotting package uses non-Julia code for at least some things. My guess is that the next most pure Julia packages are Gadfly.jl and Winston.jl. As far as I know, they only use cairo (the C library) for export to various image formats. Then maybe Makie.jl, which uses GL (C-based) or cairo for rendering, as far as I can tell, but is otherwise mostly Julia? I think all other packages (including Plots.jl) use even more non-Julia code. For example, the default backend for Plots.jl is GR (implemented in C), and I think most (if not all) other popular backends are also not Julia. The vega family of packages obviously also uses a lot of non-Julia code.

On some level I think this is good: creating a robust plotting solution from scratch is clearly a many-person, many-year project, and at the same time there are excellent non-Julia solutions out there. If we can make them available with a lot less work, hurray!

Another aspect is that I think I wouldn’t worry at all whether a package uses non-Julia code or not, as long as deployment is reliable and it supports the same platforms as Julia does. That situation has improved a lot with the whole artifacts story.

I searched the Internet today for Julia plotting packages and read posts in various forums to understand them, including the packages you mentioned above. VegaLite and Makie are two interesting plotting packages in which I might invest more time: VegaLite for its GoG-like (grammar of graphics) approach, and Makie (which requires an extra package to make it GoG-like) for its seemingly exciting future.

Yes, I realized PDF output is equally impressive as well.

For the sake of completeness, here’s the Gnuplot.jl (master branch, new release is coming soon) solution:

using Gnuplot

# Generate random numbers
x = randn(1000);
y = randn(1000);

# Fraction of the figure devoted to main plot
frac = [0.8, 0.75]

# Gap between main plot and histograms
gap  = 0.015

# Main plot
@gp "set multiplot" linetypes(:Set1_5, lw=1.5, ps=1.5) :-
@gp :- 1 rmargin=frac[1] tmargin=frac[2] :-
@gp :-   x y "w p notit" xlab="X" ylab="Y"
margins = gpmargins()
xr = gpranges().x
yr = gpranges().y

# Histogram on X
h = hist(x, range=xr, nbins=10)
@gp :- 2  "unset margins" "set xtics format ''" "set ytics format ''"  xlab="" ylab="" :-
@gp :-   bmargin=frac[2]+gap rmargin=frac[1] lmargin=margins.l :-
bs = fill(h.binsize, length(h.bins));
@gp :-   xr=xr h.bins h.counts./2 bs./2 h.counts./2 "w boxxyerror notit fs solid 0.4" :-

# Histogram on Y
h = hist(y, range=yr, nbins=10)
@gp :- 3 "unset margins" "unset xrange" :-
@gp :-   lmargin=frac[1]+gap  tmargin=frac[2] bmargin=margins.b  :-
bs = fill(h.binsize, length(h.bins));
@gp :-   yr=yr h.counts./2 h.bins h.counts./2 bs./2 "w boxxyerror notit fs solid 0.4"

I’m wondering: for a quick solution with the code that I have (no time yet to learn VegaLite or Makie), would Gadfly be able to reproduce this plot?

@gcalderone Yes, I’m aware of Gnuplot.jl, which was announced recently. Thanks for sharing the code. Appreciate it.