How can I make this plot?

I came across this article on Freakonometrics and it has the following really nice plot in it (generated with R, I believe):

image

Has anyone seen anything similar plotted in Julia or does anyone know if this is at least possible with Julia?

1 Like

The 3D-waterfall is an easy task for PGFPlotsX.jl when I’m back in office I will post an example.

2 Likes

Awesome, I think I’ve heard of the package but I’ve not actually used it. It looks really cool!

Here we go:

using Random
using Distributions
using PGFPlotsX


Random.seed!(42)
#Generate Data
σ = 2      # std dev
x_min = -10 # xrange to plot
x_max = 10
μ_min = -5
μ_max = 5

dist = (μ, σ) ->  Normal(μ, σ)      # define the distribution
dists = [dist(μ, σ) for μ in -5:1:5]       # The set of distributions we're going to use
rnd = rand.(Truncated.(dists, x_min, x_max),10)  # Generates norm distributed data
dat_pdf = [(x) ->  pdf(d,x) for d in dists ] # Get the pdf of the dists

 x_pnts = collect(x_min:0.2:x_max)

# first mak a axes object
axis = @pgf Axis(
        {
            width = raw"1\textwidth",
            height = raw"0.4\textwidth",
            ymajorgrids,        # plot only gridlines in y direction
            xmax = x_max,         # set the plot range
            xmin = x_min,
            zmin = 0,
            "axis background/.style={fill=gray!10}",    # add some beauty
            "set layers",   # this is needed to make the scatter points appear behind the graphs
            view = raw"{60}{60}", # viewpoint
        },
        );

# first the jellow area at the bottom
@pgf y = Plot3(
        {
            "no marks",
            style ="{dashed}",
            color = "black",
            fill = "yellow",
            "fill opacity = 0.65",
            "on layer" = "axis background", # so we can see the grid lines trought
        },
            Table(x=[μ_min-σ, 0, μ_max+σ, 0], y=[length(rnd), 0, 0, length(rnd)],z=[0, 0, 0, 0] ),
            raw"\closedcycle",
        )
        push!(axis,y) # append the area to the axis

# second the scatter dots
@pgf for i in eachindex(dists)
        s = Plot3(
            {
                "only marks",
                color = "red",
                "mark options" = raw"{scale=0.4}",
                "mark layer" = "like plot",     # set the markers on the same layer as the plot
                "on layer" = "axis background",
            },
                Table(x = rnd[i], y= (length(dists)-i) *ones(length(rnd[i])), z=zeros(length(rnd[i].+0.1)) )
            )
            push!(axis,s)

    if i%2 ==0
        d= Plot3(
            {
                "no marks",
                style ="{thick}",
                color = "blue",
                "fill opacity=0.25",
                fill = "blue",
            },
                Table(x = x_pnts, y= (length(dists)-i) *ones(length(x_pnts)), z=dat_pdf[i](x_pnts) ),

            )
        push!(axis,d)
    end


    end
plot = @pgf TikzPicture({"scale"=>2},axis)

PGFPlotsX.save("plot.png",plot)

The reason why I like PGFPlotsX is because you stay close to latex code, which makes it easy to adopt examples for latex.

33 Likes

This is amazing :clap::clap::clap:

This is indeed very nice. Please consider contributing this to the PGFPlotsX gallery.

2 Likes

Request is open:
https://github.com/KristofferC/PGFPlotsX.jl/pull/191

2 Likes

That was the whole idea :slight_smile: Glad to see it working so well!

4 Likes

Here’s something similar that I made with Plots.jl:

cool_plot

using DataFrames
using Distributions
using GLM
using Random
using StatsBase
using StatsPlots
pyplot()

# Generate some funky heteroscedastic data
data = DataFrame(y =[rand(TruncatedNormal(1, .4n, n/2, 2n)) for n in 1:100], x=collect(1:100))

# Model the data
ols = lm(@formula(y ~ x), data)

# Get cutoff points to group data (we’ll use these to generate the distributions)
groups = vcat(0, [percentile(data.x, n) for n in 10:20:100])

dists = [
    fit(Normal, [data.y[i] for i in 1:length(data.y) if groups[j - 1] < data.x[i] < groups[j]])
    for j in 2:length(groups)
]

xmin = minimum(data.y)
xmax = maximum(data.y)
xrange = collect(xmin:1:xmax)

p = plot(zeros(length(xrange)) .+ groups[2], xrange, [pdf(dists[1], x) for x in xrange])

popfirst!(groups)

for i in 2:length(dists)
    plot!(
        zeros(length(xrange)) .+ groups[i],
        xrange,
        [pdf(dists[i], x) for x in xrange],
        legend = false
    )
end

# Add scatter points
plot!(
    data.x,
    data.y,
    seriestype = :scatter,
    markersize = 2,
    markerstrokewidth = 0,
    markeralpha = 0.8
)

# Add regression line
plot!(data.x, predict(ols), line=:line)
10 Likes

Thanks for working through the PR revision process with us, now it is one of the highlights of the gallery in the manual:

https://kristofferc.github.io/PGFPlotsX.jl/dev/examples/gallery.html#D-Waterfall-1

4 Likes

As you may have noticed, my knowledge with git and its processes is still very basic. So thanks for your patience and for helping me pulling this!

3 Likes

I cleaned up/improved my code a bit and posted it to a github repo for easy access:

New result:

output

And for a plot of the residuals:

residuals

Unfortunately, the fill keyword argument doesn’t seem to work with the 3D format. It would be nice to figure that part out…I might start a new thread for that. Anyways, here’s the code:

using DataFrames
using Distributions
using GLM
using Random
using StatsBase
using StatsPlots

# Set plots backend to PyPlot
pyplot()

# Generate some funky heteroscedastic data
data = DataFrame(y =[rand(TruncatedNormal(1, .4n, n/2, 2n)) for n in 1:100], x=collect(1:100))

# Model the data
ols = lm(@formula(y ~ x), data)

# Get cutoff points to group data (we’ll use these to generate the distributions)
groups = [percentile(data.x, n) for n in 0:20:100]

dists = [
    fit(Normal, [data.y[i] for i in 1:length(data.y) if groups[j - 1] < data.x[i] < groups[j]])
    for j in 2:length(groups)
]

# The distributions are at the 20th, 40th, 60th, 80th, and 100th percentiles so this
# next variable will store values at the 10th, 30th, etc., percentiles so that dists
# appear in the middle of the data points that they represent when plotted
distlocs = [percentile(data.x, n) for n in 10:20:100]

xmin = minimum(data.y)
xmax = maximum(data.y)

# You'll likely have to tweak the xmin/xmax values in xrange to get the desired result
xrange = collect(xmin-25:1:1.5xmax)

# Add scatter points
p = plot(
    data.x,
    data.y,
    seriestype = :scatter,
    markersize = 2,
    markerstrokewidth = 0,
    markeralpha = 0.8
)

# Add regression line
plot!(data.x, predict(ols), line=:line, linestyle=:dash, linealpha=0.6)

# Add distributions
for i in 1:length(dists)
    plot!(
        zeros(length(xrange)) .+ distlocs[i],
        xrange,
        [pdf(dists[i], x) for x in xrange],
        legend = false,
        fill=(0.0)
    )
end

savefig(p, "output.png")
6 Likes