I’m trying to convert some plots from R-based examples to Julia.
The R examples are found here:
Essentially the single plot is a histogram of a probability distribution, with the corresponding Beta distribution plotted on top.
I’ve managed to do something similar in Plots.jl, but not quite…
My data is horse racing related, rather than baseball, and looks like this:
julia> trainers_sr_hundred_runs
380×4 DataFrame
│ Row │ trainer           │ runs  │ wins  │ sr        │
│     │ String            │ Int64 │ Int64 │ Float64   │
├─────┼───────────────────┼───────┼───────┼───────────┤
│ 1   │ A P OBrien        │ 394   │ 69    │ 0.175127  │
│ 2   │ Ed Dunlop         │ 1711  │ 198   │ 0.115722  │
│ 3   │ Sir Henry Cecil   │ 651   │ 128   │ 0.196621  │
⋮
│ 377 │ Mark Walford      │ 190   │ 14    │ 0.0736842 │
│ 378 │ Rebecca Bastiman  │ 113   │ 8     │ 0.0707965 │
│ 379 │ Heather Dalton    │ 146   │ 11    │ 0.0753425 │
│ 380 │ Sarah Hollinshead │ 109   │ 6     │ 0.0550459 │
Code is:
using DataFrames
using Distributions
using Plots
using StatsPlots
# Plot a histogram of strike rates for trainers with more than 100 runs
gr()
histogram(trainers_sr_hundred_runs.sr, label = "Strike Rate")
# Fit a Beta distribution to estimate alpha and beta
alpha_beta = fit(Beta, trainers_sr_hundred_runs.sr)
# Store alpha and beta in a tuple (params returns a plain Tuple, not a NamedTuple)
alpha_beta_tup = Distributions.params(alpha_beta)
# Plot the Beta distribution over the histogram
plot!(Beta(alpha_beta_tup[1], alpha_beta_tup[2]))
Created plot looks like:
The problem seems to be that the Beta distribution curve continues out to 1, which makes some sense, but there are no histogram values out there. So perhaps the question is: how do I limit the Beta distribution plot so that it ends at the largest value in the histogram?
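One possible approach, sketched here under assumptions rather than confirmed against my real data, is to evaluate the fitted density with `pdf` only over the observed range of the data and plot those points as an ordinary line. The names `sr`, `fitted`, `xs` and `ys` are hypothetical, and `sr` is random stand-in data, not the trainer table above.

```julia
using Distributions
using Plots

# Hypothetical stand-in for trainers_sr_hundred_runs.sr (random, not real data)
sr = rand(Beta(2, 18), 380)

# Fit the Beta distribution as in the code above
fitted = fit(Beta, sr)

# Evaluate the fitted density only across the observed range of the data
xs = range(minimum(sr), maximum(sr); length = 200)
ys = pdf.(fitted, xs)

# Histogram first, then the truncated curve on top
p = histogram(sr, label = "Strike Rate")
plot!(p, xs, ys, label = "Fitted Beta", linewidth = 2)
```

Note that the curve is still on a density scale here, so against a histogram of raw counts it may look squashed.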
Secondly, in the R examples, you’ll notice the y-axis of the combined plots is density rather than counts. I tried plotting the Beta distribution first and then the histogram in Plots.jl, but it didn’t seem to make much difference. How does one set the y-axis so that it corresponds to the density scale of the Beta curve?
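On the density question, a minimal sketch that may help: Plots.jl histograms accept a `normalize` attribute, and `normalize = :pdf` rescales the bars so their total area is 1, which puts them on the same scale as a probability density. That would explain why the plotting order made no difference, since the histogram defaults to counts either way. As before, `sr` is hypothetical random stand-in data and `fitted` is an illustrative name.

```julia
using Distributions
using Plots
using StatsPlots  # provides the plot recipe for Distribution objects

# Hypothetical stand-in for trainers_sr_hundred_runs.sr (random, not real data)
sr = rand(Beta(2, 18), 380)
fitted = fit(Beta, sr)

# normalize = :pdf rescales bar heights so the total bar area is 1
p = histogram(sr, normalize = :pdf, label = "Strike Rate")

# Overlay the fitted density; both layers now share a density y-axis
plot!(p, fitted, label = "Fitted Beta", linewidth = 2)
```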