I think that blog post actually gives scipy a bit of an unfair advantage:

```python
# scipy
norm.cdf(x)
```

vs.

```julia
# Distributions.jl
cdf(Normal(0, 1), x)
```

since it assumes default parameter values for scipy but not for Distributions.jl. A fairer comparison would be:
scipy:

```python
import scipy.stats as st

st.norm(2, 3).cdf(0.9)
# with default values
st.norm.cdf(0.9)
```
I could perhaps do `from scipy.stats import norm`, but that isn't normally idiomatic Python, as far as I know.
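For what it's worth, the two scipy spellings really are equivalent: freezing the distribution with its default parameters explicitly spelled out gives the same result as the direct call. A quick check (assuming scipy is installed):

```python
import scipy.stats as st

# Frozen distribution with the defaults (loc=0, scale=1) written out
frozen = st.norm(0, 1).cdf(0.9)

# Direct call, relying on the defaults
direct = st.norm.cdf(0.9)

print(frozen, direct)  # identical values
```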
Distributions.jl:

```julia
using Distributions

cdf(Normal(2, 3), 0.9)
# with default values
cdf(Normal(), 0.9)
```
There's no less punctuation in the scipy version. In fact, scipy and Distributions.jl are not that different: in both, you have a distribution object on which you call various functions or methods. This is very nice compared to the Matlab approach:
```matlab
normcdf(0.9, 2, 3)
% with default values
normcdf(0.9)
```
It is terse, but awkward, since you need to know a separate function name for each combination of distribution and property (`normcdf`, `normpdf`, `gamcdf`, and so on).
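To make the contrast concrete, here is a sketch (the `summarize` helper is hypothetical, made up for illustration) of why the object-based style generalizes: the same code works for any frozen scipy distribution, with no per-distribution function names:

```python
import scipy.stats as st

def summarize(dist):
    """Works for any frozen scipy distribution object,
    without knowing the distribution's name."""
    return dist.mean(), dist.std(), dist.cdf(0.9)

# The same function handles different distributions
print(summarize(st.norm(2, 3)))
print(summarize(st.gamma(2)))
```

In the Matlab style, each of those six queries would need its own named function.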
A brief comment on the idea that the pdf contains all the information about a distribution: this may be true in principle (for most distributions, at least), but the information is very hard to retrieve. Let's do a thought experiment. I give you a variable called `f`, and I tell you that it holds the pdf of some statistical distribution. How would you go about finding the mean of the distribution described by `f`? I would contend that, in general, this is impossible, or would take infinite time. If you are lucky, the distribution is centered around zero, and you could start doing some numerical integration, searching outward for the tails of the pdf. But what if it is centered around -10^64 instead, and is extremely narrow? What if it's a mixed distribution? What if it's multi-dimensional?
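The failure mode is easy to demonstrate numerically. Here is a toy sketch (the location and width below are made up for illustration) where textbook quadrature for the mean misses a narrow pdf entirely:

```python
import numpy as np
from scipy import integrate, stats

# A narrow pdf centered far from zero: a stand-in for the
# opaque variable f from the thought experiment.
f = stats.norm(loc=-1e6, scale=1e-3).pdf

# Naive attempt: mean = integral of x * f(x) over the real line.
naive_mean, _ = integrate.quad(lambda x: x * f(x), -np.inf, np.inf)

# The quadrature never samples near -1e6, so the narrow peak
# is missed entirely and the estimate is wildly wrong.
print(naive_mean)  # nowhere near the true mean of -1e6
```

With a distribution object, by contrast, the mean is just a stored (or analytically known) property.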
So what do you do?