On beauty of Distributions.jl pdf interface

Julia is often touted for its syntax that resembles math notation. I have a kind of objection here.

Getting values of this density function with this syntax is awkward to say the least. Shouldn’t this be simply p(0:0.01:1) ? In fact, this is the case in almost all other languages.

This really is a matter of taste. The issue is that you are considering p a density function. I see it as a distribution where different things can be done with it, including, but not limited to, calculating the density function or generating random numbers from this distribution.

Now, if you really want to use as a density function, you could resort to type piracy and define the function yourself:

(p::Distribution)(x) = pdf(p, x)

And now,

julia> p = Beta(2,3)
Beta{Float64}(α=2.0, β=3.0)

julia> p.(x) == pdf.(p, x)
true

But I wouldn’t recommend it. It will work with the simpler distributions.

13 Likes

This is a choice a package (Distributions.jl) made for its API; it very well could have chosen to use p(x) for this.

Since that syntax is available for packages to use, I’ve moved this out of #dev and into #usage and I’ve made the title a bit more focused.

6 Likes

No. The other API clearly could work (since you can emulate it with 1 line of code). The question is just what it should mean to call a distribution. IMO, sampling would be the more natural behavior of a function call, and the reason Distributions chooses to make it explicit is that there isn’t a clear right answer.

2 Likes

Thanks for the comment. But there is a way of resolving this, right? We aim at making this look like math notiation. In statistics, p, a distribution, is nothing but a function. p(0) is unanimously agreed on to represent density at x=0. Sampling would be represented by something like x ~ p. Back to the point, pdf(p, 2) is not remotely close to what we write in math notation.

There are various notations in statistics. For example, f_X(5) is pretty similar to pdf(x, 5). It could be better if we had a nicer way to make partial functions like pdf(x::Distribution). That is a Julia syntax issue.

2 Likes

So why wouldn’t p() be the proper interface for sampling? Or maybe p(0.5) should be the logpdf instead (as is often used by bayesians)? A distribution is a mathematical object which you can “do stuff” to. And in Julia, “stuff” usually means functions with multiple-dispatch.

Personally, I think the p(x) etc. should only be used for function-like use, but there are probably some DSLs where that is worth deviating from.

FWIW, I often write F \sim N(0,1) etc. and then \int g(x) d F(x) so to me the CDF is more natural than the PDF?

not really a Julia syntax issue, all you need to do is define

pdf(d::Distribution) = p(x)->pdf(d, x)
p = pdf(Beta(2,3))
p.(x)
1 Like

Fair enough. Though Distributions.jl didn’t do that, and we can’t do it without piracy.

2 Likes

Which languages are they? Certainly neither Python, R, Matlab or Mathematica, to mention the most comparable.

That’s not what I learned in my statistics courses. There are a number of functions that can describe a probability distribution, such as the pdf or cdf, but they aren’t the distribution itself.

I don’t think I’ve ever heard this before. The density function of a distribution has this property, not the distribution itself.

5 Likes

Here’s a illustrative table prepared by @Albert_Zevelev :

3 Likes

If Normal() just returned the pdf function of the standard normal distribution, it would be far less powerful. Calculating the cdf from this function would mean having to numerically integrate the pdf. Even just calculating the mean or other moments would entail numerical integration, because you wouldn’t have access to the distribution parameters, only the function itself.

4 Likes

I really like that characterization. It’s more elegant, and more flexible.

2 Likes

Thanks for that. I personally see that scipy is the most elegant, least amount of composition and brackets.

I don’t find your arguments for making the pdf representation of a distribution the fundamental object particularly compelling. Every probability distribution on a subset of the reals has a cumulative distribution function, which uniquely defines the distribution. By contrast, a probability density function exists only for (absolutely) continuous probability distributions. It’s a matter of taste, but I find the API of Distributions.jl well inspired.

3 Likes

Suppose you want to switch the distribution in some model. In Julia (and Mathematica) it does not take much. Simply switch out the distribution in say cdf(Normal(4, 10), x) to cdf(TDist(r), x). In other frameworks you have to worry about how many parameters the distribution has, and then order, etc, no?
eg:

const dist = Normal(4, 10)

# compute probability -3 <= x <= 3:
prob = cdf(dist, 3) - cdf(dist, -3)

Simply redefine the distribution at the top and nothing else has to change in the rest of the script.

2 Likes

I think that blog post actually gives scipy a bit of an unfair advantage.

# scipy
norm.cdf(x)

vs

# Distributions.jl
cdf(Normal(0,1),x)

as it assumes default values being used for scipy but not for Distributions.jl. A more fair comparison would be:

scipy:

import scipy.stats as st
st.norm(2, 3).cdf(0.9)
# with default values
st.norm.cdf(0.9)

I could perhaps do

from scipy.stats import norm

But that isn’t normally idiomatic python as far as I know.

Distributions.jl

using Distributions
cdf(Normal(2, 3), 0.9)
# with default values
cdf(Normal(), 0.9)

There’s no less punctuation in the scipy version. Actually, scipy and Distributions.jl are not that different, in that you have a distribution object that you call various functions/methods on. This is very nice compared to the Matlab approach:

>> normcdf(0.9, 2, 3) 
# with default values
>> normcdf(0.9)

It is terse, but awkward, since you need to know the name of the distribution to get its various properties.

A brief comment on the idea that the pdf contains all the information about a distribution:
This may be true in principle (for most distributions, at least), but the information is very hard to retrieve. Let’s do a thought experiment: I give you a variable, called f. I tell you that it holds the pdf of some statistical distribution. How would you go about finding the mean of the distribution described by f? I would contend that, in general, this is impossible, or would take infinite time. If you are lucky, it is centered around zero, and you could start doing some indefinite integration, searching for the outer limits of the pdf. But, what if it is centered around -10^64 instead, and is extremely narrow? And maybe it’s a mixed distribution? And maybe it’s multi-dimensional?

So what do you do?

8 Likes

Thanks for the perspective. This restored my belief in Julia.
Hopefully the accident was not caused by Julia.

2 Likes

Thank you for the detailed response. This has certainly dissuaded my position.

2 Likes

I just wrote some code last week, because I also felt that the pdf(X, 1) syntax was tedious.
My premise is that the f_X syntax means the pdf of the distribution X. I also assume that I will mostly define my distributions as X. As a final thing, it seemed like 0-based indexing was beneficial, so I constructed the return value with an OffsetArray with default offset = -1:

using OffsetArrays
using Distributions

f_X(; offset = -1) = OffsetArray((pdf(eval(:X))), offset)
f_X(ind::Real;           offset = -1) = OffsetArray(pdf(eval(:X)), offset)[ind]
f_X(inds::AbstractArray; offset = -1) = f_X.(inds; offset)

It is possible that this only makes sense for discrete distributions - my lecture on continuos ones are today :slight_smile:

This allows

julia> X = Binomial(1000, 0.9)
Binomial{Float64}(n=1000, p=0.9)
julia> f_X() == pdf(X)  # Acutally f_X(offset=0) == pdf(X)
true

Now if I redefine X and call the function again, it evaluates to whatever X is currently defined as through metaprogramming:

julia> X = DiscreteUniform()
DiscreteUniform(a=0, b=1)
julia> f_X() == pdf(X)  # Acutally f_X(offset=0) == pdf(X)
true

Note that the offsets are set to 0 for comparison - otherwise I prefer 0-based indexing for this technical domain.

Part of the reason for all this is that I am currently more interested in indexing the pdf than evaluating it at a given value. I was expecting the second argument to pdf to be the index, but it is instead the input value. So what I want is

julia> pdf(X)[1:3]
3-element Vector{Float64}:
 0.34867844010000004
 0.387420489
 0.19371024449999996

This can be acieved with only one set of brackets and zero-based indexing with my function:

julia> f_X(0:2) .== pdf(X)[1:3]
3-element BitVector:
 1
 1
 1

In case you should find that useful :slight_smile: