For probabilistic programming in particular, it’s critical to be able to use distribution-related functions (especially logpdf) in non-standard ways. For example, I would love to be able to call logpdf(TDist(nu),x) where nu and/or x have type
ApproxFun.Fun
SymPy.Sym
ModelingToolkit.Operation
It seems there are at least two things that get in the way:
checks like Distributions.@check_args that can’t be determined statically
type constraints like logpdf(::Normal, ::Real)
Neither of these is in itself a bad thing. But it seems critical to have an easy workaround for both.
For the first case, one idea is for Distributions.jl to have an unexported “unsafe” kernel _logpdf with no check. Then the exported logpdf could wrap this in the appropriate check. It may be possible to use a similar trick to have something available without the type constraint, but this is less clear to me. Of course it would be better to be able to force other libraries’ types to be ::Real (if they really ought to be), but I don’t know that this can be done without either lots of forking or lots of wrappers.
If Distributions.jl isn’t general-purpose, people will write their own alternatives, and we’ll have lots of duplicated effort. There’s already some of this - I had to fork Distributions to get it to play nice with SymPy, and (as I understand it) the Gen team built their own version of it to have nice gradients.
Does an approach like the above make sense, or would it break other things? Is there a better way to go about it? Are there other capabilities specific to probabilistic programming that Distributions.jl should provide?
It could but I don’t think that’s a neat fix. Perhaps, someone else has a more useful opinion on this. A keyword argument can also work to remove the check.
I actually like the idea from an ease-of-use persepctive, but I wonder if someone could speak to what the performance of this would be. Does the fact that the “workhorse” function doesn’t actually have formal typing reduce performance?
This really only addresses the @check_args call. I think probably the bigger hamstring is the type constraints. Maybe there’s like a subtyping trick that’s possible?
As I understand there’s no penalty for generic functions, as long as they’re type-stable. Code specializes each time it’s called with a different argument type.
The first official release of Distributions.jl was in 2013, 6 years ago, before the actual release of Julia 0.3. IMO a lot of its design reflects that, eg
it is full of ostensibly performance-enhancing checks and conversions which may not needed anymore, as the compiler got way smarter,
the idiomatic Julia style for writing generic code has matured a lot since these days, using traits instead of elaborate type hierarchies, and duck typing where one can get away with it.
This is natural — code ages, and needs to be dusted off occasionally.
A complete, breaking rewrite & refactor by a small team of experienced Julia programmers would probably be the best way to remedy this, but I am not sure how to engineer the consensus necessary for that. It may come to a complete rewrite in a parallel package.
A lot of the checks could probably be removed and the code would just error anyway for incompatible dimensions. All subtypes of Real, and vectors with elements ::Real (not necessarily Vector{<:Real}, for well-known reasons, eg Any[1.0, 1]) should just work.
But for other types, eg symbolic, you may have to provide your own methods if they cannot trickle through an interface designed for numbers. Eg I am not sure what ApproxFun.Fun should do as an argument.
I just remembered there is already a NoArgCheck empty struct which can be used for that purpose, with the inner constructor doing nothing, the “standard construction method” checking args and the T(params, ::NoArgCheck) bypassing that
Can you (or @cscherrer) expain how it helps this this particular use case?
My understanding is that for values that are ::Real or arrays of such, and support the usual operations on numbers that checks use (eg <, etc), skipping the checks is not needed to just make things work (they are useful as an optimization though, but that’s another use case, which I suppose motivated your PR).
For other types, eg symbolic values, it is very likely that a different implementation is needed for most distributions, as special functions won’t play nice with these objects (some pdfs would be an exception). This will require another method definition anyway.
Yes I think the check_args issues is mostly an optimization, Chad reported the two but my understanding is that they are not related. The arg_check was the easy one because consensual, lots of use cases remove the needs for verification.
Now removing all type bounds is a bit more tricky and doubling the functions not a nice solution either. My feeling is that the symbolic libraries mentioned should use a wrapper type which subtypes Real or Number
If folks here want to take a look, the PR for optional removal of @check_args has been merged to master. If this seems ok, we will release v0.21.1 for people to have it (nothing breaking, the default is still to check).
Also, I’d be glad to get more opinions on #945 from @richardreeve, which fixes the semantics of the type hierarchy, which has been blocking us for some PRs (including Dirac for example)