Would slowing Julia's release cadence improve ecosystem quality?

I think Sebastian’s point is that it doesn’t even really do that without also retrocapping the registry: if JuliaInterpreter v0.9.26 has an upper bound of, say, Julia 1.11, then on Julia 1.12 Pkg will just install JuliaInterpreter v0.9.25, which doesn’t have that upper bound. So these upper bounds should also be paired with PRs to General, using RetroCap.jl (JuliaRegistries/RetroCap.jl, which retroactively adds “caps”, i.e. upper-bounded compat entries, to all packages in one or more registries) or other tooling. IIRC you have used RetroCap before, but it is probably new to a lot of folks as well.
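
For concreteness, here is the effect a retrocap PR has, expressed as the compat entry each already-registered version would end up carrying (a sketch; the version numbers are illustrative):

# effective [compat] section for each old, already-registered version
[compat]
julia = "1.6 - 1.11"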


A real-world case of what happens if you don’t retrocap the registry: “On Julia 1.8.0-DEV, LocalRegistry restricted to v0.3.2” (GunnarFarneback/LocalRegistry.jl#50).


Yes, but it would be sufficient to retrocap just the “known sinners”, without directly touching any package that doesn’t depend on them.

As soon as there are programmatic means to detect usage of Julia internals, it should become possible to enforce a Julia upper bound for such packages during registration.

The following point is a bit tangential, but I do have plans for juliaup to eventually start, automatically, the exact Julia version recorded in a project’s manifest. That obviously won’t solve most of the issues discussed here, but I think it might make them less frequent overall, since folks will less often end up having to update a project just because they are running a different Julia version than the one that created its manifest.
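
For context, a format-2.0 Manifest.toml already records the Julia version that resolved it, which is presumably what juliaup would read; a minimal excerpt:

# top-level fields of a format-2.0 Manifest.toml (version string illustrative)
julia_version = "1.10.4"
manifest_format = "2.0"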


Does the Python community have the same problem? If so, how do they fix it?

Yes, and they don’t.

Or well, I can’t speak for Conda, but what always happens to me when I have a project with pinned pip dependencies and upgrade the Python version is that at least a few of the packages contain C or C++ code. Being old versions, they have no pre-built wheels for the new Python version, so pip tries to build them locally and always fails.


Try using conda… I’ve never had these kinds of problems (on Windows, even).

The only thing I can recall: in a package I help maintain, we support all non-EOL Python versions and try to run CI with one set of dependency versions (from a manifest file). It was recently hard to find dependency versions compatible with all active Python versions.

Do those bugs have open issues or PRs? If yes, please bump them; if not, please open issues.

They’re open and have been bumped many times; the biggest problem is lack of resources/contributors to Distributions.jl :confused:

@ParadaCarleton, can you share links to the issues so these conversations aren’t just hearsay? There are over 200 issues, mostly feature requests, so it’s hard to find what you mean.


Yeah, links would be useful. I occasionally contribute to Distributions.jl and would like to fix anything that stops others from being able to use it.

There’s a bunch here, but the issues brought up most often were related to:

  1. Random samples falling outside of the support (especially for Beta and Dirichlet distributions; see the sketch just below this list)
  2. Bugs caused by pervasive use of @inbounds (this was a problem regardless of whether the code used OffsetArrays)
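
As a sketch of the first class of bug (the parameter values are illustrative): with very small shape parameters, underflow in the sampler can push draws onto the endpoints of the support, where the log-density is infinite.

using Distributions

b = Beta(0.001, 0.001)
s = rand(b, 10_000)
extrema(s)      # often exactly (0.0, 1.0) due to underflow
logpdf(b, 0.0)  # Inf, which breaks downstream log-density code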

Users also tended to be frustrated with the lack of tools to prevent bugs in their own code. The most common complaints were about missing keyword-argument constructors and about Distributions that take/return tables or labeled arrays, both of which users coming from PyMC tend to take for granted.
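
As a sketch of what those users expect (NormalKW is a hypothetical wrapper, not part of Distributions.jl):

using Distributions

# Distributions.jl only provides positional constructors such as Normal(0.0, 2.0);
# a keyword-argument variant has to be user-defined:
NormalKW(; μ=0.0, σ=1.0) = Normal(μ, σ)

NormalKW(σ=2.0)  # Normal{Float64}(μ=0.0, σ=2.0)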

Distributions that return keyed arrays are available in KeyedDistributions.jl (invenia/KeyedDistributions.jl, “Distributions and Sampleables with keys for the variates”). I used them some time ago and they worked fine!

Huh, thanks, I’ll check it out.

At the same time, this means I have to take time out of my day to teach new users two different Julia packages for array labeling. DimensionalData.jl is the standard for Bayesian analysis of posterior draws, because that’s what ArviZ.jl uses. It’s the same kind of pain that comes up when I have to use both Polars and Pandas for a data analysis.

I think (at least some of) what KeyedDistributions.jl does just works out of the box in DimensionalData.jl:

julia> rand(Normal(), X(5), Y('a':'c'))
5×3 DimArray{Float64,2} with dimensions: 
  X,
  Y Categorical{Char} 'a':1:'c' ForwardOrdered
   'a'        'b'         'c'
  0.600073   0.252401    1.60991
  0.558153  -0.603837   -1.23051
 -0.170615   0.0776176  -0.671733
 -1.26675    1.01441     0.0785599
 -0.780844  -1.11786     0.122756

julia> dists = DimArray(Returns(Normal()), X('a':'e'))
5-element DimArray{Normal{Float64},1} Returns(X) with dimensions: 
  X Categorical{Char} 'a':1:'e' ForwardOrdered
 'a'  Normal{Float64}(μ=0.0, σ=1.0)
 'b'  Normal{Float64}(μ=0.0, σ=1.0)
 'c'  Normal{Float64}(μ=0.0, σ=1.0)
 'd'  Normal{Float64}(μ=0.0, σ=1.0)
 'e'  Normal{Float64}(μ=0.0, σ=1.0)

julia> mean.(dists)
5-element DimArray{Float64,1} with dimensions: 
  X Categorical{Char} 'a':1:'e' ForwardOrdered
 'a'  0.0
 'b'  0.0
 'c'  0.0
 'd'  0.0
 'e'  0.0

julia> rand.(dists)
5-element DimArray{Float64,1} with dimensions: 
  X Categorical{Char} 'a':1:'e' ForwardOrdered
 'a'  -0.846532
 'b'   0.0950699
 'c'   0.561192
 'd'  -0.390887
 'e'   1.28783

Most things like this are generic in DD because we don’t use Symbol for dimension names; the dimensions are types we define, so we own the methods that dispatch on them.
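
A sketch of what owning the dimension types buys (label is a hypothetical helper, not part of DD):

using DimensionalData

# X and Y are types defined by DimensionalData, not Symbols, so dispatching
# on them is ordinary multiple dispatch rather than type piracy:
label(::X) = "the X dimension"
label(::Y) = "the Y dimension"

label(X(5))  # "the X dimension"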

Most things in DimensionalData “just work” 80% of the time with no extra effort. Distributions.jl is no exception :sweat_smile:

julia> z = randn(3, 3) |> x->x' * x |> x->DimArray(x, (ax1=[:x, :y, :z], ax2=[:x, :y, :z]))
3×3 DimArray{Float64,2} with dimensions: 
  Dim{:ax1} Categorical{Symbol} Symbol[:x, :y, :z] ForwardOrdered,
  Dim{:ax2} Categorical{Symbol} Symbol[:x, :y, :z] ForwardOrdered
        :x          :y          :z
  :x   0.747829    0.0869189  -0.255987
  :y   0.0869189   4.23079    -0.0218206
  :z  -0.255987   -0.0218206   0.0913322

julia> d = MvNormal(z)
ZeroMeanFullNormal(
dim: 3
μ: Zeros(3)
Σ: [0.7478286057002036 0.08691892384943312 -0.25598736147315915; 0.08691892384943312 4.230791186850467 -0.021820611723327138; -0.25598736147315915 -0.021820611723327138 0.0913321678844106]
)

julia> rand(d)
3-element Vector{Float64}:
 -0.3047069352909566
 -0.04075659598335215
  0.0686493909126511

(This is why interfaces aren’t a universal replacement for standard packages. Even if we have interfaces that are supposed to work, there are just so many features and possible bugs at the intersections between functions. You end up having to spend all your time tracking them down for several packages, and the Julia community just doesn’t have the manpower for that.)

I feel you on that last 20%.

PDMats.jl only recently started accepting anything other than Matrix, and it now just converts every other array type to Matrix even though it has a type parameter. DD tries pretty hard to propagate dimensions through everything, but they can’t survive a Matrix constructor.
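
A quick illustration of that last point:

using DimensionalData

A = DimArray(rand(2, 2), (X(1:2), Y(1:2)))
# The Matrix constructor returns a plain Array, so the X/Y dimensions are lost:
Matrix(A) isa DimArray  # false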


Perhaps worth splitting into a Distributions-specific thread?


Let me share my recent experience with Distributions.jl. Please nobody take this as criticism of the developers! Rather, it might illustrate how a lack of resources and focus on a central package can make the user experience miserable.

I was writing a textbook implementation of a Kalman filter, where you typically have some states with noise and some without, so the covariance matrix has some zeros on the diagonal. Nothing fancy: a covariance matrix only needs to be positive semidefinite, and I didn’t give it a thought when I wrote the same filter in Matlab, but in Julia it didn’t go well. Here’s a summary using a simplified example:

julia> using Distributions

julia> x = MvNormal([0, 0], [0 0; 0 1])
ERROR: PosDefException: matrix is not positive definite; Cholesky factorization failed.

Weird… let’s look at the documentation!

help?> MvNormal
search: MvNormal MvNormalCanon MvNormalKnownCov MvLogNormal AbstractMvNormal MultivariateNormal

  MvNormal

  Generally, users don't have to worry about these internal details.

  We provide a common constructor MvNormal, which will construct a distribution of appropriate type depending on the input arguments.

  ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

  MvNormal(μ::AbstractVector{<:Real}, Σ::AbstractMatrix{<:Real})

  Construct a multivariate normal distribution with mean μ and covariance matrix Σ.

  ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

  MvNormal(Σ::AbstractMatrix{<:Real})

  Construct a multivariate normal distribution with zero mean and covariance matrix Σ.

Not a word about any special steps needed to use a positive-semidefinite matrix (also, the first sentence of the help text doesn’t make sense, but never mind).

I eventually found a GitHub issue with the suggested workaround: install PDMatsExtras.jl and wrap the covariance matrix in PSDMat.

julia> Σ = PSDMat([0 0; 0 1])
ERROR: MethodError: no method matching PSDMat{Int64, Matrix{Int64}}(::Int64, ::Matrix{Int64}, ::CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}})

This is inconsistent with Distributions.jl, which promotes to Float64 automatically, but that’s a minor complaint; let’s move on.

julia> Σ = PSDMat([0. 0; 0 1])
2×2 PSDMat{Float64, Matrix{Float64}}:
 0.0  0.0
 0.0  1.0

julia> x = MvNormal([0, 0], Σ)
MvNormal{Float64, PSDMat{Float64, Matrix{Float64}}, Vector{Float64}}(
dim: 2
μ: [0.0, 0.0]
Σ: [0.0 0.0; 0.0 1.0]
)

It works, and I can use rand(x) to draw random values! Well, no: I found out later that the sampling was giving wrong results :frowning:

julia> rand(x)
2-element Vector{Float64}:
 -0.6249974633185146
  0.0

Here the two values are swapped (it’s the first one that has zero variance, so it should always be 0). Hopefully that bug will be fixed soon: “Fix pivoted order in whiten & unwhiten” (invenia/PDMatsExtras.jl#33). Edit: already merged and tagged, thanks @vandenman and @oxinabox!
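
In the meantime, a minimal workaround sketch for degenerate covariances, replacing the Cholesky factor with a symmetric matrix square root (assumes Σ is exactly positive semidefinite):

using LinearAlgebra

μ = [0.0, 0.0]
Σ = [0.0 0.0; 0.0 1.0]

# sqrt of a Symmetric matrix uses an eigendecomposition, so it tolerates
# the zero eigenvalue that makes cholesky(Σ) throw PosDefException:
L = sqrt(Symmetric(Σ))
x = μ + L * randn(2)  # x[1] is always exactly 0.0 for this Σ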

This stuff just works in Python, R, Matlab, etc., so I was surprised by the bad experience in Julia; drawing samples from a multivariate Gaussian is such a basic thing. I’m not surprised to see user testimonials and blog posts complaining about the immaturity of the language/ecosystem (though I know it’s actually mature in other respects).
