Archival of discontinued packages in JuliaML

I am cleaning house in JuliaML and will proceed with a sequence of archival operations to make life easier for newcomers who come to Julia and get completely lost among packages that don’t even load anymore.

We are coordinating this effort in our Zulip machine learning stream:

The packages that will be archived are listed below:

If you would like to prevent the archival of any package listed above, or want to make a final adjustment to its Project.toml before we archive it, please leave a message.

I will cross-post this thread on Slack as well so that we don’t miss anyone.

We also need help updating the JuliaML website. The website still refers to discontinued packages and certainly deserves a new version. Please join our Zulip stream if you want to help.

Thanks for doing this! Could you explain what “archival” means here? Presumably you’re not yanking things from General?

We want to use GitHub’s archive feature to mark each repository as read-only. It will show a banner at the top, but the URL will exist forever in the General registry.

The archive feature signals to users and contributors that the package is no longer actively maintained, and that there are no plans to revive these efforts. Some of these packages date back to the pre-Project.toml era.

We are currently depending on MLDataUtils, MLLabelUtils, and MLDataPattern.

Is there anything in those that’s not in MLUtils that we should know about? I imagine we just need to migrate to that.

MLDataUtils.jl and MLDataPattern.jl were absorbed into MLUtils.jl by @CarloLucibello, who can provide more details.

Regarding MLLabelUtils.jl, there are also plans to migrate parts of it to MLUtils.jl and @darsnack is probably leading that front.

Keep in mind that our ultimate goal is to reduce the number of packages in JuliaML and concentrate efforts around MLUtils.jl. Any help with this migration is highly appreciated.

MLUtils.jl should be almost a drop-in replacement for MLDataUtils and MLDataPattern. MLLabelUtils has not been ported yet. @iamed2 can you point me to the specific methods that you are using from there or to the dependent package (if it is public)?

Are there plans for a new interface package? It seems MLUtils is a lot heavier than LearnBase, so something more lightweight might be useful for packages that want, e.g., to implement getobs. I think lightweight interface packages are quite useful for additional or optional functionality without pulling in too many dependencies or using Requires, as ChainRulesCore, StatsAPI, InverseFunctions, etc. show.

Not at the moment. An effort to build an interface package is justified after we have a decent set of packages sharing these concepts. Right now JuliaML is not like that, unfortunately.

We should try to reuse existing interface packages like StatsAPI.jl as much as possible before deciding on yet another one. I feel that this is an effort that can be postponed until after we have cleaned up JuliaML and its dead orphan packages.

With the addition of multi-threaded data loading, MLUtils.jl has grown larger than we initially planned. We are trying to trim back, though it might be time to introduce an interface package.

It’s worth noting that the default methods to implement are Base.getindex and Base.length. The only time you need to implement getobs or numobs is if these functions have a different meaning for your type than getindex/length (e.g. multi-dimensional arrays).
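
As a rough illustration of that default, here is a minimal sketch of a custom container that only defines Base.length and Base.getindex. The type SquaresDataset and the functions my_numobs and my_getobs are hypothetical names made up for this example; the two functions are local stand-ins for the fallback behaviour described above, not MLUtils’ actual code, so the snippet runs without any packages.

```julia
# Hypothetical container type, used only for illustration.
struct SquaresDataset
    n::Int
end

# The two Base methods described above as the default interface.
Base.length(d::SquaresDataset) = d.n
Base.getindex(d::SquaresDataset, i::Integer) = i^2
Base.getindex(d::SquaresDataset, idx::AbstractVector{<:Integer}) = [d[i] for i in idx]

# Local stand-ins for the described fallbacks (assumption: not MLUtils' real code).
my_numobs(data) = length(data)
my_getobs(data, idx) = data[idx]

ds = SquaresDataset(5)
n = my_numobs(ds)           # number of observations, via length
batch = my_getobs(ds, 2:4)  # observations 2 through 4, via getindex
```

A type like this needs no dependency on any interface package at all, which is the point of using the Base methods as the default.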

Assuming the MLDataUtils.jl and MLDataPattern.jl APIs are fully ported to MLUtils.jl with minimal API changes, I don’t think this is a big concern for us. We do use islabelenc and convertlabel in one private package, but that’s the only case I could find and it wouldn’t be too hard to work around if necessary. I think our bigger concern is the dependency stack and size of MLUtils.jl. A lot of the time we just use it to call a single function like kfolds to reduce code size in small packages, but I’m not sure the added dependencies on FLoops, FoldsThreads, ShowCases, etc. make it worth it in those cases.
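
To make the trade-off concrete, a dependency-free split along these lines could look roughly like the sketch below. The name simple_kfolds is hypothetical, and this is not how MLUtils.jl implements kfolds; it only assumes contiguous, unshuffled folds whose sizes differ by at most one.

```julia
# Hypothetical helper: split the indices 1:n into k contiguous validation
# folds and return the matching training indices for each fold.
function simple_kfolds(n::Integer, k::Integer)
    1 < k <= n || throw(ArgumentError("need 1 < k <= n"))
    # The first rem(n, k) folds get one extra observation.
    sizes = [div(n, k) + (i <= rem(n, k) ? 1 : 0) for i in 1:k]
    stops = cumsum(sizes)
    starts = stops .- sizes .+ 1
    val = [starts[i]:stops[i] for i in 1:k]
    # Training indices are everything outside the validation range.
    train = [setdiff(1:n, v) for v in val]
    return train, val
end

train_idx, val_idx = simple_kfolds(10, 3)
```

Whether copying something like this into each small package is better than taking on the dependency is a judgment call; it trades shared maintenance for a lighter load time.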

Update: Looks like the MLDataPattern.jl functionality wasn’t directly ported. For example, a previously working call to shuffleobs now errors because it internally calls length on a DataFrame.

  MethodError: no method matching length(::DataFrame)
  Closest candidates are:
    length(::Union{Base.KeySet, Base.ValueIterator}) at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/abstractdict.jl:58
    length(::Union{DataStructures.OrderedRobinDict, DataStructures.RobinDict}) at ~/.julia/packages/DataStructures/izXYB/src/ordered_robin_dict.jl:86
    length(::Union{DataStructures.SortedDict, DataStructures.SortedMultiDict, DataStructures.SortedSet}) at ~/.julia/packages/DataStructures/izXYB/src/container_loops.jl:322
    ...
  Stacktrace:
    [1] numobs(data::DataFrame)
      @ MLUtils ~/.julia/packages/MLUtils/W3W0A/src/observation.jl:19
    [2] _check_numobs(data::Tuple{DataFrame, Adjoint{Float64, Matrix{Float64}}})
      @ MLUtils ~/.julia/packages/MLUtils/W3W0A/src/observation.jl:120
    [3] numobs(data::Tuple{DataFrame, Adjoint{Float64, Matrix{Float64}}})
      @ MLUtils ~/.julia/packages/MLUtils/W3W0A/src/observation.jl:129
    [4] shuffleobs(rng::Random._GLOBAL_RNG, data::Tuple{DataFrame, Adjoint{Float64, Matrix{Float64}}})
      @ MLUtils ~/.julia/packages/MLUtils/W3W0A/src/obstransform.jl:194
    [5] shuffleobs(data::Tuple{DataFrame, Adjoint{Float64, Matrix{Float64}}})
      @ MLUtils ~/.julia/packages/MLUtils/W3W0A/src/obstransform.jl:191
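
One possible workaround, sketched here under assumptions, is to wrap the table in a small type that defines length and getindex row-wise, since the trace shows numobs falling back to length. RowWrapper is a hypothetical name, and a NamedTuple of column vectors stands in for the DataFrame so that the snippet runs without packages; I have not checked this against MLUtils.jl itself.

```julia
# Hypothetical wrapper; `table` is a NamedTuple of equal-length column
# vectors, standing in for a DataFrame so the sketch needs no packages.
struct RowWrapper{T<:NamedTuple}
    table::T
end

# Number of observations = number of rows (with DataFrames: nrow(df)).
Base.length(w::RowWrapper) = length(first(w.table))

# Selecting observations = selecting rows in every column
# (with DataFrames: df[idx, :]).
Base.getindex(w::RowWrapper, idx) = map(col -> col[idx], w.table)

rows = RowWrapper((x = [1.0, 2.0, 3.0], y = ["a", "b", "c"]))
n_rows = length(rows)
subset = rows[[3, 1]]   # rows 3 and 1, in that order
```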

Having these breaking changes mixed into the porting process means we can’t simply rely on semver to stagger the work needed to resolve them. If the goal was to centralize the code in one package, then my preference would have been to (1) copy the existing functionality, as is, into MLUtils.jl, (2) tag a 1.0 release, and (3) make the desired breaking changes for a v2 release…?

I am not following the specific details of the issue you raised, but note that you have all the time in the world to migrate your projects to the new package. The archival just means that no one is gonna touch the old packages moving forward. They will continue to exist but are dead projects. If you want to receive updates from JuliaML maintainers, then you can migrate your packages slowly to MLUtils.jl.

As others already mentioned, LearnBase.jl, MLDataPattern.jl, … continue to exist on GitHub as frozen repos that no one is maintaining. We are now concentrating efforts in a single repo, MLUtils.jl.

You can submit a PR to MLUtils.jl with improvements if you feel something should be done differently.
