Costs of registering packages

I wonder what the costs associated with registering packages are. I brought up this subject in the thread on fragmentation, but it probably wasn’t the right place to ask this question.

What I mean is: if I have a piece of software which is in reasonably good shape and potentially useful to others, should I put it into the registry or not? Right now I don’t have a clear idea of what the costs of doing so might be. If I knew there was an energy cost (for instance, because the registry has to be duplicated, costing some watt-hours), that would factor into my decision. I do want to be a good citizen (I do know where the electric power is coming from…). If I don’t register the package but just keep it on GitHub, the energy cost may well be the same (after all, there are many server farms storing our stuff on GitHub). If so, I might as well register the package. But I just don’t know what other costs there might be. Perhaps filling up the global “namespace” is such a cost?

What do you think?

If you think your package might be useful to others, and you intend to maintain it for at least a few years, please register it. Only registered packages are safe to use in reproducible research.

I think taking away a package name is the biggest cost to the community.

For example, xKDR registered a package called TSFrames.jl for handling time series data. It would have been named TimeSeries.jl had that name not already been taken.

I would urge you to search for similar functionality already in the ecosystem. If your package is mostly covered by other things already out there, I would suggest refraining from registering it unless there is some significant advantage to the community in the way you’ve done things.

If you just need to register something for reproducible research, you can always start your own registry. My group has done that. IMO General means “of general interest” and using it to document a specific paper seems like an abuse to me.

Just a question here: IMHO, adding another registry maintained by someone means that I need to trust him/her quite a bit, as it appears to be easy to register a higher version of any package available from General in that registry and thus hijack it. Or can Pkg prevent this, e.g. by refusing to install a package that is registered in more than one registry?

EDIT: Also, IMHO using another registry doesn’t solve the namespace problem. The only option I see at the moment is good old Hungarian notation: all packages in the registry CoolPackageRegistry would have names prefixed with a slug of the registry name, e.g. coolCore.jl.

Yes, adding a registry requires that you trust the maintenance of that registry. See “Dependency confusion between internal registries and General” (JuliaLang/Pkg.jl issue #2393 on GitHub) for Pkg discussions on the subject.

Does having a custom registry solve the problem of name clashes?

I think that is an important point. For instance, if I create my own custom registry and develop an “Optim” package there, is there any way to use this “Optim” without it conflicting with the “Optim” in the General registry?

I have the impression that, with custom registries or not, at some point it would be nice to be able to qualify package names with a higher-level tag, so that one could do:

add Optim
add MyRegistry/Optim

If this were nicely integrated into the ecosystem, even better (for instance, if one could “register” the custom registry).

Concerning documentation of papers: maybe this isn’t clear to everyone, but one can create a package and simply host it on GitHub, without registering anything. That’s probably much simpler than having a custom registry or anything else.

That discussion actually makes one wonder whether custom registries should be encouraged at all. It seems that “custom sub-registries” of the General registry, with central control of UUIDs, would be a better alternative. That brings us back to the OP’s question about the cost of adding packages to the General registry. Is there a point at which too many packages (how many?) become a problem in any sense?

I also wonder what the monetary costs are. Many changes in Julia are benchmarked against the General registry, right? And every user downloads the General registry when they install Julia. What are the costs of CI and data transfer here? If the General registry were 20% smaller, what would the economic impact be?

The most common case of custom registries is for internal use within a company or other organization and there trust shouldn’t be an issue (otherwise you have bigger problems to deal with). Likewise for personal registries.

For random external registries, you do need to think about whether you trust them. Actually, that’s a question you should ask yourself whenever you install a package as well, including from General.

At the level of package content, sure, but multiple registries apparently create an additional issue: possibly duplicated UUIDs. Isn’t that the issue?

A use case for a package in multiple registries I see is a “package nursery”, where package versions 0.0.x are registered until they are “grown up” enough for General where versions >= 0.1.0 would be registered. Actually, this is what I am exploring. The benefit is that the convenient Pkg version management via version numbers can be used already during early stages.
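
A minimal Julia sketch of why this can work, under stated assumptions: `nursery_target` and the registry names “Nursery” and “General” are hypothetical, and `caret_upper` is my own reimplementation of the default (caret) compat rule described in the Pkg documentation, not Pkg’s code. The point it illustrates is that Pkg treats every 0.0.x release as breaking, so a dependent declaring compat with 0.0.3 will not silently pick up 0.0.4 while the package is still in the nursery.

```julia
# Hypothetical routing rule for a "package nursery": early 0.0.x releases go to a
# separate registry, and everything from 0.1.0 onward goes to General.
nursery_target(v::VersionNumber) = v < v"0.1.0" ? "Nursery" : "General"

# Exclusive upper bound of the default (caret) compat range for `v`, following the
# rule in the Pkg docs: the leftmost nonzero component must stay the same.
# (Sketch only; assumes plain major.minor.patch versions without prerelease tags.)
function caret_upper(v::VersionNumber)
    if v.major > 0
        return VersionNumber(v.major + 1, 0, 0)   # 1.2.3 -> [1.2.3, 2.0.0)
    elseif v.minor > 0
        return VersionNumber(0, v.minor + 1, 0)   # 0.2.3 -> [0.2.3, 0.3.0)
    else
        return VersionNumber(0, 0, v.patch + 1)   # 0.0.3 -> [0.0.3, 0.0.4)
    end
end

for v in (v"0.0.3", v"0.2.3", v"1.2.3")
    println(v, " -> registry: ", nursery_target(v),
            ", compat range: [", v, ", ", caret_upper(v), ")")
end
```

In this scheme the move to General happens at the same boundary (0.1.0) where the compat rule starts allowing non-breaking patch updates.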

Unless you plan to generate more than a quintillion UUIDs, duplicate UUIDs are statistically “impossible.”
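
A rough birthday-bound estimate behind that figure, assuming package UUIDs are random version-4 UUIDs (122 random bits) and that nobody copies an existing UUID on purpose. With $n$ UUIDs, the probability $p(n)$ of at least one accidental collision is approximately:

```latex
p(n) \approx 1 - \exp\!\left(-\frac{n^2}{2 \cdot 2^{122}}\right),
\qquad
p(10^{18}) \approx 1 - e^{-0.094} \approx 0.09,
\qquad
p(n) = \tfrac{1}{2} \;\Longrightarrow\; n \approx \sqrt{2 \ln 2 \cdot 2^{122}} \approx 2.7 \times 10^{18}.
```

So even a quintillion randomly generated UUIDs would carry only about a 9% chance of one accidental collision. The estimate says nothing about a UUID that is reused deliberately.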

I’ll get there.

I understood that as a possible malicious manipulation by a custom registry.

I haven’t tried, but ideally Julia should throw an error if one tries to add a registry that has the same UUID as an existing registry. Since you raised the concern, can I appoint you to test it and report back to us?

I can, but the possible issues need to be clearer. For instance, I can obviously add a package from its URL:

(jl_NzZEBk) pkg> add http://github.com/m3g/PDBTools.jl

even though it is a package that is in the general registry (thus we have possibly conflicting UUIDs here already).

I guess we can do that from a custom registry as well. I can try that later today (going to lecture now).

Pkg only errors if the package names differ for the same UUID. Otherwise, it’s considered the same package, and it’s possible by design to have the same package in multiple registries. For example, you could add a hotfix for a package from General to your company’s internal registry without having to wait for the maintainer to register a new version in General.
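
To make that rule concrete, here is a heavily simplified Julia model, not Pkg’s actual implementation: the `Registry` alias, `merge_registries` and `lookup` are hypothetical names. A UUID may appear in several registries but only under one name, and `lookup` returns every UUID registered under a given name, so more than one result is the “two Optims” situation from earlier in the thread, which in this model can only be resolved by specifying the UUID.

```julia
using UUIDs

# Hypothetical, heavily simplified registry: package UUID => package name.
# Real registries also record versions, dependencies and compat bounds.
const Registry = Dict{UUID,String}

# Combine registries. The same UUID may appear in several of them, but only
# under one name; a name mismatch for the same UUID is treated as an error.
function merge_registries(regs::Registry...)
    merged = Registry()
    for reg in regs, (uuid, name) in reg
        existing = get(merged, uuid, nothing)
        if existing !== nothing && existing != name
            error("UUID $uuid is registered as both `$existing` and `$name`")
        end
        merged[uuid] = name
    end
    return merged
end

# All UUIDs registered under `name`; more than one means the name is ambiguous.
lookup(merged::Registry, name::AbstractString) =
    [uuid for (uuid, n) in merged if n == name]

optim_general = uuid4()  # stands in for the UUID of Optim in General
optim_mine    = uuid4()  # an unrelated package that happens to be called Optim

general = Registry(optim_general => "Optim")
hotfix  = Registry(optim_general => "Optim")  # internal hotfix of the same package
custom  = Registry(optim_mine => "Optim")     # same name, different UUID

merged = merge_registries(general, hotfix, custom)
println(length(lookup(merged, "Optim")), " packages named Optim")

# Same UUID under a different name is rejected.
try
    merge_registries(general, Registry(optim_general => "NotOptim"))
catch err
    println("rejected: ", sprint(showerror, err))
end
```

The hotfix merges silently because it is the same package; the custom “Optim” does not conflict at the UUID level, only at the name level.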

I again refer to “Dependency confusion between internal registries and General” (JuliaLang/Pkg.jl issue #2393 on GitHub) for discussion of how Pkg could be made more resistant to dependency confusion.

I agree. For that, I create a package on GitHub and include its URL in the publication’s references.

Good way of putting it!