Package naming guidelines

The more I try to understand packages that depend on other, closely related packages (the entire glvisualize/glabstraction etc. stack), the harder I find it to tell which function belongs to which package. More and more, my opinion is that explicitly prefixing function calls with the package/module they belong to makes code a lot more readable and easier to work with.

Even if the function names don’t collide, when I see an allocate() function inside some package I never quite know what to expect, because it means vastly different things in different contexts. On top of that, it can come from any of the packages being used in the code, and scattering @which calls throughout packages is tedious. This is compounded by the fact that it’s very easy for packages to overload functions from the other modules/packages they use, so it gets a bit out of control.

For the people who are doing the actual coding, of course, I understand it’s a huge pain to always write ModuleName.function(). I haven’t quite figured out what a solution to this inconvenience might be; I quite like the idea of an as keyword, or some explicit namespace usage.
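For illustration, here is a minimal sketch of what module-qualified calls look like. It uses the LinearAlgebra standard library as a stand-in for the packages in question, and it assumes Julia 1.6 or later for the `as` form; the variable names are only for the example.

```julia
import LinearAlgebra            # brings only the module name into scope
import LinearAlgebra as LA      # short alias; assumes Julia 1.6 or later

v = [3.0, 4.0]

# Each call states which module owns the function.
a = LinearAlgebra.norm(v)       # fully qualified call
b = LA.dot(v, v)                # same module, reached through the alias

println(a)
println(b)
```

With `import`, an unqualified `norm(v)` would not resolve, so a reader can always tell where a function comes from, at the cost of the extra typing mentioned above.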

1 Like

For discoverability, I think node and npm are doing great. For example, yesterday I wanted to use js to cp files in a gulp script. I just googled “node cp file”, and the package “cp-file” came up as the third result. After reading the description on that page, I knew it was exactly what I needed.

My point is that a good package listing page (e.g. cp-file - npm) and good search engine optimization are more effective than the package name.

I definitely see the appeal for making packages more readable. I may start deliberately using import instead of using in every PackageName.jl.

For other code (data analysis, etc), we could stick with using.

3 Likes

Adding one more example to the discussion, TSne.jl is another package name that I disliked profoundly. It is super famous nowadays because of the Machine Learning hype, but no one will know what it is in the future.

Acronyms are really bad for learners. You go into a new community like R and all you can see there is a big soup of letters:

This is insane :man_facepalming:

7 Likes

T-SNE is more famous than t-distributed Stochastic Neighbor Embedding. I learned the former years ago, but noticed the latter name only today. I guess it is likely to be the same in the future.

Most R users just happily import into the global namespace using library(mysteriousacronym) (OK). Namespaces in R are an afterthought (bad, but they learned to live with it). Most communities just know their relevant packages. CRAN has nice curated lists (called task views) by domain (good).

Another idea is to have keyword ID numbers, that package developers could include in their REQUIRE file.

Then to each number you can associate a keyword category, such as “metaprogramming” or “numerical methods”. The advantage of having a numbering system is that the names of the categories can easily be modified, updated, and changed, without requiring the package developers to change their REQUIRE files.

So basically, the ID numbers will be invariant, but you can still easily change the names of the categories then, so that as the community evolves it is easier to also make the discovery and categorization evolve in time.
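A minimal sketch of that idea, assuming a registry that maps stable numeric IDs to category names. Everything here is hypothetical: the package names, the IDs, and the helper functions are made up for the example and do not correspond to any existing REQUIRE or METADATA format.

```julia
# Hypothetical registry: stable numeric ID => current display name.
categories = Dict(1 => "metaprogramming", 2 => "numerical methods")

# Hypothetical package metadata: each package lists category IDs only.
package_tags = Dict("PackageA" => [2], "PackageB" => [1, 2])

# Resolve a package's IDs to whatever the categories are called right now.
category_names(pkg) = [categories[id] for id in package_tags[pkg]]

# List every package tagged with a given ID, sorted for stable output.
packages_in(id) = sort([p for (p, ids) in package_tags if id in ids])

println(category_names("PackageB"))

# Renaming a category touches the registry only; package_tags is unchanged.
categories[2] = "numerical analysis"
println(category_names("PackageB"))
println(packages_in(2))
```

The packages never store the category name, so the rename shows up everywhere without any package developer having to edit their own files.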

With thousands of packages available, it doesn’t make sense to have a single huge list of packages. It needs to be filterable and browsable using keyword categories, and using ID numbers to specify them gives the extra flexibility of allowing the category names to be updated as things evolve.

Otherwise, instead of having the ID numbers in the REQUIRE file, they could be required as part of a Pull-Request to METADATA, so that the keyword category IDs can be approved and agreed on by the community.

@innerlee I think acronyms are bad anyways and shouldn’t be used as package names. The best solution for the case of t-SNE in my opinion is to have it absorbed by a larger package that has a clear name like MultivariateStats.jl.

I stopped to think about this naming issue more closely today, and noticed that all (or most) of these acronyms represent a simple method within a larger class of methods. They surely can be grouped in a single package with a clear name. For example, MultivariateStats.jl includes PCA, ICA, FA, kPCA, … and could include t-SNE as well. This approach also makes things more discoverable because if a user knows about PCA, he/she will learn about the alternatives in a single place.

The Julia community is doing a great job of organizing similar concepts into higher abstraction levels, and we should maintain this effort whenever possible. The long-term solution is: 1) implement simple methods in separate packages, 2) find the common concepts, and 3) merge packages into a larger package with broader applicability, a clear name, and a consistent API.

3 Likes

Or you could use tags and keywords to group and organize packages, instead of forcing related packages to be grouped into a single package. Then you can immediately tell which domain a package belongs to and that helps more than anything with understanding its relevance to a user browsing for packages.

Having a tagging keyword system is more flexible than grouping packages into a single package, because the package you want to re-group might actually be relevant to multiple keyword domains, and organizing it as part of a single meta package will be less clear than having it organized into multiple keyword categories.

So I’d advocate for having a “multivariate stats” keyword tag; any package related to that can then add that tag ID to its listing on the Julia package browser. This is more flexible and leaves room for changes in the future.

Keywords are definitely a valid point; the main downside is that they don’t enforce a consistent API. You develop PCA with an API that looks like fit() + predict(), and someone else develops another package for kPCA with an API that looks like train() + projection(). Now, as a user, you want to experiment with both methods, and you need to change your code to deal with these annoying variations. Not productive.
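To make the point concrete, here is a minimal sketch of what a shared fit() + predict() interface buys the user. All names are hypothetical, and the two "methods" are toy transformations chosen to keep the example short; they are not PCA or kPCA and do not reflect the API of MultivariateStats.jl or any other package.

```julia
# Hypothetical shared interface; none of these names come from a real package.
abstract type Reducer end

# Generic functions that every method is expected to extend.
function fit end
function predict end

# Toy method A: centers the data using the column means seen in fit.
struct Centering <: Reducer end
struct CenteringModel
    means::Vector{Float64}
end
function fit(::Centering, X::Matrix{Float64})
    return CenteringModel(vec(sum(X, dims=1)) ./ size(X, 1))
end
predict(m::CenteringModel, X::Matrix{Float64}) = X .- m.means'

# Toy method B: rescales each column by its largest absolute value.
struct Scaling <: Reducer end
struct ScalingModel
    scales::Vector{Float64}
end
function fit(::Scaling, X::Matrix{Float64})
    return ScalingModel(vec(maximum(abs.(X), dims=1)))
end
predict(m::ScalingModel, X::Matrix{Float64}) = X ./ m.scales'

X = [1.0 2.0; 3.0 4.0]

# User code is identical for both methods; only the method object changes.
for method in (Centering(), Scaling())
    model = fit(method, X)
    println(typeof(method), " => ", predict(model, X))
end
```

If the second method had been written with train() + projection() instead, the loop at the end would need a special case for it, which is exactly the variation described above.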

1 Like

That’s not really an issue with keywords, that’s a matter of communication and coordination between the package developers, which can be done on the Discourse or on Issues / Pull requests.

1 Like

I can guarantee that the problem of communication between developers around the world is NP-hard :slight_smile: There is no success in trying to enforce a consistent API by chatting or agreeing on Discourse.

There is no way to enforce a consistent API, since no one has power over other developers. But discussion frequently helps. Perhaps you missed @chakravala’s mention of issues and pull requests. A lot of API discussion happens there, and it frequently improves compatibility. The evolution of the AD library APIs is an example.

2 Likes

Discussion definitely helps, but it doesn’t scale. Wait until Julia grows to the size of Python or something, and everyone will be submitting packages with the exact same functionality to pkg.julialang.org, only with different naming. What I like about the current state is that packages try to unify concepts and users switch between implementations with ease. I see keywords as another layer of complexity added to the search.

Another analogy here that is not perfect, but helps to understand the scaling issue, is the use of keywords in journal papers. Many of my friends (and I myself) rarely use keywords when searching for papers, because we are afraid of missing a good paper that is not properly tagged. However, if I am searching in a specific journal specialized in a topic, I can safely search for titles, authors, and methods in the contents of the journal. Everything is in the right place.

I am not against keywords, I just share the view that they are not as helpful compared to having a single repository with different implementations of the same concept.

Pkg3 will make renaming packages straightforward (and even allow different packages to have the same name if necessary), so I think that we do not have to worry about this now since we will be able to adjust package names later.

As a more general comment, I do wish that people would stop declaring things that should or must be ready in time for 1.0 unless they are planning on actually doing the necessary work. There is a small group of people who have been working almost nonstop on 1.0 and we have spent literally hundreds of hours triaging what still needs to be done. We don’t have enough bandwidth to do all the work we have to do, let alone any more of it. If you want to help, check out the 1.0 milestone on GitHub issues and give one of them a shot.

Although renaming packages seems like it should not have much effect on people working on the base language, there are inevitably complications to package renames that ripple through the ecosystem and many of the people working on 1.0 are the same ones who would end up sorting out those problems.

I also feel that this thread has turned a bit negative (“I don’t like package name X, Y and Z”), so let’s just leave this topic alone until after 1.0 is out and we’re all using the new package manager and we can rename packages to our heart’s content.

17 Likes