Fixing Package Fragmentation

Giving users tools to find and choose packages, make better use of a package's API, organize packages, etc., is fundamental. This can help avoid someone starting a new package when an existing, working package needs only a minor intervention to do what is needed, and, in the same way, enable more effective use in general through documentation.

On this matter, I have been pursuing the idea of creating full-text search indexes for package descriptions and docs.

For instance, several package managers support package searching and some categorization, but most do not support full-text search.
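To make "full-text search index" concrete, here is a minimal sketch of an inverted index over package descriptions. It is not the code from the notebooks; the package names, descriptions, and the helpers `tokenize`, `build_index`, and `search` are all hypothetical and only illustrate the general idea.

```julia
# Hypothetical toy data: package name => description text.
docs = Dict(
    "PkgA" => "Fast nearest neighbor search for metric spaces",
    "PkgB" => "Plotting library with many backends",
    "PkgC" => "Approximate similarity search and indexing",
)

# Lowercase the text and split it on anything that is not a letter or digit.
tokenize(text) = split(lowercase(text), r"[^a-z0-9]+"; keepempty=false)

# Map each token to the set of packages whose description contains it.
function build_index(docs)
    index = Dict{String,Set{String}}()
    for (name, text) in docs
        for tok in tokenize(text)
            push!(get!(() -> Set{String}(), index, String(tok)), name)
        end
    end
    return index
end

# Return, in sorted order, the packages that contain every token of the query.
function search(index, query)
    toks = tokenize(query)
    isempty(toks) && return String[]
    hits = [get(index, String(t), Set{String}()) for t in toks]
    return sort!(collect(intersect(hits...)))
end

index = build_index(docs)
search(index, "similarity search")
```

A real index would also need ranking (for example TF-IDF or BM25) and stemming; this sketch only does exact boolean matching on whole words.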

I kept working on the proof of concept. The new version uses web scraping instead of the GitHub API and includes support for packages hosted on GitLab and possibly others.

https://github.com/sadit/Search-Julia-Packages/blob/main/search-pkgs.ipynb

It can search by package name (tolerating typos) and by README content (also topics and descriptions if the package is on GitHub). The weights given to the name and content scores can be changed.
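As an illustration of how name and content scores could be combined with adjustable weights, here is a small sketch. It is an assumption of mine, not the method used in the notebook: typo tolerance is approximated with Jaccard similarity over character trigrams, content matching is plain word overlap, and `trigrams`, `jaccard`, `words`, `score`, `rank`, `wname`, and `wcontent` are hypothetical names.

```julia
# Character trigrams of a padded, lowercased string; small typos change few trigrams.
function trigrams(s)
    chars = collect("  " * lowercase(s) * " ")
    return Set(String(chars[i:i+2]) for i in 1:length(chars)-2)
end

# Jaccard similarity between two sets (0 when both are empty).
function jaccard(a, b)
    u = length(union(a, b))
    return u == 0 ? 0.0 : length(intersect(a, b)) / u
end

# Set of lowercase alphanumeric words in a text.
words(text) = Set(split(lowercase(text), r"[^a-z0-9]+"; keepempty=false))

# Weighted combination of name similarity and README word overlap.
function score(query, name, readme; wname=0.5, wcontent=0.5)
    s_name = jaccard(trigrams(query), trigrams(name))
    q = words(query)
    s_content = isempty(q) ? 0.0 : length(intersect(q, words(readme))) / length(q)
    return wname * s_name + wcontent * s_content
end

# Score every (name => readme) pair and sort from best to worst.
function rank(query, pkgs; kwargs...)
    scored = [(name, score(query, name, readme; kwargs...)) for (name, readme) in pkgs]
    return sort!(scored; by=last, rev=true)
end

# Hypothetical toy data: package name => README text.
pkgs = [
    "Plots" => "plotting and visualization",
    "DataFrames" => "tabular data manipulation",
    "Makie" => "interactive plotting and visualization",
]

rank("Plotz", pkgs)                           # misspelled name still ranks Plots first
rank("tabular", pkgs; wname=0.0, wcontent=1.0) # content only
```

Raising `wname` favors near matches on the package name, while raising `wcontent` favors packages whose README mentions the query words.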

It also adds support for full-text search on module documentation based on Base.Docs, and it uses the Lunr search index that Documenter produces (so you can search without installing the package).

https://github.com/sadit/Search-Julia-Packages/blob/main/search-docs.ipynb

It is pretty similar to the package search, but for documentation. It is at an early stage and requires a lot of work, but it already works.

In addition, a visualization and a clustering of packages are generated:
https://github.com/sadit/Search-Julia-Packages/blob/main/clustering.ipynb
These are based on package README files. This can help give an overall view of which packages are similar to each other.

A small web API using the Oxygen package was created to show how to use the API. Additionally, the package database is slow to load for each query, so if it is used someday, it should run as a web server or something similar.

There are some sites for searching packages (JuliaHub included), but most of them require a browser or installing other kinds of tools. The proposal is to create tools that work in the REPL or in any Julia process.

So, the idea is to gather people interested in going in this direction and produce some tools that can help the community by using the information already available in the packages. I am open to discussing and collaborating on these ideas (critical feedback that might make me stop this effort is also welcome).
