Distinguishing projects from packages

How about, instead of distinguishing projects/packages, top-level/reusable, or whatever, each repository can just have certain characteristics or tags such as runnable or includable (bad terms, I am just making these up)?

But if it depends on a bunch of global configurations which you may or may not have, is it really “runnable”? The only thing I’d clearly classify as “runnable” is a package which has tests on Travis/AppVeyor (since I use both Linux and Windows) and good test coverage; anything else is suspect.

I think that is not the point. I am talking about labels, while you mean testing.

It should be. Otherwise what is the proposal for things to share? Untested code which ran on one computer’s code at some point in time? That’s a notebook file in a repo, not something that should be formalized.

Julia has the tools to make everything of use into a package, so a script like that could be small and all of the meaningful parts could be tested. That should be used.

That is what the Travis test badges are for.

Yes, and everything else is “random script in a GitHub repo, use at your own risk”. I don’t get why we need a name or formalisms for that.

When trying to communicate intentions, there are two extreme options:

  • Write a README file where the intention is clearly stated in detail.
  • Name the product itself after the intention.

The README approach does work, but is the most free-form and least standardized option, not to mention that some people don’t bother writing one, hoping that their code is self-explanatory.

Names are expected to answer the question of what something is. Of course that has multiple answers. Something may be Julia code, so we may call it a script. It may be reusable, coherent code (ideally, all code is reusable), so we may call it a program. It may consist of separate functionalities packed into a single library, so we may call it a package. And it may be code not intended for reuse, so we may call it what? A project? It doesn’t follow. It is the product of a project, not the project itself. An application? Maybe, although that word is used for compiled executables. Still, what stops me from reusing that product? Its present configuration? How hard is it to change it? We want it as easy as it gets. In any case, intentions reside in the creator’s mind, not in the package itself. To the eyes of a programmer who intends to reuse it, it remains a reusable package.

Then come the non-extreme options. An option closer to the name is the suggested tags/labels/badges. An option closer to the README is a file of special format. I believe that both options are better than either of the two extremes. Maybe we can come up with an option better than all of them, but it will be somewhere in the middle. The extreme name option is neither the only alternative to a README nor the best one.

@ChrisRackauckas I was in fact trying to suggest something to make the distinction more light-weight since I actually dislike a (to me arbitrary) division into packages and projects.

But I would very much like a much, much smaller METADATA_CORE.jl of key packages that are carefully reviewed, where naming conventions are actually enforced, and so forth, rather than the current free-for-all. There your suggestion makes sense.

I’m afraid I have no idea what you’re talking about at this point.

Having runnable vs. reusable as independent properties is certainly possible and an interesting idea to consider – that’s what I was alluding to in the last paragraph of this post. However, I’m not convinced that the combinations other than the ones I listed are sensible or practical. For example, I’m quite convinced that providing global runtime configuration makes sense if and only if it makes sense to “run it”. Does it make sense for a package (i.e. reusable code) to also be runnable? It might in the sense of having usage examples.

No worries, I appreciate your effort either way. I wish I could express my point in Julia code, but the language hasn’t reached that level yet.

The bottom line is that I’m against the suggested distinction, and especially against calling some packages projects. The suggestion for independent properties makes more sense, whatever form they may take.

FWIW for my purposes local packages do the trick perfectly. Even without some main function it is easy enough to structure that package in a way that the targeted user just has to call some function run_experiment(use_3d_pi_charts=false) to reproduce the results (or achieve whatever the point of the experiment is).

That said, I am quite confident it would be a nice feature for scientists to offer an easy way to freeze some working code/package in time, if that sentence makes sense. As important as reproducible research is, it is quite a hard sell if one has to maintain the code after the paper using that code has been published. People want to move on to new things.

I agree that there should be an easy, foolish-person-proof way to say “this works as it is right now” and to respond to others’ requests for the work product with the most recent of the “this works as it was right then”s, unless the most recent work product, working or not, has been requested (like Pkg.checkout vs Pkg.add). I don’t think the GitHub version…patch tagging facility is user/scientist friendly in that way. Although we could autogenerate a next stepped tag and have META remember which tags mark a “works as was right then” work product, my experience is that it is too easy to break tag assumptions.

Concerning Project vs Packages: why not just standardize where to put runnable scripts into packages as we know them now? Say a folder run or scripts and the main program would be run/main.jl. Pure “Projects” would have an empty src/ folder and full run/ folder and vice versa (most would have a bit of both). Similar to Pkg.test("SomePkg") we could have a Pkg.run("SomePkg") to run run/main.jl.


I’m in favour of this idea in any event. Sometimes a package provides some functionality that can be used in other packages and projects, but for convenience you might want some very simple programs that come with the package, wrapping a few of its functions so they can be called from the terminal. I recently made a package with code and functions for DNA sequence dating which is useful to load in other projects, but it is also useful to distribute a little program as you describe, one that can be invoked from the terminal, accepts a few inputs, and spits out an output.


I’m in favor of a run/ folder.
I think scripts/ is more ambiguous.

I posted my suggestion as an issue over in Pkg3-Julep:
https://github.com/JuliaLang/Juleps/issues/20

Similar to mauro, I also have tons of command line utilities in my Python packages. The users use them as libraries but have some terminal commands for common operations (they even provide them so I can add them in future releases).

I like how setuptools for Python manages these command-line tools as “hooks”. You define in a configuration file the name of the command and the entry function to execute, like:

entry_points={
    'console_scripts': [
        'foo = foo.tools.cmd:foonction',
    ],
}

Many Python packages provide such tools and at least in our collaboration we use them on a daily basis…
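To make the hook above concrete, here is a minimal sketch of what the function behind `foo.tools.cmd:foonction` might look like. The summing behaviour, the argument names and the `argv` parameter are assumptions made up for illustration; only the `foo`/`foonction` names come from the snippet above. setuptools calls the entry function with no arguments and passes its return value to `sys.exit`, which is why the function falls back to the real command line when `argv` is `None`.

```python
import argparse


def foonction(argv=None):
    """Hypothetical entry function for the `foo` console script.

    `argv` is only here so the function can be called from other code or
    tests; when it is None, argparse reads sys.argv[1:] as usual.
    """
    parser = argparse.ArgumentParser(
        prog="foo",
        description="Illustrative command wrapping a bit of library code.",
    )
    parser.add_argument("values", nargs="+", type=float, help="numbers to sum")
    args = parser.parse_args(argv)

    # In a real package this would call into the library proper.
    print(sum(args.values))
    return 0  # used as the process exit status by the generated script
```

Once the package is installed, running `foo 1 2.5` in a terminal would go through this function, while other code can still import and call it directly.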


Update – terminology I ended up using in the Pkg3 documentation:

https://julialang.org/Pkg3.jl/latest/index.html#Glossary-1

Quoting the relevant parts:

Project: a source tree with a standard layout, including a src directory for the main body of Julia code, a test directory for testing the project, docs for documentation files, and optionally a build directory for a build script and its outputs. A project will typically also have a project file and may optionally have a manifest file:

Package: a project which provides reusable functionality that can be used by other Julia projects via import X or using X. A package should have a project file with a uuid entry giving its package UUID. This UUID is used to identify the package in projects that depend on it.

Application: a project which provides standalone functionality not intended to be reused by other Julia projects. For example a web application or a command-line utility. An application may have a UUID but does not need one. An application may also provide global configuration options for packages it depends on. Packages, on the other hand, may not provide global configuration since that could conflict with the configuration of the main application.

Projects vs. Packages vs. Applications:

  1. Project is an umbrella term: packages and applications are kinds of projects.
  2. Packages should have UUIDs; applications can have a UUID but don’t need one.
  3. Applications can provide global configuration, whereas packages cannot.
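The three rules above can be sketched as a small consistency check. This is only an illustration in Python, not anything Pkg3 does: the `Project` class, its field names and `check_project` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Project:
    """Hypothetical model of a project; field names are illustrative only."""
    name: str
    kind: str                             # "package" or "application"
    uuid: Optional[str] = None            # UUID from the project file, if any
    provides_global_config: bool = False


def check_project(project: Project) -> List[str]:
    """Return the rules the project violates (empty list if consistent)."""
    problems = []
    if project.kind not in ("package", "application"):
        problems.append(f"unknown kind: {project.kind!r}")
    if project.kind == "package":
        # Rule 2: packages should have UUIDs; applications need not.
        if project.uuid is None:
            problems.append("a package should have a UUID")
        # Rule 3: only applications may provide global configuration.
        if project.provides_global_config:
            problems.append("a package may not provide global configuration")
    return problems
```

An application with no UUID passes the check, while a package that has no UUID and also sets global configuration is reported twice.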

I’ve found this terminology to be intuitive and helpful. Having “project” as an umbrella term for both packages and applications is good since otherwise you find yourself using the overly long and awkward phrase “packages or applications” over and over in situations where the term “project” is quite natural and easy.
