Distinguishing projects from packages

this definitely doesn’t work, at least in the academic context; what if I develop 2 or more projects in parallel that depend on one another?

Then they should be packages – i.e. reusable code.

It sounds like you are describing microservices (i.e. projects that talk).

If anything, I think this decision will allow you to accomplish that in a more agreed-upon manner.

Then a follow-up, “what if the two projects share code?”. This is solved by:

  • extracting shared/core code into a package (or multiple packages)
  • and then importing the core package(s) into your projects

And that’s the problem. If that’s what you’re calling a project, then I don’t think it should be shared. If you want to share it, you should make it a package. It’s dead simple (you don’t have to register it) and then you have tests to tell you when and why it works.

But the idea of just sharing scripts like people did with MATLAB was never a good idea, and I don’t think Julia should try to make it standard at all. In the end, any sharable analysis can and should be broken down into clearly defined and documented functions, a short script which ties together the functions (for which a simple case serves as a test, a more difficult case is an example notebook), and documentation. That’s a package.

[quote=“djsegal, post:12, topic:153”][/quote]

Of course,

  • Projects can become Packages (many good packages do start this way)
  • and Packages can have a “dummy” Project inside them for testing purposes

What I’m looking for is to have packages and projects look nearly identical, except for their essential differences.

The above two-way look at “nearly identical” things reminds me of a Matryoshka doll, which to me is a strong indication against their distinction. As a consumer, I like the freedom to easily combine any available package of code, based solely on the functionality of its content, not on the intentions of its packager.

It is loosely similar to modules. Their creators export what they think is useful, while their users appreciate easy access to any code inside. Another way to look at it: at runtime a function is a single thing, no matter which module contains each of its methods.

I believe that Julia’s philosophy is to limit such “cannot include” distinctions, treating them as artificial barriers, whether lower/higher, internal/external, partial/complete etc.

Whatever we choose to call them, there are three distinct concepts:

  1. Reusable code:
  • provides mechanism for other code to load it: YES
  • can provide global runtime configuration: NO
  • meaningful to “run it” on its own: NO
  2. Top-level code:
  • provides mechanism for other code to load it: NO
  • can provide global runtime configuration: YES
  • meaningful to “run it” on its own: YES
  3. Either of the above.

Reusable code cannot provide global runtime configuration because if it does, different units of reusable code can provide conflicting configuration. We already call 1 a “package” and we’re going to keep on doing that. We can either call 2 “project” and call 3 “project or package”, or we can call 3 “project” and call 2 “top-level project”. I’m not sure which is better, but trying to argue that there’s no distinction is not especially helpful since the distinction is real. At the same time, keeping the distinction as small as possible is a good idea. If you have a project and you want to turn it into a package, it should largely be a simple matter of deleting the global configuration and providing an appropriate entry-point for loading code.
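To make the conflict argument concrete, here is a minimal sketch in Python (chosen only for neutrality; nothing here is Julia's actual package manager). The `Unit` class, its flags, and `effective_config` are hypothetical names invented for illustration. It suggests why a single top-level unit owning the global configuration is unambiguous, while reusable units carrying their own configuration can disagree.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical model of a unit of code with the three yes/no properties."""
    name: str
    loadable: bool        # provides a mechanism for other code to load it
    runnable: bool        # meaningful to "run it" on its own
    global_config: dict = field(default_factory=dict)  # global runtime configuration


def package(name):
    # Case 1: reusable code, no global configuration, not run on its own.
    return Unit(name, loadable=True, runnable=False)


def top_level(name, **config):
    # Case 2: top-level code, which owns the global configuration.
    return Unit(name, loadable=False, runnable=True, global_config=dict(config))


def effective_config(units):
    """Merge global configuration, refusing to guess when units disagree."""
    merged = {}
    owner = {}
    for unit in units:
        for key, value in unit.global_config.items():
            if key in merged and merged[key] != value:
                raise ValueError(
                    f"{unit.name} and {owner[key]} both set {key!r} differently"
                )
            merged[key] = value
            owner[key] = unit.name
    return merged


# One top-level unit plus any number of packages: always unambiguous.
app = top_level("analysis", threads=4)
deps = [package("core"), package("plots")]
config = effective_config([app, *deps])

# If reusable units carried global configuration, they could disagree.
bad_a = Unit("core", loadable=True, runnable=False, global_config={"threads": 1})
bad_b = Unit("plots", loadable=True, runnable=False, global_config={"threads": 8})
try:
    effective_config([bad_a, bad_b])
    conflict = None
except ValueError as err:
    conflict = str(err)
```

In this model, turning a project into a package amounts to dropping `global_config` and setting `loadable=True`, which mirrors the "simple matter" described above.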

It might also make sense to allow other sensible combinations of these three yes/no choices, but I think having terminology for the most common patterns is still helpful. One idea would be having “targets” and associating global runtime configuration with targets. Top-level code would then simply be a project with a “run” target. Other targets could include “test” in various incarnations. Not entirely sure how loading code fits in.
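The “targets” idea could look roughly like the sketch below. This is only one possible reading of the paragraph above, written in Python for illustration; `Project`, `targets`, `is_top_level`, and `config_for` are hypothetical names, and how code loading would fit in is left open here just as it is above.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical project whose global configuration lives on its targets."""
    name: str
    # Maps a target name ("run", "test", ...) to that target's global configuration.
    targets: dict = field(default_factory=dict)

    def is_top_level(self):
        # Under this reading, top-level code is just a project with a "run" target.
        return "run" in self.targets

    def config_for(self, target):
        # Configuration is only defined for targets the project declares.
        if target not in self.targets:
            raise KeyError(f"{self.name} has no {target!r} target")
        return self.targets[target]


# A library that can only be tested, and an analysis that can be run and tested.
library = Project("core", targets={"test": {"threads": 1}})
analysis = Project("analysis", targets={"run": {"threads": 4}, "test": {"threads": 1}})
```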


The fact that no particular term comes easily to mind means that introducing terminology will be more confusing than helpful, at least in the beginning. And the closer the terms, the bigger the confusion. Imagine explaining to someone that some “package” is actually a “project”, which is nearly identical to a “package” yet doesn’t act as one, but if it’s a “top-level project”, then it can actually act as either a “project” or a “package”. Sure, that makes sense, but not the quality of sense the rest of Julia makes. There are already many terms which basically mean the same, a few of them stated earlier:

I don’t see three distinct concepts there, but some overlapping ones. I do see three distinct choices (the yes/no ones) of two distinct values (yes/no). Those are the real distinctions. And there may be more than three choices. The numbered cases are distinct combinations, not necessarily concepts. But then comes the Either case, which implies that the values may be more than two (e.g. maybe/either/possibly/depends/irrelevant). The possible combinations can easily grow to dozens, and who can predict which of them will be sensible enough to deserve its own term?

Maybe you are onto something, but the effort of stressing a distinction, while trying to keep it as small as possible, sounds contradictory to me.

I’m not sure what your position is here then. Refuse to have terminology for standard distinctions like intended for reuse vs. not intended for reuse?

How about, instead of distinguishing projects/packages; top-level/reusable; whatever, each repository can just have certain characteristics or tags such as runnable, includable (bad terms, I am just making these up).

But if it depends on a bunch of global configurations which you may or may not have, is it really “runnable”? The only thing I’d clearly classify as “runnable” is a package which has tests on Travis/AppVeyor (since I use both Linux and Windows) and good test coverage; anything else is suspect.

I think that is not the point. I am talking about labels, while you mean testing.

It should be. Otherwise what is the proposal for things to share? Untested code which ran on one computer at some point in time? That’s a notebook file in a repo, not something that should be formalized.

Julia has the tools to make everything of use into a package, so that a script like that could be small and all of the meaningful parts could be tested. That should be used.

That is what the Travis test badges are for.

Yes, and everything else is “random script in a GitHub repo, use at your own risk”. I don’t get why we need a name or formalisms for that.

When trying to communicate intentions, there are two extreme options:

  • Write a README file where the intention is clearly stated in detail.
  • Name the product itself after the intention.

The README approach does work, but is the most free-form and least standardized option, not to mention that some people don’t bother writing one, hoping that their code is self-explanatory.

Names are expected to answer the question of what something is. Of course that has multiple answers. Something may be Julia code, so we may call it a script. It may be reusable, coherent code (ideally, all code is reusable), so we may call it a program. It may consist of separate functionalities packed into a single library, so we may call it a package. And it may be code not intended for reuse, so we may call it what? A project? It doesn’t follow. It is the product of a project, not the project itself. An application? Maybe, although that word is used for compiled executables. Still, what stops me from reusing that product? Its present configuration? How hard is it to change it? We want it as easy as it gets. In any case, intentions reside in the creator’s mind, not in the package itself. In the eyes of a programmer who intends to reuse it, it remains a reusable package.

Then come the non-extreme options. An option closer to the name is the suggested tags/labels/badges. An option closer to the README is a file with a special format. I believe that both options are better than either of the two extremes. Maybe we can come up with an option better than all of them, but it will be somewhere in the middle. The extreme name option is neither the only alternative to a README nor the best one.

@ChrisRackauckas I was in fact trying to suggest something to make the distinction more light-weight since I actually dislike a (to me arbitrary) division into packages and projects.

But I would very much like a much, much smaller METADATA_CORE.jl of key packages that are carefully reviewed, where naming conventions are actually enforced, and so forth, rather than the current free-for-all. There your suggestion makes sense.

I’m afraid I have no idea what you’re talking about at this point.

Having runnable vs. reusable as independent properties is certainly possible and an interesting idea to consider – that’s what I was alluding to in the last paragraph of this post. However, I’m not convinced that the combinations other than the ones I listed are sensible or practical. For example, I’m quite convinced that providing global runtime configuration makes sense if and only if it makes sense to “run it”. Does it make sense for a package (i.e. reusable code) to also be runnable? It might in the sense of having usage examples.
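As a quick check on how many combinations survive that constraint, here is a small Python sketch (illustrative only; the property names are shorthand for the three yes/no questions listed earlier in the thread, not an existing API). It enumerates all eight combinations of the three properties and keeps those where global configuration is present if and only if the code is runnable.

```python
from itertools import product

# Shorthand for the three yes/no properties discussed in this thread.
PROPERTIES = ("loadable", "global_config", "runnable")

# All 2**3 combinations, as dicts mapping property name to True/False.
all_combinations = [dict(zip(PROPERTIES, values))
                    for values in product([False, True], repeat=3)]

# Constraint from the post: global configuration makes sense iff it is runnable.
sensible = [c for c in all_combinations if c["global_config"] == c["runnable"]]

for combo in sensible:
    print(combo)
```

Four of the eight combinations remain: plain reusable code, top-level code, reusable code that is also runnable (for instance with usage examples), and the degenerate case with none of the properties.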

No worries, I appreciate your effort either way. I wish I could express my point in Julia code, but the language hasn’t reached that level yet.

The bottom line is that I’m against the suggested distinction, and especially against calling some packages projects. The suggestion for independent properties makes more sense, whatever form they may take.

FWIW for my purposes local packages do the trick perfectly. Even without some main function it is easy enough to structure that package in a way that the targeted user just has to call some function run_experiment(use_3d_pi_charts=false) to reproduce the results (or achieve whatever the point of the experiment is).

That said, I am quite confident it would be a nice feature for scientists to offer an easy way to freeze some working code/package in time, if that sentence makes sense. As important as reproducible research is, it is quite a hard sell if one has to maintain the code after the paper using that code has been published. People want to move on to new things.