Distinguishing projects from packages

old title: "Extending PkgDev.jl to Project Generation"

edit:

  • The title was changed to reflect what is actually discussed.
  • However, this starting post (below) remains unedited to highlight the question that provoked the discussion.

From PkgDev’s description,

PkgDev.jl provides a set of tools for a developer to create, maintain and register packages in Julia package

How could this tool be extended to generating projects?


This necessitates a distinction between projects and packages.

  • Projects being the code users write 99% of the time
  • Packages being conventional 3rd party libraries/addons/gems (whatever you want to call them).

In typical Julia usage, a project is often just a package that hasn’t been published. i.e. it’s not clear that we need a distinction.

I think that to distinguish a project from a package we could turn to R and packrat, and say a project is a collection of data files and scripts bundled with their own copies of packages at the correct versions (maybe like how Pkg3 is looking to include environments). Projects then form self-contained, reproducible bundles that are submitted as supplements to papers and deposited in data archives. The scripts in a project are often single-use and work on the data and package versions provided in the project. With packages, by contrast, the code is usually of production quality, and any included data is typically there for unit testing.

There are a few meaningful distinctions.

  1. global runtime configuration of dependencies (libraries & packages):
  • a package cannot provide this, since it is intended to be used by other projects
  • a standalone project can provide this, since it’s the “end user”
  2. a run target:
  • you generally need a way to run a project as a service or program, and this can/should be done in a standardized way
  • it doesn’t really make sense to run a package, although we may want to provide binary artifacts alongside packages and do so in a standardized way
  3. a deploy target:
  • one often wants to deploy a project by placing all the necessary runtime dependencies in a place where they can be used without any other build-time artifacts
  • for a package, installation by the package manager is how you deploy it.

That seems like a very academic-centric definition. Most software projects are never published, nor do they come with any papers. Software projects provide one or more programs that are either run occasionally to accomplish a task in a limited amount of time or run continuously, cooperating to respond to ongoing requests. In other words, they are live systems that evolve over time, not artifacts.


Absolutely, in that case talking about ‘projects’ is insufficient. An academic end user of Julia will be thinking of what I described, whereas someone else would be thinking of the kind of software project you outline. I’m only speculating, but where the original post states:

This necessitates a distinction between projects and packages.

Projects being the code users write 99% of the time
Packages being conventional 3rd party libraries/addons/gems (whatever you want to call them).

I think this indicates @djsegal maybe means something more like the academic sense of ‘project’, if “the code users write 99% of the time” is taken to mean most scripts written by people to just do some job they’ve been given. But again, this is not 100% precise, because the ‘user’ could very well be an academic (Julia being a language with many users coming from Python, R and Matlab), but it could equally be a different kind of user. Perhaps @djsegal you could expand on what you mean by ‘project’?

The notion that academic software should be a single snapshot artifact strikes me as fairly unfortunate and increasingly outdated. The ideal output of an academic project should be a package that others can reuse and extend.


I agree, with the caveat that if you’re reviewing how someone did something, you want to be able to go back and use the same versions of things they used. Otherwise, if the project has been extended or a component updated, any difference between the reviewer’s result and the researcher’s result has to be diagnosed as either a failure to reproduce or a change in the software. Thankfully, Julia’s package manager makes this pretty easy, as you can require specific versions of dependencies! So for Julia, versioned packages already do the job of packrat pretty well.

Pkg3 will help considerably since you can specify a complete environment.


Yes! Looking forward to that very much.

Indeed, or, as with the JuliaDiffEq common interface, an academic project could just be something that adds your research algorithm as a dispatch to solve(prob::AbstractODEProblem, MyAlg(); kwargs...), i.e. it takes in an ODE in a common generic form and has your special research algorithm return a solution. Organizations can then accommodate people extending the function with their special algorithms, and the goal for an academic project becomes a repo with testing enabled that plugs into an ecosystem like this.
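A minimal sketch of that dispatch pattern. All names here (AbstractODEProblem, ODEProblem, Euler, MyAlg, solve and its dt keyword) are defined locally as hypothetical stand-ins; they are not the actual JuliaDiffEq/DiffEqBase types or API. The “research algorithm” is plain Heun’s method, chosen only so the example does something checkable.

```julia
# Hypothetical stand-ins for a "common interface"; these are NOT the real
# DiffEqBase / DifferentialEquations.jl types, just enough to show dispatch.
abstract type AbstractODEProblem end
abstract type AbstractODEAlgorithm end

# Generic problem description: du/dt = f(u, t), u(t0) = u0, t in tspan.
struct ODEProblem{F,U,T} <: AbstractODEProblem
    f::F
    u0::U
    tspan::Tuple{T,T}
end

# The shared entry point; each algorithm contributes a method.
function solve end

# An existing ecosystem algorithm: explicit Euler with fixed step dt.
struct Euler <: AbstractODEAlgorithm end

function solve(prob::AbstractODEProblem, ::Euler; dt = 0.01)
    t0, t1 = prob.tspan
    u, t = prob.u0, t0
    for _ in 1:round(Int, (t1 - t0) / dt)
        u = u + dt * prob.f(u, t)
        t += dt
    end
    return u
end

# The "research" algorithm, plugged in purely by adding a method.
# Heun's method (explicit trapezoidal rule) stands in for something novel.
struct MyAlg <: AbstractODEAlgorithm end

function solve(prob::AbstractODEProblem, ::MyAlg; dt = 0.01)
    t0, t1 = prob.tspan
    u, t = prob.u0, t0
    for _ in 1:round(Int, (t1 - t0) / dt)
        k1 = prob.f(u, t)
        k2 = prob.f(u + dt * k1, t + dt)
        u = u + dt / 2 * (k1 + k2)
        t += dt
    end
    return u
end

# Usage: the same problem, solved by switching only the algorithm argument.
prob = ODEProblem((u, t) -> -u, 1.0, (0.0, 1.0))  # du/dt = -u, u(0) = 1
u_euler = solve(prob, Euler())  # first-order approximation of exp(-1)
u_mine = solve(prob, MyAlg())   # second-order approximation of exp(-1)
```

The design point is that MyAlg only has to add a method: anyone holding a problem in the common form can switch algorithms by changing a single argument, and the new method can be tested against the existing ones.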

Of course, there’s many different ways to handle this, but I would like to see JuliaOpt (Optim.jl), JuliaML (I think it is), etc. go this direction as well. One of the things that drove me to Julia is that using two commands and putting your scripts in the right folder does this, so with barely any dev knowledge (and maybe a little help from your friendly neighborhood org) your “project” is now a tested, stable, and maintained (if there’s an active org with access, even if the original author leaves) part of the ecosystem.

Usually it’s pretty arrogant to tell people “no, you’ve been doing it all wrong before”, but in this case, it is correct that leaving random scripts that won’t work in 2 years around the web is not reproducible research, and it’s a shame if people don’t take the half hour (in Julia) to make it part of a maintained package for others to actually use.

Going with what @StefanKarpinski said, I believe:

  • Packages are modular, reusable chunks of code.

  • Projects are end-user codebases that can string together multiple packages using global configurations

By this, I mean:

  • Projects can include Packages
  • Packages can include Packages
  • Projects cannot include Projects
  • Packages cannot include Projects

Of course,

  • Projects can become Packages (many good packages do start this way)
  • and Packages can have a “dummy” Project inside them for testing purposes

What I’m looking for is to have packages and projects look nearly identical, except for their essential differences.

For example:

  • A config folder in Projects
  • A dummy folder in Packages

This definitely doesn’t work, at least in the academic context: what if I develop two or more projects in parallel that depend on one another?

Then they should be packages – i.e. reusable code.

It sounds like you are describing microservices (i.e. projects that talk).

If anything, I think making this distinction will allow you to accomplish that in a more agreed-upon manner.


Then comes the follow-up: “What if the two projects share code?” This is solved by:

  • extracting shared/core code into a package (or multiple packages)
  • and then importing the core package(s) into your projects
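A minimal sketch of that split, assuming the shared code is small enough to show inline. SharedCore, normalize_counts, project_one and project_two are hypothetical names; in practice SharedCore would live in its own (possibly unregistered) package and be loaded with `using SharedCore`, whereas here a local module stands in so the example runs on its own.

```julia
# Hypothetical core code extracted from two projects into one module.
# In a real setup this would be its own package, not a local module.
module SharedCore

export normalize_counts

"""Turn raw counts into proportions that sum to 1."""
normalize_counts(v::AbstractVector{<:Real}) = v ./ sum(v)

end # module

# Both top-level "projects" load the core code instead of copying it.
using .SharedCore

project_one(counts) = normalize_counts(counts)           # full distribution
project_two(counts) = maximum(normalize_counts(counts))  # largest share only
```

A fix or test added to SharedCore then benefits both projects at once, which is exactly what copying the shared code between projects cannot give you.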

And that’s the problem. If that’s what you’re calling a project, then I don’t think it should be shared. If you want to share it, you should make it a package. It’s dead simple (you don’t have to register it) and then you have tests to tell you when and why it works.

But the idea of just sharing scripts like people did with MATLAB was never a good idea, and I don’t think Julia should try to make it standard at all. In the end, any sharable analysis can and should be broken down into clearly defined and documented functions, a short script which ties together the functions (for which a simple case serves as a test, a more difficult case is an example notebook), and documentation. That’s a package.

Quoting djsegal (post 12):

Of course,

  • Projects can become Packages (many good packages do start this way)
  • and Packages can have a “dummy” Project inside them for testing purposes

What I’m looking for is to have packages and projects look nearly identical, except for their essential differences.

The above two-way look at “nearly identical” things reminds me of a Matryoshka doll, which to me is a strong indication against distinguishing them. As a consumer, I like the freedom to easily combine any available package of code, based solely on the functionality of its content, not on the intentions of its packager.

It is loosely similar to modules. Their creators export what they think is useful, while their users appreciate easy access to any code inside. Another way to look at it: at runtime a function is a single thing, no matter which module contains each of its methods.

I believe that Julia’s philosophy is to limit such “cannot include” distinctions as artificial barriers, whether lower/higher, internal/external, partial/complete etc.

Whatever we choose to call them, there are three distinct concepts:

  1. Reusable code:
  • provides a mechanism for other code to load it: YES
  • can provide global runtime configuration: NO
  • meaningful to “run it” on its own: NO
  2. Top-level code:
  • provides a mechanism for other code to load it: NO
  • can provide global runtime configuration: YES
  • meaningful to “run it” on its own: YES
  3. Either of the above.

Reusable code cannot provide global runtime configuration because if it does, then different units of reusable code can provide conflicting configuration. We already call 1 a “package” and we’re going to keep on doing that. We can either call 2 “project” and call 3 “project or package”, or we can call 3 “project” and call 2 “top-level project”. I’m not sure which is better, but trying to argue that there’s no distinction is not especially helpful, since the distinction is real. At the same time, keeping the distinction as small as possible is a good idea. If you have a project and you want to turn it into a package, it should largely be a simple matter of deleting the global configuration and providing an appropriate entry point for loading code.
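The conflict argument can be made concrete with a toy model. A Dict stands in for any process-wide setting and plain functions stand in for packages; GLOBAL_CONFIG, load_pkg_a!, load_pkg_b!, load_project!, pkg_zeros and configure_project! are hypothetical names, not part of Pkg or Base. If two pieces of reusable code each set the same global option when loaded, the winner depends only on load order:

```julia
# Toy model, hypothetical names throughout: a Dict stands in for any
# process-wide setting (default float type, log level, thread count, ...).
const GLOBAL_CONFIG = Dict{Symbol,Any}()

# Two pieces of *reusable* code that (wrongly) set global configuration
# as a side effect of being loaded.
load_pkg_a!() = (GLOBAL_CONFIG[:float_type] = Float32; nothing)
load_pkg_b!() = (GLOBAL_CONFIG[:float_type] = Float64; nothing)

# A project that depends on both; the load order is an accident of how
# its dependencies happen to be listed.
function load_project!(order)
    empty!(GLOBAL_CONFIG)
    foreach(f -> f(), order)
    return GLOBAL_CONFIG[:float_type]
end

load_project!([load_pkg_a!, load_pkg_b!])  # Float64: package B silently wins
load_project!([load_pkg_b!, load_pkg_a!])  # Float32: package A silently wins

# The alternative: reusable code only *reads* the setting...
pkg_zeros(n) = zeros(GLOBAL_CONFIG[:float_type], n)

# ...and only top-level code *writes* it, exactly once.
function configure_project!(float_type::Type{<:AbstractFloat})
    empty!(GLOBAL_CONFIG)
    GLOBAL_CONFIG[:float_type] = float_type
    return nothing
end
```

Moving the assignment into the top-level code removes the ambiguity: there is exactly one place that decides, and the reusable code only reads the setting.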

It might also make sense to allow other sensible combinations of these three yes/no choices, but I think having terminology for the most common patterns is still helpful. One idea would be having “targets” and associating global runtime configuration with targets. Top-level code would then simply be a project with a “run” target. Other targets could include “test” in various incarnations. Not entirely sure how loading code fits in.
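One way the “targets” idea might be modeled, purely as a sketch: configuration hangs off named targets rather than off the project as a whole, and top-level code is just a project with a :run target. Target, Project, is_loadable, is_runnable and config_for are hypothetical names that do not exist in Pkg, and loading is represented by nothing more than an optional entry-point field, since how it fits in is left open above.

```julia
# Sketch of the "targets" idea; none of these names exist in Pkg.
struct Target
    name::Symbol                 # e.g. :run or :test
    config::Dict{String,Any}     # runtime configuration scoped to this target
end

struct Project
    name::String
    entry_point::Union{Nothing,String}  # file other code loads; nothing = not loadable
    targets::Vector{Target}
end

is_loadable(p::Project) = p.entry_point !== nothing
is_runnable(p::Project) = any(t -> t.name === :run, p.targets)

# Configuration is looked up per target, never for the project as a whole.
function config_for(p::Project, target::Symbol)
    i = findfirst(t -> t.name === target, p.targets)
    return i === nothing ? nothing : p.targets[i].config
end

# Reusable code: loadable, no :run target; its :test configuration is
# meant to apply only while its own tests run.
pkg = Project("Reusable", "src/Reusable.jl",
              [Target(:test, Dict{String,Any}("threads" => 4))])

# Top-level code: not loadable by others, with a :run target carrying config.
app = Project("App", nothing,
              [Target(:run, Dict{String,Any}("log_level" => "info"))])
```

In this model the three yes/no questions become properties of the data rather than separate kinds of thing, which keeps the package/project distinction as small as possible.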


The fact that no particular term comes easily to mind means that introducing terminology will be more confusing than helpful, at least in the beginning. And the closer the terms, the bigger the confusion. Imagine explaining to someone that some “package” is actually a “project”, which is nearly identical to a “package” yet doesn’t act as one, but if it’s a “top-level project”, then it can actually act as either a “project” or a “package”. Sure, that makes sense, but not the quality of sense the rest of Julia makes. There are already many terms which basically mean the same thing, a few of them stated earlier:

I don’t see three distinct concepts there, but some overlapping ones. I do see three distinct choices (the yes/no ones) of two distinct values (yes/no). Those are the real distinctions. And there may be more than three choices. The numbered cases are distinct combinations, not necessarily concepts. But then comes the Either case which implies that the values may be more than two (e.g. maybe/either/possibly/depends/irrelevant). The possible combinations can easily grow to dozens and who can predict which one of them will be sensible to acquire its own term.

Maybe you are onto something, but the effort of stressing a distinction while trying to keep it as small as possible sounds contradictory to me.

I’m not sure what your position is here then. Refuse to have terminology for standard distinctions like intended for reuse vs. not intended for reuse?