Distinguishing projects from packages

djsegal · November 7, 2016, 6:37am

old title: "Extending PkgDev.jl to Project Generation"

edit:

the title was changed to reflect what gets discussed.
however this starting post (below) remains unedited to highlight the question that provoked it

From PkgDev’s description,

PkgDev.jl provides a set of tools for a developer to create, maintain and register packages in Julia package

How could this tool be extended to generating projects?

This necessitates a distinction between projects and packages.

Projects being the code users write 99% of the time
Packages being conventional 3rd party libraries/addons/gems (whatever you want to call them).

stevengj · November 7, 2016, 1:53pm

In typical Julia usage, a project is often just a package that hasn’t been published. i.e. it’s not clear that we need a distinction.

Ward9250 · November 7, 2016, 2:24pm

I think that for a distinction of a project over a package we could turn to R and packrat, and say a project is a collection of data files and scripts, which are bundled with their own copy of packages at the correct versions (maybe like how Pkg3 is looking to include environments). Projects then form a self-contained reproducible package submitted as supplements to papers and deposited on data archives. The scripts in a project are often single use and work on the data and package versions provided in the project. Whereas with packages the code inside is usually of production quality, and any data included is typically for unit testing purposes.

StefanKarpinski · November 7, 2016, 2:35pm

There are a few meaningful distinctions.

global runtime configuration of dependencies (libraries & packages):

a package cannot since it is intended to be used by other projects
a standalone project can provide this since it’s the “end user”

a run target:

you generally need a way to run a project as a service or program – and this can/should be done in a standardized way
it doesn’t really make sense to run a package, although we may want to provide binary artifacts alongside packages and do so in a standardized way

a deploy target:

one often wants to deploy a project by placing all the necessary runtime dependencies in a place where they can be used without any other build-time artifacts
for a package, installation by the package manager is how you deploy it.

StefanKarpinski · November 7, 2016, 2:39pm

That seems like a very academic-centric definition. Most software projects are never published, nor do they come with any papers. Software projects provide one or more programs that are either run occasionally to accomplish a task in a limited amount of time or run continuously, cooperating to respond to ongoing requests. In other words, they are live systems that evolve over time, not artifacts.

Ward9250 · November 7, 2016, 2:52pm

Absolutely, in that case talking about ‘projects’ is insufficient. An academic end user of Julia will be thinking of what I described, whereas someone else would be thinking of the kind of software project you outline. I’m only speculating, but the original post states:

This necessitates a distinction between projects and packages.

Projects being the code users write 99% of the time
Packages being conventional 3rd party libraries/addons/gems (whatever you want to call them).

I think this indicates @djsegal maybe means something more like the academic sense of ‘project’, if “the code users write 99% of the time” is taken to mean most scripts written by people to just do some job they’ve been given. But again, this is not 100% precise, because ‘user’ could very well be an academic (Julia being a language with many users coming from Python, R, Matlab), but it could equally be a different kind of user. Perhaps @djsegal you could expand on what you mean by ‘project’?

StefanKarpinski · November 7, 2016, 3:06pm

The notion that academic software should be a single snapshot artifact strikes me as fairly unfortunate and increasingly outdated. The ideal output of an academic project should be a package that others can reuse and extend.

Ward9250 · November 7, 2016, 3:11pm

I agree, with the caveat that if you’re reviewing how someone did something, you want to be able to go back and use the same versions of things they used. Otherwise, if the project has been extended or a component updated, a difference between a reviewer’s result and the researcher’s result has to be classified as either a failure to reproduce or a difference caused by a change in software. Thankfully, Julia’s package manager makes this pretty easy, as you can specify specific versions as dependencies! And so for Julia, versioned packages do the job of packrat pretty well already.

StefanKarpinski · November 7, 2016, 3:15pm

Pkg3 will help considerably since you can specify a complete environment.

Ward9250 · November 7, 2016, 3:15pm

Yes! Looking forward to that very much

ChrisRackauckas · November 7, 2016, 5:30pm

Indeed, or as with the JuliaDiffEq universe common interface, an academic project could just be something that adds your research algorithm as a dispatch to solve(prob::AbstractODEProblem,MyAlg();kwargs), i.e. takes in an ODE in a common generic form and has your special research algorithm spit out a solution. Then make organizations able to accommodate people extending functions with their special algorithms, and have a goal for academic projects be to have a repo with testing enabled that plugs into an ecosystem like this.
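To make the dispatch idea concrete, here is a minimal sketch in Julia 1.x syntax. Everything in it is a hypothetical stand-in: the abstract types, SimpleODEProblem, and the forward Euler body of MyAlg are assumptions for illustration, not the actual JuliaDiffEq definitions. The point is only that "adding a research algorithm" amounts to adding one method to a generic function the ecosystem already owns.

```julia
# Hypothetical stand-ins for an ecosystem's common interface; these are NOT
# the real JuliaDiffEq definitions.
abstract type AbstractODEProblem end
abstract type AbstractODEAlgorithm end

# A generic scalar problem: du/dt = f(u, t) with u(t0) = u0 over tspan.
struct SimpleODEProblem{F} <: AbstractODEProblem
    f::F
    u0::Float64
    tspan::Tuple{Float64,Float64}
end

# The "research algorithm" (assumed): plain forward Euler as a placeholder.
struct MyAlg <: AbstractODEAlgorithm end

# The ecosystem owns the generic function; it has no methods yet.
function solve end

# Plugging in the algorithm is one new method, selected by dispatch on MyAlg.
function solve(prob::AbstractODEProblem, ::MyAlg; dt::Float64 = 0.01)
    t, u = prob.tspan[1], prob.u0
    # Stop half a step early so rounding in t does not add an extra step.
    while t < prob.tspan[2] - dt / 2
        u += dt * prob.f(u, t)
        t += dt
    end
    return u
end

# Usage: integrate du/dt = -u from t = 0 to t = 1 (exact answer: exp(-1)).
prob = SimpleODEProblem((u, t) -> -u, 1.0, (0.0, 1.0))
u_end = solve(prob, MyAlg(); dt = 0.001)
```

Callers only ever write solve(prob, SomeAlg()), so a package that contributes a new algorithm type and its method can be tested against the same problems as every other algorithm.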

Of course, there are many different ways to handle this, but I would like to see JuliaOpt (Optim.jl), JuliaML (I think it is), etc. go this direction as well. One of the things that drove me to Julia is that using two commands and putting your scripts in the right folder does this, so with barely any dev knowledge (and maybe a little help from your friendly neighborhood org) your “project” is now a tested, stable, and maintained (if there’s an active org with access, even if the original author leaves) part of the ecosystem.

Usually it’s pretty arrogant to tell people “no, you’ve been doing it all wrong before”, but in this case, it is correct that leaving random scripts that won’t work in 2 years around the web is not reproducible research, and it’s a shame if people don’t take the half hour (in Julia) to make it part of a maintained package for others to actually use.

djsegal · November 10, 2016, 10:31pm

Going with what @StefanKarpinski said, I believe:

Packages are modular, reusable chunks of code.
Projects are end-user codebases that can string together multiple packages using global configurations

By this, I mean:

Projects can include Packages
Packages can include Packages
Projects cannot include Projects
Packages cannot include Projects

Of course,

Projects can become Packages (many good packages do start this way)
and Packages can have a “dummy” Project inside them for testing purposes

What I’m looking for is to have packages and projects look nearly identical, except for their essential differences.

For example:

A config folder in Projects
A dummy folder in Packages

cortner · November 13, 2016, 12:52pm

This definitely doesn’t work, at least in the academic context: what if I develop two or more projects in parallel that depend on one another?

StefanKarpinski · November 13, 2016, 4:14pm

Then they should be packages – i.e. reusable code.

djsegal · November 13, 2016, 5:01pm

It sounds like you are describing microservices (i.e. projects that talk).

If anything, I think this decision will allow you to accomplish that in a more agreed-upon manner.

Then a follow-up, “what if the two projects share code?”. This is solved by:

extracting shared/core code into a package (or multiple packages)
and then importing the core package(s) into your projects
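A minimal sketch of that extraction, with every name hypothetical. Modules in one file stand in for what would really be separate codebases: SharedCore would live in its own package repository with its own tests, and ProjectA and ProjectB would be two independent projects.

```julia
# Hypothetical shared package holding the code both projects need.
module SharedCore
export normalize_scores

# Rescale a vector so that its entries sum to one.
normalize_scores(xs::Vector{Float64}) = xs ./ sum(xs)
end

# Two hypothetical projects: each loads the shared package, and neither
# loads the other.
module ProjectA
using ..SharedCore
report() = normalize_scores([1.0, 3.0])
end

module ProjectB
using ..SharedCore
report() = maximum(normalize_scores([2.0, 2.0, 4.0]))
end

a = ProjectA.report()
b = ProjectB.report()
```

The dependency arrows point only from the projects to the shared package, which is what keeps the "projects cannot include projects" rule workable.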

ChrisRackauckas · November 13, 2016, 7:49pm

And that’s the problem. If that’s what you’re calling a project, then I don’t think it should be shared. If you want to share it, you should make it a package. It’s dead simple (you don’t have to register it) and then you have tests to tell you when and why it works.

But the idea of just sharing scripts like people did with MATLAB was never a good idea, and I don’t think Julia should try to make it standard at all. In the end, any sharable analysis can and should be broken down into clearly defined and documented functions, a short script which ties together the functions (for which a simple case serves as a test, a more difficult case is an example notebook), and documentation. That’s a package.

akis · November 13, 2016, 9:53pm

Quoting djsegal (post 12):

Of course,

Projects can become Packages (many good packages do start this way)

and Packages can have a “dummy” Project inside them for testing purposes

What I’m looking for is to have packages and projects look nearly identical, except for their essential differences.

The above two-way look at “nearly identical” things reminds me of a Matryoshka doll, which to me is a strong indication against their distinction. As a consumer, I like the freedom to easily combine any available package of code, based solely on the functionality of its content, not on the intentions of its packager.

It is loosely similar to modules. Their creators export what they think is useful, while their users appreciate easy access to any code inside. Another way to look at it: at runtime a function is a single thing, no matter which module contains each of its methods.

I believe that Julia’s philosophy is to limit such “cannot include” distinctions as artificial barriers, whether lower/higher, internal/external, partial/complete etc.

StefanKarpinski · November 14, 2016, 12:49am

Whatever we choose to call them, there are three distinct concepts:

1. Reusable code:

provides mechanism for other code to load it: YES
can provide global runtime configuration: NO
meaningful to “run it” on its own: NO

2. Top-level code:

provides mechanism for other code to load it: NO
can provide global runtime configuration: YES
meaningful to “run it” on its own: YES

3. Either of the above.

Reusable code cannot provide global runtime configuration because if it does then different units of reusable code can provide conflicting configuration. We already call 1 a “package” and we’re going to keep on doing that. We can either call 2 “project” and call 3 “project or package” or we can call 3 “project” and call 2 “top-level project”. I’m not sure which is better, but trying to argue that there’s no distinction is not especially helpful since the distinction is real. At the same time, keeping the distinction as small as possible is a good idea. If you have a project and you want to turn it into a package, it should largely be a simple matter of deleting the global configuration and providing an appropriate entry-point for loading code.
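One way to picture the three yes/no properties is as fields of a small record. This is only an illustrative sketch: CodeUnit, classify, and to_package are hypothetical names, not part of any Julia package manager.

```julia
# Hypothetical record of the three yes/no properties listed above.
struct CodeUnit
    name::String
    loadable::Bool       # provides a mechanism for other code to load it
    global_config::Bool  # provides global runtime configuration
    runnable::Bool       # meaningful to "run it" on its own
end

# Reusable code: loadable, no global configuration, not run on its own.
is_reusable(u::CodeUnit) = u.loadable && !u.global_config && !u.runnable

# Top-level code: not loadable, may carry global configuration, runnable.
is_top_level(u::CodeUnit) = !u.loadable && u.global_config && u.runnable

function classify(u::CodeUnit)
    is_reusable(u) && return :package
    is_top_level(u) && return :top_level
    return :other  # any other combination of the three choices
end

# Turning a project into a package: delete the global configuration and
# provide an entry point for loading.
to_package(u::CodeUnit) = CodeUnit(u.name, true, false, false)

lib = CodeUnit("SolverLib", true, false, false)
app = CodeUnit("WebService", false, true, true)
```

Here classify(lib) gives :package, classify(app) gives :top_level, and classify(to_package(app)) gives :package, which mirrors how small the conversion is meant to be.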

It might also make sense to allow other sensible combinations of these three yes/no choices, but I think having terminology for the most common patterns is still helpful. One idea would be having “targets” and associating global runtime configuration with targets. Top-level code would then simply be a project with a “run” target. Other targets could include “test” in various incarnations. Not entirely sure how loading code fits in.

akis · November 14, 2016, 4:43am

The fact that no particular term comes easily to mind means that the introduction of terminology will be more confusing than helpful, at least in the beginning. And the closer the terms, the bigger the confusion. Imagine explaining to someone that some “package” is actually a “project”, which is nearly identical to being a “package”, still it doesn’t act as one, but if it’s a “top-level project”, then it can actually act as either a “project” or a “package”. Sure, that makes sense, but not the quality of sense the rest of Julia makes. There are already many terms which basically mean the same, a few of them stated earlier:

I don’t see three distinct concepts there, but some overlapping ones. I do see three distinct choices (the yes/no ones) of two distinct values (yes/no). Those are the real distinctions. And there may be more than three choices. The numbered cases are distinct combinations, not necessarily concepts. But then comes the Either case which implies that the values may be more than two (e.g. maybe/either/possibly/depends/irrelevant). The possible combinations can easily grow to dozens and who can predict which one of them will be sensible to acquire its own term.

Maybe you are onto something, but the effort of stressing a distinction, while trying to keep it as small as possible, sounds contradictory to me.

StefanKarpinski · November 14, 2016, 3:17pm

I’m not sure what your position is here then. Refuse to have terminology for standard distinctions like intended for reuse vs. not intended for reuse?
