RFC: REUSE compliance plugin for PkgTemplates.jl — API, docs, and root LICENSE behavior

I have a preview implementation of a Reuse plugin for PkgTemplates.jl and would appreciate testing and feedback.

Current PR:
Add REUSE plugin support by gwr-de · Pull Request #528 · JuliaCI/PkgTemplates.jl · GitHub

Documentation preview:

The plugin generates a REUSE-style licensing layout for new packages:

  • REUSE.toml in the project root,
  • required license and exception texts, verbatim, in LICENSES/,
  • support for SPDX license expressions such as MIT OR Apache-2.0 and GPL-3.0-or-later WITH Classpath-exception-2.0,
  • different license expressions for primary code, project artifacts, documentation, and documentation assets; if left at nothing, these currently fall back to "MIT", matching the current PkgTemplates.jl default,
  • support for custom LicenseRef-* licenses via a user-supplied directory; these may also be rendered as templates,
  • SPDX headers for generated files where appropriate,
  • optional README.md ## Licensing section,
  • optional reuse lint job in generated GitHub Actions workflows.

I am especially interested in feedback on:

  1. Is the public API understandable?
  2. Are the defaults reasonable for ordinary Julia packages?
  3. Is the distinction between the existing License plugin and the new Reuse plugin clear?
  4. Are the docs clear enough for users who know SPDX identifiers but have not used REUSE before?
  5. Should the plugin optionally generate a conventional root LICENSE file for compatibility with GitHub/JOSS/human expectations?
  6. Is vendoring an SPDX license-data snapshot an acceptable tradeoff for deterministic, clearly versioned, offline package generation, or should this be split out / reduced?

For the root LICENSE question (also see this issue), my current inclination is:

  • REUSE layout only as the default;
  • optional root LICENSE only for simple single-license cases such as MIT or Apache-2.0;
  • no full root LICENSE for compound expressions, expressions with exceptions, or LicenseRef-* cases, where the authoritative information should remain in REUSE.toml and LICENSES/.
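
To make the proposed rule concrete, here is a minimal sketch in Python (for brevity; it is not the plugin's Julia code). The function name `root_license_action` and the return values `"copy"` and `"skip"` are hypothetical and do not appear in the PR. It assumes a "simple" expression is a single SPDX identifier with no operators, no exception, and no LicenseRef-* prefix.

```python
import re

# Hypothetical helper for illustration only; not part of PkgTemplates.jl or the PR.
# A single SPDX identifier contains only letters, digits, '.', '+' and '-'.
_SINGLE_ID = re.compile(r"^[A-Za-z0-9.+-]+$")


def root_license_action(expression: str) -> str:
    """Return "copy" if a root LICENSE could be a verbatim copy of one
    file in LICENSES/, otherwise "skip"."""
    expr = expression.strip()
    if not _SINGLE_ID.match(expr):
        # Spaces or parentheses: a compound expression (OR/AND) or a WITH exception.
        return "skip"
    if expr.startswith("LicenseRef-"):
        # Custom licenses stay authoritative in REUSE.toml and LICENSES/ only.
        return "skip"
    return "copy"


for example in ["MIT", "MIT OR Apache-2.0", "LicenseRef-Custom"]:
    print(example, "->", root_license_action(example))
```

With this rule, MIT and Apache-2.0 would get a root LICENSE, while MIT OR Apache-2.0, GPL-3.0-or-later WITH Classpath-exception-2.0, and any LicenseRef-* would not.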

Testing with real package templates would be very helpful.

The source branch is visible from the PR; direct branch link if useful:
GitHub - bslMS/PkgTemplates.jl at feature/reuse-plugin · GitHub

As we talked about in the other thread, that’s not workable, and shouldn’t even be an option: all Julia packages (at least all registered ones) must have a main LICENSE file, independent of what is set up via REUSE. The template should ensure that the REUSE information for the .jl files in src and test matches that main LICENSE.

In particular, you can probably have MIT as the main LICENSE together with MIT OR Apache-2.0 in REUSE, but you cannot have a registered Julia package that is only licensed under GPL-3.0-or-later WITH Classpath-exception-2.0. That’s assuming that this modified license cannot be put in the main LICENSE file and be recognized as “OSI approved”.

Thanks, that is a useful distinction, but I would not want to make General-registry compatibility the only supported mode of the plugin.

Reuse is meant to generate REUSE-compliant licensing metadata for Julia projects in general. Not every Julia project generated with PkgTemplates.jl is necessarily intended for registration in General: there are private packages, internal packages, monorepo packages, application packages, teaching/research packages, and packages intended for other registries.

So I agree with the narrower point:

  • for General-registry-oriented templates, the generated project should have a conventional main LICENSE file;
  • the REUSE metadata for src/**/*.jl and test/**/*.jl should match the primary code license represented by that file.

But I would still like the plugin to support the broader REUSE case. In my view, the right design is not “always require a root LICENSE”, but rather to distinguish REUSE compliance from General registry compatibility.

Fair enough, but I would strongly suggest using “General registry compatible” as the default choice.

That is a good point. We could add a switch, with general_registry_compliance = true as the default. The plugin could then use the metadata in the snapshot (data/spdx-license-data/licenses.json) and expression parsing to fail, or at least warn, if the rule is violated, and generate the required root LICENSE as a copy from LICENSES/?

Out of curiosity, what about this PR then?
AutoMerge: support REUSE-compatible LICENSES directories by DilumAluthge-LLM · Pull Request #670 · JuliaRegistries/RegistryCI.jl · GitHub

I had believed that it might allow for multiple licenses in a LICENSES/ directory?

If that draft ever gets finished / merged / deployed in General, that would be an entirely different story.

I’m certainly not opposed to actual full REUSE support for registered packages, although it might be a bit non-trivial to actually verify that fine-grained licensing includes OSI-approved licenses for all files distributed via the package servers. Maybe @dilumaluthge can give more details about the status of that PR.

But isn’t that exactly where REUSE becomes especially useful?

  1. reuse lint makes sure that every file is covered by a license provided in LICENSES/, and that the contents of LICENSES/ match the licenses in use exactly (e.g., LICENSES/ cannot contain a license that no license expression uses),
  2. For OSI approval check one might take a strict stance:
    – do not allow any LicenseRef-... file,
    – do not allow any exception,
    – for all licenses, isOsiApproved must be true (a field in the SPDX-license-list-data metadata .json).

Since the snapshot copies files verbatim, you can also use hashes to check whether standard licenses have been modified. The REUSE FAQ strongly advises against any such modification.
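
A minimal sketch of this strict check and the hash comparison, written in Python for brevity rather than in the plugin's Julia. The names `strict_violations` and `is_unmodified` are hypothetical. The small `SPDX_LICENSES` list is a toy stand-in for the `licenses` array in licenses.json and shows only the two fields used here. Where reference hashes would come from is left open.

```python
import hashlib

# Toy stand-in for the "licenses" array of the SPDX licenses.json metadata;
# a real check would load the vendored snapshot instead.
SPDX_LICENSES = [
    {"licenseId": "MIT", "isOsiApproved": True},
    {"licenseId": "Apache-2.0", "isOsiApproved": True},
    {"licenseId": "CC-BY-4.0", "isOsiApproved": False},
]
OSI = {entry["licenseId"]: entry["isOsiApproved"] for entry in SPDX_LICENSES}


def strict_violations(license_ids, exception_ids=()):
    """List every reason the strict stance rejects the given identifiers."""
    problems = []
    for lid in license_ids:
        if lid.startswith("LicenseRef-"):
            problems.append(f"{lid}: LicenseRef-* is not allowed")
        elif not OSI.get(lid, False):
            # Unknown identifiers are treated like non-approved ones here.
            problems.append(f"{lid}: not OSI-approved")
    for eid in exception_ids:
        problems.append(f"{eid}: exceptions are not allowed")
    return problems


def is_unmodified(text: bytes, expected_sha256: str) -> bool:
    """Compare a license text against a known SHA-256 hex digest."""
    return hashlib.sha256(text).hexdigest() == expected_sha256


print(strict_violations(["MIT", "LicenseRef-Custom"], ["Classpath-exception-2.0"]))
```

An empty list means the strict rule is satisfied. Anything else names the offending license or exception, which could feed either an error or a warning.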

Yeah, that’s an option, but probably not what you’d actually want. The whole point of fine-grained licensing would be to allow for things like dual-licensing, including dual licensing with one OSI-compatible license and one non-OSI-compatible license. This comes up occasionally when government employees of certain countries maintain packages, where they are legally obligated to put their work in the public domain in certain jurisdictions, and sometimes that “public domain” is enforced with a specific non-OSI-compatible license text. In order for such a package to be registered, it must be dual-licensed. A full REUSE-aware registrator would have to pull all of that apart, and check that all relevant files are covered under at least one OSI-compatible license, but allow for additional non-compatible licenses. That’s where things might become “non-trivial”.

So something like:

SPDX-License-Identifier: MIT OR LicenseRef-Gov-Public-Domain

is what you are saying? (Here my simple check above would not work.)

That seems mechanically checkable once the SPDX expression has been parsed into a normalized expression.

For a first conservative RegistryCI rule, one could evaluate each file’s effective SPDX expression as a Boolean expression:

  • standard SPDX license with isOsiApproved == true → true;
  • non-OSI license or LicenseRef-* → false;
  • OR → Boolean OR;
  • AND → Boolean AND;
  • WITH exceptions could initially be treated as false, or later handled by a separate allow-list / known-SPDX-exception policy.

Then MIT OR LicenseRef-X evaluates to true, while MIT AND LicenseRef-X evaluates to false.

So the rule would not have to be “all mentioned licenses must be OSI-approved”. It could instead be: every distributed file must have at least one complete OSI-approved licensing path.
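
The Boolean rule above can be sketched as a small recursive-descent evaluator. This is Python for brevity and is not the AST tooling from the PR. The function name `has_osi_path` and the `OSI_APPROVED` set are hypothetical; the set stands in for a lookup of isOsiApproved in the SPDX metadata. The sketch assumes uppercase operators and the usual SPDX precedence (WITH binds tighter than AND, which binds tighter than OR), and it takes the conservative option of treating any WITH exception as false.

```python
import re

# Toy stand-in for "isOsiApproved == true" in the SPDX metadata (assumption).
OSI_APPROVED = {"MIT", "Apache-2.0", "GPL-3.0-or-later"}

TOKEN = re.compile(r"\(|\)|[A-Za-z0-9.+:-]+")
KEYWORDS = {"AND", "OR", "WITH"}


def has_osi_path(expression, approved=OSI_APPROVED):
    """True if the expression has at least one complete OSI-approved path."""
    if TOKEN.sub("", expression).strip():
        raise ValueError("unexpected characters in expression")
    tokens = TOKEN.findall(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "OR":
            take()
            right = parse_and()  # parse first so tokens are always consumed
            value = value or right
        return value

    def parse_and():
        value = parse_atom()
        while peek() == "AND":
            take()
            right = parse_atom()
            value = value and right
        return value

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok == ")" or tok in KEYWORDS:
            raise ValueError(f"unexpected token {tok!r}")
        value = tok in approved  # LicenseRef-* and non-OSI ids are False
        if peek() == "WITH":
            take()
            take()  # exception identifier, ignored
            value = False  # conservative first rule: exceptions count as False
        return value

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result


print(has_osi_path("MIT OR LicenseRef-X"))   # True
print(has_osi_path("MIT AND LicenseRef-X"))  # False
```

Replacing the `value = False` line for WITH with an allow-list lookup would be the later refinement mentioned in the list.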

(The plugin tooling returns an SPDX AST for SPDX license expressions…).

Yes, exactly

It’s definitely possible, and I think it would be neat to have… but all of that is what’s approaching “non-trivial” in my book, compared to checking a single LICENSE file. Someone “just” has to implement that. I haven’t looked at the existing draft PR in much detail, but I don’t think it covers much ground yet.

Yes, agreed: the full RegistryCI problem is more than expression parsing.

What I wanted to point out is narrower: the “is there at least one OSI-approved licensing path?” part becomes quite tractable once the effective SPDX expression for a file has been resolved.

The PR already contains SPDX expression tooling for the Reuse plugin, because the plugin needs to parse expressions, collect referenced licenses/exceptions, and normalize them. Internally this is represented as an AST, not just as a string. That AST could support the Boolean-style policy check sketched above.

So I would see the split roughly as:

  1. REUSE resolution: determine the effective license expression for each distributed file.
  2. SPDX expression analysis: parse/evaluate the expression against an OSI-approval predicate.
  3. Registry policy: decide what General accepts, e.g. root LICENSE, LicenseRef-*, exceptions, generated files, etc.

My PR mostly covers (2) for package generation purposes, and (1) partially insofar as it generates the metadata. It does not claim to solve the full RegistryCI side, but perhaps the SPDX expression support could be useful if such a checker is developed?
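
To illustrate step (1) only, here is a rough sketch in Python, not the plugin's Julia and not what the reuse tool does internally. The annotation list mimics path/license pairs as they might appear in a REUSE.toml. The names `ANNOTATIONS`, `effective_license`, and `unresolved` are hypothetical. The sketch assumes that the last matching annotation wins, and it ignores in-file SPDX headers, .license sidecar files, and precedence settings, all of which a real resolver would have to handle.

```python
from fnmatch import fnmatchcase

# Hypothetical annotations, ordered as they would appear in a REUSE.toml.
ANNOTATIONS = [
    {"path": "src/**", "license": "MIT"},
    {"path": "test/**", "license": "MIT"},
    {"path": "docs/**", "license": "CC-BY-4.0"},
    {"path": "docs/src/assets/**", "license": "CC0-1.0"},
]


def effective_license(path, annotations=ANNOTATIONS):
    """Return the license expression of the last annotation matching path."""
    result = None
    for annotation in annotations:
        # fnmatchcase lets '*' match '/' as well, so 'docs/**' covers subfolders.
        if fnmatchcase(path, annotation["path"]):
            result = annotation["license"]
    return result


def unresolved(paths, annotations=ANNOTATIONS):
    """Files with no licensing information; a registry policy would reject these."""
    return [p for p in paths if effective_license(p, annotations) is None]


files = ["src/Foo.jl", "docs/src/index.md", "docs/src/assets/logo.svg", "Project.toml"]
for f in files:
    print(f, "->", effective_license(f))
print("unresolved:", unresolved(files))
```

The resolved expression per file is then the input to step (2), and the list of unresolved files is one of the things step (3) would have to decide on.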