RFC: REUSE compliance plugin for PkgTemplates.jl — API, docs, and root LICENSE behavior

Yes, I can understand that and that’s why I will add a short section about how to reach General registry acceptance in the documentation. Note that the License plugin for PkgTemplates.jl does allow you to provide some file as a license and that file will override anything else.

I believe there should be symmetrical requirements?

Yes, and I can see a bit why. It is not widely advertised, people get the impression that it is more of a nuisance than a help, and GitHub doesn’t really bother to support it.

But then — at least here in Europe — people might wonder whether it must be GitHub and companies might start to appreciate something like an SBOM provided upon entering a single command in the CLI?

There’s nothing wrong with giving people the option to use whatever LICENSE file they want. But the default of the License plugin is MIT, which is very registration-friendly. I’m just saying it helps if the defaults allow a package to be registered without further hoops to jump through.

It’s not a bad idea to have a “main” license.

I think they kinda should be able to treat the root LICENSE as the “authoritative” project license. It’s a good idea to have all the code covered by a relatively uniform license. And if documentation materials are covered under a different license like Creative Commons (which I agree is a good idea), those licenses should be relatively “compatible” in spirit, so that users aren’t going to be overly surprised by mismatches between the root LICENSE and whatever fine-grained licensing might exist in the repo.

Reuse()

And all of a package’s licensing is going to be MIT.

If you do not intend to re-distribute the entire repository under the terms of a single LICENSE.*, you shouldn’t invent one that’s inaccurate. Similarly, if you do not intend to accept all contributions to the repository under the terms of the single LICENSE.*, you shouldn’t invent one.

GitHub says contributions are under “the repository license” (or some other affirmative action I’ve taken that supersedes it, like signing a CLA). If there’s both a LICENSE.md with just one license and a REUSE.toml with multiple LICENSES/*, I would not know how to interpret this. Especially since GitHub itself will happily annotate the repository if the LICENSE.md file matches some known license.

It’s not just GitHub, it’s also SBOM tools, SPDX, and such. They all generally look to a single value for every package in a manifest. Worse than a General-incompatibility is the wrong license.

This is an optional package that tries to do a good job at providing full REUSE compliance in package setups right from the start. It is totally flexible and allows you to go from “everything is licensed MIT” to very dedicated setups, which allow you to mention trademarks for marketing material and branding or to ship dual licenses etc.

You do not have to use it—it’s an offer. :wink:

Sorry, I am not getting what you are saying here. Do you mean LicenseRef-... licenses? The Reuse plugin currently accepts only these as “custom”; all other licenses are verbatim copies from SPDX-license-list-data (you currently cannot generate a package with a deprecated license).

Can you explain what you mean, please?

This sounds like you’re considering generating repositories that, e.g., have a REUSE.toml that says docs are CC-BY-4.0 and src is MIT, while simultaneously having a straightforward MIT LICENSE.md that contains no additional information about reuse. If I’ve misunderstood that, then my comment is irrelevant.

Sure, but that still misses how code may move around. Let’s say I copy code files verbatim (whole files rather than snippets, for simplicity) that you have in your GitHub repo. The SPDX headers for those files will clearly state, say, SPDX-License-Identifier: EUPL-1.2-or-later.

You can write your package with that code and provide your code with a compatible license, say AGPL-3.0-only.

You will still need to ship both licenses according to REUSE, and it is fully OK to say that “THE” code is now AGPL-3.0-only. But the code files that were copied will still remain EUPL-1.2-or-later.

It can be more complicated than what you are saying here. And REUSE helps with all of that. No more, no less.

No, that’s exactly my point. Repositories that manage multiple licenses are complicated, and it’s a feature (not a bug!) that license scanners (and General and GitHub itself) will fail to classify licenses like LICENSE.md in JuliaLang/SecurityAdvisories.jl into a single value.

That’s not how I build the REUSE.toml. Your definitions overlap, i.e., "**" should also cover "docs/**". While it may work, I would not consider this a proper REUSE setup.

Whenever possible, there should be SPDX headers on a per-file basis. In my default setup, this is done for .jl files in src/, benchmark/, and test/. Documentation and files like Project.toml are covered by REUSE.toml.
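To make the per-file idea concrete, here is a minimal sketch of what stamping a source file with SPDX comment lines could look like. It is not the plugin's code: the function name `add_spdx_header` and the sample author are hypothetical, and it only assumes the `#` comment style used in .jl files.

```python
def add_spdx_header(source: str, year: str, authors: str, license_id: str) -> str:
    """Prepend REUSE-style SPDX comment lines unless the text already has them."""
    if "SPDX-License-Identifier:" in source:
        # Leave existing attribution untouched (e.g. a file copied from elsewhere).
        return source
    header = (
        f"# SPDX-FileCopyrightText: {year} {authors}\n"
        f"# SPDX-License-Identifier: {license_id}\n"
        "\n"
    )
    return header + source


# Hypothetical sample input: a tiny Julia module.
stamped = add_spdx_header("module Foo\nend\n", "2025", "Jane Doe", "MIT")
print(stamped)
```

Skipping files that already carry an identifier matters for the copied-file case discussed above: a file that arrived with its own header keeps that attribution.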

You can see the standard template below (you may provide your own file for the plugin).

# REUSE.toml
# Part of {{{PKG}}}

version = 1

# project artifacts
[[annotations]]
path = [
    ".gitattributes",
    ".gitignore",
    ".github/**",
    ".vscode/**",
    ".JuliaFormatter.toml",
    "Project.toml",
    "Manifest.toml",
    "docs/make.jl",
    "docs/Project.toml"
]
precedence = "closest"
SPDX-FileCopyrightText = "{{{YEAR}}} {{{AUTHORS}}}"
SPDX-License-Identifier = "{{{ARTIFACT_LICENSE}}}"

# documentation assets
[[annotations]]
path = [
    "docs/src/assets/**",
]
precedence = "closest"
SPDX-FileCopyrightText = "{{{YEAR}}} {{{AUTHORS}}}"
SPDX-License-Identifier = "{{{DOCS_ASSETS_LICENSE}}}"

# documentation source texts
[[annotations]]
path = [
    "{{{README}}}",
    "docs/*.md",
    "docs/src/**/*.md",
]
precedence = "closest"
SPDX-FileCopyrightText = "{{{YEAR}}} {{{AUTHORS}}}"
SPDX-License-Identifier = "{{{DOCS_LICENSE}}}"
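For readers unfamiliar with the triple-brace placeholders in the template, the following sketch shows the substitution idea with plain string replacement. It is only an illustration under stated assumptions: the `render` helper and the sample values are hypothetical, and the plugin's actual rendering is not shown here.

```python
import re


def render(template: str, values: dict) -> str:
    """Replace every {{{NAME}}} placeholder; fail loudly on unknown names."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder {name}")
        return values[name]

    return re.sub(r"\{\{\{([A-Z_]+)\}\}\}", substitute, template)


# Two lines taken from the documentation-texts annotation of the template.
snippet = (
    'SPDX-FileCopyrightText = "{{{YEAR}}} {{{AUTHORS}}}"\n'
    'SPDX-License-Identifier = "{{{DOCS_LICENSE}}}"'
)
rendered = render(
    snippet,
    {"YEAR": "2025", "AUTHORS": "Jane Doe", "DOCS_LICENSE": "CC-BY-4.0"},
)
print(rendered)
```

Raising on an unknown placeholder is a design choice of this sketch, so that a custom template with a misspelled name does not silently produce an invalid REUSE.toml.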

You’re missing my point totally; I’ve edited the post to remove the shoddy example toml.

I am still not getting this? The default is root_license=false and there will be no single LICENSE or LICENSE.md that may be invalid.

The reuse tool has no issue at all assigning the licenses in LICENSES/ to each and every file. By default, it ignores a LICENSE file.

If you set root_license=true, then the plugin (after implementing this add-on) will copy the license given as an SPDX License Expression. For this to work, that expression must be a single valid SPDX license (no WITH exception, no LicenseRef-..., and no A OR B compound statement). The license text is copied verbatim from the licenses published by SPDX. So you can only mess up after package generation. :wink:
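The restriction described here can be sketched as a small check. This is an illustration, not the plugin's implementation: `is_single_spdx_license` is a hypothetical name, the character set is an assumption, and the sketch does not verify that the identifier actually exists on the SPDX license list.

```python
import re

# Assumption: a plain SPDX license identifier consists only of letters,
# digits, '.' and '-'. Spaces or parentheses indicate a compound expression.
_PLAIN_ID = re.compile(r"^[A-Za-z0-9.\-]+$")


def is_single_spdx_license(expression: str) -> bool:
    """True if the expression is one plain identifier usable for a root LICENSE."""
    expr = expression.strip()
    if not _PLAIN_ID.match(expr):
        return False  # e.g. "A OR B", "A AND B", "A WITH exception"
    if expr.startswith("LicenseRef-"):
        return False  # custom licenses have no published SPDX text to copy
    return True


for candidate in ["MIT", "MIT OR Apache-2.0", "LicenseRef-Custom"]:
    print(candidate, is_single_spdx_license(candidate))
```

A real implementation would additionally look the identifier up in SPDX-license-list-data before copying the text.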

Again, you will only have a LICENSE text file if root_license=true (the default is false), and it will match the license expression given for the primary code. This is only done because we cannot make REUSE compatible with the General registry without it.

I would not have included a text as given above. And I am doing otherwise now because of the discussion here. What am I missing out on?

@mbauman Have I understood you now and clarified the issue that you raised?

I fully agree with what you wrote in post #18 above! You’re not missing anything; I’m just a different person than goerz and not a registry maintainer. I sympathize with the fact that a complicated licensing setup will incur speed bumps and maintainer burdens during registration. But my answer is to avoid complicated licensing setups if at all possible. PkgTemplates is all about setting folks on a happy path, and complicated licensing setups are not a happy path, in my very opinionated personal opinion.

:sweat_smile:

Good to know! And I agree exactly: There should be no ambiguity and that’s one part where REUSE excels imho. Every single file will have a clear license attribution that can be seen using reuse spdx—if the project passes reuse lint.

This may be a “collection” of totally unrelated licenses and even “custom” licenses, but that’s not what we typically want!

We want to make it easier for people to use things we provide. For example, documentation. Since that will be copied and consumed differently than code, using something like CC-BY-SA-4.0 makes sense for this.

Having attribution be done on a per file level is not meant to lead to madness and disorder. It just should protect that file from losing that attribution simply because someone just copied it manually.

Personally, I would very much prefer to allow for pure REUSE acceptance in General. But that’s not up to me…

That’s certainly what I have in mind! (Or, beyond this exact spec, whatever expresses “MIT for code, CC for non-code”, probably with file-by-file headers.) Independent of this particular plugin, I think that’s more than a sensible thing to do. People gloss over the fact that a code license like MIT just doesn’t make sense for documentation materials which are not source code. People should do this kind of licensing if they don’t just want to ignore the copyright for the non-code files of the repo (which is also a pretty sensible thing to do for small projects).

However, as you point out, lots of tooling (including the General registry) require a “repository license”. As long as that license also matches the fine-grained license for (approximately all of) the code, and as long as the fine-grained licenses for non-code aren’t surprising, I don’t think that’s too much of a problem.

If you want to play things really safe, or if the fine-grained licensing is surprising with respect to the main LICENSE, then you might just have to have all contributors sign a CLA acknowledging the fine-grained licensing.

But generally, I think it’s okay (IANAL, obviously).

There should always be a section in the README that explains this. Would you be confused by the licensing in GRAPE.jl, where I’ve played around with REUSE (MIT for main code, CC-BY-4.0 for docs, and public-domain for some other files like *.toml, and CI.yml)?

At the end of the day, though, right now there is no way not to have a main LICENSE file for registered packages. So we have to work around that fact in some sensible way, which, I agree, requires bending one’s mind a bit around the potential for “ambiguity”. But the law – even copyright law – is supposed to be reasonable. As long as you don’t go out of your way to create confusion between REUSE and a repo-wide license, I think you’re okay.

I don’t disagree. There is a tension between fine-grained licensing and requiring a single LICENSE. As I’ve been trying to argue here, applying common sense should for the most part resolve that tension, but if someone wants to revamp the General registry license checks to allow for REUSE without a main LICENSE (sounds like a job for Claude, maybe?), that would change and potentially clarify things. At least as far as the registry is concerned. There are probably plenty of other tools still built around the expectation of a single “repository license”, so realistically we might just have to be okay with the idea of a “main license”, even in the presence of fine-grained licensing.

Are you guys open to something like that? Given that the reuse tooling is available for CI use, once it’s clear that a package is REUSE-compliant, things should be doable with SPDX expression parsing after collecting the reuse spdx results, which should output the license expressions for all files (these might even be collected according to their paths…).
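A rough sketch of the collection step, assuming the reuse spdx output is an SPDX tag-value document in which each file entry starts with a FileName line followed by LicenseInfoInFile lines. The function names and the sample text are hypothetical, and nothing here is existing registry tooling.

```python
from collections import defaultdict


def licenses_by_file(spdx_text: str) -> dict:
    """Map each FileName to the LicenseInfoInFile values that follow it."""
    result = defaultdict(list)
    current = None
    for line in spdx_text.splitlines():
        tag, sep, value = line.partition(":")
        if not sep:
            continue  # not a tag-value line
        tag, value = tag.strip(), value.strip()
        if tag == "FileName":
            current = value
            result[current]  # register the file even if no license line follows
        elif tag == "LicenseInfoInFile" and current is not None:
            result[current].append(value)
    return dict(result)


def licenses_by_top_level_dir(per_file: dict) -> dict:
    """Group license identifiers by the first path component ('.' for root files)."""
    grouped = defaultdict(set)
    for path, ids in per_file.items():
        clean = path[2:] if path.startswith("./") else path
        parts = clean.split("/")
        top = parts[0] if len(parts) > 1 else "."
        grouped[top].update(ids)
    return {key: sorted(ids) for key, ids in grouped.items()}


# Hypothetical excerpt; real output contains more fields per file.
SAMPLE = """\
FileName: ./src/Foo.jl
LicenseInfoInFile: MIT
FileName: ./docs/src/index.md
LicenseInfoInFile: CC-BY-4.0
FileName: ./Project.toml
LicenseInfoInFile: MIT
"""

per_file = licenses_by_file(SAMPLE)
print(licenses_by_top_level_dir(per_file))
```

With the licenses grouped by path, a check could require that everything under src/ resolves to one registry-acceptable license while treating docs/ separately.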

In the end, something like REUSE should not introduce more maintenance burden, but instead help with making things just more transparent and clear.

What about the strict_foss (or strict_open_source) idea? I know that you currently only check code. But if we know about documentation, specs, or data licensing as well, wouldn’t it make sense to at least complain if CC-BY-NC-4.0 is mixed with, say, an MIT license?
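As a sketch of what such a complaint could look like: the check below flags Creative Commons identifiers with an NC or ND component among the licenses found in a repository. The name `non_open_licenses` is hypothetical, no strict_foss option exists in the tooling discussed here, and treating ND variants the same as NC is an assumption of this sketch.

```python
def non_open_licenses(license_ids) -> list:
    """Return Creative Commons NC/ND identifiers found among the given licenses."""
    flagged = []
    for license_id in sorted(set(license_ids)):
        parts = license_id.split("-")
        # "CC-BY-NC-4.0" -> ["CC", "BY", "NC", "4.0"]; "CC0-1.0" starts with "CC0".
        if parts[0] == "CC" and ("NC" in parts or "ND" in parts):
            flagged.append(license_id)
    return flagged


found = non_open_licenses(["MIT", "CC-BY-NC-4.0", "CC-BY-4.0"])
if found:
    print("non-open licenses mixed into the package:", ", ".join(found))
```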

I would be, but I’m not really that close with the actual registry tooling (as in: I wouldn’t be the one reviewing that PR). @dilumaluthge or @GunnarFarneback would probably be better to give you an initial opinion on that.

Registry tooling moves very slowly though, so even if something like that gets merged, it can be quite a while before it actually gets deployed. Understandably, we don’t want to screw up something as central to the ecosystem as package registration.