Copyright and license of Julia packages

Well, I use two lines now:

# Copyright (c) 2020, 2021, 2022, 2024 Uwe Fechner and Bart van de Lint
# SPDX-License-Identifier: MIT

Let's see if our data steward is happy with it.

4 Likes

Sometimes it is tempting to copy a file rather than adding a dependency, especially in ecosystems lacking convenient package managers — for example, JuliaLang/julia itself contains several C files that were copied from other projects (with copyright headers).

Julia’s package manager reduces the temptation to copy code from other projects, but I’m sure it still happens from time to time when people don’t want to add a dependency for one little thing. And it is even more likely in cases where you want code from a project that is defunct and unmaintained. Hopefully people are careful to attribute such code and copy the license info, but historically many software developers are unsophisticated about copyright.

Adding a header to each file makes it a bit less likely to lose copyright info in such scenarios. Unfortunately, I suspect it’s less effective in a language like Julia where “file scope” is not significant. If people just copy snippets of a file, they are more likely to lose a copyright header.

4 Likes

As part of my previous thinking around copyright, I came around to the value of per-file “hey, I’m part of a project using license X” statements, as promoted by efforts like https://reuse.software.

The main reason against it IMO is the hassle, so I once again ignored xkcd's "Is It Worth the Time?" and wrote tec/headlice ("Automatic license headers, and other licensing utilities", on Code by TEC), which handles it entirely in the background.

4 Likes

Google has a suggested header for source files: Non-Apache License Text Headers  |  Google Open Source

We do this for all packages in jump-dev:

2 Likes

The university wants us to use this tool: Tutorial: How to become REUSE-compliant | REUSE

I am not so happy about it, because it means we have to attach a license to each and every file, even Project.toml or .gitignore. But I just did it for my main project, KiteModels.jl, and I hope they are happy now.

Julia itself uses a one-line license header:

# This file is a part of Julia. License is MIT: https://julialang.org/license

Having a ton of boilerplate in each file seems very excessive, so we went for the smallest header we could and use a link that redirects to the actual license file (in case it moves).

3 Likes

So, presumably, that tells you exactly how they measure compliance: you have to make reuse lint pass. That gives you a chance to keep things sane while fulfilling the requirements.
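To get a rough idea of how far a package is from passing before installing anything, a quick check along these lines might help. This is only a sketch: `files_missing_spdx` is a made-up helper name, it looks only at `.jl` files, and it checks only for the license tag in the first few lines, which is far less than what `reuse lint` actually verifies.

```julia
# Hypothetical helper (not part of the REUSE tooling): list Julia source files
# under `root` that have no SPDX license tag within their first `nlines` lines.
function files_missing_spdx(root::AbstractString; nlines::Integer = 5)
    missing_files = String[]
    for (dir, _, files) in walkdir(root)
        for file in files
            endswith(file, ".jl") || continue
            path = joinpath(dir, file)
            # Only inspect the top of the file, where a header would live.
            has_tag = open(path) do io
                any(line -> occursin("SPDX-License-Identifier:", line),
                    Iterators.take(eachline(io), nlines))
            end
            has_tag || push!(missing_files, path)
        end
    end
    return sort(missing_files)
end
```

Calling `files_missing_spdx("src")` from the package root would then return the paths that still need a header.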

Those REUSE tags don’t seem to be terrible (two comment lines per file). I very much liked the blog post How and why to properly write copyright statements in your code - … and probably more than what you ever wanted to know about them · Hook’s Humble Homepage linked in the README of @tecosaur’s tool. That really addresses a lot of my concerns, and gives clear information about details I would have been quite unclear on, like whether you need to bump the year every January 1st (you do not).

The idea that a .gitignore file is copyrightable at all is pretty bizarre. In any case, though, I would probably cover these files with a REUSE.toml file. That seems like a much cleaner option than adding a bunch of files like .gitignore.license files to your repo. And the TOML format seems like it allows you to define meaningful wildcards to keep the overhead pretty manageable. I think it would be pretty easy to add the two-line SPDX header to every actual source code file, cover the rest of the files via a REUSE.toml file, and thus make the reuse lint tool happy, and your university’s compliance officer, by extension.
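For illustration, a REUSE.toml along those lines might look roughly like this. It is a sketch based on my reading of the REUSE specification; the copyright holder, the year, and the path globs are placeholders and not taken from any real project.

```toml
version = 1

# Files that cannot (or should not) carry a comment header.
# Holder, year, and globs below are placeholders.
[[annotations]]
path = [".gitignore", "Project.toml", "docs/src/assets/**"]
SPDX-FileCopyrightText = "2024 Jane Doe"
SPDX-License-Identifier = "MIT"
```

As far as I understand, `reuse lint` also expects the full license text to live under `LICENSES/` (for example `LICENSES/MIT.txt`), so that directory would be needed in addition to this file.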

As for the general point:

To be clear, I do that all the time. And then, yes, I usually add comments to point back to the original project, even on a function-level like in

I might start using the SPDX snippet syntax for that. It seems like a very thought-out and standardized system.
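For reference, a snippet-level annotation in a Julia file could look something like this. The tag names are the ones I find in the REUSE specification; the function, the author, and the year are invented purely for illustration.

```julia
# The author and year below are hypothetical placeholders.
# SPDX-SnippetBegin
# SPDX-SnippetCopyrightText: 2023 Jane Doe
# SPDX-License-Identifier: MIT

# Clamp a real number to the unit interval (illustrative vendored function).
function clamp01(x::Real)
    return min(max(x, zero(x)), one(x))
end
# SPDX-SnippetEnd
```

Everything between the begin and end markers is then attributed to the snippet's own copyright holder and license, independent of the header at the top of the file.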

I suppose that’s true. My point was that I never had any trouble identifying the original license when I vendored third-party code. Referencing the original license information is on me, in any case, but I can’t argue with the idea that the original files all containing minimal license headers makes it more likely that the information isn’t lost accidentally.

The minimal SPDX headers seem pretty reasonable to this end, and the only real concern would be the overhead of adding them, i.e., the requirement to set up some tooling. I don’t think it’s a bad idea, if someone wants to do it, and https://reuse.software provides all the building blocks. Putting a multi-paragraph license text at the top of each file, as the GPL asks for, is still something I would consider absurd in today’s world.

3 Likes

I can think of three scenarios where licenses matter:

  1. Users: You want to make the life of well-meaning users easier, because unless your code has a license, they cannot (legally) use it. In this case, a single LICENSE.md in the root should be sufficient.

  2. Protection: You want to protect your rights. In this case I understand the theoretical argument of having a statement in each file, but in practice, given that most Julia packages have a permissive, MIT-like license, why would anyone steal your code? Copying code from a package is absolutely unnecessary and at the same time the perfect footgun, because you will miss out on unit tests, bug fixes, updates, etc. It is really a crime with a built-in punishment. And let’s be frank, if politely asking someone to remove your copyrighted code they used in a way not permitted by the license doesn’t work, are you willing to go to court, possibly in another country? Really?

  3. Compliance: This case is simple, because unless you want to fight the system, you will have to do as they say.

This topic is confusing because it is really about (3), but that was not clear from the start, so (1) and (2) are mixed in. But in this context they are not relevant.

3 Likes

While all you say is true, I found it helpful to have these per file copyrights. For example, I found out that the copyright for one 3D CAD file that I use is with my prof. So whoever wants to reuse this file should ask my prof and not me (well, or not ask him, but at least give reference to him as author). And one of my students prefers the MPL-2.0 to the MIT license. Not sure if you can easily mix different licenses in one Julia package, but these licenses are compatible so we just use both in the package.

I think that the right level of granularity for licensing Julia code in packages is… packages :wink: Preferably all code in the package should have the same license (may be multiple compatible licenses if preferred, but they should really apply to all code).

Otherwise, you are setting yourself up for a nightmare, eg if you refactor and move code between files (which happens frequently), do you keep track of each snippet?

If package authors cannot agree on a single license, they should consider packaging the code separately. Julia makes it easy (and, in some cases, preferable) to have small, focused packages.

1 Like

Well, we have multiple models in one package, and I will not change that unless I have to. I am already in charge of 10 packages and do not want to increase the number unless really needed.
The different models are in different files and have different authors, so I do not see a problem with having two different licenses for the different models.

This thread has actually changed my opinion somewhat, and I’m strongly considering adopting https://reuse.software/ even outside of any formal compliance requirements in my organization. The per-file copyright management seems like a good idea. I want my code to be MIT licensed. But code licenses aren’t actually appropriate for non-code material (and vice versa). So I would in fact want the .md files, or images in the documentation to be licensed under CC BY-SA (or maybe CC BY, haven’t made up my mind yet), and “infrastructure files” such as a Makefile, build scripts, etc. should probably be in the public domain (CC0).
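To make that concrete, here is a sketch of how such a split might be expressed in a REUSE.toml. The paths, the holder, and the year are placeholders, and the license versions (CC BY-SA 4.0, CC0 1.0) are my assumption since I have not settled on them.

```toml
version = 1

# Source code: MIT
[[annotations]]
path = ["src/**", "test/**"]
SPDX-FileCopyrightText = "2024 Jane Doe"
SPDX-License-Identifier = "MIT"

# Documentation text and images: CC BY-SA (version 4.0 assumed)
[[annotations]]
path = ["docs/**", "README.md"]
SPDX-FileCopyrightText = "2024 Jane Doe"
SPDX-License-Identifier = "CC-BY-SA-4.0"

# Infrastructure files: public domain dedication
[[annotations]]
path = ["Makefile", ".gitignore", ".github/**"]
SPDX-FileCopyrightText = "2024 Jane Doe"
SPDX-License-Identifier = "CC0-1.0"
```

If the source files already carry their own two-line headers, the first block would presumably be redundant and could be dropped.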

What was holding me back was lack of tooling to manage all that, but REUSE provides exactly that in what seems to be a well-thought-out system.

I agree though that intermixing multiple code licenses like MIT and MPL seems like a bad idea.

What I don’t understand is the legal basis for SPDX-FileCopyrightText: being a copyright notice. I understand that they have an ISO standard, but that does not make it law. AFAIK the law requires “Copyright” or “©”.

Frankly, I also don’t see the use case for REUSE from a FOSS developer perspective. I fully understand that it makes compliance people happy, because they can now Audit something with a tool and little green checkmarks come up if it is OK, and now they can say that they have Audited The Project with a Dedicated Tool, and All Is In Order and Everything is Compliant, etc.

But that information was already in the LICENSE file. This just adds red tape to the process for no clear reason. As long as it is optional, that is fine with me, but I am concerned that it will become a requirement by habit.

3 Likes

This is answered in How and why to properly write copyright statements in your code - … and probably more than what you ever wanted to know about them · Hook’s Humble Homepage as well as in the REUSE FAQ. In short, it’s a good idea to add ©, including in the SPDX-FileCopyrightText, but it doesn’t make the copyright any more or less valid, since copyright is automatic (at least since 1989).

But that information was already in the LICENSE file.

If the LICENSE file can cover all files in the repo (which REUSE argues it cannot), then yes.

little green checkmarks come up if it is OK

I gotta admit I’m a sucker for little green checkmarks myself :wink:

I have read that multiple times and I don’t find the argument convincing. Actually, I am having a hard time even reconstructing an argument, they just keep repeating that they think every file should have a license statement, but it is unclear why.

Just to be specific, I think that the widespread convention that the nearest LICENSE file you find by going through parent directories is the effective one works just fine. You can do

LICENSE
Project.toml
src/
    PackageName.jl
test/
docs/
    src/
        LICENSE

where /docs/src/LICENSE is your preferred license for your documentation, but everything else is licensed under /LICENSE.
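The lookup rule is simple enough to write down. A minimal sketch, where `nearest_license` and the accepted file names are my own choices rather than any established API:

```julia
# Hypothetical helper: find the license file that applies to `path` under the
# "nearest LICENSE in a parent directory" convention. Returns `nothing` if no
# license file is found before reaching the filesystem root.
function nearest_license(path::AbstractString; names = ("LICENSE", "LICENSE.md"))
    dir = isdir(path) ? abspath(path) : dirname(abspath(path))
    while true
        for name in names
            candidate = joinpath(dir, name)
            isfile(candidate) && return candidate
        end
        parent = dirname(dir)
        parent == dir && return nothing  # reached the filesystem root
        dir = parent
    end
end
```

With the layout above, a file under docs/src/ resolves to docs/src/LICENSE, while src/PackageName.jl resolves to the top-level LICENSE.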

3 Likes

Yeah, I probably agree with you (and I would definitely have agreed with you at the start of this thread).

I suppose they have some notion of standardization / machine readability / making sure that every file is unambiguously licensed, plus the assumption that complex projects will need to include files with different licenses. That’s debatable at best, and probably massive overkill. I would not want to insist that people must adopt REUSE unless they’re being forced to by some compliance officer at their organization, or unless they have a warped sense of this kind of thing being “fun” (which might be a category I put myself in).

1 Like

Great topic. I just recently had that discussion in the Modelica community, where things are more complicated because tools serialize the code, so that code may end up in more or fewer files than you had intended (see here).

My last post there kind of wraps things up and most of it should carry over to this community as well.

Some key points:

  1. Your copyright needs no explicit statement, but … You do have a copyright the moment you create something that is not trivial (to give a loose definition). Including per-file headers is a precaution that you (or your organization) may or may not take. The advantage is that it is ultimately extremely safe, as it is machine and human readable at the raw source level.

  2. REUSE compliance is much less hassle than it looks. As has been pointed out here before, you can use a REUSE.toml to elegantly assign SPDX headers to multiple files en bloc. This greatly lightens the burden of adding labels to files like .gitignore or tons of image files. Finally, there are pre-commit hooks to conveniently add SPDX tags to files still lacking them.

  3. A single license file does not meet today’s complexity. With just two lines of SPDX tags a lot of ambiguity can be removed. Why shouldn’t some of the images that I ship in my project have a different license than the stricter one given for the main code? (It’s also extremely likely that those artifacts get distributed separately down the road.) It’s also a good idea to explicitly separate branding and trademark related copyrights from the rest. Often the license text does address this (but not always), yet the license file in my repo and the file with my corporate logo may get separated. I find it nice that there is an additional way to explicitly label that logo a “proprietary asset” even in an open source project.

TL;DR

In Julia, we value the flexibility and composability of multiple dispatch. It’s interesting to see how the very same people who value this kind of flexibility still believe a single license file in a repo is all that should ever be needed in a world of composition (e.g., copy & paste). :slight_smile:
The legal world of IP is surprisingly more complicated than it may look. The REUSE specification imho goes a long way in dealing with that complexity in a rather clear, unambiguous, and even convenient way.

1 Like

Since you found this discussion interesting, I think it’s probably worth mentioning an earlier discussion that overlaps a bit with this one that you might also find to be of interest :slight_smile:

1 Like