Copyright issues for code excerpts

As I understand it, code excerpts are not a problem; if they were, you couldn’t use Stack Overflow or answers from this forum.

IANAL, but if the paper also contains a full description of the algorithm, seeing code excerpts per se should not preclude one from programming the algorithm without having to worry about this.

That said, it would be great if anyone publishing code excerpts just included a footnote saying that they are MIT-licensed or in the public domain, so we would not have to have these discussions.

Typically, small snippets of < 15 lines of code are considered non-copyrightable (according to the FSF’s legal counsel).

If someone posts 100–200 lines of code somewhere, then copying it into your code is indeed a problem if you don’t have an explicit license giving you permission. In the case of discourse.julialang.org, the terms of service explicitly state that contributions are provided under a CC noncommercial license, which is not free/open-source. StackOverflow also uses a Creative Commons license, specifically CC-BY-SA, which is GPL-incompatible and much more restrictive than the BSD or MIT licenses. So be careful about using anything longer than a few lines!

Update: As of 6 March 2020, the Julia discourse terms of service were updated so that “Source-code user contributions … are additionally licensed under an MIT License”, so there should no longer be any problem in using code posted in this forum.

There are a lot of gray areas here, and you can never be 100% certain what a judge would say, but basically anything that is sufficiently “expressive” is automatically copyrighted. (Fair use gives some other extremely narrow exceptions that allow you to quote someone else for purposes of commentary etc.)

Yes, but you have to make sure you don’t even look at the excerpts and are not influenced by them. Your code needs to pass the abstraction-filtration-comparison test, and you must be able to convince a court that you didn’t copy any “expressive” elements of the excerpts.

It almost seems as if including source code in a paper is an efficient way of “almost” patenting an algorithm, since it makes it really hard for anyone else to use it.

What if the author provides really detailed pseudo code in the paper? Wouldn’t any code written be derived work then? Where is the line between pseudo code and code drawn? What if the pseudo code is left out of the paper and only runnable code is included? This runnable code serves the same purpose as the pseudo code inasmuch as it describes the algorithm, but how could anyone ever implement it without deriving from the published code in this case?

(Hmm, would a syntax error in the published source code make it pseudo code?)

There was a 2017 ruling by the US copyright office on this demarcation. But as that ruling explains, pseudocode, even if it cannot be registered for copyright as computer software, can still be copyrighted as text if it contains “expressive” elements. It’s not completely cut-and-dried, therefore, that implementing something from pseudocode is safe (i.e., not a derived work), but the closer the pseudocode is to abstract statements of mathematical fact (and the less like code in any language that can be unambiguously executed by a computer) the safer you are.

Programmers often want the law to function like a computer program, with precise mathematical distinctions between the “algorithm” and the “implementation”, and often propose that it might be susceptible to gotchas like inserting a syntax error to transform source code into pseudocode (another famous example is of people trying to work around the GPL by dynamic linking). But (lawyers have repeatedly told me) the law doesn’t work like that. It is interpreted by human judges (who frown on attempts to game the system with technicalities), has many blurry lines, and you can often get surprising rulings.

Copyright law, unfortunately, is a lot more restrictive than most people seem to think.

If I write a code example in a CC-BY-SA-licensed paper, can you use the code in an MIT-licensed repository? Will the situation change if I drop SA and publish under CC-BY?

An excerpt from here:

If an implementation must be done in a particular way, there is no room for creativity, which means the implementation is not protected by copyright.

A program could be the result of simply putting together straightforward calls to one or more given APIs (Application Programming Interfaces). Such a program would not be protected.

An algorithm is not copyrightable, and it seems there is very little room for creativity here, except in how to exploit language syntax, which would make the Julia implementation unique anyway.

You can use the code, but then the resulting combined work is under CC-BY-SA (which is a copyleft license). CC-BY is a simple permissive license, more like MIT.

In general, if you combine code derived from multiple sources, the result is governed by the union of all of the license terms. (If the licenses have mutually contradictory requirements, e.g. CC-BY-SA and GPL, then you cannot distribute the combined work at all: the licenses are “incompatible”.)
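The “union of terms” idea can be sketched as a toy model. Everything in it is an illustrative assumption: the obligation labels are simplified placeholders, the function name `combined_terms` is hypothetical, and the incompatibility table contains only the one pair mentioned in this thread. It is not a reference for what any license actually requires.

```python
# Toy model only: the obligation labels and the incompatibility table are
# simplified assumptions based on this thread, not a legal reference.
OBLIGATIONS = {
    "MIT": {"attribution"},
    "CC-BY": {"attribution"},
    "CC-BY-SA": {"attribution", "share-alike"},
    "GPL": {"attribution", "copyleft"},
}

# Pairs described in the thread as having mutually contradictory requirements.
INCOMPATIBLE = {frozenset({"CC-BY-SA", "GPL"})}


def combined_terms(licenses):
    """Return the union of obligations, or raise if any pair is incompatible."""
    licenses = list(licenses)
    # Check every pair of licenses against the incompatibility table.
    for i, a in enumerate(licenses):
        for b in licenses[i + 1:]:
            if frozenset({a, b}) in INCOMPATIBLE:
                raise ValueError(f"{a} and {b} cannot be combined")
    # The combined work carries every obligation from every source license.
    terms = set()
    for name in licenses:
        terms |= OBLIGATIONS[name]
    return terms
```

In this sketch, combining MIT and CC-BY-SA code yields both placeholder obligations, so the stricter share-alike term carries over to the whole, while combining CC-BY-SA and GPL raises an error because the combined work could not be distributed at all.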

The problem is that there is lots of room for creativity in implementing any sufficiently complicated mathematical algorithm. If you ask 10 programmers to implement a Cooley–Tukey FFT algorithm, they will produce 10 very different-looking codes (and probably with very different-looking performance). The xsum code that spawned this discussion requires a very careful coding style in order to get good performance, and that coding style is probably copyrightable too.

(Where all this gets fuzzy is that any program is in some sense “just math” except for superficial details like the spelling of the variables. But that hasn’t stopped the courts from allowing software to be copyrighted, and in some cases patented.)

I believe that StackOverflow answers are explicitly MIT licensed:

I forget if we have a similar policy here but we should.

You may be right, but at the same time a lot of these discussions on the internet happen with most participants having no legal education or experience whatsoever. So it is hard to form an opinion.

Also, people implicitly assume that US law and precedents apply, but that is not necessarily the case.

If we had these issues, no algorithm could be implemented in any language. The sole purpose of academic publications is to let other people build on these ideas. Otherwise they should have been kept secret. I consider such code excerpts ‘public domain’, a do-what-you-want-with-it kind of thing.

So in this instance I would just take the C code from the paper and base the Julia implementation on it. I am sure that this is what the author intended. But you can just ask the author.

Out of curiosity, how come this did not come up before? Surely any part of Julia is subject to such debate?

Unfortunately, such teleological reasoning does not help in practice. Quite a few laws and established precedents just make little sense from a common sense or social welfare perspective. Many argue that copyright law is an outstanding example of this.

Not at all. Original contributions aren’t affected.

It is largely the case if you want to distribute the code in the US, especially if the code you are deriving from comes from the US. (In this case the author is Canadian, but Canadian law is similar in that works are automatically copyrighted.)

So in this instance I would just take the C code from the paper and base the Julia implementation on it. I am sure that this is what the author intended.

This kind of thing is almost definitely a violation of copyright law. I’m sorry, but the law just doesn’t work the way you think it does. Please don’t contribute code to Julia that you have translated from other sources without a license.

There are lots of informed sources out there, including:

As you pointed out above, perhaps the least reliable method is to argue from the principle of what you think the law should be, unfortunately.
