Suggestion: Fortran-like Keyword to Assert Function is Pure

That’s a great question. What are these purity annotations useful for?

1 Like

Some examples from broadcasting:

  • In f.(u) .+ g.(v') (with u, v both length-N vectors) you could perhaps do N evaluations of f, not N^2 (see the counting sketch after this list). But for something like f(x) = x + rand() this would change the answer, so perhaps that's impure for this purpose.

  • In the gradient calculation for f.(u), knowing that f does not close over other parameters would, I think, simplify how we handle it. Here f(x) = y[1] + x would be pure under Mason's macro's definition, but this needs something stronger.
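
To make the first point concrete, here is a quick count of what the fused broadcast currently does (the counter is only for illustration): f runs once per output element, i.e. N^2 times, even though only N distinct values are needed.

const calls = Ref(0)            # count evaluations of f
f(x) = (calls[] += 1; x^2)
g(x) = x + 1

u = rand(3); v = rand(3)
f.(u) .+ g.(v')                 # fused into a single 3×3 broadcast
calls[]                         # 9 == N^2 evaluations of f, not N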

Edit – it seems that Base.issingletontype(typeof(f)) is a reasonable test for the 2nd case. The 1st seems to want a weaker definition than @pure's; it's not the method table that's the concern. Whether this could be inferred, I don't know. It could also be provided either when defining or when calling the function, like @purecast f.(u) .+ g.(v').
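
For reference, that test behaves as follows (plain Julia; h and k are just illustrative names): a top-level function's type is a singleton, while the type of a closure that captures a local variable is not.

h(x) = 2x                              # no captured state
Base.issingletontype(typeof(h))        # true

k = let y = [1.0]
    x -> y[1] + x                      # closes over the local y
end
Base.issingletontype(typeof(k))        # false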

4 Likes

Currently, the way I'm doing this is to check a method's lowered code with another function that I call explicitly. So it happens after parsing, but before the method is run (and it's only partially effective).

But the parser/lowering is putting those references to variables in Main into the code in the first place, so at that point in the process we could at least deal with globals used within a function? I understand that the called methods may not be defined yet, so it can't be recursive.
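
Roughly, that kind of check might look like this (a sketch only; is_nonconst_global and touches_global_state are hypothetical names, and a real check would need more care with imported and unresolved bindings):

# Flag GlobalRefs to bindings that are not declared const.
is_nonconst_global(x) = x isa GlobalRef && !isconst(x.mod, x.name)
is_nonconst_global(ex::Expr) = any(is_nonconst_global, ex.args)

# Scan a method's lowered code; this sees only the method body itself,
# not the methods it calls, hence "only partially effective".
function touches_global_state(f, argtypes)
    ci = only(code_lowered(f, argtypes))
    return any(is_nonconst_global, ci.code)
end

y = 1.0
g(x) = x + y                             # reads the non-const global y
touches_global_state(g, Tuple{Float64})  # true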

Another level could be that marked no-global methods are only allowed to call other marked no-global methods, which could be checked during first-run compilation. This type of purity seems more difficult, but not impossible.

But this illustrates that these questions are not easy to answer sensibly without a solid understanding of how the compiler operates. We mostly don't know what the "many things are possible" things are. Listing some reasonable options would help me think about this.

2 Likes

e.g. improve constant prop in gcd by simeonschaub · Pull Request #41258 · JuliaLang/julia · GitHub

1 Like

But if the compiler has to verify these annotations anyway, then it could have come to the same conclusion without annotations that a method has whatever kind of purity is needed. In that case, there are two ways the annotations could help:

  1. They're not checked; they are a promise by the user about the behavior of the method in question, which the compiler cannot verify. This is like Base.@pure, which is occasionally useful but way too easy to misuse, which can lead to crashes.

  2. They are checked, and when present they force the compiler to do more work than it would normally do in order to check them.

5 Likes

Out of curiosity, why does Base.@pure have issues with the global method table being mutable?
Why can’t it be invalidated?

It’d be convenient if functions like exp or log could be recognized as pure.
A common example (involving log) where this would help:

using Distributions
d = Normal(12, 3);
x = rand(d,400);
sum(Base.Fix1(logpdf, d), x)

This would be much faster if the log(3) could be hoisted out of the loop.

julia> using Distributions, BenchmarkHistograms

julia> d = Normal(12,3); x = rand(d, 400);

julia> function fastlogpdf(x, d::Normal)
           lp = zero(promote_type(eltype(x), eltype(d)))
           μ = d.μ; σ = d.σ
           @inbounds @simd for i ∈ eachindex(x)
               lp += (x[i] - μ)^2
           end
           -0.5lp/σ^2 - length(x)*(0.5*log(2π) + log(σ))
       end
fastlogpdf (generic function with 1 method)

julia> sum(Base.Fix1(logpdf, d), x) ≈ fastlogpdf(x, d)
true

julia> @benchmark sum(Base.Fix1(logpdf, $d), $x)
samples: 10000; evals/sample: 8; memory estimate: 0 bytes; allocs estimate: 0
ns

 (3118.0 - 3143.0]  ██████████████████████████████ 9734
 (3143.0 - 3168.0]  ▏1
 (3168.0 - 3193.0]   0
 (3193.0 - 3217.0]   0
 (3217.0 - 3242.0]   0
 (3242.0 - 3267.0]  ▊213
 (3267.0 - 3291.0]  ▏24
 (3291.0 - 3316.0]  ▏6
 (3316.0 - 3341.0]  ▏6
 (3341.0 - 3366.0]  ▏1
 (3366.0 - 3390.0]  ▏1
 (3390.0 - 3415.0]  ▏3
 (3415.0 - 3440.0]   0
 (3440.0 - 3464.0]  ▏1
 (3464.0 - 5118.0]  ▏10

                  Counts

min: 3.119 μs (0.00% GC); mean: 3.135 μs (0.00% GC); median: 3.130 μs (0.00% GC); max: 5.118 μs (0.00% GC).

julia> @benchmark fastlogpdf($x, $d)
samples: 10000; evals/sample: 989; memory estimate: 0 bytes; allocs estimate: 0
ns

 (46.1 - 46.8]  ██████████████████████████████ 9526
 (46.8 - 47.5]  █291
 (47.5 - 48.2]  ▋159
 (48.2 - 49.0]  ▏5
 (49.0 - 49.7]  ▏3
 (49.7 - 50.4]   0
 (50.4 - 51.1]  ▏1
 (51.1 - 51.9]  ▏1
 (51.9 - 52.6]   0
 (52.6 - 53.3]   0
 (53.3 - 54.0]   0
 (54.0 - 54.8]  ▏1
 (54.8 - 55.5]  ▏1
 (55.5 - 56.2]  ▏2
 (56.2 - 88.1]  ▏10

                  Counts

min: 46.054 ns (0.00% GC); mean: 46.395 ns (0.00% GC); median: 46.352 ns (0.00% GC); max: 88.143 ns (0.00% GC).

This sort of code is common in Bayesian models, and is many times slower than it has to be.

5 Likes

There are 68 years of effort by thousands of experts and major vendors like IBM, Intel, Cray, NVIDIA, NAG, … behind the Fortran compilers. I doubt that any of them have been "potato" for decades, as you say, unless you provide evidence for such claims.

3 Likes

A mechanism like this would be useful for avoiding “confusing” behavior of, e.g., rand.() .+ sprand(3, 0.1) (edit: it’s already mentioned by mcabbott Suggestion: Fortran-like Keyword to Assert Function is Pure - #45 by mcabbott):

julia> a = sprand(3, 0.1)
3-element SparseVector{Float64, Int64} with 1 stored entry:
  [1]  =  0.474913

julia> y = rand.() .+ a
3-element SparseVector{Float64, Int64} with 3 stored entries:
  [1]  =  1.39306
  [2]  =  0.976046
  [3]  =  0.976046

We have y[2] == y[3] because the broadcasting implementation assumes purity.

We can avoid this if the purity information is available at the API level. That is to say, the sparse broadcasting implementation could try to evaluate lower-dimensional sub-arguments as a pure function first. If that succeeds, it is justified in broadcasting the pre-evaluated results. Otherwise, it needs to broadcast all non-pure arguments first and then materialize the result.
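
As a rough sketch of that choice (everything here is hypothetical: ispure stands in for whatever purity query would exist, and the placeholder below simply always answers false):

using SparseArrays

ispure(f) = false   # placeholder for a hypothetical purity query

# Only take the sparse path, which may evaluate the kernel once for the shared zero
# and reuse the result, when the kernel is known to be pure; otherwise densify so the
# kernel runs independently for every element.
function purity_aware_broadcast(f, a::SparseVector)
    if ispure(f)
        return f.(a)            # sparse path: result for zero entries may be shared
    else
        return f.(Vector(a))    # dense fallback: one evaluation per element
    end
end

# purity_aware_broadcast(x -> rand() + x, sprand(3, 0.1)) then gives three distinct
# random values, unlike the fused sparse broadcast shown above.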



I wonder if there is a way to handle the purity "check" as a (more or less) simple frontend lowering problem. I think this is desirable since, for a dynamic language like Julia, it is an important property that this be independent of inference. For example, the compiler (or even a macro) can, at a relatively early stage of compilation, transform all calls f(args...) (including getindex etc.) in a pure function body to (say)

Base.pureapply(f, args...)

This implements the invariant that a pure function only calls pure functions. There is no need to know the types of args. The default implementation of Base.pureapply can, say, throw an error. [1]

A definition of pure function can be done with

@effectfree foo(args...) = $body

which is lowered to

Base.pureapply(::typeof(foo), args...) = $body
foo(args...) = Base.pureapply(foo, args...)

where all calls in $body are replaced with pureapply.
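
A minimal sketch of that rewriting (not part of the proposal itself: pureapply, rewrite_calls, and @effectfree here are stand-ins defined in the current module rather than in Base, only the short form foo(args...) = body is handled, and hygiene is handled by wholesale escaping):

pureapply(f, args...) = error("$f is not known to be pure")   # default: refuse

# Recursively rewrite every call f(args...) into pureapply(f, args...).
rewrite_calls(ex) = ex
function rewrite_calls(ex::Expr)
    args = map(rewrite_calls, ex.args)
    return ex.head === :call ? Expr(:call, :pureapply, args...) : Expr(ex.head, args...)
end

macro effectfree(def)
    sig, body = def.args                # assumes short form: foo(args...) = body
    fname = sig.args[1]
    fargs = sig.args[2:end]
    newbody = rewrite_calls(body)
    return esc(quote
        function $fname end             # make sure the generic function exists
        pureapply(::typeof($fname), $(fargs...)) = $newbody
        $fname($(fargs...)) = pureapply($fname, $(fargs...))
    end)
end

bar(x) = x + 1
@effectfree foo(x) = 2 * bar(x)
# foo(1)                                # errors: `*` and `bar` have no pureapply methods yet
pureapply(::typeof(bar), x) = bar(x)    # assert (trust) that bar is pure
pureapply(::typeof(*), x, y) = x * y    # likewise for *
foo(1)                                  # == 4, going through pureapply all the way down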

(A similar technique could be useful for annotating that certain functions are blocking (have no yield point), which helps us write correct and efficient code in a cooperative task system.)

A trickier situation is when the implementation uses temporary buffers but makes sure the effect is not observable:

function foo(n)
    tmp = zeros(n)
    acc = 0
    for i in eachindex(tmp)
        tmp[i] = acc
        acc += i
    end
    return sum(tmp)
end

Maybe we need an “unsafe” variant for this:

@effectfree_trustme foo(n) = $body

Note that we can safely use pureapply(sum, tmp) but, of course, tmp[i] = acc cannot be pure. Making it "work" (i.e., rigorously checkable at least at run time) would require a "linear update" version of setindex! and of any other mutation-based API.


  1. To make this usable without inference in a composable manner, it's probably necessary to use the "easier to ask for forgiveness than permission" (EAFP) pattern by returning Union{Nothing,Some}. On one hand, the definition of purity makes the use of EAFP very attractive (as nothing is observable when "asking" fails). On the other hand, implementing the lifting is tricky when there is control flow and there are closures. ↩︎

5 Likes

I thought it was a funny thing to say, but I am also aware that it’s reductive at best which is why I crossed it out.

I did my master’s thesis in Fortran, and that’s the full extent of my experience. Writing Fortran was awful, but I’m sure there is nothing wrong with the compiler.

2 Likes

Learn Fortran 2018 if you get back to it in the future, and please do not present reductive arguments (for any topic or language) unless you are confident of their truth.

1 Like

I hope the Julia community can avoid insulting other people's tools. It may be funny to some, but it will only give us a bad name.

7 Likes

Purity annotations can help by indicating to the compiler that checking for purity is a good idea – functions not labeled pure can reasonably be guessed to be impure, so the compiler can avoid checking whether or not they’re pure. At the same time, the compiler could be more aggressive in optimizing pure functions to take advantage of their purity, without worrying that these checks will end up being a waste of time.

To give a particular example of this:
it is not just dynamic dispatch between methods of a function, some of which are pure (in some sense) and others not.
We also need to worry about pure methods that are not just overwritten, but invalidated by more specific ones.

Consider: let's say we have a package Bar.jl that defines a function that is clearly pure
(maybe we have to annotate that, maybe we don't; it doesn't change the discussion that follows much):

bar(x) = x

And let's say your package Foo.jl uses Bar.jl, and it wants to check that its foo function is (in some sense) pure:

using Bar
@ensure_this_is_pure function foo(x)
     2*bar(x)
end

All is well in the world.

Now you go and load a library Qux.jl, which also happens to use Bar.jl for something unrelated, and it contains the code:

Bar.bar(x::Qux) = rand()

Does this now trigger an error, because this Qux type could get passed into foo, which would make it impure?
I guess not until it is actually called and the JIT compiles it, because the alternative is too awful: unrelated code from somewhere distant throwing errors.
Spooky errors at a distance.

As was said, all things are possible, though.
It's just hard, some things have to be worked out,
and it has to be determined whether they have an answer that is useful.

4 Likes

Thanks for the explanation, this makes a lot of sense. I think that yeah, not throwing errors until JIT compilation is the best choice. Linters and other tools can help with static checking to catch these sorts of cases ahead of time; they'll be imperfect, but no language can ever catch 100% of bugs. Although, if a pure function calls another function which has both pure and impure methods, it should probably select a pure method even if there is a more specific impure one, which I think would reduce the risk of this kind of thing happening. (This has the coincidental benefit of making compilation slightly faster, since it narrows down the list of methods you're searching through.)

In the case of overwriting, I think you could have pure functions be governed by the same rules as constants – overwriting a pure function throws an error. You could overwrite a pure function in the REPL, but you’d get the same warning about previously-compiled functions possibly breaking.

I think this is a dealbreaker. You would have two categories of functions: the pure/fast/inflexible ones, and the impure/slow/flexible ones. Julia would not have dynamic dispatch for every function anymore, and you would lose flexibility because someone in your stack was paranoid about performance and declared everything pure. Also, you would need some boundary or special syntax, otherwise how can someone declare a pure function with multiple methods? Either you restrict pure functions to a single method, or you need a way to say “these are all the methods of this pure function, no more will come after, no need to worry about the mutable method table”. If you allow adding pure methods to a pure function after the first use of the function, you either lose the guarantee that dynamic dispatch works (i.e., the more specific pure method will not be called) or you lose purity (i.e., the pure function can return a different result for the same parameters because now there is a more specific method).

Maybe purity could be something defined at the call site, by adding a Val{:pure}/Val{:impure} as the first argument of the methods and using the current dispatch machinery, under the contract that nobody will ever add new methods taking Val{:pure} as the first parameter outside of some boundary (or after the first such call).
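
For illustration, that call-site idea might look roughly like this (a sketch of the pattern only; mylog is a made-up function, not an actual proposal):

mylog(::Val{:pure}, x) = log(x)        # the "frozen" entry point; contract: no more
                                       # methods taking Val{:pure} will ever be added
mylog(x) = mylog(Val(:pure), x)        # ordinary, still-extensible entry point

# A caller (or the compiler) that needs the purity guarantee dispatches explicitly:
mylog(Val(:pure), 3.0)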

1 Like

How often do you actually overwrite methods in your code, outside of the REPL? I’m not talking about adding an additional method that dispatches on different types. I’m talking about completely replacing a method with a different method that dispatches on the exact same types.

I'm not sure frequency factors into it; it's allowed and possible, so you can't assume it's not done.

And this problem happens with normal overloading too:

a = 3
f(x::Number) = 1
f(a) == 1          # true: dispatches to the Number method
f(x::Integer) = 2  # add a more specific method, without overwriting anything
f(a) != 1          # now true: the same call hits the new Integer method
2 Likes

This is not the only situation that can change what methods get called given the same set of arguments. A package could have a generic fallback on abstract types, then another package or user could come along, add a subtype of that abstract type, and add a more specific method.
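
A small illustration of that scenario (made-up names; plain Julia, no purity machinery involved):

# In the original package:
abstract type AbstractThing end
describe(::AbstractThing) = "generic"        # fallback, arguably "pure"

summarize(things) = join(map(describe, things), ", ")

# Later, in another package (or user code):
struct Special <: AbstractThing end
describe(::Special) = string(rand())         # more specific, and impure

# summarize now behaves impurely for collections containing a Special,
# even though nothing in the original package was overwritten:
summarize([Special(), Special()])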

2 Likes

Yes, I’m aware. If you’re not overwriting the method, I’m saying that the more specific method should be ignored when called from inside a pure function if it’s not pure. This strikes me as a good thing – if a more specific method is impure while the more general method is pure, they’re probably doing different things and aren’t interchangeable, and function calls relying on purity can reasonably be inferred to want to use the pure method.

With regards to frequency, I’m saying that disallowing this – complete overwriting of pure methods, rather than just adding a new impure method to a function – would presumably not be a big burden on anyone, since it’s very rare that people want to completely overwrite a method. (Especially one that you’d think of as pure.)

Thanks for giving the example of normal overloading, though – I see now why the proposal of just defaulting to the most specific pure method wouldn’t work, since even introducing a new pure method can create impure behavior. Yeah, that’s a real problem, and I’ll have to think about it.