Suggestion: Fortran-like Keyword to Assert Function is Pure

It can be a problem… if you define models in a script with lots of variables, misnaming something or missing a variable from a let block can use the global instead of throwing an error. Killing performance.

This takes a long time to debug manually. pure would help that a lot, especially if it could name the impure variable in the error.

1 Like

Sorry, but what are you referring to with this?

1 Like

Hm, I don’t encounter those kinds of problems because I even put setup stuff and the like into functions. I basically don’t use let and other global friends in global scope, ever. I can see why having pure would help in those cases, but it feels like requiring no side effects at all is a much stronger requirement than not mutating inherited state (which is simpler to achieve by not having inherited state in the first place).

Maybe that comes with me not restarting julia sessions, thus having the smallest runnable piece of code being not files but functions?

You can still have inherited state for anonymous functions defined inside another function. And yes these more limited types of purity are actually more useful to me. Like you can’t use any variables with inherited state, but e.g. writing to an array is fine.

For me this is about managing the transition from freeform scripting environment to a high performance core. In packages like DynamicGrids.jl you write the high performance core code in a loose script and it gets run on CPU or GPU, but hopefully many millions to billions of times a second.

Using anonymous functions and let blocks gives you fast modelling iteration time compared to using structs and more formalized code - which require restarts of julia to change basic things.

But currently the tools for ensuring the quality of that kind of code are not really there, especially when using one global var by accident means a 100x performance drop. Really you want to catch those kinds of errors as early as possible in a structured way. Short of doing a full DSL like ModellingToolkit.jl, or using JET.jl on whole files, I’m hacking a bunch of checks using JuliaVariables and other things to inspect the AST for obvious mistakes. But easy built-in safeguards like some kind of inherited state barriers or pure would be better.

I really want to be able to flag an anonymous function as being pure at the point of definition.

Having an annotation on functions that they are pure. Whatever notion of purity one settles on, how does one enforce it? There’s also the issue that there are many notions of purity that can be useful in different circumstances. How does one choose which one is the notion of purity that is going to be elevated into the language itself?

3 Likes

I think the best way to define would be to use the Wikipedia definition:

  1. The function will always return the same output for any given input.
  2. The function does not affect the global state.

I think the easiest way to enforce this would be to have the compiler throw an error if the function contains any references to variables that are defined outside of the function, excluding constants, or any functions not marked pure. I don’t think Julia has static variables, so I think that would cover it.

So pure functions can only call other pure functions? How do you define “the same input”? What if a function has multiple methods? Do they all have to be pure or impure? What if, as is common in Julia, it is unknown which specific method of a function will be called and some of the methods are pure while others are impure? What about calling C code? Do we mark each call with whether it is pure or not and hope that the caller doesn’t get it wrong (or just lie)?

7 Likes

Oh I should also mention the #265 issue: method tables are global mutable state. So if a function calls any function, it is technically impure because it might return a different value if called later. This is actually why the Base.@pure annotation is so hard to use properly — because to not be a lie the annotated function must not call any methods that are ever modified later on.

8 Likes
  1. Yeah, I think only calling pure functions is the easiest way to do this.
  2. Two inputs would be the same if x===y, I guess? I’m not sure how else to define it.
  3. I’m not sure why all the methods would have to be pure or impure. If you don’t know whether the method called will be pure or impure, it should probably be treated as impure in general.
  4. Yeah, I don’t know if there’s any possible solution to calling code from other languages, other than considering it impure unless the person calling it decides to mark it as pure.

When do you want to be told that this definition is illegal? If purity isn’t a whole-function property then what happens when a pure function calls a function with some pure and some impure methods and you don’t know which will be called at runtime? What if there were no impure methods at the time of definition but there are by the time it gets called?

5 Likes

I remember I asked the same question a few years ago on discourse while I was discovering Julia. My concern was (and still is) to understand if it was (will be) possible to build safe large multi-threaded Julia code relying on the language to prevent data races.

As I understand it now, Julia is not the language to go for this kind of safety : Rust put a lot of emphasis on this problem (Races - The Rustonomicon). Haskell and pure functional languages offer great tools for this via referential transparency like (e.g. 2. The Futhark Language — Parallel Programming in Futhark).

Julia is incredibly expressive, performant, productive, composable and is clearly my favorite language to code scientific programs. In other word, Julia is extremely powerful but not the best language for MT safety (IMHO).

P.S.
I tried several times to bait some Julia wizards to consider the Futhark approach to build a safe Julia EDSL (limited to pure array computations) to overcome my skills limitations and get access to safe MT.

4 Likes

Sorry, I know this litany of questions is annoying, but they are why this sort of thing isn’t so simple. It’s much more reasonable in Fortran since it is a static language with a separate type checking and compilation phase which is a natural time for the purity checking to happen. It is also a language without multimethods or dynamic dispatch, so any time a function call occurs, there is only one statically knowable function that can be called.

7 Likes

One approach I think might work pretty well is to use a static analysis tool that raises a warning whenever it can prove a pure function could modify its input. E.g.

@pure function foo(x::Vector)
     x[1] += 1
end

I don’t think it’s possible to detect all cases like this, but you could detect a lot, and possibly extend it with things like property based testing.

A related idea with a static analysis tool would be to just check that any function without a ! in its name is pure.

1 Like

So the method table really complicates things. But are there some elements of purity that are easier to get at?

Can we prevent accessing global state in a method or any method that it calls, recursively?

1 Like

Whoever answers this, I’d be interested to learn how I could go about answering this kind of question myself. I have similar questions about “how do I know if this method call can do X” where X is “I/O”, “allocate memory”, etc.

Many things are possible but I think that buried in my pile of annoying questions is one that is more essential than the rest: When do you want to be told that you’ve done something illegal?

3 Likes

Maybe you’re getting at something more specific, but

in general the answer is “as soon as possible”. If JET can find it statically, great; otherwise, in testing; otherwise at runtime.

Or maybe you mean something like “this might be illegal later, depending on what other stuff is added to the method table in the future, do you want to error now”? That sounds like something for the user to decide depending on the situation.

1 Like

Here’s an example of a macro that could implement purity checks (which can be disabled for no runtime cost):

using JuliaVariables, MLStyle, ExprTools

_get_outers(_) = Symbol[]
_get_outers(x::Var) = x.is_global ? [x.name] : Symbol[]
function _get_outers(ex::Expr)
    @match ex begin
        Expr(:(=), _, rhs) => _get_outers(rhs)
        Expr(:tuple, _..., Expr(:(=), _, rhs)) => _get_outers(rhs)
        Expr(_, args...) => mapreduce(_get_outers, vcat, args)
    end
end

get_outers(ex) = (unique! ∘ _get_outers ∘ solve! ∘ simplify_ex)(ex)

enable_purity_checks() = true

macro pure(fdef)
    d = splitdef(fdef)
    @gensym outers thunk result
    d[:body] = quote
        if $enable_purity_checks()
            $outers = $deepcopy.(($(get_outers(d[:body])...),))
        end
        $thunk = () -> $(d[:body])
        $result = $thunk()
        if $enable_purity_checks()
            $all($outers .== ($(get_outers(d[:body])...),)) || throw("This function is not pure")
        end
        $result
    end
    esc(combinedef(d))
end

Now,

julia> f = let y = 1
           @pure (x) -> (y = y + x; y)
       end
#50 (generic function with 1 method)

julia> f(1)
ERROR: "This function is not pure"
Stacktrace:
 [1] (::var"#50#52")(x::Int64)
   @ Main ./REPL[50]:11
 [2] top-level scope
   @ REPL[52]:1
3 Likes

The macro has hefty performance costs when things like arrays are involved:

julia> @pure g(v) = sum(v)
g (generic function with 1 method)

julia> v = rand(10000);

julia> @btime g($v);
  11.981 μs (4 allocations: 78.56 KiB)

julia> @btime sum($v)
  1.052 μs (0 allocations: 0 bytes)
4999.517674034476

but totally disappear if you disable the assertions:

julia> enable_purity_checks() = false
enable_purity_checks (generic function with 1 method)

julia> @btime g($v);
  1.054 μs (0 allocations: 0 bytes)
2 Likes

My point is that as a language, Julia doesn’t have anything like this: there is no static analysis phase in the language itself — if it parses, it will run. It might error at runtime, but there’s no phase where this kind of global invariant can be checked. So when would this check that functions annotated as pure are actually pure occur? One could do it with an external tool like JET, but then it’s kind of weird to have the keyword as part of the language — it would just be a kind of comment with special syntax. You would be able to run a Julia program with incorrect purity annotations just fine, you would only find out that they’re invlaid if you ran JET (or whatever checker).

1 Like