Suggestion: Explicit anonymous functors?

Before coming to the point, for your amusement I’ll share a well-known footgun I, er… stumbled across. Here is a simplified version.

using .Threads
function boxingday(n)
    result = sum(fetch.([
        @spawn begin
            y = i^2
            tmp = mod(y, 7)
            Libc.systemsleep(1e-8rand())  # do some more work
            return tmp
        end for i in 1:n]))
 
    # a few weeks later we add some more computation...
    z = n^2; tmp = (z+1)^2; r2 = mod(tmp, 7)
    # done!  It starts acting up!
    return result, r2
end

for _ in 1:10
    println(boxingday(100))
end

It prints a variety of results. If I run only the result = ... part in the REPL, with n=100, it is always 201, as it should be. The problem is that inside the function, tmp is assigned both in the tasks and in the enclosing function body, so it is one shared variable that has been captured and boxed, creating a race.
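
The same effect can be sketched without threads. The name shared_tmp below is made up for illustration, and the Core.Box check relies on how current Julia 1.x releases happen to lower closures, which is an implementation detail rather than a documented guarantee.

```julia
# Hypothetical minimal reproduction; `shared_tmp` is not from the original post.
function shared_tmp()
    f = () -> begin
        tmp = 1          # assigns the enclosing function's `tmp`, not a new local
        return tmp
    end
    tmp = 2              # this later assignment is what makes `tmp` a local of `shared_tmp`
    f()                  # overwrites the outer `tmp`
    return f, tmp
end

f, tmp = shared_tmp()
println(tmp)                  # the closure's assignment leaked into the outer variable
println(f.tmp isa Core.Box)   # the captured variable is stored in a box (implementation detail)
```

With several tasks running such a closure concurrently, they all write to and read from that one box, which is where the race comes from.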

Anyway, this boils down to some big questions, iterated and reiterated many times: When should something be boxed, what should be captured?

When creating an anonymous function, a functor is created, with the captured variables in a struct, possibly boxed. This is easy and fast to use. But there is no way to specify which variables should be captured (and possibly boxed). The boxing decision is even made early, at a syntactic level.

For the more cautious programmer it would be nice to have a way to specify which variables to capture for an anonymous function. This would ensure that things are not captured/boxed by accident. E.g. something like

f = (a,b)(x,y) -> a*x + b*y

creating a function of x and y, with a and b captured. A Tuple functor. It is already parsed as a function, albeit with an illegal argument name, and is reminiscent of functor definitions (ab::MyStruct)(x,y) = ab.a*x + ab.b*y.
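
For comparison, here is a rough sketch of what such a functor looks like when written out by hand today. The struct name LinComb is hypothetical, and this is only one way the proposed syntax could be interpreted, not a description of how the compiler represents closures.

```julia
# Hypothetical hand-written equivalent of the proposed `(a,b)(x,y) -> a*x + b*y`.
struct LinComb{A,B}
    a::A
    b::B
end

# Make instances callable; only the fields `a` and `b` are available as "captures".
(ab::LinComb)(x, y) = ab.a*x + ab.b*y

a, b = 2, 3
f = LinComb(a, b)
println(f(10, 100))   # 2*10 + 3*100
```

Nothing outside the two fields can leak into the body by accident, which is the property the proposal is after, at the cost of naming and declaring a struct.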

Would this make sense? Any thoughts?

What about

f = let a=a, b=b
   (x,y) -> a*x + b*y
end

This is the suggested workaround from the manual: Performance Tips · The Julia Language

Even inside a let block, variables outside are visible, so the function can still capture by mistake.

f = let a=a, b=b
    (x,y) -> a*x + b*y + c
end

The idea is to ensure that no other variables than those mentioned are captured.
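
A small sketch of that leak, with made-up names (leaky and the values are only for illustration). Inspecting the closure's fields relies on current implementation details of Julia 1.x.

```julia
# Hypothetical example: the `let` list names only `a` and `b`,
# yet the closure silently picks up `c` from the enclosing function.
function leaky()
    a, b, c = 1, 2, 10
    f = let a = a, b = b
        (x, y) -> a*x + b*y + c   # `c` is captured although it is not in the `let` list
    end
    return f
end

f = leaky()
println(fieldnames(typeof(f)))   # the captured variables; `:c` is among them
println(f(1, 1))                 # 1*1 + 2*1 + 10
```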

Oh, I see what you mean. Yeah, have been experiencing this myself quite a lot.

EDIT: I just saw that the manual already points to GitHub - c42f/FastClosures.jl: Faster closure variable capture which is a proper macro implementation. Why not use that?

Performance is not the main issue in my suggestion. The example I shared at the start of the thread, for instance, makes @code_warntype light up like a Christmas tree. However, the performance did not suffer: the overhead of the boxing in the actual case was negligible, and the error showed up infrequently because the “more work” was not random, so the race was mostly not present. It’s more about declaring your intentions, both to the compiler and to other programmers, to catch subtle correctness errors like the one in the above example.

FastClosures.jl should address this by automatically capturing all ‘outside variables’ for you, with the let block trick from above.

If you want it more explicit, so that it throws an error, you can write a macro to do that, like I showed in my previous post (hidden in the edit list).

I am very well aware of using let blocks. I use them all the time. I know workarounds for these problems and to write macros. I’m not asking for advice on how to write performant and/or correct code.

Instead I suggest adding to the language the possibility to specify the capture structure manually, as in C++. Just like anonymous functions can easily be constructed, I suggest, as in the title, adding the possibility to easily construct anonymous functors. This is what anonymous functions implicitly do anyway, at least when they capture one or more variables. With an explicit list of captured variables there is the added benefit that implicit captures are disallowed, and it is clear to other programmers (and the compiler) what is being done.

Even if your suggested feature might not be breaking (maybe the syntax would need adjustment), I think this still applies here: PSA: Julia is not at that stage of development anymore

What benefit would it bring when this would be added to the language compared to an implementation based on a macro in a pkg that uses the let block trick?

I think the general sentiment is to first try things out in packages (in particular when it needs no changes to the compiler) and then later see if it is worth adding to Base.

I have found no package which disallows capturing variables. I could write a macro, or use yours, to solve the particular problem of capturing variables by mistake, and wrap the others in a let block. The latter is however usually achieved by the $ quote trick in @spawn and other macros like @eval, and in quote. I could use f = @eval((x,y) -> $a*x + $b*y) to ensure no local variable was captured, but @eval is a different beast.

The reason I suggest a non-breaking language extension with already parseable, though meaningless syntax is that the construction I suggest fits nicely with the way closures are already constructed, as functors. It would be a neat parallel to C++ anonymous functions (which can capture only const and the like implicitly), and would not require a macro looking through the entire syntax tree, with whatever surprises which might turn up there in future versions. It would be a general and natural construction, which could also be made to work with do blocks.

I try to always use local for task-local variables (like tmp and y in your @spawn block).
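
Applied to the example from the top of the thread, that habit could look like the sketch below. The name boxingday_local is made up here, and the sleep call is left out to keep it short.

```julia
using Base.Threads: @spawn

# Hypothetical variant of `boxingday` with task-local variables declared `local`.
function boxingday_local(n)
    tasks = [@spawn begin
                 local y = i^2          # new variable per task
                 local tmp = mod(y, 7)  # no longer the enclosing function's `tmp`
                 tmp
             end for i in 1:n]
    result = sum(fetch.(tasks))

    # the later addition now has its own `tmp`, unaffected by the tasks
    z = n^2; tmp = (z + 1)^2; r2 = mod(tmp, 7)
    return result, r2
end

println(boxingday_local(100))
```

Because the tasks no longer share a variable with the enclosing function, the first component is 201 on every run.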

Do you really need @eval for this?

This here works for me:

ExplicitClosure.jl
module ExplicitClosure


import MacroTools


function get_all_syms(expr)
    syms = Symbol[]
    _get_all_syms!(syms, expr)
    unique!(syms)
    return syms
end
_get_all_syms!(syms::Vector{Symbol}, s) = nothing
_get_all_syms!(syms::Vector{Symbol}, s::Symbol) = push!(syms, s)
function _get_all_syms!(syms::Vector{Symbol}, expr::Expr)
    MacroTools.postwalk(expr) do ex
        if ex isa Expr && ex.head === :call
            for a in ex.args[2:end]
                _get_all_syms!(syms, a)
            end
        end
        ex
    end
    return syms
end


function get_lhs_syms(expr)
    syms = Symbol[]
    _get_lhs_syms!(syms, expr)
    unique!(syms)
    return syms
end
function _get_lhs_syms!(syms::Vector{Symbol}, expr::Expr)
    MacroTools.postwalk(expr) do ex
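        # only plain `name = ...` assignments; skip left-hand sides like `a[i]` or `f(x)`
        if ex isa Expr && ex.head === :(=) && ex.args[1] isa Symbol
            push!(syms, ex.args[1])
        end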
        ex
    end
    return syms
end


get_signature(expr::Symbol) = error("@explicit: expected closure as second argument, found $expr")
function get_signature(expr)
    expr = MacroTools.unblock(expr)
    if expr.head === :(->)
        sig = expr.args[1]
        @assert sig.head === :tuple
        return nothing, sig.args
    elseif expr.head === :function
        sig = expr.args[1]
        @assert sig.head === :call
        # TODO Strip any extra type info
        # sig.args[1] is the function name, the remaining entries are the arguments
        return sig.args[1], sig.args[2:end]
    else
        error("@explicit: don't know how to extract signature from $expr")
    end
end


macro explicit(captures, fn)
    return esc(explicit(captures, fn))
end
function explicit(captures, fn)

    captures = MacroTools.unblock(captures)
    if !(captures isa Expr && captures.head === :tuple)
        error("@explicit: expected tuple of symbols as first argument, found $captures")
    end
    captures = captures.args

    allsyms = get_all_syms(fn)
    lhssyms = get_lhs_syms(fn)
    name, args = get_signature(fn)

    rest = filter(allsyms) do s
        s ∉ lhssyms && s ∉ args && s ∉ captures
    end
    if !isempty(rest)
        error("@explicit: uncaptured symbols: $(join(rest,","))")
    end

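    # Variables assigned inside the closure would otherwise alias same-named
    # locals of the enclosing function (the footgun from the first post),
    # so declare them `local` to the closure body.
    locals = filter(s -> s ∉ args && s ∉ captures, lhssyms)
    if !isempty(locals)
        fn.args[2] = Expr(:block, Expr(:local, locals...), fn.args[2])
    end

    letargs = Expr(:block)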
    for c in captures
        push!(letargs.args, :($c = $c))
    end
    letclosure = Expr(:let, letargs, fn)

    letclosure
end


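end

using .Threads
# Assumes ExplicitClosure.jl is loadable as a package; if the file is include()d
# instead, `import .ExplicitClosure: @explicit` (with a leading dot) is needed.
import ExplicitClosure: @explicit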

function boxingday(n)

    result = sum(fetch.([
        @spawn $(@explicit (i,) () -> begin
            y = i^2
            tmp = mod(y, 7)
            Libc.systemsleep(1e-8rand())  # do some more work
            return tmp
        end)() # note the trailing () here
        for i in 1:n]))

    # this reads better now
    # fs = [ @explicit (i,) () -> begin
    #             y = i^2
    #             tmp = mod(y, 7)
    #             Libc.systemsleep(1e-8rand())  # do some more work
    #             return tmp
    #        end for i in 1:n ]
    # result = sum(fetch.([ @spawn f() for f in fs ]))

    # a few weeks later we add some more computation...
    z = n^2; tmp = (z+1)^2; r2 = mod(tmp, 7)
    # done!  It starts acting up!
    return result, r2
end

for _ in 1:10
    println(boxingday(100))
end

If I understand correctly you want that this check happens some time after parsing in one of the lowering stages? Tbh I don’t know too much about those internals and I can’t gauge how difficult it would be to implement.
Regardless of whether this is implemented at the parsing stage (with a macro or JuliaSyntax.jl) or a lower level stage, someone will need to maintain “whatever surprises which might turn up there in future versions”.
So in order to get this into Base you would also need to convince julia devs and maintainers that this feature is worth the effort.

Sure. But in an ordinary capture this job is already being done now. An anonymous function is realized as a callable struct, where all the captured variables are put into the struct, some of them boxed, and are visible inside the function without the normal field-access. The only difference in my suggestion is that the captured variable list is provided by the user, not figured out by analyzing the function.

You can also do this:

f = (x, y; a=a, b=b) -> a*x + b*y

That’s Python’s only mechanism for capture iirc.
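
One difference from the Python idiom may be worth keeping in mind: Julia evaluates keyword defaults on each call, not when the function is defined. The sketch below uses a made-up function kw_capture to show that a later reassignment of the outer variable is still seen by the default.

```julia
# Hypothetical example; `kw_capture` is not from the thread.
function kw_capture()
    a = 2
    f = (x; a=a) -> a * x   # inside the body, `a` is the keyword argument
    a = 10                  # reassigned after `f` was created
    return f(1), f(1; a=3)  # default is looked up at call time; explicit value overrides it
end

println(kw_capture())
```

So inside the body `a` is an ordinary argument, but the default expression still refers to the outer variable at the time of the call.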

One of the luxuries of using a language with proper macros is that it lets users add their own language feature. This has always been a brag in Lisp circles, and the strength of the macro system in Julia was a main criterion when I decided to go all-in on using it.

When you say this:

What you’re asking for would definitely require looking through the entire syntax tree. The difference between a macro and a piece of the Julia parser is blurry, that’s what’s so great about them.

Even in this thread, there are different visions for what’s useful here. You want to specify the variables which must be captured, and get an error if other variables are implicitly captured which aren’t on that list. Other people think it would be sufficient to have a feature which localizes all captures. I happen to agree with them but that’s not the point. There’s a third way to do it, which is to specify which variables you want localized and have any others use the shared reference. I would say that a let block is the best way to write this third one, but a macro could make it ‘cleaner’, or at least, allow a syntax some might find preferable.

Edit: as I was writing this reply @StefanKarpinski gave a fourth approach, which is my favorite, I’ll probably end up using that eventually.

But just focusing on the first two: this is a good fit for a macro, and a bad fit for a core language feature. If you’d like a closure form where you have to specify the captured variables or it throws an error, you should have one! Make it a macro, package it up so other people can use it. Someone else can use ExplicitClosures instead, if they don’t think the error-checking and repetitive syntax is worthwhile.

If one of these becomes very popular, it can be added to Base… as a macro. To begin where I started, the great thing about strong macros is that features which boil down to syntax sugar don’t have to be baked into the parser.

This thread is very muddled.

  1. “Anonymous functor” isn’t a meaningful concept. The reason we have “anonymous functions” is to distinguish them from singleton functions typically given a const name at the start. Functors are not like this.
  2. Functions capture variables because they borrow outer local variables automatically like any other local scope. The underlying functor implementation does not capture variables, it contains the instances in its fields. When a captured variable is reassigned anywhere, the lowerer/compiler currently can only implement a functor that boxes the instances with no type information.
  3. sgaure’s proposal is about specifying which variables are allowed to be captured from enclosing local scopes. The anonymous functions are supposed to be affected if the specified variables are reassigned elsewhere. It is not about making new local variables so they don’t need to be boxed when reassigned, whether it’s let blocks, the let block macros in FastClosures.jl, or StefanKarpinski’s keyword arguments with default values.

All that said, altering function syntax wouldn’t help much. It might not even be feasible to change the parser to pull this off in a nonbreaking way. In your example, the local scope that does the capture was created by @spawn; this proposal wouldn’t give you a way to specify the capturable variables for the macro, nor change the macro to specify those variables in the transformed Task function. This proposal also wouldn’t affect other local scopes; true, they alone don’t persist to capture variables like functions do, but they can show up in functions and become involved.

Putting the limitations aside, you are in fact proposing a reversal of the rule that local scopes automatically borrow outer local variables, manually listing the borrowable variables rather than manually local-declaring new variables. Julia already made the choice, and it’s because manually listing borrowable variables is incredibly tedious. Sure, in this example a manual borrowable list would be convenient to guard against edits, but in more common scenarios, you would be forced to edit many scattered lists whenever you introduce a variable that you want to be borrowed.

This proposal would be a very complicated change and a frankly strange syntax for users, all to help prevent unintended variable capturing in scenarios so limited that you still generally must be aware of where local scopes are in the entire function and avoid needless name overlaps among them. I think same-variable highlighting that is aware of Julia’s scoping rules would be simpler and more widely applicable assistance, but an actually cautious programmer won’t need any training wheels because they practice sensible naming principles, like naming tmp1, tmp2, tmp3 across wholly separate segments.

Just a couple of comments. As mentioned above, the expressions

(a,b)(x,y) -> a*x + b*y

and also, like in C++:

[a,b](x,y) -> a*x + b*y

are already parsed as function definitions, but with the nonsensical function argument list (a,b)(x,y), which is rejected further down, after parsing. Without knowing the full details I would guess that instead of picking up the captured variables by looking through the function body, one could use the user-provided list directly, which is already parsed. Depending on the details, it does not have to be very complicated to implement this language extension. The old syntax would still work as before. I am not proposing to reverse the current capture scheme, it would still work. But if you choose to specify a list of captured variables, other variables will not be captured by mistake.
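
The distinction between parsing and the later stage can be checked from the REPL with a small sketch like the following. It only uses Meta.parse and Meta.lower, and the exact error text from lowering may differ between Julia versions.

```julia
# Parse the proposed syntax without evaluating it.
ex = Meta.parse("(a,b)(x,y) -> a*x + b*y")
println(ex.head)               # the parser produces an anonymous-function expression

# Lowering is where the expression is rejected today; the result is expected
# to be an error expression rather than lowered code.
lowered = Meta.lower(Main, ex)
println(lowered)
```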

I agree that it would not immediately work with @spawn, but a change to that macro could accommodate an optional argument (a,b) which it puts in front of the Task function like (a,b)() -> ....

Otherwise I fully agree that the use of local and a sensible variable naming practice alleviates many problems.

In other threads there have been endless discussions about the lack of tools for avoiding “correctness problems”, most of it involving interfaces like in C++, Rust, Java, Fortran and many other languages. My suggestion would provide a (voluntary) terse syntax for avoiding another type of footgun: the inadvertent capturing of variables. It isn’t as tedious as typing local everywhere.

If I understand what you’re getting at, I think these proposals are aimed at the same problem.

More. Or less. Compared to let, which one of the proposals extends, it would be more like introducing an, er… blocklet, which blocks the visibility of all outer local scopes. I.e.

z = blocklet a=2+x, b=3*x
    myvar = 4a + 3b^2
    myvar2 = a^2 + 2b^3 + c
    myvar * myvar2
end

would scream about c, even if it were defined outside the blocklet. I.e. a tightly controlled one-way passage of variables. But in the function case, boxing would still be allowed, if the variable is mentioned in the capture list.

but it throws a syntax error unlike C++, which has vastly different concepts of variables, scoping, closures, and capturing. The parser would still need a drastic change. It’s easy to say it “doesn’t have to be complicated”, but if it were, then we wouldn’t have open issues about feature consistency between our existing function syntaxes.

Not just that macro, you would need to edit every macro that makes a function block. I couldn’t even guess at what happens to macros that make multiple.

That doesn’t imply you can effortlessly repurpose syntax from other languages and expect the same perks. All the languages you named are statically typed and their proven features are fundamentally difficult to replicate in dynamically typed languages that made different tradeoffs.

Making new rules that are convenient only for specific cases results in feature bloat, a common complaint about C++ in fact. Instead of training wheels pointing in all sorts of directions, it’s easier for people to learn better general practices. Making edits to a function without knowing what else happens in it is not something that should go on for long.

Manual specification is always tedious, and you just found one example where one kind of manual specification happens to be convenient. That won’t translate to the general case. The fact that typing local everywhere is tedious is exactly why local scopes share variables automatically.

This name being similar to let is misleading because you are specifying what can be borrowed rather than what isn’t. Something like alllocalexcept maybe. But I’ve mused about a very similar thing before and realized it gets nastily complicated to work out nice scoping rules in general, and ultimately nothing was simpler than the status quo and common sense.
