[RFC/ANN] FLoops.jl: fast generic for loops (foldl for humans™)

FLoops.jl provides a macro @floop that acts as an alternative “backend” for the for loop syntax, based on the mechanism provided by Transducers.jl. It can be used to generate fast, generic iteration over complex collections.

I think the iteration mechanism (foldl) of Transducers.jl has many advantages over how the for loop is currently implemented (iterate). However, I’ve realized that the functional style of Transducers.jl can be a cognitive overhead and impede its adoption. By lowering the familiar for syntax to foldl, I’m hoping that foldl becomes much more accessible to many Julia users.

This package is not registered yet. I thought I’d post it here first to gauge interest. I’ve used raw foldl so much that I’m still not sure if I personally really need this. However, I think it’d be great if FLoops.jl can provide some incentive for data collection authors to define foldl.


Type ] add https://github.com/tkf/FLoops.jl.git in the REPL to install the package.

Quoting the Usage section of the README:

Simply wrap a for loop and its initialization part with @floop:

julia> using FLoops  # exports @floop macro

julia> @floop begin
           s = 0
           for x in 1:3
               s += x
           end
       end

julia> s
6

When accumulating into pre-defined variables, simply list them between begin and for. @floop also works with multiple accumulators.

julia> @floop begin
           p = 1
           for x in 4:5
               s += x
               p *= x
           end
       end

julia> s
15

julia> p
20

The begin ... end block can be omitted if the for loop does not require local variables to carry the state:

julia> @floop for x in 1:3
           @show x
       end
x = 1
x = 2
x = 3

Why @floop?

@floop is better because foldl is better than iterate. Here is a demonstration. It’s a “recap” if you have already heard about Transducers.jl.

@floop is fast for complex collections


This is the ratio (baseline / target) of the time it takes to run

@floop begin
    acc = 0.0
    for x in xs
        acc += x
    end
end

with and without @floop (so larger value means @floop is better).

The input collections are generated by

using FillArrays: Zeros
using BlockArrays: mortar

floats = randn(1000)
dataset = [
    "Vector" => floats,
    "filter" => Iterators.filter(!ismissing, ifelse.(floats .> 2, missing, floats)),
    "flatten" => Iterators.flatten((floats, (1, 2), 3:4, 5:0.2:6, Zeros(1000))),
    "BlockVector" => mortar([floats, floats]),
]

As you can see, @floop is beneficial for collections with more complex structure. In particular, @floop is much faster for chunked/blocked collections like BlockVector and flatten.

Deterministic setup and teardown

foldl is also useful for building robust and correct APIs. For example, eachline(::AbstractString) is not safe to use with break and is also not exception-safe; the file object is not closed deterministically. Note that this is not because the implementation of eachline is not careful enough. It is simply a limitation of iterate.

Defining a safer version of eachline(::AbstractString) is much simpler with foldl:

using Transducers
using Transducers: @next, complete

function safe_eachline(filename::AbstractString; keep=false)
    return AdHocFoldable(filename) do rf, acc, filename
        open(filename) do io
            while !eof(io)
                acc = @next(rf, acc, readline(io; keep=keep))
            end
            return complete(rf, acc)
        end
    end
end

@floop for ln in safe_eachline(".gitignore")
    @show ln
end

This mechanism is useful for any kind of container that needs some resources during the loop (e.g., GC.@preserve).
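For instance, one could imagine a hypothetical foldable in the same AdHocFoldable style as safe_eachline that keeps an array rooted with GC.@preserve for the entire duration of the loop (preserved_bytes is a made-up name; this is a sketch, not part of FLoops.jl):

```julia
using Transducers
using Transducers: @next, complete

# Hypothetical example: iterate over the bytes of an array through a raw
# pointer, with GC.@preserve bracketing the whole fold so the array
# stays rooted for the duration of the loop.
function preserved_bytes(xs::Vector{UInt8})
    return AdHocFoldable(xs) do rf, acc, xs
        GC.@preserve xs begin
            p = pointer(xs)
            for i in eachindex(xs)
                acc = @next(rf, acc, unsafe_load(p, i))
            end
        end
        return complete(rf, acc)
    end
end
```

A consumer would then simply write `@floop for b in preserved_bytes(bytes) ... end`; the setup and teardown happen inside the fold, deterministically.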

How it works

@floop works by converting the native Julia for loop syntax to foldl defined by Transducers.jl. Unlike foldl defined in Base, foldl defined by Transducers.jl is powerful enough to cover the for loop semantics and more.
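The idea can be sketched with Base’s foldl alone; this is a hand-written approximation, not the actual macro expansion (which targets Transducers.jl’s foldl):

```julia
# A hand-written approximation (NOT the actual macro expansion) of what
# lowering a summation loop to a fold looks like.  The original loop
#     @floop begin
#         s = 0
#         for x in xs
#             s += x
#         end
#     end
# roughly corresponds to:
xs = 1:3
s = foldl(xs; init = 0) do s, x
    s + x   # the loop body becomes the reducing step: (state, input) -> new state
end
@assert s == 6
```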

@floop needs to do a bit of analysis on the AST to figure out unbound variables. This is simply done by using the excellent package JuliaVariables.jl by @thautwarm (Thank you!).

Next steps?

It may be nice to extend @floop to parallel loops. However, this is where a (map)reduce-like approach is more appropriate, and I cannot come up with a syntax that naturally expresses (map)reduce (the parallel version of foldl).


When I see a post like this, I always wonder: shouldn’t this be inside core Julia? What I mean is that (to my understanding) what you present is strictly superior in terms of performance. So, shouldn’t this go into the core language, since it is a performance improvement rather than new functionality…? (obviously after it matures)

Or is there a catch?


E.g., having to wait for a Julia release to get the latest improvements and bugfixes.


Sometimes improvements like these find their way into the base language once the syntax and implementation have fully crystallized. But work in progress lives better in packages. Just like Jeff + Stefan talked about integrating some of the work in LoopVectorization.jl.


They can coexist. It’s not either this or that.

I think the policy should be like the Canary builds of browsers.
What’s working reliably (well tested on code in the wild) in those packages should be imported into Julia (with different names).
The packages, independently, keep moving forward in search of better optimizations.

This way base Julia improves over time, yet flexibility is kept.


There’s always the possibility of hidden gotchas; typically when someone first posts a really cool demo like this one, it’s not the case that it’s been run against, say, the entire Julia package ecosystem. (That’s not easy to do.) Doing these kinds of transformations in a way that doesn’t break anybody’s code is a tall order and a lot to ask from a first implementation of an idea, yet would be required for this to move into Julia itself.

There’s interest in making a smoother “on-ramp” towards getting code transformations like these into Julia itself. Right now it’s not straightforward, but some kind of compiler plug-in architecture might allow packages to customize the compiler, and then you could just do a PkgEval run and see what breaks. Then you fix bugs, get all the tests passing, and submit your change to become part of Julia’s ordinary compilation pipeline.

@tkf, I know you’re interested in the question of whether lowering to foldl in general would be a good idea. If you can’t persuade others to try this for you and want to play with this on your own, for now you could consider hacking in a transformation prior to inference. I’m not quite certain of the right place to recommend, but maybe in the InferenceState(result::InferenceResult, src::CodeInfo, cached::Bool, params::Params) inner constructor?

One advantage to doing this transformation after lowering is that deciding the scope of variables is far more straightforward: lowering assigns different slots to variables with different scope, even if they have the same symbol.


This is cool and interesting. I do agree that many excellent uses of foldl lead to inaccessible syntax for most users (including me :slight_smile: ). It’s a great idea to try some more familiar syntax.

So as I understand it, foldl-based lowering allows the iteration source to own part of the program stack which is very natural and efficient, rather than forcing the compiler to reconstruct something efficient from the iterator state machine.
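A Base-only sketch of that point: for a nested structure such as a tree, a hand-written foldl just uses the program stack via recursion, whereas an equivalent iterate method would have to encode the same traversal as an explicit state machine (the Node/treefoldl names here are made up for illustration):

```julia
# A foldl over a tree is plain recursion: the traversal state lives on
# the call stack.  An `iterate` method for the same type would need an
# explicit stack of pending nodes threaded through the iterator state.
struct Node
    value::Int
    children::Vector{Node}
end

function treefoldl(f, acc, node::Node)
    acc = f(acc, node.value)            # visit this node
    for child in node.children
        acc = treefoldl(f, acc, child)  # recurse; the stack is implicit
    end
    return acc
end

tree = Node(1, [Node(2, Node[]), Node(3, [Node(4, Node[])])])
@assert treefoldl(+, 0, tree) == 10
```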

The downside with this kind of lowering as a language feature seems to be that the body of every for loop generates a closure? That seems kind of disturbing for what “feels like” a basic looping construct, and I guess it creates the usual optimization difficulties with lexical scoping of captured variables?

I’m impressed you got goto to work inside @floop, very neat.


Wow, very nice!

Question, why the (rather heavy) dependencies on JuliaVariables and MLStyle? Are they essential, or could they be made optional via Requires? MLStyle in particular pulls in a lot of stuff via DataFrames, etc.


@Datseris Yeah, my ambition is to put this in Julia itself. I think it is possible to do this in a backward-compatible manner, so this might even be possible within 1.x. But, as @tim.holy says, there might be some edge cases or even some fatal flaw at the moment. So I think it’s a good idea to explore this as an external package.

@tim.holy Before writing this, I (naively) thought the analysis of the user code would be much simpler. Initially, I thought I’d automatically figure out the state variables. It then turned out to be a rather hard problem, so I had to back off a bit in terms of the UI and let the user manually specify the states by listing them after @floop begin. So, yeah, hooking into the compiler became more appealing while writing FLoops.jl. Thanks a lot for preparing the pluggable compiler infrastructure!

@Chris_Foster Exactly! The basic idea is that collection authors know much more about their collection types than the compiler. foldl-based iteration is a way to let them express their knowledge more naturally.

Are you referring to the fact that it’s hard to efficiently mutate variables of the outer scope? I think this is why it is important to use foldl rather than foreach as the lowering target. Since foldl takes a function of the form (state, input) -> state′, there is no need to mutate the variables of the outer scope via the boxing used by the usual lowering of closures. All the changes during the loop can stay in state (a named tuple) and be assigned back to the original variables after the loop is finished. OTOH, using foreach would have this problem.
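The state-passing idea can be illustrated with plain Base.foldl (a sketch, not FLoops.jl’s actual expansion): both accumulators travel through the fold as one tuple and are unpacked afterwards, so the closure never mutates captured variables.

```julia
# Sketch of the state-passing scheme with two accumulators.  The body
# returns the updated state instead of mutating outer-scope variables,
# so no boxing (Core.Box) is needed; the results are unpacked after
# the fold finishes.
xs = 4:5
s, p = foldl(xs; init = (6, 1)) do (s, p), x
    (s + x, p * x)
end
@assert (s, p) == (15, 20)
```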

Haha, thanks :slight_smile: It was kind of a fun moment when I figured out how to do it.

@oschulz Thanks! Good to know you like it. I need JuliaVariables.jl and MLStyle.jl during macro expansion time, so I can’t make them optional. I’m using JuliaVariables.jl because I just didn’t want to implement scope analysis myself (as it sounds like a hairy problem). I use MLStyle.jl as it is already a dependency of JuliaVariables.jl and I’ve been wanting to try it out in a non-trivial package. Regarding the concern about the dependencies, it looks like MLStyle.jl has no dependencies and uses DataFrames.jl only during tests.


It could be interesting to play with MLStyle.jl’s bootstrap mechanism. In theory, I think one could use that to remove the MLStyle.jl and JuliaVariables.jl dependencies yet still use them to write the package. Perhaps if @oschulz is interested in the functionality but can’t tolerate heavy dependencies this would be a fun PR to attempt.


Wow, that’s cool. I didn’t know that was possible. Though, to make the dependencies minimal, it’s probably more effective if I split the foldl facility out of Transducers.jl (say, into FoldBase.jl) so that FLoops.jl doesn’t need to import the whole of Transducers.jl.


@tkf will this play well with autodiff functionality in julia?


Yes, exactly. But I had a look at the actual lowering you’ve used and can see how the state is unpacked after the loop. Very interesting indeed, I’m convinced this is worth thinking about.

So here’s a few impacts on the compiler / runtime which I can think of:

  • (-) An additional function for each for loop body (not a closure though :tada:)
  • (-) A function call to foldl per loop which needs to be inferred / inlined
  • (-) Very weird stacktraces (may need some way to mark “internal frames” (cf #35371) or otherwise make things less confusing)
  • (+) Two calls to iterate + other machinery removed per loop
  • (+) Possibly remove some inference machinery which was required to get iterate to behave well enough.

By the way — seeing how you’ve handled goto, I’ve often wondered whether do blocks would benefit from a slightly different lowering. Given that they’re block syntax I think it’s a bit weird that return and goto don’t refer to the scope of the parent function, and that mutating variables can easily run afoul of the closure variable boxing problem. The flavor of that problem seems a little similar to what you’ve done in FLoops.


Ref https://gafter.blogspot.com/2006/08/tennents-correspondence-principle-and.html.

The principle dictates that an expression or statement, when wrapped in a closure and then immediately invoked, ought to have the same meaning as it did before being wrapped in a closure. Any change in semantics when wrapping code in a closure is likely a flaw in the language.
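A small Base-only example of the violation being discussed: the same `return` statement changes meaning once it is wrapped in a do-block closure (the function names here are made up):

```julia
# `return` in a plain loop exits the enclosing function...
function first_positive(xs)
    for x in xs
        x > 0 && return x
    end
    return nothing
end

# ...but inside a `do` block it only exits the anonymous function, so
# the enclosing function falls through to `return nothing`.
function first_positive_do(xs)
    foreach(xs) do x
        x > 0 && return x   # swallowed by the closure
    end
    return nothing
end

@assert first_positive([-1, 2, 3]) == 2
@assert first_positive_do([-1, 2, 3]) === nothing
```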


I just wanted to say that this is really cool @tkf! I’ve wondered for a while now whether constructs like for could lower to calls to generic higher-order functions rather than imperative code and this seems like a great proof-of-principle.

Out of curiosity: would there be situations where it’s an advantage to lower while loops to a transducer style?


Oops, silly me - you’re right, of course, should have taken a closer look. Sorry!


tkf, this is brilliant, I will try it soon! Hope this makes it into Base.


I don’t think FLoops.jl does anything special with respect to autodiff. At the very bottom, it’d be executed using lower-level language constructs like while loops and recursion. So autodiff frameworks need to handle those, just as they have to handle for loops.

Yeah, it’s extra work. A bright side is that a loop at “global scope” is as fast as in function scope, because it actually is in function scope. OK, yes, that’s because the foldl call pays the cost of inference and optimization.

I kind of hope it gets less weird after you see it a few times. I also worry that using #35371-like system can potentially make debugging hard. That said, if there is a very robust mechanism to avoid hiding bugs in foldl implementations, maybe it is worth trying in foldl.

Is there some compiler path very specific to iterate? I thought it comes out naturally from the small-union handling. My impression is that certain iterations are better done via foldl [*1] since it lets the compiler have a clearer view of the program and hence provides more optimization opportunities, not because the program itself is “faster.” The compiler has to inline the loop body, after all.

However, since there is much more freedom for the programmer to construct the code with foldl than with iterate, it’s much easier to communicate with the compiler (Julia+LLVM) and generate fast iterations. That is to say, I think foldl is better not because it’s inherently better but because it gives more control to the programmer (though it’s pretty easy to write fast code when you don’t need to think about the iteration as a state machine as with iterate). I think it aligns with the “philosophy” of Julia: minimize the language constructs that cannot be implemented by users. This way, users can write code as good as (or sometimes even better than) Base. Incidentally (OK, actually not, as that’s one of the motivations), @tim.holy is proposing to add a compiler pass for iterate(::CartesianIndices), thereby promoting CartesianIndices to a compiler construct, which is something impossible to implement with the surface language API. It may make sense to do this for now for various reasons (exercising the compiler pass facility, changing the lowering of for being a much larger project, etc.). But I’d like to mention that there is another possibility before we go nuts and accidentally add too many compiler constructs that users cannot express with a stable surface language.

[*1] Sometimes it’s “unreasonably” effective. For example, type-based filtering using an iterator comprehension is as effective as skipmissing (ref: Transducer as an optimization: map, filter and flatten by tkf · Pull Request #33526 · JuliaLang/julia).
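The pattern referred to in this footnote looks like the following (a sketch of the idea, not the benchmark from the PR):

```julia
# Type-based filtering in an iterator comprehension: the `isa` test lets
# the compiler eliminate the Missing branch from the loop body, much
# like skipmissing does.
xs = [1.0, missing, 2.0, missing, 3.0]
s = sum(x for x in xs if !(x isa Missing))

@assert s == 6.0
@assert s == sum(skipmissing(xs))
```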

Yeah, I find how return works in do a bit frustrating sometimes, too. Though how return in do currently works is also useful. Without it, the equivalents of break and continue would be impossible to implement or difficult to use in foreach. Of course, we can use a temporary named function/closure to get back the normal return/@goto semantics and wrap it in a macro. So I guess it is a valid design choice that the do syntax does not “block” return and @goto.
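For example, with the current semantics, `return` gives foreach a continue (Base-only sketch):

```julia
# `return` inside `foreach`'s do block skips the rest of the current
# iteration -- i.e. it behaves like `continue` in a for loop.
evens = Int[]
foreach(1:5) do x
    isodd(x) && return   # "continue"
    push!(evens, x)
end

@assert evens == [2, 4]
```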

Hmm… Interesting question. I think while loops in general are too unstructured to lower to foldl. I see while loops and recursion (and maybe @goto sometimes) as low-level building blocks to “bootstrap” foldl in a language without for loops (e.g., see safe_eachline in the OP). So, if you’ve built a useful pattern with while, there is a good chance that it can be wrapped as a foldl (but not with iterate). But this has to be done on a case-by-case basis; sometimes it’s possible and useful, but sometimes not.


Yeah, this is more or less why that PR is stalled for the moment :grimacing: In general it’s not clear that hiding user frames is a good idea… so it’s probably a bad idea here too.

Good question. I had a vague memory that there was some difficulty with iterate a while back but it might have been solved in a general way at this point. I can see some explicit treatment of the iteration protocol in inference (there’s abstract_iteration), but we use iteration for things like splatting so I can’t see that going away.

Hmm, it seems that a foldl-based lowering provides significant advantages in type stability in some cases? I’m imagining things like iterating over the rows of a DataFrame, where you very much do want a function barrier.
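The function-barrier point can be sketched in Base alone (the sum_column helper is hypothetical; the type-erased columns play the role of an abstractly typed container like table rows):

```julia
# When the element/column types are not known statically, putting the
# inner loop behind a function call compiles a specialized loop for
# each concrete type.  A foldl-based design gives the collection author
# a natural place to insert exactly this barrier.
cols = Any[Float64[1.0, 2.0], Int[3, 4]]   # type-erased, like table columns

sum_column(col) = foldl(+, col; init = zero(eltype(col)))   # the barrier

total = sum(sum_column(col) for col in cols)
@assert total == 10.0
```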