[RFC/ANN] FLoops.jl: fast generic for loops (foldl for humans™)

I don’t think FLoops.jl does anything special to autodiff. At the very bottom, it’d be executed using lower-level language constructs like while loop and recursion. So autodiff frameworks need to handle them, like it has to handle for loop.

Yeah, it’s an extra work. A bright side is that the loop at “global scope” is as fast as in the function scope because it is actually in the function scope. OK, yes, that’s because foldl call pays the cost of inference and optimization.

I kind of hope it gets less weird after you see it a few times. I also worry that using #35371-like system can potentially make debugging hard. That said, if there is a very robust mechanism to avoid hiding bugs in foldl implementations, maybe it is worth trying in foldl.

Is there some compiler path very specific to iterate? I thought it comes out naturally from the small union handling. My impression is that certain iteration is better when done via foldl [*1] since it lets the compiler have a clearer view of the program and hence provides more optimization opportunities, not because the program itself is “faster.” The compiler has to inline the loop body, after all.

However, since there is much more freedom for the programmer to construct the code with foldl than iterate, it’s much easier to communicate with the compiler (Julia+LLVM) and generate fast iterations. That is to say, I think foldl is better not because it’s inherently better but because it gives more control to the programmer (though it’s pretty easy to write fast code when you don’t need to think about the iteration as state machine as in iterate). I think it aligns with the “philosophy” of Julia: minimize the language constructs that cannot be implemented by the users. This way, users can write code as good as (or sometimes even better than) Base. Incidentally (OK, actually not, as that’s one of the motivation), @tim.holy is proposing to add a compiler pass for iterate(::CartesianIndices) and thereby promoting CartesianIndices to a compiler construct which is something impossible to implement with the surface language API. It may make sense to do this for now for various reasons (exercising the compiler path facility, changing the lowering of for is much larger project, etc.). But I’d like to mention that there is another possibility before we go nuts and accidentally add too many compiler constructs that users cannot express with a stable surface language.

[*1] Sometimes it’s “unreasonably” effective. For example, type-based filtering using iterator comprehension is as effective as skipmissing (ref: Transducer as an optimization: map, filter and flatten by tkf · Pull Request #33526 · JuliaLang/julia).

Yeah, I find how return works in do is a bit frustrating sometimes, too. Though how return in do works currently is also useful. Without this, what is equivalent to break and continue would be impossible to implement or difficult to use in foreach. Of course, we can use temporary named function/closure to get back the normal return/@goto semantics and wrap it in a macro. So I guess it is a valid design choice that the do syntax does not “block” return and @goto.

Hmm… Interesting question. I think while loops in general are too unstructured to lower to foldl. I see while loop and recursion (and maybe @goto sometimes) as low-level building blocks to “bootstrap” foldl in a language without for loops (e.g., see safe_eachline in the OP). So, if you’ve built a useful pattern with whlie, there is a good chance that it can be wrapped as a foldl (but not with iterate). But this has to be done case-by-case basis; sometimes it’s possible and useful but sometimes not.

3 Likes