FLoops.jl provides a macro @floop that offers an alternative “backend” for the for loop syntax, based on the mechanism provided by Transducers.jl. It can be used to generate fast, generic iteration over complex collections.
I think the iteration mechanism of Transducers.jl (foldl) has many advantages over how the for loop is currently implemented (iterate). However, I’ve realized that the functional style of Transducers.jl can be a cognitive overhead that impedes its adoption. By lowering the familiar for syntax to foldl, I’m hoping that foldl becomes much more accessible to many Julia users.
This package is not registered yet. I wanted to post it here first to gauge interest. I have used raw foldl so much that I’m still not sure whether I personally need this. However, I think it’d be great if FLoops.jl could give authors of data collections some incentive to define foldl.
Usage
Type ] add https://github.com/tkf/FLoops.jl.git in the REPL to install the package.
Quoting the Usage section of the README:
Simply wrap a for loop and its initialization part with @floop:

julia> using FLoops  # exports @floop macro

julia> @floop begin
           s = 0
           for x in 1:3
               s += x
           end
       end
       s
6
When accumulating into pre-defined variables, simply list them between begin and for. @floop also works with multiple accumulators.

julia> @floop begin
           s
           p = 1
           for x in 4:5
               s += x
               p *= x
           end
       end
       s
15

julia> p
20
The begin ... end block can be omitted if the for loop does not require local variables to carry the state:

julia> @floop for x in 1:3
           @show x
       end
x = 1
x = 2
x = 3
Why @floop?
@floop is better because foldl is better than iterate. Here are some demonstrations. This is a “recap” if you have already heard about Transducers.jl.
@floop is fast for complex collections
This is the ratio (baseline / target) of the time it takes to run

@floop begin
    acc = 0.0
    for x in xs
        acc += x
    end
end

with and without @floop (so a larger value means @floop is better).
The input collections are generated by
using BlockArrays: mortar  # assumed source of `mortar`
using FillArrays: Zeros    # assumed source of `Zeros`

floats = randn(1000)
dataset = [
    "Vector" => floats,
    "filter" => Iterators.filter(!ismissing, ifelse.(floats .> 2, missing, floats)),
    "flatten" => Iterators.flatten((floats, (1, 2), 3:4, 5:0.2:6, Zeros(1000))),
    "BlockVector" => mortar([floats, floats]),
]
As you can see, @floop is beneficial for collections with more complex structure. In particular, @floop is much faster for chunked/blocked collections like BlockVector and flatten.
Deterministic setup and teardown
foldl is also useful for building robust and correct APIs. For example, eachline(::AbstractString) is not safe to use with break and is also not exception-safe; the file object is not closed deterministically. Note that this is not because the implementation of eachline is careless. It is simply a limitation of iterate.
Defining a safer version of eachline(::AbstractString) is much simpler with foldl:
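using Transducers
using Transducers: @next, complete

function safe_eachline(filename::AbstractString; keep=false)
    return AdHocFoldable(filename) do rf, acc, filename
        # `open(...) do` closes the file when the fold ends, however it ends.
        open(filename) do io
            while !eof(io)
                # Feed one line to the reducing function; `@next` returns
                # early if the loop body requested termination.
                acc = @next(rf, acc, readline(io; keep=keep))
            end
            return complete(rf, acc)
        end
    end
end

@floop for ln in safe_eachline(".gitignore")
    @show ln
end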
This mechanism is useful for any kind of container that needs some resources during the loop (e.g., GC.@preserve).
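The reason a fold can guarantee cleanup is that the collection, not the caller, drives the loop. Here is a minimal sketch of that idea using only Base. The helper fold_lines is hypothetical (it is not part of Base or Transducers.jl), and unlike safe_eachline above it does not support early termination; it only shows where the setup and teardown live.

```julia
# Hypothetical helper for illustration; not part of Base or Transducers.jl.
# The function owns the whole loop, so `open(...) do` can close the file
# when the loop finishes or when `step` throws.
function fold_lines(step, init, filename::AbstractString)
    open(filename) do io
        acc = init
        while !eof(io)
            acc = step(acc, readline(io))
        end
        return acc
    end
end

# Usage: count the lines of a temporary file.
nlines = mktemp() do path, io
    write(io, "a\nb\nc\n")
    flush(io)
    fold_lines((n, _line) -> n + 1, 0, path)
end
```

With iterate, the caller decides when to stop asking for the next element, so the iterator never gets a reliable point at which to release the file.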
How it works
@floop works by converting the native Julia for loop syntax to foldl defined by Transducers.jl. Unlike foldl defined in Base, foldl defined by Transducers.jl is powerful enough to cover the for loop semantics and more.
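To make the lowering idea concrete, here is a conceptual sketch that uses only Base.foldl. It is not the code that @floop generates (that targets Transducers.jl and also has to handle control flow such as break), and the function names are made up for illustration. The point is that the loop-carried variables become the accumulator and the loop body becomes the step function.

```julia
# Conceptual sketch only: the real expansion of @floop targets Transducers.jl.

# A plain `for` loop with two loop-carried variables.
function sum_prod_with_for(xs)
    s = 0
    p = 1
    for x in xs
        s += x
        p *= x
    end
    return (s, p)
end

# The same computation as a left fold: the state `(s, p)` is threaded
# through a step function instead of being mutated in place.
function sum_prod_with_foldl(xs)
    step((s, p), x) = (s + x, p * x)
    return foldl(step, xs; init = (0, 1))
end

sum_prod_with_for(4:5) == sum_prod_with_foldl(4:5)  # both give (9, 20)
```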
@floop needs to do a bit of analysis on the AST to figure out unbound variables. This is done using the excellent package JuliaVariables.jl by @thautwarm (Thank you!).
Next steps?
It may be nice to extend @floop to parallel loops. However, this is where a (map)reduce-like approach is more appropriate, and I cannot come up with a syntax that naturally expresses (map)reduce (the parallel version of foldl).
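For reference, this is roughly how the two formulations differ when written with Base functions. The helper names are made up, and Base's mapreduce is not itself parallel; the sketch only shows that mapreduce takes the per-element work and the combining operator as separate arguments, whereas a for loop body mixes the two.

```julia
# Sequential left fold: elements are combined strictly from left to right.
sum_of_squares_foldl(xs) = foldl((acc, x) -> acc + x^2, xs; init = 0)

# mapreduce: the mapping `x -> x^2` and the associative operator `+`
# are passed separately, so the grouping of the reduction is not fixed.
sum_of_squares_mapreduce(xs) = mapreduce(x -> x^2, +, xs; init = 0)

sum_of_squares_foldl(1:3) == sum_of_squares_mapreduce(1:3)  # both give 14
```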