Is "Closures should be avoided whenever possible" still valid in Julia v1.9+?

I have posted the same question on Are the examples in "Closures should be avoided whenever possible" still valid in Julia v1.9+? · Issue #34 · SciML/SciMLStyle · GitHub.

I just tested the examples in the section “Closures should be avoided whenever possible”, but it turns out that the suggested approach (using Base.Fix2) is no longer faster than using a closure. For example,

julia> vector_of_vectors = [rand(4) for _ in 1:5];

julia> @code_warntype map(Base.Fix2(getindex, 2), vector_of_vectors)
MethodInstance for map(::Base.Fix2{typeof(getindex), Int64}, ::Vector{Vector{Float64}})
  from map(f, A::AbstractArray) @ Base abstractarray.jl:3255
Arguments
  #self#::Core.Const(map)
  f::Base.Fix2{typeof(getindex), Int64}
  A::Vector{Vector{Float64}}
Body::Vector{Float64}
1 ─ %1 = Base.Generator(f, A)::Base.Generator{Vector{Vector{Float64}}, Base.Fix2{typeof(getindex), Int64}}
│   %2 = Base.collect_similar(A, %1)::Vector{Float64}
└──      return %2

julia> @code_warntype map(v -> v[2], vector_of_vectors)
MethodInstance for map(::var"#25#26", ::Vector{Vector{Float64}})
  from map(f, A::AbstractArray) @ Base abstractarray.jl:3255
Arguments
  #self#::Core.Const(map)
  f::Core.Const(var"#25#26"())
  A::Vector{Vector{Float64}}
Body::Vector{Float64}
1 ─ %1 = Base.Generator(f, A)::Base.Generator{Vector{Vector{Float64}}, var"#25#26"}
│   %2 = Base.collect_similar(A, %1)::Vector{Float64}
└──      return %2

julia> @code_warntype Base.vect(v[2] for v in vector_of_vectors)
MethodInstance for Base.vect(::Base.Generator{Vector{Vector{Float64}}, var"#27#28"})
  from vect(X::T...) where T @ Base array.jl:126
Static Parameters
  T = Base.Generator{Vector{Vector{Float64}}, var"#27#28"}
Arguments
  #self#::Core.Const(Base.vect)
  X::Tuple{Base.Generator{Vector{Vector{Float64}}, var"#27#28"}}
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  @_4::Int64
  i::Int64
Body::Vector{Base.Generator{Vector{Vector{Float64}}, var"#27#28"}}
1 ─ %1  = Base.length(X)::Core.Const(1)
│   %2  = (1:%1)::Core.Const(1:1)
│   %3  = Base.IteratorSize(%2)::Core.Const(Base.HasShape{1}())
│   %4  = (%3 isa Base.SizeUnknown)::Core.Const(false)
│   %5  = Base._array_for($(Expr(:static_parameter, 1)), %2, %3)::Vector{Base.Generator{Vector{Vector{Float64}}, var"#27#28"}}
│   %6  = Base.LinearIndices(%5)::LinearIndices{1, Tuple{Base.OneTo{Int64}}}
│         (@_4 = Base.first(%6))
│         (@_3 = Base.iterate(%2))
│   %9  = (@_3::Core.Const((1, 1)) === nothing)::Core.Const(false)
│   %10 = Base.not_int(%9)::Core.Const(true)
└──       goto #6 if not %10
2 ─ %12 = @_3::Core.Const((1, 1))
│         (i = Core.getfield(%12, 1))
│   %14 = Core.getfield(%12, 2)::Core.Const(1)
│   %15 = Base.getindex(X, i::Core.Const(1))::Base.Generator{Vector{Vector{Float64}}, var"#27#28"}
│         nothing
└──       goto #4 if not %4
3 ─       Core.Const(:(Base.push!(%5, %15)))
└──       Core.Const(:(goto %21))
4 ┄       Base.setindex!(%5, %15, @_4::Core.Const(1))
│         nothing
│         (@_4 = Base.add_int(@_4::Core.Const(1), 1))
│         (@_3 = Base.iterate(%2, %14))
│   %24 = (@_3::Core.Const(nothing) === nothing)::Core.Const(true)
│   %25 = Base.not_int(%24)::Core.Const(false)
└──       goto #6 if not %25
5 ─       Core.Const(:(goto %12))
6 ┄       return %5

You would probably expect the last one to be the slowest, but it turns out to be the fastest, while the suggested way is the slowest:

julia> @btime map(Base.Fix2(getindex, 2), vector_of_vectors);
  216.746 ns (2 allocations: 128 bytes)

julia> @btime map(Base.Fix2(getindex, 2), $vector_of_vectors);
  22.735 ns (1 allocation: 96 bytes)

julia> @btime map(v -> v[2], vector_of_vectors);
  183.554 ns (2 allocations: 112 bytes)

julia> @btime map(v -> v[2], $vector_of_vectors);
  22.526 ns (1 allocation: 96 bytes)

julia> @btime Base.vect(v[2] for v in vector_of_vectors);
  90.162 ns (2 allocations: 80 bytes)

julia> @btime Base.vect(v[2] for v in $vector_of_vectors);
  18.412 ns (1 allocation: 64 bytes)

Am I doing something wrong? Why is the last one the fastest? My Julia version is as follows:

In [30]: versioninfo()
Julia Version 1.9.0-rc1
Commit 3b2e0d8fbc1 (2023-03-07 07:51 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.4.0)
  CPU: 10 × Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
  Threads: 1 on 8 virtual cores
  • Note that Base.vect(v[2] for v in $vector_of_vectors) does not return what you probably think:
julia> typeof(Base.vect(v[2] for v in [randn(3) for _ in 1:4]))
Vector{Generator{Vector{Vector{Float64}}, var"#34#36"}} (alias for Array{Base.Generator{Array{Array{Float64, 1}, 1}, var"#34#36"}, 1})
  • Also, AFAIU v->v[2] was never a problematic closure. Only closures that capture variables can cause problems, e.g. v->v[two], though you probably need a more complicated example to trip up type inference.

  • The other discrepancies are, I would guess, @btime shenanigans, and they don’t tell you which pattern is faster in “real code”.
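To make the capture point concrete, here is a minimal sketch (names are my own, not from the style guide): a closure over a literal captures nothing and is a singleton type, while a closure over a variable stores that variable in a field of the closure’s type.

```julia
vecs = [rand(4) for _ in 1:5]

f_literal = v -> v[2]              # the 2 is a literal: nothing is captured
@assert isempty(fieldnames(typeof(f_literal)))

make_getter(i) = v -> v[i]         # i is captured as a field of the closure type
f_capturing = make_getter(2)
@assert fieldnames(typeof(f_capturing)) == (:i,)

@assert map(f_literal, vecs) == map(f_capturing, vecs)
```

Whether the captured field ends up concretely typed (as here) or boxed is what determines performance.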


Note that Base.vect(v[2] for v in $vector_of_vectors) does not return

My bad, I should have written [v[2] for v in vector_of_vectors]; I only used Base.vect because @code_warntype does not work on the comprehension directly:

julia> @code_warntype [v[2] for v in vector_of_vectors]
ERROR: expression is not a function call, or is too complex for @code_warntype to analyze; break it down to simpler parts if possible. In some cases, you may want to use Meta.@lower.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] top-level scope
   @ REPL[14]:1
 [3] top-level scope
   @ ~/.asdf/installs/julia/1.9.0-rc1/share/julia/stdlib/v1.9/REPL/src/REPL.jl:1414

I will fix the question.

However, [v[2] for v in vector_of_vectors] still seems to be the fastest, so it does not change the question that much:

julia> @btime [v[2] for v in vector_of_vectors];
  83.807 ns (2 allocations: 112 bytes)

julia> @btime [v[2] for v in $vector_of_vectors];
  22.609 ns (1 allocation: 96 bytes)

Only closures that capture variables can cause problems e.g. v->v[two] though you probably need a more complicated example to trip type inference.

OK, I got it. The reason I was getting different results from SciML’s tutorial is that I changed their examples without realizing it: I should use a variable i, not the literal number 2.
Now the type check and benchmark seem correct.

Type check:

julia> vector_of_vectors = [rand(4) for _ in 1:5];

julia> i = 2
2

julia> @code_warntype map(Base.Fix2(getindex, i), vector_of_vectors)
MethodInstance for map(::Base.Fix2{typeof(getindex), Int64}, ::Vector{Vector{Float64}})
  from map(f, A::AbstractArray) @ Base abstractarray.jl:3255
Arguments
  #self#::Core.Const(map)
  f::Base.Fix2{typeof(getindex), Int64}
  A::Vector{Vector{Float64}}
Body::Vector{Float64}
1 ─ %1 = Base.Generator(f, A)::Base.Generator{Vector{Vector{Float64}}, Base.Fix2{typeof(getindex), Int64}}
│   %2 = Base.collect_similar(A, %1)::Vector{Float64}
└──      return %2

julia> @code_warntype map(v -> v[i], vector_of_vectors)
MethodInstance for map(::var"#5#6", ::Vector{Vector{Float64}})
  from map(f, A::AbstractArray) @ Base abstractarray.jl:3255
Arguments
  #self#::Core.Const(map)
  f::Core.Const(var"#5#6"())
  A::Vector{Vector{Float64}}
Body::Vector
1 ─ %1 = Base.Generator(f, A)::Base.Generator{Vector{Vector{Float64}}, var"#5#6"}
│   %2 = Base.collect_similar(A, %1)::Vector
└──      return %2

julia> @code_warntype Base.vect((v[i] for v in vector_of_vectors)...)
MethodInstance for Base.vect(::Float64, ::Float64, ::Float64, ::Float64, ::Float64)
  from vect(X::T...) where T @ Base array.jl:126
Static Parameters
  T = Float64
Arguments
  #self#::Core.Const(Base.vect)
  X::NTuple{5, Float64}
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  @_4::Int64
  i::Int64
Body::Vector{Float64}
1 ─ %1  = Base.length(X)::Core.Const(5)
│   %2  = (1:%1)::Core.Const(1:5)
│   %3  = Base.IteratorSize(%2)::Core.Const(Base.HasShape{1}())
│   %4  = (%3 isa Base.SizeUnknown)::Core.Const(false)
│   %5  = Base._array_for($(Expr(:static_parameter, 1)), %2, %3)::Vector{Float64}
│   %6  = Base.LinearIndices(%5)::LinearIndices{1, Tuple{Base.OneTo{Int64}}}
│         (@_4 = Base.first(%6))
│         (@_3 = Base.iterate(%2))
│   %9  = (@_3::Core.Const((1, 1)) === nothing)::Core.Const(false)
│   %10 = Base.not_int(%9)::Core.Const(true)
└──       goto #6 if not %10
2 ┄ %12 = @_3::Tuple{Int64, Int64}
│         (i = Core.getfield(%12, 1))
│   %14 = Core.getfield(%12, 2)::Int64
│   %15 = Base.getindex(X, i)::Float64
│         nothing
└──       goto #4 if not %4
3 ─       Core.Const(:(Base.push!(%5, %15)))
└──       Core.Const(:(goto %21))
4 ┄       Base.setindex!(%5, %15, @_4)
│         nothing
│         (@_4 = Base.add_int(@_4, 1))
│         (@_3 = Base.iterate(%2, %14))
│   %24 = (@_3 === nothing)::Bool
│   %25 = Base.not_int(%24)::Bool
└──       goto #6 if not %25
5 ─       goto #2
6 ┄       return %5

Benchmark:

julia> @btime map(Base.Fix2(getindex, i), vector_of_vectors);
  262.350 ns (3 allocations: 144 bytes)

julia> @btime map(Base.Fix2(getindex, $i), $vector_of_vectors);
  22.902 ns (1 allocation: 96 bytes)

julia> @btime map(v -> v[i], vector_of_vectors);
  2.398 μs (25 allocations: 960 bytes)

julia> @btime map(v -> v[$i], $vector_of_vectors);
  22.944 ns (1 allocation: 96 bytes)

julia> @btime [v[i] for v in vector_of_vectors];
  287.770 ns (9 allocations: 224 bytes)

julia> @btime [v[$i] for v in $vector_of_vectors];
  23.050 ns (1 allocation: 96 bytes)

There are two scenarios where Fix1 and Fix2 (and possibly generic typed partial applicators, if humanity survives long enough to see them) offer a benefit:

  1. To avoid recompilation.
  2. To avoid boxing when variables are captured.

To illustrate the first benefit—avoiding recompilation—consider the following example:

# After warmup

julia> @time let xs=(1, 2.0, 3+0im)
           reduce((x,y)->x+y, map(x->x^2, Iterators.filter(x->isodd(x), xs)))
       end;
  0.304171 seconds (124.78 k allocations: 8.307 MiB, 99.76% compilation time)

julia> @time let xs=(1, 2.0, 3+0im)
           reduce(+, map(Base.Fix2(^,2), Iterators.filter(isodd, xs)))
       end;
  0.000030 seconds (15 allocations: 656 bytes)

Each time you execute code that declares a new anonymous function, it’s compiled from scratch, even if the same code has been compiled before. In interactive use this usually isn’t a big deal, but with long data-processing chains the compile time can get annoying. For this reason, typed partial applicators are popular when using @chain with DataFrames (e.g., it’s a preferred idiom of @bkamins, at least until some future in which function definitions are hashed).
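A quick way to see why, sketched below: each syntactic occurrence of anonymous-function syntax produces a brand-new function type, so downstream specializations (map, reduce, …) must be recompiled for it, whereas Base.Fix2 always yields the same concrete type.

```julia
f1 = x -> x^2
f2 = x -> x^2                      # same source text, but...
@assert typeof(f1) !== typeof(f2)  # ...two distinct types: map(f2, ...) recompiles

g1 = Base.Fix2(^, 2)
g2 = Base.Fix2(^, 2)
@assert typeof(g1) === typeof(g2)  # one concrete type: compiled code is reused
```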

To illustrate the second benefit—avoiding boxing—consider the following example:

julia> fa(xi) = let
           if rand(Bool);  x=xi  else  x=xi  end
           map(i->x[i], eachindex(xi))
       end;

julia> fb(xi) = let
           if rand(Bool);  x=xi  else  x=xi  end
           map(Base.Fix1(getindex, x), eachindex(xi))
       end;

julia> @btime fa($[1:100;]);
       @btime fb($[1:100;]);
  2.544 μs (13 allocations: 1.31 KiB)
  50.760 ns (1 allocation: 896 bytes)

Notice that the closure version experiences a ~50x slowdown compared to the partial applicator. This is because x is “boxed” in a type-unstable mutable container, causing a loss of type stability (you can confirm this with @code_warntype). The boxing happens because there is more than one syntactic assignment to x within its scope, even though x doesn’t actually need to be boxed (after the closure’s declaration there is no code path in which x is reassigned).
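The boxing can be observed directly, since a closure is just a struct whose fields are its captures; this sketch (function names my own) inspects the captured field’s type:

```julia
# Two syntactic assignments to x => the captured x is lowered to a Core.Box.
function make_boxed(xi)
    if rand(Bool); x = xi else x = xi end
    return i -> x[i]
end

# Rebinding via `let` gives a single-assignment variable, captured concretely.
function make_unboxed(xi)
    if rand(Bool); x = xi else x = xi end
    let x = x
        return i -> x[i]
    end
end

@assert fieldtype(typeof(make_boxed([1, 2, 3])), :x) === Core.Box
@assert fieldtype(typeof(make_unboxed([1, 2, 3])), :x) === Vector{Int}
```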

At the moment my preference is to use typed partial applicators where possible, since it avoids recompiling things needlessly and avoids capture boxing, but when that’s not an option I’ll use the let block trick:

julia> fc(xi) = let
           if rand(Bool);  x=xi  else  x=xi  end
           let x=x; map(i->x[i], eachindex(xi)) end
       end;

julia> @btime fc($[1:100;]);
  50.103 ns (1 allocation: 896 bytes)

As argued here, there’s still a lot of room for improvement in how closures’ captures are handled.

Similar considerations apply to untyped global variables:

# after warmup

julia> x=[1:100_000_000;];
       @time map(i->x[i], eachindex(x));
       @time map(Base.Fix1(getindex, x), eachindex(x));
       @time let x=x; map(i->x[i], eachindex(x)) end;
  6.384933 seconds (200.05 M allocations: 3.729 GiB, 21.36% gc time, 1.07% compilation time)
  0.167096 seconds (7 allocations: 762.940 MiB, 0.63% gc time)
  0.245638 seconds (26.92 k allocations: 764.735 MiB, 18.32% gc time, 14.48% compilation time)
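For the global-variable case there is also a third option besides Fix1 and the let trick: declaring the global const (or, on Julia 1.8+, giving it a type annotation) lets the closure body infer. A sketch, with names of my own choosing:

```julia
x_untyped = collect(1:3)
const x_const = collect(1:3)

h_untyped() = map(i -> x_untyped[i], eachindex(x_untyped))  # x_untyped is ::Any inside
h_const()   = map(i -> x_const[i],   eachindex(x_const))    # x_const is concretely typed

@assert h_untyped() == h_const() == [1, 2, 3]
@assert only(Base.return_types(h_const, Tuple{})) == Vector{Int}
```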

Base.Fix2(^,2) will probably be much worse than abs2
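Presumably because a literal ^2 is rewritten by the compiler to Base.literal_pow (essentially x*x), while Fix2(^, 2) passes the exponent as a runtime value and falls back to generic powering; abs2 keeps the fast path. A small check (note that abs2 of a complex number is real, unlike z^2, so they are not interchangeable there):

```julia
xs = (1.5, -2.0, 4)
@assert map(abs2, xs) == map(Base.Fix2(^, 2), xs)  # equal values for real inputs

@assert abs2(3 + 4im) == 25        # abs2(z) = |z|^2, a real number...
@assert (3 + 4im)^2 == -7 + 24im   # ...while z^2 stays complex
```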
