Cannot achieve type-stability and get rid of allocations

Here is a simple implementation of a two-pointer technique for sorted arrays:

function f(xs, ys)
    i_begin, i_end = firstindex(xs), firstindex(xs)
    map(ys) do y
        i_begin = something( findnext(x -> x >= y, xs, i_begin), lastindex(xs) + 1 )
        i_end = something( findnext(x -> x > y, xs, max(i_begin, i_end)), lastindex(xs) + 1 ) - 1
        length(@view(xs[i_begin:i_end]))  # in real code - arbitrary function instead of length
    end
end

While it seems that the function should be type-stable, this is not the case. As far as I can tell, instability is related to nothing or something:

julia> @code_warntype f([1, 2, 3], [1, 2, 3])
Variables
  #self#::Core.Compiler.Const(f, false)
  xs::Array{Int64,1}
  ys::Array{Int64,1}
  #46::var"#46#49"{Array{Int64,1}}
  i_begin::Core.Box
  i_end::Core.Box
  ys_new::Array{_A,1} where _A

Body::Array{_A,1} where _A
1 ─       (i_begin = Core.Box())
│         (i_end = Core.Box())
│   %3  = Main.firstindex(xs)::Core.Compiler.Const(1, false)
│   %4  = Main.firstindex(xs)::Core.Compiler.Const(1, false)
│         Core.setfield!(i_begin, :contents, %3)
│         Core.setfield!(i_end, :contents, %4)
│   %7  = Main.:(var"#46#49")::Core.Compiler.Const(var"#46#49", false)
│   %8  = Core.typeof(xs)::Core.Compiler.Const(Array{Int64,1}, false)
│   %9  = Core.apply_type(%7, %8)::Core.Compiler.Const(var"#46#49"{Array{Int64,1}}, false)
│   %10 = i_begin::Core.Box
│         (#46 = %new(%9, xs, %10, i_end))
│   %12 = #46::var"#46#49"{Array{Int64,1}}
│   %13 = Main.map(%12, ys)::Array{_A,1} where _A
│         (ys_new = %13)
└──       return %13

This instability actually affects performance a lot:

julia> @btime f($(rand(1000)), $(rand(1000)))
  139.358 μs (6603 allocations: 158.03 KiB)

After adding types explicitly by replacing the 2nd line with
i_begin::Int, i_end::Int = firstindex(xs), firstindex(xs):

julia> @btime f($(rand(1000)), $(rand(1000)));
  17.733 μs (1085 allocations: 24.88 KiB)

Still, instability doesn’t go away completely: @code_warntype still has Core.Boxes in it.

Any suggestions on how to optimize such code? The examples above were ran in Julia 1.5.

I think you are hitting performance of captured variables in closures · Issue #15276 · JuliaLang/julia · GitHub, because you create anonymous functions for the findnext call.

The issue remains even without anonymous functions in findnext:

function f(xs, ys)
    i_begin, i_end = firstindex(xs), firstindex(xs)
    ys_new = map(ys) do y
        i_begin = something( findnext(>=(y), xs, i_begin), lastindex(xs) + 1 )
        i_end = something( findnext(>(y), xs, max(i_begin, i_end)), lastindex(xs) + 1 ) - 1
        length(@view(xs[i_begin:i_end]))
    end
end

shows the same results.

It is still the same issue, but I misidentified the anonymous function. It is the one created by the do block. Here is non-allocating version (EDIT: of a similar function).

function f(xs, ys)
  local ys_new
  i_begin, i_end = firstindex(xs), firstindex(xs)
  let xs = xs, ys = ys, i_begin = i_begin, i_end = i_end;
    g = function(y)
      i_begin2 = something( findnext(>=(y), xs, i_begin), lastindex(xs) + 1 )
      i_end2 = something( findnext(>(y), xs, max(i_begin2, i_end)), lastindex(xs) + 1 ) - 1
      length(@view(xs[i_begin2:i_end2]))
    end
    ys_new = map(g, ys)
  end;
  return ys_new
end

Your code doesn’t do the same calculations: it never modifies i_begin and i_end. Each findnext should start at the index i_begin from the previous iteration.

You can wrap those variables in Ref s that you then mutate instead of rebinding.

3 Likes

Ah sorry, you are right. Then do what Kristoffer suggested.

Thanks, this works perfectly well! The final solution looks like this:

function f(xs, ys)
    ix = Ref(firstindex(xs):firstindex(xs))
    map(ys) do y
        i_begin = something( findnext(x -> x >= y, xs, first(ix[])), lastindex(xs) + 1 )
        i_end = something( findnext(x -> x > y, xs, max(i_begin, last(ix[]))), lastindex(xs) + 1 ) - 1
        ix[] = i_begin:i_end
        length(@view(xs[ix[]]))  # in real code - arbitrary function instead of length
    end
end
@btime f($(rand(1000)), $(rand(1000)));
# 5.101 μs (2 allocations: 7.97 KiB)

It’s great that all these anonymous functions don’t introduce overhead. Also really useful to know this trick with Ref when writing other code as well.

However, I still don’t completely grasp the fundamental difference between for-loops where one can modify variables from outside without Ref, and this case with map where such modifications result in a performance penalty of 1.5 orders of magnitude.

1 Like