Identical method redefinition suspiciously optimizes runtime and allocations

I am using Revise.jl to work on a package for my research, and am noticing that “revising” a file to identical code with different variable names causes a function to stop allocating.

I am not sure how many details others will need to help me diagnose the issue, but my workflow is as follows:

  1. Run a script using Julia and the -i flag. startup.jl consists only of using Revise.
  2. The script loads some data files, as well as BenchmarkTools.jl and Plots.jl
  3. Benchmark some functions that need to be allocation-free (for performance reasons).

The offending function is:

function is_cell_contained_by(cell1, poly)
    cell_poly = cell_boundary_polygon(cell1) # this is an SVector
    return all(edge_starts(cell_poly)) do pt # edge_starts returns an SVector
        return PlanePolygons.point_inside_strict(poly, pt) # this does not allocate
    end
end

And benchmarking it:

julia> @benchmark Euler2D.is_cell_contained_by($test_cell, $test_poly)
BenchmarkTools.Trial: 10000 samples with 709 evaluations per sample.
 Range (min … max):  180.385 ns …  51.612 μs  ┊ GC (min … max): 0.00% … 99.40%
 Time  (median):     194.587 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   237.031 ns ± 556.060 ns  ┊ GC (mean ± σ):  8.90% ±  8.09%

   ▃▇█▅▄▁                           ▂▂▂▃▃▂▂▁▁                   ▂
  ███████████▇▇▆▆▆▆▄▃▅▃▁▁▄▄▁▄▁▃▄▁▁▁▇██████████▇▆▆▃▄▃▁▁▁▃▄▁▃▃▄▃▄ █
  180 ns        Histogram: log(frequency) by time        437 ns <

 Memory estimate: 464 bytes, allocs estimate: 5.

Well, that’s odd. Edit the file defining is_cell_contained_by(...) and let Revise.jl do its magic. I change the function to:

function is_cell_contained_by(cell1, poly)
    cell_polys = cell_boundary_polygon(cell1)
    return all(edge_starts(cell_polys)) do pt
        return PlanePolygons.point_inside_strict(poly, pt)
    end
end

Notice that only the variable name has changed: cell_polycell_polys. Benchmark it again:

julia> @benchmark Euler2D.is_cell_contained_by($test_cell, $test_poly)
BenchmarkTools.Trial: 10000 samples with 985 evaluations per sample.
 Range (min … max):  53.646 ns … 108.297 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     57.632 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   57.934 ns ±   2.129 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                  ▂▁▁▁▄▆█▇▇▅▁▁  ▁        ▁▁▂▁ ▁▁ ▂▁            ▂
  ▄█▆▇▃▃▁▁▁▆▇▃▅▇▅▇█████████████████▆▆▅▄▆▇███████████▇▆▆▆▅▄▅▆▆▆ █
  53.6 ns       Histogram: log(frequency) by time      63.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

This is the behavior I expect… but I do not know why I have to force Revise.jl to pick up on the new function every time I run the script.

Is there a cache I can clear somewhere? Should I kill every running Julia process on my machine and try again? I am absolutely baffled, and I hope that I’m just falling into a common Revise.jl trap, rather than doing something truly stupid.

I did try:

  1. removing the ~/.julia/compiled/ to force everything to be compiled again. The behavior persists.
  2. running the script and the benchmark without Revise loaded, which yields only the allocating version of the benchmark with no way to trigger optimization.

How exactly? Revise isn’t documented to track input scripts at command lines e.g. julia -i script.jl.

Disappearing allocations like that is bizarre though, I’d want to know if there’s some unstable compiler optimization going on. However, that’s impossible to tell from the outside without the source code (everything, startup.jl, the script, the packages, the benchmark) and a reproducible process.

1 Like

Ah, I apologize. I edit the source file in a local package.

I will try to reproduce the behavior without the full details of my current project when I have some spare time. Thanks!

Do you see the same effect if you redefine the method without Revise? You can do so (approximately, may need edits) with

@eval Euler2D function is_cell_contained_by(cell1, poly)
    cell_polys = cell_boundary_polygon(cell1)
    return all(edge_starts(cell_polys)) do pt
        return PlanePolygons.point_inside_strict(poly, pt)
    end
end

If you see the same effect, then Revise isn’t a cause, which could vastly simplify reproduction.

1 Like

Started up a my normal workflow (loaded a data file, loaded the Euler2D module). Source code for is_cell_contained_by is:

function is_cell_contained_by(cell1, poly)
    cell_polys = cell_boundary_polygon(cell1)
    return all(edge_starts(cell_polys)) do pt
        return PlanePolygons.point_inside_strict(poly, pt)
    end
end

function is_cell_contained_by(cell1, cell2::FVMCell)
    return is_cell_contained_by(cell1, cell_boundary_polygon(cell2))
end

The test_poly1 object is not a subtype of FVMCell, so the first method should be evaluated:

julia> test_poly1 isa FVMCell
false

Benchmarking:

julia> @benchmark Euler2D.is_cell_contained_by($(sim1.cells[1][1]), $test_poly1)
BenchmarkTools.Trial: 10000 samples with 728 evaluations per sample.
 Range (min … max):  177.935 ns …  22.936 μs  ┊ GC (min … max):  0.00% … 98.91%
 Time  (median):     190.871 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   255.046 ns ± 440.140 ns  ┊ GC (mean ± σ):  13.28% ±  8.12%

  ▃▅▇█▆▅▃▂                                   ▁▁▂▃▃▃▃▂▂▁         ▂
  ██████████▇▇▆▇▅▆▆▄▅▅▆▅▅▄▆▄▅▅▅▄▅▅▄▃▅▄▃▄▄▄▆▇███████████▇▇▆▅▃▄▃▃ █
  178 ns        Histogram: log(frequency) by time        399 ns <

 Memory estimate: 464 bytes, allocs estimate: 5.

julia> @eval Euler2D function is_cell_contained_by(cell1, poly)
           cell_polys = cell_boundary_polygon(cell1)
           return all(edge_starts(cell_polys)) do pt
               return PlanePolygons.point_inside_strict(poly, pt)
           end
       end
is_cell_contained_by (generic function with 2 methods)

julia> @benchmark Euler2D.is_cell_contained_by($(sim1.cells[1][1]), $test_poly1)
BenchmarkTools.Trial: 10000 samples with 989 evaluations per sample.
 Range (min … max):  46.894 ns … 499.099 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     49.862 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   50.416 ns ±   8.729 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                    ▁▂▄▅▇██▆▄▃▃▂▂▂▂▂▃▃▄▃▃▂▁▁    ▁▁             ▂
  ▅██▇▅▆▆▆▇▇▆▃▂▂▅▆▇██████████████████████████▇█████▇▇▇▆▆▇▇▇▇▆▆ █
  46.9 ns       Histogram: log(frequency) by time      54.3 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Am I right to assume you didn’t load Revise via startup.jl? If you had, can you replicate this effect without it loaded?

With that assumption, this effect is entirely down to redefining the method, even with the exact same code (cell_polys both times). I think that warrants an edit to the title and the original post to omit the spurious factors of Revise and a name change. I’d also try the @benchmark twice before the @eval, in case there are any BenchmarkTools first-run shenanigans (very doubtful).

1 Like

Correct.

You are correct. Fresh Julia session, no Revise:

julia> @benchmark Euler2D.is_cell_contained_by($(sim1.cells[1][1]), $test_poly1)
BenchmarkTools.Trial: 10000 samples with 719 evaluations per sample.
 Range (min … max):  175.549 ns …  56.964 μs  ┊ GC (min … max):  0.00% … 99.38%
 Time  (median):     193.399 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   260.331 ns ± 669.629 ns  ┊ GC (mean ± σ):  11.42% ±  7.87%

  ▃▅▆█▇▅▃▁                      ▂▄▃▃▂▂▃▃▄▃▃▂▂▁                  ▂
  █████████▇▇▇▆▆▆▅▆▆▆▆▅▄▅▅▄▅▄▆▅▅███████████████▆▅▅▅▃▅▄▄▄▆▄▁▅▃▅▄ █
  176 ns        Histogram: log(frequency) by time        453 ns <

 Memory estimate: 464 bytes, allocs estimate: 5.

julia> @benchmark Euler2D.is_cell_contained_by($(sim1.cells[1][1]), $test_poly1)
BenchmarkTools.Trial: 10000 samples with 714 evaluations per sample.
 Range (min … max):  176.485 ns …  58.552 μs  ┊ GC (min … max):  0.00% … 99.33%
 Time  (median):     197.097 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   289.968 ns ± 864.271 ns  ┊ GC (mean ± σ):  12.22% ±  7.10%

     ▅█                                                          
  ▂▄▄██▇▄▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▃▃▃▄▄▄▄▄▃▂▂▁▁▁▁▁▁▁▁▁▁ ▂
  176 ns           Histogram: frequency by time          408 ns <

 Memory estimate: 464 bytes, allocs estimate: 5.

julia> @eval Euler2D function is_cell_contained_by(cell1, poly)
           cell_polys = cell_boundary_polygon(cell1)
           return all(edge_starts(cell_polys)) do pt
               return PlanePolygons.point_inside_strict(poly, pt)
           end
       end
is_cell_contained_by (generic function with 2 methods)

julia> @benchmark Euler2D.is_cell_contained_by($(sim1.cells[1][1]), $test_poly1)
BenchmarkTools.Trial: 10000 samples with 989 evaluations per sample.
 Range (min … max):  47.076 ns … 467.765 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     50.297 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   50.826 ns ±   8.059 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                     ▄▆▆▇▇██▆▄▃▂▂▃▃▄▄▄▄▃▂▃▂▁  ▁▁▂▂▁      ▁▁    ▃
  ▆▆▇▅▆▇▇▇██▇▃▁▁▁▁▁▃▆████████████████████████████████▇▇▇██████ █
  47.1 ns       Histogram: log(frequency) by time      54.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

I do not think I can change the title of the topic, though.

I’ve changed the thread title, let me know in case you had some other title in mind.

Is the name change a required part of this behaviour btw? i.e. does this happen only when you change the name of the variable, not if you eval the same code (without name change) again?

The name change is not required, no. The @eval-d function is identical to the function in the source code of the module.

I believe I’ve seen this before. I thought there was a GitHub issue for it but it’s obscure and I can’t find it now. Maybe there isn’t. I thought it was something to do with inlining decisions, max_methods, and/or some other arcane internals.

EDIT: problem with the unreliable approximation of `Core.Compiler.return_type` · Issue #35800 · JuliaLang/julia · GitHub seems to have links to a number of similar issues. It may be related.

2 Likes

Probably a participation permissions thing, which is a bit backwards imo. I edited the edit to fit the current findings.

Could try @code_warntype, JET.jl, or Cthulhu.jl to check if redefinition changes type inference, though the caveat is those redo type inference instead of showing how the compiler actually did it, and at least @code_warntype is known to overspecialize for special-cased inputs in some circumstances (Function, Type, Vararg).

In particular:

The title of the former issue seems especially relevant to this post. However all three of those issues perhaps have the same underlying cause (deep in the design of the compiler and the language), which may or may not be addressable. That is, the inference is stateful.

4 Likes

I’m a bit surprised that you’re seeing this when the method you’re calling is the non-reentrant one. I’ve seen this many times, but thought it only happened when a runtime call triggers compilation of the “outer” method before anything has triggered compilation of the “inner” method of a reentrant method definition like the second one. So I’m wondering:

  • Is there a call to is_cell_contained_by somewhere inside the call graph of the first method? For example, inside cell_boundary_polygon or edge_starts?
  • Does the problem go away if you never define the second method?
1 Like
  • There is not a call to is_cell_contained_by in the call graph of the first method.
  • The problem does not go away if I leave the second method undefined.

@code_warntype output:

Before redefining the method
julia> @code_warntype Euler2D.is_cell_contained_by((sim1.cells[1][1]), test_poly1)                                                                                     
MethodInstance for Euler2D.is_cell_contained_by(::TangentQuadCell{Float64, 3, 12}, ::SClosedPolygon{4, Float64})                                                       
  from is_cell_contained_by(cell1, poly) @ Euler2D ~/projects/stce/shockad/Euler2D.jl/src/cell_simulations/cells.jl:42                                                 
Arguments                                                                                                                                                              
  #self#::Core.Const(Euler2D.is_cell_contained_by)                                                                                                                     
  cell1::TangentQuadCell{Float64, 3, 12}                                                                                                                               
  poly::SClosedPolygon{4, Float64}                                                                                                                                     
Locals                                                                                                                                                                 
  #81::Euler2D.var"#81#82"{SClosedPolygon{4, Float64}}                                                                                                                 
  cell_polys::SClosedPolygon{4, Float64}                                                                                                                               
Body::Bool                                                                                                                                                             
1 ─ %1  = Euler2D.cell_boundary_polygon::Core.Const(Euler2D.cell_boundary_polygon)                                                                                     
│         (cell_polys = (%1)(cell1))                                                                                                                                   
│   %3  = Euler2D.all::Core.Const(all)                                                                                                                                 
│   %4  = Euler2D.:(var"#81#82")::Core.Const(Euler2D.var"#81#82")                                                                                                      
│   %5  = Core.typeof(poly)::Core.Const(SClosedPolygon{4, Float64})                                                                                                    
│   %6  = Core.apply_type(%4, %5)::Core.Const(Euler2D.var"#81#82"{SClosedPolygon{4, Float64}})                                                                         
│         (#81 = %new(%6, poly))                                                                                                                                       
│   %8  = #81::Euler2D.var"#81#82"{SClosedPolygon{4, Float64}}                                                                                                         
│   %9  = Euler2D.edge_starts::Core.Const(PlanePolygons.edge_starts)                                                                                                   
│   %10 = cell_polys::SClosedPolygon{4, Float64}                                                                                                                       
│   %11 = (%9)(%10)::SVector{4, SVector{2, Float64}}                                                                                                                   
│   %12 = (%3)(%8, %11)::Bool                                                                                                                                          
└──       return %12
After redefining the method
julia> @code_warntype Euler2D.is_cell_contained_by((sim1.cells[1][1]), test_poly1)
MethodInstance for Euler2D.is_cell_contained_by(::TangentQuadCell{Float64, 3, 12}, ::SClosedPolygon{4, Float64})
  from is_cell_contained_by(cell1, poly) @ Euler2D REPL[4]:1
Arguments
  #self#::Core.Const(Euler2D.is_cell_contained_by)
  cell1::TangentQuadCell{Float64, 3, 12}
  poly::SClosedPolygon{4, Float64}
Locals
  #244::Euler2D.var"#244#245"{SClosedPolygon{4, Float64}}
  cell_polys::SClosedPolygon{4, Float64}
Body::Bool
1 ─       (cell_polys = Euler2D.cell_boundary_polygon(cell1))
│   %2  = Euler2D.all::Core.Const(all)
│   %3  = Euler2D.:(var"#244#245")::Core.Const(Euler2D.var"#244#245")
│   %4  = Core.typeof(poly)::Core.Const(SClosedPolygon{4, Float64})
│   %5  = Core.apply_type(%3, %4)::Core.Const(Euler2D.var"#244#245"{SClosedPolygon{4, Float64}})
│         (#244 = %new(%5, poly))
│   %7  = #244::Euler2D.var"#244#245"{SClosedPolygon{4, Float64}}
│   %8  = cell_polys::SClosedPolygon{4, Float64}
│   %9  = Euler2D.edge_starts(%8)::SVector{4, SVector{2, Float64}}
│   %10 = (%2)(%7, %9)::Bool
└──       return %10

@inferred succeeds in both cases.

2 Likes

The @code_warntype reports do differ, but the program and inferred types don’t seem to. I don’t know if there’s a deeper significance, but it just seems like an arbitrary decision to opt out of explicitly identifying some callee functions as Const and binding them to temporary names. That doesn’t imply the compiler actually stops knowing they are const after redefinition; otherwise we’d expect to lose performance. Wonder if @code_llvm can catch something?

Just for a sanity check, have you also cached the results (maybe from a single run) and compared them before and after the redefinition, if that’s even possible? Should also make sure that both versions are giving the correct results.

The interesting part of this is the examples in #35800 show a type inference discrepancy before and after compilation+execution, but this example’s optimization change is strictly observed in execution before and after redefinition and isn’t yet shown to be caused by type inference changes. Could try some Base.return_types or @code_warntype reports before the benchmarks as well?

The examples in #35800 also appeared to stop reproducing after v1.8; this example doesn’t reproduce in v1.11.6 because Base.return_types populates mi.cache. So while there may be a shared underlying cause, it’s at least somewhat different and could result in a new MWE for the issue.

Interesting, I didn’t realize that. But this kind of stateful behavior is still encountered from time to time, see for example Type instability in nested quadgk calls

1 Like

At least this linked MWE still works on v1.11.6 (you can just redefine both methods before uncommenting the call in the same session to improve @code_warntype). However, I added @benchmark reports to that, and the improved type inference doesn’t improve runtime performance (median 2.3us, 44 allocations of 2.55 KiB) for that example. Whether that’s down to differences in the program or this particular redefinition optimization phenomenon being fundamentally different, I don’t know. Part of the problem is we don’t really have a standard protocol for discovering compiler statefulness.