Why does profiling suggest that integer comparison dominates broadcasted assignment?

julia> using BenchmarkTools, Profile

julia> a = rand(40000,4000); b = rand(40000,4000);

julia> Profile.clear()

julia> @bprofile $a[1:end, 1:end] .= $b
BenchmarkTools.Trial: 9 samples with 1 evaluation.
 Range (min … max):  571.819 ms … 586.555 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     574.424 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   575.275 ms ±   4.457 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █    ▁ ▁  █    ▁▁                                           ▁  
  █▁▁▁▁█▁█▁▁█▁▁▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  572 ms           Histogram: frequency by time          587 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
    ╎4752 @Base/client.jl:535; _start()
    ╎ 4752 @Base/client.jl:561; repl_main
    ╎  4752 @Base/client.jl:424; run_main_repl(interactive::Bool, quiet::Bool, banner::Symbol, history_file::Bool, color_set::Bool)
    ╎   4752 @Base/essentials.jl:1017; invokelatest
    ╎    4752 @Base/essentials.jl:1020; #invokelatest#2
    ╎     4752 @Base/client.jl:440; (::Base.var"#1100#1102"{Bool, Symbol, Bool})(REPL::Module)
    ╎    ╎ 4752 …a-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:447; run_repl(repl::REPL.AbstractREPL, consumer::Any)
    ╎    ╎  4752 …-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:461; run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool, backend::…
    ╎    ╎   4752 …-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:302; kwcall(::NamedTuple, ::typeof(REPL.start_repl_backend), backend::REPL.REPLBackend, consu…
    ╎    ╎    4752 …-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:305; start_repl_backend(backend::REPL.REPLBackend, consumer::Any; get_module::Function)
    ╎    ╎     4752 …master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:320; repl_backend_loop(backend::REPL.REPLBackend, get_module::Function)
    ╎    ╎    ╎ 4752 …master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:224; eval_user_input(ast::Any, backend::REPL.REPLBackend, mod::Module)
    ╎    ╎    ╎  4752 @Base/boot.jl:428; eval
    ╎    ╎    ╎   4752 @BenchmarkTools/src/execution.jl:126; run(b::BenchmarkTools.Benchmark)
    ╎    ╎    ╎    4752 @BenchmarkTools/src/execution.jl:126; run
    ╎    ╎    ╎     4752 @BenchmarkTools/src/execution.jl:134; run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, n…
    ╎    ╎    ╎    ╎ 4752 @BenchmarkTools/src/execution.jl:40; run_result
    ╎    ╎    ╎    ╎  4752 @BenchmarkTools/src/execution.jl:41; #run_result#45
    ╎    ╎    ╎    ╎   4752 @Base/essentials.jl:1017; invokelatest
    ╎    ╎    ╎    ╎    4752 @Base/essentials.jl:1020; #invokelatest#2
    ╎    ╎    ╎    ╎     4752 @BenchmarkTools/src/execution.jl:102; _run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters)
    ╎    ╎    ╎    ╎    ╎ 525  @BenchmarkTools/src/execution.jl:109; _run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; verbose::Bool, pad::String, kwarg…
    ╎    ╎    ╎    ╎    ╎  525  @BenchmarkTools/src/execution.jl:556; var"##sample#344"(::Tuple{Matrix{Float64}, Matrix{Float64}}, __params::BenchmarkTools.Parameters)
    ╎    ╎    ╎    ╎    ╎   525  @BenchmarkTools/src/execution.jl:547; var"##core#343"(a#341::Matrix{Float64}, b#342::Matrix{Float64})
    ╎    ╎    ╎    ╎    ╎    525  @Base/broadcast.jl:875; materialize!
    ╎    ╎    ╎    ╎    ╎     525  @Base/broadcast.jl:878; materialize!
    ╎    ╎    ╎    ╎    ╎    ╎ 525  @Base/broadcast.jl:920; copyto!
    ╎    ╎    ╎    ╎    ╎    ╎  525  @Base/broadcast.jl:961; copyto!
    ╎    ╎    ╎    ╎    ╎    ╎   525  @Base/abstractarray.jl:1061; copyto!
  60╎    ╎    ╎    ╎    ╎    ╎    60   @Base/abstractarray.jl:0; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRa…
    ╎    ╎    ╎    ╎    ╎    ╎    58   @Base/abstractarray.jl:1116; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{Uni…
    ╎    ╎    ╎    ╎    ╎    ╎     58   @Base/abstractarray.jl:1411; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 58   @Base/abstractarray.jl:1441; _setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  58   @Base/subarray.jl:366; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎   58   @Base/array.jl:979; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    58   @Base/abstractarray.jl:1345; _to_linear_index
    ╎    ╎    ╎    ╎    ╎    ╎    ╎     58   @Base/abstractarray.jl:2975; _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 58   @Base/abstractarray.jl:2991; _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  58   @Base/abstractarray.jl:3007; _sub2ind_recurse
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   58   @Base/abstractarray.jl:3007; _sub2ind_recurse
  58╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    58   @Base/int.jl:88; *
    ╎    ╎    ╎    ╎    ╎    ╎    407  @Base/abstractarray.jl:1120; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{Uni…
    ╎    ╎    ╎    ╎    ╎    ╎     407  @Base/multidimensional.jl:422; iterate
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 78   @Base/multidimensional.jl:446; __inc
  78╎    ╎    ╎    ╎    ╎    ╎    ╎  78   @Base/int.jl:87; +
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 329  @Base/multidimensional.jl:447; __inc
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  329  @Base/operators.jl:276; !=
 328╎    ╎    ╎    ╎    ╎    ╎    ╎   329  @Base/promotion.jl:620; ==
   1╎    ╎    ╎    ╎    ╎ 4227 @BenchmarkTools/src/execution.jl:115; _run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; verbose::Bool, pad::String, kwarg…
    ╎    ╎    ╎    ╎    ╎  4226 @BenchmarkTools/src/execution.jl:556; var"##sample#344"(::Tuple{Matrix{Float64}, Matrix{Float64}}, __params::BenchmarkTools.Parameters)
    ╎    ╎    ╎    ╎    ╎   4226 @BenchmarkTools/src/execution.jl:547; var"##core#343"(a#341::Matrix{Float64}, b#342::Matrix{Float64})
    ╎    ╎    ╎    ╎    ╎    4226 @Base/broadcast.jl:875; materialize!
    ╎    ╎    ╎    ╎    ╎     4226 @Base/broadcast.jl:878; materialize!
    ╎    ╎    ╎    ╎    ╎    ╎ 4226 @Base/broadcast.jl:920; copyto!
    ╎    ╎    ╎    ╎    ╎    ╎  4226 @Base/broadcast.jl:961; copyto!
    ╎    ╎    ╎    ╎    ╎    ╎   4226 @Base/abstractarray.jl:1061; copyto!
 527╎    ╎    ╎    ╎    ╎    ╎    527  @Base/abstractarray.jl:0; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRa…
    ╎    ╎    ╎    ╎    ╎    ╎    478  @Base/abstractarray.jl:1116; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{Uni…
    ╎    ╎    ╎    ╎    ╎    ╎     477  @Base/abstractarray.jl:1411; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 477  @Base/abstractarray.jl:1441; _setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  477  @Base/subarray.jl:366; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎   474  @Base/array.jl:979; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    474  @Base/abstractarray.jl:1345; _to_linear_index
    ╎    ╎    ╎    ╎    ╎    ╎    ╎     474  @Base/abstractarray.jl:2975; _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 474  @Base/abstractarray.jl:2991; _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  474  @Base/abstractarray.jl:3007; _sub2ind_recurse
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   474  @Base/abstractarray.jl:3007; _sub2ind_recurse
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1    @Base/abstractarray.jl:3014; offsetin
   1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     1    @Base/int.jl:86; -
 473╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    473  @Base/int.jl:88; *
    ╎    ╎    ╎    ╎    ╎    ╎    ╎   3    @Base/subarray.jl:293; reindex
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    3    @Base/array.jl:3058; getindex
    ╎    ╎    ╎    ╎    ╎    ╎    ╎     3    @Base/range.jl:932; _getindex
   3╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 3    @Base/int.jl:87; +
   1╎    ╎    ╎    ╎    ╎    ╎     1    @Base/essentials.jl:882; getindex
    ╎    ╎    ╎    ╎    ╎    ╎    3218 @Base/abstractarray.jl:1120; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{Uni…
    ╎    ╎    ╎    ╎    ╎    ╎     3218 @Base/multidimensional.jl:422; iterate
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 519  @Base/multidimensional.jl:446; __inc
 519╎    ╎    ╎    ╎    ╎    ╎    ╎  519  @Base/int.jl:87; +
   8╎    ╎    ╎    ╎    ╎    ╎    ╎ 2699 @Base/multidimensional.jl:447; __inc
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  2691 @Base/operators.jl:276; !=
2691╎    ╎    ╎    ╎    ╎    ╎    ╎   2691 @Base/promotion.jl:620; ==
    ╎    ╎    ╎    ╎    ╎    ╎    3    @Base/abstractarray.jl:1121; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{Uni…
    ╎    ╎    ╎    ╎    ╎    ╎     3    @Base/range.jl:902; iterate
   3╎    ╎    ╎    ╎    ╎    ╎    ╎ 3    @Base/promotion.jl:620; ==
Total snapshots: 4752. Utilization: 100% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.

I’m puzzled that comparing Ints while iterating over a CartesianIndices appears to be the most expensive step of all. Can this be improved? At first glance, I would have expected setindex! to dominate.
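
For context, the profile shows copyto_unaliased! being called with deststyle::IndexCartesian on a SubArray destination. The sketch below is only an illustration of why a CartesianIndices iteration shows up at all. It assumes the broadcast target behaves like view(a, 1:size(a, 1), 1:size(a, 2)), and cartesian_copy! is a hypothetical stand-in, not the code in Base. It does not explain why the comparison is so slow on nightly.

```julia
# Small sizes so the sketch runs quickly; the thread uses 40000×4000.
a = zeros(4, 3)
b = rand(4, 3)

# `a[1:end, 1:end] .= b` writes into a view built from two UnitRanges.
dest_ranges = view(a, 1:size(a, 1), 1:size(a, 2))
dest_colons = view(a, :, :)

style_ranges = IndexStyle(dest_ranges)   # IndexCartesian()
style_colons = IndexStyle(dest_colons)   # IndexLinear()

# Hypothetical analogue of a Cartesian-indexed copy: every stored element
# costs one CartesianIndices step (an increment plus a bounds comparison).
function cartesian_copy!(dest, src)
    for (I, x) in zip(CartesianIndices(dest), src)
        dest[I] = x
    end
    return dest
end

cartesian_copy!(dest_ranges, b)
```

A view made with colons reports IndexLinear(), so that form would not be expected to take the Cartesian path seen in the profile.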

Interesting, this is not what I get on either Julia 1.9 or Julia 1.10. It takes ~100 ms for me, nearly all of the time is spent in setindex!, and it compiles to nearly optimal code (a plain memcpy takes 78 ms).
I’m on a Zen4 AMD x86_64 CPU.
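
A rough way to make the same comparison on another machine, using only Base's @elapsed. This is a sketch: the function names are made up for illustration, the arrays are much smaller than the ones in the original post, and copyto! on two Arrays is only assumed to be a reasonable stand-in for the memcpy baseline.

```julia
a = zeros(2000, 2000)   # smaller than the 40000×4000 arrays in the thread
b = rand(2000, 2000)

# Hypothetical helper names, one per way of copying b into a.
view_assign!(a, b)  = (a[1:end, 1:end] .= b; a)   # the pattern profiled above
plain_assign!(a, b) = (a .= b; a)                  # broadcast without the view
raw_copy!(a, b)     = copyto!(a, b)                # contiguous copy baseline

for f in (view_assign!, plain_assign!, raw_copy!)
    f(a, b)                                        # warm up so compilation is not timed
    t = minimum(@elapsed(f(a, b)) for _ in 1:5)    # best of five runs, in seconds
    println(rpad(nameof(f), 16), round(1000t; digits = 3), " ms")
end
```

If the view version is far slower than the other two, that points at the indexing path rather than at memory bandwidth.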

Yes, sorry, I was on nightly.

julia> versioninfo()
Julia Version 1.11.0-DEV.1442
Commit c16472b0014 (2024-02-01 14:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  LD_LIBRARY_PATH = :/usr/lib/x86_64-linux-gnu/gtk-3.0/modules
  JULIA_EDITOR = subl

This is a regression on nightly: it goes from 100 → 300 ms for me. Will you open an issue?

Edit: And it’s not fixed by d54a455, which only applies to vectors.
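
For an issue report, something like the following could time the vector and matrix cases side by side on a given build. It is a sketch only: time_view_assign is a hypothetical helper, the sizes are arbitrary, and I am not assuming anything about what d54a455 changes beyond what is said above.

```julia
# Hypothetical helper: best-of-`reps` time for `view(dest, inds...) .= src`.
function time_view_assign(dest, src, inds...; reps = 5)
    f!(d, s) = (view(d, inds...) .= s; d)
    f!(dest, src)                                  # warm-up call, not timed
    return minimum(@elapsed(f!(dest, src)) for _ in 1:reps)
end

v_dst, v_src = zeros(10^6), rand(10^6)             # vector case
m_dst, m_src = zeros(1000, 1000), rand(1000, 1000) # matrix case

t_vec = time_view_assign(v_dst, v_src, 1:length(v_dst))
t_mat = time_view_assign(m_dst, m_src, 1:size(m_dst, 1), 1:size(m_dst, 2))

println("Julia ", VERSION)
println("vector view: ", round(1e3 * t_vec; digits = 3), " ms")
println("matrix view: ", round(1e3 * t_mat; digits = 3), " ms")
```

Both cases copy the same number of Float64 values, so a large gap between the two lines would suggest the matrix path is still affected.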
