Slow Dict comprehension

If I construct a Dict using a comprehension, I get orders of magnitude slower speed compared to starting with an empty Dict and running an equivalent loop to fill it in. I'm surprised, because I expected both ways to take roughly the same time. What am I missing?

Just to be clear, I’m not looking for how to write the fastest code here, just wondering what’s going on.

Thanks!

julia> @time Dict{Symbol,Any}(Symbol(:a,i) => fill(float(i), 10) for i = 1:500);
  0.122558 seconds (40.33 k allocations: 2.259 MiB, 97.40% compilation time)

julia> @time begin
          p = Dict{Symbol,Any}()
          for i = 1:500
             p[Symbol(:a,i)] = fill(float(i), 10)
          end
       end;
  0.000981 seconds (3.01 k allocations: 226.766 KiB)

julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, ivybridge)
Environment:
  JULIA_EDITOR = code

I think you're just measuring compilation time in the first case but not the second. BenchmarkTools is useful here.


Yep, looks like this is the case:

julia> using BenchmarkTools

julia> @benchmark Dict{Symbol,Any}(Symbol(:a,i) => fill(float(i), 10) for i = 1:500)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  138.584 μs … 888.500 μs  ┊ GC (min … max): 0.00% … 75.28%
 Time  (median):     140.709 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   145.450 μs ±  48.509 μs  ┊ GC (mean ± σ):  2.11% ±  5.28%

   ▃▆██▇▆▆▆▆▅▄▄▄▃▃▂▂▁              ▁▁▂▂▂▂▁  ▁▁ ▁                ▂
  ▇██████████████████▇▇▆▅▆▇██▇▇▇█▇████████████████▇▇▆▆▅▅▃▄▃▅▄▄▄ █
  139 μs        Histogram: log(frequency) by time        159 μs <

 Memory estimate: 213.00 KiB, allocs estimate: 3007.

julia> @benchmark begin
                 p = Dict{Symbol,Any}()
                 for i = 1:500
                    p[Symbol(:a,i)] = fill(float(i), 10)
                 end
              end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  138.416 μs …  1.254 ms  ┊ GC (min … max): 0.00% … 79.59%
 Time  (median):     141.500 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   147.913 μs ± 74.061 μs  ┊ GC (mean ± σ):  3.23% ±  5.67%

    ▄▆▇▇▇██▇▆▅▄▃▄▄▄▃▁             ▂▁▂▂▂▂▂▁▁▁                   ▃
  ▃██████████████████▇▆▆▇███▇██▆████████████████▆▆▆▆▅▄▄▆▅▄▄▄▄▃ █
  138 μs        Histogram: log(frequency) by time       161 μs <

 Memory estimate: 218.95 KiB, allocs estimate: 3013.

Thanks, this is helpful!

I’m still puzzled why the comprehension takes so long to compile, while the compilation time of the plain for-loop is negligible.

I guess this means that comprehensions are okay to use in functions, but would always slow down scripts, since scripts get compiled every time they run.
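To illustrate that point, here's a quick sketch (the function name `build_dict` is my own): once the comprehension lives inside a function, the generator's anonymous function is defined only once, so the compilation cost is paid on the first call and subsequent calls reuse the compiled code.

```julia
# Hypothetical helper: wrapping the comprehension in a function means the
# anonymous function inside the generator is created once, at definition time.
build_dict(n) = Dict{Symbol,Any}(Symbol(:a, i) => fill(float(i), 10) for i = 1:n)

@time build_dict(500);  # first call: includes compilation
@time build_dict(500);  # later calls: no recompilation
```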

The compilation time doesn’t scale with N though:

julia> @time Dict{Symbol,Any}((Symbol(:a,i) => fill(float(i), 10) for i = 1:5_000_000));
  7.885567 seconds (35.03 M allocations: 2.072 GiB, 7.24% gc time, 0.22% compilation time)

julia> @time begin
                 p = Dict{Symbol,Any}()
                 for i = 1:5_000_000
                    p[Symbol(:a,i)] = fill(float(i), 10)
                 end
              end;
  7.877991 seconds (35.00 M allocations: 2.200 GiB)

So I would say you’re fine using a comprehension in practice. It’s unlikely you’ll have hundreds of these in your top-level script?

My hunch is that this particular for loop doesn't require compilation because the operations (looping over a range, assigning to a Dict{Symbol,Any}, fill(::Float64, ::Int), etc.) are all already compiled into the Base sysimage.

$ julia --sysimage-native-code=no
julia> @time begin
                 p = Dict{Symbol,Any}()
                 for i = 1:500
                    p[Symbol(:a,i)] = fill(float(i), 10)
                 end
              end;
  0.030967 seconds (4.89 k allocations: 307.391 KiB, 98.49% compilation time)

At the same time, the comprehension can't be cached, because it constructs a unique iterator for each call, each of which has its own unique anonymous function that needs to be compiled:

julia> itr = (Symbol(:a,i) => fill(float(i), 10) for i = 1:500)
Base.Generator{UnitRange{Int64}, var"#3#4"}(var"#3#4"(), 1:500)

julia> itr = (Symbol(:a,i) => fill(float(i), 10) for i = 1:500)
Base.Generator{UnitRange{Int64}, var"#5#6"}(var"#5#6"(), 1:500)

Note var"#3#4" and var"#5#6".
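By contrast, here's a sketch (the name `make_itr` is mine) showing that when the generator is created inside a function, the anonymous function is defined once with the method, so every call produces a generator of the same type and no recompilation is triggered:

```julia
# Hypothetical helper: the generator's anonymous function is created once,
# when make_itr is defined, so all calls share the same Generator type.
make_itr(n) = (Symbol(:a, i) => fill(float(i), 10) for i = 1:n)

typeof(make_itr(500)) === typeof(make_itr(500))  # true: same type every call
```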


Ok, that makes sense, I get it now. Thanks for sharing this insight @mortenpi!