Hard to beat Numba / JAX loop for generating Mandelbrot

Elrod · June 14, 2022, 1:44pm

julia> @benchmark run4(2000,3000)
BenchmarkTools.Trial: 187 samples with 1 evaluation.
 Range (min … max):  25.048 ms … 29.405 ms  ┊ GC (min … max): 0.00% … 2.95%
 Time  (median):     27.816 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   26.838 ms ±  1.716 ms  ┊ GC (mean ± σ):  2.25% ± 3.49%

  ▁█                                    ▅    ▄              ▅
  ██▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▄█▆▅▁▁▄▁▁▁▁▁▁▁▁▄█ ▄
  25 ms        Histogram: log(frequency) by time      29.3 ms <

 Memory estimate: 22.89 MiB, allocs estimate: 2.

julia> @benchmark run_julia(2000,3000)
BenchmarkTools.Trial: 19 samples with 1 evaluation.
 Range (min … max):  273.904 ms … 285.103 ms  ┊ GC (min … max): 1.36% … 0.26%
 Time  (median):     276.711 ms               ┊ GC (median):    1.35%
 Time  (mean ± σ):   277.268 ms ±   2.543 ms  ┊ GC (mean ± σ):  1.30% ± 0.25%

           ▃ ▃█     █ ▃
  ▇▁▁▁▁▁▁▁▁█▇██▁▁▇▁▇█▇█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
  274 ms           Histogram: frequency by time          285 ms <

 Memory estimate: 68.66 MiB, allocs estimate: 4.

This is single threaded. Code:

julia> macro vp(expr)
            nodes = (Symbol("llvm.loop.vectorize.predicate.enable"), 1)
            if expr.head != :for
                error("Syntax error: loopinfo needs a for loop")
            end
            push!(expr.args[2].args, Expr(:loopinfo, nodes))
            return esc(expr)
       end

julia> function run4(height, width)
           y = range(-1.0f0, 0.0f0; length = height)
           x = range(-1.5f0, 0.0f0; length = width)
           fractal = fill(Int32(20), height, width)
           @inbounds @fastmath for w in 1:width
               @vp for h in 1:height
                   z_re = _c_re = x[w]
                   z_im = _c_im = y[h]
                   m = true
                   Base.Cartesian.@nexprs 20 i -> begin
                       z_re,z_im = _c_re + z_re*z_re - z_im*z_im, _c_im + 2*z_re*z_im
                       az4 = (z_re*z_re + z_im*z_im) > 4f0
                       fractal[h, w] = ifelse(m & az4,i%Int32,fractal[h,w])
                       m &= (!az4)
                   end
               end
           end
           return fractal
       end

This also works quite well with awkward combinations w/ respect to vector length

julia> @benchmark run_julia(10,10)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.316 μs … 189.292 μs  ┊ GC (min … max): 0.00% … 95.85%
 Time  (median):     3.414 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.580 μs ±   2.479 μs  ┊ GC (mean ± σ):  0.95% ±  1.36%

   ▅ ▆█                ▁
  ▅█▆██▇▄▃▂▂▂▂▂▂▄▆▃▃▂▃▇█▆▄▃▃▃▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  3.32 μs         Histogram: frequency by time        4.46 μs <

 Memory estimate: 1.36 KiB, allocs estimate: 2.

julia> @benchmark run4(10,10)
BenchmarkTools.Trial: 10000 samples with 57 evaluations.
 Range (min … max):  896.930 ns …  23.460 μs  ┊ GC (min … max): 0.00% … 92.93%
 Time  (median):     905.561 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   936.896 ns ± 423.935 ns  ┊ GC (mean ± σ):  0.97% ±  2.06%

  ▅██▅▄▃   ▁▃▃▁                ▃▃▁ ▂▄▂▁▁▁                       ▂
  ████████▇█████▆▅▅▃▃▃▃▁▃▄▁▆█▆▄█████████████▇▇▅▆▄▅▅▅▃▄▃▄▅▅▄▆▆▇▇ █
  897 ns        Histogram: log(frequency) by time       1.12 μs <

 Memory estimate: 496 bytes, allocs estimate: 1.

julia> @benchmark run_julia(16,16)
BenchmarkTools.Trial: 10000 samples with 3 evaluations.
 Range (min … max):  8.377 μs … 960.083 μs  ┊ GC (min … max): 0.00% … 93.93%
 Time  (median):     8.663 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.933 μs ±   9.612 μs  ┊ GC (mean ± σ):  1.01% ±  0.94%

     █▃   ▁▁
  ▁▂████▅▇██▆▄▃▂▂▂▂▂▄▆▇▆▆▅▄▄▃▃▂▂▂▂▂▂▂▂▂▁▁▁▁▂▁▁▁▁▁▁▁▂▂▂▂▂▂▂▁▁▁ ▂
  8.38 μs         Histogram: frequency by time        9.96 μs <

 Memory estimate: 3.19 KiB, allocs estimate: 2.

julia> @benchmark run4(16,16)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.236 μs … 147.251 μs  ┊ GC (min … max): 0.00% … 97.03%
 Time  (median):     1.306 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.438 μs ±   1.935 μs  ┊ GC (mean ± σ):  1.86% ±  1.38%

     ▅█▆▃▂▂▁      ▅▄▁    ▅▆▃▂   ▃▃▂▁                          ▂
  ▄▁▁███████▇▅▄▁▁▁███▆▇▅▇███████████████▇▇▆▆▆▆▆▅▆▅▆▆▆▆▅▅▄▅▃▄▅ █
  1.24 μs      Histogram: log(frequency) by time         2 μs <

 Memory estimate: 1.06 KiB, allocs estimate: 1.

Topic		Replies	Views
Julia & Mojo Mandelbrot Benchmark General Usage mojo	57	27084	October 15, 2023
Quite bad performance of Julia 0.6.4 vs Python+Numpy General Usage	26	5393	November 13, 2018
Julia Beginner (from Python): Numba outperforms Julia in rewrite. Any tips to improve performance? Performance benchmark , python , tullio , loopvectorization	56	6628	August 18, 2021
Improving an algorithm that compute gps distances Performance tullio	41	2208	April 27, 2020
General questions from Python user Performance	59	4584	March 8, 2021

Hard to beat Numba / JAX loop for generating Mandelbrot

Related topics