julia> @benchmark run4(2000,3000)
BenchmarkTools.Trial: 187 samples with 1 evaluation.
Range (min … max): 25.048 ms … 29.405 ms ┊ GC (min … max): 0.00% … 2.95%
Time (median): 27.816 ms ┊ GC (median): 0.00%
Time (mean ± σ): 26.838 ms ± 1.716 ms ┊ GC (mean ± σ): 2.25% ± 3.49%
▁█ ▅ ▄ ▅
██▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▄█▆▅▁▁▄▁▁▁▁▁▁▁▁▄█ ▄
25 ms Histogram: log(frequency) by time 29.3 ms <
Memory estimate: 22.89 MiB, allocs estimate: 2.
julia> @benchmark run_julia(2000,3000)
BenchmarkTools.Trial: 19 samples with 1 evaluation.
Range (min … max): 273.904 ms … 285.103 ms ┊ GC (min … max): 1.36% … 0.26%
Time (median): 276.711 ms ┊ GC (median): 1.35%
Time (mean ± σ): 277.268 ms ± 2.543 ms ┊ GC (mean ± σ): 1.30% ± 0.25%
▃ ▃█ █ ▃
▇▁▁▁▁▁▁▁▁█▇██▁▁▇▁▇█▇█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
274 ms Histogram: frequency by time 285 ms <
Memory estimate: 68.66 MiB, allocs estimate: 4.
This is single-threaded. Going by the medians above, `run4` is roughly 10× faster than `run_julia` (27.8 ms vs. 276.7 ms). Code:
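julia> macro vp(expr)
           # Loop metadata asking LLVM to predicate (mask) the vectorized loop
           nodes = (Symbol("llvm.loop.vectorize.predicate.enable"), 1)
           if expr.head != :for
               error("Syntax error: loopinfo needs a for loop")
           end
           # Attach the metadata to the end of the loop body
           push!(expr.args[2].args, Expr(:loopinfo, nodes))
           return esc(expr)
       end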
julia> function run4(height, width)
           y = range(-1.0f0, 0.0f0; length = height)
           x = range(-1.5f0, 0.0f0; length = width)
           fractal = fill(Int32(20), height, width)  # 20 = never escaped
           @inbounds @fastmath for w in 1:width
               @vp for h in 1:height
                   z_re = _c_re = x[w]
                   z_im = _c_im = y[h]
                   m = true  # true while this point has not escaped
                   # Unroll 20 iterations; no branches inside the loop body
                   Base.Cartesian.@nexprs 20 i -> begin
                       z_re, z_im = _c_re + z_re*z_re - z_im*z_im, _c_im + 2*z_re*z_im
                       az4 = (z_re*z_re + z_im*z_im) > 4f0
                       # Record i only on the first iteration that escapes
                       fractal[h, w] = ifelse(m & az4, i % Int32, fractal[h, w])
                       m &= (!az4)
                   end
               end
           end
           return fractal
       end
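This also works quite well for sizes that are awkward with respect to the vector length, such as a height of 10, which is not a multiple of typical SIMD widths. Going by the medians below, `run4` is about 3.8× faster at 10×10 and about 6.6× faster at 16×16: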
julia> @benchmark run_julia(10,10)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
Range (min … max): 3.316 μs … 189.292 μs ┊ GC (min … max): 0.00% … 95.85%
Time (median): 3.414 μs ┊ GC (median): 0.00%
Time (mean ± σ): 3.580 μs ± 2.479 μs ┊ GC (mean ± σ): 0.95% ± 1.36%
▅ ▆█ ▁
▅█▆██▇▄▃▂▂▂▂▂▂▄▆▃▃▂▃▇█▆▄▃▃▃▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
3.32 μs Histogram: frequency by time 4.46 μs <
Memory estimate: 1.36 KiB, allocs estimate: 2.
julia> @benchmark run4(10,10)
BenchmarkTools.Trial: 10000 samples with 57 evaluations.
Range (min … max): 896.930 ns … 23.460 μs ┊ GC (min … max): 0.00% … 92.93%
Time (median): 905.561 ns ┊ GC (median): 0.00%
Time (mean ± σ): 936.896 ns ± 423.935 ns ┊ GC (mean ± σ): 0.97% ± 2.06%
▅██▅▄▃ ▁▃▃▁ ▃▃▁ ▂▄▂▁▁▁ ▂
████████▇█████▆▅▅▃▃▃▃▁▃▄▁▆█▆▄█████████████▇▇▅▆▄▅▅▅▃▄▃▄▅▅▄▆▆▇▇ █
897 ns Histogram: log(frequency) by time 1.12 μs <
Memory estimate: 496 bytes, allocs estimate: 1.
julia> @benchmark run_julia(16,16)
BenchmarkTools.Trial: 10000 samples with 3 evaluations.
Range (min … max): 8.377 μs … 960.083 μs ┊ GC (min … max): 0.00% … 93.93%
Time (median): 8.663 μs ┊ GC (median): 0.00%
Time (mean ± σ): 8.933 μs ± 9.612 μs ┊ GC (mean ± σ): 1.01% ± 0.94%
█▃ ▁▁
▁▂████▅▇██▆▄▃▂▂▂▂▂▄▆▇▆▆▅▄▄▃▃▂▂▂▂▂▂▂▂▂▁▁▁▁▂▁▁▁▁▁▁▁▂▂▂▂▂▂▂▁▁▁ ▂
8.38 μs Histogram: frequency by time 9.96 μs <
Memory estimate: 3.19 KiB, allocs estimate: 2.
julia> @benchmark run4(16,16)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.236 μs … 147.251 μs ┊ GC (min … max): 0.00% … 97.03%
Time (median): 1.306 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.438 μs ± 1.935 μs ┊ GC (mean ± σ): 1.86% ± 1.38%
▅█▆▃▂▂▁ ▅▄▁ ▅▆▃▂ ▃▃▂▁ ▂
▄▁▁███████▇▅▄▁▁▁███▆▇▅▇███████████████▇▇▆▆▆▆▆▅▆▅▆▆▆▆▅▅▄▅▃▄▅ █
1.24 μs Histogram: log(frequency) by time 2 μs <
Memory estimate: 1.06 KiB, allocs estimate: 1.
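To sanity-check what the branch-free `run4` computes, it may help to compare it against a plain scalar loop with an early exit. The sketch below is only an illustration: `run_ref` and its `maxiter` keyword are hypothetical names introduced here, and this is not the `run_julia` function from the benchmarks, whose source is not shown in this post.

```julia
# Hypothetical scalar reference (NOT the `run_julia` benchmarked above).
# It uses the same grid and update rule as `run4`, but with an ordinary
# loop that stops at the first iteration where |z|^2 > 4.
function run_ref(height, width; maxiter = 20)
    y = range(-1.0f0, 0.0f0; length = height)
    x = range(-1.5f0, 0.0f0; length = width)
    # Points that never escape keep the value `maxiter`
    fractal = fill(Int32(maxiter), height, width)
    for w in 1:width, h in 1:height
        c_re, c_im = x[w], y[h]
        z_re, z_im = c_re, c_im
        for i in 1:maxiter
            # z <- z^2 + c; both parts are computed from the old z
            z_re, z_im = c_re + z_re*z_re - z_im*z_im, c_im + 2*z_re*z_im
            if z_re*z_re + z_im*z_im > 4f0
                fractal[h, w] = i % Int32
                break
            end
        end
    end
    return fractal
end
```

With `run4` defined as above, something like `count(run_ref(200, 300) .!= run4(200, 300))` gives the number of differing pixels. Because `run4` uses `@fastmath`, a few pixels right at the escape boundary could differ from this reference, so an exact match is not guaranteed.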