Question about performance when iterating

Hi!
Really like julia and want to learn how to debug performance better.

I’m doing a project on iterated maps, and started writing in python and then decided to try julia. But when benchmarking, the python version ran faster, but I can’t figure out why.

The question about what to vectorize, when to use dot notation etc in julia is not entirely clear to me, if you have any general advice about this I’d be happy to receive it.

This is the vectorized python code

import numpy as np

def henon(x, a=1.4, b=0.3):
    x = np.array(x)
    x, y = x[:, 0], x[:, 1]
    return np.stack([1-a*x**2+y, b*x], -1)

# in IPython REPL

# 10 000 dummy starting points
y = 0.34 * np.ones((10000, 2))                                                                   

# %%time 
for _ in range(10000): 
    y = henon(y) 
                                                                                            
#CPU times: user 731 ms, sys: 0 ns, total: 731 ms
#Wall time: 731 ms

And here is the Julia version, one loop, and one vectorized

# in the REPL
henon(x;a=1.4,b=0.3) = [1 - a*x[1]^2 - x[2], b*x[1]]
vectorized_henon(x;a=1.4,b=0.3) = cat(1 .- a.*x[:,1].^2 .- x[:,2], b.*x[:,1], dims=2)

iter(f, x, k) = (for _=1:k x=f(x) end; x)

# 10 000 dummy points, an array of arrays, and as a 2d array
y_vec = [ones(2)*0.34 for _=1:10000]
y_mat = ones(10000,2)*0.34

# for compilation
iter.(henon, y_vec, 10000)
iter(vectorized_henon, y_mat, 10000)

@time iter.(henon, y_vec, 10000)
# 5.148718 seconds (100.00 M allocations: 8.941 GiB, 30.70% gc time)
@time iter(vectorized_henon, y_mat, 10000)
# 1.236298 seconds (410.00 k allocations: 5.231 GiB, 12.20% gc time)

I’m sure I’m missing something basic here, but please tell me how to make this code more efficient.

Best wishes
Johannes

The biggest inefficiency that I see here right off the bat is that you’re creating a bunch of Vectors with only two arguments which wastes a lot of time allocating memory. If instead, you use Tuples, then henon will outperform vectorized_henon:

henon(x;a=1.4,b=0.3) = [1 - a*x[1]^2 - x[2], b*x[1]]
henon2(x;a=1.4,b=0.3) = (1 - a*x[1]^2 - x[2], b*x[1])
vectorized_henon(x;a=1.4,b=0.3) = cat(1 .- a.*x[:,1].^2 .- x[:,2], b.*x[:,1], dims=2)


iter(f, x, k) = (for _=1:k x=f(x) end; x)

y_vec = [ones(2)*0.34 for _=1:10000]
y_mat = ones(10000,2)*0.34

#compilation
iter.(henon, y_vec, 10000)
iter.(henon2, y_vec, 10000)
iter(vectorized_henon, y_mat, 10000);

@time iter.(henon, y_vec, 10000)
#   4.501728 seconds (100.00 M allocations: 8.941 GiB, 22.01% gc time)
@time iter.(henon2, y_vec, 10000)
#   0.446856 seconds (11 allocations: 156.539 KiB)
@time iter(vectorized_henon, y_mat, 10000);
#   3.293624 seconds (410.00 k allocations: 5.231 GiB, 18.90% gc time)

You can get fancier with this using something like HybridArrays.jl, but this is a good start.

1 Like

The first thing I would recommend is to simplify your benchmarking example. There’s no need to iterate manually over a function 10000 times, because we have GitHub - JuliaCI/BenchmarkTools.jl: A benchmarking framework for the Julia language to do that for us. And I’ll assert that there’s rarely any need to manually vectorize code like vectorized_henon in Julia, so I’ll leave that aside for now. Let’s time your fundamental operation:

julia> henon(x;a=1.4,b=0.3) = [1 - a*x[1]^2 - x[2], b*x[1]]
henon (generic function with 1 method)

julia> using BenchmarkTools

julia> @btime henon(x) setup=(x = ones(2) * 0.34)
  22.259 ns (1 allocation: 96 bytes)
2-element Array{Float64,1}:
 0.49816            
 0.10200000000000001

Your iter loop just consists of doing that operation 10000 * 10000 times, but we’ve already gotten a pretty good performance estimate of 22ns per iteration without doing any of that:

julia> 22e-9 * 10000 * 10000
2.1999999999999997

which is close enough to the 1.23s you saw (and might be different because I’m on a different machine).

Anyway, let’s speed this up. As is so often the case, you’re repeatedly constructing small, fixed-size arrays. This is a perfect use case for https://github.com/JuliaArrays/StaticArrays.jl

julia> using StaticArrays

julia> henon_static(x;a=1.4,b=0.3) = SVector(1 - a*x[1]^2 - x[2], b*x[1])
henon_static (generic function with 1 method)

julia> @btime henon_static(x) setup=(x = SVector(0.34, 0.34))
  0.017 ns (0 allocations: 0 bytes)
2-element SArray{Tuple{2},Float64,1,2} with indices SOneTo(2):
 0.49816            
 0.10200000000000001

Whoops, that’s too fast to be believable. Looks like the compiler has been able to optimize this function down to a constant answer. We can add a Ref to force the compiler to actually do the computation:

julia> @btime henon_static(x[]) setup=(x = Ref(SVector(0.34, 0.34)))
  1.733 ns (0 allocations: 0 bytes)
2-element SArray{Tuple{2},Float64,1,2} with indices SOneTo(2):
 0.49816            
 0.10200000000000001

That’s more like it. 1-2ns is about what I would expect for such a simple operation. And we’re now more than 10x faster than your original code, and we’re allocating zero memory. Hooray for StaticArrays!

8 Likes

Thank you both! This was very helpful. You both provided solutions to my problem.