# Performance and allocation issue with arrays of functions (v1.5)

I have an ODE-type problem where I generate a sequence of values, x_n, that, in principle, may live in a comparatively high-dimensional space. Often I don’t need to record the full values, but rather just some scalar-valued functions f_k(x_n), k=1,\ldots, K, where K is much smaller than the dimension of x_n. I wrote the following code to illustrate a performance issue I’ve noticed:

``````
function compute_values(x₀, Δt, nΔt, f)
    x = x₀
    n_f = length(f)
    f_vals = zeros(n_f, nΔt)

    for n in 1:nΔt
        x += 0.5 * Δt * x
        for k in 1:n_f
            f_vals[k, n] = f[k](x)
        end
    end
    return f_vals
end

x₀ = 1.0
Δt = 0.5
nΔt = 10^2

function f1(x)
    return sin(x)
end

function f2(x)
    return x^2
end
``````

If I now run:

``````
@btime compute_values(x₀, Δt, nΔt, [f1]);
@btime compute_values(x₀, Δt, nΔt, [f2]);
@btime compute_values(x₀, Δt, nΔt, [f1, f2]);
``````

it returns:

``````
  259.131 ns (2 allocations: 976 bytes)
  269.878 ns (2 allocations: 976 bytes)
   15.101 μs (303 allocations: 6.56 KiB)
``````

which is not what I anticipated. Both the runtime and the memory allocation have gone up by over an order of magnitude when I have the computation performed with both functions, even though there is no trouble with either function on its own.

Try a tuple instead of an array. With an array, the compiler doesn’t know what type it’s going to get when it grabs one (each function is a separate type).
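The type issue is easy to see directly. A minimal check, reusing the `f1`/`f2` definitions from the original post:

```julia
f1(x) = sin(x)
f2(x) = x^2

# Every Julia function has its own concrete type, so an array holding
# different functions can only have the abstract element type Function:
println(typeof(f1) === typeof(f2))  # false
println(eltype([f1, f2]))           # Function (abstract): f[k] is not type stable
println(typeof((f1, f2)))           # Tuple{typeof(f1), typeof(f2)}: each slot is concrete
```

With the array, every `f[k](x)` is a dynamic call on a value of abstract type; with the tuple, the type of each element is part of the tuple’s type.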


That doesn’t seem to help very much:

``````
@btime compute_values(x₀, Δt, nΔt, (f1, f2));
  19.653 μs (401 allocations: 8.02 KiB)
``````

This implementation speeds up the last case by 10X. Note that `x *= 1 + 0.5Δt` is the same as `x += 0.5 * Δt * x` but looks nicer to me and surprisingly gives a slight speedup. Also, don’t forget to interpolate (`$`) when using `@btime`.

``````
function compute_values2(x₀, Δt, nΔt, f)
    x = x₀
    n_f = length(f)
    f_vals = zeros(n_f, nΔt)

    for n in 1:nΔt
        x *= 1 + 0.5Δt
        f_vals[:, n] .= x
    end
    for i in 1:n_f
        fi = view(f_vals, i, :)
        map!(f[i], fi, fi)
    end
    return f_vals
end

x₀ = 1.0
Δt = 0.5
nΔt = 10^2

f1(x) = sin(x)
f2(x) = x^2

@btime compute_values2($x₀, $Δt, $nΔt, $[f1])
@btime compute_values2($x₀, $Δt, $nΔt, $[f2])
@btime compute_values2($x₀, $Δt, $nΔt, $(f1, f2))
  1.820 μs (1 allocation: 896 bytes)
  220.359 ns (1 allocation: 896 bytes)
  2.022 μs (5 allocations: 1.95 KiB)
``````

Instead of `map` you could also use a generated function with `Base.Cartesian.@nexprs` to manually unroll the loop and make it type stable:

``````
@generated function compute_values(x₀, Δt, nΔt, f::Tuple{Vararg{<:Any,K}}) where {K}
    quote
        x = x₀
        f_vals = zeros($K, nΔt)
        for n in 1:nΔt
            x += 0.5 * Δt * x
            Base.Cartesian.@nexprs $K k -> f_vals[k, n] = (f[k])(x)
        end
        return f_vals
    end
end
``````
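A self-contained sanity check of the type-stability claim, restating the generated definition under the name `compute_values_generated` (as in the benchmarks below) with the equivalent `NTuple{K,Any}` signature, and verifying inference with `Test.@inferred`:

```julia
using Test

# Restated from the post above: @nexprs unrolls the loop over k at
# compile time, so each f[k] call site has a known concrete type.
@generated function compute_values_generated(x₀, Δt, nΔt, f::NTuple{K,Any}) where {K}
    quote
        x = x₀
        f_vals = zeros($K, nΔt)
        for n in 1:nΔt
            x += 0.5 * Δt * x
            Base.Cartesian.@nexprs $K k -> f_vals[k, n] = (f[k])(x)
        end
        return f_vals
    end
end

f1(x) = sin(x)
f2(x) = x^2

# @inferred throws if the return type cannot be inferred concretely:
vals = @inferred compute_values_generated(1.0, 0.5, 10, (f1, f2))
println(size(vals))  # (2, 10)
```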

Isn’t using `@generated` considered “bad”?

Sure, but `@generated` is easy and it performs well. Comparing the above `compute_values2` with the generated version:

``````
julia> @btime compute_values2($x₀, $Δt, $nΔt, $[f1])
  1.665 μs (1 allocation: 896 bytes)
1×100 Matrix{Float64}:
 0.948985  0.999966  0.927798  0.64436  0.0897141  -0.623416  -0.998433  -0.317148  0.919731  …  0.265737  -0.944011  0.0275612  -0.999406  0.342549  0.939933  -0.676239  0.599191

julia> @btime compute_values2($x₀, $Δt, $nΔt, $[f2])
  231.457 ns (1 allocation: 896 bytes)
1×100 Matrix{Float64}:
 1.5625  2.44141  3.8147  5.96046  9.31323  14.5519  22.7374  35.5271  55.5112  86.7362  135.525  …  1.65608e18  2.58763e18  4.04317e18  6.31746e18  9.87103e18  1.54235e19  2.40992e19

julia> @btime compute_values2($x₀, $Δt, $nΔt, $(f1, f2))
  1.726 μs (1 allocation: 1.77 KiB)
2×100 Matrix{Float64}:
 0.948985  0.999966  0.927798  0.64436  0.0897141  -0.623416  -0.998433  -0.317148   0.919731  …  -0.944011    0.0275612   -0.999406    0.342549    0.939933    -0.676239    0.599191
 1.5625    2.44141   3.8147    5.96046  9.31323    14.5519    22.7374    35.5271    55.5112        1.65608e18  2.58763e18   4.04317e18  6.31746e18  9.87103e18   1.54235e19  2.40992e19

julia> @btime compute_values_generated($x₀, $Δt, $nΔt, $(f1,))
  1.506 μs (1 allocation: 896 bytes)
1×100 Matrix{Float64}:
 0.948985  0.999966  0.927798  0.64436  0.0897141  -0.623416  -0.998433  -0.317148  0.919731  …  0.265737  -0.944011  0.0275612  -0.999406  0.342549  0.939933  -0.676239  0.599191

julia> @btime compute_values_generated($x₀, $Δt, $nΔt, $(f2,))
  220.222 ns (1 allocation: 896 bytes)
1×100 Matrix{Float64}:
 1.5625  2.44141  3.8147  5.96046  9.31323  14.5519  22.7374  35.5271  55.5112  86.7362  135.525  …  1.65608e18  2.58763e18  4.04317e18  6.31746e18  9.87103e18  1.54235e19  2.40992e19

julia> @btime compute_values_generated($x₀, $Δt, $nΔt, $(f1, f2))
  1.550 μs (1 allocation: 1.77 KiB)
2×100 Matrix{Float64}:
 0.948985  0.999966  0.927798  0.64436  0.0897141  -0.623416  -0.998433  -0.317148   0.919731  …  -0.944011    0.0275612   -0.999406    0.342549    0.939933    -0.676239    0.599191
 1.5625    2.44141   3.8147    5.96046  9.31323    14.5519    22.7374    35.5271    55.5112        1.65608e18  2.58763e18   4.04317e18  6.31746e18  9.87103e18   1.54235e19  2.40992e19
``````

Although if you’re unhappy with laziness as an excuse, with a little more work, we can use dispatch instead of `Base.Cartesian.@nexprs` to unroll our expressions:

``````
@inline fmap(fs::Tuple, x) = (first(fs)(x), fmap(Base.tail(fs), x)...)
@inline fmap(fs::Tuple{T}, x) where {T} = (first(fs)(x),)

function compute_values_map(x₀, Δt, nΔt, f::Tuple{Vararg{<:Any,K}}) where {K}
    x = x₀
    f_vals = zeros(K, nΔt)
    for n in 1:nΔt
        x += 0.5 * Δt * x
        fv = fmap(f, x)
        for k in 1:K
            f_vals[k, n] = fv[k]
        end
    end
    return f_vals
end
``````

Result:

``````
julia> @btime compute_values_map($x₀, $Δt, $nΔt, $(f1, f2))
  1.562 μs (1 allocation: 1.77 KiB)
2×100 Matrix{Float64}:
 0.948985  0.999966  0.927798  0.64436  0.0897141  -0.623416  -0.998433  -0.317148   0.919731  …  -0.944011    0.0275612   -0.999406    0.342549    0.939933    -0.676239    0.599191
 1.5625    2.44141   3.8147    5.96046  9.31323    14.5519    22.7374    35.5271    55.5112        1.65608e18  2.58763e18   4.04317e18  6.31746e18  9.87103e18   1.54235e19  2.40992e19
``````
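The reason this stays fast is that the `fmap` recursion is resolved entirely at compile time: each call peels one function off the front of the tuple, so every call site is concrete and the result is an inferable tuple. A small standalone check:

```julia
using Test

@inline fmap(fs::Tuple, x) = (first(fs)(x), fmap(Base.tail(fs), x)...)
@inline fmap(fs::Tuple{T}, x) where {T} = (first(fs)(x),)

f1(x) = sin(x)
f2(x) = x^2

# The compiler resolves the recursion statically and infers a
# concretely typed tuple of results:
fv = @inferred fmap((f1, f2), 1.25)
println(fv == (sin(1.25), 1.25^2))  # true
println(typeof(fv))                 # Tuple{Float64, Float64}
```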