Memory allocation and usage of dot notation

The article Performance Tips advises paying attention to memory allocation when evaluating or improving the efficiency of a program. This made me wonder whether the estimated number of allocations or the estimated memory consumption is the more meaningful quantity in this respect (or whether this is a meaningless question and they should always be considered jointly).

In trying to find out, I produced the following result.

using BenchmarkTools

function fun(x)
    return @. 2*x^5 + 3*x^2 + sqrt(x) + 10.0 + x^1/5
end

x = rand(Float64, 100_000)

@btime fun.(x);
# 723.400 μs (5 allocations: 781.36 KiB)
@btime fun(x);
# 755.700 μs (2 allocations: 781.30 KiB)

If I have used @btime correctly, this seems to indicate that the second call, which has the smaller estimated number of allocations, is slower than the first, which I find rather counter-intuitive. This would imply that in some cases the number of allocations is of no concern when evaluating the speed of a function. Does anyone have a good explanation of why this is the case in this particular example, or general advice on which quantities I should focus on when benchmarking with @btime?
Many thanks in advance!

x is a non-const global variable here. That can often influence benchmarking, though it is somewhat unpredictable when it will happen. Try interpolating the variable with $:

julia> @btime fun($x);
  641.000 μs (2 allocations: 781.30 KiB)

julia> @btime fun.($x);
  518.600 μs (2 allocations: 781.30 KiB)

The difference in allocations disappeared, but there is still a clear performance difference, and I don’t know why.
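For reference, the two calls being timed differ only in where the broadcast happens, not in what they compute. A minimal sketch, assuming the definition of fun from the question (with its closing end added); the names y_inner, y_outer, same_shape and same_values are made up for illustration:

```julia
# Same definition as in the question, with the closing `end` added.
function fun(x)
    return @. 2*x^5 + 3*x^2 + sqrt(x) + 10.0 + x^1/5
end

x = rand(Float64, 100_000)

# `fun(x)` broadcasts inside the function: `@.` dots every call
# in the expression, giving one fused loop over the whole array.
y_inner = fun(x)

# `fun.(x)` broadcasts outside: `fun` is called once per element,
# and `@.` applied to a scalar reduces to ordinary scalar arithmetic.
y_outer = fun.(x)

# Both results are freshly allocated vectors with the same shape,
# element type and (up to rounding) the same values.
same_shape = size(y_inner) == size(y_outer) && eltype(y_inner) == eltype(y_outer)
same_values = y_inner ≈ y_outer
```

So any remaining timing gap comes from how the two forms are compiled, not from one of them doing extra work.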

Thanks for the advice! Good to at least see the difference in allocations disappear. :)

The ideal is 0 allocations (or some low fixed number; when you see e.g. 2, it may be an artifact of running in the REPL and may go to 0), so beyond that the exact count doesn’t matter much.

Sometimes type instability, and thus allocations, is OK, and not all code is speed-critical. I would look at the timings first (for a realistically sized workload) to see if I need to worry. If you want the fastest code, it’s often not easy to know the performance ceiling, but any remaining allocations are a good indicator that you haven’t reached it.

x^1/5 parses as (x^1)/5. I’m guessing that you intended x^(1/5).
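A quick way to check the precedence yourself; this is only a sketch, and the value 32.0 and the variable names are arbitrary choices for illustration:

```julia
# `^` binds tighter than `/`, so the exponent here is just `1`.
parsed = Meta.parse("x^1/5")
same_as_grouped = parsed == :((x^1)/5)

x = 32.0
a = x^1/5      # (32.0^1)/5, i.e. x divided by 5
b = x^(1/5)    # fifth root of 32.0, approximately 2
```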

…apart from interpolating (pointed out, already)…

julia> @btime fun($x)
  690.700 μs (2 allocations: 781.30 KiB)
100000-element Vector{Float64}:

…I would also always consider whether you really still need the input (the original vector x, in this case) after calling the function, or whether you only need the result to continue with whatever you’re doing. If only the result is needed, I’d reuse the already existing memory, like so (disclaimer: I’m new to Julia, so maybe there’s a more elegant way):

function fun2!(x::Vector{Float64})
    @. x .= 2*x^5 + 3*x^2 + sqrt(x) + 10.0 + x^1/5
end

julia> @btime fun2!($x)
  552.700 μs (0 allocations: 0 bytes)
100000-element Vector{Float64}:

…it may only be a performance improvement of some 20% in this case, but I think it can be more dramatic in other cases, when you’re operating on larger data.
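One caveat when timing a mutating function this way: every call overwrites x, so later calls operate on the values produced by earlier ones. A possible sketch, assuming the setup and evals options of BenchmarkTools, gives each timed call a fresh copy; the names y and x_before are made up for illustration:

```julia
using BenchmarkTools

# Same in-place function as above, with the closing `end` added.
function fun2!(x::Vector{Float64})
    @. x .= 2*x^5 + 3*x^2 + sqrt(x) + 10.0 + x^1/5
end

x = rand(Float64, 100_000)
x_before = copy(x)   # kept only to show that `x` is not modified below

# `setup` runs before each sample and is not included in the timing.
# With `evals=1`, every timed call sees a fresh copy `y` of `x`.
@btime fun2!(y) setup=(y = copy($x)) evals=1;
```

The copy happens in the setup phase, so it should not show up in the reported time or allocations of fun2! itself.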

And while the number of allocations might hint at where memory is being allocated (i.e. how often), I would pay more attention to the amount. In this case, the original version allocated 781.3 KiB = 781.3 x 1024 bytes, which is almost exactly 800,000 bytes (800,000 bytes is 781.25 KiB). That tells you it allocated essentially just the amount of memory needed to store the result of the function for 100,000 Float64 values (8 bytes each). Why it says “2 allocations” and not just one, I have no idea, though.
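The arithmetic can be checked directly in Julia. This is just a sketch; the variable names are illustrative:

```julia
n = 100_000
x = rand(Float64, n)

bytes_per_element = sizeof(Float64)   # 8 bytes per Float64
payload_bytes = sizeof(x)             # bytes of element data stored in the vector
payload_kib = payload_bytes / 1024    # the same amount expressed in KiB
```

Here payload_kib comes out as 781.25, a little below the 781.30 KiB that @btime reports; the small remainder is presumably bookkeeping overhead for the array rather than element data.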

Just for the record, the usual idiomatic way to do this in Julia is to write a scalar function and broadcast it, in place or not:

julia> f(x) = 2*x^5 + 3*x^2 + sqrt(x) + 10.0 + x^(1/5)
f (generic function with 1 method)

julia> x = rand(10^5);

julia> @btime $x .= f.($x);
  588.600 μs (0 allocations: 0 bytes)

julia> @btime y = f.($x);
  612.474 μs (2 allocations: 781.30 KiB)