How do I make this allocation free?

Hello!

Suppose I have the parameters:

using StaticArrays
using LinearAlgebra

h         = 0.05
eta2 = (0.01)h * (0.01)h
points = rand(SVector{3,Float64},100000)
v         = points * 0.05

And I wish to evaluate this expression:

 maximum(abs.(h* dot.(v,points) ./ (dot.(points,points) .+ eta2)))

Which currently gives me a timing of:

@btime visc  = maximum(abs.($h* dot.($v,$points) ./ (dot.($points,$points) .+  $eta2)))
  442.800 μs (6 allocations: 2.29 MiB)
0.0024999997855364884

The speed is really good, but I just don’t understand why it is allocating 2 mb and how I avoid that?

Kind regards

your missing a . on the first *.

Thank you, this reduces it to:

@btime visc  = maximum(abs.($h .* dot.($v,$points) ./ (dot.($points,$points) .+  $eta2)))
  273.000 μs (2 allocations: 781.30 KiB)
0.0024999997855364884

So a third. Anything else I perhaps might be missing?

I just don’t understand why this would allocate at all since it reduces down to one scalar value?

Is there some intermediate array which is not being cleaned up properly?

Use the form of maximum that takes a function as the first argument, and / or make a generator expression out of the broadcast because that needs to create an array right now

1 Like

Sorry, I don’t understand. I see the syntax:

maximum(abs2, A, dims=1)

But I can’t see how to convert my example into that. I tried:

f() = abs.(h .* dot.(v,points) ./ (dot.(points,points) .+  eta2))

And then maximum of that, but that did the same of course :slight_smile:

Could you explain more please?

Kind regards

You’re iterating over points and v but for a maximum to be determined, not all of the results need to be stored in a vector at the same time. It is enough to compute one by one. So either you make a function to call on each element of the collection you’re interested in, or you use a generator expression which also only ever computes one element at a time. As you have two collections I use zip here, you’d do the same for the functional version a la maximum(f, zip(points, v)):

julia> @btime maximum(abs($h * dot(_v, p) / (dot(p, p) + $eta2)) for (p, _v) in zip($points, $v))
  2.237 ms (0 allocations: 0 bytes)
0.002499999788221453

The functional version:

julia> @btime maximum(zip($points, $v)) do (p, _v)
           abs($h * dot(_v, p) / (dot(p, p) + $eta2))
       end
  2.173 ms (0 allocations: 0 bytes)
0.002499999788221453
2 Likes

If you use mapreduce instead of maximum, then you don’t have to use zip. Something like this:

@btime mapreduce((p, _v) -> abs($h * dot(_v, p) / (dot(p, p) + $eta2)), max, $points, $v)

BTW, some obvious performance suggesions:

  1. Always use functions: the first suggestion in the performance tips: Performance Tips · The Julia Language
  2. Try to provide the init keyword argument to mapreduce and related functions.
2 Likes

Thank you @jules and @nsajko !

Benchmarking both approaches:

 @btime maximum(zip($points, $v)) do (p, _v)
                  abs($h * dot(_v, p) / (dot(p, p) + $eta2))
                         end
  1.323 ms (0 allocations: 0 bytes)
0.002499999786538688

@btime mapreduce((p, _v) -> abs($h * dot(_v, p) / (dot(p, p) + $eta2)), max, $points, $v)
  1.296 ms (2 allocations: 781.30 KiB)
0.002499999786538688

The zip approach still seems to work better, perhaps something can be improved in the map-reduce?

Neat knowing both ways though!

Both ways are about 6 times slower though:

@btime visc  = maximum(abs.($h .* dot.($v,$points) ./ (dot.($points,$points) .+  $eta2)))
  286.900 μs (2 allocations: 781.30 KiB)
0.002499999786538688

And I am not sure why :slight_smile: Could someone elaborate for me?

Kind regards

By the way, thank you so much for showing me this approach!

It gives me a completely allocation free time stepping function since I am using the principle you just showed me:
image

And that seems to give faster end result, even if the benchmark test is slower - a bit confusing for me, but really happy for you and @nsajko taking the time to show me how one could approach it :slight_smile:

Kind regards

Benchmarking like that is kind of pointless. The proper procedure would be organize your code into functions and then benchmark like so:

@benchmark your_function(arg1, arg2, arg3) setup=(arg1 = INIT1; arg2 = INIT2; arg3 = INIT3;)

We can’t organize your code into functions for you, because:

  1. We’d need to have access to your entire code to understand the choices and find best design.
  2. It’d be too much work for helping someone I don’t know

If you have some more specific questions, though, feel free to open new threads here in this forum.

1 Like

You can always just write the straight loop:

julia> function maxvp(v,points,h,eta2)
           maxval = -Inf
           for i in eachindex(v,points)
                maxval = max(maxval, abs(h * dot(v[i],points[i]) / (dot(points[i],points[i]) + eta2)))
           end
           return maxval
       end
maxvp (generic function with 1 method)

julia> @btime maxvp($v,$points,$h,$eta2)
  351.774 μs (0 allocations: 0 bytes)
0.002499999787200059

If you add @fastmath to maxval = ..., time reduces by roughly 2.

1 Like

True, I tried that but it became way to slow, since I used an “if branch” instead of what you are doing with max :slight_smile:

Thank you for showing it to me!

Kind regards

That does not really makes sense. Probably is something not being benchmarked correctly.

3 Likes

You can get another bit of speed like:

julia> function maxvp_2(v,points,h,eta2)
          maxval = -Inf
          for i in eachindex(v,points)
               tmp = abs(h * dot(v[i],points[i]) / (dot(points[i],points[i]) + eta2))
               tmp > maxval && (maxval = tmp)
          end
          return maxval
       end
maxvp_2 (generic function with 1 method)

julia> @btime maxvp($v,$points,$h,$eta2)
  244.788 μs (0 allocations: 0 bytes)
0.002499999783527899

julia> @btime maxvp_2($v,$points,$h,$eta2)
  136.531 μs (0 allocations: 0 bytes)
0.002499999783527899
1 Like

Thank you!

That did give an extra boost yes :slight_smile: Of course now it is getting down to extremely small performance benefits, so probably best to stop improving it now hehe

2 Likes

you can also try this :innocent:

julia>  maxvp_3(v,points,h,eta2)=0.00249999978
maxvp_3 (generic function with 1 method)

julia> @btime maxvp_3($v,$points,$h,$eta2)
  1.100 ns (0 allocations: 0 bytes)
0.00249999978

this isn’t fast, but it doesn’t allocate

julia> @btime maximum(abs, Iterators.map((e,p)-> ($h* dot(e,p) / (dot(p,p) + $eta2)),$v,$points))
  1.399 ms (0 allocations: 0 bytes)
0.0024999997831612647
1 Like

Seems related:

TLDR: performance of maximum has a lot of room for improvement, and there is an ongoing effort to fix it.

1 Like

Thank you for making me aware. Indeed seems to be the case that the most straight forward approach from a function perspective is not yet developed enough for performance

Kind regards