How can I eliminate allocations here?

How can I eliminate allocations here? I understand I am allocating in shares1 = copy(shares), but where else?

function logit(x; min=0, max=1)
    (max - min) * (1/(1 + exp(-x))) + min
end

function _cessum(x, shares, r)
    n = length(x)
    (n == length(shares)) || throw(DimensionMismatch())
    s = zero(eltype(x))
    for i in 1:n
        s += shares[i]*x[i]^r
    end
    return s
end

function ces(x, shares, r; normfun=logit)
    n = length(x)
    shares1 = copy(shares)
    sharessum = zero(eltype(shares))
    for i in 1:n
        shares1[i] = normfun(shares1[i]; max=1-sharessum)
        sharessum += shares1[i]
    end
    return _cessum(x, shares1, r)^(1/r)
end

n = 5
x = rand(n)
shares = rand(n)
r = 0.3

using BenchmarkTools

ulia> @benchmark ces($x, $shares, $r; normfun=logit)
BenchmarkTools.Trial: 10000 samples with 180 evaluations.
 Range (min … max):  571.317 ns … 56.049 ΞΌs  β”Š GC (min … max): 0.00% … 98.56%
 Time  (median):     595.050 ns              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   661.777 ns Β±  1.414 ΞΌs  β”Š GC (mean Β± Οƒ):  5.62% Β±  2.60%

  β–‡β–†β–ˆβ–†β–„β–    ▁                                                  ▁
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–‡β–‡β–‡β–ˆβ–‡β–ˆβ–‡β–‡β–ˆβ–ˆβ–‡β–‡β–†β–†β–‡β–†β–†β–†β–‡β–†β–…β–‡β–†β–†β–†β–†β–…β–…β–†β–†β–†β–†β–…β–„β–„β–…β–…β–ƒβ–…β–…β–„β–…β–…β–… β–ˆ
  571 ns        Histogram: log(frequency) by time      1.15 ΞΌs <

 Memory estimate: 368 bytes, allocs estimate: 16.

So the problem seems to be the keyword arguments in the logit function. If you replace

logit(x; min=0, max=1) -> logit(x, min, max)
normfun(shares1[i]; max=1-sharessum) -> normfun(shares1[i], 0.0, 1-sharessum)

the allocations go away (and it’s way faster :+1:).
Unfortunately I can’t really tell you why this is the case, maybe someone else can chime in.

1 Like

It is something related to the fact that the function does not specialize to keyword arguments, and neither to the input function. For example:

julia> function ces(x, shares, shares1, r, normfun::F) where F
           n = length(x)
           #shares1 = copy(shares)
           sharessum = zero(eltype(shares))
           for i in 1:n
               shares1[i] = normfun(shares1[i]; max=1-sharessum)
               sharessum += shares1[i]
           end
           return _cessum(x, shares1, r)^(1/r)
       end
ces (generic function with 3 methods)

julia> @benchmark ces($x, $shares, $shares1, $r, $logit)
BenchmarkTools.Trial: 10000 samples with 196 evaluations.
 Range (min … max):  476.944 ns …  1.468 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     477.663 ns              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   488.912 ns Β± 47.475 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–ˆ β–‚    β–‚  ▂▁                                      ▁          ▁
  β–ˆβ–ƒβ–ˆβ–†β–…β–…β–…β–ˆβ–β–„β–ˆβ–ˆβ–β–„β–…β–†β–β–ƒβ–„β–†β–…β–†β–†β–ˆβ–„β–„β–…β–„β–‡β–†β–„β–…β–ƒβ–ˆβ–„β–„β–ƒβ–ƒβ–…β–ƒβ–„β–„β–ƒβ–„β–‡β–ƒβ–…β–†β–„β–„β–ˆβ–„β–β–„β–„β–…β–…β–ˆβ–„β–… β–ˆ
  477 ns        Histogram: log(frequency) by time       692 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

(I moved shares1 to a parameter to eliminate all allocations).

Probably related to this: Performance Tips Β· The Julia Language

5 Likes

I modified ces like below:

function ces(x, shares, r; normfun::F=logit, shares1=copy(shares)) where F
    n = length(x)
    sharessum = zero(eltype(shares))
    for i in 1:n
        shares1[i] = normfun(shares1[i]; max=1-sharessum)
        sharessum += shares1[i]
    end
    return _cessum(x, shares1, r)^(1/r)
end

shares1 = copy(shares)

julia> @benchmark ces($x, $shares, $r; normfun=$logit, shares1=$shares1)
BenchmarkTools.Trial: 10000 samples with 846 evaluations.
 Range (min … max):  140.082 ns … 397.773 ns  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     150.333 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   155.629 ns Β±  26.517 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–…β–ˆ  β–…β–ˆβ–ƒβ–…β–          ▁      ▁        ▁                          β–‚
  β–ˆβ–ˆβ–‡β–β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–‡β–ˆβ–†β–ˆβ–ˆβ–‡β–ˆβ–‡β–ˆβ–†β–‡β–ˆβ–‡β–‡β–ˆβ–‡β–†β–ˆβ–‡β–†β–ˆβ–ˆβ–‡β–‡β–‡β–‡β–†β–†β–‡β–†β–‡β–†β–†β–†β–†β–†β–…β–†β–†β–†β–†β–…β–†β–†β–†β–† β–ˆ
  140 ns        Histogram: log(frequency) by time        280 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Aside, I thought specifying types like f(x::T) where T was the same as f(x::Any). Is that not the case?

In general it is, for the specific case of function it might not be (each function has its own type, so they are special in that sense). Also, there is the exception where that declaration serves as a signal for the compiler to specialize the function to that type of function (but that only should make a different if the function in case is not used, but only passed as an argument, inside the function). Thus, I do not exactly understand what is the case here, I think is a combination of the function argument and optional arguments. Someone that understands these details better will probably be able to explain this better.

@amrods add on: I think is sort of an anti-pattern also to pass a function as a parameter but use keyword arguments inside the called function. Keyword arguments are specific to the function used, so it does not match well with the flexibility given the function being a parameter.

I would suggest a pattern like this: in the inner function, decide how many parameters the function has to receive (always). Then, you change the function being passed by the caller by providing a closure:

julia> function ces(x, shares, shares1, r; normfun=logit)
           n = length(x)
           #shares1 = copy(shares)
           sharessum = zero(eltype(shares))
           for i in 1:n
               shares1[i] = normfun(shares1[i],1-sharessum)
               sharessum += shares1[i]
           end
           return _cessum(x, shares1, r)^(1/r)
       end
ces (generic function with 2 methods)

julia> @benchmark ces($x, $shares, $shares1, $r; normfun=(x,max) -> logit(x,max=max))
BenchmarkTools.Trial: 10000 samples with 195 evaluations.
 Range (min … max):  487.031 ns …   2.310 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     496.036 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   564.994 ns Β± 132.099 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–ˆβ–‚β–ƒβ–„β–β–ƒβ–ƒβ–‚β–       ▁  ▁ ▁      ▁▄▁   ▁                           ▁
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–‡β–†β–ˆβ–ˆβ–ˆβ–†β–‡β–ˆβ–ˆβ–ˆβ–‡β–‡β–ˆβ–‡β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–ˆβ–ˆβ–‡β–†β–ˆβ–ˆβ–‡β–…β–†β–…β–†β–…β–…β–…β–…β–…β–†β–„β–…β–ƒβ–‚β–ƒβ–„β–…β–„β–„β–ƒβ–ƒ β–ˆ
  487 ns        Histogram: log(frequency) by time       1.05 ΞΌs <

 Memory estimate: 0 bytes, allocs estimate: 0.


With this, you disassociate the calling syntax of the function being passed from the calling syntax internally.

(this also happens to solve the allocation issue).

1 Like

I use keywords all over my code. It’s turtles all the way down!

1 Like