Understanding memory allocations

I want fast code. :sunglasses: I will get a core meltdown.

I am trying to understand why some code allocates while other code doesn’t. Here is my little benchmark, and I cannot make head or tail of the results.

b = Complex(1.,2.)

foo(a,b)         = a+b
o = foo(1.,2.)
@time o = foo(1.,2.)

0.000000 seconds

o =Complex(imag(b)  , real(b)    ) 
@time o = Complex(imag(b)  , real(b)    ) 

0.000004 seconds (3 allocations: 64 bytes)

o = imag(b)
@time o =imag(b)

0.000009 seconds (1 allocation: 16 bytes)

bar(b::Complex)        = Complex(imag(b)  , real(b)    ) 
o = bar(Complex(1.,2.))
@time o = bar(Complex(1.,2.))

0.000000 seconds   WHAAAAT !?!

baz(b::Complex)        = Complex(1.,2. ) 
o = baz(Complex(1.,2.))
@time o = baz(Complex(1.,2.))

0.000000 seconds

b = Complex(1.,2.)
@time o = Complex(1.,2.)

0.000000 seconds

buz(a,b)       = Complex(b,a) 
o = buz(1.,2.)
@time o = buz(1.,2.)

0.000000 seconds  

Wow. Any preconception of additivity is hereby shattered!

First question: is what is counted here heap allocations? Then: how can I learn to write operations that do not allocate?

I wouldn’t trust any benchmark that
a) is done in the global scope
b) uses @time on a fast function
c) produces a 0 second output (unless @code_native shows it’s a no-op)

Since each of these amounts to somewhere between one and a handful of machine instructions, though, they should indeed be very fast. None of them allocate as written.

julia> using BenchmarkTools

julia> let b = Complex(1.0, 2.0)

           foo(a, b) = a+b
           bar(b::Complex) = Complex(imag(b), real(b))

           @btime foo(Ref(1.0)[], Ref(2.0)[])
           @btime Complex(imag($(Ref(b))[]), real($(Ref(b))[]))
           @btime imag($(Ref(b))[])
           @btime bar($(Ref(Complex(1.0, 2.0)))[])
       end
  1.644 ns (0 allocations: 0 bytes)
  1.963 ns (0 allocations: 0 bytes)
  1.643 ns (0 allocations: 0 bytes)
  1.643 ns (0 allocations: 0 bytes)
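
A rough way to see point a) without any extra packages is to move the measurement into a function, so that everything it touches is a local with a known type. This is only a sketch using Base's `@allocated`; the wrapper name `allocated_in_function` is made up for illustration, and `bar` is the function from the original post.

```julia
# `bar` is taken from the original post; the wrapper below is a hypothetical helper.
bar(b::Complex) = Complex(imag(b), real(b))

# All values used in the measurement are locals, so their types are known
# to the compiler and nothing has to be looked up in global scope.
function allocated_in_function()
    z = Complex(1.0, 2.0)
    bar(z)                      # warm-up call
    return @allocated bar(z)    # bytes allocated while evaluating bar(z)
end

allocated_in_function()         # the first call also compiles the wrapper
bytes = allocated_in_function()
println("bytes allocated inside a function: ", bytes)
```

For actual timings, @btime from BenchmarkTools as shown above is still the better tool, since it runs the expression many times instead of once.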

Ah! Many thanks Tom. I was trying to make sense of random measurement errors!

Excellent, now I know how to benchmark my real-world problem. :grinning:

You call foo and bar using patterns like Ref(1.0)[]. Replacing that with 1.0 does not seem to affect the output of your benchmark. What is the idea?

And the same question for $b: is this the macro interpolation syntax?


I should say that my sub-nanosecond times were also quite suspicious. I went back and fixed where the Refs were and the results make more sense now. The documentation discusses why this happens.
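
On the `$` question: it is Julia's ordinary expression interpolation syntax, and @btime reuses it to splice a value into the benchmarked expression. The small sketch below uses only Base and plain quoted expressions (no BenchmarkTools) to show what splicing does; the names `by_name`, `by_value` and `r` are just for illustration. As far as I understand the docs, the `Ref(x)[]` pattern is there so that the value is loaded at run time instead of being folded away as a constant.

```julia
b = Complex(1.0, 2.0)

by_name  = :(imag(b))     # keeps the symbol `b`; it is looked up when evaluated
by_value = :(imag($b))    # splices the current value of `b` into the expression

println(by_name.args[2])  # the second argument of the call is still a name
println(by_value.args[2]) # here it is the value itself

# Wrapping a value in a Ref and unwrapping it again gives back an equal value.
r = Ref(b)
println(r[] == b)
```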

Thanks - read the docs… ! :wink: :grinning:

Why do the results look strange to you?

First case

Everything is a plain value; nothing is allocated, except maybe o.

Second case

Allocation of imag(b), real(b), and o: 3 allocations.

Third case

Allocation of imag(b): one allocation.
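
One way to sanity-check this kind of reasoning is to look at the types involved. The sketch below assumes the `bar` definition from the original post and uses `Base.return_types`, an unexported reflection helper, so treat it as a quick check and not as a recommended API. The idea is that Float64 and Complex{Float64} are plain bits types, so they only need a heap box when the compiler cannot see their type, as happens with a non-const global like `b`.

```julia
b = Complex(1.0, 2.0)

# A Complex{Float64} is an immutable bits type made of two Float64 fields.
println(isbitstype(typeof(b)))
println(sizeof(b))

# `bar` is from the original post.
bar(z::Complex) = Complex(imag(z), real(z))

# Ask the compiler what it infers for a concrete argument type.
# Base.return_types is unexported; used here only for illustration.
inferred = first(Base.return_types(bar, (Complex{Float64},)))
println(inferred)
```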



Well, consider the call to “bar”, which combines some of the operations you mention and reportedly has zero allocations. But Tom nailed the issue: my measurements were wrong. Carefully using @btime I get results that make much better sense.