# Can't figure out why these allocations are happening

I can't figure out why allocations are happening on the two lines that should simply update `zsum` in place.

Here are the relevant portions of the allocation trace:

```
        0         i = j
        0         while i <= nsync
185614464             zsum = zsum + (buffer[i] * sync[i])
        0             i = i + 1
        -         end
        -
        -         # and then the most recent values
        0         if j > 1
        0             i = 1
        0             while i <= j-1
172792704                 zsum = zsum + (buffer[i] * sync[i])
        0                 i = i + 1
        -             end
        -         end
```

The other thing I noticed, which is very surprising, is that replacing the while loops with, e.g.

```julia
for i = j:nsync
...
end
```

allocates memory (!) but the while loop does not, hence the reason I am using them. I had figured that for a simple integer sequence the for loop would be rewritten as a while loop, but apparently that is not the case.

Does this mean I should be writing all my integer counting loops using WHILE loops instead of FOR loops ?!

Also, I would appreciate any comments on better ways to implement what is very much a producer-consumer model. This is being done to implement a discrete simulation, i.e. a process which consumes a sample at a time.

thanks !

Complete original code…

```julia
#
#
#

const nsync = 28
sync = randn(nsync)
buffer = zeros(Float64, nsync)

function process1(cout)
    println("process1")
    while true
        # doing this instead of x = randn(), y = randn(), z = complex(x, y)
        # cuts way down on GC allocations.
        for i = 1:1000
            z = randn()
            put!(cout, z)
        end
    end
end

function process2(cin::Channel{Float64}, cout::Channel{Float64})
    j = 1
    zsum = complex(0.0, 0.0)
    println("process2")
    while true
        buffer[j] = take!(cin)
        # j will now point to the _oldest_ value
        # the current value is at j-1
        j = (j % nsync) + 1
        zsum = 0.0
        # process the oldest values

        i = j
        while i <= nsync
            zsum = zsum + (buffer[i] * sync[i])
            i = i + 1
        end

        # and then the most recent values
        if j > 1
            i = 1
            while i <= j-1
                zsum = zsum + (buffer[i] * sync[i])
                i = i + 1
            end
        end

        put!(cout, zsum)
    end
end

function process3()
    channel1 = Channel{Float64}(1000)
    channel2 = Channel{Float64}(1)
    println("starting")
    # t1/t2 were undefined in the original post; create the tasks first
    t1 = @task process1(channel1)
    t2 = @task process2(channel1, channel2)
    schedule(t1)
    schedule(t2)

    yield()
    z = 0.0
    for i = 1:100000
        z = take!(channel2)
        # println(i, " z=", z)
    end
    println("done")
end

#println(code_llvm(process2, (Channel{Complex{Float64}}, Channel{Complex{Float64}})))
@time process3()
process3()
#Profile.print()
```

No. Neither form should cause allocations on its own. Did you run your code, call `Profile.clear_malloc_data()`, and then run it again? If not, then you are just seeing allocations due to compilation.
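For what it's worth, this is easy to check with `@allocated` after a warm-up call that forces compilation. Here is a sketch (the function names `dot_for` and `dot_while` are made up for illustration, not code from the thread):

```julia
# Both loop forms, written as functions over const data so inference
# can specialize them fully.
function dot_for(buffer, sync, j, nsync)
    zsum = 0.0
    for i = j:nsync
        zsum += buffer[i] * sync[i]
    end
    return zsum
end

function dot_while(buffer, sync, j, nsync)
    zsum = 0.0
    i = j
    while i <= nsync
        zsum += buffer[i] * sync[i]
        i += 1
    end
    return zsum
end

const buf = randn(28)
const syn = randn(28)

dot_for(buf, syn, 1, 28)    # warm up: compile first so we don't
dot_while(buf, syn, 1, 28)  # measure compilation-time allocations

@allocated(dot_for(buf, syn, 1, 28))    # expect 0 once compiled
@allocated(dot_while(buf, syn, 1, 28))  # expect 0 as well
```

Once both are compiled, the for-range and while versions should report the same (zero) allocations, which is why the difference you saw points at something else, such as compilation or the globals.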

One potential source of allocations is the fact that `zsum` starts out as a `Complex` outside your loop and is then immediately changed into a `Float64` when you assign `zsum = 0.0`. I'd suggest running `@code_warntype` and ensuring your code is fully type-stable before you go looking for allocations.
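To make that concrete, here is a sketch of the loop body with a single, consistent type for `zsum` and the globals passed in as arguments (`process2_stable` is an illustrative name, not code from the thread):

```julia
# Type-stable variant: zsum is a Float64 for its whole lifetime, and
# buffer/sync become arguments instead of globals.
function process2_stable(cin::Channel{Float64}, cout::Channel{Float64},
                         buffer::Vector{Float64}, sync::Vector{Float64})
    nsync = length(sync)
    j = 1
    zsum = 0.0                  # was complex(0.0, 0.0): a type change
    while true
        buffer[j] = take!(cin)
        j = (j % nsync) + 1     # j now points to the oldest value
        zsum = 0.0
        for i = j:nsync         # process the oldest values
            zsum += buffer[i] * sync[i]
        end
        for i = 1:j-1           # and then the most recent values
            zsum += buffer[i] * sync[i]
        end
        put!(cout, zsum)
    end
end
```

Running `@code_warntype` on this should show every variable inferring to a concrete type, with no `Union{Complex{Float64}, Float64}` for `zsum`.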

`sync` and `buffer` are non-const global variables.
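A minimal illustration of why that matters (the names here are made up): a loop over a non-const global allocates because the binding's type can change at any time, while a const global lets the compiler specialize.

```julia
data = randn(1000)           # non-const global: binding type can change,
                             # so every access is dynamically typed
const cdata = randn(1000)    # const global: type is fixed

function sum_nonconst()
    s = 0.0
    for x in data            # x infers as Any -> boxing on every iteration
        s += x
    end
    return s
end

function sum_const()
    s = 0.0
    for x in cdata           # fully inferred -> no allocation
        s += x
    end
    return s
end

sum_nonconst(); sum_const()   # warm up (compile) first
@allocated(sum_nonconst())    # thousands of bytes
@allocated(sum_const())       # expect 0
```

Passing the arrays as function arguments, as the performance tips suggest, gets the same benefit without needing `const`.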


bingo.

Note to self, keep things local.

I had figured that allocating things globally would be better; clearly there's something in the documentation about optimization I need to read.

I took your comment to mean that I should declare those global variables as const, but I did not find that to make any difference.

I’m interested in optimizing this code because I’m hoping it can process my datastream in real-time. Should be interesting to see if I can make it work…

Thank you !

@rdeits thanks for the pointer to `@code_warntype`, obviously that's really useful. Not sure how I missed it while I was reading through the profiling section.

Yeah, this section explains it: https://docs.julialang.org/en/stable/manual/performance-tips/#Avoid-global-variables-1.

So making the variables const didn’t reduce allocations? That’s surprising.

BTW, this would be a great use case for Traceur.jl (GitHub - JunoLab/Traceur.jl). Unfortunately, it didn't work with the Channels and `@schedule` (`ERROR (unhandled task failure): InvalidStateException("Channel is closed.", :closed)`).