Can't figure out why these allocations are happening

purplishrock · March 20, 2018, 11:14pm

I can’t figure out why allocations are happening in loops that should simply update zsum in place.

Here are the relevant portions of the allocation trace:

        0         i = j
        0         while i <= nsync
185614464             zsum = zsum + (buffer[i] * sync[i])
        0             i = i + 1
        -         end
        - 
        -         # and then the most recent values
        0         if j > 1
        0             i = 1
        0             while i <= j-1
172792704                 zsum = zsum + (buffer[i] * sync[i])
        0                 i = i + 1
        -             end
        -         end

The other thing I noticed, which is very surprising, is that replacing the while loops with, e.g.

for i=j:nsync
...
end

allocates memory (!) but the WHILE loop does not, which is why I am using WHILE loops. I had figured that for a simple integer sequence the FOR loop would be rewritten as a WHILE, but apparently that is not the case.

Does this mean I should be writing all my integer counting loops using WHILE loops instead of FOR loops ?!

Also, I would appreciate any comments on better ways to implement what is very much a producer-consumer model. This is being done to implement a discrete simulation, i.e. a process that consumes one sample at a time.

thanks !

complete original code…

#
#
#

const nsync = 28
sync = randn(nsync)
buffer = zeros(Float64, nsync)

function process1(cout)
    println("process1")
    while true
        # doing this instead of x=randn(), y = randn(), z=complex(x,y)
        # cuts way down on GC allocations.
        for i=1:1000
            z = randn()
            put!(cout, z)
        end
    end
end

function process2(cin::Channel{Float64}, cout::Channel{Float64})
    j = 1
    zsum = complex(0.0, 0.0)
    println("process2")
    while true
        buffer[j] = take!(cin)
        # j will now point to the _oldest_ value
        # the current value is at j-1
        j = (j % nsync) + 1
        zsum = 0.0
        # process the oldest values

        i = j
        while i <= nsync
            zsum = zsum + (buffer[i] * sync[i])
            i = i + 1
        end

        # and then the most recent values
        if j > 1
            i = 1
            while i <= j-1
                zsum = zsum + (buffer[i] * sync[i])
                i = i + 1
            end
        end

        put!(cout, zsum)
    end
end

function process3()
    channel1 = Channel{Float64}(1000)
    channel2 = Channel{Float64}(1)
    t1 = Task(()->process1(channel1))
    t2 = Task(()->process2(channel1, channel2))
    println("starting")
    schedule(t1)
    schedule(t2)
    
    yield()
    z = 0.0
    for i=1:100000
        z = take!(channel2)
        # println(i, " z=", z)
    end
    println("done")
end

#println(code_llvm(process2, (Channel{Complex{Float64}}, Channel{Complex{Float64}})))
@time process3()
process3()
#Profile.print()

rdeits · March 20, 2018, 11:38pm

No. Neither form should cause allocations on its own. Did you run your code and call Profile.clear_malloc_data() and then run it again? If not, then you are just seeing allocations due to compilation.
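
A minimal sketch of how one might check this, assuming the arrays are passed in as typed arguments. The function names are illustrative and not from the original code. Both loop forms are measured with @allocated after a warm-up call, so compilation is not counted:

```julia
# Illustrative helpers; these names are assumptions, not from the original post.
function dot_for(a::Vector{Float64}, b::Vector{Float64})
    s = 0.0
    for i in 1:length(a)
        s += a[i] * b[i]
    end
    return s
end

function dot_while(a::Vector{Float64}, b::Vector{Float64})
    s = 0.0
    i = 1
    while i <= length(a)
        s += a[i] * b[i]
        i += 1
    end
    return s
end

# Measure inside functions so the argument types are known to the compiler.
allocs_for(a, b) = @allocated dot_for(a, b)
allocs_while(a, b) = @allocated dot_while(a, b)

a = randn(28)
b = randn(28)

# Warm-up: the first call of each function includes compilation.
allocs_for(a, b)
allocs_while(a, b)

println("for:   ", allocs_for(a, b), " bytes")
println("while: ", allocs_while(a, b), " bytes")
```

If both report the same number of bytes after warm-up, the loop form is not the source of the allocations. The same warm-up idea applies to --track-allocation: run once, call Profile.clear_malloc_data(), then run again.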

One potential source of allocations is the fact that zsum starts out as a Complex outside of your loop and then is immediately changed into a Float64 when you call zsum = 0.0. I’d suggest running @code_warntype and ensuring your code is fully type-stable before you go looking for allocations.
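
As a rough illustration of the kind of instability meant here (a sketch with made-up function names, not the original process2), the first function below binds zsum to a ComplexF64 and then rebinds it to a Float64 inside a loop, so the compiler cannot infer a single concrete type for the returned value:

```julia
# Illustrative functions; names and structure are assumptions for this sketch.
function unstable_sum(xs::Vector{Float64}, n::Int)
    zsum = complex(0.0, 0.0)   # starts as ComplexF64
    for _ in 1:n
        zsum = 0.0             # rebound to Float64 inside the loop
        for x in xs
            zsum += x
        end
    end
    return zsum                # ComplexF64 if n == 0, otherwise Float64
end

function stable_sum(xs::Vector{Float64}, n::Int)
    zsum = 0.0                 # Float64 from the start
    for _ in 1:n
        zsum = 0.0
        for x in xs
            zsum += x
        end
    end
    return zsum
end

argtypes = (Vector{Float64}, Int)
println(Base.return_types(unstable_sum, argtypes))  # inferred return type(s)
println(Base.return_types(stable_sum, argtypes))
```

Running @code_warntype unstable_sum(randn(4), 3) in the REPL highlights the same non-concrete type for zsum.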

tkoolen · March 20, 2018, 11:45pm

sync and buffer are non-const global variables.
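
A small sketch of the effect, using hypothetical names (sync_nc, buffer_nc, corr_globals, corr_args) rather than the original code. The first function reads non-const globals, so the compiler does not know their types inside the loop. The second receives the same arrays as typed arguments:

```julia
const nsync = 28
sync_nc = randn(nsync)     # non-const global, like `sync` in the original
buffer_nc = randn(nsync)   # non-const global, like `buffer` in the original

# Reads the non-const globals directly; their types are unknown to the compiler.
function corr_globals()
    zsum = 0.0
    for i in 1:nsync
        zsum += buffer_nc[i] * sync_nc[i]
    end
    return zsum
end

# Same computation, with the arrays passed in as typed arguments.
function corr_args(buffer::Vector{Float64}, sync::Vector{Float64})
    zsum = 0.0
    for i in 1:nsync
        zsum += buffer[i] * sync[i]
    end
    return zsum
end

allocs_globals() = @allocated corr_globals()
allocs_args(buffer, sync) = @allocated corr_args(buffer, sync)

# Warm-up so compilation is not measured.
allocs_globals()
allocs_args(buffer_nc, sync_nc)

println("globals:   ", allocs_globals(), " bytes")
println("arguments: ", allocs_args(buffer_nc, sync_nc), " bytes")
```

Declaring the arrays const at global scope is the other common fix; passing them as arguments keeps the function self-contained.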

purplishrock · March 20, 2018, 11:59pm

bingo.

Note to self, keep things local.

I had figured that global allocation would be better, clearly there’s something in the documentation about optimization I need to read.

I took your comment to mean that I should declare the local variables as const, but I did not find that to make any difference.

I’m interested in optimizing this code because I’m hoping it can process my datastream in real-time. Should be interesting to see if I can make it work…

Thank you !

@rdeits thanks for the pointer to code_warntype, obviously that’s really useful. Not sure how I missed it while I was reading through the profiling section.

tkoolen · March 21, 2018, 12:18am

Yeah, this section explains it: https://docs.julialang.org/en/stable/manual/performance-tips/#Avoid-global-variables-1.

So making the variables const didn’t reduce allocations? That’s surprising.

BTW, this would be a great use case for Traceur.jl (JunoLab/Traceur.jl on GitHub). Unfortunately, it didn’t work with the Channels and @schedule (ERROR (unhandled task failure): InvalidStateException("Channel is closed.", :closed)).
