Extreme memory usage seemingly caused by using the functions `map` and `Iterators.flatten`

SZJX · November 4, 2018, 5:12pm

I had a program that was running extremely slowly, so I ran a memory benchmark on it. Apparently one particular line of code accumulated a whopping 27GB of memory allocation…

        (lgamma(theta(pyp)) - lgamma(theta(pyp) + pyp.crp.totalcustomers) +
                lgamma(theta(pyp) / d(pyp) + pyp.crp.ntablegroups) -
                lgamma(theta(pyp) / d(pyp)) +
                pyp.crp.ntablegroups * (log(d(pyp)) - lgamma(1 - d(pyp))) +
                sum(map(c -> lgamma(c - d(pyp)), Iterators.flatten(values(pyp.crp.tablegroups))))
                )

I suspected that it was this part

                sum(map(c -> lgamma(c - d(pyp)), Iterators.flatten(values(pyp.crp.tablegroups))))

that caused the problem. So I rewrote it as a simple loop:

        temp::Float64 = 0.0
        for tablegroup in values(pyp.crp.tablegroups)
            for table_customer_count in tablegroup
                temp += lgamma(table_customer_count - d(pyp))
            end
        end

        (lgamma(theta(pyp)) - lgamma(theta(pyp) + pyp.crp.totalcustomers) +
                lgamma(theta(pyp) / d(pyp) + pyp.crp.ntablegroups) -
                lgamma(theta(pyp) / d(pyp)) +
                pyp.crp.ntablegroups * (log(d(pyp)) - lgamma(1 - d(pyp))) +
                temp
                )

and in a test run on simple input data, the memory usage by that part was reduced 10-fold, from nearly 1GB (far exceeding any other part of the program) to only about 100MB.
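
For readers who want to reproduce this kind of comparison, here is a minimal sketch using `@allocated`. It is not the original program: the `tablegroups` dictionary, the value of `d`, and the helper names `with_map` and `with_loop` are all hypothetical, and `log` stands in for `lgamma` so that the sketch runs without extra packages.

```julia
# Hypothetical stand-in data: a Dict mapping ids to per-table customer counts.
tablegroups = Dict(i => rand(1:10, 5) for i in 1:1000)
d = 0.5  # hypothetical discount parameter; counts are >= 1, so c - d > 0

# `log` stands in for `lgamma` (assumption made only to avoid a package dependency).
# `map` over a generic iterator collects its results into an array before `sum` runs.
with_map(tg, d) = sum(map(c -> log(c - d), Iterators.flatten(values(tg))))

# Plain nested loop that accumulates into a local Float64.
function with_loop(tg, d)
    total = 0.0
    for group in values(tg), c in group
        total += log(c - d)
    end
    return total
end

# Call each function once first so that compilation is not counted.
with_map(tablegroups, d)
with_loop(tablegroups, d)

bytes_map = @allocated with_map(tablegroups, d)
bytes_loop = @allocated with_loop(tablegroups, d)
println("map + flatten: ", bytes_map, " bytes; loop: ", bytes_loop, " bytes")
```

The exact numbers depend on the Julia version and the data, so none are quoted here; the point is only that the two forms can be measured side by side on the same input.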

One huge advantage of Julia is how easy it is to use FP patterns in it. However, I didn’t expect such a huge performance hit. Maybe I didn’t write the code in an optimal way? Could this memory issue be improved in some other way, e.g. with more type annotations?

Or maybe there are some other problems with my program which were somehow dealt with by this change of code?
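
One way to keep the functional style while avoiding the intermediate array that `map` builds is to pass the function to `sum` directly, or to use a generator. The following is only a sketch under assumptions: `tablegroups`, `d`, and `term` are hypothetical stand-ins for `pyp.crp.tablegroups`, `d(pyp)`, and the `lgamma` term, and `log` replaces `lgamma` so that no package is needed. Whether this removes the allocations seen in the original program has not been checked here.

```julia
# Hypothetical stand-in for pyp.crp.tablegroups.
tablegroups = Dict(1 => [3, 1, 2], 2 => [5], 3 => [1, 1])
d = 0.5  # hypothetical stand-in for d(pyp)

# Stand-in for `c -> lgamma(c - d(pyp))` (assumption: `log` instead of `lgamma`).
term(c) = log(c - d)

# Original shape: `map` materializes an array, then `sum` reduces it.
s_map = sum(map(term, Iterators.flatten(values(tablegroups))))

# `sum(f, itr)` applies `f` while reducing, without building the array first.
s_fused = sum(term, Iterators.flatten(values(tablegroups)))

# The same reduction written as a nested generator expression.
s_gen = sum(term(c) for group in values(tablegroups) for c in group)

println(s_map, " ", s_fused, " ", s_gen)
```

All three expressions compute the same sum; they differ only in whether an intermediate collection is created along the way.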

kristoffer.carlsson · November 4, 2018, 5:15pm

`flatten` has recently been improved (https://github.com/JuliaLang/julia/pull/29786), and this change will be in Julia v1.0.2, which will be released in a few days. Hopefully this will help with this case. If you can build from source, you can try out the release-1.0 branch in the meantime.

SZJX · November 4, 2018, 5:40pm

Thanks. I was also wondering whether this is some performance issue that has been fixed in v1.0.2. I’ll give it a try later when it’s released then. I had to use the binary version because of https://github.com/JuliaLinearAlgebra/Arpack.jl/issues/5
