Performance degradation after upgrading from 0.5.1 to 0.6.2 -- how to avoid memory allocation?

jinliangwei · March 25, 2018, 8:17pm

for rating in ratings
    x_idx = rating[1] + 1
    y_idx = rating[2] + 1
    rv = rating[3]

    W_row = W[:, x_idx]
    H_row = H[:, y_idx]
    pred = dot(W_row, H_row)
    diff = rv - pred
    W_grad = -2 * diff .* H_row
    H_grad = -2 * diff .* W_row
    W[:, x_idx] = W_row - step_size .* W_grad
    H[:, y_idx] = H_row - step_size .* H_grad
end

The code above became roughly 26x slower (from 1.26 seconds to 32.8 seconds on one particular input) after I upgraded from Julia v0.5.1 to v0.6.2. Even with v0.5.1, it is 3 to 4 times slower than the same program written in C++.

In the above code, ratings is a Vector of tuples, and W and H are 2-dimensional arrays of roughly 100 by 5000.

I am guessing that it’s memory allocation that caused the problem.

Julia v0.6.2:
32.802334 seconds (148.08 M allocations: 11.332 GiB, 4.71% gc time)

Julia v0.5.1:
1.263061 seconds (18.58 M allocations: 6.835 GB, 9.55% gc time)

Why does v0.6.2 allocate much more memory?

In my C++ code, I would pre-allocate memory for variables like W_row, H_row, etc. and reuse the same memory across iterations. How would I do the same thing in Julia?

Memory allocation profiling (v0.6.2):

         0 for iteration = 1:num_iterations
         0     for rating in ratings
  59192224         x_idx = rating[1] + 1
  55158016         y_idx = rating[2] + 1
  32006688         rv = rating[3]
         -
2298050581         W_row = W[:, x_idx]
2291205792         H_row = H[:, y_idx]
  32236624         pred = dot(W_row, H_row)
  32006688         diff = rv - pred
3971640698         W_grad = -2 * diff .* H_row
3968829312         H_grad = -2 * diff .* W_row
5795581993         W[:, x_idx] = W_row - step_size .* W_grad
5793210528         H[:, y_idx] = H_row - step_size .* H_grad
         -     end

Memory allocation profiling (v0.5.1):

         0 for iteration = 1:num_iterations
         0     for rating in ratings
  59192224         x_idx = rating[1] + 1
  55158016         y_idx = rating[2] + 1
  32006688         rv = rating[3]
         -
1793490040         W_row = W[:, x_idx]
1792374528         H_row = H[:, y_idx]
  32279345         pred = dot(W_row, H_row)
  32006688         diff = rv - pred
1825048125         W_grad = -2 * diff .* H_row
1824381216         H_grad = -2 * diff .* W_row
3618250719         W[:, x_idx] = W_row - step_size .* W_grad
3616755744         H[:, y_idx] = H_row - step_size .* H_grad
         -     end

rdeits · March 25, 2018, 8:21pm

Please provide a complete reproducible example. It’s going to be impossible to make any definite statements about performance without knowing exactly how you’ve set up your problem. For example, is this code inside a function? Are W and H globals? What are the element types? These are all critically important to understanding the performance of your code.

If you provide a block of code that can be run locally then it will be much easier to help figure out the performance issue.

jinliangwei · March 25, 2018, 8:28pm

Thanks for your quick reply!

Please find my full program here: Pastiebin.com 5ab805f723ae3

kristoffer.carlsson · March 25, 2018, 8:45pm

That’s still missing the input data file?

jinliangwei · March 25, 2018, 8:52pm

Please download a sample input from here: http://www.cs.cmu.edu/~jinlianw/data/ratings.csv (11MB)

BTW, both versions of Julia were built from the source tarball.

kristoffer.carlsson · March 25, 2018, 9:06pm

Ok, so

ratings = Array{Tuple{Integer, Integer, Real}}(0)

is not concretely typed and will perform quite badly. Change it to, e.g., Array{Tuple{Int, Int, Float64}}.
There were also some other places where you could use in-place dot assignment and views more effectively.

https://gist.github.com/KristofferC/91a4084b500c9f198b59af2486297b03 runs at 2 seconds per iteration for me while the original code ran at 55 seconds per iteration.
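
For readers who cannot open the gist, here is a minimal sketch of the two ideas mentioned above: views instead of copying slices, and in-place dot assignment into buffers that are allocated once. It is not a copy of the gist. The function name sgd_epoch!, the buffer names, and the tiny example sizes are made up for illustration, and it assumes Julia 1.x, where dot lives in the LinearAlgebra standard library.

```julia
using LinearAlgebra  # provides dot on Julia 1.x (assumption: not v0.6)

# Hypothetical helper: one pass over the ratings, updating W and H in place.
function sgd_epoch!(W::Matrix{Float64}, H::Matrix{Float64},
                    ratings::Vector{Tuple{Int, Int, Float64}},
                    step_size::Float64)
    # Gradient buffers are allocated once and reused on every iteration.
    W_grad = zeros(size(W, 1))
    H_grad = zeros(size(H, 1))
    for (x, y, rv) in ratings
        # Views alias the columns of W and H instead of copying them.
        W_col = view(W, :, x + 1)
        H_col = view(H, :, y + 1)
        diff = rv - dot(W_col, H_col)
        # Both gradients are computed before either column is modified,
        # which matches the order of operations in the original loop.
        W_grad .= -2 .* diff .* H_col
        H_grad .= -2 .* diff .* W_col
        W_col .-= step_size .* W_grad
        H_col .-= step_size .* H_grad
    end
    return W, H
end

# Tiny made-up input, only to show the call.
W = fill(0.1, 4, 3)
H = fill(0.2, 4, 2)
ratings = [(0, 1, 4.0), (2, 0, 3.0)]
sgd_epoch!(W, H, ratings, 0.01)
```

Putting the loop inside a function with concretely typed arguments matters as much as the views: it lets the compiler specialize the loop body on the element types.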

nalimilan · March 25, 2018, 9:06pm

Is there any reason you cannot replace Tuple{Integer, Integer, Real} with Tuple{Int, Int, Float64}? The latter is going to be more efficient. It could also be even faster to store the data as three vectors rather than as a vector of 3-tuples.
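
A small sketch of both suggestions, assuming Julia 1.x (isconcretetype was called isleaftype on v0.6). The alias names AbstractRating and ConcreteRating and the vector names rows, cols, and vals are made up for illustration.

```julia
# Tuple type with abstract fields, as in the original program.
AbstractRating = Tuple{Integer, Integer, Real}
# Tuple type with concrete fields, as suggested above.
ConcreteRating = Tuple{Int, Int, Float64}

isconcretetype(AbstractRating)  # false
isconcretetype(ConcreteRating)  # true
isbitstype(ConcreteRating)      # true: elements can be stored inline in the array

ratings = Vector{ConcreteRating}()
push!(ratings, (0, 1, 4.0))
push!(ratings, (2, 0, 3.0))

# Alternative layout: three parallel, concretely typed vectors.
rows = Int[r[1] for r in ratings]
cols = Int[r[2] for r in ratings]
vals = Float64[r[3] for r in ratings]

for i in eachindex(vals)
    x_idx = rows[i] + 1
    y_idx = cols[i] + 1
    rv = vals[i]
    # ... same update step as in the original loop ...
end
```

With the abstract tuple type, the compiler cannot know the type of rating[3] ahead of time, so every arithmetic operation that uses it has to be dispatched at run time.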

jinliangwei · March 25, 2018, 10:08pm

Great! Thanks! By changing the array type and using views I could reduce the runtime from 32 seconds to 0.63 seconds, and further down to 0.59 seconds by using in-place dot assignment (.=). However, @. causes an exception. I had never seen it before; I take it that it's just another way of writing in-place dot assignments?

jinliangwei · March 25, 2018, 10:09pm

Thanks! Yes, fixing the tuple type did help!

kristoffer.carlsson · March 26, 2018, 9:33am

Yes, @. is just a way of dotting all operators in the expression.
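
A short illustration, assuming Julia 1.x; the variable names are made up. Note that @. dots ordinary function calls and the assignment as well as the operators, so a call that should receive a whole array needs a $ prefix to stay undotted.

```julia
x = [1.0, 2.0, 3.0]
a = 2.0

y1 = zeros(3)
y2 = zeros(3)
y1 .= a .* x .+ 1.0   # explicit dots on every operator and on the assignment
@. y2 = a * x + 1.0   # @. inserts the same dots

# $ keeps sum(x) as a single undotted call, evaluated on the whole array.
y3 = zeros(3)
@. y3 = x / $sum(x)
```

Without the $, sum would be applied to each element of x separately, which is a different computation. The same applies to calls such as dot(W_row, H_row) when they appear inside an expression annotated with @. (the line that raised the exception is not shown in this thread, so this is only one possible cause).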
