In-place operations on matrices

When working with huge matrices, the limitations of the RAM is a real issue, and one would like to perform matrix operations in place if possible. Given a matrix A which occupies a sizable chunk of RAM, what is the best way to perform the following operations

A.*=2;
A.+= A';

so that it is done in-place. I noticed that in spite of using the . notation, the RAM usage increases by at least 100%, indicating that Julia is creating a copy of the matrix.

It’s not a good idea to benchmark in global scope


julia> f(x) = x.*=2
f (generic function with 1 method)

julia> a = randn(1000,1000);

julia> @time f(a);
  0.000885 seconds (4 allocations: 160 bytes)
2 Likes

The first operation already happens in place. As for

it cannot be done in-place in a linear way, because you would be overwriting the other half of the transposed matrix you still need.

That said, you can just write a simple loop to implement it, and wrap the result in Symmetric, since you won’t need the other half.

5 Likes
symmetrize!(a) = begin
    for c in 1:size(a, 2)
    	for r in c:size(a, 1)
    		a[r, c] += a[c, r]
    		a[c, r] =  a[r, c]
    	end
    end
    a
end

using BenchmarkTools
a = rand(2500,2500)     ;
@btime symmetrize!($a);
@btime $a .+ $a';

gives

julia> @btime symmetrize!($a);                                            
  11.826 ms (0 allocations: 0 bytes)                                      
                                                                          
julia> @btime $a .+ $a';                                                  
  50.207 ms (2 allocations: 47.68 MiB)                                    
                                                                          
julia> 
8 Likes

Thanks. I was actually monitoring through my Systems Monitor. Since the matrix was about 5.8 GB, the duplication was clearly visible.

I am wondering how you achieved such a high speed with symmetrize! without using Multithreading Threads.@threads .

Since this is memory bound, you could probably go even faster at the expense of clarity by blocking it to be nicer to cache.