Can this random matrix simulation be sped up?

I am a Julia novice and converted the following program from Python (as a simple practice exercise). It computes an N by N random matrix of quaternions, drawn from the Gaussian distribution, then computes its smallest singular value, and repeats this many times. (I have omitted the graphing of the resulting distribution for simplicity.) We use the fact that quaternions may be represented as 2 by 2 complex matrices.

I have a few basic questions, which elude me even after looking through the manual.

  1. The version below hard-codes the constants 100 and 1000. When these are replaced by n and samples in the data function, the code becomes 0.5 seconds slower (4.8 versus 5.3 seconds). Why is this, and is there a way to prevent the slowdown without hard-coding the constants, so that the values are easier to change?

  2. This is a literal translation of a Python script. Are there any easy ways to make it faster or more memory efficient? I feel I’m surely missing a few tricks. In particular, it produces 2.5 GiB of memory allocations, which seems sub-optimal.

  3. Can the compilation of the script be cached, so that I do not re-pay the compilation cost each time the script is run without being edited?

  4. Are there any Julia packages/modules whose source code is particularly well-written and instructive, which I could read to pick up idiomatic, high-performance Julia? I was thinking of the linear algebra library, but then I realized that it’s probably just calls to OpenBLAS, which is not written in Julia, so probably that would not help. Perhaps the ODE/PDE/SDE solving library?

I apologize for any misunderstandings in these questions – I am still an inexperienced programmer, and an even more inexperienced Julia user. Thank you!

using LinearAlgebra

n = 100
samples = 1000

function symplectic(N)
	G = Array{Complex{Float64}}(undef, 2N, 2N)

	for i in 0:N-1
		for j in 0:N-1
			a = randn()
			b = randn()
			c = randn()
			d = randn()
			@inbounds G[2*i + 1,2*j + 1] = a + b*im
			@inbounds G[2*i+1 + 1,2*j+1 + 1] = a - b*im
			@inbounds G[2*i+1 + 1,2*j + 1] = -c + d*im
			@inbounds G[2*i + 1,2*j+1+ 1] = c + d*im
		end
	end

	M = G * G'
	ev = eigvals(M)
	return ev[1]*N
end

function data()
	for i in 1:1000
		println(i," ",symplectic(100))
	end
end

@time data()

I wouldn’t recommend my own code here, since the ODE/PDE/SDE solving libraries are sometimes almost akin to OpenBLAS themselves, with their own definitions of broadcast variants, exponentiation, etc. There are some things to learn from them about how to write high-performance generic code, but there is also a ton of extra machinery that I wouldn’t recommend people add unless their goal is simply to make users as happy as possible.


Instead of generating single scalars with randn(), calling it many times inside the nested loops, you could create 100×100 matrices with randn(100, 100) before the loops.
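For example (a hypothetical sketch; `symplectic_bulk` is a made-up name, and I haven’t benchmarked it against the loop version):

```julia
using LinearAlgebra

# Draw the four real Gaussian blocks in bulk, then fill the 2N×2N complex
# matrix with strided assignments instead of a scalar randn() per entry.
function symplectic_bulk(N)
    A, B, C, D = randn(N, N), randn(N, N), randn(N, N), randn(N, N)
    G = Array{ComplexF64}(undef, 2N, 2N)
    G[1:2:end, 1:2:end] .= A .+ B .* im
    G[2:2:end, 2:2:end] .= A .- B .* im
    G[2:2:end, 1:2:end] .= .-C .+ D .* im
    G[1:2:end, 2:2:end] .= C .+ D .* im
    # The Hermitian wrapper guarantees the real, ascending-order eigenvalue path
    return eigvals(Hermitian(G * G'))[1] * N
end
```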

Translating code is a non-trivial task – without the original Python code it will be hard to find equivalences or mistakes in your Julia version.

First, ideally you can pass n and samples as parameters to data():

data(n,nsamples)

Avoid using global variables (or, at least, use const n = 100, etc.).

You could allocate G in data and pass it as a parameter to symplectic. You could also allocate other auxiliary arrays (M, for instance) and pass them as parameters as well.

Something like this (needs checking; it might also be possible to get the eigenvalues into a preallocated array, so that the function would not allocate anything):

using LinearAlgebra

function symplectic(N, G, R)
    for i in 0:N-1
        for j in 0:N-1
            @. R = randn()
            @inbounds G[2*i + 1, 2*j + 1] = R[1] + R[2]*im
            @inbounds G[2*i+1 + 1, 2*j+1 + 1] = R[1] - R[2]*im
            @inbounds G[2*i+1 + 1, 2*j + 1] = -R[3] + R[4]*im
            @inbounds G[2*i + 1, 2*j+1 + 1] = R[3] + R[4]*im
        end
    end
    ev = svdvals!(G)
    return ev[2*N]*N
end

function data(n, samples)
    G = Array{Complex{Float64}}(undef, 2n, 2n)
    R = zeros(4)
    for i in 1:samples
        symplectic(n, G, R)
    end
end

n = 100
samples = 1000
@time data(n,samples)

(Of course, you can loop from 1 to N and remove the +1 from the indices in G[...].)


This is equivalent to squaring svdvals(G) (albeit sorted in ascending rather than descending order).
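This can be checked directly (a small, hypothetical snippet):

```julia
using LinearAlgebra

G = randn(ComplexF64, 8, 8)
ev  = eigvals(Hermitian(G * G'))  # ascending order
sv2 = svdvals(G) .^ 2             # descending order
@assert ev ≈ reverse(sv2)
```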


Thank you everyone for the suggestions! They are helpful, and I am working through them carefully.

Here are my preliminary observations.

  1. Using const indeed removes the penalty from using the global variables (compared to hard-coding), which resolves one of my questions.

  2. Strangely, using svdvals(G), squaring, and taking the element at index end, instead of computing the eigenvalues of G * G' as above, almost doubles the running time of the program. I’m not sure what’s going on here, but I observe it both when I run @lmiq’s version and when I edit mine to include the SVD.

  3. @inbounds does nothing in this script, and should likely be removed from it for clarity.

  4. Initializing G in data() makes the script about 0.1 seconds slower, but it uses about 2 GiB instead of 2.5 GiB of memory. I don’t see much change from initializing the random numbers outside of the function symplectic().

Keep in mind that for proper benchmarking you should use BenchmarkTools. Or, at least, run the code twice: the first time to compile, the second to measure the time and allocations.

Thank you for pointing this out. Doing this, I get virtually identical results in time to my original script (4.8 seconds) but with less memory usage (1.9 vs 2.5 GiB), so your suggestion is indeed a win. Sorry for the mistake.

I still wonder if there is a way to cache the compilation between successive runs of the same script…

If two executions occur in the same Julia session, functions that did not change (and didn’t receive different types of parameters) won’t be compiled again.

Check out https://github.com/timholy/Revise.jl which should allow you to revise your code without restarting Julia. That should remove most of the compilation burden, since you will only have to recompile the code you’ve changed.


In my test the allocations are reduced to about 256 MiB:

 17.909110 seconds (616.27 k allocations: 256.346 MiB, 1.37% gc time)

This is the code:

using LinearAlgebra

function symplectic(N, G, R)
    for i in 0:N-1
        for j in 0:N-1
            @. R = randn()
            @inbounds G[2*i + 1, 2*j + 1] = R[1] + R[2]*im
            @inbounds G[2*i+1 + 1, 2*j+1 + 1] = R[1] - R[2]*im
            @inbounds G[2*i+1 + 1, 2*j + 1] = -R[3] + R[4]*im
            @inbounds G[2*i + 1, 2*j+1 + 1] = R[3] + R[4]*im
        end
    end
    ev = svdvals!(G)
    return ev[2*N]*N
end

function data(n, samples)
    G = Array{Complex{Float64}}(undef, 2n, 2n)
    R = zeros(4)
    for i in 1:samples
        println(symplectic(n, G, R))
    end
end

n = 100
samples = 1000
data(n, samples)
@time data(n, samples)

And the difference (down from 2 GiB) comes from using svdvals! (note the exclamation mark).

Edit: I am not sure if this is doing the same as your code, exactly. But appropriate use of adjoint! and mul! could reduce the allocations even further.
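For example, something along these lines might avoid allocating G * G' on every sample (a sketch, not verified against the timings above; `smallest_ev!` is a made-up name):

```julia
using LinearAlgebra

# Reuse a preallocated buffer M across samples: mul! writes G*G' into M
# in place, and eigvals! reuses M's storage for the factorization.
function smallest_ev!(M, G, N)
    mul!(M, G, G')               # M = G * G' with no new matrix allocation
    ev = eigvals!(Hermitian(M))  # destroys the contents of M
    return ev[1] * N
end

G = randn(ComplexF64, 20, 20)
M = similar(G)
smallest_ev!(M, G, 10)
```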


Interesting, thank you. I reproduce your drastic decrease in memory usage, to about 220 MiB. However, the time it takes to run the script about doubles when using svdvals!.

When I compute the singular values “by hand”, first forming G * G' and then calling eigvals! (note the exclamation mark), it runs in about half the time of the svdvals! version but uses about 5 times as much memory. That is still, very roughly, half the memory usage of the original, non-exclamation-mark version.

I am puzzled why the dedicated svd routine is twice as slow as the direct computation I’m using.


It’s more accurate. Computing G * G' squares the condition number.
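A small illustration with a synthetic matrix of known singular values (a hypothetical construction): the eigenvalue route works with σ², so the smallest singular value loses roughly a factor of κ(G) in relative accuracy compared to the direct SVD:

```julia
using LinearAlgebra

# Matrix with known singular values spanning five orders of magnitude
U = Matrix(qr(randn(50, 50)).Q)
V = Matrix(qr(randn(50, 50)).Q)
σ = 10.0 .^ range(0, -5, length=50)
G = U * Diagonal(σ) * V'

σ_svd = svdvals(G)[end]                      # direct: error ~ eps()*σ_max
σ_eig = sqrt(eigvals(Hermitian(G * G'))[1])  # via G*G': error amplified by κ(G)
```

Here σ_svd recovers 1e-5 to roughly ten digits, while σ_eig typically keeps only five or six.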


Hi,

The correct equivalence between SVD and EIG is:

M = G * G'
ev_eig = eigvals(M)
ev_svd = svdvals(M)

and hence, for i = 1, ..., N,

ev_svd[i] == abs(ev_eig[N+1-i])

(Normally you would take the SVD of G, not M, in which case you would take the square roots of ev_eig. Also, you might then have to omit some zero eigenvalues for non-square or rank-deficient G.)
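In code, the relation reads (a hypothetical check):

```julia
using LinearAlgebra

N = 6
G = randn(ComplexF64, N, N)
M = G * G'
ev_eig = eigvals(Hermitian(M))  # ascending, real
ev_svd = svdvals(M)             # descending, nonnegative
@assert all(isapprox(ev_svd[i], abs(ev_eig[N + 1 - i]); rtol=1e-8, atol=1e-8) for i in 1:N)
```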


Thanks, @stevengj, my apologies for my misconception.