I am a Julia novice and converted the following program from Python (as a simple practice exercise). It computes an N by N random matrix of quaternions, drawn from the Gaussian distribution, then computes the smallest singular value, and repeats this many times. (I have omitted the plotting of the resulting distribution for simplicity.) We use the fact that quaternions may be represented as 2 by 2 complex matrices.
I have a few basic questions, which elude me even after looking through the manual.
The version below hard-codes the constants 100 and 1000. When these are replaced by n and samples inside the data function, the code becomes 0.5 seconds slower (4.8 versus 5.3 seconds). Why is this, and is there a way to prevent this slowdown without hard-coding the constants, so that the values are easier to change?
This is a literal translation of a Python script. Are there any easy ways to make it faster or more memory efficient? I feel I’m surely missing a few tricks. In particular, it produces 2.5 GiB of memory allocations, which seems sub-optimal.
Can the compilation of the script be cached, so that I do not re-pay the compilation cost each time the script is run without being edited?
Are there any Julia packages/modules whose source code is particularly well-written and instructive, which I could read to pick up idiomatic, high-performance Julia? I was thinking of the linear algebra library, but then I realized that it's probably just calls to OpenBLAS, which is not written in Julia, so that would likely not help. Perhaps the ODE/PDE/SDE solving library?
I apologize for any misunderstandings in these questions – I am still an inexperienced programmer, and an even more inexperienced Julia user. Thank you!
using LinearAlgebra

n = 100
samples = 1000

function symplectic(N)
    G = Array{Complex{Float64}}(undef, 2N, 2N)
    # Each quaternion a + b*i + c*j + d*k is stored as the 2x2 complex block
    # [a+b*im  c+d*im; -c+d*im  a-b*im].
    for i in 0:N-1
        for j in 0:N-1
            a = randn()
            b = randn()
            c = randn()
            d = randn()
            @inbounds G[2*i + 1, 2*j + 1] = a + b*im
            @inbounds G[2*i+1 + 1, 2*j+1 + 1] = a - b*im
            @inbounds G[2*i+1 + 1, 2*j + 1] = -c + d*im
            @inbounds G[2*i + 1, 2*j+1 + 1] = c + d*im
        end
    end
    M = G * G'
    ev = eigvals(M)    # M is Hermitian, so the eigenvalues come back in ascending order
    return ev[1]*N     # smallest eigenvalue, rescaled
end

function data()
    for i in 1:1000
        println(i, " ", symplectic(100))
    end
end

@time data()
I wouldn't recommend my own code here, since the ODE/PDE/SDE solving libraries are almost akin to OpenBLAS at times, with their own definitions of broadcast variants, exponentiation, etc. There are some things to learn from them about how to write high-performance generic code, but there is also a ton of extra machinery that I wouldn't recommend people imitate unless their goal is simply to make users as happy as possible.
Instead of generating single scalars by calling randn() many times inside the nested loops, you could create 100x100 matrices with randn(100, 100) before the loops.
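For instance, a minimal sketch of that idea (the names A, B, C, D are illustrative, not from the original code):

# draw all 4*N^2 Gaussians up front, in four bulk calls
A, B, C, D = randn(N, N), randn(N, N), randn(N, N), randn(N, N)
# then, inside the loops, read A[i+1, j+1], B[i+1, j+1], etc.
# in place of the four scalar randn() calls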
First, ideally you can pass n and samples as parameters to data():
data(n, samples)
Avoid using global variables (or, at least, declare them as const n = 100, etc.).
You could allocate G in data and pass it as a parameter to symplectic. You could also allocate the other auxiliary arrays (M, for instance) and pass them as parameters as well.
Something like this (needs checking, and it might still be possible to get the eigenvalues into a preallocated array, such that the function does not allocate anything):
using LinearAlgebra

function symplectic(N, G, R)
    for i in 0:N-1
        for j in 0:N-1
            @. R = randn()    # refill the length-4 buffer with four independent draws
            @inbounds G[2*i + 1, 2*j + 1] = R[1] + R[2]*im
            @inbounds G[2*i+1 + 1, 2*j+1 + 1] = R[1] - R[2]*im
            @inbounds G[2*i+1 + 1, 2*j + 1] = -R[3] + R[4]*im
            @inbounds G[2*i + 1, 2*j+1 + 1] = R[3] + R[4]*im
        end
    end
    ev = svdvals!(G)    # in-place: destroys G, but G is fully rewritten on the next call
    return ev[2*N]*N    # singular values are descending, so ev[2N] is the smallest
end

function data(n, samples)
    G = Array{Complex{Float64}}(undef, 2n, 2n)
    R = zeros(4)
    for i in 1:samples
        symplectic(n, G, R)
    end
end

n = 100
samples = 1000
@time data(n, samples)
(Of course, you can also loop from 1 to N and remove the +1s from the G[...] indexing.)
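For concreteness, that 1-based variant of the loop body would look something like this (untested sketch, same behavior):

for i in 1:N, j in 1:N
    @. R = randn()
    G[2i-1, 2j-1] = R[1] + R[2]*im
    G[2i,   2j  ] = R[1] - R[2]*im
    G[2i,   2j-1] = -R[3] + R[4]*im
    G[2i-1, 2j  ] = R[3] + R[4]*im
end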
Thank you everyone for the suggestions! They are helpful, and I am working through them carefully.
Here are my preliminary observations.
Using const indeed removes the penalty from using the global variables (compared to hard-coding), which resolves one of my questions.
Strangely, using svdvals(G), squaring, and taking the element at index end, instead of computing the eigenvalues of G * G' as above, almost doubles the running time of the program. I'm not sure what's going on here, but this can be observed both when I run @lmiq's version and when I edit mine to include the SVD.
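For reference, the two routes should agree up to floating-point error; a quick sanity check (the test size is arbitrary):

using LinearAlgebra
G = randn(ComplexF64, 10, 10)
eigvals(Hermitian(G * G'))[1]    # smallest eigenvalue of G*G' (ascending order)
svdvals(G)[end]^2                # smallest singular value of G, squared
# these two numbers should match up to floating-point error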
@inbounds does nothing in this script, and should likely be removed from it for clarity.
Initializing G in data() makes the script about 0.1 seconds slower, but it uses about 2 GiB of memory instead of 2.5 GiB. I don't see much change from initializing the random numbers outside of the symplectic() function.
Keep in mind that for more proper benchmarking you should use BenchmarkTools. Or, at least, run the code twice: the first time to compile, the second to measure the time and allocations.
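For example (a minimal sketch; @btime and the $-interpolation of global variables come from BenchmarkTools):

using BenchmarkTools
@btime data($n, $samples)    # runs the call multiple times and reports the minimum time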
Thank you for pointing this out. Doing this, I get virtually identical results in time to my original script (4.8 seconds) but with less memory usage (1.9 vs 2.5 GiB), so your suggestion is indeed a win. Sorry for the mistake.
I still wonder if there is a way to cache the compilation between successive runs of the same script…
If two executions occur in the same Julia session, functions that did not change (and did not receive different types of parameters) won't be compiled again.
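For instance, at the REPL (script.jl is just a placeholder name):

julia> include("script.jl")   # first run pays the compilation cost
julia> include("script.jl")   # second run in the same session reuses the compiled code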
17.909110 seconds (616.27 k allocations: 256.346 MiB, 1.37% gc time)
This is the code:
using LinearAlgebra

function symplectic(N, G, R)
    for i in 0:N-1
        for j in 0:N-1
            @. R = randn()
            @inbounds G[2*i + 1, 2*j + 1] = R[1] + R[2]*im
            @inbounds G[2*i+1 + 1, 2*j+1 + 1] = R[1] - R[2]*im
            @inbounds G[2*i+1 + 1, 2*j + 1] = -R[3] + R[4]*im
            @inbounds G[2*i + 1, 2*j+1 + 1] = R[3] + R[4]*im
        end
    end
    ev = svdvals!(G)
    return ev[2*N]*N
end

function data(n, samples)
    G = Array{Complex{Float64}}(undef, 2n, 2n)
    R = zeros(4)
    for i in 1:samples
        println(symplectic(n, G, R))
    end
end

n = 100
samples = 1000
data(n, samples)          # first call compiles
@time data(n, samples)    # second call measures run time and allocations only
And the difference (relative to the 2 GiB version) is using svdvals! (note the exclamation mark).
Edit: I am not sure this is doing exactly the same as your code. But appropriate use of adjoint! and mul! could reduce the allocations even further.
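Something like this, for instance (a rough, untested sketch, assuming M is a preallocated 2N x 2N complex matrix passed in alongside G; with the lazy adjoint G', adjoint! turns out not to be needed here):

# inside symplectic, replacing the svdvals! lines:
mul!(M, G, G')                 # in-place product, no new matrix allocated
ev = eigvals!(Hermitian(M))    # eigenvalues computed reusing M's storage
return ev[1]*N                 # smallest eigenvalue, as in the original version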
Interesting, thank you. I reproduce your drastic decrease in memory usage, to about 220 MiB. However, the time it takes to run the script about doubles when using svdvals!.
When I compute the singular values "by hand" by first forming G * G' and then calling eigvals! (note the exclamation mark), it runs in about half the time of the svdvals! version but uses about 5 times as much memory. This is still, very roughly, about half the memory usage of the original, non-exclamation-mark version.
I am puzzled why the dedicated SVD routine is twice as slow as the direct computation I'm using.
(Normally you would take the SVD of G, not of M, in which case the singular values are the square roots of the eigenvalues of M. Also, you might then have to omit some zero eigenvalues for non-square or rank-deficient G.)
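That relation is easy to check directly (a small sanity-check sketch, with an arbitrary test size):

using LinearAlgebra
G = randn(ComplexF64, 6, 6)
sv = svdvals(G)                                  # descending order
sq = sqrt.(reverse(eigvals(Hermitian(G * G'))))  # ascending eigenvalues, reversed and rooted
sv ≈ sq                                          # true up to floating-point error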