Performance & Profiling Tips for Beginner Code

Dear all, many thanks for all your hints; this is exactly what I was hoping for. I will try to respond to each of them:

1a)

This is simply a remnant of the formula I derived by hand/Mathematica, and it stems from some spherical harmonics. When writing such code, I try to stay close to the original form so that I do not introduce typos and so that debugging is easier. As expected, changing it to 16 \pi^2 did not make any relevant difference.

1b)

I tried this by changing my code to

Z .+= tril(Z,-1)';
X .+= transpose(tril(X,-1));
Y .+= transpose(tril(Y,-1));

however this changed the benchmark result from

julia> @btime include("PerformanceTest.jl")
  1.010 s (12749484 allocations: 1.34 GiB)

to

@btime include("PerformanceTest.jl")
  1.078 s (12789251 allocations: 1.33 GiB)

I don’t really understand why this increases the number of allocations. (The allocated memory is, however, slightly lower.)
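For comparison, here is a variant I could try that avoids the temporary matrix: as far as I understand, tril(Z,-1) builds a new matrix before the broadcast runs, which may be one source of allocations. This is only a minimal sketch. The helper name add_lower_transposed! and the small test matrix are made up for illustration and are not part of my script, and it assumes real, square matrices.

```julia
using LinearAlgebra

# Hypothetical helper (not in PerformanceTest.jl): adds the strict lower
# triangle, transposed, onto the upper triangle without any temporaries.
# Assumes a real, square matrix.
function add_lower_transposed!(A::AbstractMatrix)
    n = LinearAlgebra.checksquare(A)
    for j in 2:n, i in 1:j-1
        # Only upper-triangle entries are written, only lower ones are read.
        A[i, j] += A[j, i]
    end
    return A
end

# Small made-up example to compare against the broadcast version.
A = [1.0 0.0 0.0;
     2.0 3.0 0.0;
     4.0 5.0 6.0]
B = copy(A)

B .+= transpose(tril(B, -1))   # allocating version from above
add_lower_transposed!(A)       # loop version
```

Both versions should leave the same matrix, so benchmarking them against each other on the real X, Y, Z would show whether the temporaries matter here.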

1c)

Unfortunately, due to a bug, I run into a memory leak when I activate the Julia extension in VSCode. I will keep this in mind for when the bug is fixed.

Is there a way to see and analyze the number of calls to these functions within one execution?
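The only crude approach I can think of myself is a manual counter, roughly as sketched below. The function kernel and the counter CALLS are hypothetical stand-ins, not names from my script, and the counter itself adds a small overhead, so it would have to be removed again before benchmarking.

```julia
# Global call counter; a const Ref keeps the binding type-stable.
const CALLS = Ref(0)

# Hypothetical stand-in for one of the inner functions of the script.
function kernel(x)
    CALLS[] += 1      # count every invocation
    return x^2
end

CALLS[] = 0                            # reset before the run of interest
total = sum(kernel(i) for i in 1:10)   # the "execution" to analyze
println("kernel was called ", CALLS[], " times")
```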

Wow, that is quite the improvement! Thank you for the hint about using StaticArrays. I am pretty sure that I read about this once in the performance tips or somewhere else, but I guess it did not really sink in until I applied it to my own code. As you showed,

@btime X,Y,Z = mat_xyz(var_params, d_arr, abcvals; mu_g=1.0, prefac=-10.0, theta = 0.0);
  92.730 ms (19 allocations: 14.65 MiB)

which is a big improvement. I will certainly be using StaticArrays in the future.

However, I am curious: when I run the entire script, why does

@time X,Y,Z = mat_xyz(var_params, d_arr, abcvals; mu_g=1.0, prefac=-10.0, theta = 0.0);
@time evals,evecs = eigen(X+Y,Z);

yield

  0.553434 seconds (249.05 k allocations: 30.053 MiB, 2.41% gc time, 71.89% compilation time)
  0.281798 seconds (20 allocations: 14.695 MiB)

even after several executions? Why is it recompiling the function every time?
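One thing I could check is whether re-running the script redefines the function each time, since, as far as I know, redefining a method discards its compiled code. The sketch below is only meant to test that guess at top level. The function work is a hypothetical stand-in for mat_xyz, and the timings will differ from machine to machine.

```julia
# Hypothetical stand-in for mat_xyz.
work(x) = sum(abs2, x)

x = rand(1000)

t_first  = @elapsed work(x)   # includes compilation
t_second = @elapsed work(x)   # already compiled

# Redefining the method, as happens when the script is included again.
work(x) = sum(abs2, x)

t_redefined = @elapsed work(x)   # expected to include compilation again

println("first call:         ", t_first, " s")
println("second call:        ", t_second, " s")
println("after redefinition: ", t_redefined, " s")
```

If the last timing is close to the first rather than the second, then keeping the function definitions in a separate file or module that is loaded only once, and re-running only the calls, might avoid the repeated compilation.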