Timing Clebsch-Gordan-coefficients calls in Julia (beating c++ again)

misha_mikhasenko · December 27, 2019, 7:54am

I have compared 4 different implementations of the Clebsch-Gordan coefficients

freshly coded in my ThreeBodyDecay.jl package
from GSL.jl call (c++ implementation)
from SymPy.jl (for the sake of completeness)
julia package WignerSymbols.jl

The timing for 1000 calls of random coefficient with j₁, j₂ < 16 follows.

ThreeBodyDecay.jl               187 μs
GSL                             743 μs
WignerSymbols.jl                764 μs
SymPy                           1.61 s

Details are here.

I was glad to use the GSL until I discovered that the evaluation of the CG coefficients in some scattering amplitudes takes 50% of the total time. A native Julia implementation (perhaps, case-specific) turned to be 4 times (!) faster.
It seems that there is a room for optimization in the WignerSymbols.jl. I am happy to help if authors can point out what the right strategy could be. Having a dedicated, tested package is obviously preferable to the customary implementation.
I have a similar implementation of the Wigner-d functions. AFAIK, they do not exist in julia yet (?).

I am wondering if creating a new package (e.g. HEP-specific) is a good idea.

stevengj · December 27, 2019, 2:42pm

It looks like most of your 1000 calls are trivial cases where ClGd returns 0.0; maybe it would be better to benchmark only the nontrivial cases?

It looks like you can get some additional speedup by eliminating your f_logfact(n) and just doing logfact[n+1] directly with an @inbounds directive (assuming that you’ve previously checked the arguments to ensure in-bounds accesses). (Even without the @inbounds, the n < 0 check in f_logfact is redundant with bounds-checking.) (To get some slight additional speedup, you could replace logfact[1+div(x,2)] with logfact2[1+x], where logfact2 is a double-length array in which each entry is repeated twice.) Also, isn’t res = 0 type-unstable?

misha_mikhasenko · December 27, 2019, 7:47pm

Hi Steven, these are great suggestions!
The amount of zeros is ≈ 17%, and 14% are equal to 1.0. I tried to exclude obviosly-trivial Clebsches in the rand_clebsch() function (+fixed a typo).

Just checked the other suggestions:

type stability + logfact2 gives 1-2 μs gain
@inbounds removes extra 2-4 μs for the same tests.

Not so much, but still cool. Thanks a lot for pointing out!

Did you notice if I am doing something wrong with SymPy calls? My colleagues running tensorflow used to calculate both the Clebsch-Gordan coefficientss and the Wigner d using the sympy.physics.wigner. It is hard to believe that the call time which they are working with is x1000. I have not benchmarked it thoroughly, tho.

misha_mikhasenko · May 9, 2020, 2:57pm

update: the package PartialWaveFunctions.jl has been registered!

timing is here

github.com

mmikhasenko/PartialWaveFunctions.jl/blob/master/test/timing_different_packages.jl#L108


      
          @btime for _ in 1:1_000
              @unpack two_j1, two_j2, two_j, two_m1, two_m2, two_m = rand_clebsch()
              ClGd_WS(two_j1,two_m1,two_j2,two_m2,two_j,two_m)
          end # 7.483 ms
          
          @btime for _ in 1:1_000
              @unpack two_j1, two_j2, two_j, two_m1, two_m2, two_m = rand_clebsch()
              ClGd_sympy(two_j1,two_m1,two_j2,two_m2,two_j,two_m)
          end # 1.61 s
          
          # Summary
          # -------------
          # function      | `PartialWaveFunctions.jl` | `GSL.jl` | `WignerSymbols.jl` | `SymPy.jl` |
          # -------------
          # random Clebsh-Gordan coef.(j<15) | 227.2 μs | 921.8 μs | 7.483 ms | 1.61 s |
          # -------------

xzackli · May 9, 2020, 7:40pm

WignerSymbols.jl could use a bit of performance tuning, but I would mention that the prime factorization method in that package is able to compute symbols to very high quantum numbers exactly, in the form of roots of Rational{BigInt}.

This is somewhat orthogonal to the methods you’ve benchmarked, but I thought it might be helpful to mention – I’ve recently implemented (yet another) method in WignerFamilies.jl. In my field (cosmology) the Wigner 3j-symbols are not usually used via random access, but instead in summations over all nontrivial symbols for some fixed quantum numbers. The Wigner 3j-symbols are subject to convenient recurrence relations which can provide all symbols in a family with relatively few floating-point operations per symbol.

For example, there are 2001 nontrivial 3j symbols varying j with fixed j_2 = 1000, j_3 = 1000, m_2=2, m_3=-2, m_1 = -m_2 -m_3.

f(j) = \begin{pmatrix} j & j_2 & j_3 \\ 0 & 2 & -2 \end{pmatrix}

using WignerFamilies, BenchmarkTools
@btime wigner3j_f(1000,1000,2,-2)
  22.369 μs (2 allocations: 15.84 KiB)
2001-element WignerSymbolVector{Float64,Int64,Array{Float64,1}} with indices WignerFamilies.URange(0,2000):
  0.022355091700494854
  4.468784506164827e-5
 -0.011177416041038234
  ⋮
 -0.001741253798177275
  0.0003163901836069519
  0.0025009714219007464

misha_mikhasenko · May 9, 2020, 8:02pm

Hi, thanks for the comment. Good to know about WignerFamilies.jl, I should add a link to the package in my README. Indeed, my clebsches would not work above j=50 without extending precalculated factorials (well, it is never needed in HEP).

oschulz · May 12, 2020, 7:50am

Hi @misha_mikhasenko, your package will be very welcome here, if you’re interested: https://github.com/JuliaHEP

misha_mikhasenko · May 12, 2020, 8:45am

Hi Oliver,
yes, makes sense, It should be moved there. Thanks

Topic		Replies	Views
Jl_call - function call latency Performance embedding	5	318	November 29, 2023
Julia vs C++ speed General Usage performance , c	21	4567	September 2, 2021
Does Julia have fast expectations over multivariate Gaussians? Performance question	6	431	October 17, 2022
Benchmark function that uses CUDA.jl Performance cuda , benchmark , precompilation	2	709	November 7, 2021
Simple For Loops and Speed New to Julia question , performance	8	3429	November 19, 2017

Timing Clebsch-Gordan-coefficients calls in Julia (beating c++ again)

Related topics