Efficient calculation of many complex Lorentzians on a large array

Dear Julia community,

As obvious from the forum I’m posting in, I am indeed new to Julia but already love it. By education, I am an engineer with a PhD focusing on laser spectroscopy and not a programmer, but I do have solid experience with other languages, such as Matlab, Python, Java and a little C from back in the days where Arduinos weren’t a thing.
As Julia is said to be nearly as performant as FORTRAN with the advantage of… well… not being FORTRAN, I wanted to give it a try :slight_smile:

A little background on the problem at hand:
A while ago, I programmed an open-source code for fitting a special type of spectra (Coherent Anti-Stokes Raman=CARS) in MATLAB. The code works alright and is nearly as fast as a comparison code written in FORTRAN, mostly because the implementation of the comparison code is not perfect and the MATLAB code is close to what is achievable. Now I want to try to implement it in Julia.
The core of the program involves to calculate many (say >400) complex Lorentzians of the form:

lagrida_latex_editor(3)

In this equation, χ is what I want to calculate, the complex susceptibility. aᵢ, ωᵣᵢ and γᵢ are the amplitudes, position and linewidth of the Lorentzian for transition i. j represents the imaginary number and all individual transitions are summed up to a spectrum. ω is the array representing the spectral domain where the calculations are performed on.

For real life spectra, there are easily imax>500 transitions to be calculated on an array ω with length ~300k entries.
I ported the most computationally expensive part of the code to Julia directly, only to be at first a little bit disappointed that I didn’t manage to be much faster than Matlab. In Matlab, the calculation takes about 2 s on my laptop, and the direct port to Julia was not significantly faster. Obviously, I at first searched the problem somewhere between chair and keyboard and learned about how to do calculations in-place to avoid excessive allocations, dot syntax and some others. This brought me down to 1.2 seconds, so better, but not quite what I had hoped for.
Quite interestingly, doing the equivalent calculation not with complex numbers (ComplexF32) but with the real and imaginary component separately using twice as many Float32 arrays, the calculation is done in 0.25 s, see benchmark code below, where 436 transitions are calculated on a 322223-element vector of either Float32 or ComplexF32. This is 8 times faster than MATLAB, nice. However, I would like to understand why the separate calculation using two Float arrays is so much faster than a single Complex array and also, if there are any possibilities to even improve the performance. As I am a total noob with Julia, I simply have to assume that I’m still doing it wrong :slight_smile: and I’d appreciate any comment or help. I played around a little with some macros such as @inbounds, @simd and also @threads (without making chi thread-safe, just for testing) for the for-loop iterating over eachindex(i), which all didn’t really have an impact on runtime. Maybe I was using them wrong?

Thank you very much!

Max

using BenchmarkTools

function complexLorentz!(χIC::Vector{Float32}, χRC::Vector{Float32}, aᵢ::Float32, ωᵣᵢ::Float32, ω::Vector{Float32}, γᵢᵢ::Float32) 
    χIC .= @. aᵢ * γᵢᵢ / (γᵢᵢ^2 + (ωᵣᵢ - ω)^2)
    χRC .= @. aᵢ * (ωᵣᵢ - ω) / (γᵢᵢ^2 + (ωᵣᵢ - ω)^2)
    nothing
end

function complexLorentz!(χIC::Vector{ComplexF32}, χRC::Vector{ComplexF32}, aᵢ::ComplexF32, ωᵣᵢ::ComplexF32, ω::Vector{ComplexF32}, γᵢᵢ::ComplexF32) 
    χIC .= @. aᵢ / (ωᵣᵢ - ω - im*γᵢᵢ)
    nothing
end

function loopLorentz!(χR::Vector{Float32}, χI::Vector{Float32}, χRC::Vector{Float32}, χIC::Vector{Float32}, a::Vector{Float32}, ωᵣ::Vector{Float32}, ω::Vector{Float32}, γᵢ::Vector{Float32})
    for i = eachindex(γᵢ)
        complexLorentz!(χIC, χRC, a[i], ωᵣ[i], ω, γᵢ[i])
        χI .= χI .+ χIC
        χR .= χR .+ χRC
    end
    χ = (χR + im * χI)
    return χ
end

function loopLorentz!(χR::Vector{ComplexF32}, χI::Vector{ComplexF32}, χRC::Vector{ComplexF32}, χIC::Vector{ComplexF32}, a::Vector{ComplexF32}, ωᵣ::Vector{ComplexF32}, ω::Vector{ComplexF32}, γᵢ::Vector{ComplexF32})
    for i = eachindex(γᵢ)
        complexLorentz!(χIC, χRC, a[i], ωᵣ[i], ω, γᵢ[i])
        χI .= χI .+ χIC
    end
    return χI
end

function provideData(ctype)
    # wavenumber range
    ωmin = 2150
    ωmax = 2440
    ωres = 9e-4
    
    # collect as a vector of correct type
    ω = convert(Vector{ctype}, collect(ωmin:ωres:ωmax))

    # preallocate χ with same length as ω
    χR = Vector{ctype}(undef, length(ω))
    χI = Vector{ctype}(undef, length(ω))
    
    # preallocate the current calculated transition
    χRC = Vector{ctype}(undef, length(ω))
    χIC = Vector{ctype}(undef, length(ω))

    # these are the full width half max linewidth, amplitude and resonance frequency
    γ = convert(Vector{ctype}, vec([0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0245129460254733 0.0243796471294858 0.0242461478523295 0.0241101316652921 0.0239696405278575 0.0238230532252038 0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0245129460254733 0.0243796471294858 0.0242461478523295 0.0280272457763299 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0252847154374324 0.0321460375993402 0.0303318154218017 0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0245129460254733 0.0243796471294858 0.0242461478523295 0.0241101316652921 0.0239696405278575 0.0238230532252038 0.0236690618786425 0.0235066473650465 0.0233350542536934 0.0231537657945814 0.0229624793443047 0.0227610825619214 0.0225496305745788 0.0223283242999883 0.0220974899959438 0.0218575601365424 0.0216090556014342 0.0213525692342441 0.0210887507007100 0.0208182926894235 0.0205419183425625 0.0202603699680496 0.0199743988779997 0.0196847564326183 0.0193921860824570 0.0190974165380251 0.0188011557876509 0.0185040861733736 0.0182068601393609 0.0179100969908608 0.0176143801151208 0.0173202552061921 0.0170282286884957 0.0167387672084915 0.0164522969767714 0.0161692043626936 0.0158898358474769 0.0156144996181651 0.0153434657792636 0.0150769689431232 0.0148152082583232 0.0145583511637073 0.0143065326154134 0.0140598604671369 0.0138184129339888 0.0135822465906845 0.0133513904444129 0.0131258585191133 0.0129056376176759 0.0126907083092061 0.0124810209379286 0.0122765324078116 0.0120771600155698 0.0118828484834103 0.0116934812350107 0.0115090062904270 0.0113292644023197 0.0111542313279882 0.0109836806425077 0.0108176598286612 0.0104983624374980 0.0321460375993402 0.0303318154218017 0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0245129460254733 0.0243796471294858 0.0242461478523295 0.0241101316652921 0.0239696405278575 0.0238230532252038 0.0236690618786425 0.0235066473650465 0.0233350542536934 0.0231537657945814 0.0229624793443047 0.0227610825619214 0.0225496305745788 0.0223283242999883 0.0220974899959438 0.0218575601365424 0.0216090556014342 0.0213525692342441 0.0210887507007100 0.0208182926894235 0.0205419183425625 0.0202603699680496 0.0199743988779997 0.0196847564326183 0.0193921860824570 0.0190974165380251 0.0188011557876509 0.0185040861733736 0.0182068601393609 0.0179100969908608 0.0176143801151208 0.0173202552061921 0.0170282286884957 0.0167387672084915 0.0164522969767714 0.0161692043626936 0.0158898358474769 0.0156144996181651 0.0153434657792636 0.0150769689431232 0.0148152082583232 0.0145583511637073 0.0143065326154134 0.0140598604671369 0.0138184129339888 0.0135822465906845 0.0133513904444129 0.0131258585191133 0.0129056376176759 0.0126907083092061 0.0124810209379286 0.0122765324078116 0.0120771600155698 0.0118828484834103 0.0115090062904270 0.0111542313279882 0.0321460375993402 0.0303318154218017 0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0245129460254733 0.0243796471294858 0.0242461478523295 0.0241101316652921 0.0239696405278575 0.0238230532252038 0.0236690618786425 0.0235066473650465 0.0233350542536934 0.0231537657945814 0.0229624793443047 0.0227610825619214 0.0225496305745788 0.0223283242999883 0.0220974899959438 0.0218575601365424 0.0216090556014342 0.0213525692342441 0.0210887507007100 0.0208182926894235 0.0205419183425625 0.0202603699680496 0.0199743988779997 0.0196847564326183 0.0193921860824570 0.0190974165380251 0.0188011557876509 0.0185040861733736 0.0182068601393609 0.0179100969908608 0.0176143801151208 0.0173202552061921 0.0170282286884957 0.0167387672084915 0.0164522969767714 0.0161692043626936 0.0158898358474769 0.0156144996181651 0.0153434657792636 0.0150769689431232 0.0148152082583232 0.0145583511637073 0.0143065326154134 0.0140598604671369 0.0138184129339888 0.0135822465906845 0.0133513904444129 0.0131258585191133 0.0126907083092061 0.0321460375993402 0.0303318154218017 0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0245129460254733 0.0243796471294858 0.0242461478523295 0.0241101316652921 0.0239696405278575 0.0238230532252038 0.0236690618786425 0.0235066473650465 0.0233350542536934 0.0231537657945814 0.0229624793443047 0.0227610825619214 0.0225496305745788 0.0223283242999883 0.0220974899959438 0.0218575601365424 0.0216090556014342 0.0213525692342441 0.0210887507007100 0.0208182926894235 0.0205419183425625 0.0202603699680496 0.0199743988779997 0.0196847564326183 0.0193921860824570 0.0190974165380251 0.0188011557876509 0.0185040861733736 0.0182068601393609 0.0179100969908608 0.0176143801151208 0.0173202552061921 0.0170282286884957 0.0167387672084915 0.0164522969767714 0.0161692043626936 0.0158898358474769 0.0156144996181651 0.0150769689431232 0.0145583511637073 0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0245129460254733 0.0243796471294858 0.0242461478523295 0.0241101316652921 0.0239696405278575 0.0238230532252038 0.0236690618786425 0.0235066473650465 0.0233350542536934 0.0231537657945814 0.0229624793443047 0.0227610825619214 0.0225496305745788 0.0223283242999883 0.0220974899959438 0.0218575601365424 0.0216090556014342 0.0213525692342441 0.0210887507007100 0.0208182926894235 0.0205419183425625 0.0202603699680496 0.0199743988779997 0.0196847564326183 0.0193921860824570 0.0190974165380251 0.0185040861733736 0.0179100969908608 0.0173202552061921 0.0262820762709812 0.0257174839690073 0.0252847154374324 0.0249405597454478 0.0246487386338929 0.0243796471294858 0.0241101316652921 0.0238230532252038 0.0235066473650465 0.0231537657945814 0.0227610825619214 0.0223283242999883 0.0218575601365424 0.0303318154218017 0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0303318154218017 0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0293909677287820 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0245129460254733 0.0243796471294858 0.0242461478523295 0.0241101316652921 0.0239696405278575 0.0252847154374324 0.0249405597454478 0.0246487386338929 0.0243796471294858 0.0241101316652921 0.0238230532252038 0.0235066473650465]))
    a = convert(Vector{ctype}, vec([2.76674832091363e-21 2.36681408668054e-21 6.53275898449572e-21 4.11584029326841e-21 9.84110304653029e-21 5.67916815126275e-21 1.27767479850669e-20 7.04439497476061e-21 1.52872248611681e-20 8.18285512093513e-21 1.73191058586550e-20 9.07182020022599e-21 1.88369946582998e-20 9.69916405772032e-21 1.98282631692069e-20 1.00644177956632e-20 2.03034172664904e-20 1.01783080307700e-20 2.02941571252810e-20 1.00613748249970e-20 1.98499429664072e-20 1.03455157312988e-21 8.84170488841555e-22 2.43917749274521e-21 1.53630604362417e-21 3.67278578249855e-21 2.11938892315959e-21 4.76816838751841e-21 2.62908917485547e-21 5.70613124909710e-21 3.05483644764460e-21 6.46688707602503e-21 3.38817924571018e-21 7.03720451926530e-21 3.62454297403169e-21 7.41220201867026e-21 3.76363649404603e-21 7.59551245365868e-21 3.80929561134559e-21 6.97235085385984e-22 1.04936942907519e-21 6.05501253676967e-22 1.36225794410182e-21 7.51176395955906e-22 1.63052954594206e-21 8.73059718024617e-22 1.84857576080926e-21 9.68746039027827e-22 2.01261148539039e-21 1.03691495754788e-21 4.79452641904450e-22 1.93849262282522e-20 3.03954119276334e-20 9.94095944635917e-20 6.88352328305803e-20 1.74837908643640e-19 1.05278565741194e-19 2.44529100394880e-19 1.38243952836379e-19 3.06198520232869e-19 1.66728923501617e-19 3.58096229205161e-19 1.89989354311201e-19 3.99005801465568e-19 2.07556855856749e-19 4.28273962625286e-19 2.19246229367551e-19 4.45808010755341e-19 2.25145185136480e-19 4.52037832019126e-19 2.25587234803657e-19 4.47847076283963e-19 2.21110942787309e-19 4.34481147857702e-19 2.12409818996800e-19 4.13441166346567e-19 2.00277580533250e-19 3.86373383584460e-19 1.85553416092098e-19 3.54962880522279e-19 1.69071344684856e-19 3.20838920923179e-19 1.51616888028985e-19 2.85497366120781e-19 1.33893211258101e-19 2.50243339620942e-19 1.16497769812071e-19 2.16155141862194e-19 9.99094570876052e-20 1.84068479805211e-19 8.44853751964487e-20 1.54578553687788e-19 7.04657118072259e-20 1.28056520260424e-19 5.79848240376673e-20 1.04676341988047e-19 4.70864965714389e-20 8.44479884068806e-20 3.77414202939185e-20 6.72532893114140e-20 2.98651778139078e-20 5.28813365551579e-20 2.33353644163730e-20 4.10610740795797e-20 1.80068601364127e-20 3.14894969325502e-20 1.37246517955384e-20 2.38546130188087e-20 1.03339461328294e-20 1.78529458606852e-20 7.68759325152743e-21 1.32018392928172e-20 5.65104300659859e-21 9.64715621402872e-21 4.10518594383206e-21 6.96715240435166e-21 2.94749305688385e-21 4.97337465217130e-21 2.09187777392036e-21 3.50940821604817e-21 1.46767395056720e-21 2.44821157662607e-21 1.01806963557412e-21 1.68865376147597e-21 6.98270537464213e-22 1.15173377924550e-21 4.73599307243112e-22 7.76829641594465e-22 5.18207056835032e-22 7.21905982791785e-21 1.13197099759045e-20 3.70234703651530e-20 2.56384604246416e-20 6.51268199376577e-20 3.92209511281055e-20 9.11115391418990e-20 5.15186445099533e-20 1.14132046547726e-19 6.21602106704810e-20 1.33539294141205e-19 7.08691421442952e-20 1.48880042635506e-19 7.74701699210476e-20 1.59907942493643e-19 8.18921405619780e-20 1.66582921609136e-19 8.41644283474434e-20 1.69057638418321e-19 8.44071875875874e-20 1.67652619282503e-19 8.28165860761199e-20 1.62822834665510e-19 7.96465650344742e-20 1.55119025761610e-19 7.51888393883877e-20 1.45147227671257e-19 6.97528255047926e-20 1.33529709771784e-19 6.36469939622082e-20 1.20870042177558e-19 5.71628343520475e-20 1.07724291413941e-19 5.05622366686062e-20 9.45795498484384e-20 4.40686899586604e-20 8.18402093458708e-20 3.78623190392222e-20 6.98216821684750e-20 3.20784599293956e-20 5.87507104748237e-20 2.68092372124306e-20 4.87710225231625e-20 2.21074615337467e-20 3.99528955746415e-20 1.79921103821965e-20 3.23051572413464e-20 1.44546781558577e-20 2.57882674471508e-20 1.14657640151417e-20 2.03273321513190e-20 8.98138716994584e-21 1.58241651844708e-20 6.94865839538650e-21 1.21677969436763e-20 5.31057582586566e-21 9.24309646172808e-21 4.00983876466087e-21 6.93740392051379e-21 2.99167650054626e-21 5.14525105277740e-21 2.20576572182271e-21 3.77137278575846e-21 1.60735977251077e-21 2.73228753792676e-21 1.15777833389094e-21 1.95675312981648e-21 8.24411580082986e-22 1.38540022734482e-21 5.80383687346392e-22 9.69816225642525e-22 6.71309430573320e-22 4.59534633034371e-22 2.05804787763813e-21 3.22716334058416e-21 1.05556359108615e-20 7.31024049555216e-21 1.85713235075614e-20 1.11854942876921e-20 2.59881482154678e-20 1.46974533173340e-20 3.25665530519823e-20 1.77408332065914e-20 3.81222721177645e-20 2.02369874578806e-20 4.25260434796781e-20 2.21357190106326e-20 4.57067786957055e-20 2.34161279092207e-20 4.76514849927195e-20 2.40856558660279e-20 4.84015712843709e-20 2.41774051037220e-20 4.80459768419389e-20 2.37460440973005e-20 4.67118808919928e-20 2.28627269035858e-20 4.45539097139137e-20 2.16095014524794e-20 4.17427991451321e-20 2.00736768212979e-20 3.84544119392200e-20 1.83425688861749e-20 3.48598710732430e-20 1.64989591846748e-20 3.11173769738459e-20 1.46174967497107e-20 2.73660564442128e-20 1.27621609413060e-20 2.37219706653436e-20 1.09847975004629e-20 2.02762118598757e-20 9.32465015785786e-21 1.70948597116996e-20 7.80874270747637e-21 1.42204588837546e-20 6.45292441014597e-21 1.16746203002038e-20 5.26337447521645e-21 9.46133747655740e-21 4.23836605403768e-21 7.57063682350397e-21 3.37011176285745e-21 5.98223665163159e-21 2.64654550516955e-21 4.66896194212567e-21 2.05293358581948e-21 3.59973998620638e-21 1.57324680088772e-21 2.74207678147889e-21 1.19126047214399e-21 2.06397914339032e-21 8.91378703259530e-22 1.53533901490905e-21 6.59201177948561e-22 1.12883301888243e-21 4.81865409089656e-22 8.20412422148745e-22 5.89468257238218e-22 5.32331929727700e-22 8.34754639637943e-22 2.73051143335387e-21 1.89114059599092e-21 4.80483430397400e-21 2.89430970347301e-21 6.72558813969226e-21 3.80428452037381e-21 8.43120242068540e-21 4.59398329960728e-21 9.87421796057769e-21 5.24311260892453e-21 1.10211915466444e-20 5.73863117046031e-21 1.18535179813762e-20 6.07497592330801e-21 1.23674343254136e-20 6.25383198323960e-21 1.25731071880259e-20 6.28346409244939e-21 1.24929108247570e-20 6.17768721041799e-21 1.21590853100849e-20 5.95458307754264e-21 1.16110048299948e-20 5.63508232479668e-21 1.08922975619394e-20 5.24153087773194e-21 1.00480449372980e-20 4.79634716412107e-21 9.12225419806589e-21 4.32085576978322e-21 8.15575025740541e-21 3.83435701699051e-21 7.18457783455074e-21 3.35346391586718e-21 6.23894936874184e-21 2.89171122179888e-21 5.34272408476701e-21 2.45941845126723e-21 4.51336304004316e-21 2.06377135605520e-21 3.76227650724702e-21 1.70907531830997e-21 3.09546434054438e-21 1.39712935286237e-21 2.51434620198006e-21 1.12767015632430e-21 2.01668470537580e-21 8.98840733400673e-22 1.59751800323458e-21 7.07646143732970e-22 1.25003620743344e-21 5.50368421949383e-22 9.66355570274970e-22 7.38163277765898e-22 5.57222347304988e-22 6.75907536467190e-22 4.68165894164593e-22 1.18959169710972e-21 7.16669550604885e-22 1.66559555809908e-21 9.42298498730732e-22 2.08877730468673e-21 1.13838730178189e-21 2.44744104858555e-21 1.29992563758622e-21 2.73330793786935e-21 1.42367188835537e-21 2.94172005476807e-21 1.50821070770956e-21 3.07164733664742e-21 1.55390022585078e-21 3.12547207508911e-21 1.56271266278191e-21 3.10857627563517e-21 1.53798679058960e-21 3.02877705564988e-21 1.48411782595347e-21 2.89566530169573e-21 1.40621351972408e-21 2.71990580464007e-21 1.30974513845942e-21 2.51255405220490e-21 1.20021920881709e-21 2.28443693021583e-21 1.08289097502781e-21 2.04563322547450e-21 9.62534287865238e-22 1.80507666856160e-21 8.43275922720905e-22 1.57029090810556e-21 7.28495873077689e-22 1.34725365780590e-21 6.20789597106751e-22 1.14037734237097e-21 5.21983926007087e-22 9.52586508040621e-22 7.85468254729351e-22 6.39470817022797e-22 5.14126717319828e-22 5.07092409096337e-22 5.94449285680060e-22 6.64266658160163e-22 7.15402222629576e-22 7.47582433944502e-22 7.61352719054928e-22 7.57980643219352e-22 7.39322588985471e-22 7.07666930277610e-22 6.65567449195783e-22 6.15680086829353e-22 5.60614304509138e-22 5.02807695507262e-22 1.45760594571328e-21 4.84902927491088e-21 3.30450141697845e-21 8.26034807186170e-21 4.90693741795820e-21 1.12675854233457e-20 6.30789640757925e-21 1.38520413783390e-20 7.48514106161674e-21 1.59654214273236e-20 8.41679938435757e-21 1.75723263730044e-20 5.36810834818746e-22 1.79584908611688e-21 1.22626325170439e-21 3.06861800139697e-21 1.82418078549204e-21 4.19110965790634e-21 2.34739811201913e-21 5.15704561793617e-21 2.78780905100862e-21 5.94861768526215e-21 3.13729871846773e-21 6.55261277775953e-21 3.39112207985656e-21 6.96359930992422e-21 3.54865024370536e-21 7.18446008865135e-21 5.09038124471350e-22 8.72577344402127e-22 5.19100607968831e-22 1.19332832249266e-21 6.68692410880647e-22 1.46970180266519e-21 7.94822305208254e-22 1.69667312014571e-21 8.95186155201375e-22 1.87046782740858e-21 9.68416381177352e-22 1.98948859212561e-21 1.01430092823449e-21 2.05447752818317e-21 1.03377487249494e-21 2.06837043287504e-21 1.02879814042334e-21 2.03597448858440e-21 1.00216384140647e-21 4.84428664807833e-22 5.15699171070675e-22 5.33039025925385e-22 5.37181661811124e-22 5.29341914997040e-22 5.11107620403453e-22 4.84316914810449e-22]))
    ωᵣ = convert(Vector{ctype}, vec([2317.97429134631 2309.98175797216 2301.95518481219 2293.89484862162 2285.80102576824 2277.67399223070 2269.51402359806 2261.32139507066 2253.09638146230 2244.83925720364 2236.55029634704 2228.22977257250 2219.87795919512 2211.49512917364 2203.08155512043 2194.63750931271 2186.16326370509 2177.65908994335 2169.12525937964 2160.56204308879 2151.96971188613 2289.41817463711 2281.49498301665 2273.53762731776 2265.54638502732 2257.52153323568 2249.46334863240 2241.37210750320 2233.24808572822 2225.09155878161 2216.90280173240 2208.68208924662 2200.42969559079 2192.14589463667 2183.83095986731 2175.48516438436 2167.10878091675 2158.70208183062 2150.26533914053 2245.08529603411 2229.20777149147 2221.21868811125 2213.19642677427 2205.14126404050 2197.05347604445 2188.93333849347 2180.78112666730 2172.59711541897 2164.38157917692 2156.13479194846 2152.84478060910 2329.91168540000 2329.87695219362 2329.80748540920 2329.70328430473 2329.56434777044 2329.39067433266 2329.18226215904 2328.93910906502 2328.66121252159 2328.34856966438 2328.00117730402 2327.61903193781 2327.20212976266 2326.75046668939 2326.26403835822 2325.74284015568 2325.18686723271 2324.59611452412 2323.97057676932 2323.31024853436 2322.61512423524 2321.88519816255 2321.12046450740 2320.32091738860 2319.48655088120 2318.61735904630 2317.71333596214 2316.77447575650 2315.80077264041 2314.79222094314 2313.74881514846 2312.67054993226 2311.55742020142 2310.40942113397 2309.22654822058 2308.00879730732 2306.75616463973 2305.46864690816 2304.14624129447 2302.78894551992 2301.39675789448 2299.96967736733 2298.50770357872 2297.01083691310 2295.47907855356 2293.91243053754 2292.31089581387 2290.67447830110 2289.00318294708 2287.29701578990 2285.55598402010 2283.78009604416 2281.96936154930 2280.12379156959 2278.24339855334 2276.32819643176 2274.37820068898 2272.39342843332 2270.37389846985 2268.31963137428 2266.23064956812 2264.10697739515 2261.94864119916 2259.75566940305 2257.52809258912 2255.26594358078 2252.96925752545 2250.63807197882 2248.27242699038 2245.87236519026 2243.43793187734 2240.96917510869 2238.46614579030 2235.92889776906 2233.35748792610 2230.75197627140 2228.11242603968 2222.73147949227 2301.25136870000 2301.21650934922 2301.14679027601 2301.04221073836 2300.90276962648 2300.72846546672 2300.51929642673 2300.27526032193 2299.99635462334 2299.68257646656 2299.33392266224 2298.95038970767 2298.53197379977 2298.07867084935 2297.59047649664 2297.06738612816 2296.50939489486 2295.91649773153 2295.28868937761 2294.62596439912 2293.92831721208 2293.19574210708 2292.42823327521 2291.62578483530 2290.78839086240 2289.91604541760 2289.00874257914 2288.06647647481 2287.08924131564 2286.07703143088 2285.02984130432 2283.94766561185 2282.83049926033 2281.67833742781 2280.49117560496 2279.26900963784 2278.01183577199 2276.71965069777 2275.39245159703 2274.03023619104 2272.63300278976 2271.20075034237 2269.73347848913 2268.23118761449 2266.69387890152 2265.12155438768 2263.51421702180 2261.87187072241 2260.19452043738 2258.48217220480 2256.73483321520 2254.95251187506 2253.13521787161 2251.28296223892 2249.39575742528 2247.47361736192 2245.51655753297 2243.52459504673 2241.49774870830 2239.43603909336 2237.33948862344 2235.20812164231 2233.04196449377 2230.84104560071 2228.60539554544 2226.33504715136 2224.03003556589 2221.69039834474 2219.31617553737 2216.90740977392 2214.46414635328 2209.47432161761 2204.34712052720 2272.55539520000 2272.52040970483 2272.45043834283 2272.34548037198 2272.20553468252 2272.03059980078 2271.82067389441 2271.57575477884 2271.29583992508 2270.98092646874 2270.63101122046 2270.24609067753 2269.82616103688 2269.37121820931 2268.88125783506 2268.35627530064 2267.79626575700 2267.20122413895 2266.57114518589 2265.90602346388 2265.20585338892 2264.47062925160 2263.70034524303 2262.89499548201 2262.05457404360 2261.17907498890 2260.26849239615 2259.32282039313 2258.34205319086 2257.32618511862 2256.27521066018 2255.18912449143 2254.06792151925 2252.91159692166 2251.72014618934 2250.49356516836 2249.23185010426 2247.93499768739 2246.60300509959 2245.23587006216 2243.83359088504 2242.39616651742 2240.92359659955 2239.41588151587 2237.87302244948 2236.29502143782 2234.68188142972 2233.03360634372 2231.35020112769 2229.63167181970 2227.87802561030 2226.08927090597 2224.26541739393 2222.40647610824 2220.51245949722 2218.58338149208 2216.61925757695 2214.62010486015 2212.58594214674 2210.51679001244 2208.41267087876 2206.27360908947 2204.09963098839 2201.89076499837 2199.64704170176 2195.05515680634 2243.82113690000 2243.78602526043 2243.71580160963 2243.61046520561 2243.47001493856 2243.29444933484 2243.08376656209 2242.83796443576 2242.55704042682 2242.24099167092 2241.88981497868 2241.50350684740 2241.08206347400 2240.62548076928 2240.13375437348 2239.60687967312 2239.04485181914 2238.44766574635 2237.81531619417 2237.14779772864 2236.44510476576 2235.70723159613 2234.93417241083 2234.12592132871 2233.28247242480 2232.40381976020 2231.48995741315 2230.54087951143 2229.55658026608 2228.53705400636 2227.48229521604 2226.39229857101 2225.26705897815 2224.10657161550 2222.91083197372 2221.67983589888 2220.41357963652 2219.11205987700 2217.77527380216 2216.40321913328 2214.99589418032 2213.55329789246 2212.07542990996 2210.56229061725 2209.01388119744 2207.43020368796 2205.81126103764 2204.15705716504 2202.46759701799 2200.74288663460 2198.98293320540 2197.18774513687 2195.35733211623 2193.49170517757 2191.59087676916 2187.68367282093 2183.63585078518 2214.94025207645 2214.83453723923 2214.69358239460 2214.51738606890 2214.30594642978 2214.05926129267 2213.77732812857 2213.46014407310 2213.10770593690 2212.72001021726 2212.29705311111 2211.83883052924 2211.34533811190 2210.81657124560 2210.25252508129 2209.65319455377 2209.01857440246 2208.34865919340 2207.64344334260 2206.90292114065 2206.12708677865 2205.31593437542 2204.46945800600 2203.58765173150 2202.67050963016 2201.71802582975 2200.73019454131 2199.70701009410 2198.64846697190 2197.55455985060 2196.42528363707 2195.26063350935 2194.06060495810 2192.82519382940 2191.55439636879 2190.24820926661 2188.90662970472 2187.52965540440 2186.11728467560 2184.66951646751 2183.18635042037 2180.11382714540 2176.89972784557 2173.54408010830 2184.95407503031 2184.28205609512 2183.46850194822 2182.51338105032 2181.41665754343 2180.17829181074 2178.79824111944 2177.27646034646 2175.61290278720 2173.80752104716 2171.86026801653 2169.77109792776 2167.53996749598 2349.59847852619 2357.41664836745 2365.19911001577 2372.94558416124 2380.65579116698 2388.32945108257 2395.96628365874 2403.56600836331 2411.12834439854 2418.65301071964 2426.13972605469 2433.58820892576 2320.76373707094 2328.51193258473 2336.22429117776 2343.90053281753 2351.54037715625 2359.14354354686 2366.70975106028 2374.23871850396 2381.73016444183 2389.18380721540 2396.59936496623 2403.97655565974 2411.31509711020 2418.61470700714 2425.87510294294 2433.09600244184 2299.57067699124 2314.81843708546 2322.38766646838 2329.92008704522 2337.41541720708 2344.87337510109 2352.29367865281 2359.67604559004 2367.02019346788 2374.32583969503 2381.59270156149 2388.82049626750 2396.00894095373 2403.15775273288 2410.26664872247 2417.33534607900 2424.36356203333 2431.35101392743 2438.29741925242 2344.63343303162 2358.98409025158 2373.17680317005 2387.20930565887 2401.07933294935 2414.78462286959 2428.32291716464]))
    γᵢ = γ / 2 # convert to HWHM linewidth
    return ω,χR,χI,χRC,χIC,γᵢ,a,ωᵣ
end

function benchmark_susc()
    
    
    # Float 32
    ω,χR,χI,χRC,χIC,γᵢ,a,ωᵣ=provideData(Float32)
    @time χ=loopLorentz!(χR, χI, χRC, χIC, a, ωᵣ, ω, γᵢ)

    # ComplexF32
    ω,χR,χI,χRC,χIC,γᵢ,a,ωᵣ=provideData(ComplexF32)
    @time χ=loopLorentz!(χR, χI, χRC, χIC, a, ωᵣ, ω, γᵢ)

    nothing
end



benchmark_susc()
1 Like

Update:
I further separated the calculation by calculating the imaginary and real part in separate functions and separate loops, see updated code below. This is again an improvement of ~5 times compared to the better version above, as it takes only ~50 ms per spectrum now.

Which rises yet another question: why is this? :smiley: More specifically, why is calling two functions with a one-liner each faster than one function with two lines doing the exact same thing? I actually didn’t even bother to adjust the function signature.

using BenchmarkTools
using Plots

function imaginaryLorentz!(χIC::Vector{Float32}, χRC::Vector{Float32}, aᵢ::Float32, ωᵣᵢ::Float32, ω::Vector{Float32}, γᵢᵢ::Float32) 
    χIC .= @. aᵢ * γᵢᵢ / (γᵢᵢ^2 + (ωᵣᵢ - ω)^2)
    nothing
end
function realLorentz!(χIC::Vector{Float32}, χRC::Vector{Float32}, aᵢ::Float32, ωᵣᵢ::Float32, ω::Vector{Float32}, γᵢᵢ::Float32) 
    χRC .= @. aᵢ * (ωᵣᵢ - ω) / (γᵢᵢ^2 + (ωᵣᵢ - ω)^2)
    nothing
end

function complexLorentz!(χIC::Vector{ComplexF32}, χRC::Vector{ComplexF32}, aᵢ::ComplexF32, ωᵣᵢ::ComplexF32, ω::Vector{ComplexF32}, γᵢᵢ::ComplexF32) 
    χIC .= @. aᵢ / (ωᵣᵢ - ω - im*γᵢᵢ)
    nothing
end

function loopLorentz!(χR::Vector{Float32}, χI::Vector{Float32}, χRC::Vector{Float32}, χIC::Vector{Float32}, a::Vector{Float32}, ωᵣ::Vector{Float32}, ω::Vector{Float32}, γᵢ::Vector{Float32})
    for i = eachindex(γᵢ)
        imaginaryLorentz!(χIC, χRC, a[i], ωᵣ[i], ω, γᵢ[i])
        χI .= χI .+ χIC
    end
    for i = eachindex(γᵢ)
        realLorentz!(χIC, χRC, a[i], ωᵣ[i], ω, γᵢ[i])    
        χR .= χR .+ χRC
    end
    χ = (χR + im * χI)
    return χ
end

function loopLorentz!(χR::Vector{ComplexF32}, χI::Vector{ComplexF32}, χRC::Vector{ComplexF32}, χIC::Vector{ComplexF32}, a::Vector{ComplexF32}, ωᵣ::Vector{ComplexF32}, ω::Vector{ComplexF32}, γᵢ::Vector{ComplexF32})
    for i = eachindex(γᵢ)
        complexLorentz!(χIC, χRC, a[i], ωᵣ[i], ω, γᵢ[i])
        χI .= χI .+ χIC
    end
    return χI
end

function provideData(ctype)
    # wavenumber range
    ωmin = 2150
    ωmax = 2440
    ωres = 9e-4
    
    # collect as a vector of correct type
    ω = convert(Vector{ctype}, collect(ωmin:ωres:ωmax))

    # preallocate χ with same length as ω
    χR = Vector{ctype}(undef, length(ω))
    χI = Vector{ctype}(undef, length(ω))
    
    # preallocate the current calculated transition
    χRC = Vector{ctype}(undef, length(ω))
    χIC = Vector{ctype}(undef, length(ω))

    # these are the full width half max linewidth, amplitude and resonance frequency
    γ = convert(Vector{ctype}, vec([0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0245129460254733 0.0243796471294858 0.0242461478523295 0.0241101316652921 0.0239696405278575 0.0238230532252038 0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0245129460254733 0.0243796471294858 0.0242461478523295 0.0280272457763299 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0252847154374324 0.0321460375993402 0.0303318154218017 0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0245129460254733 0.0243796471294858 0.0242461478523295 0.0241101316652921 0.0239696405278575 0.0238230532252038 0.0236690618786425 0.0235066473650465 0.0233350542536934 0.0231537657945814 0.0229624793443047 0.0227610825619214 0.0225496305745788 0.0223283242999883 0.0220974899959438 0.0218575601365424 0.0216090556014342 0.0213525692342441 0.0210887507007100 0.0208182926894235 0.0205419183425625 0.0202603699680496 0.0199743988779997 0.0196847564326183 0.0193921860824570 0.0190974165380251 0.0188011557876509 0.0185040861733736 0.0182068601393609 0.0179100969908608 0.0176143801151208 0.0173202552061921 0.0170282286884957 0.0167387672084915 0.0164522969767714 0.0161692043626936 0.0158898358474769 0.0156144996181651 0.0153434657792636 0.0150769689431232 0.0148152082583232 0.0145583511637073 0.0143065326154134 0.0140598604671369 0.0138184129339888 0.0135822465906845 0.0133513904444129 0.0131258585191133 0.0129056376176759 0.0126907083092061 0.0124810209379286 0.0122765324078116 0.0120771600155698 0.0118828484834103 0.0116934812350107 0.0115090062904270 0.0113292644023197 0.0111542313279882 0.0109836806425077 0.0108176598286612 0.0104983624374980 0.0321460375993402 0.0303318154218017 0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0245129460254733 0.0243796471294858 0.0242461478523295 0.0241101316652921 0.0239696405278575 0.0238230532252038 0.0236690618786425 0.0235066473650465 0.0233350542536934 0.0231537657945814 0.0229624793443047 0.0227610825619214 0.0225496305745788 0.0223283242999883 0.0220974899959438 0.0218575601365424 0.0216090556014342 0.0213525692342441 0.0210887507007100 0.0208182926894235 0.0205419183425625 0.0202603699680496 0.0199743988779997 0.0196847564326183 0.0193921860824570 0.0190974165380251 0.0188011557876509 0.0185040861733736 0.0182068601393609 0.0179100969908608 0.0176143801151208 0.0173202552061921 0.0170282286884957 0.0167387672084915 0.0164522969767714 0.0161692043626936 0.0158898358474769 0.0156144996181651 0.0153434657792636 0.0150769689431232 0.0148152082583232 0.0145583511637073 0.0143065326154134 0.0140598604671369 0.0138184129339888 0.0135822465906845 0.0133513904444129 0.0131258585191133 0.0129056376176759 0.0126907083092061 0.0124810209379286 0.0122765324078116 0.0120771600155698 0.0118828484834103 0.0115090062904270 0.0111542313279882 0.0321460375993402 0.0303318154218017 0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0245129460254733 0.0243796471294858 0.0242461478523295 0.0241101316652921 0.0239696405278575 0.0238230532252038 0.0236690618786425 0.0235066473650465 0.0233350542536934 0.0231537657945814 0.0229624793443047 0.0227610825619214 0.0225496305745788 0.0223283242999883 0.0220974899959438 0.0218575601365424 0.0216090556014342 0.0213525692342441 0.0210887507007100 0.0208182926894235 0.0205419183425625 0.0202603699680496 0.0199743988779997 0.0196847564326183 0.0193921860824570 0.0190974165380251 0.0188011557876509 0.0185040861733736 0.0182068601393609 0.0179100969908608 0.0176143801151208 0.0173202552061921 0.0170282286884957 0.0167387672084915 0.0164522969767714 0.0161692043626936 0.0158898358474769 0.0156144996181651 0.0153434657792636 0.0150769689431232 0.0148152082583232 0.0145583511637073 0.0143065326154134 0.0140598604671369 0.0138184129339888 0.0135822465906845 0.0133513904444129 0.0131258585191133 0.0126907083092061 0.0321460375993402 0.0303318154218017 0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0245129460254733 0.0243796471294858 0.0242461478523295 0.0241101316652921 0.0239696405278575 0.0238230532252038 0.0236690618786425 0.0235066473650465 0.0233350542536934 0.0231537657945814 0.0229624793443047 0.0227610825619214 0.0225496305745788 0.0223283242999883 0.0220974899959438 0.0218575601365424 0.0216090556014342 0.0213525692342441 0.0210887507007100 0.0208182926894235 0.0205419183425625 0.0202603699680496 0.0199743988779997 0.0196847564326183 0.0193921860824570 0.0190974165380251 0.0188011557876509 0.0185040861733736 0.0182068601393609 0.0179100969908608 0.0176143801151208 0.0173202552061921 0.0170282286884957 0.0167387672084915 0.0164522969767714 0.0161692043626936 0.0158898358474769 0.0156144996181651 0.0150769689431232 0.0145583511637073 0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0245129460254733 0.0243796471294858 0.0242461478523295 0.0241101316652921 0.0239696405278575 0.0238230532252038 0.0236690618786425 0.0235066473650465 0.0233350542536934 0.0231537657945814 0.0229624793443047 0.0227610825619214 0.0225496305745788 0.0223283242999883 0.0220974899959438 0.0218575601365424 0.0216090556014342 0.0213525692342441 0.0210887507007100 0.0208182926894235 0.0205419183425625 0.0202603699680496 0.0199743988779997 0.0196847564326183 0.0193921860824570 0.0190974165380251 0.0185040861733736 0.0179100969908608 0.0173202552061921 0.0262820762709812 0.0257174839690073 0.0252847154374324 0.0249405597454478 0.0246487386338929 0.0243796471294858 0.0241101316652921 0.0238230532252038 0.0235066473650465 0.0231537657945814 0.0227610825619214 0.0223283242999883 0.0218575601365424 0.0303318154218017 0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0303318154218017 0.0293909677287820 0.0286505508781021 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0293909677287820 0.0280272457763299 0.0274922625128492 0.0270299666337758 0.0266294692945487 0.0262820762709812 0.0259803109090874 0.0257174839690073 0.0254874908728106 0.0252847154374324 0.0251039863936694 0.0249405597454478 0.0247901123526237 0.0246487386338929 0.0245129460254733 0.0243796471294858 0.0242461478523295 0.0241101316652921 0.0239696405278575 0.0252847154374324 0.0249405597454478 0.0246487386338929 0.0243796471294858 0.0241101316652921 0.0238230532252038 0.0235066473650465]))
    a = convert(Vector{ctype}, vec([2.76674832091363e-21 2.36681408668054e-21 6.53275898449572e-21 4.11584029326841e-21 9.84110304653029e-21 5.67916815126275e-21 1.27767479850669e-20 7.04439497476061e-21 1.52872248611681e-20 8.18285512093513e-21 1.73191058586550e-20 9.07182020022599e-21 1.88369946582998e-20 9.69916405772032e-21 1.98282631692069e-20 1.00644177956632e-20 2.03034172664904e-20 1.01783080307700e-20 2.02941571252810e-20 1.00613748249970e-20 1.98499429664072e-20 1.03455157312988e-21 8.84170488841555e-22 2.43917749274521e-21 1.53630604362417e-21 3.67278578249855e-21 2.11938892315959e-21 4.76816838751841e-21 2.62908917485547e-21 5.70613124909710e-21 3.05483644764460e-21 6.46688707602503e-21 3.38817924571018e-21 7.03720451926530e-21 3.62454297403169e-21 7.41220201867026e-21 3.76363649404603e-21 7.59551245365868e-21 3.80929561134559e-21 6.97235085385984e-22 1.04936942907519e-21 6.05501253676967e-22 1.36225794410182e-21 7.51176395955906e-22 1.63052954594206e-21 8.73059718024617e-22 1.84857576080926e-21 9.68746039027827e-22 2.01261148539039e-21 1.03691495754788e-21 4.79452641904450e-22 1.93849262282522e-20 3.03954119276334e-20 9.94095944635917e-20 6.88352328305803e-20 1.74837908643640e-19 1.05278565741194e-19 2.44529100394880e-19 1.38243952836379e-19 3.06198520232869e-19 1.66728923501617e-19 3.58096229205161e-19 1.89989354311201e-19 3.99005801465568e-19 2.07556855856749e-19 4.28273962625286e-19 2.19246229367551e-19 4.45808010755341e-19 2.25145185136480e-19 4.52037832019126e-19 2.25587234803657e-19 4.47847076283963e-19 2.21110942787309e-19 4.34481147857702e-19 2.12409818996800e-19 4.13441166346567e-19 2.00277580533250e-19 3.86373383584460e-19 1.85553416092098e-19 3.54962880522279e-19 1.69071344684856e-19 3.20838920923179e-19 1.51616888028985e-19 2.85497366120781e-19 1.33893211258101e-19 2.50243339620942e-19 1.16497769812071e-19 2.16155141862194e-19 9.99094570876052e-20 1.84068479805211e-19 8.44853751964487e-20 1.54578553687788e-19 7.04657118072259e-20 1.28056520260424e-19 5.79848240376673e-20 1.04676341988047e-19 4.70864965714389e-20 8.44479884068806e-20 3.77414202939185e-20 6.72532893114140e-20 2.98651778139078e-20 5.28813365551579e-20 2.33353644163730e-20 4.10610740795797e-20 1.80068601364127e-20 3.14894969325502e-20 1.37246517955384e-20 2.38546130188087e-20 1.03339461328294e-20 1.78529458606852e-20 7.68759325152743e-21 1.32018392928172e-20 5.65104300659859e-21 9.64715621402872e-21 4.10518594383206e-21 6.96715240435166e-21 2.94749305688385e-21 4.97337465217130e-21 2.09187777392036e-21 3.50940821604817e-21 1.46767395056720e-21 2.44821157662607e-21 1.01806963557412e-21 1.68865376147597e-21 6.98270537464213e-22 1.15173377924550e-21 4.73599307243112e-22 7.76829641594465e-22 5.18207056835032e-22 7.21905982791785e-21 1.13197099759045e-20 3.70234703651530e-20 2.56384604246416e-20 6.51268199376577e-20 3.92209511281055e-20 9.11115391418990e-20 5.15186445099533e-20 1.14132046547726e-19 6.21602106704810e-20 1.33539294141205e-19 7.08691421442952e-20 1.48880042635506e-19 7.74701699210476e-20 1.59907942493643e-19 8.18921405619780e-20 1.66582921609136e-19 8.41644283474434e-20 1.69057638418321e-19 8.44071875875874e-20 1.67652619282503e-19 8.28165860761199e-20 1.62822834665510e-19 7.96465650344742e-20 1.55119025761610e-19 7.51888393883877e-20 1.45147227671257e-19 6.97528255047926e-20 1.33529709771784e-19 6.36469939622082e-20 1.20870042177558e-19 5.71628343520475e-20 1.07724291413941e-19 5.05622366686062e-20 9.45795498484384e-20 4.40686899586604e-20 8.18402093458708e-20 3.78623190392222e-20 6.98216821684750e-20 3.20784599293956e-20 5.87507104748237e-20 2.68092372124306e-20 4.87710225231625e-20 2.21074615337467e-20 3.99528955746415e-20 1.79921103821965e-20 3.23051572413464e-20 1.44546781558577e-20 2.57882674471508e-20 1.14657640151417e-20 2.03273321513190e-20 8.98138716994584e-21 1.58241651844708e-20 6.94865839538650e-21 1.21677969436763e-20 5.31057582586566e-21 9.24309646172808e-21 4.00983876466087e-21 6.93740392051379e-21 2.99167650054626e-21 5.14525105277740e-21 2.20576572182271e-21 3.77137278575846e-21 1.60735977251077e-21 2.73228753792676e-21 1.15777833389094e-21 1.95675312981648e-21 8.24411580082986e-22 1.38540022734482e-21 5.80383687346392e-22 9.69816225642525e-22 6.71309430573320e-22 4.59534633034371e-22 2.05804787763813e-21 3.22716334058416e-21 1.05556359108615e-20 7.31024049555216e-21 1.85713235075614e-20 1.11854942876921e-20 2.59881482154678e-20 1.46974533173340e-20 3.25665530519823e-20 1.77408332065914e-20 3.81222721177645e-20 2.02369874578806e-20 4.25260434796781e-20 2.21357190106326e-20 4.57067786957055e-20 2.34161279092207e-20 4.76514849927195e-20 2.40856558660279e-20 4.84015712843709e-20 2.41774051037220e-20 4.80459768419389e-20 2.37460440973005e-20 4.67118808919928e-20 2.28627269035858e-20 4.45539097139137e-20 2.16095014524794e-20 4.17427991451321e-20 2.00736768212979e-20 3.84544119392200e-20 1.83425688861749e-20 3.48598710732430e-20 1.64989591846748e-20 3.11173769738459e-20 1.46174967497107e-20 2.73660564442128e-20 1.27621609413060e-20 2.37219706653436e-20 1.09847975004629e-20 2.02762118598757e-20 9.32465015785786e-21 1.70948597116996e-20 7.80874270747637e-21 1.42204588837546e-20 6.45292441014597e-21 1.16746203002038e-20 5.26337447521645e-21 9.46133747655740e-21 4.23836605403768e-21 7.57063682350397e-21 3.37011176285745e-21 5.98223665163159e-21 2.64654550516955e-21 4.66896194212567e-21 2.05293358581948e-21 3.59973998620638e-21 1.57324680088772e-21 2.74207678147889e-21 1.19126047214399e-21 2.06397914339032e-21 8.91378703259530e-22 1.53533901490905e-21 6.59201177948561e-22 1.12883301888243e-21 4.81865409089656e-22 8.20412422148745e-22 5.89468257238218e-22 5.32331929727700e-22 8.34754639637943e-22 2.73051143335387e-21 1.89114059599092e-21 4.80483430397400e-21 2.89430970347301e-21 6.72558813969226e-21 3.80428452037381e-21 8.43120242068540e-21 4.59398329960728e-21 9.87421796057769e-21 5.24311260892453e-21 1.10211915466444e-20 5.73863117046031e-21 1.18535179813762e-20 6.07497592330801e-21 1.23674343254136e-20 6.25383198323960e-21 1.25731071880259e-20 6.28346409244939e-21 1.24929108247570e-20 6.17768721041799e-21 1.21590853100849e-20 5.95458307754264e-21 1.16110048299948e-20 5.63508232479668e-21 1.08922975619394e-20 5.24153087773194e-21 1.00480449372980e-20 4.79634716412107e-21 9.12225419806589e-21 4.32085576978322e-21 8.15575025740541e-21 3.83435701699051e-21 7.18457783455074e-21 3.35346391586718e-21 6.23894936874184e-21 2.89171122179888e-21 5.34272408476701e-21 2.45941845126723e-21 4.51336304004316e-21 2.06377135605520e-21 3.76227650724702e-21 1.70907531830997e-21 3.09546434054438e-21 1.39712935286237e-21 2.51434620198006e-21 1.12767015632430e-21 2.01668470537580e-21 8.98840733400673e-22 1.59751800323458e-21 7.07646143732970e-22 1.25003620743344e-21 5.50368421949383e-22 9.66355570274970e-22 7.38163277765898e-22 5.57222347304988e-22 6.75907536467190e-22 4.68165894164593e-22 1.18959169710972e-21 7.16669550604885e-22 1.66559555809908e-21 9.42298498730732e-22 2.08877730468673e-21 1.13838730178189e-21 2.44744104858555e-21 1.29992563758622e-21 2.73330793786935e-21 1.42367188835537e-21 2.94172005476807e-21 1.50821070770956e-21 3.07164733664742e-21 1.55390022585078e-21 3.12547207508911e-21 1.56271266278191e-21 3.10857627563517e-21 1.53798679058960e-21 3.02877705564988e-21 1.48411782595347e-21 2.89566530169573e-21 1.40621351972408e-21 2.71990580464007e-21 1.30974513845942e-21 2.51255405220490e-21 1.20021920881709e-21 2.28443693021583e-21 1.08289097502781e-21 2.04563322547450e-21 9.62534287865238e-22 1.80507666856160e-21 8.43275922720905e-22 1.57029090810556e-21 7.28495873077689e-22 1.34725365780590e-21 6.20789597106751e-22 1.14037734237097e-21 5.21983926007087e-22 9.52586508040621e-22 7.85468254729351e-22 6.39470817022797e-22 5.14126717319828e-22 5.07092409096337e-22 5.94449285680060e-22 6.64266658160163e-22 7.15402222629576e-22 7.47582433944502e-22 7.61352719054928e-22 7.57980643219352e-22 7.39322588985471e-22 7.07666930277610e-22 6.65567449195783e-22 6.15680086829353e-22 5.60614304509138e-22 5.02807695507262e-22 1.45760594571328e-21 4.84902927491088e-21 3.30450141697845e-21 8.26034807186170e-21 4.90693741795820e-21 1.12675854233457e-20 6.30789640757925e-21 1.38520413783390e-20 7.48514106161674e-21 1.59654214273236e-20 8.41679938435757e-21 1.75723263730044e-20 5.36810834818746e-22 1.79584908611688e-21 1.22626325170439e-21 3.06861800139697e-21 1.82418078549204e-21 4.19110965790634e-21 2.34739811201913e-21 5.15704561793617e-21 2.78780905100862e-21 5.94861768526215e-21 3.13729871846773e-21 6.55261277775953e-21 3.39112207985656e-21 6.96359930992422e-21 3.54865024370536e-21 7.18446008865135e-21 5.09038124471350e-22 8.72577344402127e-22 5.19100607968831e-22 1.19332832249266e-21 6.68692410880647e-22 1.46970180266519e-21 7.94822305208254e-22 1.69667312014571e-21 8.95186155201375e-22 1.87046782740858e-21 9.68416381177352e-22 1.98948859212561e-21 1.01430092823449e-21 2.05447752818317e-21 1.03377487249494e-21 2.06837043287504e-21 1.02879814042334e-21 2.03597448858440e-21 1.00216384140647e-21 4.84428664807833e-22 5.15699171070675e-22 5.33039025925385e-22 5.37181661811124e-22 5.29341914997040e-22 5.11107620403453e-22 4.84316914810449e-22]))
    ωᵣ = convert(Vector{ctype}, vec([2317.97429134631 2309.98175797216 2301.95518481219 2293.89484862162 2285.80102576824 2277.67399223070 2269.51402359806 2261.32139507066 2253.09638146230 2244.83925720364 2236.55029634704 2228.22977257250 2219.87795919512 2211.49512917364 2203.08155512043 2194.63750931271 2186.16326370509 2177.65908994335 2169.12525937964 2160.56204308879 2151.96971188613 2289.41817463711 2281.49498301665 2273.53762731776 2265.54638502732 2257.52153323568 2249.46334863240 2241.37210750320 2233.24808572822 2225.09155878161 2216.90280173240 2208.68208924662 2200.42969559079 2192.14589463667 2183.83095986731 2175.48516438436 2167.10878091675 2158.70208183062 2150.26533914053 2245.08529603411 2229.20777149147 2221.21868811125 2213.19642677427 2205.14126404050 2197.05347604445 2188.93333849347 2180.78112666730 2172.59711541897 2164.38157917692 2156.13479194846 2152.84478060910 2329.91168540000 2329.87695219362 2329.80748540920 2329.70328430473 2329.56434777044 2329.39067433266 2329.18226215904 2328.93910906502 2328.66121252159 2328.34856966438 2328.00117730402 2327.61903193781 2327.20212976266 2326.75046668939 2326.26403835822 2325.74284015568 2325.18686723271 2324.59611452412 2323.97057676932 2323.31024853436 2322.61512423524 2321.88519816255 2321.12046450740 2320.32091738860 2319.48655088120 2318.61735904630 2317.71333596214 2316.77447575650 2315.80077264041 2314.79222094314 2313.74881514846 2312.67054993226 2311.55742020142 2310.40942113397 2309.22654822058 2308.00879730732 2306.75616463973 2305.46864690816 2304.14624129447 2302.78894551992 2301.39675789448 2299.96967736733 2298.50770357872 2297.01083691310 2295.47907855356 2293.91243053754 2292.31089581387 2290.67447830110 2289.00318294708 2287.29701578990 2285.55598402010 2283.78009604416 2281.96936154930 2280.12379156959 2278.24339855334 2276.32819643176 2274.37820068898 2272.39342843332 2270.37389846985 2268.31963137428 2266.23064956812 2264.10697739515 2261.94864119916 2259.75566940305 2257.52809258912 2255.26594358078 2252.96925752545 2250.63807197882 2248.27242699038 2245.87236519026 2243.43793187734 2240.96917510869 2238.46614579030 2235.92889776906 2233.35748792610 2230.75197627140 2228.11242603968 2222.73147949227 2301.25136870000 2301.21650934922 2301.14679027601 2301.04221073836 2300.90276962648 2300.72846546672 2300.51929642673 2300.27526032193 2299.99635462334 2299.68257646656 2299.33392266224 2298.95038970767 2298.53197379977 2298.07867084935 2297.59047649664 2297.06738612816 2296.50939489486 2295.91649773153 2295.28868937761 2294.62596439912 2293.92831721208 2293.19574210708 2292.42823327521 2291.62578483530 2290.78839086240 2289.91604541760 2289.00874257914 2288.06647647481 2287.08924131564 2286.07703143088 2285.02984130432 2283.94766561185 2282.83049926033 2281.67833742781 2280.49117560496 2279.26900963784 2278.01183577199 2276.71965069777 2275.39245159703 2274.03023619104 2272.63300278976 2271.20075034237 2269.73347848913 2268.23118761449 2266.69387890152 2265.12155438768 2263.51421702180 2261.87187072241 2260.19452043738 2258.48217220480 2256.73483321520 2254.95251187506 2253.13521787161 2251.28296223892 2249.39575742528 2247.47361736192 2245.51655753297 2243.52459504673 2241.49774870830 2239.43603909336 2237.33948862344 2235.20812164231 2233.04196449377 2230.84104560071 2228.60539554544 2226.33504715136 2224.03003556589 2221.69039834474 2219.31617553737 2216.90740977392 2214.46414635328 2209.47432161761 2204.34712052720 2272.55539520000 2272.52040970483 2272.45043834283 2272.34548037198 2272.20553468252 2272.03059980078 2271.82067389441 2271.57575477884 2271.29583992508 2270.98092646874 2270.63101122046 2270.24609067753 2269.82616103688 2269.37121820931 2268.88125783506 2268.35627530064 2267.79626575700 2267.20122413895 2266.57114518589 2265.90602346388 2265.20585338892 2264.47062925160 2263.70034524303 2262.89499548201 2262.05457404360 2261.17907498890 2260.26849239615 2259.32282039313 2258.34205319086 2257.32618511862 2256.27521066018 2255.18912449143 2254.06792151925 2252.91159692166 2251.72014618934 2250.49356516836 2249.23185010426 2247.93499768739 2246.60300509959 2245.23587006216 2243.83359088504 2242.39616651742 2240.92359659955 2239.41588151587 2237.87302244948 2236.29502143782 2234.68188142972 2233.03360634372 2231.35020112769 2229.63167181970 2227.87802561030 2226.08927090597 2224.26541739393 2222.40647610824 2220.51245949722 2218.58338149208 2216.61925757695 2214.62010486015 2212.58594214674 2210.51679001244 2208.41267087876 2206.27360908947 2204.09963098839 2201.89076499837 2199.64704170176 2195.05515680634 2243.82113690000 2243.78602526043 2243.71580160963 2243.61046520561 2243.47001493856 2243.29444933484 2243.08376656209 2242.83796443576 2242.55704042682 2242.24099167092 2241.88981497868 2241.50350684740 2241.08206347400 2240.62548076928 2240.13375437348 2239.60687967312 2239.04485181914 2238.44766574635 2237.81531619417 2237.14779772864 2236.44510476576 2235.70723159613 2234.93417241083 2234.12592132871 2233.28247242480 2232.40381976020 2231.48995741315 2230.54087951143 2229.55658026608 2228.53705400636 2227.48229521604 2226.39229857101 2225.26705897815 2224.10657161550 2222.91083197372 2221.67983589888 2220.41357963652 2219.11205987700 2217.77527380216 2216.40321913328 2214.99589418032 2213.55329789246 2212.07542990996 2210.56229061725 2209.01388119744 2207.43020368796 2205.81126103764 2204.15705716504 2202.46759701799 2200.74288663460 2198.98293320540 2197.18774513687 2195.35733211623 2193.49170517757 2191.59087676916 2187.68367282093 2183.63585078518 2214.94025207645 2214.83453723923 2214.69358239460 2214.51738606890 2214.30594642978 2214.05926129267 2213.77732812857 2213.46014407310 2213.10770593690 2212.72001021726 2212.29705311111 2211.83883052924 2211.34533811190 2210.81657124560 2210.25252508129 2209.65319455377 2209.01857440246 2208.34865919340 2207.64344334260 2206.90292114065 2206.12708677865 2205.31593437542 2204.46945800600 2203.58765173150 2202.67050963016 2201.71802582975 2200.73019454131 2199.70701009410 2198.64846697190 2197.55455985060 2196.42528363707 2195.26063350935 2194.06060495810 2192.82519382940 2191.55439636879 2190.24820926661 2188.90662970472 2187.52965540440 2186.11728467560 2184.66951646751 2183.18635042037 2180.11382714540 2176.89972784557 2173.54408010830 2184.95407503031 2184.28205609512 2183.46850194822 2182.51338105032 2181.41665754343 2180.17829181074 2178.79824111944 2177.27646034646 2175.61290278720 2173.80752104716 2171.86026801653 2169.77109792776 2167.53996749598 2349.59847852619 2357.41664836745 2365.19911001577 2372.94558416124 2380.65579116698 2388.32945108257 2395.96628365874 2403.56600836331 2411.12834439854 2418.65301071964 2426.13972605469 2433.58820892576 2320.76373707094 2328.51193258473 2336.22429117776 2343.90053281753 2351.54037715625 2359.14354354686 2366.70975106028 2374.23871850396 2381.73016444183 2389.18380721540 2396.59936496623 2403.97655565974 2411.31509711020 2418.61470700714 2425.87510294294 2433.09600244184 2299.57067699124 2314.81843708546 2322.38766646838 2329.92008704522 2337.41541720708 2344.87337510109 2352.29367865281 2359.67604559004 2367.02019346788 2374.32583969503 2381.59270156149 2388.82049626750 2396.00894095373 2403.15775273288 2410.26664872247 2417.33534607900 2424.36356203333 2431.35101392743 2438.29741925242 2344.63343303162 2358.98409025158 2373.17680317005 2387.20930565887 2401.07933294935 2414.78462286959 2428.32291716464]))
    γᵢ = γ / 2 # convert to HWHM linewidth
    return ω,χR,χI,χRC,χIC,γᵢ,a,ωᵣ
end

function benchmark_susc()
    
    
    # Float 32
    ω,χR,χI,χRC,χIC,γᵢ,a,ωᵣ=provideData(Float32)
    @time χ=loopLorentz!(χR, χI, χRC, χIC, a, ωᵣ, ω, γᵢ)

    # ComplexF32
    ω,χR,χI,χRC,χIC,γᵢ,a,ωᵣ=provideData(ComplexF32)
    @time χ=loopLorentz!(χR, χI, χRC, χIC, a, ωᵣ, ω, γᵢ)

    nothing
end



benchmark_susc()


I can reproduce : on my machine the output is :
0.065606 seconds (4 allocations: 4.917 MiB) #Float32
0.800648 seconds #Complex{Float32}

It seems that the complex inv function consumes the vast majority of the time

2 Likes

Hi @LaurentPlagne
thanks for the very quick reply and reproduction of the issue.
And already, I learned something new, I wasn’t aware of the PProf package which gives me a more comprehensible output than the options I looked at before.
I’ll have a look at it some more and try to optimize myself. However, any suggestions are highly appreciated!
Cheers
Max

Actually I have discovered PProf.jl very recently (from this forum) and I love it (you can obtain this annotated source as well as a call graph, and flame graph… via a simple @pprof myfunction()).

I guess that since most of the time is spent in a Julia native function you did not made obvious programming mistakes…

1 Like

The complex division imolementation
https://github.com/JuliaLang/julia/blob/aae68a5e030a04f3d4bfc31f4b30c002ddbc52ca/base/complex.jl#L348
performs a lot of checks for Inf etc. and also chooses branch based on the conditioning of the problem, it might be slower but more accurate than manually performing it with real arithmetic

1 Like

Actually, the implementation for floats is just below
https://github.com/JuliaLang/julia/blob/aae68a5e030a04f3d4bfc31f4b30c002ddbc52ca/base/complex.jl#L381

2 Likes

I have just realize that the pprof output shows that the time is spent in a Float64 specialization of the inv function (and not the generic method shown by @baggepinnen). An indication of a spurious conversion ?

There’s a fast version of inv for complex numbers
https://github.com/JuliaLang/julia/blob/aae68a5e030a04f3d4bfc31f4b30c002ddbc52ca/base/fastmath.jl#L227
so using fastmath is likely to speed things up (but do watch the result fidelity of using it)

2 Likes

Do you figure out why the time is spent in a Float64 inv method while the data is done with Float32 ?

Wow, thank you all for the fast replies. Awesome forum! I’ll have a closer look at the source information provided (takes some time, as the syntax is new to me). But a peek: @baggepinnen is right, adding @fastmath to the line calculating the complex Lorentzian reduces the effort from 1.2 s to 0.35 s. Not yet what I achieve with separate calculation, but significantly faster. Maybe an implicit conversion to Float64 as mentioned by @LaurentPlagne is another indication for overhead. However, wouldn’t I see that as allocations when using @time?

1 Like

If the calculation is type unstable, meaning the compiler couldn’t figure out the exact types of variables, maybe because the types change during computation, you will incur allocations. @code_wantype is useful when looking for type instabilities

Ok, tried to check for type instability:

function loopLorentz!(χR::Vector{ComplexF32}, χI::Vector{ComplexF32}, χRC::Vector{ComplexF32}, χIC::Vector{ComplexF32}, a::Vector{ComplexF32}, ωᵣ::Vector{ComplexF32}, ω::Vector{ComplexF32}, γᵢ::Vector{ComplexF32})
    for i = 1#eachindex(γᵢ)
        @code_warntype complexLorentz!(χIC, χRC, a[i], ωᵣ[i], ω, γᵢ[i])
        χI .= χI .+ χIC
    end
    return χI
end

function complexLorentz!(χIC::Vector{ComplexF32}, χRC::Vector{ComplexF32}, aᵢ::ComplexF32, ωᵣᵢ::ComplexF32, ω::Vector{ComplexF32}, γᵢᵢ::ComplexF32) 
    @fastmath χIC .= @. aᵢ / (ωᵣᵢ - ω - im*γᵢᵢ)
    nothing
end

Output:

MethodInstance for complexLorentz!(::Vector{ComplexF32}, ::Vector{ComplexF32}, ::ComplexF32, ::ComplexF32, ::Vector{ComplexF32}, ::ComplexF32)
  from complexLorentz!(χIC::Vector{ComplexF32}, χRC::Vector{ComplexF32}, aᵢ::ComplexF32, ωᵣᵢ::ComplexF32, ω::Vector{ComplexF32}, γᵢᵢ::ComplexF32) in Main at c:\Users\greifenstein\Documents\armstrong\Documents\Code\Julia Spielwiese\comparison_complex_float.jl:13
Arguments
  #self#::Core.Const(complexLorentz!)
  χIC::Vector{ComplexF32}
  χRC::Vector{ComplexF32}
  aᵢ::ComplexF32
  ωᵣᵢ::ComplexF32
  ω::Vector{ComplexF32}
  γᵢᵢ::ComplexF32
Body::Nothing
1 ─ %1  = Base.FastMath::Core.Const(Base.FastMath)
│   %2  = Base.getproperty(%1, :div_fast)::Core.Const(Base.FastMath.div_fast)
│   %3  = Base.FastMath::Core.Const(Base.FastMath)
│   %4  = Base.getproperty(%3, :sub_fast)::Core.Const(Base.FastMath.sub_fast)
│   %5  = Base.FastMath::Core.Const(Base.FastMath)
│   %6  = Base.getproperty(%5, :sub_fast)::Core.Const(Base.FastMath.sub_fast)
│   %7  = Base.broadcasted(%6, ωᵣᵢ, ω)::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(Base.FastMath.sub_fast), Tuple{ComplexF32, Vector{ComplexF32}}}
│   %8  = Base.FastMath::Core.Const(Base.FastMath)
│   %9  = Base.getproperty(%8, :mul_fast)::Core.Const(Base.FastMath.mul_fast)
│   %10 = Base.broadcasted(%9, Main.im, γᵢᵢ)::Core.PartialStruct(Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(Base.FastMath.mul_fast), Tuple{Complex{Bool}, ComplexF32}}, Any[Core.Const(Base.FastMath.mul_fast), Core.PartialStruct(Tuple{Complex{Bool}, ComplexF32}, Any[Core.Const(im), ComplexF32]), Core.Const(nothing)])
│   %11 = Base.broadcasted(%4, %7, %10)::Core.PartialStruct(Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(Base.FastMath.sub_fast), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(Base.FastMath.sub_fast), Tuple{ComplexF32, Vector{ComplexF32}}}, Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(Base.FastMath.mul_fast), Tuple{Complex{Bool}, ComplexF32}}}}, Any[Core.Const(Base.FastMath.sub_fast), Core.PartialStruct(Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(Base.FastMath.sub_fast), Tuple{ComplexF32, Vector{ComplexF32}}}, 
Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(Base.FastMath.mul_fast), Tuple{Complex{Bool}, ComplexF32}}}, Any[Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(Base.FastMath.sub_fast), Tuple{ComplexF32, Vector{ComplexF32}}}, Core.PartialStruct(Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(Base.FastMath.mul_fast), Tuple{Complex{Bool}, ComplexF32}}, Any[Core.Const(Base.FastMath.mul_fast), Core.PartialStruct(Tuple{Complex{Bool}, ComplexF32}, Any[Core.Const(im), ComplexF32]), Core.Const(nothing)])]), Core.Const(nothing)])
│   %12 = Base.broadcasted(%2, aᵢ, %11)::Core.PartialStruct(Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(Base.FastMath.div_fast), Tuple{ComplexF32, Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(Base.FastMath.sub_fast), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(Base.FastMath.sub_fast), Tuple{ComplexF32, Vector{ComplexF32}}}, Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(Base.FastMath.mul_fast), Tuple{Complex{Bool}, ComplexF32}}}}}}, Any[Core.Const(Base.FastMath.div_fast), Core.PartialStruct(Tuple{ComplexF32, Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(Base.FastMath.sub_fast), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, 
Nothing, typeof(Base.FastMath.sub_fast), Tuple{ComplexF32, Vector{ComplexF32}}}, Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(Base.FastMath.mul_fast), Tuple{Complex{Bool}, ComplexF32}}}}}, Any[ComplexF32, Core.PartialStruct(Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(Base.FastMath.sub_fast), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(Base.FastMath.sub_fast), Tuple{ComplexF32, Vector{ComplexF32}}}, Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(Base.FastMath.mul_fast), Tuple{Complex{Bool}, ComplexF32}}}}, Any[Core.Const(Base.FastMath.sub_fast), Core.PartialStruct(Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(Base.FastMath.sub_fast), Tuple{ComplexF32, Vector{ComplexF32}}}, Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(Base.FastMath.mul_fast), Tuple{Complex{Bool}, ComplexF32}}}, Any[Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(Base.FastMath.sub_fast), Tuple{ComplexF32, Vector{ComplexF32}}}, Core.PartialStruct(Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(Base.FastMath.mul_fast), Tuple{Complex{Bool}, ComplexF32}}, Any[Core.Const(Base.FastMath.mul_fast), Core.PartialStruct(Tuple{Complex{Bool}, ComplexF32}, Any[Core.Const(im), ComplexF32]), Core.Const(nothing)])]), Core.Const(nothing)])]), Core.Const(nothing)])
│         Base.materialize!(χIC, %12)
└──       return Main.nothing

I don’t see a mention of ComplexF64 or Float64, however, @time instead of @code_warntype does show 18, sometimes 19 allocations :thinking:

  0.002631 seconds (18 allocations: 944 bytes)

  0.002507 seconds (19 allocations: 1.219 KiB)

Edit: allocations happen regardless of using fastmath or not
Edit2: Oh…wait… when using @time in a loop, does this by itself allocate memory? Because when I put the @time in benchmark_susc() like this, it does not show any allocations:

function benchmark_susc()
    
    
    # Float 32
    ω,χR,χI,χRC,χIC,γᵢ,a,ωᵣ=provideData(Float32)
    #@time χ=loopLorentz!(χR, χI, χRC, χIC, a, ωᵣ, ω, γᵢ)

    # ComplexF32
    ω,χR,χI,χRC,χIC,γᵢ,a,ωᵣ=provideData(ComplexF32)
    @time χ=loopLorentz!(χR, χI, χRC, χIC, a, ωᵣ, ω, γᵢ)

    nothing
end

@time prints and that allocates IIRC

Probably irrelevant but I see a lot of NaN32 when initializing the Complex{Float32} part:

julia> ω,χR,χI,χRC,χIC,γᵢ,a,ωᵣ=provideData(Float32)
(Float32[2150.0, 2150.001, 2150.0017, 2150.0027, 2150.0037, 2150.0044, 2150.0054, 2150.0063, 2150.007, 2150.008  …  2439.9917, 2439.9927, 2439.9934, 2439.9944, 2439.9954, 2439.996, 2439.997, 2439.998, 2439.9988, 2439.9998], Float32[-1.1871652f23, 5.076202, 1.0814404f17, 5.0762024, -9.829634f10, 5.0762024, 89128.96, 5.076203, -0.0806, 5.076203  …  0.0, 2213.8955, 0.0, 2213.8965, 0.0, 2213.8972, 0.0, 2213.8982, 0.0, 2213.8992], Float32[2216.3552, 0.0, 2216.3562, 0.0, 2216.357, 0.0, 2216.358, 0.0, 2216.359, 0.0  …  0.0, 2361.3516, 0.0, 2361.3523, 0.0, 2361.3533, 0.0, 2361.3542, 0.0, 2361.355], Float32[2363.8113, 0.0, 2363.812, 0.0, 2363.813, 0.0, 2363.814, 0.0, 2363.8147, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], Float32[0.014695484, 0.014325275, 0.014013623, 0.013746131, 0.0135149835, 0.013314734, 0.013141038, 0.012990155, 0.012858742, 0.012743745  …  0.012123074, 0.0120550655, 0.0119848205, 0.0126423575, 0.01247028, 0.0123243695, 0.012189823, 0.0120550655, 0.011911526, 0.0117533235], Float32[2.7667483f-21, 2.3668141f-21, 6.532759f-21, 4.1158403f-21, 9.841103f-21, 5.679168f-21, 1.2776748f-20, 7.044395f-21, 1.5287225f-20, 8.182855f-21  …  1.0287981f-21, 2.0359744f-21, 1.0021638f-21, 4.8442865f-22, 5.1569915f-22, 5.3303904f-22, 5.3718166f-22, 5.293419f-22, 5.111076f-22, 4.843169f-22], Float32[2317.9744, 2309.9817, 2301.955, 2293.8948, 2285.801, 2277.674, 2269.514, 2261.3213, 2253.0964, 2244.8394  …  2424.3635, 2431.351, 2438.2974, 2344.6335, 2358.9841, 2373.1768, 2387.2092, 2401.0793, 2414.7847, 2428.323])

julia> ω,χR,χI,χRC,χIC,γᵢ,a,ωᵣ=provideData(ComplexF32)
(ComplexF32[2150.0f0 + 0.0f0im, 2150.001f0 + 0.0f0im, 2150.0017f0 + 0.0f0im, 2150.0027f0 + 0.0f0im, 2150.0037f0 + 0.0f0im, 2150.0044f0 + 0.0f0im, 2150.0054f0 + 0.0f0im, 2150.0063f0 + 0.0f0im, 2150.007f0 + 0.0f0im, 2150.008f0 + 0.0f0im  …  2439.9917f0 + 0.0f0im, 2439.9927f0 + 0.0f0im, 2439.9934f0 + 0.0f0im, 2439.9944f0 + 0.0f0im, 2439.9954f0 + 0.0f0im, 2439.996f0 + 0.0f0im, 2439.997f0 + 0.0f0im, 2439.998f0 + 0.0f0im, 2439.9988f0 + 0.0f0im, 2439.9998f0 + 0.0f0im], ComplexF32[0.0f0 + NaN32*im, 1.4587136f-37 + 3.4438f-41im, 1.1f-44 + 0.0f0im, 1.4594831f-37 + 3.4438f-41im, 1.1f-44 + 0.0f0im, 0.0f0 + NaN32*im, 1.4576266f-37 + 3.4438f-41im, 1.1f-44 + 0.0f0im, 1.4578634f-37 + 3.4438f-41im, 1.1f-44 + 0.0f0im  …  0.0f0 + 6.0955567f-22im, 0.0f0 + 6.105026f-22im, 0.0f0 + 6.112172f-22im, 0.0f0 + 6.1217555f-22im, 0.0f0 + 6.1314076f-22im, 0.0f0 + 6.138689f-22im, 0.0f0 + 6.1484536f-22im, 0.0f0 + 6.1582865f-22im, -0.0f0 + NaN32*im, 0.0f0 + 6.17565f-22im], ComplexF32[-0.0f0 + NaN32*im, -0.0f0 + NaN32*im, -0.0f0 + NaN32*im, -0.0f0 + NaN32*im, -0.0f0 + NaN32*im, -0.0f0 + NaN32*im, 0.0f0 + 1.3562106f-21im, 0.0f0 + 1.3507185f-21im, -0.0f0 + NaN32*im, -0.0f0 + NaN32*im  …  NaN32 + 6.0955567f-22im, NaN32 + 6.105026f-22im, NaN32 + 6.112172f-22im, NaN32 + 6.1217555f-22im, NaN32 + 6.1314076f-22im, NaN32 + 6.138689f-22im, NaN32 + 6.1484536f-22im, -1.2332106f-19 + 6.1582865f-22im, -1.2327342f-19 + NaN32*im, -1.232376f-19 + 6.17565f-22im], ComplexF32[NaN32 + NaN32*im, NaN32 + NaN32*im, NaN32 + NaN32*im, NaN32 + NaN32*im, NaN32 + NaN32*im, NaN32 + NaN32*im, -1.9681535f-19 + 1.3562106f-21im, -1.967118f-19 + 1.3507185f-21im, -1.9660859f-19 + NaN32*im, -1.9653147f-19 + NaN32*im  …  0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im], ComplexF32[0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im  …  -1.2362867f-19 + 6.0955567f-22im, -1.2358155f-19 + 6.105026f-22im, -1.2354626f-19 + 6.112172f-22im, -1.2349893f-19 + 6.1217555f-22im, -1.2345167f-19 + 6.1314076f-22im, -1.2341607f-19 + 6.138689f-22im, -1.2336857f-19 + 6.1484536f-22im, -1.2332106f-19 + 6.1582865f-22im, -1.2327341f-19 + 6.168184f-22im, -1.232376f-19 + 6.17565f-22im], ComplexF32[0.014695484f0 + 0.0f0im, 0.014325275f0 + 0.0f0im, 0.014013623f0 + 0.0f0im, 0.013746131f0 + 0.0f0im, 0.0135149835f0 + 0.0f0im, 0.013314734f0 + 0.0f0im, 0.013141038f0 + 0.0f0im, 0.012990155f0 + 0.0f0im, 0.012858742f0 + 0.0f0im, 0.012743745f0 + 0.0f0im  …  0.012123074f0 + 0.0f0im, 0.0120550655f0 + 0.0f0im, 0.0119848205f0 + 0.0f0im, 0.0126423575f0 + 0.0f0im, 0.01247028f0 + 0.0f0im, 0.0123243695f0 + 0.0f0im, 0.012189823f0 + 0.0f0im, 0.0120550655f0 + 0.0f0im, 0.011911526f0 + 0.0f0im, 0.0117533235f0 + 0.0f0im], ComplexF32[2.7667483f-21 + 0.0f0im, 2.3668141f-21 + 0.0f0im, 6.532759f-21 + 0.0f0im, 4.1158403f-21 + 0.0f0im, 9.841103f-21 + 0.0f0im, 5.679168f-21 + 0.0f0im, 1.2776748f-20 + 0.0f0im, 7.044395f-21 + 0.0f0im, 1.5287225f-20 + 0.0f0im, 8.182855f-21 + 0.0f0im  …  1.0287981f-21 + 0.0f0im, 2.0359744f-21 + 0.0f0im, 1.0021638f-21 + 0.0f0im, 4.8442865f-22 + 0.0f0im, 5.1569915f-22 + 0.0f0im, 5.3303904f-22 + 0.0f0im, 5.3718166f-22 + 0.0f0im, 5.293419f-22 + 0.0f0im, 5.111076f-22 + 0.0f0im, 4.843169f-22 + 0.0f0im], ComplexF32[2317.9744f0 + 0.0f0im, 2309.9817f0 + 0.0f0im, 2301.955f0 + 0.0f0im, 2293.8948f0 + 0.0f0im, 2285.801f0 + 0.0f0im, 2277.674f0 + 0.0f0im, 2269.514f0 + 0.0f0im, 2261.3213f0 + 0.0f0im, 2253.0964f0 + 0.0f0im, 2244.8394f0 + 0.0f0im  …  2424.3635f0 + 0.0f0im, 2431.351f0 + 0.0f0im, 2438.2974f0 + 0.0f0im, 2344.6335f0 + 0.0f0im, 2358.9841f0 + 0.0f0im, 2373.1768f0 + 0.0f0im, 2387.2092f0 + 0.0f0im, 2401.0793f0 + 0.0f0im, 2414.7847f0 + 0.0f0im, 2428.323f0 + 0.0f0im])

@baggepinnen
Ah, thanks for the clarification, than I did it wrong (nesting @time than obviously has to show allocation)

@LaurentPlagne
Strange, I’ll check that. I didn’t notice any NaNs.

@LaurentPlagne I just checked with the following code and I don’t see any NaNs. Different version of Julia? I’m on 1.7.1
Edit: Same result after update to 1.7.2

ω,χR,χI,χRC,χIC,γᵢ,a,ωᵣ=provideData(ComplexF32)
#benchmark_susc()
println(any(isnan.(ω)))
println(any(isnan.(χR)))
println(any(isnan.(χI)))
println(any(isnan.(χRC)))
println(any(isnan.(χIC)))
println(any(isnan.(γᵢ)))
println(any(isnan.(a)))
println(any(isnan.(ωᵣ)))

Vector{ctype}(undef

You initialize undefined arrays so you get whatever happens to be in memory. If everything is correct, you overwrite that memory before using it and the nans are not problematic.

1 Like

Yes, my NaN disapeared on a second try confirming @baggepinnen explanation.
I still wonder about the Float32->Float64 conversion. The broadcast expression seems to trigger this.
I should try a loop.

Thanks so far, I’ll try a little more maybe this evening - real life calls, have to pick up the kids now :wink:

Edit, Last update:

function complexLorentz!(χIC::Vector{ComplexF32}, χRC::Vector{ComplexF32}, aᵢ::ComplexF32, ωᵣᵢ::ComplexF32, ω::Vector{ComplexF32}, γᵢᵢ::ComplexF32) 
     @fastmath χIC .= @. aᵢ / (ωᵣᵢ - ω - im*γᵢᵢ)
    nothing
end

function complexLorentz!(χIC::Vector{ComplexF32}, χRC::Vector{ComplexF32}, aᵢ::ComplexF32, ωᵣᵢ::ComplexF32, ω::Vector{ComplexF32}, γᵢᵢ::ComplexF32) 
    @inbounds for j = eachindex(ω)
         @fastmath χIC[j] = aᵢ / (ωᵣᵢ - ω[j] - im*γᵢᵢ)
    end
    nothing
end

Both implementations yield the same efficiency (runtime of ~300 ms)
Without @inbounds, it runs a little slower (~400 ms)

1 Like