Making Julia as Fast as C++

This is an old post but it seems it hasn’t been discussed yet.

An interesting observation the author found is that non-integer powers in Julia are fairly slow at that time, compared to C++. I am not sure if this is really the case. He mentioned removing those operations speeds up the Julia code by more than 10 times. This seems too much, but maybe at that time there were some problems in the non-integer powers.

Note that this post was in 2019 so kind of old, but I think Julia 1.0 had been released at that time.

A lot of things changed then (really, too many to enumerate), but the basic strategies remain relevant.

That said, it is unclear what you want to discuss about this blog post. Asking a specific question would focus the discussion.

Cf

which was closed 5 years ago.

Really interesting, indeed its a bit old and I think they wanted to keep things standard julia but with a little of packages you can beat the pythonic version by 9K x without changing much :

using ConcreteStructs,LinearAlgebra, StaticArrays, BenchmarkTools
function generate_particles(PType, n, lambda; l=1, Gamma=SA[0.0,0.0,0.0])
    sigma = l/(n-1)*lambda
    particles = fill(zero(PType), n^3)
    xs = range(0, stop=l, length=n)
    # Adds particles in a regular lattice
    ind = 1
    for k in 1:n
        for j in 1:n
            for i in 1:n
                X = SA[xs[i], xs[j], xs[k]]
                particles[ind] = PType(X, Gamma, sigma)
                ind += 1
            end
        end
    end

    return particles
end
"""
This is a particle struct made up of ambiguous (unspecified) types
"""
@concrete struct ParticleAmbiguous
    # User inputs
    X                               # Position
    Gamma                           # Vectorial circulation
    sigma                           # Smoothing radius
    # Properties
    U                               # Velocity at particle
    J                               # Jacobian at particle J[i,j]=dUi/dxj
end
ParticleAmbiguous(X, Gamma, sigma; U=zeros(3), J=zeros(3,3)) = ParticleAmbiguous(X, Gamma, sigma, U, J )
# Empty initializer
Base.zero(::Type{<:ParticleAmbiguous}) = ParticleAmbiguous(SA[0.0,0.0,0.0], SA[0.0,0.0,0.0], 0.0)
# Winckelmans regularizing kernel
g_wnk(r) = r^3 * (r^2 + 2.5) / (r^2 + 1)^2.5
dgdr_wnk(r) = 7.5 * r^2 / (r^2 + 1)^3.5
const const4 = 1/(4*pi)
"""
    Pythonic programming approach
"""
@views @fastmath function P2P_pythonic(particles, g, dgdr)
    for Pi in particles
        @inbounds for Pj in particles    
            dX = Pi.X - Pj.X
            r = norm(dX)
            if r != 0
                gsgm = g(r / Pj.sigma)
                dgsgmdr = dgdr(r / Pj.sigma)
                crss = cross(-const4 * (dX/r^3), Pj.Gamma)
                Pi.U .+= gsgm * crss
                @inbounds @simd for j in 1:3
                    Pi.J[:, j] += ( dX[j] / (Pj.sigma*r) * (dgsgmdr*crss) -
                        gsgm * (3*dX[j]/r^2) * crss -
                        gsgm * (const4/r^3) * cross(SVector{3}(i==j for i in 1:3), Pj.Gamma) )
                end
            end
        end
    end
end
particles_amb = generate_particles(ParticleAmbiguous, 6, 1.0)
@btime P2P_pythonic($particles_amb, $g_wnk, $dgdr_wnk) # 2ms