Equivalent Fortran function is about 4.6 times faster

Hey there,

for my current project, I wrote a function in plain Julia. It reads

#Kappa squared, i.e. the square of the inverse Debye length
Debyelengthinnversesquared(is) = e^2*beta*2*is*n_a/(epsilon*epsilon_r)

#Kappa in SI
Kappa(is) = sqrt(Debyelengthinnversesquared(1000*is))

#Kappa in Angstroms
KappaA(is) = sqrt(Debyelengthinnversesquared(1000*is))*10^(-10)

#Surface potential
function surface_potential(kappa::Float64, sigma::Float64)::Float64
    log_arg = c*sigma/(epsilon*epsilon_r*kappa) + sqrt((c*sigma/(epsilon*epsilon_r*kappa))^2 + 1)
    return 1/c*log(log_arg)
end
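
(Aside: since log(x + sqrt(x^2 + 1)) = asinh(x), the same function can be written more compactly, and a bit more accurately for small arguments, using the built-in asinh:

```julia
# Equivalent form of surface_potential; uses the same globals c, epsilon, epsilon_r
surface_potential_asinh(kappa, sigma) = asinh(c*sigma/(epsilon*epsilon_r*kappa))/c
```
)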

#Defining gamma functions
function gamma(kappa, sigma)
    return (0.25*e*surface_potential(kappa, sigma)*beta) |> tanh
end


gamma_daly(is) = gamma(Kappa(is), -0.02)

function Guoy_chapman_potential_AU(is::Float64, z::Float64)::Float64
    arg1 = 1 + gamma_daly(is)*exp(-KappaA(is)*z)
    arg2 = 1 - gamma_daly(is)*exp(-KappaA(is)*z)
    result = 2*log(arg1/arg2)
    return result
end

Since I already had a code base in Fortran (I will not repeat the code here), I then used Julia's great interop capabilities via `ccall` to call the function from Fortran (just to check that it gives the right values).
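
In case it is useful to others, the Julia-side wrapper looked roughly like this (a minimal sketch: the library name `libgc.so` and the symbol `guoy_chapman_` are placeholders, not my actual names; Fortran passes scalars by reference, hence the `Ref{Float64}` argument types):

```julia
# Hypothetical wrapper: assumes the Fortran routine was compiled into a
# shared library, e.g.  gfortran -shared -fPIC gc.f90 -o libgc.so
function Guoy_chapman_fortran(z::Float64, is::Float64, sigma::Float64)
    return ccall((:guoy_chapman_, "libgc.so"), Float64,
                 (Ref{Float64}, Ref{Float64}, Ref{Float64}),
                 z, is, sigma)
end
```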

Just for fun I also did a speed test, and I found for the Fortran version

@time values = [Guoy_chapman_fortran(i*1.0, 0.005, -0.02) for i=0:10^6];

the output was

0.101856 seconds (23.92 k allocations: 9.236 MiB, 15.77% compilation time)

However, when I performed the same task in plain Julia, I received for

@time value_GC = [Guoy_chapman_potential_AU(0.005::Float64, i*1.0::Float64)::Float64 for i=0:10^6]

an output of

1.053988 seconds (83.02 M allocations: 1.246 GiB, 2.63% gc time, 0.81% compilation time)

So that is slower by a factor of 10. Does anyone know what I am doing wrong?

Thanks in advance!

This is using a lot of global variables.
What are their definitions?
If they aren’t const, performance will be bad.

Also, gamma_daly is calling Kappa instead of KappaA.
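
To make the first point concrete, here is a tiny sketch (hypothetical functions `f_global`/`f_const`, not from the code above): with a non-const global the compiler cannot assume a type for the variable, so every use goes through dynamic dispatch and allocates.

```julia
a = 2.0        # non-const: the binding could later hold any type
const b = 2.0  # const: the compiler knows `b` is always a Float64

f_global(x) = a * x  # `a` is inferred as Any -> boxing + dynamic dispatch
f_const(x)  = b * x  # fully inferred, compiles to a single multiply

# Compare the two with @code_warntype: f_global shows `a::Any` in red.
```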

Thanks! That definitely improved the performance; the global variables were not const.

They are given by

#Fluorescent intensity
    ##physical quantities in SI
    #Boltzmann constant
    const k::Float64=1.38*10^(-23)
    #Temperature
    const T::Float64=298.15
    #beta
    const beta::Float64=1/(k*T)
    #electron charge
    const e::Float64 = 1.60217663*10^(-19)
    #vacuum permittivity
    const epsilon::Float64 = 8.8541878128*10^(-12)
    #relative permittivity of water
    const epsilon_r::Float64 = 78.4
    #Avogadro's number
    const n_a::Float64 = 6.0221408e23

(I fixed the const thing now.) Now the performance is

0.462826 seconds (30.07 M allocations: 470.070 MiB, 4.71% gc time, 6.96% compilation time)

still a factor of 4.6, but already a whole lot better.

kappa is defined as

Kappa(is) = sqrt(Debyelengthinnversesquared(1000*is))

Thank you very much !

Any more ideas?

Try benchmarking with BenchmarkTools.@btime instead of using @time:

julia> @btime value_GC = [Guoy_chapman_potential_AU(0.005, i*1.0) for i=0:10^6];
  77.754 ms (2 allocations: 7.63 MiB)

I think the constant c is still missing.

Yes, I set it arbitrarily to const c = 1 :slight_smile:

I took const c = 3e8 to pretend I’m a physicist :wink:

(These type declarations, in contrast, don’t help performance at all.)

You also don’t need to declare a type explicitly here — Julia knows the type from the value.
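
For example, these two definitions produce exactly the same constant (note that the annotated form additionally requires Julia ≥ 1.8 for typed globals):

```julia
const k1::Float64 = 1.38e-23  # explicit annotation: redundant here
const k2 = 1.38e-23           # the literal is already a Float64
# typeof(k1) == typeof(k2) == Float64
```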

Ah, the constant c was the issue!

It is given by

#surface potential
const c::Float64=e/(2*k*T)

I had overlooked it when I set everything to const

Now I get for

@time value_GC = [Guoy_chapman_potential_AU(0.005::Float64, i*1.0::Float64)::Float64 for i=0:10^6]

the output:

0.081523 seconds (61.68 k allocations: 11.559 MiB, 28.63% compilation time)

Nice! Now it actually outperforms the Fortran version

Thanks a lot!!

Note that this includes compilation time as well. Part of the reason for using BenchmarkTools is to avoid things like this, but also to get a more reliable measurement by running the expression multiple times, in case your operating system is busy with other things just when you run your timing test.
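
A related BenchmarkTools tip: interpolate any globals into the benchmarked expression with `$`, otherwise part of what you measure is the dynamic lookup of the global itself:

```julia
using BenchmarkTools
is = 0.005  # a non-const global
@btime Guoy_chapman_potential_AU(is, 1.0)   # also measures global-access overhead
@btime Guoy_chapman_potential_AU($is, 1.0)  # $ splices the value in as a constant
```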

Unrelated but if you write lots of code like this, the use of unicode characters like greek letters is a nice feature of Julia :slight_smile:
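
For instance, the constants above could be written with their conventional symbols (same values, purely cosmetic; typed in the REPL as \beta<TAB>, \epsilon<TAB>, and so on):

```julia
const k_B = 1.38e-23            # Boltzmann constant
const T   = 298.15              # temperature
const β   = 1/(k_B*T)
const e   = 1.60217663e-19      # electron charge
const ε₀  = 8.8541878128e-12    # vacuum permittivity
const εᵣ  = 78.4                # relative permittivity of water
const n_a = 6.0221408e23        # Avogadro's number
κ²(is) = e^2*β*2*is*n_a/(ε₀*εᵣ)  # square of the inverse Debye length
```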

Note also that BenchmarkTools also automatically calls cheap functions in a loop as many times as needed to get good timing resolution, so that you can directly do:

@btime Guoy_chapman_fortran(1.0, 0.005, -0.02)
@btime Guoy_chapman_potential_AU(1.0, 0.005, -0.02)

and get accurate timing measurements (and not be measuring the overhead from allocating an array of results).

Thank you very much everyone!

This was very illuminating!

There is actually a standard syntax for floating point scientific notation, namely 1.38e-23 instead of 1.38*10^(-23).

At first I thought that there could be a performance difference, since the former is directly parsed as a Float64 while the latter is a Float64 multiplied by an exponentiated Int, but the compiler is smart enough to calculate the value for you.

But there is a difference in accuracy, since your version necessarily must be equal to the product of two floats (of very different magnitudes):

julia> 1.38e-23
1.38e-23

julia> 1.38*10^(-23)
1.3799999999999998e-23  # <- slight loss of precision

So the standard notation is more accurate, as well as more compact and idiomatic.

@jabru thank you for the nice puzzle :wink:, here is a Unitful version.

Note we don’t need type annotations here

using BenchmarkTools, Unitful

# Boltzman constant
const k = Unitful.k
# electron charge
const e = Unitful.q
# vacuum permitivity
const epsilon = Unitful.ε0
# avogadros number
const n_a = Unitful.Na

# relative permitivity of water
const epsilon_r = 78.4

# surface charge density
const sigma = -0.02u"C/m^2"

const T = 25.0u"°C" |> u"K"

const beta = 1/(k*T)
const c = e/(2*k*T) |> u"V^-1"

# concentration
is = 0.005u"mol/L"

#Kappa squared or the square of the inverse debye length
debyelength_inverse_sq(is) = e^2*beta*2*is*n_a/(epsilon*epsilon_r)

kappa(is) = sqrt(debyelength_inverse_sq(is)) |> upreferred

#Surface potential
function surface_potential(kappa, sigma)
    log_arg = c*sigma/(epsilon*epsilon_r*kappa) + sqrt((c*sigma/(epsilon*epsilon_r*kappa))^2 + 1)
    return (1/c)*log(log_arg) # @jabru - are these parens (1/c) what you meant?
end

#Defining gamma functions
function gamma(kappa, sigma)
    return (0.25*e*surface_potential(kappa, sigma)*beta) |> tanh
end

gamma_daly(is) = gamma(kappa(is), sigma)

function guoy_chapman_potential_AU(is, z)
    arg1 = 1 + gamma_daly(is)*exp(-kappa(is)*z)
    arg2 = 1 - gamma_daly(is)*exp(-kappa(is)*z)
    result = 2*log(arg1/arg2)
    return result
end

@btime value_GC = [guoy_chapman_potential_AU($is, i*1.0u"Å") for i=0:10^6]
;

#   72.224 ms (2 allocations: 7.63 MiB)

Note that, despite appearances, this value is also inexact.

Remember that the floating-point printing algorithm used in Julia (the Ryu algorithm) follows the well-established rubric of printing the shortest decimal representation that rounds to the same value. The Float64 constant 1.38e-23 is actually:

julia> BigFloat(1.38e-23, 1024)
1.380000000000000060010582465734078799297660966782642624395399644741944111814291318296454846858978271484375e-23

which differs from the exact decimal value by about a relative error of 4e-17:

julia> Float64(1.38e-23 - big"1.38e-23") / 1.38e-23
4.3485929322995705e-17

though this is still better than the relative error generated by computing 1.38 * 10^-23 (which involves multiple rounding steps):

julia> Float64(1.38 * 10^-23 - big"1.38e-23") / 1.38e-23
-1.6946594582596943e-16

const epsilon_r = 78.4

and the value above, despite appearances, is known to 1‰ at best :grinning:

Small moment of appreciation for our collective vulnerability to nerdsniping. Some may find it annoying, but I love it when people pile on even after a topic has been solved to explore multiple ways to make the code better or more interesting.