Equivalent Fortran function is about 4.6 times faster

Hey there,

For my current project, I wrote a function in plain Julia. It reads

#Kappa squared or the square of the inverse debye length
Debyelengthinnversesquared(is) = e^2*beta*2*is*n_a/(epsilon*epsilon_r)

#Kappa in SI
Kappa(is) = sqrt(Debyelengthinnversesquared(1000*is))

#Kappa in Angstroms
KappaA(is) = sqrt(Debyelengthinnversesquared(1000*is))*10^(-10)

#Surfacepotential
    function surface_potential(kappa::Float64, sigma::Float64)::Float64
        log_arg = c*sigma/(epsilon*epsilon_r*kappa) + sqrt((c*sigma/(epsilon*epsilon_r*kappa))^2 + 1)
        return 1/c*log(log_arg)
    end

#Defining gamma functions
    function gamma(kappa, sigma)
        return (0.25*e*surface_potential(kappa, sigma)*beta) |> tanh
    end


gamma_daly(is) = gamma(Kappa(is), -0.02)

function Guoy_chapman_potential_AU(is::Float64, z::Float64)::Float64
        arg1 = 1 + gamma_daly(is)*exp(-KappaA(is)*z)
        arg2 = 1 - gamma_daly(is)*exp(-KappaA(is)*z)
        result = 2*log(arg1/arg2)
        return result
    end

I also had a code base in Fortran (I will not repeat the code here), so I used Julia's great ccall capabilities to call the equivalent function from Fortran (just to check that it gives the right values).

Just for fun I also did a speed test, and I found for the Fortran version

@time values = [Guoy_chapman_fortran(i*1.0, 0.005, -0.02) for i=0:10^6];

the output was

0.101856 seconds (23.92 k allocations: 9.236 MiB, 15.77% compilation time)

However, when I performed the same task in plain Julia, I received for

@time value_GC = [Guoy_chapman_potential_AU(0.005::Float64, i*1.0::Float64)::Float64 for i=0:10^6]

an output of

1.053988 seconds (83.02 M allocations: 1.246 GiB, 2.63% gc time, 0.81% compilation time)

So that is slower by a factor of 10. Does anyone know what I am doing wrong?

Thanks in advance!

This is using a lot of global variables.
What are their definitions?
If they aren’t const, performance will be bad.
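A minimal sketch of why this matters (the names here are made up for illustration, not taken from the code above):

```julia
# Non-const global: its type could change at any time, so a function
# that reads it cannot be inferred and boxes every access.
scale = 2.0
f_slow(y) = scale * y

# const global: the compiler knows `cscale` is always a Float64,
# so f_fast compiles to a plain floating-point multiply.
const cscale = 2.0
f_fast(y) = cscale * y
```

With this sketch, `@code_warntype f_slow(1.0)` reports a return type of `Any`, while `f_fast` infers `Float64`.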

Also, gamma_daly is calling Kappa instead of KappaA.

6 Likes

Thanks! That definitely improved the performance; the global variables were not const.

They are given by

#Fluoresent intensity
    ##physical_quantities in Si
    #Boltzman constant
    const k::Float64=1.38*10^(-23)
    #Temperature
    const T::Float64=298.15
    #beta
    const beta::Float64=1/(k*T)
    #electron charge
    const e::Float64 = 1.60217663*10^(-19)
    #vacuum permitivity
    const epsilon::Float64 = 8.8541878128*10^(-12)
    #relative permitivity of water
    const epsilon_r::Float64 = 78.4
    #avogadros number
    const n_a::Float64 = 6.0221408e23

(I fixed the const thing now.) Now the performance is

0.462826 seconds (30.07 M allocations: 470.070 MiB, 4.71% gc time, 6.96% compilation time)

still a factor of 4.6, but already a whole lot better.

kappa is defined as

Kappa(is) = sqrt(Debyelengthinnversesquared(1000*is))

Thank you very much !

Any more ideas?

Try benchmarking with BenchmarkTools.@btime instead of using @time:

julia> @btime value_GC = [Guoy_chapman_potential_AU(0.005, i*1.0) for i=0:10^6];
  77.754 ms (2 allocations: 7.63 MiB)
1 Like

I think the constant c is still missing.

Yes, I set it arbitrarily to const c = 1 :slight_smile:

2 Likes

I took const c = 3e8 to pretend I’m a physicist :wink:

8 Likes

(These type declarations, in contrast, don’t help performance at all.)

You also don’t need to declare a type explicitly here — Julia knows the type from the value.
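As a sketch (with a toy function, not the original one): the annotated and unannotated forms compile to the same specialized code when called with `Float64` arguments, because Julia specializes on the argument types at the call site; the annotations only restrict which argument types are accepted.

```julia
# Annotated version: restricts inputs, but does not speed anything up.
decay_annotated(kappa::Float64, z::Float64)::Float64 = exp(-kappa * z)

# Unannotated version: for Float64 inputs, the compiled code is the same.
decay(kappa, z) = exp(-kappa * z)
```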

Ah the constant c was the issue

it is given by

#surface potential
const c::Float64=e/(2*k*T)

I had overlooked it when I set everything to const

Now I get for

@time value_GC = [Guoy_chapman_potential_AU(0.005::Float64, i*1.0::Float64)::Float64 for i=0:10^6]

the output:

0.081523 seconds (61.68 k allocations: 11.559 MiB, 28.63% compilation time)

Nice! Now it actually outperforms the Fortran version

Thanks a lot!!

9 Likes

Note that this includes compilation time as well. Part of the reason for using BenchmarkTools is to avoid exactly this, but also to get a more reliable measurement by running the expression multiple times, in case your operating system is busy with other things just when you run your timing test.
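A sketch of the effect with a toy function: the first `@time` of a freshly defined function includes JIT compilation, the second does not (exact timings will vary from machine to machine):

```julia
sq(x) = x^2
@time sq.(1:10^6);   # first call: includes compiling sq and the broadcast
@time sq.(1:10^6);   # second call: measures only the run itself
```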

6 Likes

Unrelated but if you write lots of code like this, the use of unicode characters like greek letters is a nice feature of Julia :slight_smile:
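For instance, the constants above could be written with their conventional symbols (the symbol choices below are mine, not from the original code; in the REPL you type them as `\varepsilon<TAB>`, `\kappa<TAB>`, `\_0<TAB>`, and so on):

```julia
const ε₀ = 8.8541878128e-12        # vacuum permittivity
const εᵣ = 78.4                    # relative permittivity of water
const β  = 1 / (1.38e-23 * 298.15) # 1/(kT)

# The inverse Debye length squared then reads much closer to the formula:
κ²(I) = (1.60217663e-19)^2 * β * 2 * I * 6.0221408e23 / (ε₀ * εᵣ)
```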

4 Likes

Note that BenchmarkTools also automatically calls cheap functions in a loop as many times as needed to get good timing resolution, so that you can directly do:

@btime Guoy_chapman_fortran(1.0, 0.005, -0.02)
@btime Guoy_chapman_potential_AU(1.0, 0.005, -0.02)

and get accurate timing measurements (and not be measuring the overhead from allocating an array of results).

2 Likes

Thank you very much everyone!

This was very illuminating!

7 Likes

There is actually a standard syntax for floating point scientific notation, namely 1.38e-23 instead of 1.38*10^(-23).

At first I thought that there could be a performance difference, since the former is directly parsed as a Float64 while the latter is a Float64 multiplied by an exponentiated Int, but the compiler is smart enough to calculate the value for you.

But there is a difference in accuracy, since your version necessarily must be equal to the product of two floats (of very different magnitudes):

julia> 1.38e-23
1.38e-23

julia> 1.38*10^(-23)
1.3799999999999998e-23  # <- slight loss of precision

So the standard notation is more accurate, as well as more compact and idiomatic.

5 Likes

@jabru thank you for the nice puzzle :wink:, here is a Unitful version.

Note we don’t need type annotations here

using BenchmarkTools, Unitful

# Boltzman constant
const k = Unitful.k
# electron charge
const e = Unitful.q
# vacuum permitivity
const epsilon = Unitful.ε0
# avogadros number
const n_a = Unitful.Na

# relative permitivity of water
const epsilon_r = 78.4

# surface charge density
const sigma = -0.02u"C/m^2"

const T = 25.0u"°C" |> u"K"

const beta = 1/(k*T)
const c = e/(2*k*T) |> u"V^-1"

# concentration
is = 0.005u"mol/L"

#Kappa squared or the square of the inverse debye length
debyelength_inverse_sq(is) = e^2*beta*2*is*n_a/(epsilon*epsilon_r)

kappa(is) = sqrt(debyelength_inverse_sq(is)) |> upreferred

#Surfacepotential
function surface_potential(kappa, sigma)
    log_arg = c*sigma/(epsilon*epsilon_r*kappa) + sqrt((c*sigma/(epsilon*epsilon_r*kappa))^2 + 1)
    return (1/c)*log(log_arg) # @jabru - are these parens (1/c) what you meant?
end

#Defining gamma functions
function gamma(kappa, sigma)
    return (0.25*e*surface_potential(kappa, sigma)*beta) |> tanh
end

gamma_daly(is) = gamma(kappa(is), sigma)

function guoy_chapman_potential_AU(is, z)
    arg1 = 1 + gamma_daly(is)*exp(-kappa(is)*z)
    arg2 = 1 - gamma_daly(is)*exp(-kappa(is)*z)
    result = 2*log(arg1/arg2)
    return result
end

@btime value_GC = [guoy_chapman_potential_AU($is, i*1.0u"Å") for i=0:10^6]
;

#   72.224 ms (2 allocations: 7.63 MiB)
6 Likes

Note that, despite appearances, this value is also inexact.

Remember that the floating-point printing algorithm used in Julia (the Ryu algorithm) follows the well-established rubric of printing the shortest decimal representation that rounds to the same value. The Float64 constant 1.38e-23 is actually:

julia> BigFloat(1.38e-23, 1024)
1.380000000000000060010582465734078799297660966782642624395399644741944111814291318296454846858978271484375e-23

which differs from the exact decimal value by about a relative error of 4e-17:

julia> Float64(1.38e-23- big"1.38e-23") / 1.38e-23
4.3485929322995705e-17

though this is still better than the relative error generated by computing 1.38 * 10^-23 (which involves multiple rounding steps):

julia> Float64(1.38 * 10^-23 - big"1.38e-23") / 1.38e-23
-1.6946594582596943e-16
4 Likes
const epsilon_r = 78.4

and the value above, despite appearances, is known to 1‰ at best :grinning:

3 Likes

Small moment of appreciation for our collective vulnerability to nerdsniping. Some may find it annoying, but I love it when people pile in even after a topic has been solved, to explore multiple ways to make the code better or more interesting.

10 Likes