Speeding up my code

Hi, I’m quite new to Julia and I’m looking for suggestions to improve the performance of my code, hopefully enough to beat the Python version. Do you see any room for improvement in these two functions?
For example, I’m trying to avoid allocations, but I don’t think I can apply the method provided in this example Performance Tips · The Julia Language

I appreciate any help.

function _logS(MF::AbsPL, m::Vector{Float64})
    δm, ml = MF.δm, MF.ml
    s           = zeros(size(m))   # already ok for mask_upp
    mask_low    = @. m <= ml
    mask_mid    = @. (m > ml) & (m < (ml + δm))
    s[mask_mid] = @. -log1pexp( δm/(m[mask_mid]-ml) + δm/(m[mask_mid]-(ml+δm)) ) 
    s[mask_low] .= -Inf
    s
end


function _logC(MF::AbsBPL, m::Vector{Float64}, N=200)
    δm, ml      = MF.δm, MF.ml 
    xx          = sort(vcat(LinRange(ml, ml+δm+0.1*δm, N),
                            LinRange(ml+δm+0.1*δm+0.1, maximum(m), N)))  
    cdf         = cumul_integrate(xx, exp.(_logpdfm2(MF, xx)), TrapezoidalFast())
    intcdf      = LinearInterpolation(xx, cdf, extrapolation_bc = NaN)(m)
    -log.(intcdf)
end

function _logpdfm1(MF::AbsBrokenPLMF, m::Vector{Float64})
    α1, α2, ml, mu, m_break   = MF.α1, MF.α2, MF.ml, MF.mu, MF.m_break
    res                 = fill(NaN, size(m))   
    mask_compute        = @. (ml < m) & (m < mu)
    mm                  = m[mask_compute]
    res[mask_compute]   = ifelse.(mm .< m_break, 
                                  (-α1 .* log.(mm)) .+ _logS(MF, mm), 
                                  (-α2 .* log.(mm)) .+ _logS(MF, mm) .+ (α2-α1)*log(m_break))
    res
end

Step 1 here is to not try so hard to vectorize everything. For example, _logS can just be written as

function _logS(MF::AbsPL, m::Vector{Float64})
    δm, ml = MF.δm, MF.ml
    s = zeros(size(m))   # zero is already correct for the m >= ml + δm case
    for ind in eachindex(m)
        if m[ind] <= ml
            s[ind] = -Inf
        elseif m[ind] < ml + δm
            s[ind] = -log1pexp(δm/(m[ind] - ml) + δm/(m[ind] - (ml + δm)))
        end
    end
    return s
end
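A further step, for the allocation concern from the original post: let the caller supply the output buffer, so repeated calls allocate nothing. A minimal sketch (using a naive stand-in for `log1pexp`, which the original code presumably takes from LogExpFunctions.jl):

```julia
# Stand-in for LogExpFunctions.log1pexp; the real one is more
# numerically careful for large arguments.
log1pexp(x) = log1p(exp(x))

# In-place variant: the caller supplies `s`, so this function allocates nothing.
function _logS!(s::Vector{Float64}, δm::Float64, ml::Float64, m::Vector{Float64})
    @inbounds for ind in eachindex(m, s)
        mi = m[ind]
        if mi <= ml
            s[ind] = -Inf
        elseif mi < ml + δm
            s[ind] = -log1pexp(δm / (mi - ml) + δm / (mi - (ml + δm)))
        else
            s[ind] = 0.0          # the `mask_upp` region of the original
        end
    end
    return s
end
```

Calling `_logS!(s, MF.δm, MF.ml, m)` with a reused `s` avoids making a fresh vector on every call.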

In Python, vectorization speeds things up because it hands the work to NumPy’s compiled loops. In Julia, this isn’t necessary.

Python:

import numpy as np
from time import time

x = np.random.rand(10**7)
y = np.random.rand(10**7)

def vectorized(x, y):
    return x*x + y*y + x*y

def loop(x, y):
    return [x*x + y*y + x*y for (x,y) in zip(x,y)]

t0 = time()
vectorized(x,y) # 0.24799013137817383
t1 = time()
vectorized(x,y) # 0.09563016891479492
t2 = time()

print(t1-t0)
print(t2-t1)

t0 = time()
loop(x,y) # 4.1851279735565186
t1 = time()
loop(x,y) # 4.027431964874268
t2 = time()

print(t1-t0)
print(t2-t1)

Julia:

x = rand(10^7)
y = rand(10^7)

vectorized(x, y) = x.*x .+ y.*y .+ x.*y

loop(x, y) = [x*x + y*y + x*y for (x,y) in zip(x,y)]

@time vectorized(x,y)
# 0.216033 seconds (462.89 k allocations: 99.817 MiB, 2.25% gc time, 50.76% compilation time)
@time vectorized(x,y)
# 0.189564 seconds (2 allocations: 76.294 MiB)

@time loop(x,y)
# 0.139699 seconds (300.01 k allocations: 92.649 MiB, 46.57% compilation time)
@time loop(x,y)
# 0.122678 seconds (2 allocations: 76.294 MiB)
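Worth adding: in Julia the vectorized form still allocates a result array per call, but broadcast fusion lets you keep that style with zero temporaries by writing into a preallocated output. A small sketch:

```julia
# Fused, in-place broadcast: `@.` turns the whole right-hand side into a
# single loop that writes directly into `out`, so no temporaries are made.
vectorized!(out, x, y) = @. out = x*x + y*y + x*y

x, y = rand(10^7), rand(10^7)
out = similar(x)
vectorized!(out, x, y)
```

After the first (compiling) call, `@time vectorized!(out, x, y)` should report 0 allocations.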

Curious that the Python version is also slower the first time. What’s going on there?

I can’t really confirm that Python is slower on the first run; for me it’s only slightly slower sometimes, not always:

[..]
>>> x = np.random.rand(10**7); y = np.random.rand(10**7)

>>> t0 = time(); vectorized(x,y) # 0.14753150939941406
>>> t1 = time(); vectorized(x,y) # 0.1444408893585205

Note, I moved the rands for x and y after the defs to minimize cache pollution; otherwise I pasted the script as-is into my Python 3.6 REPL, defining but not running the loop code.

Python does compile the script to bytecode (though it doesn’t cache it on disk, unlike the dependencies/imports), so my first guess was that Python has a slight TTFX/TTFP problem like Julia, just not as pronounced. Maybe the compiler is more aggressive in later Python versions, which would make the effect more pronounced.

Can @Lilith or anyone confirm, and post Python version used? Posting the same code, in her order I get:

>>> print(t1-t0)
0.16790056228637695
>>> print(t2-t1)
0.1561567783355713

Running it not from the REPL (and not in her order, but the order doesn’t seem to matter much), I actually get a bit more speed (by 18% to 28%), and then in my fastest case the first run is actually faster:

$ time python3.6 test2.py
0.11993026733398438
0.12186145782470703
7.87701678276062
7.849463939666748

real	0m16,466s
user	0m16,698s
sys	0m1,338s

I wrote scripts and executed them from the commandline with $ julia script.jl and $ python3 script.py. I have python --version as Python 3.9.5 and julia --version as julia version 1.7.2.

Thanks. Since your first run was 2.59x slower, very much unlike mine (on a loaded or an unloaded machine), it certainly is intriguing. Can you check again, to see if it was just a fluke? I don’t have that Python version, and I’m not on Windows, if that matters. It could have been a transient spike, on Windows… or any OS really.

x@X % python3 timing.py
0.2230987548828125
0.09320807456970215
4.175736904144287
4.076184272766113
x@X % python3 timing.py
0.14977812767028809
0.09485483169555664
4.096446990966797
4.113823890686035
x@X % python3 timing.py
0.1487128734588623
0.09010791778564453
4.22429895401001
4.2131569385528564

Where timing.py is a file with the code from Speeding up my code - #3 by Lilith

I’m on a Mac.
