Speeding up some non-optimized Julia functions

As I mentioned in a previous comment, if you stick to the same structure of the code, you won’t benefit too much (or at all) from rewriting the code to Julia. One one side, if the function only performs simple scalar operations, like your fun, a @numba will probably accelerate the code reasonably well. If the function, on the other side, is written multiple vectorized operations, then writting the same vectorized operations in Julia won’t give you many benefits relative Python and Numpy. These are good software tools, of course.

If you were more used to Julia, you would probably be writing the code differently from start. I don’t know for sure how “optimized” the Python code is, also I don’t know how much it could benefit from a rewrite in the lines that we are suggesting here (avoiding unnecessary intermediate arrays mostly). But, from a first sight, it is very likely that your code can be tenths or hundreds of times faster following this rewrite.

Does that matter? Is it worth the learning effort? That depends. Sometimes it is the difference of being able to do something or not at all, sometimes is a huge gain in productivity. But sometimes if the code is fast “enough” for the purpose, you simply don’t need another tool at the moment.

Anyway, Julia doesn’t do magic. It allows programming in both high and low level styles, and optimizations beyond what you can do in Python. But there is a learning curve, of course.

4 Likes

Thanks, you are right. I will edit the post and remove @inbounds. I originally wrote the compute_L_R function with three Vector{<:Real} as inputs, but then I changed the input to AbstractMatrix

1 Like

Could you please provide the numba code? I’d like to offer help but without local benchmarking is painful…

1 Like

If, after optimizations, constructing the inrange neighbor list becomes a bottleneck, you can try using CellListMap.neighborlist:

julia> N = 15
15

julia> PT = rand(N^3,3);

julia> ip =collect(1:N^3);

julia> xx,yy,zz =  PT[:,1], PT[:,2], PT[:,3]; data = [xx yy zz];

julia> r = 0.3
0.3

julia> x = data';

julia> function nn(x, r)
           balltree = BallTree(x)
           st = inrange(balltree, x, r, true)
       end
nn (generic function with 1 method)

julia> @btime nn($x, $r);
  51.729 ms (16457 allocations: 22.71 MiB)

julia> @btime CellListMap.neighborlist($x, $r, parallel = false);
  10.785 ms (361 allocations: 14.94 MiB)

julia> @btime CellListMap.neighborlist($x, $r, parallel = true); # 4 cores
  5.787 ms (3560 allocations: 38.32 MiB)

The output is a list of pairs and their distances:

julia> list = CellListMap.neighborlist(x, r)
451317-element Vector{Tuple{Int64, Int64, Float64}}:
 (1, 305, 0.18333348583592726)
 (1, 530, 0.10208305984055893)
 (1, 542, 0.0564371990772332)
 (1, 655, 0.27887521324793013)
 ⋮
 (2990, 2769, 0.25077868150174176)
 (2990, 2849, 0.27190146964427236)
 (3066, 2505, 0.27644153578133657)

Thank you, I will consider that.

Now, since we are sad when we don’t understand performance differences :frowning:

Here I tested, as noted by @VaclavMacha, that only by changing spzeros by zeros the execution time is reduced by a factor of about 3.

Probably the main issue in comparison with the Python code is some use of SparseArrays which is not adequate.

Anyway, I leave here the code I tested. You can just run it with:

julia> include("file.jl")

julia> d = data()

julia> @time Bonder0.main(d...);  # original code
106.243666 seconds (865.12 k allocations: 195.326 GiB, 3.11% gc time)

julia> @time Bonder1.main(d...);  # just changing the spzeros to zeros
 30.016908 seconds (1.04 M allocations: 35.632 GiB, 1.90% gc time, 0.71% compilation time)

the code
using NearestNeighbors

function data()
    N = 15
    PT = rand(N^3,3);
    ip =collect(1:N^3);
    xx,yy,zz =  PT[:,1], PT[:,2], PT[:,3]; data = [xx yy zz];

    r = 0.3

    balltree = BallTree(data')
    st = inrange(balltree, data', r, true)  # https://github.com/KristofferC/NearestNeighbors.jl
    m = 0.5 

    return st, PT, ip, m
end

module Bonder0

using SparseArrays

function fun(data,c)
   x = data[:,1]; y = data[:,2]; z = data[:,3]; n = length(x);
   A,A1,B1,C1,A2,B2,C2 = zeros(n,n),zeros(n,n),zeros(n,n),zeros(n,n),zeros(n,n),zeros(n,n),zeros(n,n)

   for i in 1:n
         @inbounds for k in 1:n
           xx = x[k]-x[i]
           yy = y[k]-y[i]
           zz = z[k]-z[i]
           xyz = sqrt(xx^2+yy^2+zz^2)

           A[i,k] = sqrt(xyz^2+c^2)
           A1[i,k] = xx/A[i,k]
           A2[i,k] = 1/A[i,k]-xx^2/A[i,k]^3
           B1[i,k] = yy/A[i,k]
           B2[i,k] = 1/A[i,k]-yy^2/A[i,k]^3
           C1[i,k] = zz/A[i,k]
           C2[i,k] = 1/A[i,k]-zz^2/A[i,k]^3

       end
   end
   return A,A1,B1,C1,A2,B2,C2
end


function fun2(data)
    x = data[:,1]; y = data[:,2]; z = data[:,3];o = zero(x); II = ones(size(x));
    a =   [II x y z x .^2 x.*y x.*z y .^2 y.*z z .^2 x .^3 x .^2 .*y x .^2 .*z x.*y .^2 x.*y.*z x.*z .^2 y .^3 y .^2 .*z y.*z .^2 z .^3]
    a1 =  [o II o o x*2 y z o o o 3*x .^2 2*x.*y 2*x.*z y .^2 y.*z z .^2 o o o o]
    b1 =  [o o II o o x o y*2 z o o x .^2 o x.*y*2 x.*z o 3*y .^2 2*y.*z z .^2 o]
    c1 =  [o o o II o o x o y z*2 o o x .^2 o x.*y x.*z*2 o y .^2 y.*z*2 3*z .^2]
    a2 = [o o o o II*2 o o o o o x*6 2*y 2*z o o o o o o o]
    b2 = [o o o o o o o II*2 o o o o o x*2 o o y*6 2*z o o]
    c2 = [o o o o o o o o o II*2 o o o o o x*2 o o y*2 z*6]

    return a,a1,b1,c1,a2,b2,c2
end

function main(st,data::Array{Float64,2},p::Array{Int64,1},m)
    N = length(data[:,1])
    W,W11,W12,W13,W21,W22,W23 = spzeros(N,N),spzeros(N,N),spzeros(N,N),spzeros(N,N),spzeros(N,N),spzeros(N,N),spzeros(N,N)
    
    dd = zeros(20,20);
    @inbounds for i in p
       id=st[i]
       t_data = data[id,:];
       A,A1,B1,C1,A2,B2,C2 = fun(t_data,m)
       a,a1,b1,c1,a2,b2,c2  = fun2(t_data)
       R = [A a; a' dd];
       ind = filter(j -> id[j] == i,axes(id, 1)); 
       L = [[A;a'][:,ind] [A1;a1'][:,ind] [B1;b1'][:,ind] [C1;c1'][:,ind] [A2;a2'][:,ind] [B2;b2'][:,ind] [C2;c2'][:,ind]];

       soln = (R\L)[1:length(t_data[:,1]),:]; 
       W[i,id] = soln[:,1]
       W11[i,id] = soln[:,2]
       W12[i,id] = soln[:,3]
       W13[i,id] = soln[:,4]
       W21[i,id] = soln[:,5]
       W22[i,id] = soln[:,6]
       W23[i,id] = soln[:,7]
    end
    return W,W11,W12,W13,W21,W22,W23
end

end

module Bonder1


function fun(data,c)
   x = data[:,1]; y = data[:,2]; z = data[:,3]; n = length(x);
   A,A1,B1,C1,A2,B2,C2 = (zeros(n,n) for _ in 1:7)

   for i in 1:n
         @inbounds for k in 1:n
           xx = x[k]-x[i]
           yy = y[k]-y[i]
           zz = z[k]-z[i]
           xyz = sqrt(xx^2+yy^2+zz^2)

           A[i,k] = sqrt(xyz^2+c^2)
           A1[i,k] = xx/A[i,k]
           A2[i,k] = 1/A[i,k]-xx^2/A[i,k]^3
           B1[i,k] = yy/A[i,k]
           B2[i,k] = 1/A[i,k]-yy^2/A[i,k]^3
           C1[i,k] = zz/A[i,k]
           C2[i,k] = 1/A[i,k]-zz^2/A[i,k]^3

       end
   end
   return A,A1,B1,C1,A2,B2,C2
end


function fun2(data)
    x = data[:,1]; y = data[:,2]; z = data[:,3];o = zero(x); II = ones(size(x));
    a =   [II x y z x .^2 x.*y x.*z y .^2 y.*z z .^2 x .^3 x .^2 .*y x .^2 .*z x.*y .^2 x.*y.*z x.*z .^2 y .^3 y .^2 .*z y.*z .^2 z .^3]
    a1 =  [o II o o x*2 y z o o o 3*x .^2 2*x.*y 2*x.*z y .^2 y.*z z .^2 o o o o]
    b1 =  [o o II o o x o y*2 z o o x .^2 o x.*y*2 x.*z o 3*y .^2 2*y.*z z .^2 o]
    c1 =  [o o o II o o x o y z*2 o o x .^2 o x.*y x.*z*2 o y .^2 y.*z*2 3*z .^2]
    a2 = [o o o o II*2 o o o o o x*6 2*y 2*z o o o o o o o]
    b2 = [o o o o o o o II*2 o o o o o x*2 o o y*6 2*z o o]
    c2 = [o o o o o o o o o II*2 o o o o o x*2 o o y*2 z*6]

    return a,a1,b1,c1,a2,b2,c2
end

function main(st,data::Array{Float64,2},p::Array{Int64,1},m)
    N = length(data[:,1])
    W,W11,W12,W13,W21,W22,W23 = (zeros(N,N) for _ in 1:7)
    
    dd = zeros(20,20);
    for i in p
       id=st[i]
       t_data = data[id,:];
       A,A1,B1,C1,A2,B2,C2 = fun(t_data,m)
       a,a1,b1,c1,a2,b2,c2  = fun2(t_data)
       R = [A a; a' dd];
       ind = filter(j -> id[j] == i,axes(id, 1)); 
       L = [[A;a'][:,ind] [A1;a1'][:,ind] [B1;b1'][:,ind] [C1;c1'][:,ind] [A2;a2'][:,ind] [B2;b2'][:,ind] [C2;c2'][:,ind]];

       soln = (R\L)[1:length(t_data[:,1]),:]; 
       W[i,id] = soln[:,1]
       W11[i,id] = soln[:,2]
       W12[i,id] = soln[:,3]
       W13[i,id] = soln[:,4]
       W21[i,id] = soln[:,5]
       W22[i,id] = soln[:,6]
       W23[i,id] = soln[:,7]
    end
    return W,W11,W12,W13,W21,W22,W23
end

end
2 Likes

I agree. And without seeing the original python code it is a bit difficult to pin-point the problem. One possible explanation: In python the list-of-lists format might be used, which behaves “less worse” when growing sparse arrays. The scipy doc has a concise little overview when to prefer which format. scipy.sparse.lil_matrix — SciPy v1.8.1 Manual

1 Like

Thank you all for your time and help.

1 Like