Speeding up some non-optimized Julia functions

lmiq · July 18, 2022, 10:57pm

As I mentioned in a previous comment, if you stick to the same structure of the code, you won’t benefit too much (or at all) from rewriting the code to Julia. One one side, if the function only performs simple scalar operations, like your fun, a @numba will probably accelerate the code reasonably well. If the function, on the other side, is written multiple vectorized operations, then writting the same vectorized operations in Julia won’t give you many benefits relative Python and Numpy. These are good software tools, of course.

If you were more used to Julia, you would probably be writing the code differently from start. I don’t know for sure how “optimized” the Python code is, also I don’t know how much it could benefit from a rewrite in the lines that we are suggesting here (avoiding unnecessary intermediate arrays mostly). But, from a first sight, it is very likely that your code can be tenths or hundreds of times faster following this rewrite.

Does that matter? Is it worth the learning effort? That depends. Sometimes it is the difference of being able to do something or not at all, sometimes is a huge gain in productivity. But sometimes if the code is fast “enough” for the purpose, you simply don’t need another tool at the moment.

Anyway, Julia doesn’t do magic. It allows programming in both high and low level styles, and optimizations beyond what you can do in Python. But there is a learning curve, of course.

VaclavMacha · July 19, 2022, 6:02am

Thanks, you are right. I will edit the post and remove @inbounds. I originally wrote the compute_L_R function with three Vector{<:Real} as inputs, but then I changed the input to AbstractMatrix …

thautwarm · July 19, 2022, 6:50am

Could you please provide the numba code? I’d like to offer help but without local benchmarking is painful…

lmiq · July 19, 2022, 1:15pm

If, after optimizations, constructing the inrange neighbor list becomes a bottleneck, you can try using CellListMap.neighborlist:

julia> N = 15
15

julia> PT = rand(N^3,3);

julia> ip =collect(1:N^3);

julia> xx,yy,zz =  PT[:,1], PT[:,2], PT[:,3]; data = [xx yy zz];

julia> r = 0.3
0.3

julia> x = data';

julia> function nn(x, r)
           balltree = BallTree(x)
           st = inrange(balltree, x, r, true)
       end
nn (generic function with 1 method)

julia> @btime nn($x, $r);
  51.729 ms (16457 allocations: 22.71 MiB)

julia> @btime CellListMap.neighborlist($x, $r, parallel = false);
  10.785 ms (361 allocations: 14.94 MiB)

julia> @btime CellListMap.neighborlist($x, $r, parallel = true); # 4 cores
  5.787 ms (3560 allocations: 38.32 MiB)

The output is a list of pairs and their distances:

julia> list = CellListMap.neighborlist(x, r)
451317-element Vector{Tuple{Int64, Int64, Float64}}:
 (1, 305, 0.18333348583592726)
 (1, 530, 0.10208305984055893)
 (1, 542, 0.0564371990772332)
 (1, 655, 0.27887521324793013)
 ⋮
 (2990, 2769, 0.25077868150174176)
 (2990, 2849, 0.27190146964427236)
 (3066, 2505, 0.27644153578133657)

O_Julia · July 19, 2022, 1:17pm

Thank you, I will consider that.

lmiq · July 19, 2022, 2:58pm

Now, since we are sad when we don’t understand performance differences …

Here I tested, as noted by @VaclavMacha, that only by changing spzeros by zeros the execution time is reduced by a factor of about 3.

Probably the main issue in comparison with the Python code is some use of SparseArrays which is not adequate.

Anyway, I leave here the code I tested. You can just run it with:

julia> include("file.jl")

julia> d = data()

julia> @time Bonder0.main(d...);  # original code
106.243666 seconds (865.12 k allocations: 195.326 GiB, 3.11% gc time)

julia> @time Bonder1.main(d...);  # just changing the spzeros to zeros
 30.016908 seconds (1.04 M allocations: 35.632 GiB, 1.90% gc time, 0.71% compilation time)

the code

using NearestNeighbors

function data()
    N = 15
    PT = rand(N^3,3);
    ip =collect(1:N^3);
    xx,yy,zz =  PT[:,1], PT[:,2], PT[:,3]; data = [xx yy zz];

    r = 0.3

    balltree = BallTree(data')
    st = inrange(balltree, data', r, true)  # https://github.com/KristofferC/NearestNeighbors.jl
    m = 0.5 

    return st, PT, ip, m
end

module Bonder0

using SparseArrays

function fun(data,c)
   x = data[:,1]; y = data[:,2]; z = data[:,3]; n = length(x);
   A,A1,B1,C1,A2,B2,C2 = zeros(n,n),zeros(n,n),zeros(n,n),zeros(n,n),zeros(n,n),zeros(n,n),zeros(n,n)

   for i in 1:n
         @inbounds for k in 1:n
           xx = x[k]-x[i]
           yy = y[k]-y[i]
           zz = z[k]-z[i]
           xyz = sqrt(xx^2+yy^2+zz^2)

           A[i,k] = sqrt(xyz^2+c^2)
           A1[i,k] = xx/A[i,k]
           A2[i,k] = 1/A[i,k]-xx^2/A[i,k]^3
           B1[i,k] = yy/A[i,k]
           B2[i,k] = 1/A[i,k]-yy^2/A[i,k]^3
           C1[i,k] = zz/A[i,k]
           C2[i,k] = 1/A[i,k]-zz^2/A[i,k]^3

       end
   end
   return A,A1,B1,C1,A2,B2,C2
end


function fun2(data)
    x = data[:,1]; y = data[:,2]; z = data[:,3];o = zero(x); II = ones(size(x));
    a =   [II x y z x .^2 x.*y x.*z y .^2 y.*z z .^2 x .^3 x .^2 .*y x .^2 .*z x.*y .^2 x.*y.*z x.*z .^2 y .^3 y .^2 .*z y.*z .^2 z .^3]
    a1 =  [o II o o x*2 y z o o o 3*x .^2 2*x.*y 2*x.*z y .^2 y.*z z .^2 o o o o]
    b1 =  [o o II o o x o y*2 z o o x .^2 o x.*y*2 x.*z o 3*y .^2 2*y.*z z .^2 o]
    c1 =  [o o o II o o x o y z*2 o o x .^2 o x.*y x.*z*2 o y .^2 y.*z*2 3*z .^2]
    a2 = [o o o o II*2 o o o o o x*6 2*y 2*z o o o o o o o]
    b2 = [o o o o o o o II*2 o o o o o x*2 o o y*6 2*z o o]
    c2 = [o o o o o o o o o II*2 o o o o o x*2 o o y*2 z*6]

    return a,a1,b1,c1,a2,b2,c2
end

function main(st,data::Array{Float64,2},p::Array{Int64,1},m)
    N = length(data[:,1])
    W,W11,W12,W13,W21,W22,W23 = spzeros(N,N),spzeros(N,N),spzeros(N,N),spzeros(N,N),spzeros(N,N),spzeros(N,N),spzeros(N,N)
    
    dd = zeros(20,20);
    @inbounds for i in p
       id=st[i]
       t_data = data[id,:];
       A,A1,B1,C1,A2,B2,C2 = fun(t_data,m)
       a,a1,b1,c1,a2,b2,c2  = fun2(t_data)
       R = [A a; a' dd];
       ind = filter(j -> id[j] == i,axes(id, 1)); 
       L = [[A;a'][:,ind] [A1;a1'][:,ind] [B1;b1'][:,ind] [C1;c1'][:,ind] [A2;a2'][:,ind] [B2;b2'][:,ind] [C2;c2'][:,ind]];

       soln = (R\L)[1:length(t_data[:,1]),:]; 
       W[i,id] = soln[:,1]
       W11[i,id] = soln[:,2]
       W12[i,id] = soln[:,3]
       W13[i,id] = soln[:,4]
       W21[i,id] = soln[:,5]
       W22[i,id] = soln[:,6]
       W23[i,id] = soln[:,7]
    end
    return W,W11,W12,W13,W21,W22,W23
end

end

module Bonder1


function fun(data,c)
   x = data[:,1]; y = data[:,2]; z = data[:,3]; n = length(x);
   A,A1,B1,C1,A2,B2,C2 = (zeros(n,n) for _ in 1:7)

   for i in 1:n
         @inbounds for k in 1:n
           xx = x[k]-x[i]
           yy = y[k]-y[i]
           zz = z[k]-z[i]
           xyz = sqrt(xx^2+yy^2+zz^2)

           A[i,k] = sqrt(xyz^2+c^2)
           A1[i,k] = xx/A[i,k]
           A2[i,k] = 1/A[i,k]-xx^2/A[i,k]^3
           B1[i,k] = yy/A[i,k]
           B2[i,k] = 1/A[i,k]-yy^2/A[i,k]^3
           C1[i,k] = zz/A[i,k]
           C2[i,k] = 1/A[i,k]-zz^2/A[i,k]^3

       end
   end
   return A,A1,B1,C1,A2,B2,C2
end


function fun2(data)
    x = data[:,1]; y = data[:,2]; z = data[:,3];o = zero(x); II = ones(size(x));
    a =   [II x y z x .^2 x.*y x.*z y .^2 y.*z z .^2 x .^3 x .^2 .*y x .^2 .*z x.*y .^2 x.*y.*z x.*z .^2 y .^3 y .^2 .*z y.*z .^2 z .^3]
    a1 =  [o II o o x*2 y z o o o 3*x .^2 2*x.*y 2*x.*z y .^2 y.*z z .^2 o o o o]
    b1 =  [o o II o o x o y*2 z o o x .^2 o x.*y*2 x.*z o 3*y .^2 2*y.*z z .^2 o]
    c1 =  [o o o II o o x o y z*2 o o x .^2 o x.*y x.*z*2 o y .^2 y.*z*2 3*z .^2]
    a2 = [o o o o II*2 o o o o o x*6 2*y 2*z o o o o o o o]
    b2 = [o o o o o o o II*2 o o o o o x*2 o o y*6 2*z o o]
    c2 = [o o o o o o o o o II*2 o o o o o x*2 o o y*2 z*6]

    return a,a1,b1,c1,a2,b2,c2
end

function main(st,data::Array{Float64,2},p::Array{Int64,1},m)
    N = length(data[:,1])
    W,W11,W12,W13,W21,W22,W23 = (zeros(N,N) for _ in 1:7)
    
    dd = zeros(20,20);
    for i in p
       id=st[i]
       t_data = data[id,:];
       A,A1,B1,C1,A2,B2,C2 = fun(t_data,m)
       a,a1,b1,c1,a2,b2,c2  = fun2(t_data)
       R = [A a; a' dd];
       ind = filter(j -> id[j] == i,axes(id, 1)); 
       L = [[A;a'][:,ind] [A1;a1'][:,ind] [B1;b1'][:,ind] [C1;c1'][:,ind] [A2;a2'][:,ind] [B2;b2'][:,ind] [C2;c2'][:,ind]];

       soln = (R\L)[1:length(t_data[:,1]),:]; 
       W[i,id] = soln[:,1]
       W11[i,id] = soln[:,2]
       W12[i,id] = soln[:,3]
       W13[i,id] = soln[:,4]
       W21[i,id] = soln[:,5]
       W22[i,id] = soln[:,6]
       W23[i,id] = soln[:,7]
    end
    return W,W11,W12,W13,W21,W22,W23
end

end

aegger · July 19, 2022, 3:17pm

I agree. And without seeing the original python code it is a bit difficult to pin-point the problem. One possible explanation: In python the list-of-lists format might be used, which behaves “less worse” when growing sparse arrays. The scipy doc has a concise little overview when to prefer which format. scipy.sparse.lil_matrix — SciPy v1.8.1 Manual

O_Julia · July 19, 2022, 7:09pm

Thank you all for your time and help.

Topic		Replies	Views
My julia code is somehow much slower than the matlab code New to Julia question , performance , matlab	55	4254	December 30, 2022
My Julia code is slower than Python and Matlab Performance question	26	1108	March 4, 2025
Matlab's matmul much faster than julia's New to Julia	6	686	April 5, 2024
Matlab versus Julia General Usage	33	4837	July 15, 2021
Optimization, from MATLAB to Julia Optimization (Mathematical) matlab , optimization	10	863	April 13, 2024

Speeding up some non-optimized Julia functions

Related topics