Julia Beginner (from Python): Numba outperforms Julia in rewrite. Any tips to improve performance?

lmiq · August 15, 2021, 12:09am

Is it possible for you to post a working example of that function alone?

Edit:

function get_gains!(X, current_values, idxs, gains)
    @inbounds Threads.@threads for i in eachindex(idxs)
        s = 0.0
        for j in eachindex(current_values)
            s += @fastmath sqrt(current_values[j] + X[j, idxs[i]])
        end
        gains[i] = s
    end
end;

If that is the bottleneck, have you tried to use @tturbo in the inner (or outer) loops instead @threads? Or @turbo in the inner loop at least? Seems to me that what is being computed is too simple for standard threading to be useful. If the number of elements of current_values is large this is also probably an easy use case for a GPU accelerated map reduce.

Edit 2:

In [25]:
opt2 = LazyGreedy(X_digits);
res2 = @btime fit(opt2, k);
bench2 = @benchmark fit(opt2, k);
t2 = bench2.times[1]/1e9;
  13.836 s (22202 allocations: 8.95 GiB)

Those allocations, unless you know exactly where are they coming from, are an indication of a type instability. I find very hard to debug a notebook, but that may be an indication that the program can be much faster.

Topic		Replies	Views
General questions from Python user Performance	59	4600	March 8, 2021
Why is this simple function twice as slow as its Python version Performance question	97	4888	April 12, 2021
Quite bad performance of Julia 0.6.4 vs Python+Numpy General Usage	26	5408	November 13, 2018
Optimized Python is as good as Julia Performance question	28	27705	June 2, 2025
Julia Performance - Help Needed Performance question , python	40	3154	September 17, 2021

Julia Beginner (from Python): Numba outperforms Julia in rewrite. Any tips to improve performance?

Related topics