When appllying roll rank function, how much faster Julia can be compared to Python?

Brian1 · March 18, 2022, 5:10am

In Python, I tried many ways to apply roll rank function to a list, the speed is too slow. When the list length is about 100000, the fastest way in Python I tried takes more than 100ms.

Then in Juila, I tried the similar way used in Python, It still takes 80ms to the calculation.

So, my question is, Is there some way I am missing in using Julia, or is it just a matter of Julia limitation?
Here is my test code (I test it in Julia 1.8-beta1):

partslice(x::Int,window::Int,step::Int)=begin
    partitionsize=div(x-window,step)+1+1
    res=Vector{NTuple{3,Int}}(undef,partitionsize)
    for i=1:partitionsize
        if i==1
            startloc=1
            endloc=window
            starti=1
        elseif i==partitionsize
            startloc=x-window+1
            endloc=x
            starti=res[i-1][2]-startloc+2
        else
            startloc=(i-1)step+1
            endloc=startloc+window-1
            starti=res[i-1][2]-startloc+2
        end
        res[i]=(startloc,endloc,starti)
    end
    res
end

function rollRank(arr::AbstractArray,starti::Int=1)
    res=Vector{Float32}(undef,size(arr))
    sorted_res=sort(arr[1:starti-1])
    for i=starti:size(res,1)
        iloc=searchsortedfirst(sorted_res,arr[i])
        insert!(sorted_res,iloc,arr[i])
        res[i]=100*(iloc-1)/size(sorted_res,1)
    end
    res[starti:end]
end

rollRankVec(arr::Vector,window::Int,n::Int)=begin
    res=Vector{Float32}(undef,size(arr))
    slicevec=partslice(size(arr,1),window,n)
    @inbounds for (startloc,endloc,starti) in slicevec
        res_=rollRank(@view(arr[startloc:endloc]),starti)
        slicerange=startloc+starti-1:endloc
        res[slicerange]=res_
    end
    res
end

#TEST:

testdata=rand(100_000)

#first time

@time rollRankVec(testdata,3000,1000);

#0.089189 seconds (965.58 k allocations: 18.778 MiB)

amrods · March 18, 2022, 5:37am

is sorted_res always the same size as res in rollRank()? perhaps you can initialize it with a given size rather than inserting elements.

Brian1 · March 18, 2022, 5:57am

Sorry, there is something wrong in the code,
I have changed sorted_res=[] to sorted_res=sort(arr[1:starti-1]).
As you said,

it is. But it can not be initialized, in the text inline, I use function

If I initialize sorted_res, this function wil return wired value:

sorted_res=Vector{Float32}(undef,1000)

searchsortedfirst(sorted_res,10)

#995

DNF · March 18, 2022, 5:58am

It’s a bit too much to look into at the moment, but this immediately caught my eye:

That’s a vector with an abstract element type. Make sure to make it concrete, as in Vector{NTuple{3, Int}}, or whatever is appropriate.

Repeated insert!s seems expensive. Is that necessary?

It’s not clear how you can assume that this should be Float32.

You’re kind of abusing short-form function notation. The idiomatic way is to write

function foo(...) 
    # code
end

DNF · March 18, 2022, 6:12am

BTW, would you mind explaining what these functions do? It’s easier to review code when you know what it’s supposed to be doing.

sylvaticus · March 18, 2022, 6:26am

Seems related to sorting a multidimensional arrays…

amrods · March 18, 2022, 6:26am

Wouldn’t it be even faster to work with an array of numbers rather than a vector of tuples?

Isn’t that the same when starti == 1 (which is the default value)?

Brian1 · March 18, 2022, 6:39am

Yes, I changed the

to more crrect way, but it seems has no fluence in performance.

So far, in my opinion, insert! is necessary.

Yeah I use Float32 because it nees less memory.

DNF · March 18, 2022, 6:42am

I don’t think so. Depends on what you are using it for. Vectors of tuples shouldn’t have any performance issues.

But vectors of an abstract tuple type is definitely bad.

It’s not the same, since the former creates a Vector{Any}, and the latter creates a correctly typed vector.

Brian1 · March 18, 2022, 6:56am

I am targeting to calculate the rank value compared to it’s preceedents, which is a subvector with fixed window.
For exsample:
I have a vector named myvec containing 100 elements. If I set the window 20 and step 10;

for i=1:20, I calculate percentofscore(myvec[i], myvec[1:i-1])
step 10 foreward, for i=20:30, I calculate percentofscore( myvec[i], myvec[10:i-1])
.
.
.
At last, concatenate these.

Brian1 · March 18, 2022, 7:03am

It is the same when starti==1, but starti is changing.

DNF · March 18, 2022, 7:33am

No, the types are different, as mentioned.

Brian1 · March 18, 2022, 8:17am

Yes, types diference.
But It seems still no performance influence. Even worse, I test it, it takes 180ms, more than Python. Where is wrong?

DNF · March 18, 2022, 8:47am

Did you time your code twice, to avoid timing compilation? Because I get this when running your code:

1.7.2> @time rollRankVec(testdata,3000,1000); # first run, includes compilation
  0.215553 seconds (386.24 k allocations: 28.302 MiB, 4.52% gc time, 88.10% compilation time)

1.7.2> @time rollRankVec(testdata,3000,1000); # second run
  0.028715 seconds (504 allocations: 8.202 MiB)

So, 28ms, which seems, well, I dunno, but less bad. Notice the difference in memory allocations, and that 88% of the time is compilation in the first run.

Brian1 · March 18, 2022, 9:10am

Yes, of course. The difference lies in computer. My computer is out of time. Your computer is wonderful, maybe it has the 12th generation cpu?

DNF · March 18, 2022, 9:29am

I’m confused. Here’s the timing you showed:

It uses 89ms, and it has a lot of allocations, which does not seem right. Is this not up to date? Can you show your latest timing?

Also, what version of Julia are you using?

Brian1 · March 18, 2022, 9:37am

It is maybe because the type instability you mentioned above. After correct that, the allocations output is the same to yours.

@time rollRankVec(testdata,3000,1000);
0.074555 seconds (504 allocations: 8.202 MiB)

The Julia version I used is 1.8.0-beta1

DNF · March 18, 2022, 9:50am

It looks to me like the most obvious problems are fixed. In order to speed this up, I think a bit more thorough work is needed. Re-using vectors instead of creating new ones all the time. Make sure that insert! doesn’t need to re-allocate many times.

Brian1 · March 18, 2022, 10:43am

Yes, Thanks.
It is surprising that your computer is twice faster than mine. Probably, the most direct way is just to upgrade the cpu of my computer. May I ask you what the cpu and os you are using in your computer?

DNF · March 18, 2022, 10:55am

I really hope there’s a better way than upgrading the computer

It’s a work laptop, a couple of years old: Intel(R) Core™ i7-9750H CPU @ 2.60GHz in a Dell Precision 5540. Thin and light, but quite pricey, I think.

Windows 10.

Topic		Replies	Views
ANN: JJ.jl -- library for J-like verb rank in Julia Package Announcements	9	1259	June 28, 2022
Ironic observation about `sort` and `sortperm` speed for "small integers" vs R Performance sort , sortperm , r	32	4666	February 4, 2018
How to find rankings of each tuple in the list of tuples? General Usage question	7	626	December 9, 2022
JuliaDB: efficiently rank subgroups Data	2	792	May 20, 2019
10x faster sortperm() Performance sortperm	10	2702	October 29, 2024

When appllying roll rank function, how much faster Julia can be compared to Python?

Related topics