When appllying roll rank function, how much faster Julia can be compared to Python?

In Python, I tried many ways to apply roll rank function to a list, the speed is too slow. When the list length is about 100000, the fastest way in Python I tried takes more than 100ms.

Then in Juila, I tried the similar way used in Python, It still takes 80ms to the calculation.

So, my question is, Is there some way I am missing in using Julia, or is it just a matter of Julia limitation?
Here is my test code (I test it in Julia 1.8-beta1):

partslice(x::Int,window::Int,step::Int)=begin
    partitionsize=div(x-window,step)+1+1
    res=Vector{NTuple{3,Int}}(undef,partitionsize)
    for i=1:partitionsize
        if i==1
            startloc=1
            endloc=window
            starti=1
        elseif i==partitionsize
            startloc=x-window+1
            endloc=x
            starti=res[i-1][2]-startloc+2
        else
            startloc=(i-1)step+1
            endloc=startloc+window-1
            starti=res[i-1][2]-startloc+2
        end
        res[i]=(startloc,endloc,starti)
    end
    res
end

function rollRank(arr::AbstractArray,starti::Int=1)
    res=Vector{Float32}(undef,size(arr))
    sorted_res=sort(arr[1:starti-1])
    for i=starti:size(res,1)
        iloc=searchsortedfirst(sorted_res,arr[i])
        insert!(sorted_res,iloc,arr[i])
        res[i]=100*(iloc-1)/size(sorted_res,1)
    end
    res[starti:end]
end

rollRankVec(arr::Vector,window::Int,n::Int)=begin
    res=Vector{Float32}(undef,size(arr))
    slicevec=partslice(size(arr,1),window,n)
    @inbounds for (startloc,endloc,starti) in slicevec
        res_=rollRank(@view(arr[startloc:endloc]),starti)
        slicerange=startloc+starti-1:endloc
        res[slicerange]=res_
    end
    res
end

#TEST:

testdata=rand(100_000)

#first time

@time rollRankVec(testdata,3000,1000);

#0.089189 seconds (965.58 k allocations: 18.778 MiB)

is sorted_res always the same size as res in rollRank()? perhaps you can initialize it with a given size rather than inserting elements.

Sorry, there is something wrong in the code,
I have changed sorted_res=[] to sorted_res=sort(arr[1:starti-1]).
As you said,

it is. But it can not be initialized, in the text inline, I use function

If I initialize sorted_res, this function wil return wired value:

sorted_res=Vector{Float32}(undef,1000)

searchsortedfirst(sorted_res,10)

#995

It’s a bit too much to look into at the moment, but this immediately caught my eye:

That’s a vector with an abstract element type. Make sure to make it concrete, as in Vector{NTuple{3, Int}}, or whatever is appropriate.

Repeated insert!s seems expensive. Is that necessary?

It’s not clear how you can assume that this should be Float32.

You’re kind of abusing short-form function notation. The idiomatic way is to write

function foo(...) 
    # code
end
2 Likes

BTW, would you mind explaining what these functions do? It’s easier to review code when you know what it’s supposed to be doing.

3 Likes

Seems related to sorting a multidimensional arrays…

1 Like

Wouldn’t it be even faster to work with an array of numbers rather than a vector of tuples?

Isn’t that the same when starti == 1 (which is the default value)?

1 Like

Yes, I changed the

to more crrect way, but it seems has no fluence in performance.

So far, in my opinion, insert! is necessary.

Yeah I use Float32 because it nees less memory.

I don’t think so. Depends on what you are using it for. Vectors of tuples shouldn’t have any performance issues.

But vectors of an abstract tuple type is definitely bad.

It’s not the same, since the former creates a Vector{Any}, and the latter creates a correctly typed vector.

1 Like

I am targeting to calculate the rank value compared to it’s preceedents, which is a subvector with fixed window.
For exsample:
I have a vector named myvec containing 100 elements. If I set the window 20 and step 10;

  1. for i=1:20, I calculate percentofscore(myvec[i], myvec[1:i-1])
  2. step 10 foreward, for i=20:30, I calculate percentofscore( myvec[i], myvec[10:i-1])
    .
    .
    .
    At last, concatenate these.

It is the same when starti==1, but starti is changing.

No, the types are different, as mentioned.

1 Like

Yes, types diference.
But It seems still no performance influence. Even worse, I test it, it takes 180ms, more than Python. Where is wrong?

Did you time your code twice, to avoid timing compilation? Because I get this when running your code:

1.7.2> @time rollRankVec(testdata,3000,1000); # first run, includes compilation
  0.215553 seconds (386.24 k allocations: 28.302 MiB, 4.52% gc time, 88.10% compilation time)

1.7.2> @time rollRankVec(testdata,3000,1000); # second run
  0.028715 seconds (504 allocations: 8.202 MiB)

So, 28ms, which seems, well, I dunno, but less bad. Notice the difference in memory allocations, and that 88% of the time is compilation in the first run.

2 Likes

Yes, of course. The difference lies in computer. My computer is out of time. Your computer is wonderful, maybe it has the 12th generation cpu?

I’m confused. Here’s the timing you showed:

It uses 89ms, and it has a lot of allocations, which does not seem right. Is this not up to date? Can you show your latest timing?

Also, what version of Julia are you using?

It is maybe because the type instability you mentioned above. After correct that, the allocations output is the same to yours.

@time rollRankVec(testdata,3000,1000);
0.074555 seconds (504 allocations: 8.202 MiB)

The Julia version I used is 1.8.0-beta1

It looks to me like the most obvious problems are fixed. In order to speed this up, I think a bit more thorough work is needed. Re-using vectors instead of creating new ones all the time. Make sure that insert! doesn’t need to re-allocate many times.

1 Like

Yes, Thanks.
It is surprising that your computer is twice faster than mine. Probably, the most direct way is just to upgrade the cpu of my computer. May I ask you what the cpu and os you are using in your computer?

I really hope there’s a better way than upgrading the computer :smiley:

It’s a work laptop, a couple of years old: Intel(R) Core™ i7-9750H CPU @ 2.60GHz in a Dell Precision 5540. Thin and light, but quite pricey, I think.

Windows 10.