# When applying a rolling rank function, how much faster can Julia be compared to Python?

In Python, I tried many ways to apply a rolling rank function to a list, but the speed is too slow. When the list length is about 100,000, the fastest approach I tried in Python takes more than 100 ms.

Then in Julia, I tried a similar approach to the one I used in Python, and it still takes 80 ms for the calculation.

So, my question is: is there some way I am missing in using Julia, or is it just a Julia limitation?
Here is my test code (tested on Julia 1.8-beta1):

```julia
partslice(x::Int, window::Int, step::Int) = begin
    partitionsize = div(x - window, step) + 1 + 1
    res = Vector{NTuple{3,Int}}(undef, partitionsize)
    for i = 1:partitionsize
        if i == 1
            startloc = 1
            endloc = window
            starti = 1
        elseif i == partitionsize
            startloc = x - window + 1
            endloc = x
            starti = res[i-1][2] - startloc + 2
        else
            startloc = (i - 1) * step + 1
            endloc = startloc + window - 1
            starti = res[i-1][2] - startloc + 2
        end
        res[i] = (startloc, endloc, starti)
    end
    res
end

function rollRank(arr::AbstractArray, starti::Int=1)
    res = Vector{Float32}(undef, size(arr))
    sorted_res = sort(arr[1:starti-1])
    for i = starti:size(res, 1)
        iloc = searchsortedfirst(sorted_res, arr[i])
        insert!(sorted_res, iloc, arr[i])
        res[i] = 100 * (iloc - 1) / size(sorted_res, 1)
    end
    res[starti:end]
end

rollRankVec(arr::Vector, window::Int, n::Int) = begin
    res = Vector{Float32}(undef, size(arr))
    slicevec = partslice(size(arr, 1), window, n)
    @inbounds for (startloc, endloc, starti) in slicevec
        res_ = rollRank(@view(arr[startloc:endloc]), starti)
        slicerange = startloc+starti-1:endloc
        res[slicerange] = res_
    end
    res
end

# TEST:

testdata = rand(100_000)

# first time
@time rollRankVec(testdata, 3000, 1000);
# 0.089189 seconds (965.58 k allocations: 18.778 MiB)
```

Is `sorted_res` always the same size as `res` in `rollRank()`? Perhaps you could initialize it with a given size rather than inserting elements.

Sorry, there was something wrong in the code;
I have changed `sorted_res=[]` to `sorted_res=sort(arr[1:starti-1])`.

As you said, it is. But it cannot simply be preallocated: if I initialize `sorted_res` up front, `searchsortedfirst` returns a weird value:

```julia
sorted_res = Vector{Float32}(undef, 1000)

searchsortedfirst(sorted_res, 10)
# 995
```
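The `undef` constructor only allocates storage; its contents are arbitrary memory, so the binary search above runs over garbage values. If the intent is just to reserve capacity, one possible alternative (a sketch, not code from this thread) is `sizehint!` on an empty vector:

```julia
# Reserve capacity without creating bogus elements:
sorted_res = sizehint!(Float64[], 1000)  # empty, but capacity reserved
searchsortedfirst(sorted_res, 10)        # 1: an empty vector has no smaller element
```

Subsequent `insert!` calls then grow the vector into already-reserved storage.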

It's a bit too much to look into at the moment, but this immediately caught my eye:

That's a vector with an abstract element type. Make sure to make it concrete, as in `Vector{NTuple{3, Int}}`, or whatever is appropriate.

Repeated `insert!`s seem expensive. Is that necessary?

It's not clear how you can assume that this should be `Float32`.

You're kind of abusing short-form function notation. The idiomatic way is to write

```julia
function foo(...)
    # code
end
```

BTW, would you mind explaining what these functions do? It's easier to review code when you know what it's supposed to be doing.


Seems related to sorting multidimensional arrays…


Wouldn't it be even faster to work with an array of numbers rather than a vector of tuples?

Isn't that the same when `starti == 1` (which is the default value)?


Yes, I changed that to a more correct form, but it seems to have no influence on performance.

So far, in my opinion, `insert!` is necessary.

Yeah, I use `Float32` because it needs less memory.

I don't think so. It depends on what you are using it for. Vectors of tuples shouldn't have any performance issues.

But a vector of an abstract tuple type is definitely bad.

It's not the same, since the former creates a `Vector{Any}`, and the latter creates a correctly typed vector.
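A quick way to see the difference, comparing the untyped and the concretely typed `Vector` constructors (a sketch):

```julia
# The untyped constructor produces an abstract element type:
a = Vector(undef, 3)
eltype(a)   # Any

# The concretely typed constructor:
b = Vector{NTuple{3,Int}}(undef, 3)
eltype(b)   # NTuple{3, Int64}
```

With `eltype(a) == Any`, every element access boxes and dispatches dynamically, which is what hurts performance.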


I am trying to calculate each element's rank relative to its predecessors within a fixed window.
For example, say I have a vector named `myvec` containing 100 elements, and I set the window to 20 and the step to 10:

1. For `i = 1:20`, I calculate `percentofscore(myvec[i], myvec[1:i-1])`.
2. Stepping 10 forward, for `i = 20:30`, I calculate `percentofscore(myvec[i], myvec[10:i-1])`.
…

At last, I concatenate these.
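If I understand correctly, a naive per-element version of this (ignoring the step-wise window bookkeeping used for efficiency, and with `percentofscore` as a hypothetical helper returning the percentage of prior elements strictly below the current one) might look like:

```julia
# Hypothetical helper: percentage of `prior` elements strictly below `x`.
percentofscore(x, prior) =
    isempty(prior) ? 0.0 : 100 * count(<(x), prior) / length(prior)

# Naive sketch: rank each element against up to `window - 1` of its
# immediate predecessors (simplified; no step-wise partitioning).
function rollrank_naive(v::Vector, window::Int)
    res = similar(v, Float64)
    for i in eachindex(v)
        lo = max(1, i - window + 1)
        res[i] = percentofscore(v[i], @view v[lo:i-1])
    end
    res
end
```

This re-scans the window for every element, so it is O(n·window); the thread's `partslice`/`insert!` approach amortizes that by keeping a sorted buffer per window.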

It is the same when `starti == 1`, but `starti` changes from window to window.

No, the types are different, as mentioned.


Yes, the types differ.
But it still seems to make no difference in performance. Even worse, when I test it, it takes 180 ms, more than Python. What is wrong?

Did you time your code twice, to avoid timing compilation? Because I get this when running your code:

```julia
1.7.2> @time rollRankVec(testdata,3000,1000); # first run, includes compilation
0.215553 seconds (386.24 k allocations: 28.302 MiB, 4.52% gc time, 88.10% compilation time)

1.7.2> @time rollRankVec(testdata,3000,1000); # second run
0.028715 seconds (504 allocations: 8.202 MiB)
```

So, 28ms, which seems, well, I dunno, but less bad. Notice the difference in memory allocations, and that 88% of the time is compilation in the first run.


Yes, of course. The difference lies in the computer: mine is out of date. Your computer is wonderful; maybe it has a 12th-generation CPU?

I'm confused. Here's the timing you showed:

It uses 89ms, and it has a lot of allocations, which does not seem right. Is this not up to date? Can you show your latest timing?

Also, what version of Julia are you using?

It is maybe because of the type instability you mentioned above. After correcting that, the allocations output is the same as yours.

```julia
@time rollRankVec(testdata,3000,1000);
0.074555 seconds (504 allocations: 8.202 MiB)
```

The Julia version I used is 1.8.0-beta1.

It looks to me like the most obvious problems are fixed. To speed this up further, I think a bit more thorough work is needed: re-use vectors instead of creating new ones all the time, and make sure that `insert!` doesn't need to re-allocate many times.
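One way that advice could be applied (a hypothetical sketch, not the thread's final code): keep a single buffer alive across windows, reserve its capacity once with `sizehint!`, and `empty!` it between windows, so `insert!` rarely has to grow the underlying storage:

```julia
# Reuse one sorted buffer across windows so insert! seldom reallocates.
function rollrank_window!(res, buf, arr, starti)
    empty!(buf)                        # drop contents, keep reserved capacity
    append!(buf, @view arr[1:starti-1])
    sort!(buf)
    for i in starti:length(arr)
        iloc = searchsortedfirst(buf, arr[i])
        insert!(buf, iloc, arr[i])     # capacity already reserved: no realloc
        res[i] = 100 * (iloc - 1) / length(buf)
    end
    @view res[starti:end]
end

# Reserve capacity once, outside the per-window loop:
buf = sizehint!(Float64[], 3000)
```

The output vector `res` can likewise be allocated once by the caller and written into in place, which removes the per-window allocations entirely.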


Yes, thanks.
It is surprising that your computer is twice as fast as mine. Probably the most direct way is just to upgrade my CPU. May I ask what CPU and OS you are using?

I really hope there's a better way than upgrading the computer.

It's a work laptop, a couple of years old: an Intel(R) Core™ i7-9750H CPU @ 2.60GHz in a Dell Precision 5540. Thin and light, but quite pricey, I think.

Windows 10.