Julia Binary Heap Benchmark

Hey Community.

It seems possible that Julia will get adopted at Codeforces. To make that happen, I need to provide an implementation of this binary heap benchmark.

My Julia code is quite ugly, because I had to use Int32 to shave off some milliseconds. Since integer literals are apparently always Int64 (on 64-bit systems), I had to pass the Int32 constants into the functions.
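To illustrate the literal issue: a minimal sketch (not from the original post) showing that integer literals default to `Int` and that mixing them with `Int32` values promotes back to 64 bits.

```julia
# Integer literals are Int (Int64 on a 64-bit machine), so arithmetic
# that mixes an Int32 with a literal promotes the result back to Int64.
x = Int32(5)

typeof(1)             # Int (Int64 on 64-bit systems)
typeof(x + 1)         # Int -- the Int32 operand is promoted
typeof(x + Int32(1))  # Int32 -- stays 32-bit
```

This is why the original code passes `Int32(1)` and `Int32(2)` around instead of writing `1` and `2` inline.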

My question is: How can this code be improved? Right now we are on par with the Java implementation and about 25% above the C++ implementation.

(BTW, I fixed a few things in the Java reference: I added a warmup, initialized the array inside the run, and avoided final static to get more speed. So the reference implementation you find in the above link might be a bit slower.)


And because new users can only put 2 links in a post, here is the actual code I have so far.

EDIT: What this is doing is basically bringing the array out of order first, and then fixing it.

From a performance standpoint, this mostly looks good. However, I would probably write it like this, which imo looks much cleaner.

const i32_1 = Int32(1)
const i32_2 = Int32(2)
function pushdown!(h::Vector{Int32}, pos, n)
    @inbounds while pos*i32_2 <= n
        j = pos*i32_2
        if j+i32_1 <= n && h[j+i32_1] > h[j]
            j += i32_1
        end
        if h[pos] >= h[j]
            break
        else
            h[pos], h[j] = h[j], h[pos]
            pos = j
        end
    end
end

I’m also not sure that all of your Int32 stuff is worthwhile. Is this slower?

function pushdown!(h::Vector{Int32}, pos, n)
    @inbounds while pos*2 <= n
        j = pos*2
        if j+1 <= n && h[j+1] > h[j]
            j += 1
        end
        if h[pos] >= h[j]
            break
        else
            h[pos], h[j] = h[j], h[pos]
            pos = j
        end
    end
end

I think it should be the same (or maybe a little faster).


Hey @Oscar_Smith, thanks for the very fast reply.

Have you benchmarked your code? On my machine, your first version takes 40% longer than my version. Not using Int32 increases this to +47%. Maybe I am doing something wrong?

EDIT: The issue is the const declaration. If I pass the two variables into your function, the speed is the same.

I’m definitely not seeing that. I’m seeing my second version as faster than yours. That said, I also changed the benchmark to do all of its operations on Int rather than Int32 (except for the heap elements). The reason for this is that array indices are always Int (that doesn’t depend on the element type of the array). As such, things like h[pos] should be slower when pos is an Int32.
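A quick way to check this at the REPL (a minimal sketch, not from the thread): indexing functions on a `Vector` work in `Int` regardless of the element type, and a non-`Int` index gets converted.

```julia
# Indexing is always in Int, even for a Vector{Int32}:
h = collect(Int32, 1:10)

typeof(firstindex(h))       # Int, not Int32
eltype(eachindex(h))        # Int -- the index range iterates Ints
h[Int32(3)] == h[3]         # true; the Int32 index is converted to Int
```

So keeping `pos`, `j`, and `n` as `Int` avoids repeated `Int32`-to-`Int` conversions at every array access, while the heap *elements* can stay `Int32` for memory bandwidth.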

That sounds great. Could you please link the whole script so I can run and compare it? I am probably calling your function in another way than you are.

function pushdown2!(h::Vector{Int32}, pos, n)
    @inbounds while pos*2 <= n
        j = pos*2
        if j+1 <= n && h[j+1] > h[j]
            j += 1
        end
        if h[pos] >= h[j]
            break
        else
            h[pos], h[j] = h[j], h[pos]
            pos = j
        end
    end
end
function bench2(N)
    h = collect(Int32, 0:N-1)

    for i = div(N, 2)+1:-1:1
        pushdown2!(h, i, N)
    end

    n = N + 1
    while n > 2
        n -= 1
        @inbounds h[1], h[n] = h[n], h[1]
        pushdown2!(h, 1, n-1)
    end
    for i = 0:N-1
        @inbounds @assert i == h[i+1]
    end
    h
end

using BenchmarkTools  # provides @btime
heap = @btime bench2(10_000_000);

So yes, when I called your functions, there were conversions going on. However, your functions are still slower than the initial setup. Here is the script I ran.

heap = @btime bench_ndrsh();
heap = @btime bench_oscar1(10_000_000)
heap = @btime bench_oscar2(10_000_000)

gives me

  786.433 ms (2 allocations: 38.15 MiB)
  870.576 ms (2 allocations: 38.15 MiB)
  870.140 ms (2 allocations: 38.15 MiB)

Can you confirm that running this script gives different results on your machine?

I got the same timing here with Int64 (or Int, or after removing the element type constraint on h in pushdown2! completely).

Edit: I can confirm that @Oscar_Smith’s version is faster than @ndrsh’s (the original post, not the latest post), but I used quite an exotic system (a Synology DiskStation).


Are you allowed to use multithreading?
You can also try @avx (from LoopVectorization.jl).


Single core only, and no external libraries allowed. (Apart from that, I don’t see how multithreading would even work here logically: iterations of the inner loop affect later iterations, so there are happens-before relations.)

Anyway, if anybody can provide a script that runs faster on a standard CPU (they are running an i5-3470), I’d be happy; otherwise don’t worry. I’ll just submit it, it’s not a big deal anyway.

Thank you.

Are you launching Julia with -O3?

Btw. this is what I get on my MacBook (i5):

░ tamasgal@greybox.local:~/tmp
░ 11:05:49 1 > g++ -O3 heap.cpp && ./a.out
Done in 825

░ tamasgal@greybox.local:~/tmp
░ 11:05:56 > julia -O3 heap.jl
ndrsh
  824.606 ms (2 allocations: 38.15 MiB)
oscar
  920.897 ms (2 allocations: 38.15 MiB)

and on my workhorse (Intel® Xeon® Gold 6246R CPU @ 3.40GHz - 32 cores):

tgal@pi1080:~/tmp/binary-heap-benchmark/cpp$ g++ -O3 heap.cpp && ./a.out
Done in 526
tgal@pi1080:~/tmp/binary-heap-benchmark/cpp$ cd ../julia/
tgal@pi1080:~/tmp/binary-heap-benchmark/julia$ julia -O3 heap.jl
ndrsh
  595.149 ms (2 allocations: 38.15 MiB)
oscar
  586.633 ms (2 allocations: 38.15 MiB)

@tamasgal Thanks so much for double-checking. I tried both -O3 and -O2, but on my machine this makes almost no difference (I can’t tell for sure because the variance is too high). But this confirms that everybody in this thread was right; it just depends on the machine. And since Codeforces is benchmarking with an i5, I will submit my solution. Let’s hope it gets accepted! 🙂

Thanks everybody!
