Also, partialsort dispatches between partial-scratch-quicksort (good for long inputs) and insertionsort of the whole input (good for short inputs), but it does not use partial-insertionsort, which wins when the number of elements being selected is small (asymptotically, when that number is less than log(length)).
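For concreteness, here is a minimal sketch of what a partial-insertionsort can look like; this is not Base’s implementation, just an illustration (the function name is hypothetical). After the call, v[1:k] holds the k smallest elements in sorted order, and elements that cannot enter the prefix cost only one comparison each:

# Hypothetical sketch of partial-insertionsort, not Base's actual code.
function partial_insertionsort!(v::AbstractVector, k::Integer)
    for i in 2:lastindex(v)
        x = v[i]
        if i > k
            x < v[k] || continue  # cannot enter the smallest-k prefix
            v[i] = v[k]           # evict the current k-th smallest
            j = k
        else
            j = i
        end
        # standard insertion step within the sorted prefix
        while j > 1 && x < v[j-1]
            v[j] = v[j-1]
            j -= 1
        end
        v[j] = x
    end
    return v
end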
The underlying data (used for the leaderboard) changes from run to run, and it seems to affect the results regardless of the underlying implementations.
For example, Zig was not updated in 4 days and still jumped to the top in the current run (the same implementation did worse than Java (GraalVM) a day earlier).
Maybe a better way to optimize is to use the random post generator and evaluate improvements by averaging the results over multiple randomly generated datasets.
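A minimal sketch of what that could look like (generate_posts and related are hypothetical stand-ins for the repo’s generator and solution):

# Hypothetical sketch: average timings over several freshly generated datasets.
using Statistics

function benchmark_avg(nruns::Int = 10; nposts::Int = 5_000)
    times = map(1:nruns) do seed
        posts = generate_posts(nposts; seed)  # new random dataset each run
        @elapsed related(posts)               # time one full pass
    end
    return (; mean = mean(times), std = std(times))
end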
Leaving the leaderboard aside - I find this kind of real-world benchmark more valuable than micro-benchmarking (although, admittedly, we are not always comparing apples to apples here: everybody is free to use whatever algorithmic tricks they like, so the implementations end up differing from language to language).
This was not merged yet - so it still has a large total time value.
As an alternative to the highly optimized solution built out of manual loops, I’d like to propose a solution using higher-level data manipulation tools.
It’s about 2x slower for me than the code in the repo, but it’s shorter and it’s easier to understand what’s actually going on, once you get used to each primitive.
It uses NamedTuples instead of custom structs.
using DataManipulation, StructArrays
using Accessors  # for getproperties below; may already be re-exported by DataManipulation

function related_2(posts_raw)
    # arguably the processing is more convenient with namedtuples instead of custom structures;
    # converting here, but potentially this could be created when reading in the first place:
    posts = @p posts_raw |> map(Accessors.getproperties) |> StructArray |> @insert __.ix = eachindex(__)

    # map each tag to the indices of all posts carrying it
    tagmap = @p let
        posts
        flatmap(_.tags, (p, tag) -> (; p.ix, tag))
        group(_.tag)
        map(map(x -> x.ix, _))
    end

    # reusable buffer: nsharedtags[i] = number of tags post i shares with the current post
    nsharedtags = zeros(Int, length(posts))
    @p let
        posts
        map() do post
            nsharedtags .= 0
            for t in post.tags
                @views nsharedtags[tagmap[t]] .+= 1
            end
            nsharedtags[post.ix] = 0  # a post is not related to itself
            related = partialsort(posts, 1:5; by=p -> nsharedtags[p.ix], rev=true)
            (; post..., related)
        end
        collect
    end
end
# code above works without these definitions,
# but Base.partialsort is muuuch slower
using DataStructures: nlargest
partialsort(v, k; kwargs...) = partialsort(v, Base.OneTo(k); kwargs...)
partialsort(v, k::Base.OneTo; by, rev=false) = @p v |> nlargest(length(k), eachindex(__); by=i -> by(__[i])) |> map(v[_])
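For illustration, a minimal hypothetical usage with in-memory NamedTuples standing in for the benchmark’s parsed posts (any element type with a tags field should work; nlargest simply returns fewer than 5 results when there aren’t enough posts):

posts_raw = [(title = "a", tags = ["x", "y"]),
             (title = "b", tags = ["y", "z"]),
             (title = "c", tags = ["z", "x"])]
related_2(posts_raw)  # each element gains an `ix` and a `related` field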
Julia went from last to the top (though it is no longer the fastest). It’s not news that Julia code can be optimized (and the now-optimized version isn’t as scalable as e.g. Go’s either).
What I would be curious about: now that we know the problem with the unoptimized code, could it be made faster by improvements to Julia itself, i.e. with no changes to the code/benchmark? I’m ok with it, since I know about it, but dynamic code is maybe slower than it needs to be, and at least possibly surprising to some (new) users?
It’s sort of good that unoptimized code is not too fast… then people would “know”, or not. Unoptimized code can’t of course be made as fast, but what would be a reasonable margin?
Do you mean LittleDict? It’s often much faster if your Dict is small. But note that its lookups are O(n), which is why it’s not the default (it might still be fine for Julia itself, i.e. for what Julia needs when e.g. compiling?).
I did think up a hybrid, LittleDict plus a regular Dict, that I or someone should implement; then lookups are back to O(1): scan a bounded number of entries in the LittleDict part first, then fall back to the regular Dict. It could be ok for Base, or not…
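A minimal sketch of that hybrid, assuming a fixed cutoff for the linearly scanned part (HybridDict and its fields are hypothetical names, not an existing package type):

# Hypothetical hybrid: a LittleDict for the first `cutoff` entries (fast linear
# scan), spilling into a regular Dict so lookups stay O(1) overall.
using OrderedCollections: LittleDict

struct HybridDict{K,V}
    small::LittleDict{K,V}
    big::Dict{K,V}
    cutoff::Int
end
HybridDict{K,V}(; cutoff = 32) where {K,V} =
    HybridDict{K,V}(LittleDict{K,V}(), Dict{K,V}(), cutoff)

function Base.getindex(h::HybridDict, k)
    # scan the small part first, then fall back to the hash table
    haskey(h.small, k) ? h.small[k] : h.big[k]
end

function Base.setindex!(h::HybridDict, v, k)
    if haskey(h.small, k) || length(h.small) < h.cutoff
        h.small[k] = v
    else
        h.big[k] = v
    end
    return h
end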
That’s not the same thing, it seems, though I guess it’s helpful for something.
I tried some other approaches, especially the one Go took, which is to insert at the correct location instead of doing the maxv[j-1], maxv[j] dance every time there’s a new top-5 candidate:
function fastmaxindex!(xs::Vector{Int64}, topn, maxn, maxv)
    # maxv holds the current top-`topn` values in descending order, maxn their indices
    maxn .= 1
    maxv .= 0
    _min = 0
    for (i, x) in enumerate(xs)
        x <= _min && continue
        # position where x belongs in the descending top list
        pos = findfirst(<(x), maxv)
        # shift the tail down one slot, back to front, so each value moves exactly once
        for j in topn-1:-1:pos
            maxv[j+1] = maxv[j]
            maxn[j+1] = maxn[j]
        end
        maxv[pos] = x
        maxn[pos] = i
        _min = last(maxv)
    end
    return maxn
end
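For reference, a small usage example (the buffers are preallocated and reused across calls, which is the point of the in-place version):

counts = [3, 0, 7, 1, 5, 2, 9, 4]
maxn = ones(Int, 5)
maxv = zeros(Int, 5)
fastmaxindex!(counts, 5, maxn, maxv)  # returns maxn == [7, 3, 5, 8, 1]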
It’s actually in OrderedCollections.jl (and thus also in DataStructures.jl, since the latter includes and adds to it), so that smaller dependency may be all you need, at least at one point it was.
It’s also faster to load:
julia> @time import OrderedCollections
0.020884 seconds (21.83 k allocations: 1.599 MiB)
julia> @time using DataStructures
0.109690 seconds (55.32 k allocations: 4.020 MiB)
Both are slower still if you load them with using rather than import, which is not something I had considered before. So I’ll use import from now on, unless people think you somehow make up for it later if you use using…
@Palli, I would say that the original (the 600ms one) was just a straightforward implementation without any kind of type-instability. Maybe there was some unnecessary allocation (I don’t remember). And I remember that the original implementation actually used partialsortperm! (which was the sensible thing to do).
However - the point is that one might expect to get reasonable performance out of the box, without the need to start implementing manual stuff like I did with fastmaxindex!.
For example, the Go version seems pretty straightforward without needing acrobatics for a decent speed.
So we brought Julia to the top - but now what? One can honestly ask: what was the development-time cost of getting a performant solution for a straightforward, real-world task?
This is in no way a critique (and even less finger-pointing) - just trying to be realistic here.