A. S. To prevent any confusion: at the time this was originally posted, Julia’s run took about 600ms and was ~20x slower than Rust. This is no longer the case.
Resuming the original post…
I stumbled upon this funny benchmark with Julia at the bottom of the list.
They (whoever wrote that code - I wasn’t curious to see the git blame) didn’t even account for compiling time.
If somebody has some time, they might be interested in fixing this (I am not even sure that the Julia code is doing only the work in the benchmark description).
Context: the F# version was also unfairly represented, and somebody fixed it (a little overkill there with some unidiomatic F# - but it is fast).
However, the benchmark doesn’t seem concerned with comparing apples to apples - anybody can jump on it and propose an implementation as long as they do not break the benchmark rules.
Yes - I did the update. It’s pretty straightforward and naive - but the goal was to replace the weird/initial implementation that took about 0.6 seconds.
I guess if someone gripes about calling the function once just to compile it before timing a second call, the warm-up call could be replaced with a precompile, if that works as well. But the typical tendency of JIT compilation to creep into single-run benchmarks has already been adequately demonstrated, and that precompile wouldn’t fix this setup.
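For illustration, a minimal sketch of that idea — `work` here is a hypothetical stand-in for the benchmark’s entry point, not the actual benchmark code:

```julia
# Hypothetical stand-in for the benchmark's entry point.
work(xs::Vector{Int}) = sum(xs)

# Compile the specific method ahead of the timed call, without executing it.
precompile(work, (Vector{Int},))

# The first timed call now skips (most of) the compilation cost.
t = @elapsed work(collect(1:1000))
```

Whether this is acceptable depends on how the benchmark rules count warm-up versus compilation.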
If you want to fix the second column (the one including I/O, which times `julia script.jl` from the outside), you’d need AOT compilation, I guess. Not sure it’s worth the effort…
I’m the author of the initial Julia implementations. (Sorry for finishing last!)
My goal wasn’t to write the fastest code but to stay as close as possible to the plain Python and NumPy implementations, because Python is my main language at work. If you’re curious, the Julia version was 3x faster than plain Python and 1.5x faster than NumPy.
Since then the repo has blown up out of proportion, and I wanted to write a proper version.
I was able to push it to about 70ms on my machine with this code:
But I think I’m already outclassed by the julia wizards of this forum.
In any case, to fix the second column, maybe a tool like DaemonMode.jl could do the job?
Not sure if it’s worth it, and it may be considered cheating.
I hope there are no hard feelings on your end. I wasn’t curious to check the git blame, since my only concern was seeing Julia join the Rust/Go/Zig club. Objectively, we can agree that the implementation was not close to the Rust level (and you had the freedom to create whatever implementation you saw fit, according to your own objectives - you had no obligation to satisfy me or anybody else).
Also - since the benchmark is not some micro-benchmark but actually resembles a scenario likely to be encountered in real-world data processing, it was somewhat painful to see Julia running 20x slower than Rust.
But this was my subjective experience - significant enough that I did something about it. I’m waiting for the next benchmark run to reflect the latest changes, but I estimate that Julia will take second place. And that is enough for my personal satisfaction.
However, it would be nice to see somebody from the wizard class getting their hands dirty and doing some wonders (while also making sure to stay within the boundaries set by the benchmark rules).
Looking at the new code, it was a bit long. Here is a shorter version that tries to be both short and fast (it gets close to the current version in speed, but still skips some of that version’s optimizations):
using JSON3
using Dates
using DataStructures

struct PostData
    _id::String
    title::String
    tags::Vector{String}
end

struct RelatedPost
    _id::String
    tags::Vector{String}
    related::SubArray{PostData, 1, Vector{PostData}, Tuple{Vector{Int64}}, false}
end

function relatedIO()
    posts = JSON3.read("posts.json", Vector{PostData})

    start = now()
    all_related_posts = related(posts)
    println("Processing time (w/o IO): $(now() - start)")

    open("../related_posts_julia.json", "w") do f
        JSON3.write(f, all_related_posts)
    end
end

function related(posts, topn = 5)
    # map each tag to the indices of the posts carrying it
    tagmap = Dict{String,Vector{Int}}()
    for (idx, post) in enumerate(posts), tag in post.tags
        tagmap[tag] = push!(get(() -> Int[], tagmap, tag), idx)
    end

    scratch = zeros(Int, length(posts))
    scratchbase = Ref(0)
    return map(enumerate(posts)) do (i, post)
        for t in post.tags, x in tagmap[t]
            x == i && continue
            # entries below scratchbase count as zero, so scratch never needs resetting
            scratch[x] = ifelse(scratch[x] < scratchbase[], scratchbase[], scratch[x]) + 1
        end
        tops = nlargest(topn, 1:length(scratch); by = x -> scratch[x])
        scratchbase[] = scratch[tops[1]]
        RelatedPost(post._id, post.tags, @view posts[tops])
    end
end

const res = relatedIO()
If anyone wants to try and make it nicer/faster…
A new speed trick here is to avoid zeroing the scratch array: new counters simply start from a higher scratchbase.
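A minimal sketch of that trick, simplified from the code above: instead of a `fill!(scratch, 0)` before each post, any value below `base` is treated as zero, and `base` is raised past the previous round’s maximum.

```julia
# Simplified illustration of the no-zeroing counting trick.
function count_with_base!(scratch, base, hits)
    for x in hits
        # stale entries (< base) restart from base; fresh ones keep accumulating
        scratch[x] = ifelse(scratch[x] < base, base, scratch[x]) + 1
    end
    return scratch
end

scratch = zeros(Int, 4)
count_with_base!(scratch, 0, [1, 2, 2])   # scratch == [1, 2, 0, 0]
base = maximum(scratch)                   # 2 — raise base instead of zeroing
count_with_base!(scratch, base, [1, 3])   # scratch .- base gives the new counts
```

After the second round, `scratch .- base` recovers the effective counts; anything at or below zero means "no hits this round".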
I solved the StrideArrays mishap, and the latest PR was reverted.
The previous non-StrideArrays version was not bad - but the performance gain after StrideArrays addition was clearly going against the rules.
@Dan, my current version (from the reverted PR) runs in 32ms on my machine - the version you proposed is around 54-56ms. The weird thing is that by looking at your code I intuitively expected a performance gain (which didn’t materialise when benchmarking it).
From what I see, some improvements could be made to this function I wrote:
function fastmaxindex!(xs::Vector{Int}, topn, maxn, maxv)
    # maxv holds the current top values in ascending order; maxn the matching indices
    maxn .= 1
    maxv .= 0
    for (i, x) in enumerate(xs)
        if x > maxv[1]
            maxv[1] = x
            maxn[1] = i
            # bubble the new value up to keep maxv sorted
            for j in 2:topn
                if maxv[j-1] > maxv[j]
                    maxv[j-1], maxv[j] = maxv[j], maxv[j-1]
                    maxn[j-1], maxn[j] = maxn[j], maxn[j-1]
                end
            end
        end
    end
    reverse!(maxn)   # largest first
    return maxn
end
Maybe proper usage of a really fast priority queue might help - or maybe somebody can find a smart algorithmic trick that is more compiler-friendly.
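One Base-only alternative worth benchmarking against the function above (a sketch, not the author’s code): `partialsortperm` does a partial selection and avoids sorting the whole array.

```julia
# Top-k indices via partial sorting; no DataStructures needed.
topk_indices(xs, k) = partialsortperm(xs, 1:k; rev = true)

xs = [3, 9, 1, 7, 5]
topk_indices(xs, 3)  # → indices [2, 4, 5], i.e. the values 9, 7, 5
```

Whether this beats the hand-rolled bubble for a small fixed `topn` is an open question; it allocates an index permutation, which the in-place version avoids.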
I tried the PriorityQueue from DataStructures, but it was very slow and allocating. The problem could be the underlying Dict backing the PriorityQueue.
A new Dict type that supports keys from a limited universe and is implemented with flat memory storage could be useful in other cases too. There are many situations where a Dict is known to have keys from a small universe and speed is paramount.
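As a sketch of that idea (the `FlatCounter` type and its API are made up here for illustration, not an existing package): when keys are known to lie in `1:n`, one `Vector` slot per key replaces hashing entirely.

```julia
# Hypothetical flat, Vector-backed counter for keys in 1:n — illustrative only.
struct FlatCounter
    vals::Vector{Int}            # one slot per possible key; no hashing
end
FlatCounter(n::Integer) = FlatCounter(zeros(Int, n))

increment!(c::FlatCounter, k::Int) = (c.vals[k] += 1; c)
Base.getindex(c::FlatCounter, k::Int) = c.vals[k]

c = FlatCounter(8)
increment!(c, 3); increment!(c, 3); increment!(c, 5)
(c[3], c[5])  # → (2, 1)
```

The trade-off is memory proportional to the key universe rather than to the number of live keys, which is exactly the case in this benchmark (post indices).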