Putting this in Offtopic as it’s not directly a Julia question: I recently asked on here about a fairly straightforward number crunching problem and - invariably - got some great solutions to it, the fastest of which beat my own implementation by 1,000x…
Now I’ve just got a new laptop, which I expected to run this problem much faster than my old one, but I was disappointed to find the speedup was smaller than I’d hoped. There’s a possibility of setting up a dedicated workstation for this type of workload, and given the modest speedup from my change in CPU, I’m trying to work out whether there are alternative CPUs which would do significantly better on it.
I tried looking into AWS, Azure etc., but for the most part they don’t seem to offer workstation-type processors like i9/Xeon or Ryzen/Threadripper, so I’m asking for help here: if you’ve got a reasonably current high-performance CPU, could you run the script below (on all available threads) and post the vector of 9 numbers that it produces, together with your CPU?
Here’s my 20 × 13th Gen Intel(R) Core(TM) i7-13800H:
$~-> julia --threads=auto
julia> include("cpu_test.jl")
[28.7, 28.6, 27.8, 27.3, 26.1, 27.0, 26.6, 26.4, 23.8]
On that CPU, the main loop that produces the numbers takes about 30 seconds to run, so the whole script hopefully shouldn’t take too long even with installing the few small packages that it relies on.
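If it helps when replying, something like the following could be run in the same session to print the CPU model and thread counts alongside the numbers. This is only a sketch: `cpu_summary` is a made-up helper name and not part of cpu_test.jl, and the model string is whatever the operating system reports via `Sys.cpu_info()`.

```julia
# Hypothetical helper (not part of cpu_test.jl): summarise the machine for a reply.
function cpu_summary()
    infos = Sys.cpu_info()  # one entry per logical CPU, as reported by the OS
    model = isempty(infos) ? "unknown" : infos[1].model
    return (; model,
            logical_cpus = Sys.CPU_THREADS,     # logical CPUs Julia detected
            julia_threads = Threads.nthreads()) # threads this session was started with
end

println(cpu_summary())
```

If `julia_threads` is smaller than `logical_cpus`, the session was probably not started with `--threads=auto`.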
Code below the fold:
cpu_test.jl
using Pkg
Pkg.activate(; temp=true)
Pkg.add(["Chairmarks", "Combinatorics", "StatsBase"])
using Chairmarks, Combinatorics, StatsBase, Random
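# Build a 64 × n BitMatrix with one column per selection;
# row r of column i is set if r is a member of selection i.
function mkbitmatrix(selections)
    n = length(selections)
    P = 64  # 64 rows, so each column is exactly one UInt64 chunk
    res = falses(P, n)
    for (i, c) in enumerate(selections)
        res[c, i] .= true
    end
    res
end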
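# For every possible selection (one 64-bit chunk each), count how many
# actual selections share 0, 1, ..., p elements with it.
function matchmeXT(actual_selections, possible_selections, ::Val{p}) where p
    out = zeros(Int32, p+1, size(possible_selections,2))
    mask = ~((-1%UInt64) << 10)  # lowest 10 bits set
    @inbounds Threads.@threads for i = 1:length(possible_selections.chunks)
        a = possible_selections.chunks[i]
        # Process the actual selections in blocks of at most 1023 per pass
        for jc = 1:1023:length(actual_selections.chunks)
            tmp = 0
            for j = jc:min(jc+1023-1, length(actual_selections.chunks))
                b = actual_selections.chunks[j]
                hits = count_ones(a & b)
                tmp += (1 << ( (hits*10) & 63) )  # bump the counter packed at bit offset hits*10
            end #j
            # Unpack the counters from tmp into column i of the output
            for k = 0:p
                out[k+1, i] += (tmp >>> (k*10)) & mask
            end
            tmp = 0
        end
    end
    out
end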
function get_time_one(P, p)
    Q = binomial(P, p)
    all_selections_iterator = multiset_combinations(1:P, p) # Iterator over all combinations
    actual_selections = unique([sort(sample(1:P, p, replace = false)) for _ ∈ 1:round(Int, Q ÷ 10)])
    actuals = mkbitmatrix(actual_selections);
    possibilities = mkbitmatrix(all_selections_iterator)
    x = @b matchmeXT(actuals, possibilities, Val(p))
    return (; P, p, Q, n_selected = length(actual_selections), t = x.time, n_comparisons = Q*length(actual_selections),
            bn_n_per_s = round(Q*length(actual_selections) / x.time / 1e9, digits = 1))
end
function get_time_many(sizes)
    res = [get_time_one(x, 6) for x ∈ shuffle(sizes)]
end
print(getfield.(get_time_many(20:2:36), :bn_n_per_s))
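For anyone curious what the inner loop of `matchmeXT` is doing: as far as I can tell, it keeps several small counters packed into a single integer, 10 bits apiece, and only unpacks them once per block. Below is a minimal standalone sketch of that idea. `packed_histogram` is a made-up name for illustration and is not part of cpu_test.jl; unlike the script (which uses p = 6, so the topmost counter is narrower than 10 bits), it insists that every counter fits fully into 64 bits.

```julia
# Illustrative sketch only (not part of cpu_test.jl): tally small integer values
# into 10-bit fields of one UInt64, then unpack the fields into a vector.
function packed_histogram(hits::Vector{Int}, p::Int)
    @assert length(hits) <= 1023 "a 10-bit field can count to at most 1023"
    @assert 10 * (p + 1) <= 64 "all p+1 fields must fit into 64 bits"
    @assert all(h -> 0 <= h <= p, hits)
    mask = UInt64(0x3ff)                  # lowest 10 bits set
    packed = UInt64(0)
    for h in hits
        packed += UInt64(1) << (10 * h)   # increment the field belonging to value h
    end
    # Field k holds the number of times the value k occurred
    return [Int((packed >>> (10 * k)) & mask) for k in 0:p]
end

packed_histogram([0, 2, 2, 1, 0, 2], 3)  # → [2, 1, 3, 0]
```

The limit of 1023 items per pass is what keeps a field from overflowing into its neighbour, which is presumably why the script walks the actual selections in blocks of 1023.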