I am trying to speed up the following operation:
- I have a matrix X (n, p) and a vector v (p). For each j = 1,…,p, I want to compute the differences X[i, j] - v[j] for i = 1,…,n.
I need this on CPU (not GPU), with a function that is not in-place, and on fairly large matrices (see the reproducible example below).
(X and v contain numbers of the same type, but that type varies from case to case: Float64, Float32, Int, etc.)
In other words, my goal is a fast function that returns a new matrix obtained by subtracting v[j] from every entry of column j of X.
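To make the operation concrete, here is a tiny worked example with made-up integer values (the numbers are illustrative only and are not part of my benchmark):

```julia
X = [1 2; 3 4]      # n = 2 rows, p = 2 columns
v = [10, 20]        # one value per column of X
D = X .- v'         # v' is a 1×p row, so it is broadcast down the rows
# D == [-9 -18; -7 -16], and X itself is left unchanged
```

Column 1 has 10 subtracted from every entry and column 2 has 20 subtracted, which is the result I want for the large case too.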
I tested many variants and, so far, my two fastest versions are the following:
f1(X, v) = X .- v'
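function f6_th(X, v) # fastest one
    p = size(X, 2)
    zX = similar(X)          # output buffer with the same element type and size as X
    Threads.@threads for j = 1:p
        # each thread fills whole columns, which are contiguous in memory
        zX[:, j] .= view(X, :, j) .- v[j]
    end
    zX
end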
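For what it is worth, a quick sanity check along these lines, on a small input whose sizes are chosen arbitrarily, should confirm that the two versions return the same matrix and that neither one modifies X:

```julia
f1(X, v) = X .- v'

function f6_th(X, v)
    p = size(X, 2)
    zX = similar(X)
    Threads.@threads for j = 1:p
        zX[:, j] .= view(X, :, j) .- v[j]
    end
    zX
end

X = rand(1_000, 50)          # small illustrative sizes, not the benchmark sizes
v = rand(50)
X0 = copy(X)                 # reference copy to verify X is not modified

A = f1(X, v)
B = f6_th(X, v)

@assert A == B               # both compute X[i, j] - v[j] element by element
@assert X == X0              # neither function is in-place
@assert B !== X              # a new matrix is returned
```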
An example:
using Chairmarks
n = 10^6 ; p = 500
X = rand(n, p)
v = rand(p)
@be f1($X, $v) samples = 3 evals = 1 seconds = 10
Benchmark: 3 samples with 1 evaluation
991.649 ms (3 allocs: 3.725 GiB, 13.86% gc time)
1.010 s (3 allocs: 3.725 GiB, 13.98% gc time)
1.261 s (3 allocs: 3.725 GiB, 31.36% gc time)
@be f6_th($X, $v) samples = 3 evals = 1 seconds = 10
Benchmark: 3 samples with 1 evaluation
410.142 ms (45 allocs: 3.725 GiB, 1.86% gc time)
611.232 ms (45 allocs: 3.725 GiB, 32.62% gc time)
611.792 ms (45 allocs: 3.725 GiB, 30.78% gc time)
[For info, I have in-place functions (e.g. f1!) that are faster than the above, but I am not looking for an in-place version.]
My configuration:
julia> versioninfo()
Julia Version 1.11.2
Commit 5e9a32e7af (2024-12-01 20:02 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 16 × Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 16 virtual cores)
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS = 8
I would be grateful for any ideas.