Hi all,
I wrote a very simple code to compute b=A*x and found that it is slow, please see the following codes to compare sum1, sum2 and sum3:
using TimerOutputs
const timer = TimerOutput()
function test(N=20000)
A = rand(N, N)
b = zeros(N)
x = rand(N)
@timeit timer "sum1" begin
for i in 1:N
f = 0.0
for j in 1:N
f += A[i, j] * x[j]
end
b[i] = f
end
end
@timeit timer "sum2" begin
for i in 1:N
f = 0.0
for j in 1:N
f = A[i, j] * x[j]
end
b[i] = f
end
end
@timeit timer "sum3" begin
for i in 1:N
f = 0.0
for j in 1:N
f += A[i, j] * x[j]
end
#b[i] = f
end
end
end
test()
println(timer)
The output is
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Time Allocations
βββββββββββββββββββββββ ββββββββββββββββββββββββ
Tot / % measured: 3.96s / 79.5% 3.00GiB / 0.0%
Section ncalls time %tot avg alloc %tot avg
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
sum1 1 2.90s 92.3% 2.90s 0.00B - % 0.00B
sum3 1 122ms 3.9% 122ms 0.00B - % 0.00B
sum2 1 121ms 3.8% 121ms 0.00B - % 0.00B
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
I just wonder why sum1 is much slower than sum2 or sum3.
The versioninfo() in the test computer gives
julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8 (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 24 Γ 13th Gen Intel(R) Core(TM) i7-13700F
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, goldmont)
Threads: 1 default, 0 interactive, 1 GC (on 24 virtual cores)
Environment:
JULIA_PKG_SERVER = https://mirrors.ustc.edu.cn/julia
JULIA_SSL_CA_ROOTS_PATH =