Simple Mat-Vec multiply (understanding performance, without the bugs)

Tullio is brilliant: not only do you get threads and AVX for free, you also get a concise notation that makes the code much clearer and far less bug-prone. I’m immediately adopting it for whenever I need to write array-twiddling code!

I’ll test the speed and post a comparison in a few hours. Thanks for this!

Ok, just to see how well it works… here’s the Tullio version:


using Tullio

function matmultull(A, v)
    if size(A, 2) != length(v)
        throw(DimensionMismatch("second dimension of A, $(size(A, 2)), does not match length of v, $(length(v))"))
    end
    B = similar(v, size(A, 1))   # output length is size(A, 1), not length(v)
    @tullio B[i] = A[i, j] * v[j]
    return B
end
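
A quick sanity check against the built-in multiply (the sizes here are made up purely for illustration; they’re not the testA/testV used in the timings below):

A = rand(300, 200); v = rand(200)   # hypothetical sizes, just for the check
matmultull(A, v) ≈ A * v            # should return true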

ZERO loops for that undergrad to get wrong.

How’s the speed? EXACTLY the same as with my hand-written, undergrad-ish loop code + @avx (and basically the same as the built-in *).
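
For reference, the matmul in the first benchmark below is that hand-written baseline; roughly along these lines, as a sketch using LoopVectorization’s @avx rather than necessarily the exact code from earlier in the thread:

using LoopVectorization

# Sketch of a hand-written mat-vec with @avx; the matmul timed below may
# differ in minor details.
function matmul(A, v)
    size(A, 2) == length(v) || throw(DimensionMismatch("size(A, 2) != length(v)"))
    B = zeros(eltype(v), size(A, 1))
    @avx for i in axes(A, 1)
        Bi = zero(eltype(B))
        for j in axes(A, 2)
            Bi += A[i, j] * v[j]
        end
        B[i] = Bi
    end
    return B
end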

julia> @benchmark matmul($testA,$testV)
BenchmarkTools.Trial: 
  memory estimate:  14.80 KiB
  allocs estimate:  24
  --------------
  minimum time:     990.039 μs (0.00% GC)
  median time:      1.084 ms (0.00% GC)
  mean time:        1.102 ms (0.00% GC)
  maximum time:     1.707 ms (0.00% GC)
  --------------
  samples:          4508
  evals/sample:     1

julia> @benchmark matmultull($testA,$testV)
BenchmarkTools.Trial: 
  memory estimate:  18.55 KiB
  allocs estimate:  99
  --------------
  minimum time:     993.795 μs (0.00% GC)
  median time:      1.054 ms (0.00% GC)
  mean time:        1.089 ms (0.00% GC)
  maximum time:     2.290 ms (0.00% GC)
  --------------
  samples:          4562
  evals/sample:     1

julia> @benchmark $testA * $testV
BenchmarkTools.Trial: 
  memory estimate:  11.88 KiB
  allocs estimate:  1
  --------------
  minimum time:     831.787 μs (0.00% GC)
  median time:      924.277 μs (0.00% GC)
  mean time:        1.027 ms (0.00% GC)
  maximum time:     10.739 ms (0.00% GC)
  --------------
  samples:          4811
  evals/sample:     1

Thanks again!
