# Vector multiply vector vs for loop, type and size dependent performance

Hi,

I am trying to do x’*y vector multiplication. Then I have the function

`````` function vectorMvector_Int8(x::Array{Int8,1}, y::Array{Int8,1})
result = x'*y
result
end
``````

I have another version that using for loop to do the multiplication

`````` function vectorMvector2_Int8(x::Array{Int8,1}, y::Array{Int8,1})
result=0
n = size(x)
@inbounds @simd for i in 1:n
result += x[i]*y[i]
end
result
end
``````

When the vectors are small, vector multiplication are slower than for loop
e.g. a=rand([Int8(0),Int8(1)], 100);b=rand([Int8(0),Int8(1)], 100)
benchmark median for vectorMvector is 75 ns while vectorMvector2 is 29 ns.

When vectors are large, for example a=rand([Int8(0),Int8(1)], 100000);b=rand([Int8(0),Int8(1)], 100000), benchmark median for vectorMvector is 11 μs while vectorMvector2 is 24 μs.

However when x,y is Int64 type, vectorMvector2 seems always a little bit faster than vectorMvector

`````` function vectorMvector_Int64(x::Array{Int64,1}, y::Array{Int64,1})
result = x'*y
result
end
``````
`````` function vectorMvector2_Int64(x::Array{Int64,1}, y::Array{Int64,1})
result=0
n = size(x)
@inbounds @simd for i in 1:n
result += x[i]*y[i]
end
result
end
``````

a=rand([Int64(0),Int64(1)], 100);b=rand([Int64(0),Int64(1)], 100), 31 ns vs 28 ns.
a=rand([Int64(0),Int64(1)], 100000);b=rand([Int64(0),Int64(1)], 100000) 31.734 μs vs 28.916 μs

I prefer to use x’*y because it is more meaningful and less typing. But the performance is complicating my choices. Am I doing something wrong for the x’*y operation to cause slow performance?

Thank you!

If it is always multiplying just 0’s and 1’s, you might try `sum( x .& y)`

It is 3X slower in the Int64 datatype situation. Although it might relates to another issue, I feel doing element-wise operation then using the sum function is slower than explicitly sum them up like the for loop.

I realize that the Int8 version functions is not suitable for large vector, since it is likely to overthrow. I tested the Int64 version on more general vectors

a=rand(Int64(0):Int64(100), 100);b=rand(Int64(0):Int64(100), 100)
and larger size of the vector
a=rand(Int64(0):Int64(100), 100000);b=rand(Int64(0):Int64(100), 100000)

As expected the timing are exactly the same as the 0 & 1 vectors.

Unless this is a learning exercise, can’t you use `LinearAlgebra.dot`?

You know, your codes for the int8 and int64 input are identical, so it’s a bit weird to name them `vectorMvector_Int8` and `vectorMvector_Int64`, etc. It just makes it more likely that some typo will introduce discrepancies. It’s also a bit grating, because it so flagrantly flies in the face of multiple dispatch.

Can’t you just write your code like this

``````function vectorMvector(x, y)
x'*y
end

function vectorMvector2(x, y)
result=0
n = size(x)
@inbounds @simd for i in 1:n
result += x[i]*y[i]
end
result
end
``````

and then run it on the various input types?

You should perhaps consider, though, to replace the initialization `results = 0` with `result = zero(eltype(x))`, unless you specifically need the output to be `Int64`.

1 Like

x’*y is not dot multiply, right?

``````julia> using LinearAlgebra

julia> N = 10
10

julia> x, y = rand(Int, N), rand(Int, N);

julia> x'*y == dot(x, y)
true
``````
1 Like

I am sorry I am new for using Julia. I will refer more to the exist library. Thank you.

My actual problem is calling the dot operation multiple times in a for loop. Code manually the dot operation seems run faster than simply calling dot(). Again I am sure I did miss something. I really appreciate it if you can tell me how to do it right, since using the dot() function will make the code more clear.

``````function test(mat)
n, m = size(mat)
count = 0
@inbounds for i in 1:m, j in i:m
n, m = size(mat)
temp_sum = 0
#@views temp_sum = dot(mat[:,i], mat[:,j])
for k in 1:n
temp_sum += (mat[k,i]*mat[k,j])
end
if temp_sum==0
count += 1
end
end
count
end
``````

I am using this to generate input a=rand([0,1], 22, 1000).

Thank you. I am just switching from python+numba. It requires tell the type signatures. The ones you pointed out really work the same performance for different argument type combinations!

Try views.

Do you mean code as following?

``````function test(mat)
n, m = size(mat)
count = 0
@inbounds for i in 1:m, j in i:m
temp_sum = 0
@views temp_sum = dot(mat[:,i], mat[:,j])
if temp_sum==0
count += 1
end
end
count
end
``````

Yes, that should work.

Incidentally, I don’t see the purpose of

``````n, m = size(mat)
``````

inside the loop.

Note that `x' * y` actually calls `dot`:

2 Likes

Sorry that was my typo. I corrected. By using for inside, benchmark shows 3.8 ms with 16 bytes and 1 allocation. By using @views dot(), benchmark shows 14.654 ms with 45.82 MB and 1001001