Vector multiply vector vs for loop, type and size dependent performance

Hi,

I am trying to do x’*y vector multiplication. Then I have the function

 function vectorMvector_Int8(x::Array{Int8,1}, y::Array{Int8,1})
     result = x'*y
     result
 end

I have another version that using for loop to do the multiplication

 function vectorMvector2_Int8(x::Array{Int8,1}, y::Array{Int8,1})
     result=0
     n = size(x)[1]
     @inbounds @simd for i in 1:n
         result += x[i]*y[i]
     end
     result
 end

When the vectors are small, vector multiplication are slower than for loop
e.g. a=rand([Int8(0),Int8(1)], 100);b=rand([Int8(0),Int8(1)], 100)
benchmark median for vectorMvector is 75 ns while vectorMvector2 is 29 ns.

When vectors are large, for example a=rand([Int8(0),Int8(1)], 100000);b=rand([Int8(0),Int8(1)], 100000), benchmark median for vectorMvector is 11 μs while vectorMvector2 is 24 μs.

However when x,y is Int64 type, vectorMvector2 seems always a little bit faster than vectorMvector

 function vectorMvector_Int64(x::Array{Int64,1}, y::Array{Int64,1})
     result = x'*y
     result
 end
 function vectorMvector2_Int64(x::Array{Int64,1}, y::Array{Int64,1})
     result=0
     n = size(x)[1]
     @inbounds @simd for i in 1:n
         result += x[i]*y[i]
     end
     result
 end

a=rand([Int64(0),Int64(1)], 100);b=rand([Int64(0),Int64(1)], 100), 31 ns vs 28 ns.
a=rand([Int64(0),Int64(1)], 100000);b=rand([Int64(0),Int64(1)], 100000) 31.734 μs vs 28.916 μs

I prefer to use x’*y because it is more meaningful and less typing. But the performance is complicating my choices. Am I doing something wrong for the x’*y operation to cause slow performance?

Thank you!

If it is always multiplying just 0’s and 1’s, you might try sum( x .& y)

It is 3X slower in the Int64 datatype situation. Although it might relates to another issue, I feel doing element-wise operation then using the sum function is slower than explicitly sum them up like the for loop.

I realize that the Int8 version functions is not suitable for large vector, since it is likely to overthrow. I tested the Int64 version on more general vectors

a=rand(Int64(0):Int64(100), 100);b=rand(Int64(0):Int64(100), 100)
and larger size of the vector
a=rand(Int64(0):Int64(100), 100000);b=rand(Int64(0):Int64(100), 100000)

As expected the timing are exactly the same as the 0 & 1 vectors.

Unless this is a learning exercise, can’t you use LinearAlgebra.dot?

You know, your codes for the int8 and int64 input are identical, so it’s a bit weird to name them vectorMvector_Int8 and vectorMvector_Int64, etc. It just makes it more likely that some typo will introduce discrepancies. It’s also a bit grating, because it so flagrantly flies in the face of multiple dispatch.

Can’t you just write your code like this

function vectorMvector(x, y)
     x'*y
end

function vectorMvector2(x, y)
     result=0
     n = size(x)[1]
     @inbounds @simd for i in 1:n
         result += x[i]*y[i]
     end
     result
 end

and then run it on the various input types?

You should perhaps consider, though, to replace the initialization results = 0 with result = zero(eltype(x)), unless you specifically need the output to be Int64.

1 Like

x’*y is not dot multiply, right?

julia> using LinearAlgebra

julia> N = 10
10

julia> x, y = rand(Int, N), rand(Int, N);

julia> x'*y == dot(x, y)
true
1 Like

I am sorry I am new for using Julia. I will refer more to the exist library. Thank you.

My actual problem is calling the dot operation multiple times in a for loop. Code manually the dot operation seems run faster than simply calling dot(). Again I am sure I did miss something. I really appreciate it if you can tell me how to do it right, since using the dot() function will make the code more clear.

function test(mat)
    n, m = size(mat)
    count = 0
    @inbounds for i in 1:m, j in i:m
        n, m = size(mat)
        temp_sum = 0
        #@views temp_sum = dot(mat[:,i], mat[:,j])
        for k in 1:n
            temp_sum += (mat[k,i]*mat[k,j])
        end
        if temp_sum==0
            count += 1
        end
    end
    count
end

I am using this to generate input a=rand([0,1], 22, 1000).

Thank you. I am just switching from python+numba. It requires tell the type signatures. The ones you pointed out really work the same performance for different argument type combinations!

Try views.

Do you mean code as following?

function test(mat)
    n, m = size(mat)
    count = 0
    @inbounds for i in 1:m, j in i:m
        temp_sum = 0
        @views temp_sum = dot(mat[:,i], mat[:,j])
        if temp_sum==0
            count += 1
        end
    end
    count
end

Yes, that should work.

Incidentally, I don’t see the purpose of

n, m = size(mat)

inside the loop.

Note that x' * y actually calls dot:

2 Likes

Sorry that was my typo. I corrected. By using for inside, benchmark shows 3.8 ms with 16 bytes and 1 allocation. By using @views dot(), benchmark shows 14.654 ms with 45.82 MB and 1001001