Vector multiply vector vs for loop, type and size dependent performance


#1

Hi,

I am trying to do x’*y vector multiplication. Then I have the function

 function vectorMvector_Int8(x::Array{Int8,1}, y::Array{Int8,1})
     result = x'*y
     result
 end

I have another version that using for loop to do the multiplication

 function vectorMvector2_Int8(x::Array{Int8,1}, y::Array{Int8,1})
     result=0
     n = size(x)[1]
     @inbounds @simd for i in 1:n
         result += x[i]*y[i]
     end
     result
 end

When the vectors are small, vector multiplication are slower than for loop
e.g. a=rand([Int8(0),Int8(1)], 100);b=rand([Int8(0),Int8(1)], 100)
benchmark median for vectorMvector is 75 ns while vectorMvector2 is 29 ns.

When vectors are large, for example a=rand([Int8(0),Int8(1)], 100000);b=rand([Int8(0),Int8(1)], 100000), benchmark median for vectorMvector is 11 μs while vectorMvector2 is 24 μs.

However when x,y is Int64 type, vectorMvector2 seems always a little bit faster than vectorMvector

 function vectorMvector_Int64(x::Array{Int64,1}, y::Array{Int64,1})
     result = x'*y
     result
 end
 function vectorMvector2_Int64(x::Array{Int64,1}, y::Array{Int64,1})
     result=0
     n = size(x)[1]
     @inbounds @simd for i in 1:n
         result += x[i]*y[i]
     end
     result
 end

a=rand([Int64(0),Int64(1)], 100);b=rand([Int64(0),Int64(1)], 100), 31 ns vs 28 ns.
a=rand([Int64(0),Int64(1)], 100000);b=rand([Int64(0),Int64(1)], 100000) 31.734 μs vs 28.916 μs

I prefer to use x’*y because it is more meaningful and less typing. But the performance is complicating my choices. Am I doing something wrong for the x’*y operation to cause slow performance?

Thank you!


#2

If it is always multiplying just 0’s and 1’s, you might try sum( x .& y)


#3

It is 3X slower in the Int64 datatype situation. Although it might relates to another issue, I feel doing element-wise operation then using the sum function is slower than explicitly sum them up like the for loop.

I realize that the Int8 version functions is not suitable for large vector, since it is likely to overthrow. I tested the Int64 version on more general vectors

a=rand(Int64(0):Int64(100), 100);b=rand(Int64(0):Int64(100), 100)
and larger size of the vector
a=rand(Int64(0):Int64(100), 100000);b=rand(Int64(0):Int64(100), 100000)

As expected the timing are exactly the same as the 0 & 1 vectors.


#4

Unless this is a learning exercise, can’t you use LinearAlgebra.dot?


#5

You know, your codes for the int8 and int64 input are identical, so it’s a bit weird to name them vectorMvector_Int8 and vectorMvector_Int64, etc. It just makes it more likely that some typo will introduce discrepancies. It’s also a bit grating, because it so flagrantly flies in the face of multiple dispatch.

Can’t you just write your code like this

function vectorMvector(x, y)
     x'*y
end

function vectorMvector2(x, y)
     result=0
     n = size(x)[1]
     @inbounds @simd for i in 1:n
         result += x[i]*y[i]
     end
     result
 end

and then run it on the various input types?

You should perhaps consider, though, to replace the initialization results = 0 with result = zero(eltype(x)), unless you specifically need the output to be Int64.


#6

x’*y is not dot multiply, right?


#7
julia> using LinearAlgebra

julia> N = 10
10

julia> x, y = rand(Int, N), rand(Int, N);

julia> x'*y == dot(x, y)
true

#8

I am sorry I am new for using Julia. I will refer more to the exist library. Thank you.

My actual problem is calling the dot operation multiple times in a for loop. Code manually the dot operation seems run faster than simply calling dot(). Again I am sure I did miss something. I really appreciate it if you can tell me how to do it right, since using the dot() function will make the code more clear.

function test(mat)
    n, m = size(mat)
    count = 0
    @inbounds for i in 1:m, j in i:m
        n, m = size(mat)
        temp_sum = 0
        #@views temp_sum = dot(mat[:,i], mat[:,j])
        for k in 1:n
            temp_sum += (mat[k,i]*mat[k,j])
        end
        if temp_sum==0
            count += 1
        end
    end
    count
end

I am using this to generate input a=rand([0,1], 22, 1000).


#9

Thank you. I am just switching from python+numba. It requires tell the type signatures. The ones you pointed out really work the same performance for different argument type combinations!


#10

Try views.


#11

Do you mean code as following?

function test(mat)
    n, m = size(mat)
    count = 0
    @inbounds for i in 1:m, j in i:m
        temp_sum = 0
        @views temp_sum = dot(mat[:,i], mat[:,j])
        if temp_sum==0
            count += 1
        end
    end
    count
end

#12

Yes, that should work.

Incidentally, I don’t see the purpose of

n, m = size(mat)

inside the loop.


#13

Note that x' * y actually calls dot:


#14

Sorry that was my typo. I corrected. By using for inside, benchmark shows 3.8 ms with 16 bytes and 1 allocation. By using @views dot(), benchmark shows 14.654 ms with 45.82 MB and 1001001