If you use the dot product formulation, that is true. But matrix-vector multiplication can also be viewed as forming a linear combination of the columns of A, using the elements of x as the coefficients. In fact, I think most introductory linear algebra texts start with the linear combination definition. In code, the two variants might look like:
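function mul_dot(A, x)
    m, n = size(A)
    # Promote the element type so that, e.g., a Float64 A with an integer x works.
    result = zeros(promote_type(eltype(A), eltype(x)), m)
    for j in 1:m
        # result[j] is the dot product of row j of A with x.
        for k in 1:n
            result[j] += A[j, k] * x[k]
        end
    end
    return result
end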
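function mul_comb(A, x)
    m, n = size(A)
    # Promote the element type so that, e.g., a Float64 A with an integer x works.
    result = zeros(promote_type(eltype(A), eltype(x)), m)
    for k in 1:n
        # Add x[k] times column k of A to the result.
        for j in 1:m
            result[j] += A[j, k] * x[k]
        end
    end
    return result
end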
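The first one is cache-friendly for row-major order and the second is cache-friendly for column-major order. Julia arrays are column-major, so there mul_comb is the one whose inner loop walks through memory contiguously. And if you need matrix-matrix multiplication, you need to do blocking to be cache-friendly, which is what optimized BLAS implementations do.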
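To make the blocking idea concrete, here is a minimal sketch of a tiled matrix-matrix product. The name mul_blocked and the tile size bs are my own choices for illustration, not anything taken from BLAS; real BLAS libraries tune tile sizes per CPU and add packing and vectorized kernels on top, so treat this only as an outline of the loop structure.

```julia
# Hypothetical helper: blocked (tiled) product C = A * B.
# `bs` is an assumed tile size, not a tuned value.
function mul_blocked(A, B; bs = 64)
    m, n = size(A)
    n2, p = size(B)
    n == n2 || throw(DimensionMismatch("inner dimensions must match"))
    C = zeros(promote_type(eltype(A), eltype(B)), m, p)
    # Walk over tiles so that a small piece of A, B and C is reused
    # several times while it is still in cache.
    for jj in 1:bs:p, kk in 1:bs:n, ii in 1:bs:m
        for j in jj:min(jj + bs - 1, p)
            for k in kk:min(kk + bs - 1, n)
                b = B[k, j]
                # The innermost loop runs down a column of A and C,
                # which matches Julia's column-major layout.
                for i in ii:min(ii + bs - 1, m)
                    C[i, j] += A[i, k] * b
                end
            end
        end
    end
    return C
end

A = rand(100, 70)
B = rand(70, 90)
C = mul_blocked(A, B; bs = 32)
```

Within a tile, this is just the linear combination variant applied column by column; the outer loops only change the order in which the tiles are visited. The result should agree with A * B up to floating-point rounding.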