Yeah, the problem with the first approach is that you are using non-constant global variables in your benchmarking. You are also using the same array for x and y, which looks strange to me:
julia> a = randn(2);
julia> b::typeof(a) = a
2-element Vector{Float64}:
-0.10486666637026627
0.25520465464656167
julia> a === b
true
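For the non-constant-global part, the usual benchmarking fix is to interpolate the globals with $. A minimal sketch using BenchmarkTools (the array sizes are arbitrary):

using BenchmarkTools
x = randn(1000); y = randn(1000);   # non-const globals
@btime $x ./ $y;                    # $-interpolation avoids the untyped-global overhead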
Yes, division is more “expensive” in a couple of ways, neither of which matters in this case.
First, division has more latency than multiplication (so the CPU has to wait longer for the result). The compiler can get around that by scheduling other work in the meantime unless the result is needed immediately for the next computation.
Second, the CPU (depending on the model) might be able to do more multiplication operations in parallel, because more silicon is allocated to multiplication (which is used more often than division). However, this only matters if enough independent work can be supplied in parallel.
The operation that has the lowest throughput in this particular case is probably the memory read/writes, so the difference between multiplication and division “cost” is not important.
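To see that the distinction washes out here, a minimal sketch (array size and divisor are arbitrary) comparing broadcast division against multiplication by a precomputed reciprocal on a large array; since both versions are memory-bound, the timings should come out nearly identical:

using BenchmarkTools
a = randn(10^7);
s = 1.7; r = inv(s);
@btime $a ./ $s;   # one division per element
@btime $a .* $r;   # one multiplication per element, same memory traffic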
Yeah, that first approach was confusing, especially because x and y mean something different in the second approach.
I did the ::typeof thing because I didn’t want a const but also didn’t want the performance problem of untyped globals.
The problem with a === b is that it will lead to misleading benchmarking results if both are used in a ./ b, since you only have to fetch half as much memory to perform the computation, and thus it might run faster than if a !== b.
This will also affect the second approach, where you pass the arrays as variables, and it is independent of the non-constant-global problem.
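A minimal sketch of that aliasing effect (sizes arbitrary); with arrays large enough to exceed the caches, the aliased version has roughly half the memory traffic and can look misleadingly fast:

using BenchmarkTools
a = randn(10^7);
b = copy(a);       # same values, separate memory
@btime $a ./ $a;   # aliased: one array streamed from memory
@btime $a ./ $b;   # distinct: two arrays streamed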
I don’t think inv(x) and 1/x could differ in their result (for built-in Real types, except maybe Irrational), due to the rules of IEEE 754 arithmetic. I might believe inv could have a small performance benefit in some circumstances where the compiler can’t recognize the constant-ness of the 1 in 1/x.
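For Float64 this is easy to spot-check empirically (not a proof, and the sample is arbitrary), since Base defines inv(x) as one(x)/x:

julia> all(x -> inv(x) === 1/x, rand(10^6))
true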
When you are willing to sacrifice a bit or two of accuracy in the result, there can be performance benefits to these transformations, and in many (most?) applications that is acceptable. But if full accuracy is a concern, you should absolutely not use x*inv(y) as a substitute for x/y. The compiler doesn’t do this transformation for you (except when inv(y) has an exact representation, such as for powers of 2) because the results are different.
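For example, you can count how often the two disagree (the divisor here is an arbitrary choice; any divisor whose reciprocal is not exactly representable behaves similarly):

y = 49.0
xs = rand(10^6);
count(x -> x/y != x*inv(y), xs)   # nonzero: the two expressions round differently for many inputs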
The error is usually less than a bit, i.e. less than eps(), the rounding error. You can see that with the following code. The first mean and max shown are the error of one method relative to the other; the next two sets are the errors of each method relative to a BigFloat reference. In all cases the errors are much smaller than eps().
using Statistics

Tbetter = BigFloat   # high-precision reference type
T = Float64          # type under test
x = Tbetter.(rand(T, 100_000));
divisor = Tbetter(T(pi))

# relative difference between the two methods, both computed in T
accuracy_error(v) = (T(T(v) * inv(T(divisor))) - T(T(v) / T(divisor))) / (T(T(v) / T(divisor)));
# error of each method relative to the BigFloat reference
inv_accuracy(v) = (T(T(v) * inv(T(divisor))) - (v / divisor)) / v;
div_accuracy(v) = (T(T(v) / T(divisor)) - (v / divisor)) / v;
##
@show T
@show eps(T);
##
@show mean(abs(accuracy_error(v)) for v in x);
@show maximum(abs(accuracy_error(v)) for v in x);
##
@show mean(abs(inv_accuracy(v)) for v in x);
@show maximum(abs(inv_accuracy(v)) for v in x);
##
@show mean(abs(div_accuracy(v)) for v in x);
@show maximum(abs(div_accuracy(v)) for v in x);
##
T = Float64
eps(T) = 2.220446049250313e-16
mean((abs(accuracy_error(v)) for v = x)) = 2.2765442951283494e-17
maximum((abs(accuracy_error(v)) for v = x)) = 2.2203276142503066e-16
mean((abs(inv_accuracy(v)) for v = x)) = 1.402660799251385796744785423641788904003187204601942703209097263778712947617089e-17
maximum((abs(inv_accuracy(v)) for v = x)) = 4.246814547038967479720962926321405463399871806338832126916190959044469092421897e-17
mean((abs(div_accuracy(v)) for v = x)) = 1.296835669171136569002146415223936518200939361869659888104630040244513160688326e-17
maximum((abs(div_accuracy(v)) for v = x)) = 3.518393627612048269235808282963237680727758091593581919586693839064292872920742e-17
Make the arrays bigger, and memory bandwidth will be the only thing that matters.
So if your question is about bigvec ./ scalar vs bigvec .* inv(scalar), the answer is no, it doesn’t matter, because memory bandwidth is the limiting factor, not CPU performance.
But smallvec ./ scalar is much slower than smallvec .* inv(scalar).
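A minimal sketch of that small-vector comparison (size and divisor arbitrary):

using BenchmarkTools
small = randn(8);
s = 1.7; r = inv(s);
@btime $small ./ $s;   # division latency/throughput dominates at this size
@btime $small .* $r;   # multiplication wins once memory is not the bottleneck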