EDIT: Reading your first post again, it looks like you may eventually want this code to run on the GPU. In that case, dot is probably the way to go, and if not, definitely mapreduce rather than one of the sum variants below. But I don't think having a zip in there is GPU-friendly or even GPU-compatible; I think the mapreduce invocation you want is the following:
mapreduce(*, +, local_patch, err)
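A quick way to convince yourself that the two GPU-oriented options compute the same thing is to run both on ordinary CPU arrays. This is a minimal sketch: the random matrices are hypothetical stand-ins for local_patch and err, and since it only uses plain Arrays it says nothing about how either call behaves on a GPU.

```julia
using LinearAlgebra: dot

# Hypothetical stand-ins for the poster's arrays
local_patch = rand(16, 16)
err = rand(16, 16)

# dot computes sum(conj(x[i]) * y[i]); for real arrays that is the plain sum of products
s_dot = dot(local_patch, err)

# mapreduce applies * elementwise across both arrays, then reduces the products with +
s_mr = mapreduce(*, +, local_patch, err)

println(s_dot ≈ s_mr)  # true, up to floating-point rounding
```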
The mapreduce expression is kind of obscure, and mapreduce is generally not as well optimized as it should be (it often creates unnecessary intermediate allocations). I'd recommend a simple, easy-to-understand generator sum:
sum(l * e for (l, e) in zip(local_patch, err))
or even
sum(local_patch[i] * err[i] for i in eachindex(local_patch, err))
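For benchmarking, it can help to wrap the two generator sums in functions and check them against dot. This is a sketch; the names dot_zip and dot_idx and the test data are hypothetical. One behavioral difference is worth knowing: zip silently stops at the shorter input, whereas eachindex(a, b) throws a DimensionMismatch when the index ranges of the two arrays differ.

```julia
using LinearAlgebra: dot

# Generator sum over zipped pairs; zip stops at the shorter input without complaint
dot_zip(a, b) = sum(l * e for (l, e) in zip(a, b))

# Generator sum over shared indices; eachindex(a, b) throws a DimensionMismatch
# if the index ranges of a and b differ
dot_idx(a, b) = sum(a[i] * b[i] for i in eachindex(a, b))

# Hypothetical test data standing in for local_patch and err
local_patch = rand(8, 8)
err = rand(8, 8)

println(dot_zip(local_patch, err) ≈ dot(local_patch, err))  # true
println(dot_idx(local_patch, err) ≈ dot(local_patch, err))  # true
```

For example, dot_zip([1.0, 2.0, 3.0], [1.0, 1.0]) quietly returns 3.0, while dot_idx on the same inputs throws.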
I'm curious how these affect your benchmarks.
And to drive home the point from the thread above: @inbounds is dangerous precisely because you can't rely on anything saving your bacon if an index is actually out of bounds. Sometimes you get a segfault, sometimes you corrupt some program state, and sometimes you simply get an arbitrary value (often, but not always, zero). Avoid @inbounds as much as you can. If your benchmarks show that you really need it for performance, add a line of code in the same function, but outside the hot loop, that checks that every index the loop will visit is in bounds. See this post for an example of how that might look: When does @inbounds increase performance? - #3 by danielwe
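Here is a minimal sketch of that pattern for a plain dot-product loop; it is not necessarily how the linked post does it, and the function name dot_checked is hypothetical. The single axes check before the loop guarantees that every index produced by eachindex(a) is also valid for b, which is the only thing that makes the @inbounds inside the loop safe.

```julia
function dot_checked(a::AbstractArray, b::AbstractArray)
    # One check outside the hot loop: identical axes mean every index of a is also valid for b
    axes(a) == axes(b) || throw(DimensionMismatch("a and b must have the same axes"))
    s = zero(promote_type(eltype(a), eltype(b)))
    for i in eachindex(a)
        # Safe only because of the axes check above; eachindex(a) never yields
        # an index that is out of bounds for a
        @inbounds s += a[i] * b[i]
    end
    return s
end

dot_checked([1.0, 2.0], [3.0, 4.0])  # 11.0
```

If the axes differ, the function throws a DimensionMismatch before the loop ever runs, instead of reading past the end of b.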