I have an array of vectors (of different lengths) and I want to perform a pointwise operation on all values:
a= [[1 2 3 4], [1 2 3 4 5], [1 2 3]]
b = a .-1
This gives me an error:
MethodError: no method matching -(::Array{Int64,2}, ::Int64)
Closest candidates are:
-(!Matched::Complex{Bool}, ::Real) at complex.jl:307
-(!Matched::Missing, ::Number) at missing.jl:115
-(!Matched::Base.CoreLogging.LogLevel, ::Integer) at logging.jl:108
…
Stacktrace:
[1] _broadcast_getindex_evalf at .\broadcast.jl:631 [inlined]
[2] _broadcast_getindex at .\broadcast.jl:604 [inlined]
[3] getindex at .\broadcast.jl:564 [inlined]
[4] copy at .\broadcast.jl:854 [inlined]
[5] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(-),Tuple{Array{Array{Int64,2},1},Int64}}) at .\broadcast.jl:820
[6] top-level scope at In[95]:8
How do I perform this operation simply without a loop?
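Note that with space-separated literals like `[1 2 3 4]`, each element of `a` is a 1×4 `Matrix{Int64}` (which is why the error mentions `Array{Int64,2}`), and `a .- 1` only broadcasts over the outer array, attempting `matrix - 1`, which has no method. You need to broadcast at both levels, e.g. via `map` (a sketch):

```julia
# Space-separated literals produce 1x4 row matrices, so `a` is a
# Vector of Matrix{Int64} of differing widths.
a = [[1 2 3 4], [1 2 3 4 5], [1 2 3]]

# `a .- 1` broadcasts `-` over the outer vector only, calling
# `matrix - 1`, which is a MethodError. Broadcast inside each element:
b = map(x -> x .- 1, a)    # or equivalently: (x -> x .- 1).(a)

b[1]  # 1x4 Matrix: [0 1 2 3]
```

If the elements were meant to be vectors rather than row matrices, write them with commas (`[1, 2, 3, 4]`); the same nested-broadcast pattern applies either way.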
While this is likely true in this particular case, loops can often be pretty far from the fastest way. Think `dot` from BLAS vs. a for-loop implementation: even for vectors, a proper `dot` is several times faster.
This doesn’t really have anything to do with loops, as the BLAS functions are also implemented with loops. They just use various extra clever techniques to make sure the loops simd-vectorize, are cache-optimal, multithreaded, and so on.
For example:
function mydot(x, y)
    length(x) == length(y) || error("Incompatible lengths")
    out = zero(eltype(x))
    @simd for i in eachindex(x, y)
        @inbounds out += x[i] * y[i]
    end
    return out
end
julia> using LinearAlgebra, BenchmarkTools
julia> @btime dot(x, y) setup=(x=rand(10_000); y=rand(10_000))
2.098 μs (0 allocations: 0 bytes)
julia> @btime mydot(x, y) setup=(x=rand(10_000); y=rand(10_000))
2.090 μs (0 allocations: 0 bytes)
For longer vectors, multithreading becomes important, and I should add that to my loop to keep up with LinearAlgebra.dot, but it’s all loops.
My point being: sometimes someone has worked hard on optimally implementing a really fast algorithm, and of course then you should call their function. But there is no reason to use broadcasting, map, or comprehensions for the sake of performance alone. Rolling your own version with a straight loop has every chance of matching or outperforming these other solutions.
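To make that concrete for the original question, here is a plain-loop version (a sketch; `subtract_one` is a hypothetical name) that produces exactly the same result as the broadcast/map solutions:

```julia
# A straight double loop over the ragged array. `similar` preallocates
# an output of the same shape and eltype for each inner array, so the
# differing lengths need no special handling.
function subtract_one(a)
    out = [similar(x) for x in a]
    for i in eachindex(a)
        for j in eachindex(a[i])
            out[i][j] = a[i][j] - 1
        end
    end
    return out
end

a = [[1 2 3 4], [1 2 3 4 5], [1 2 3]]
subtract_one(a) == map(x -> x .- 1, a)  # true
```

The compiler vectorizes simple loops like this just as well as the broadcast machinery, so the choice between them is about readability, not speed.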