Performance of naive convolution against Python Numpy

In case anyone comes across this thread again and wonders the same thing I did: @tturbo uses multithreading by default, but you need to start Julia with several threads to see the difference. So for a fair comparison:

julia
julia> @btime C=np.convolve(A, B, "full") setup=(A=rand(10000); B=rand(10000)) evals=100;
  18.287 ms (41 allocations: 158.09 KiB)

julia> @btime naive_convol_full!(D,A,B) setup=(A=rand(10000); B=rand(10000); D=zeros(length(A)+length(B)-1)) evals=100;
  11.768 ms (0 allocations: 0 bytes)

and with 16 threads:

julia -t 16
julia> @btime C=np.convolve(A, B, "full") setup=(A=rand(10000); B=rand(10000)) evals=100;
  18.515 ms (41 allocations: 158.09 KiB)

julia> @btime naive_convol_full!(D,A,B) setup=(A=rand(10000); B=rand(10000); D=zeros(length(A)+length(B)-1)) evals=100;
  6.763 ms (0 allocations: 0 bytes)
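
If you are not sure how many threads your session actually has, a quick check along these lines should tell you. This is only a minimal sketch using `Threads.nthreads()` from Base; the variable name `n` and the warning text are my own, not something from earlier in the thread.

```julia
# Number of threads in this session. It is normally 1 unless Julia was
# started with `-t N` / `--threads N` or with JULIA_NUM_THREADS set.
n = Threads.nthreads()
println("Julia is running with ", n, " thread(s)")

if n == 1
    # With a single thread there is nothing for @tturbo to spread work over,
    # so timings will not show any multithreading benefit.
    @warn "Single-threaded session: restart with e.g. `julia -t 16` to see the threaded timings."
end
```

Running this before the benchmarks makes it clear which of the two cases above you are measuring.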