I am trying to create a function for a direct (not fft() based) 1D convolution.
Originally I used the classic implementation:
using BenchmarkTools;
function _Conv1D!( vO :: Array{T, 1}, vA :: Array{T, 1}, vB :: Array{T, 1} ) :: Array{T, 1} where {T <: Real}
    lenA = length(vA);
    lenB = length(vB);

    fill!(vO, zero(T));
    for idxB in 1:lenB
        @simd for idxA in 1:lenA
            @inbounds vO[idxA + idxB - 1] += vA[idxA] * vB[idxB];
        end
    end

    return vO;
end
numSamplesA = 1000;
numSamplesB = 15;
vA = rand(numSamplesA);
vB = rand(numSamplesB);
vO = rand(numSamplesA + numSamplesB - 1);
@benchmark _Conv1D!($vO, $vA, $vB)
The code is pretty fast, yet MATLAB's conv() is still faster, and it is crucial for me to get at least close to MATLAB's performance.
I read @Elrod 's blog post, Orthogonalize Indices.
I tried to generate code where the index of the output array is linear:
using LinearAlgebra;
function Conv1D!( vO :: Array{T, 1}, vA :: Array{T, 1}, vB :: Array{T, 1} ) :: Array{T, 1} where {T <: Real}
    lenA = length(vA);
    lenB = length(vB);
    lenO = length(vO);

    vC = view(vB, lenB:-1:1); # Reversed view of vB, so the inner sum becomes a dot()
    @simd for ii in 1:lenO
        # Rolling vB over vA
        startIdxA = max(1, ii - lenB + 1);
        endIdxA   = min(lenA, ii);
        startIdxC = max(lenB - ii + 1, 1);
        endIdxC   = min(lenB, lenO - ii + 1);
        @inbounds vO[ii] = dot(view(vA, startIdxA:endIdxA), view(vC, startIdxC:endIdxC));
    end

    return vO;
end
Yet it made things much slower (~40x).
Is there a way to make this code faster, perhaps by making it more SIMD friendly?
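For reference, one direction I considered (a sketch, not a measured solution): the orthogonalized version creates two views and a variable-length dot() on every output sample, which defeats vectorization. Keeping the output index linear but splitting the output into head/body/tail regions gives the body loop a fixed inner trip count of lenB, which the compiler can vectorize. The name conv1d_fixedlen! is mine, and the sketch assumes length(vA) >= length(vB):

```julia
function conv1d_fixedlen!( vO :: Vector{T}, vA :: Vector{T}, vB :: Vector{T} ) :: Vector{T} where {T <: Real}
    lenA = length(vA);
    lenB = length(vB);
    # Assumes lenA >= lenB and length(vO) == lenA + lenB - 1.
    # Full convolution: vO[ii] = sum over valid k of vA[ii - k + 1] * vB[k].

    # Head: ii = 1 .. lenB-1, the overlap is still growing.
    @inbounds for ii in 1:(lenB - 1)
        acc = zero(T);
        for k in 1:ii
            acc += vA[ii - k + 1] * vB[k];
        end
        vO[ii] = acc;
    end

    # Body: ii = lenB .. lenA, full overlap, fixed trip count lenB -> SIMD friendly.
    @inbounds for ii in lenB:lenA
        acc = zero(T);
        @simd for k in 1:lenB
            acc += vA[ii - k + 1] * vB[k];
        end
        vO[ii] = acc;
    end

    # Tail: ii = lenA+1 .. lenA+lenB-1, the overlap is shrinking.
    @inbounds for ii in (lenA + 1):(lenA + lenB - 1)
        acc = zero(T);
        for k in (ii - lenA + 1):lenB
            acc += vA[ii - k + 1] * vB[k];
        end
        vO[ii] = acc;
    end

    return vO;
end
```

This trades two short scalar edge loops for a main loop whose bounds are known per call, avoiding the per-sample view allocations entirely; whether it beats the scattered-write kernel above would have to be benchmarked.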