I’m writing a performance-critical module. While using @inbounds, I can read elements (getindex) about 50x faster than I can write elements (setindex!) in two arrays of identical type and size. Both arrays are accessed at the same level of irregularity.
Is there anything I can do to speed up writing to an array?
A store is usually slower than a load, though not usually by this much. It’s almost impossible to give advice without more information about the code.
Here is a reduced version of my code:
function f!{T<:Real}(A::MyImmutableType, x::Vector{T}, y::Vector{T}, z::Vector{T}, nchunks::Integer)
    for chunk = 1:nchunks
        @inbounds for i = 1:128
            row_ind, col_ind = ind2sub(A.dims, A.ints[chunk][i])
            zval = z[A.ints[chunk][i]]
            xval = x[col_ind]
            yval = y[row_ind]
            y[row_ind] = yval + xval*zval
        end
    end
end
function f{T<:Real}(A::MyImmutableType, x::Vector{T}, y::Vector{T}, z::Vector{T}, nchunks::Integer)
    y_scalar = 0.0
    for chunk = 1:nchunks
        @inbounds for i = 1:128
            row_ind, col_ind = ind2sub(A.dims, A.ints[chunk][i])
            zval = z[A.ints[chunk][i]]
            xval = x[col_ind]
            yval = y[row_ind]
            y_scalar += yval + xval*zval
        end
    end
    return y_scalar
end
The first function takes ~1.5 seconds to run and the second ~0.03 seconds. Any thoughts?
Keno
May 8, 2017, 3:35pm
You’ll see some performance regressions because of aliasing issues, but it shouldn’t be this much (I’d expect maybe 4x).
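To illustrate the aliasing point: once the loop stores into y, the compiler may have to assume that other loads in the loop body could have been affected and repeat them on every iteration. One commonly suggested mitigation is to hoist loads that cannot change out of the inner loop by hand. The sketch below does this for the per-chunk index vector. It is untested against the original data and written for Julia 1.x (where CartesianIndices replaces ind2sub); the struct definition and the name f_hoisted! are assumptions, since the real MyImmutableType was not posted.

```julia
# Hypothetical stand-in: the real MyImmutableType was not posted.
struct MyImmutableType
    dims::Tuple{Int,Int}
    ints::Vector{Vector{Int}}
end

# Same update as f!, but the chunk's index vector is bound to a local
# once per chunk instead of being looked up twice per inner iteration.
function f_hoisted!(A::MyImmutableType, x::Vector{T}, y::Vector{T}, z::Vector{T}, nchunks::Integer) where {T<:Real}
    cart = CartesianIndices(A.dims)      # linear index -> (row, col) on Julia 1.x
    for chunk = 1:nchunks
        idx = A.ints[chunk]              # loaded once, outside the inner loop
        @inbounds for i = 1:128
            k = idx[i]
            row_ind, col_ind = Tuple(cart[k])
            y[row_ind] += x[col_ind] * z[k]
        end
    end
    return y
end
```

Whether this closes any of the gap depends on what the compiler already manages to prove, so it is worth timing both variants rather than assuming a speedup.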
Keno
May 8, 2017, 3:36pm
Also, you’ll probably get a better response if you post a complete runnable example (including the definition of MyImmutableType and the @time macros).
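For reference, a complete runnable example along those lines might look like the sketch below. Everything not in the original post is an assumption: the fields of MyImmutableType, the problem sizes, and the random index data are made up for illustration. It is written for Julia 1.x, so it uses `where` syntax and CartesianIndices in place of the older `f!{T}` form and ind2sub.

```julia
# Hypothetical definition: the poster's actual type was not shown.
struct MyImmutableType
    dims::Tuple{Int,Int}
    ints::Vector{Vector{Int}}
end

function f!(A::MyImmutableType, x::Vector{T}, y::Vector{T}, z::Vector{T}, nchunks::Integer) where {T<:Real}
    cart = CartesianIndices(A.dims)      # linear index -> (row, col)
    for chunk = 1:nchunks
        @inbounds for i = 1:128
            k = A.ints[chunk][i]
            row_ind, col_ind = Tuple(cart[k])
            y[row_ind] += x[col_ind] * z[k]
        end
    end
    return y
end

# Made-up problem sizes, chosen only so the script runs quickly.
nrows, ncols, nchunks = 1000, 1000, 100
A = MyImmutableType((nrows, ncols), [rand(1:nrows*ncols, 128) for _ in 1:nchunks])
x = rand(ncols)
y = zeros(nrows)
z = rand(nrows * ncols)

f!(A, x, y, z, nchunks)       # warm-up call so compilation is not timed
fill!(y, 0.0)                 # reset the output before the timed run
@time f!(A, x, y, z, nchunks)
```

Timing only after a warm-up call keeps JIT compilation out of the measurement, which matters when comparing two functions whose runtimes differ by this much.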