Speed of writing vs. reading elements in an array

performance

#1

I’m writing a performance-critical module. With `@inbounds`, I can read elements (`getindex`) about 50x faster than I can write them (`setindex!`), for two arrays of identical type and size that are accessed with the same degree of irregularity.

Is there anything I can do to speed up writing to an array?


#2

Stores are usually slower than loads, though usually not by this much. It’s almost impossible to give advice without seeing more of the code.
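To see how large the load/store gap is on a particular machine, one can time an irregular gather (loads only) against an irregular scatter (a load and a store per element, which is the pattern the question describes) over the same random indices. This is a minimal sketch: the function names, array sizes and index pattern are illustrative assumptions, and the ratio it prints depends on the hardware and cache sizes.

```julia
using Random

# Irregular reads only (gather): sums y[idx[k]].
function gather_sum(y::Vector{Float64}, idx::Vector{Int})
    s = 0.0
    @inbounds for k in eachindex(idx)
        s += y[idx[k]]
    end
    return s
end

# Irregular read-modify-write (scatter): adds v to y[idx[k]].
function scatter_add!(y::Vector{Float64}, idx::Vector{Int}, v::Float64)
    @inbounds for k in eachindex(idx)
        y[idx[k]] += v
    end
    return y
end

Random.seed!(1)
n = 1_000_000
y = zeros(n)
idx = rand(1:n, 2_000_000)     # the same irregular index pattern for both loops

# Call each function once so that compilation is excluded from the timings.
gather_sum(y, idx)
scatter_add!(copy(y), idx, 1.0)

t_read  = @elapsed gather_sum(y, idx)
t_write = @elapsed scatter_add!(y, idx, 1.0)
println("gather: $(t_read) s, scatter: $(t_write) s, ratio: $(t_write / t_read)")
```

For more reliable numbers, repeated runs (or a benchmarking package) are preferable to a single `@elapsed` measurement.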


#3

Here is a reduced version of my code:

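```julia
# Reduced version: MyImmutableType (fields `dims` and `ints`) is defined elsewhere.
# Writes: accumulates x[col] * z[k] into y[row] for each stored linear index k.
function f!(A::MyImmutableType, x::Vector{T}, y::Vector{T}, z::Vector{T}, nchunks::Integer) where {T<:Real}
    ci = CartesianIndices(A.dims)          # maps linear indices to (row, col)
    for chunk = 1:nchunks
        @inbounds for i = 1:128
            k = A.ints[chunk][i]           # linear index into an A.dims-sized matrix
            row_ind, col_ind = Tuple(ci[k])
            zval = z[k]
            xval = x[col_ind]
            yval = y[row_ind]
            y[row_ind] = yval + xval*zval  # read-modify-write of y
        end
    end
    return y
end

# Reads only: the same loads as f!, but accumulated into a local scalar.
function f(A::MyImmutableType, x::Vector{T}, y::Vector{T}, z::Vector{T}, nchunks::Integer) where {T<:Real}
    ci = CartesianIndices(A.dims)
    y_scalar = zero(T)                     # type-stable accumulator
    for chunk = 1:nchunks
        @inbounds for i = 1:128
            k = A.ints[chunk][i]
            row_ind, col_ind = Tuple(ci[k])
            zval = z[k]
            xval = x[col_ind]
            yval = y[row_ind]
            y_scalar += yval + xval*zval
        end
    end
    return y_scalar
end
```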

The first function (which writes to `y`) takes about 1.5 seconds to run, and the second (which only reads) about 0.03 seconds. Any thoughts?


#4

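You’ll see some slowdown from aliasing: because the compiler can’t prove that `y` doesn’t share memory with `x` or `z`, it can’t move later loads ahead of the stores to `y`. But it shouldn’t be this much (I’d expect maybe 4x).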
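To make the aliasing point concrete: nothing stops a caller from passing the same array as both `x` and `y`, and then a store to `y` changes what a later load from `x` returns. The sketch below uses a hypothetical `scatter_muladd!` with the same update shape as `f!`, taking explicit row/column index vectors instead of a linear-index lookup. The two calls give different results, so the compiler has to keep each load after the preceding store unless it can prove the arrays are distinct.

```julia
# Same update shape as the loop body of f!: y[row] += x[col] * z[k].
function scatter_muladd!(y::Vector{Float64}, x::Vector{Float64}, z::Vector{Float64},
                         rows::Vector{Int}, cols::Vector{Int})
    @inbounds for k in eachindex(rows)
        y[rows[k]] += x[cols[k]] * z[k]
    end
    return y
end

z    = [1.0, 1.0]
rows = [2, 1]
cols = [1, 2]

# y and x are distinct arrays: every load of x sees the original values.
x = [1.0, 2.0]
y_separate = scatter_muladd!(copy(x), x, z, rows, cols)       # [3.0, 3.0]

# y and x are the same array: the second iteration loads x[2] after the
# first iteration stored into y[2], so it sees the updated value.
x_alias = [1.0, 2.0]
y_aliased = scatter_muladd!(x_alias, x_alias, z, rows, cols)  # [4.0, 3.0]
```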


#5

Also, you’ll probably get a better response if you post a complete runnable example (including the definition of `MyImmutableType` and the `@time` calls).
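Following that suggestion, a complete runnable example might look like this sketch. The definition of `MyImmutableType` (a `dims` tuple plus a vector of 128-element chunks of linear indices) is a guess based on how the functions use `A`, and the sizes and random indices are made up for illustration; they are not the original poster’s data.

```julia
using Random

# Hypothetical definition: the fields are inferred from how f! and f use A.
struct MyImmutableType
    dims::Tuple{Int,Int}            # (rows, cols) of the logical matrix
    ints::Vector{Vector{Int}}       # chunks of 128 linear indices each
end

# Scatter: y[row] += x[col] * z[k] for every stored linear index k.
function f!(A::MyImmutableType, x::Vector{T}, y::Vector{T}, z::Vector{T}, nchunks::Integer) where {T<:Real}
    ci = CartesianIndices(A.dims)
    for chunk = 1:nchunks
        idx = A.ints[chunk]
        @inbounds for i = 1:128
            k = idx[i]
            row_ind, col_ind = Tuple(ci[k])
            y[row_ind] += x[col_ind] * z[k]
        end
    end
    return y
end

# Gather: the same loads, summed into a local scalar instead of stored.
function f(A::MyImmutableType, x::Vector{T}, y::Vector{T}, z::Vector{T}, nchunks::Integer) where {T<:Real}
    ci = CartesianIndices(A.dims)
    acc = zero(T)
    for chunk = 1:nchunks
        idx = A.ints[chunk]
        @inbounds for i = 1:128
            k = idx[i]
            row_ind, col_ind = Tuple(ci[k])
            acc += y[row_ind] + x[col_ind] * z[k]
        end
    end
    return acc
end

# Synthetic inputs (assumed sizes and random indices).
Random.seed!(42)
nrows, ncols, nchunks = 1_000, 1_000, 10_000
A = MyImmutableType((nrows, ncols), [rand(1:nrows*ncols, 128) for _ in 1:nchunks])
x = rand(ncols)
y = zeros(nrows)
z = rand(nrows * ncols)

# Run once so that compilation is not included in the timings.
f!(A, x, y, z, nchunks)
f(A, x, y, z, nchunks)
fill!(y, 0.0)

@time f!(A, x, y, z, nchunks)
@time f(A, x, y, z, nchunks)
```

Warming up both functions before timing matters here: otherwise the first `@time` call also measures compilation, which can distort a comparison like the one in the question.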