Speed up applying a function to matrices

I’ve been struggling to speed up this problem I’m working with. Apologies about posting this long snippet, I couldn’t think of a more minimal example that has all the features stumping me.

using BenchmarkTools
using FastGaussQuadrature
using NLsolve
using ComponentArrays

θ = ComponentArray{Float64}(m=(α=1.2, γ=[10.6, 10.03], θx=[-0.50, -0.005, 1.0]),
                            f=(α=1.2, γ=[10.6, 10.03], θx=[-0.50, -0.005, 1.0])
                        )

const ρ = 0.1

const glpoints = gausslaguerre(500)
const jump1 = 62
const jump2 = 65

const ll = 0.001
const ul = 500.0

transfθ(θ) = abs(θ)

function discountedintegral(f, r, T; glpoints=glpoints)
    x, w = glpoints
    exp(-r*T)/r * sum( w[i]*f(x[i]/r + T) for i in 1:length(x) )
end

function H̃i(i, ti, x; θ=θ, ρ=ρ, glpoints=glpoints)
    discountedintegral(s -> Zi(i, s; θ=θ), ρ, ti; glpoints=glpoints) * φi(i, x; θ=θ)
end

function Hi(i, si, xi; θ=θ)
    Zi(i, si; θ=θ) * φi(i, xi; θ=θ)
end

function φi(i, xi; θ=θ)
    if i == 1
        θx = θ.m.θx
    elseif i == 2
        θx = θ.f.θx
    end
    res = zero(eltype(θ))
    for i in 1:length(xi)
        res += θx[i + 1] * xi[i]
    end
    return exp(res + θx[1])
end

function Zi(i, si; θ=θ, jump1=jump1, jump2=jump2)
    if i == 1
        α = θ.m.α
        γ = θ.m.γ
    elseif i == 2
        α = θ.f.α
        γ = θ.f.γ
    end
    α = transfθ(α)
    γ .= transfθ.(γ)
    abs(si)^α + γ[1]*transitionF(si, (jump1 - 45)*4) + γ[2]*transitionF(si, (jump2 - 45)*4)
end

function transitionF(t, t̄)
    t < t̄                && return 0
    t̄ <= t < t̄ + 0.5     && return 2(t - t̄)^2
    t̄ + 0.5 <= t < t̄ + 1 && return 1 - 2(t - 1 - t̄)^2
    t̄ + 1 <= t           && return 1
end

function U1(t, x1; θ=θ, K=K, ρ=ρ, glpoints=glpoints)
    t1 = t[1]
    K1 = K[1]
    K1*ρ^(-1)*(1 - exp(-ρ*t1)) + H̃i(1, t1, x1; θ=θ, ρ=ρ, glpoints=glpoints)
end

function U2(t, x2; θ=θ, K=K, ρ=ρ, glpoints=glpoints)
    t2 = t[2]
    K2 = K[2]
    K2*ρ^(-1)*(1 - exp(-ρ*t2)) + H̃i(2, t2, x2; θ=θ, ρ=ρ, glpoints=glpoints) 
end

function uni_foc(sto, t; θ=θ, x=x, K=K)
    sto[1] = Hi(1, t[1], x[1]; θ=θ) - K[1]
    sto[2] = Hi(2, t[2], x[2]; θ=θ) - K[2]
end

function Aifun!(ret; x=x, θ=θ, K=K, uni_t0=[10.0, 11.0], ρ=ρ, uni_meth=:trust_region, autodiff=:forward,
                xtol=1e-6, ftol=1e-6)
    sol = nlsolve((sto, t) -> uni_foc(sto, t; θ=θ, x=x, K=K), uni_t0, method=uni_meth, autodiff=autodiff, xtol=xtol, ftol=ftol)
    ret[1] = U1(sol.zero, x[1]; θ=θ, K=K, ρ=ρ, glpoints=glpoints)
    ret[2] = U2(sol.zero, x[2]; θ=θ, K=K, ρ=ρ, glpoints=glpoints)
    return ret .* 0.6
end

x = [[10.0, 6.0], [6.0, 10.0]]
K = [1.6, 0.6]
ret = zeros(2)
Aifun!(ret)

It’s all good, but now suppose I have matrices for x and K. I’m trying to apply function Aifun! to those matrices:

matkk = abs.(randn(500, 2))
dataca = ComponentArray{Float64}(m=(x=rand(500, 2),), f=(x=rand(500, 2),))
Amat = similar(matkk)

function Aimat!(Amat, matkk, dataca; θ=θ, uni_t0=[10.0, 11.0], ρ=ρ, uni_meth=:trust_region, autodiff=:forward,
                xtol=1e-6, ftol=1e-6)
    Ai = similar(matkk, 2)
    Ki = similar(matkk, 2)
    covs = size(dataca.m.x, 2)
    xi = [similar(dataca.m.x, covs), similar(dataca.f.x, covs)]
    n = size(matkk, 1)
    for i in 1:n
        Ai[1] = Amat[i, 1]
        Ai[2] = Amat[i, 2]
        Ki[1] = matkk[i, 1]
        Ki[2] = matkk[i, 2]
        xi[1] = dataca.m.x[i, :]
        xi[2] = dataca.f.x[i, :]
        Aifun!(Ai; x=xi, θ=θ, K=Ki, uni_t0=uni_t0, ρ=ρ, uni_meth=uni_meth, autodiff=autodiff,
            xtol=xtol, ftol=ftol)
        Amat[i, 1] = Ai[1]
        Amat[i, 2] = Ai[2]
    end
    return Amat
end

Aimat!(Amat, matkk, dataca) works, but I’m looking for a more efficient and faster way, if there is any.

In these two lines it seems that you should be broadcasting, i. e. with .=

Also those two-element arrays could be static arrays (MArrays for convenience, at least). Maybe that speedups things a little. I’m assuming that the loop there is taking time. Probably it would be good to profile everything and see where are the bottlenecks.

Edit: looking again, probably these changes will not be related to the overall performance.

Since I want to return a mutated matrix Amat, I thought about using @view to access elements in the first 2 lines of the for loop:

        Ai[1] = @view Amat[i, 1]
        Ai[2] = @view Amat[i, 2]

but I get errors:

ERROR: MethodError: Cannot `convert` an object of type SubArray{Float64, 0, Matrix{Float64}, Tuple{Int64, Int64}, true} to an object of type Float64
Closest candidates are:
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:250
  convert(::Type{T}, ::AbstractChar) where T<:Number at char.jl:180
  convert(::Type{T}, ::CartesianIndex{1}) where T<:Number at multidimensional.jl:136

Also because I want to return the mutated matrix, I have to add the last 2 lines:

        Amat[i, 1] = Ai[1]
        Amat[i, 2] = Ai[2]

Since Julia is column major, you generally want to slice along the other direction dataca.m.x[:, i], see Performance Tips · The Julia Language.

You also want to use views for slices, see Performance Tips · The Julia Language.

In general, the performance tips has a pretty good collection of stuff to try.

2 Likes

Yes, it feels like I am not using the loop efficiently, as I am calling Ai[1] = Amat[i, 1] at the beginning, and then reversing it at the end with Amat[i, 1] = Ai[1].

I tried views, but it seems that I cannot assign a view to be an element of a vector, so I’d have to bind those views to variables that I then have to compose into a vector, which is again not allowed. :man_shrugging:

Uh? Sure you can, the vector just has to be able to accept those views, i.e. its element type has to match the type of the view. In the case of a view, this is SubArray, which is an AbstractArray (or an AbstractVector, depending on the dimensionality of the view):

julia> v = AbstractVector[[1],[2],[3]]                          
3-element Vector{AbstractVector}:                               
 [1]                                                            
 [2]                                                            
 [3]                                                            
                                                                
julia> v[1] = view(v[2], 1:1)                                   
1-element view(::Vector{Int64}, 1:1) with eltype Int64:         
 2                                                              
                                                                
julia> v                                                        
3-element Vector{AbstractVector}:                               
 [2]                                                            
 [2]                                                            
 [3]                                                            
                                                                
julia> typeof(v[1])                                             
SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}
                                                                
julia> typeof(v[1]) |> supertype                                
AbstractVector{Int64} (alias for AbstractArray{Int64, 1})       

If you have a Matrix though, you can’t just set a single element equal to a vector (unless the elementtype of that Matrix is a vector as well, of course) - you’ll have to set the whole subset/column/row (and the shapes have to match, i.e. you can’t set a 2x3 matrix equal to an 3x2 matrix).

I changed Aimat! to this and still get an error

function Aimat!(Amat, matkk, dataca; θ=θ, uni_t0=[10.0, 11.0], ρ=ρ, uni_meth=:trust_region, autodiff=:forward,
                xtol=1e-6, ftol=1e-6)
    Ai = similar(matkk, SubArray{eltype(matkk)}, 2)
    Ki = similar(matkk, 2)
    covs = size(dataca.m.x, 2)
    xi = [similar(dataca.m.x, covs), similar(dataca.f.x, covs)]
    n = size(matkk, 1)
    for i in 1:n
        Ai[1] = view(Amat, i, 1)
        Ai[2] = view(Amat, i, 2)
        Ki[1] = matkk[i, 1]
        Ki[2] = matkk[i, 2]
        xi[1] = dataca.m.x[i, :]
        xi[2] = dataca.f.x[i, :]
        Aifun!(Ai; x=xi, θ=θ, K=Ki, uni_t0=uni_t0, ρ=ρ, uni_meth=uni_meth, autodiff=autodiff,
            xtol=xtol, ftol=ftol)
    end
    return Amat
end

julia> Aimat!(Amat, matkk, dataca)
ERROR: MethodError: Cannot `convert` an object of type 
  Float64 to an object of type 
  SubArray{Float64, N, P, I, L} where {N, P, I, L}

Sorry if this is clear already, but I have the impression that there is some confusion here. For example:

This is a vector of vectors, and you have preallocated the vectors it contains. Thus, when you do this:

You are just replacing the preallocated vector by a new array which is a slice of destacam.x. It would be more natural, since the vector is preallocated, to copy element-wise, i. e., with .=.

Alternatively, you may want to just put in the position xi[1] a view (not a copy) of the slice of destacam.x. In that case, use xi[1] = @view(dataca.m.x[i, :]), but then it does not really make sense to preallocate the elements of xi, or even having xi as a vector or vectors, instead of a temporary scalar (a label that is temporarily assigned to that slice) inside the loop.

The same goes for the other assignments/copies. Sorry not looking the code more carefully, but, for example:

If Amat[i,1] is an array and Ai is an array of arrays, this is not copying anything, just assigning a pointer:

julia> x = Vector{Vector{Float64}}(undef,2)
2-element Vector{Vector{Float64}}:
 #undef
 #undef

julia> y = rand(2);

julia> x[1] = y
2-element Vector{Float64}:
 0.6758178015247569
 0.060842494125993074

julia> x[1][1] = 0
0

julia> y # note that y[1] was changed
2-element Vector{Float64}:
 0.0
 0.060842494125993074

thus, it does not make sense to use “views” there.

I don’t think I can eliminate the allocations arising from creating [x1, x2] because I am extracting data from dataca with hardcoded “indices”. I will also work on transposing the data matrices to be able to access them in a more efficient way.

function Aimat!(Amat, matkk, dataca; θ=θ, ρ=ρ)
    n = size(matkk, 1)
    for i in 1:n
        Ai = @view(Amat[i, :])
        Ki = @view(matkk[i, :])
        x1 = @view(dataca.m.x[i, :])
        x2 = @view(dataca.f.x[i, :])
        Aifun!(Ai; x=[x1, x2], θ=θ, K=Ki, ρ=ρ)
    end
    return Amat
end