Speed up applying a function to matrices

amrods · August 12, 2021, 1:26am

I’ve been struggling to speed up this problem I’m working with. Apologies about posting this long snippet, I couldn’t think of a more minimal example that has all the features stumping me.

using BenchmarkTools
using FastGaussQuadrature
using NLsolve
using ComponentArrays

θ = ComponentArray{Float64}(m=(α=1.2, γ=[10.6, 10.03], θx=[-0.50, -0.005, 1.0]),
                            f=(α=1.2, γ=[10.6, 10.03], θx=[-0.50, -0.005, 1.0])
                        )

const ρ = 0.1

const glpoints = gausslaguerre(500)
const jump1 = 62
const jump2 = 65

const ll = 0.001
const ul = 500.0

transfθ(θ) = abs(θ)

function discountedintegral(f, r, T; glpoints=glpoints)
    x, w = glpoints
    exp(-r*T)/r * sum( w[i]*f(x[i]/r + T) for i in 1:length(x) )
end

function H̃i(i, ti, x; θ=θ, ρ=ρ, glpoints=glpoints)
    discountedintegral(s -> Zi(i, s; θ=θ), ρ, ti; glpoints=glpoints) * φi(i, x; θ=θ)
end

function Hi(i, si, xi; θ=θ)
    Zi(i, si; θ=θ) * φi(i, xi; θ=θ)
end

function φi(i, xi; θ=θ)
    if i == 1
        θx = θ.m.θx
    elseif i == 2
        θx = θ.f.θx
    end
    res = zero(eltype(θ))
    for i in 1:length(xi)
        res += θx[i + 1] * xi[i]
    end
    return exp(res + θx[1])
end

function Zi(i, si; θ=θ, jump1=jump1, jump2=jump2)
    if i == 1
        α = θ.m.α
        γ = θ.m.γ
    elseif i == 2
        α = θ.f.α
        γ = θ.f.γ
    end
    α = transfθ(α)
    γ .= transfθ.(γ)
    abs(si)^α + γ[1]*transitionF(si, (jump1 - 45)*4) + γ[2]*transitionF(si, (jump2 - 45)*4)
end

function transitionF(t, t̄)
    t < t̄                && return 0
    t̄ <= t < t̄ + 0.5     && return 2(t - t̄)^2
    t̄ + 0.5 <= t < t̄ + 1 && return 1 - 2(t - 1 - t̄)^2
    t̄ + 1 <= t           && return 1
end

function U1(t, x1; θ=θ, K=K, ρ=ρ, glpoints=glpoints)
    t1 = t[1]
    K1 = K[1]
    K1*ρ^(-1)*(1 - exp(-ρ*t1)) + H̃i(1, t1, x1; θ=θ, ρ=ρ, glpoints=glpoints)
end

function U2(t, x2; θ=θ, K=K, ρ=ρ, glpoints=glpoints)
    t2 = t[2]
    K2 = K[2]
    K2*ρ^(-1)*(1 - exp(-ρ*t2)) + H̃i(2, t2, x2; θ=θ, ρ=ρ, glpoints=glpoints) 
end

function uni_foc(sto, t; θ=θ, x=x, K=K)
    sto[1] = Hi(1, t[1], x[1]; θ=θ) - K[1]
    sto[2] = Hi(2, t[2], x[2]; θ=θ) - K[2]
end

function Aifun!(ret; x=x, θ=θ, K=K, uni_t0=[10.0, 11.0], ρ=ρ, uni_meth=:trust_region, autodiff=:forward,
                xtol=1e-6, ftol=1e-6)
    sol = nlsolve((sto, t) -> uni_foc(sto, t; θ=θ, x=x, K=K), uni_t0, method=uni_meth, autodiff=autodiff, xtol=xtol, ftol=ftol)
    ret[1] = U1(sol.zero, x[1]; θ=θ, K=K, ρ=ρ, glpoints=glpoints)
    ret[2] = U2(sol.zero, x[2]; θ=θ, K=K, ρ=ρ, glpoints=glpoints)
    return ret .* 0.6
end

x = [[10.0, 6.0], [6.0, 10.0]]
K = [1.6, 0.6]
ret = zeros(2)
Aifun!(ret)

It’s all good, but now suppose I have matrices for x and K. I’m trying to apply function Aifun! to those matrices:

matkk = abs.(randn(500, 2))
dataca = ComponentArray{Float64}(m=(x=rand(500, 2),), f=(x=rand(500, 2),))
Amat = similar(matkk)

function Aimat!(Amat, matkk, dataca; θ=θ, uni_t0=[10.0, 11.0], ρ=ρ, uni_meth=:trust_region, autodiff=:forward,
                xtol=1e-6, ftol=1e-6)
    Ai = similar(matkk, 2)
    Ki = similar(matkk, 2)
    covs = size(dataca.m.x, 2)
    xi = [similar(dataca.m.x, covs), similar(dataca.f.x, covs)]
    n = size(matkk, 1)
    for i in 1:n
        Ai[1] = Amat[i, 1]
        Ai[2] = Amat[i, 2]
        Ki[1] = matkk[i, 1]
        Ki[2] = matkk[i, 2]
        xi[1] = dataca.m.x[i, :]
        xi[2] = dataca.f.x[i, :]
        Aifun!(Ai; x=xi, θ=θ, K=Ki, uni_t0=uni_t0, ρ=ρ, uni_meth=uni_meth, autodiff=autodiff,
            xtol=xtol, ftol=ftol)
        Amat[i, 1] = Ai[1]
        Amat[i, 2] = Ai[2]
    end
    return Amat
end

Aimat!(Amat, matkk, dataca) works, but I’m looking for a more efficient and faster way, if there is any.

lmiq · August 12, 2021, 1:51am

In these two lines it seems that you should be broadcasting, i. e. with .=

Also those two-element arrays could be static arrays (MArrays for convenience, at least). Maybe that speedups things a little. I’m assuming that the loop there is taking time. Probably it would be good to profile everything and see where are the bottlenecks.

Edit: looking again, probably these changes will not be related to the overall performance.

amrods · August 12, 2021, 1:57am

Since I want to return a mutated matrix Amat, I thought about using @view to access elements in the first 2 lines of the for loop:

        Ai[1] = @view Amat[i, 1]
        Ai[2] = @view Amat[i, 2]

but I get errors:

ERROR: MethodError: Cannot `convert` an object of type SubArray{Float64, 0, Matrix{Float64}, Tuple{Int64, Int64}, true} to an object of type Float64
Closest candidates are:
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:250
  convert(::Type{T}, ::AbstractChar) where T<:Number at char.jl:180
  convert(::Type{T}, ::CartesianIndex{1}) where T<:Number at multidimensional.jl:136

Also because I want to return the mutated matrix, I have to add the last 2 lines:

        Amat[i, 1] = Ai[1]
        Amat[i, 2] = Ai[2]

kristoffer.carlsson · August 12, 2021, 8:25am

Since Julia is column major, you generally want to slice along the other direction dataca.m.x[:, i], see Performance Tips · The Julia Language.

You also want to use views for slices, see Performance Tips · The Julia Language.

In general, the performance tips has a pretty good collection of stuff to try.

amrods · August 12, 2021, 10:18am

Yes, it feels like I am not using the loop efficiently, as I am calling Ai[1] = Amat[i, 1] at the beginning, and then reversing it at the end with Amat[i, 1] = Ai[1].

I tried views, but it seems that I cannot assign a view to be an element of a vector, so I’d have to bind those views to variables that I then have to compose into a vector, which is again not allowed.

Sukera · August 12, 2021, 10:42am

Uh? Sure you can, the vector just has to be able to accept those views, i.e. its element type has to match the type of the view. In the case of a view, this is SubArray, which is an AbstractArray (or an AbstractVector, depending on the dimensionality of the view):

julia> v = AbstractVector[[1],[2],[3]]                          
3-element Vector{AbstractVector}:                               
 [1]                                                            
 [2]                                                            
 [3]                                                            
                                                                
julia> v[1] = view(v[2], 1:1)                                   
1-element view(::Vector{Int64}, 1:1) with eltype Int64:         
 2                                                              
                                                                
julia> v                                                        
3-element Vector{AbstractVector}:                               
 [2]                                                            
 [2]                                                            
 [3]                                                            
                                                                
julia> typeof(v[1])                                             
SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}
                                                                
julia> typeof(v[1]) |> supertype                                
AbstractVector{Int64} (alias for AbstractArray{Int64, 1})

If you have a Matrix though, you can’t just set a single element equal to a vector (unless the elementtype of that Matrix is a vector as well, of course) - you’ll have to set the whole subset/column/row (and the shapes have to match, i.e. you can’t set a 2x3 matrix equal to an 3x2 matrix).

amrods · August 12, 2021, 10:59am

I changed Aimat! to this and still get an error

function Aimat!(Amat, matkk, dataca; θ=θ, uni_t0=[10.0, 11.0], ρ=ρ, uni_meth=:trust_region, autodiff=:forward,
                xtol=1e-6, ftol=1e-6)
    Ai = similar(matkk, SubArray{eltype(matkk)}, 2)
    Ki = similar(matkk, 2)
    covs = size(dataca.m.x, 2)
    xi = [similar(dataca.m.x, covs), similar(dataca.f.x, covs)]
    n = size(matkk, 1)
    for i in 1:n
        Ai[1] = view(Amat, i, 1)
        Ai[2] = view(Amat, i, 2)
        Ki[1] = matkk[i, 1]
        Ki[2] = matkk[i, 2]
        xi[1] = dataca.m.x[i, :]
        xi[2] = dataca.f.x[i, :]
        Aifun!(Ai; x=xi, θ=θ, K=Ki, uni_t0=uni_t0, ρ=ρ, uni_meth=uni_meth, autodiff=autodiff,
            xtol=xtol, ftol=ftol)
    end
    return Amat
end

julia> Aimat!(Amat, matkk, dataca)
ERROR: MethodError: Cannot `convert` an object of type 
  Float64 to an object of type 
  SubArray{Float64, N, P, I, L} where {N, P, I, L}

lmiq · August 12, 2021, 11:11am

Sorry if this is clear already, but I have the impression that there is some confusion here. For example:

This is a vector of vectors, and you have preallocated the vectors it contains. Thus, when you do this:

You are just replacing the preallocated vector by a new array which is a slice of destacam.x. It would be more natural, since the vector is preallocated, to copy element-wise, i. e., with .=.

Alternatively, you may want to just put in the position xi[1] a view (not a copy) of the slice of destacam.x. In that case, use xi[1] = @view(dataca.m.x[i, :]), but then it does not really make sense to preallocate the elements of xi, or even having xi as a vector or vectors, instead of a temporary scalar (a label that is temporarily assigned to that slice) inside the loop.

The same goes for the other assignments/copies. Sorry not looking the code more carefully, but, for example:

If Amat[i,1] is an array and Ai is an array of arrays, this is not copying anything, just assigning a pointer:

julia> x = Vector{Vector{Float64}}(undef,2)
2-element Vector{Vector{Float64}}:
 #undef
 #undef

julia> y = rand(2);

julia> x[1] = y
2-element Vector{Float64}:
 0.6758178015247569
 0.060842494125993074

julia> x[1][1] = 0
0

julia> y # note that y[1] was changed
2-element Vector{Float64}:
 0.0
 0.060842494125993074

thus, it does not make sense to use “views” there.

amrods · August 13, 2021, 1:52am

I don’t think I can eliminate the allocations arising from creating [x1, x2] because I am extracting data from dataca with hardcoded “indices”. I will also work on transposing the data matrices to be able to access them in a more efficient way.

function Aimat!(Amat, matkk, dataca; θ=θ, ρ=ρ)
    n = size(matkk, 1)
    for i in 1:n
        Ai = @view(Amat[i, :])
        Ki = @view(matkk[i, :])
        x1 = @view(dataca.m.x[i, :])
        x2 = @view(dataca.f.x[i, :])
        Aifun!(Ai; x=[x1, x2], θ=θ, K=Ki, ρ=ρ)
    end
    return Amat
end

Topic		Replies	Views
Any suggestion for speeding up my function? Performance question	11	427	November 17, 2022
Could anyone make this run faster? Performance	21	554	December 26, 2022
Improving Performance of a Loop Performance	9	416	November 3, 2021
Matrix performance General Usage performance	24	1616	August 23, 2017
Improving speed with iterative sums and functions within functions Performance question	4	312	January 17, 2024

Speed up applying a function to matrices

Related topics