Reducing allocations (again)

I still don’t have great intuition for quickly spotting my bottlenecks due to unneeded allocations and the associated work around. Here is some code I’m trying to optimize right now:

function logpost(w,data,μ,Σ)
    post = 0.0
#    rightWrong = data[:,1]
#    total = data[:,2]
#    xvec = [1.0,0.0]
    for (idx,d) in enumerate(data[:,2])
        #xvec[2] = d  # = [d^x for x in [0,1] ]
        #display(xvec)
        #μ = log(1 + exp(-w' * xvec))  # This is playing the role of the likelihood.
        if data[idx,1] < 1e-2
            @timeit "calc post1" post -= log(exp(w' * [1.0,d]) + 1)
        else
            @timeit "calc post2" post += w' * [1.0,d] - log(exp(w' * [1.0,d]) + 1)        
        end
        @timeit "calc post3" post += logPriorGaussian(w,μ,Σ)
    end
    return post
end


function getDraws(post::Function, data::Array{Float64},ndraws::Int64,candsig2::Array{Float64},μ::Array{Float64}, Σ )::Array{Float64}
    
    w = Array{Float64}(undef,(ndraws,2))
    w[1,:] = Float64[-8,10]
    i = 2
    draw = Float64[0]
    tempw = Float64[-8,10]
    accepted = [0,0]
    while i <= ndraws
        for j in 1:2  # Loop over number of parameters that we are estimating.
            w[i,j] = w[i-1,j]  # Set the next draw to be the previous one.  It will stay this way unless we decide to accept a new draw.
            
            cand = Normal(w[i-1,j],sqrt(candsig2[j]))  # Center the envelope distribution on the current value of w
            @timeit "Draw normal" draw = rand(cand,1)
            @timeit "reassign" tempw[j] = w[i-1,j]
            @timeit "denom" denom = post(tempw,data,μ,Σ)
            tempw[j] = draw[1]
            @timeit "numerator" numerator = post(tempw,data,μ,Σ)
            tempw[j] = w[i-1,j]
            @timeit "calc" r = numerator - denom
            unif = Uniform(0,1)
            unifdraw = log.(rand(unif,1))
            if r >= 1.0
                w[i,j] = draw[1]
                tempw[j] = draw[1]
                accepted[j] += 1
            elseif r < 1.0 && unifdraw[1] < r
                w[i,j] = draw[1]
                tempw[j] = draw[1]
                accepted[j] += 1
            end
        end
        i += 1

    end
    println(accepted, "  ", ndraws)
    println("Acceptance percentage:", accepted[1]/ndraws, " ",accepted[2]/ndraws)
    return w
end


(There are some functions that are called here that I haven’t included, but could if you want to see them.)

Thanks in advance for any insights

You’ll notice that I am repeatedly defining and drawing from statistical distributions and the rand function returns an array instead of a number. Can I make rand return a scalar instead of a vector? Would that help?

This copies the column of data, which will cause lots of unnecessary allocations. You want @view data[:, 2] most likely.

This constructs a brand new Vector just to hold [1.0, d], then immediately reduces that down to a scalar via the dot product. Vector’s aren’t terribly expensive, but they’re not free and they always allocate memory. For small fixed-size arrays, you definitely want StaticArrays.jl. With StaticArrays, this could be:

w' * SVector(1.0, d)

or, if you prefer, w' * @SVector[1.0, d].

4 Likes

rand(unif)

3 Likes

Yes, and yes! rand() or rand(distribution) should return a scalar, and this is much cheaper than returning a 1-element vector.

3 Likes

Help me understand this. I know about SVectors, but in your modification, it seems that a new SVector will have to be created every time. How is that cheaper than a standard vector?

I see that you are peppering your code with @timeit already. I’d recommend using @btime from BenchmarkTools for sufficiently encapsulated methods instead (it will also show you allocations then). Did you ever consider using a profiler to get a high level picture of what your code is doing?

SVector usually gets its memory from the stack or even registers instead of the heap

2 Likes

Calls to this function account for nearly 80% of my allocations:

function logPriorGaussian(w,μ,Σ)::Float64
    a = MvNormal(μ,Σ)
    return log(pdf(a,w))
end

The parameters to the MVN don’t change so I suppose I could define it once outside the main loop and pass it in instead of defining it every time. Other than that, any way to speed this up?

1 Like

If no statistician opposes (and stupid me doesn’t see a reason), go ahead.

If you let use see another MWE after this?

Instead of @timeit, a very nice tool for granular inspection of timings and allocations is this package: Timer Outputs