Reducing allocations (again)

I still don’t have great intuition for quickly spotting my bottlenecks due to unneeded allocations and the associated work around. Here is some code I’m trying to optimize right now:

function logpost(w,data,μ,Σ)
    post = 0.0
#    rightWrong = data[:,1]
#    total = data[:,2]
#    xvec = [1.0,0.0]
    for (idx,d) in enumerate(data[:,2])
        #xvec[2] = d  # = [d^x for x in [0,1] ]
        #display(xvec)
        #μ = log(1 + exp(-w' * xvec))  # This is playing the role of the likelihood.
        if data[idx,1] < 1e-2
            @timeit "calc post1" post -= log(exp(w' * [1.0,d]) + 1)
        else
            @timeit "calc post2" post += w' * [1.0,d] - log(exp(w' * [1.0,d]) + 1)        
        end
        @timeit "calc post3" post += logPriorGaussian(w,μ,Σ)
    end
    return post
end


function getDraws(post::Function, data::Array{Float64},ndraws::Int64,candsig2::Array{Float64},μ::Array{Float64}, Σ )::Array{Float64}
    
    w = Array{Float64}(undef,(ndraws,2))
    w[1,:] = Float64[-8,10]
    i = 2
    draw = Float64[0]
    tempw = Float64[-8,10]
    accepted = [0,0]
    while i <= ndraws
        for j in 1:2  # Loop over number of parameters that we are estimating.
            w[i,j] = w[i-1,j]  # Set the next draw to be the previous one.  It will stay this way unless we decide to accept a new draw.
            
            cand = Normal(w[i-1,j],sqrt(candsig2[j]))  # Center the envelope distribution on the current value of w
            @timeit "Draw normal" draw = rand(cand,1)
            @timeit "reassign" tempw[j] = w[i-1,j]
            @timeit "denom" denom = post(tempw,data,μ,Σ)
            tempw[j] = draw[1]
            @timeit "numerator" numerator = post(tempw,data,μ,Σ)
            tempw[j] = w[i-1,j]
            @timeit "calc" r = numerator - denom
            unif = Uniform(0,1)
            unifdraw = log.(rand(unif,1))
            if r >= 1.0
                w[i,j] = draw[1]
                tempw[j] = draw[1]
                accepted[j] += 1
            elseif r < 1.0 && unifdraw[1] < r
                w[i,j] = draw[1]
                tempw[j] = draw[1]
                accepted[j] += 1
            end
        end
        i += 1

    end
    println(accepted, "  ", ndraws)
    println("Acceptance percentage:", accepted[1]/ndraws, " ",accepted[2]/ndraws)
    return w
end


(There are some functions that are called here that I haven’t included, but could if you want to see them.)

Thanks in advance for any insights

You’ll notice that I am repeatedly defining and drawing from statistical distributions and the rand function returns an array instead of a number. Can I make rand return a scalar instead of a vector? Would that help?

This copies the column of data, which will cause lots of unnecessary allocations. You want @view data[:, 2] most likely.

This constructs a brand new Vector just to hold [1.0, d], then immediately reduces that down to a scalar via the dot product. Vector’s aren’t terribly expensive, but they’re not free and they always allocate memory. For small fixed-size arrays, you definitely want StaticArrays.jl. With StaticArrays, this could be:

w' * SVector(1.0, d)

or, if you prefer, w' * @SVector[1.0, d].

rand(unif)

Yes, and yes! rand() or rand(distribution) should return a scalar, and this is much cheaper than returning a 1-element vector.

Help me understand this. I know about SVectors, but in your modification, it seems that a new SVector will have to be created every time. How is that cheaper than a standard vector?

I see that you are peppering your code with @timeit already. I’d recommend using @btime from BenchmarkTools for sufficiently encapsulated methods instead (it will also show you allocations then). Did you ever consider using a profiler to get a high level picture of what your code is doing?

SVector usually gets its memory from the stack or even registers instead of the heap

Calls to this function account for nearly 80% of my allocations:

function logPriorGaussian(w,μ,Σ)::Float64
    a = MvNormal(μ,Σ)
    return log(pdf(a,w))
end

The parameters to the MVN don’t change so I suppose I could define it once outside the main loop and pass it in instead of defining it every time. Other than that, any way to speed this up?

If no statistician opposes (and stupid me doesn’t see a reason), go ahead.

If you let use see another MWE after this?

Instead of @timeit, a very nice tool for granular inspection of timings and allocations is this package: Timer Outputs