Speeding up contraction mapping

Hello everyone,

I’m implementing the Berry-Levinsohn-Pakes (1995) demand estimation for my research, and it would be great to get some help with this. My code works, but it takes a while, so I’m looking to speed it up.

I have about 6,600 markets, and the contraction mapping has to be done for each market separately. There are about 2-20 products per market. It takes anywhere between 40 and 200 iterations per market to converge, so for a given set of parameter values the whole process takes a while. I was curious whether anyone had suggestions for how to speed up each iteration.
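
For anyone less familiar with BLP: each iteration applies the standard contraction delta ← delta + log(s_jt) - log(ŝ_jt(delta)), where s_jt are the observed market shares and ŝ_jt(delta) are the model-predicted shares. That is exactly the new_delta line in the code below.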

Here’s my code:

ns = size(draws.v1, 1)

function get_params(sigma, draws)
	betai = [sigma[1] .* draws.v1 sigma[2] .* draws.v2] # ns x 2 matrix
end


function calc_indiv_util(c)
	# c holds exp(utility) for every product in the market for one simulated consumer;
	# the 1 in the denominator is the outside option, exp(0)
	sum_of_exputils_indiv = 1 + sum(c)
	choice_prob_indiv = c ./ sum_of_exputils_indiv
	return choice_prob_indiv
end

function calc_predicted_shares(betai, delta, mkt, ns)
	
	mkt_obs = size(mkt, 1)
	X2_mkt = [mkt.var1 mkt.var2] 
	u1 = repeat(delta, outer=(1, ns)) + X2_mkt * betai'	# J x ns: mean utilities plus individual taste deviations
	exputils = exp.(u1)

	p_ijt = mapslices(calc_indiv_util, exputils, dims=1)	# choice probabilities, one column per simulated consumer
	p_jt = mapslices(sum, p_ijt, dims=2) ./ ns	# average over consumers = predicted market shares
	
	return p_jt

end

function contraction_mapping(betai, ns, data, market_ids, num_mkts, maxtol, maxiter)

	tuning_par = 1.0
	delta_container = Vector{Any}(missing, num_mkts) 
	
	 Threads.@threads for i = 1:maximum(data.market_ids) 
		mkt = data[data.market_ids .== i,:]
		local delta =  log.(mkt.s_jt ./ mkt.s_0jt)	# starting values: the simple-logit inversion log(s_jt) - log(s_0jt)
		local iter = 1.0
		local tol = 1e6
		
		while ( (tol > maxtol) && (iter < maxiter) )
			new_delta = delta + tuning_par .* ( log.(mkt.s_jt) - log.(calc_predicted_shares(betai, delta, mkt, ns))  )
			tol = maximum(abs.(new_delta .- delta))	# sup-norm of the change across products
			delta = new_delta
			iter = iter + 1
		end
		delta_container[i] = delta 
	end
	delta_full = reduce(vcat, delta_container)

	return delta_full
end
	
betai = get_params(sigma, draws)
delta = contraction_mapping(betai, ns, data, market_ids, num_mkts, maxtol, maxiter) 
	

EDIT: There is now a MWE in the replies.

I also realize a lot of people might be confused about the reasoning behind the algorithm itself, so my question is asking a lot. But really, any general help with speeding up the iterations/function calls/variable types you see here would be useful. I’d think a decent number of economists use BLP (1995) and/or contraction mappings in their work, so the answers should be helpful to a wide audience. I’d also prefer to use my own code rather than a package implementing BLP written by someone else, although I’m happy to take a look at one and pick up ideas.

Lots of free lunches here.

	delta_container = Vector{Any}(missing, num_mkts) 

should probably be narrowed. Each element holds the vector of deltas for one market, so try Vector{Vector{Float64}}(undef, num_mkts). Unless num_mkts is small, in which case it doesn’t matter much.

mkt = data[data.market_ids .== i,:]

Assuming data is a DataFrame, this is very bad for performance: accessing DataFrame columns directly like this is type-unstable. That doesn’t always have to mean bad performance, though. Create that vector of vectors earlier and use it in the loop. One thing that helps here is DataFramesMeta.jl’s @with, which lets you build an anonymous function and work with data frame columns in a type-stable way.
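
To make that concrete, here is a rough sketch (using the column names from the original code, not the poster’s actual setup) of pulling concretely typed arrays out of the DataFrame once, before the hot loop:

using DataFramesMeta

# Rough sketch: extract plain, concretely typed arrays once, then pass only these
# (never the DataFrame itself) into the contraction loop.
s_jt_vec, s_0jt_vec, X2_mkt = @with data begin
	(:s_jt, :s_0jt, [:var1 :var2])
end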

  1. You allocate a lot of small arrays, such as [sigma[1] .* draws.v1 sigma[2] .* draws.v2]. Pass an array into the function and mutate it, or use a static array (see the sketch below).
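
For instance, a minimal in-place sketch of that first example (get_params! and betai_buf are hypothetical names, not from the original code):

# Hypothetical in-place variant: write into a preallocated ns x 2 buffer instead of
# allocating a new matrix on every call.
function get_params!(betai_buf, sigma, v1, v2)
	betai_buf[:, 1] .= sigma[1] .* v1
	betai_buf[:, 2] .= sigma[2] .* v2
	return betai_buf
end

betai_buf = Matrix{Float64}(undef, length(draws.v1), 2)
get_params!(betai_buf, sigma, draws.v1, draws.v2)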

It’s hard to really dig into this without an MWE. Isolate as many things as possible into separate functions if you need to, so that we can reproduce your slowness without knowing your full dataset.

2 Likes

Okay, here’s an MWE with some fake data. It has just one market, so I’ve commented out the parts of the code that looped over each market. I’ve also added a println so that you can see how many iterations it took. Note that the original post also loops over each market, which can (hopefully) be optimized too?

using DataFramesMeta
using HaltonSequences
using Distributions

data = DataFrame(s_jt = [0.0011, 0.0131, 0.059, 0.074], s_0jt = repeat([0.80], 4), var1 = [-0.255, -1.35, -1.39, -1.67], var2 = [0, 1, 1, 1]) 

h1 = Halton(3, start=1000, length=100)
h2 = Halton(7, start=1000, length=100)

draws = DataFrame()
draws.v1 = Distributions.quantile(Normal(0, 1), h1)
draws.v2 = Distributions.quantile(Normal(0, 1), h2)

ns = size(draws.v1, 1)
sigma = [0.3, 5]
market_ids = 0
num_mkts = 1

maxtol = 1e-12
maxiter = 1900
ns = size(draws.v1, 1)

function get_params(sigma, draws)
	betai = [sigma[1] .* draws.v1 sigma[2] .* draws.v2] # ns x 2 matrix
end


function calc_indiv_util(c)
	sum_of_exputils_indiv = 1 + sum(c)
	choice_prob_indiv = c ./ sum_of_exputils_indiv
	return choice_prob_indiv
end

function calc_predicted_shares(betai, delta, mkt, ns)
	
	mkt_obs = size(mkt, 1)
	X2_mkt = [mkt.var1 mkt.var2] 
	u1 = repeat(delta, outer=(1, ns)) + X2_mkt * betai'	
	exputils = exp.(u1)

	p_ijt = mapslices(calc_indiv_util, exputils, dims=1)
	p_jt = mapslices(sum, p_ijt, dims=2) ./ ns	
	
	return p_jt

end

function contraction_mapping(betai, ns, data, market_ids, num_mkts, maxtol, maxiter)

	tuning_par = 1.0
	delta_container = Vector{Any}(missing, num_mkts) 
	
	 #Threads.@threads for i = 1:maximum(data.market_ids) 
		#mkt = data[data.market_ids .== i,:]
		mkt = data
		local delta =  log.(mkt.s_jt ./ mkt.s_0jt) 
		local iter = 1.0
		local tol = 1e6
		
		while ( (tol > maxtol) && (iter < maxiter) )
			new_delta = delta + tuning_par .* ( log.(mkt.s_jt) - log.(calc_predicted_shares(betai, delta, mkt, ns))  )
			tol = maximum(abs.(new_delta .- delta))
			delta = new_delta
			iter = iter + 1
		end
		println(iter)
		i = 1
		delta_container[i] = delta 
	#end
	delta_full = reduce(vcat, delta_container)

	return delta_full
end
	
betai = get_params(sigma, draws)
delta = contraction_mapping(betai, ns, data, market_ids, num_mkts, maxtol, maxiter)


Interesting. The profiler shows some type instabilities. @pdeffebach is correct that the usage of DataFrames in your kernels is dangerous; I changed them to a NamedTuple in the MWE. Furthermore, inhomogeneous types for data.var1 and data.var2 lead to Matrix{Any} construction in [mkt.var1 mkt.var2] (a similar problem occurs in [sigma[1] .* draws.v1 sigma[2] .* draws.v2]), so I changed these to have eltype Float64. The next dynamic dispatch was diagnosed here:

	p_ijt = mapslices(calc_indiv_util, exputils, dims=1)::Matrix{Float64}
	p_jt = mapslices(sum, p_ijt, dims=2)::Matrix{Float64} ./ ns

which I fixed locally with type annotations. Complete code:

using DataFramesMeta
using HaltonSequences
using Distributions
using BenchmarkTools

function get_params(sigma, draws)
	betai = [sigma[1] .* draws.v1 sigma[2] .* draws.v2] # ns x 2 matrix
end

function calc_indiv_util(c)
	sum_of_exputils_indiv = 1 + sum(c)
	choice_prob_indiv = c ./ sum_of_exputils_indiv
	return choice_prob_indiv
end

function calc_predicted_shares(betai, delta, mkt, ns)

	X2_mkt = [mkt.var1 mkt.var2]
	u1 = repeat(delta, outer=(1, ns)) + X2_mkt * betai'
	exputils = exp.(u1)

	p_ijt = mapslices(calc_indiv_util, exputils, dims=1)::Matrix{Float64}
	p_jt = mapslices(sum, p_ijt, dims=2)::Matrix{Float64} ./ ns

	return p_jt

end

function contraction_mapping(betai, ns, data, market_ids, num_mkts, maxtol, maxiter)

	tuning_par = 1.0
	delta_container = Vector{Any}(undef, num_mkts)

	 #Threads.@threads for i = 1:maximum(data.market_ids)
		#mkt = data[data.market_ids .== i,:]
		mkt = data
		local delta =  log.(mkt.s_jt ./ mkt.s_0jt)
		local iter = 1.0
		local tol = 1e6

		while ( (tol > maxtol) && (iter < maxiter) )
			new_delta = delta + tuning_par .* ( log.(mkt.s_jt) - log.(calc_predicted_shares(betai, delta, mkt, ns))  )
			tol = maximum(abs.(new_delta .- delta))
			delta = new_delta
			iter = iter + 1
		end
		i = 1
		delta_container[i] = delta
	#end
	delta_full = reduce(vcat, delta_container)

	return delta_full
end

function test()
	data = (
		s_jt = Float64[0.0011, 0.0131, 0.059, 0.074],
		s_0jt = repeat([0.80], 4),
		var1 = Float64[-0.255, -1.35, -1.39, -1.67],
		var2 = Float64[0, 1, 1, 1])

	h1 = Halton(3, start=1000, length=100)
	h2 = Halton(7, start=1000, length=100)

	draws = (
		v1 = Distributions.quantile(Normal(0, 1), h1),
		v2 = Distributions.quantile(Normal(0, 1), h2))

	ns = size(draws.v1, 1)
	sigma = Float64[0.3, 5]
	market_ids = 0
	num_mkts = 1

	maxtol = 1e-12
	maxiter = 1900
	ns = size(draws.v1, 1)

	betai = get_params(sigma, draws)
	@btime delta = contraction_mapping($betai, $ns, $data, $market_ids, $num_mkts, $maxtol, $maxiter)
end

Benchmark before

 3.709 ms (53659 allocations: 3.30 MiB)

and after

 3.275 ms (52396 allocations: 3.28 MiB)

Now the runtime of contraction_mapping seems to be dominated by

	p_ijt = mapslices(calc_indiv_util, exputils, dims=1)::Matrix{Float64}

which shows significant internal dynamic dispatch. If I knew what that computes, I’d try to find an alternative formulation.

1 Like

Replacing calc_predicted_shares with

function calc_predicted_shares(betai, delta, mkt, ns)

	X2_mkt = [mkt.var1 mkt.var2]::Matrix{Float64}
	u1 = repeat(delta, outer=(1, ns)) + X2_mkt * betai'
	exputils = exp.(u1)

	# p_ijt = mapslices(calc_indiv_util, exputils, dims=1)::Matrix{Float64}
	# p_jt = mapslices(sum, p_ijt, dims=2)::Matrix{Float64} ./ ns

	p_ijt = hcat((calc_indiv_util(col) for col in eachcol(exputils))...)::Matrix{Float64}
	p_jt = [sum(row) for row in eachrow(p_ijt)] ./ ns

	return p_jt

end

yields

  1.272 ms (16578 allocations: 2.50 MiB)

Now I don’t see any dynamic dispatch, and

	p_ijt = hcat((calc_indiv_util(col) for col in eachcol(exputils))...)::Matrix{Float64}

is the hotspot due to allocations.
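
If those allocations matter, one further (untested) variant would skip building p_ijt entirely and accumulate the shares column by column; calc_predicted_shares_alt and its X2_mkt argument are my own names, not from the code above:

function calc_predicted_shares_alt(betai, delta, X2_mkt, ns)
	u = delta .+ X2_mkt * betai'            # J x ns utilities: mean utility plus taste draws
	exputils = exp.(u)
	p_jt = zeros(size(exputils, 1))
	for col in eachcol(exputils)            # one column per simulated consumer
		denom = 1 + sum(col)                # outside option contributes exp(0) = 1
		p_jt .+= col ./ denom               # fused broadcast; avoids the hcat of per-column results
	end
	return p_jt ./ ns                       # average over the ns draws
end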

1 Like

Have you thought about accelerating this by using a better algorithm?

What you are doing in Berry-Levinsohn-Pakes demand estimation is essentially just a fixed-point iteration, right? Why not use something like Anderson acceleration from e.g. FixedPointAcceleration.jl to see if you can get it to converge in fewer iterations?

(If you are able to compute Jacobians then there are even more options.)

1 Like

Ah, I’ve heard of that but didn’t know Julia had a package for it. Thanks for pointing it out!

There is also https://github.com/JuliaNLSolvers/NLsolve.jl#fixed-points
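
For what it’s worth, a rough per-market sketch of how that could look with NLsolve’s fixedpoint wrapper (untested; the keyword choices and the reuse of calc_predicted_shares are my assumptions, so check the README above for the exact interface):

using NLsolve

# Untested sketch: solve_delta and blp_map! are hypothetical names.
function solve_delta(betai, mkt, ns)
	delta0 = log.(mkt.s_jt ./ mkt.s_0jt)   # logit starting values, as in the original code
	# The BLP contraction as a fixed-point map: delta -> delta + log(s_jt) - log(s_hat(delta))
	function blp_map!(out, delta)
		out .= delta .+ log.(mkt.s_jt) .- log.(vec(calc_predicted_shares(betai, delta, mkt, ns)))
	end
	res = fixedpoint(blp_map!, delta0; method = :anderson, m = 5)
	return res.zero
end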

You should definitely experiment with a few options here, as the best algorithm for solving nonlinear equations is pretty problem-dependent.

2 Likes