Repeated serialization for distributed simulation

I’m trying to maximize a likelihood function that involves a large number of simulations of a reasonably complex model, and serialization time is a bottleneck. Does anything jump out as a problem?

Here is an example that captures the important part of my code, although I can’t be certain it reproduces the problem:

using Distributed

struct Observation
        X::Array{Float64,2}
        y::Array{Float64,2}
        d::Dict{Int,Float64}
        t::Array{Int64,1}
end

function logl(obs::Array{Observation,1},β::Array{Float64,1})
        out = @distributed (+) for ob in obs
                logl_i(ob,β) # run the simulation and return this observation's contribution to the log likelihood
        end
        return out
end
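
For reference, here is a dummy stand-in for logl_i so the snippet is self-contained; my real logl_i runs the simulation, so treat this as a placeholder only:

@everywhere logl_i(ob, β) =
        # placeholder only; the real logl_i performs the simulation
        # (assumes size(ob.X, 2) == length(β))
        -0.5 * sum(abs2, ob.y .- ob.X * β)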

The code spends about two-thirds of its time serializing, and it does so on every call to logl.
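
As a rough check of the payload, I serialized obs to an in-memory buffer with the stdlib Serialization module; this is just my own sanity check, not output from the real run:

using Serialization

# Rough gauge: how many bytes does serializing obs once produce?
io = IOBuffer()
serialize(io, obs)
println("one copy of obs ≈ ", position(io) / 1e6, " MB")

If I’m reading the @distributed machinery right, the loop body closes over obs, so each worker receives a full copy on every call, i.e. roughly nworkers() times that figure per evaluation of logl.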

Is the problem my use of a custom struct to store the data? Could variation in the sizes of the arrays across observations be an issue?
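
In case it helps, this is the workaround I’ve been sketching: ship each worker its share of obs once, then serialize only β on each call. OBS_CHUNK, set_chunk, and logl_chunk are names I made up for the sketch, and it assumes the Observation struct and logl_i are defined with @everywhere on all workers:

using Distributed

# Send each worker its chunk of the data exactly once.
@everywhere function set_chunk(c)
        global OBS_CHUNK = c        # stored in Main on the worker
end

# (assumes length(obs) >= nworkers(), so every worker gets a chunk)
chunks = [collect(c) for c in Iterators.partition(obs, cld(length(obs), nworkers()))]
for (w, c) in zip(workers(), chunks)
        remotecall_fetch(set_chunk, w, c)
end

# Each worker reduces over its local chunk; only β crosses the wire.
@everywhere logl_chunk(β) = sum(ob -> logl_i(ob, β), OBS_CHUNK)

logl(β::Array{Float64,1}) = sum(fetch, [remotecall(logl_chunk, w, β) for w in workers()])

With this layout each call should serialize only β per worker instead of a copy of obs, if I’ve understood where the cost is coming from.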