I have implemented a particle filter for an agent-based model using JuliaDynamics, but I seem to have ended up using deepcopy a lot. Also, when I run the simple particle filter over a very simple model, I don't get very good performance:
julia> @time runPf(inits, init_log_weights);
56.329763 seconds (28.13 M allocations: 3.170 GiB, 1.81% gc time, 0.07% compilation time)
The first thing you should do is remove all global variables. For instance, here l, P, actuals, predicted, Q and R are all globals:
function runPf(inits, init_log_weights)
    for i in 2:l
        (end_states2, logWeights2, inits2, a2) = pf(deepcopy(inits), init_log_weights, P, map(x -> convert(Float64, x), actuals[i]), Q, R);
        predicted[i] = mean(end_states2);
        inits = inits2;
        init_log_weights = logWeights2;
    end
end
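One way to fix that (a sketch only — I don't have your model, so the `pf` below is a trivial stand-in and the argument names are guesses from your snippet) is to pass every formerly-global binding into the function as an argument, so the compiler can infer concrete types for all of them:

```julia
using Statistics

# Placeholder particle-filter step, NOT your real pf: it just shifts the
# particles by the observation and passes the weights through unchanged.
pf(inits, log_w, P, actual, Q, R) = (inits .+ actual, log_w, copy(inits), nothing)

# Everything the loop needs (l, P, actuals, Q, R) comes in as an argument,
# and `predicted` is allocated locally and returned instead of being a global.
function runPf(inits, init_log_weights, l, P, actuals, Q, R)
    predicted = Vector{Float64}(undef, l)
    predicted[1] = mean(inits)
    for i in 2:l
        (end_states2, logWeights2, inits2, _) =
            pf(copy(inits), init_log_weights, P, Float64(actuals[i]), Q, R)
        predicted[i] = mean(end_states2)
        inits = inits2
        init_log_weights = logWeights2
    end
    return predicted
end
```

Alternatively, declaring the globals `const` (or annotating their types) also removes the instability, but passing them explicitly is usually cleaner and makes the function testable on its own.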
You should do some basic profiling first. Help us help you by letting us know which lines take the most time. (Also, try to make your code as small as possible; that increases the likelihood that someone else will read it.)
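For what it's worth, the built-in Profile stdlib is enough for a first pass. A minimal sketch (the `work` function here is just a stand-in — profile your own runPf call instead):

```julia
using Profile

# Hypothetical hot function standing in for runPf; replace with your own call.
work() = sum(sum(exp.(rand(10_000))) for _ in 1:1_000)

work()            # run once first so compilation time is not in the profile
Profile.clear()
@profile work()
Profile.print(format = :flat, sortedby = :count)  # most-sampled lines on top
```

The lines with the most samples are where the time goes; post those and it becomes much easier for others to suggest targeted fixes.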
From my side, what prevents me from running your code is that, as it is, I would need to install many packages which I do not want to. You don't seem to be using all of them in that code. The simpler the example gets, the better the chances of someone actually working on it to show where it can be improved.
In general, after type instabilities are removed, you should chase allocations. Any copy of something mutable of course allocates. But there are other places where you seem to be allocating new arrays. For example, here:
that is allocating 3 arrays: one in log_w .- maximum(log_w), one in the output of the map, and a third in the normalization:
julia> function f(log_w)
           wn = map(x -> exp(x), log_w .- maximum(log_w));
           wn = wn / sum(wn);
           return wn
       end
f (generic function with 1 method)
julia> x = rand(10);
julia> @btime f($x);
147.233 ns (3 allocations: 480 bytes)
If you rewrite that with some care:
julia> function f(log_w)
           wn = exp.(log_w .- maximum(log_w));
           swn = sum(wn)
           wn .= wn ./ swn;
           return wn
       end
f (generic function with 1 method)
julia> @btime f($x);
97.897 ns (1 allocation: 160 bytes)
you can reduce that to one allocation and accelerate it quite a bit. This is just one example, though; profiling is important to know where to spend time optimizing, but only after you have removed the type instabilities. Otherwise the time spent in each function may not be representative of anything related to the algorithms used.
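To hunt down those instabilities, @code_warntype is the standard first tool: it prints the inferred types and flags non-concrete ones. A small illustration of the non-const-global problem discussed above (names are made up for the example):

```julia
Q_global = 0.1              # non-const global: its type could change at any time
unstable(x) = x * Q_global  # so Q_global is inferred as Any inside this function

const R_const = 0.1         # const global: the compiler can rely on its type
stable(x) = x * R_const     # fully inferred, no instability

# In the REPL, compare:
#   @code_warntype unstable(1.0)   # flags Q_global::Any and Body::Any
#   @code_warntype stable(1.0)     # everything inferred as Float64
```

Both functions return the same value; the difference only shows up in inference (and therefore in speed and allocations once such a variable is used in a hot loop).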