Deepcopy against the grain of Julia?

I have implemented a particle filter for an agent-based model using JuliaDynamics, but I seem to have ended up using deepcopy a lot. Also, when I run the simple particle filter over a very simple model, I don't get very good performance:

julia> @time runPf(inits, init_log_weights);
 56.329763 seconds (28.13 M allocations: 3.170 GiB, 1.81% gc time, 0.07% compilation time)

I feel I am “working against the grain” of Julia. Here’s my code: deepcopy_use.jl · GitHub. How can I make it more Julia-esque? And how can I improve performance?

PS I realise I can parallelize the particle filter but I don’t want to do this until I am working with the grain of Julia

PPS I am a functional programmer by background and keep forgetting Julia does call by reference.

The first thing you should do is to remove all global variables. For instance, here actuals, Q and R are globals:

function runPf(inits, init_log_weights)
    for i in 2:l
        (end_states2, logWeights2, inits2, a2) = pf(deepcopy(inits), init_log_weights, P, map(x -> convert(Float64,x), actuals[i]), Q, R);
        predicted[i] = mean(end_states2);
        inits = inits2;
        init_log_weights = logWeights2;
    end
end

(or at least declare them as const).
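For illustration, here is a sketch of the same loop with the globals passed in as arguments (names follow the snippet above; `l`, `P`, and `predicted` also appear to be globals, so they are handled here too, with `l` replaced by `length(actuals)` as an assumption about what it holds):

```julia
# Hypothetical sketch: former globals (actuals, P, Q, R) become parameters,
# and `predicted` is allocated locally and returned. Assumes `pf` and `mean`
# are defined/imported elsewhere as in the original code.
function runPf(inits, init_log_weights, actuals, P, Q, R)
    predicted = zeros(length(actuals))
    for i in 2:length(actuals)
        (end_states2, logWeights2, inits2, a2) =
            pf(deepcopy(inits), init_log_weights, P, Float64.(actuals[i]), Q, R)
        predicted[i] = mean(end_states2)
        inits = inits2
        init_log_weights = logWeights2
    end
    return predicted
end
```

With everything passed as arguments, the compiler can infer concrete types inside the loop, which is what makes the globals so costly in the original version.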

3 Likes

Looking at a profile of your program it seems 80 % of runtime is spent at

        Agents.step!(jnits[i], agent_step!, 1)

Maybe you should isolate an MWE (minimal working example) around that to get the discussion going.

2 Likes

Not really, functions pass around values, which of course may be containers.

Keep in mind that Julia does not have pointers in the C/C++/… sense.
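A minimal illustration of this "pass-by-sharing" behaviour: mutating a container inside a function is visible to the caller, but rebinding the argument name is not.

```julia
function mutate!(v)
    v[1] = 99      # mutates the caller's array: the binding is shared
end

function rebind(v)
    v = [0, 0, 0]  # rebinds the local name only; the caller's array is untouched
end

a = [1, 2, 3]
mutate!(a)   # a is now [99, 2, 3]
rebind(a)    # a is still [99, 2, 3]
```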

1 Like

Thank you. Q and R are now parameters, but I see no improvement in performance.

And actuals? The point is that while you have global variables and type instabilities, speculation about other sources of cost is a waste of time.

1 Like

I think I have done that but with no improvement. But perhaps I have erred: deepcopy_use.jl · GitHub (updated).

You should do some basic profiling first. Help us help you by letting us know which lines take the most time. (Also, try to make your code as small as possible; it increases the likelihood that someone else will read it. 🙂)
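For reference, a minimal way to do that with the standard library profiler (assuming the setup from the gist is already loaded; the exact call is just the one from the original post):

```julia
using Profile

runPf(inits, init_log_weights)        # run once first so compilation is excluded
@profile runPf(inits, init_log_weights)
Profile.print(format=:flat, sortedby=:count)  # lines with the highest counts dominate runtime
```

`@profile` samples the call stack periodically, so the flat report directly shows which source lines the program spends most of its time on.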

Ok, now that looks ok on that first call.

From my side, what prevents me from running your code is that, as it is, I would need to install many packages that I do not want to. You don't seem to be using all of them in that code. The simpler the example gets, the better the chances that someone will actually work on it and show where it can be improved.

In general, after type instabilities are removed, you should chase allocations. Any copy of something mutable of course allocates. But there are other places where you seem to be allocating new arrays. For example, here:

    wn = map(x -> exp(x), log_w .- maximum(log_w))
    wn = wn / sum(wn)

that is allocating 3 arrays: one in log_w .- maximum(log_w), one in the output of the map, and a third in the normalization:

julia> function f(log_w)
           wn = map(x -> exp(x), log_w .- maximum(log_w))
           wn = wn / sum(wn)
           return wn
       end
f (generic function with 1 method)

julia> x = rand(10);

julia> @btime f($x);
  147.233 ns (3 allocations: 480 bytes)

If you rewrite that with some care:

julia> function f(log_w)
           wn = exp.(log_w .- maximum(log_w))
           swn = sum(wn)
           wn .= wn ./ swn
           return wn
       end
f (generic function with 1 method)

julia> @btime f($x);
  97.897 ns (1 allocation: 160 bytes)

you can reduce that to one allocation and accelerate it quite a bit. This is just one example, though; profiling is important to know where to spend time optimizing, and only after you have removed the type instabilities. Otherwise the time spent in each function may not be representative of anything related to the algorithms used.
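Going one step further, a sketch of a fully in-place variant that writes into a caller-supplied buffer, so repeated calls in the filter's hot loop allocate nothing (`normalize_weights!` is a hypothetical name, not from the original code; it assumes `wn` and `log_w` have the same length):

```julia
# Sketch: compute normalized weights into a preallocated buffer `wn`.
# All operations are fused broadcasts over existing arrays, so nothing
# is allocated per call.
function normalize_weights!(wn, log_w)
    m = maximum(log_w)
    wn .= exp.(log_w .- m)  # in-place, no temporary array
    wn ./= sum(wn)
    return wn
end
```

The buffer would be allocated once before the particle-filter loop (e.g. `wn = similar(log_w)`) and reused on every iteration.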

2 Likes

Thank you very much for taking the time. I am going through and removing unneeded imports, and I hope to return with an easier-to-run example.

1 Like