Trying to understand how to write good code (example)

I’m a little lost at this point, but It seems like you’re still confusing variable names acting as mutable bindings with the mutability of the data structures those variable names are bound to.

A simple mental model for how variable names work is to assume that all variable names live in a dictionary. That dictionary is mutable. When you type rj = x, you are mutating that dictionary. The type of x is totally irrelevant and therefore the mutability of x is also irrelevant. The important thing is that you’re doing something like mutate_name!("rj", x), whereas an expression like rj[:] = x is doing something like mutate_value!(rj, :, x).

2 Likes

I’m not the OP, btw, but also not completely confident of having the right mental model.

My expectation would be that mutating the name is cheap, while mutating values is a bit more expensive, while (at least in the current example) allocating a new array is quite expensive.

So:

rj = Wr[i,:,iw]

allocates a new matrix on the right, and binds the variable rj to it.

rj .= Wr[i,:,iw]

allocates a new array on the right, and copies the data into rj. According to my benchmarks this is more expensive.

rj = @view Wr[i,:,iw]

makes a view, which just barely allocates a small amount of memory, and binds rj to it, while

rj .= @view Wr[i,:,iw]

makes a view on the right, and then, on top of that, copies the data from the view into rj. According to the benchmarks I’m looking at, the non-dotted version is cheaper in each case.

It feels like I’m really wrong here, but I don’t see how.

No, you’re right. It depends on what you’re trying to do though. If rj can, in the real application, be linked to the exact same memory as Wr, it’s best to just bind a new view to rj. However, if you plan on changing rj but don’t want Wr to change (or vice versa), then you need to keep the values separate and mutate.

As always, it’s about how you’re applying it that determines what the right method is.

(One of the things that makes the present discussion difficult is that the code that was posted doesn’t actually do anything. It just creates a bunch of matrix views or copies and then discards them. In that case, the optimal thing is just to delete all the code. :wink: )

9 Likes

OK guys, I keep working on this since I have to decide whether I shall invest more on Julia or surrender and move back to fortran. Since deleting all the code is not what I am looking for :slight_smile:, I can put now an almost-entire version of the function

function Drift_Force(W::Walker)
    Np,dim,NW    = size(W.r);
    F_drift      = zeros(Np,dim,NW);
    ri           = zeros(dim,1);
    rj           = zeros(dim,1);
    rij          = zeros(dim,1);
    aux          = zeros(dim,1);
    for iw in 1:NW
        for i in 1:Np-1
            for k in 1:dim 
                ri[k]  = W.r[i,k,iw];
            end;
            for j in i+1:Np
                for k in 1:dim 
                    rij[k] = W.r[i,k,iw]-ri[k]
                end;
                norm = sqrt(dot(rij,rij));
                aux  = rij/norm;
                F_drift[i,:,iw] += aux;
                F_drift[j,:,iw] -= aux;                
            end;        
        end;    
    end;
end; 

which performs like this:
0.070826 seconds (504.01 k allocations: 20.010 MB, 7.97% gc time)

while I see that most of the time/allocations go to the final steps of F_drift updating, since if I remove the last two F_drift lines

function Drift_Force(W::Walker)
    Np,dim,NW    = size(W.r);
    F_drift      = zeros(Np,dim,NW);
    ri           = zeros(dim,1);
    rj           = zeros(dim,1);
    rij          = zeros(dim,1);
    aux          = zeros(dim,1);
    for iw in 1:NW
        for i in 1:Np-1
            for k in 1:dim 
                ri[k]  = W.r[i,k,iw];
            end;
            for j in i+1:Np
                for k in 1:dim 
                    rij[k] = W.r[i,k,iw]-ri[k]
                end;
                norm = sqrt(dot(rij,rij));
                aux  = rij/norm;
            end;        
        end;    
    end;
end; 

then what I get is
0.003131 seconds (20.17 k allocations: 2.169 MB)

that is, much less time, much less allocations etc etc. So the problem is now, filling the results into the F_drift array (which, btw, is what the function must return in the end).
Do you know how shall I write everything here for performance, or is it in its best way already?

Thanks for the inputs again,

Feran.

Looks like you should do something like:

function Drift_Force(W::Walker)
    Np,dim,NW    = size(W.r);
    F_drift      = zeros(Np,dim,NW);
    ri          = zeros(dim);
    rij          = zeros(dim);
    @inbounds for iw in 1:NW
        for i in 1:Np-1
            for k in 1:dim 
                ri[k]  = W.r[i,k,iw];
            end;
            for j in i+1:Np
                @simd for k in 1:dim 
                    rij[k] = W.r[i,k,iw]-ri[k]
                end
                normalize!(rij)
                @simd for k in 1:dim 
                    F_drift[i,k,iw] += rij[k]
                    F_drift[j,k,iw] -= rij[k]
                end             
            end
        end 
    end
    return F_drift # presumably you want to return the result!
end

(Note that you don’t need all of the semicolons, and there is no need for ri and rij to be 2d arrays.)

Note also that you might want to re-order your dimensions so that F_drift = zeros(dim,Np,NW) and similarly for W.r, since that way the loops over k will be contiguous in memory.

Is this supposed to be rij[k] = W.r[j,k,iw]-ri[k]? Otherwise it is always zero.

Yes indeed, it is suppossed to be rij[k] = W.r[j,k,iw]-ri[k]… my fault :frowning:

You have some unnecessary cruft in your code. For example, there seems to be no point to defining ri, you can just write

rij[k] = W.r[j, k, iw] - W.r[i, k, iw]

without wasting time on ri. Also I don’t quite understand your code, so there might be algorithmic speedups possible, beyond the pure implemetation speedups.

Anyway, here are three versions of your code. The first one is basically your own version. The second is a cleaned-up version. The third one assumes that dim is always equal to 3, and uses StaticArrays:

using StaticArrays, BenchmarkTools

function Drift_Force1(W::Array{Float64, 3})
    Np,dim,NW    = size(W);
    F_drift      = zeros(Np,dim,NW);
    ri           = zeros(dim,1);
    rj           = zeros(dim,1);
    rij          = zeros(dim,1);
    aux          = zeros(dim,1);
    for iw in 1:NW
        for i in 1:Np-1
            for k in 1:dim 
                ri[k]  = W[i,k,iw];
            end
            for j in i+1:Np
                for k in 1:dim 
                    rij[k] = W[j,k,iw]-ri[k]
                end
                norm = sqrt(dot(rij,rij));
                aux  = rij/norm;
                F_drift[i,:,iw] += aux;
                F_drift[j,:,iw] -= aux;                
            end        
        end    
    end
    return F_drift
end

function Drift_Force2(W::Array{Float64, 3})
    dim = size(W, 2)
    F_drift = zeros(W)
    rij = zeros(dim)
    @inbounds for iw in 1:size(W, 3)
        for i in 1:size(W, 1)-1
            for j in i+1:size(W, 1)
                for k in 1:dim
                    rij[k] = W[j, k, iw] - W[i, k, iw]
                end
                normalize!(rij)
                for k in 1:dim
                    F_drift[i, k, iw] += rij[k]
                    F_drift[j, k, iw] -= rij[k]
                end
            end
        end    
    end
    return F_drift
end

function Drift_Force3(W::Array{Float64, 3})
    dim = 3
    F_drift = zeros(W)
    @inbounds for iw in 1:size(W, 3)
        for i in 1:size(W, 1)-1
            for j in i+1:size(W, 1)
                rij = SVector(W[j, 1, iw] - W[i, 1, iw],
                              W[j, 2, iw] - W[i, 2, iw],
                              W[j, 3, iw] - W[i, 3, iw])
                rij = normalize(rij)
                for k in 1:dim
                    F_drift[i, k, iw] += rij[k]
                    F_drift[j, k, iw] -= rij[k]
                end
            end
        end    
    end
    return F_drift
end

WW = rand(64,3,10)
maximum(Drift_Force1(WW) .- Drift_Force2(WW))
maximum(Drift_Force1(WW) .- Drift_Force3(WW))
@benchmark Drift_Force1(WW)
@benchmark Drift_Force2(WW)
@benchmark Drift_Force3(WW)

On my computer version 1 runs in 26ms, version 2 in 2.2ms, and version 3 in 190μs, so that’s well over 100x speed-up. If you re-arrange the input matrix so that the size is 3 along the first dimension, there’s an extra ~10% to gain.

4 Likes

You don’t seem to have used StaticArrays in the code you posted.

Try to scroll down a bit :wink:

Oops, thanks.

Seems like it shouldn’t be hard to parametrize this on the dimension.

Perhaps. I don’t see it. Some Val{3} kind of magic? Do you mean the number of dimensions or the size of the dimensions?

I should probably update that answer to include .= which is a shorthand for calling broadcast! which in turn, by default, does a bunch of []= indexed assignments (i.e. calls to setindex!.