Improving performance of mapping algorithm

Hi all,
I ported a code from MATLAB to Julia, because it takes too long right now:

using Random
using Printf
start = time()
Lk = 4
Lp = 1500 
# create test data: coordinates of relative points and points to be mapped
pload = Array{Any}(undef,Lk)
pload[1] = rand(Float64, (Lp, 3))
pload[2] = rand(Float64, (Lp, 3))
pload[3] = rand(Float64, (Lp, 3))
pload[4] = rand(Float64, (Lp, 3))
irs = Array{Any}(undef,Lk)
ics = Array{Any}(undef,Lk)
irs[1] = 1:Lp
ics[1] = 1:Lp
for k in 2:Lk
    @printf "in for loop, k = %.0f\n" k
    Li = size(pload[1],1)
    Lj = size(pload[k],1)
    A = zeros(Float64,Li,Lj)
    # create distance matrix
    for kk in 1:3
        A = A + (pload[1][1:Li,kk]*ones(1,Lj)-ones(Li,1)*pload[k][:,kk]').^2
    end
    # pick pairs of smallest distance & clear these pairs from A
    Ar = size(A,1)
    As = size(A)
    Ind = zeros(Int64, Ar,2)
    for i in 1:Ar
        ir, ic = Tuple(argmin(A)) 
        A[ir,:] .= Inf
        A[:,ic] .= Inf
        Ind[i,:] = [ir,ic]
    end
    # sort for relative / mapped points
    rs = sortperm(Ind[:,1]);
    irs[k] = Ind[rs,2]
    cs = sortperm(Ind[:,2]);
    ics[k] = Ind[cs,1]
end
elapsed = time() - start

Finding the minimum takes almost all the time and I was wondering if Julia with argmin(A) is maybe faster than MATLAB’s min(A). I also tried findmin(A), but the performance is the same. In reality Lp is greater than 1500, which is a real problem for me. Any suggestions how I can speed up the code?

Thank you in advance.

The biggest problem with this code by far is that it’s all in the gobal scope. Reading this https://docs.julialang.org/en/v1/manual/performance-tips/index.html will give you a good start towards optimizing the code.

2 Likes

Thank you for this advice. I created a function for the expensive task of finding the minimum and this speeds the code up by 10%. There might be more to gain, but MATLAB is still 5x faster. Therefore, I will probably try to improve the MATLAB code or try to implement another mapping algorithm.

You should put everything in a function.

1 Like

And don’t use Array{Any} – create your the array using e.g. ics = [1:Lp].
That way it will be concretely typed, which is what Julia needs for speed.

Thx for the suggestions. I think I am doing something wrong. Just enclosing everything in a function and calling it didn’t change speed.
Also using pload = Array{Float64}(undef,Lk,Lp,3) and likewise for other usage of “Any” did not increase speed.

If you want suggestions you are going to need to show us what you did.

Just a side note, just writing anything in Julia does not guarantee a speed increase. Optimizing code in any language first takes knowledge of that language and then lots of profiling and testing. I suspect that was done with your MATLAB implementation, I’m not so sure it was done with your Julia implementation.

2 Likes

There are a lot of array operations inside the loop. Each one of these allocates memory. Replace them by explicit loops.

It seems like you are mixing together the core code, that calculates the thing you want to find, and the test code, which creates test data, loops several times over stuff, etc.

Could you instead define the core functionality (for example, finding the elements with the smallest pairwise distance, if that is what you want) into a function with a clear interface, and separate it from the setup/test code?

This would make it easier for others to understand, it would make it faster and more readable.

BTW, you should do the same with your Matlab code. Putting units of functionality into functions is recommended in most languages.

2 Likes

As you say, all the time is in argmin(A). This takes 7.4ms, times 1500 times 3 gives about 30s, which is how long (a slightly improved) version takes to run. Can’t you avoid this entirely, perhaps call sortperm(vec(A)) which takes 240ms, but only 3 times? A sketch:

julia> A0 = rand(Int8, 3,3);

julia> A = copy(A0)
3×3 Array{Int8,2}:
 117   -66  -26
  41   -42   -1
  48  -122  -17

julia> i, j = argmin(A).I
(3, 2)

julia> A[i,:] .= 127; A[:,j] .= 127; # typemax(Int8)

julia> i, j = argmin(A).I
(1, 3)

julia> A[i,:] .= 127; A[:,j] .= 127;

julia> i, j = argmin(A).I
(2, 1)

julia> A
3×3 Array{Int8,2}:
 127  127  127
  41  127  127
 127  127  127

julia> CartesianIndices(A0)[sortperm(vec(A0))]
9-element Array{CartesianIndex{2},1}:
 CartesianIndex(3, 2) # first, eliminate 3,2
 CartesianIndex(1, 2)
 CartesianIndex(2, 2)
 CartesianIndex(1, 3) # second
 CartesianIndex(3, 3)
 CartesianIndex(2, 3)
 CartesianIndex(2, 1) # third
 CartesianIndex(3, 1)
 CartesianIndex(1, 1)

Now you just need to filter out which rows/cols have already been seen, perhaps something like

julia> is, js = BitSet(), BitSet();
julia> push!(is, 3); push!(js, 2); # first
julia> 1 in is || 2 in js
true
julia> 2 in is || 2 in js
true
julia> 1 in is || 3 in js # second
false
2 Likes

Thanks for confirming that the calculation of the minimum takes the most time. I had some problems understanding the output of the profiling.
This is an interesting shortcut for calculating the shortest distance between points. I will check if it gives the desired result.

Thanks for this idea. It works and is 10x faster. I tried to do the same in MATLAB, but maybe in this case I have done it less efficiently. Now Julia outperforms MATLAB by almost an order of magnitude.

Just in case anybody is interested, the solution looks like this:

using Random
using Printf

function evalMat(A)
  # pick pairs of smallest distance and return corresponding points (or their order)
  Ar = size(A,1)
  Ar2 = size(A,2)
  Ind = CartesianIndices(A)[sortperm(vec(A))]
  irf, icf = Int64[], Int64[]
  for i in 1:Ar*Ar2
    if !(Ind[i][1] in irf || Ind[i][2] in icf)
        push!(irf, Ind[i][1])
        push!(icf, Ind[i][2])
    end
  end
  return irf, icf
end

## get distance matrix A[i,j]; i relative points, j points to be mapped
Lk = 4     # 1 relative set, 3 working data sets
Lp = 1500    # number of points per set
pload = Array{Float64}(undef,Lk,Lp,3)
pload[1,:,:] = rand(Float64, (Lp, 3))   
pload[2,:,:] = rand(Float64, (Lp, 3))
pload[3,:,:] = rand(Float64, (Lp, 3))
pload[4,:,:] = rand(Float64, (Lp, 3))
irs = Array{Int64}(undef,Lk,Lp)
ics = Array{Int64}(undef,Lk,Lp)
irs[1,:] = 1:Lp
ics[1,:] = 1:Lp
for k in 2:Lk
    @printf "in for loop, k = %.0f\n" k
    Li = size(pload[1,:,:],1)
    Lj = size(pload[k,:,:],1)
    A = zeros(Float64,Li,Lj)
    for kk in 1:3
        A = A + (pload[1,:,kk]*ones(1,Lj)-ones(Li,1)*pload[k,:,kk]').^2
    end
    irs[k,:], ics[k,:] = evalMat(A)
end

@mcabbott: I could not use BitSet() or Set(), since I need to know the order of the points, which I use later for mapping some fields. Sets do not preserve the order.

1 Like

Great! If I’m thinking correctly it should also be N log(N), instead of N^3. (At least if sortperm is where the time is, not the filtering of its result.)

Your code looks nicer too. There are a few things you can tidy up, e.g. it’s not necessary for each term of a broadcast to make up the final shape, so I think you can do this (untested, but yesterday something similar worked):

pload = rand(Float64, 4,Lp,3)

Li = size(pload, 2)
@views for kk in 1:3
    A .= A .+ (pload[1,:,kk] .- pload[k,:,kk]').^2
end
1 Like

You don’t need to multiply with ones in Matlab either. Compatible shapes will broadcast correctly.

Finally, you can also use .+= instead of repeating the A.

1 Like

I don’t fully understand what you are trying to achieve… let’s say you have 4 lists of points, of same length (say 1500 black, 1500 red, 1500 green and 1500 blue). Do you want to:

  • for each of the four sets,
    • list all distinct pairs (black,coloured) formed by points in the first and points in the other set, sorted by their distance?

Anyway, regarding the structure of the code, you are repeating the exact same procedure Lk times, without any dependency between each run. You should probably define a function to do it for each pair of sets – it would be much clearer:

  for k in 2:Lk
    irs[k,:], ics[k,:] = sortpermByPairwiseDistance(pload[1],pload[k])
  end

(just out of curiosity, could you share the Matlab code and timing?)

2 Likes

Almost. I want to know the pairings, which for my data match more or less uniquely. It does not have to be sorted according to distance.

Some data sets have points matching perfectly, others have some distance between matching points. Otherwise I could use the distance matrix and just look for all entries below a tolerance, e.g., 1E-10.

Here is the MATLAB version:

Lk = 4;
Lp = 1500;
pload = rand(Lk,Lp,3);
irs = cell(Lk,1);
ics = cell(Lk,1);
irs{1} = 1:Lp;
ics{1} = 1:Lp;
for k = 2:Lk
    disp('In for loop, k = '); disp(k)
    [irs{k}, ics{k}] = sortpermByPairwiseDistance(pload(1,:,:),pload(k,:,:));
end

function [irf,icf] = sortpermByPairwiseDistance(pl1,pl2) 
  Li = size(pl1,2);
  Lj = size(pl2,2);
  % get distance matrix A[i,j]; i relative points, j points to be mapped
  A = zeros(Li,Lj);
  for kk = 1:3
    A = A + (pl1(:,kk) - pl2(:,kk)').^2;
  end
  [irf, icf] = evalMat(A);
end

function [irf,icf] = evalMat(A) 
    Ar = size(A);
    %% find pairs of smallest distance
    [~,I] = sort(A(:));
    [ir, ic] = ind2sub(Ar,I);
    irf = zeros(Ar(1),1); icf = zeros(Ar(1),1);
    j = 1;
    for i=1:Ar(1)*Ar(2)
        %if ~(ismember(ir(i),irf) || ismember(ic(i),icf))
        if ~(any(irf(:)==ir(i)) || any(icf(:)==ic(i)))
           irf(j) = ir(i);
           icf(j) = ic(i);
           j = j + 1;
        end
    end
end

I assume these iterations with “any(…)” are not efficient with MATLAB, while matrix operations would be better.

Well, this should be about julia, but… I send you my implementation in Matlab. It’s about 150 times faster :innocent:
This only works well if there is a guarantee that the distance between corresponding points in different sets is smaller than any distance intra-set… So, it fails if deviation introduced in second set is larger than the average (or expected) separation of points in original set.
If this is the case, we should either mark those points as undetermined, or do a minimization of total deviations…

function [irs,ics,pload,trueperm]=mytests(data)
if isempty(data) % Generate test data
    % Original data sets:
    npt=1500
    nsets=4
    noiseAmpl=0.1
    mixpoints=true %false
    checkresult=true
    dref=rand(npt,3)*100;
    pload{1}=dref;
    trueperm=[];
    for k = 1:nsets-1
        if mixpoints
            trueperm(end+1,:)=randperm(npt);
        else
            trueperm(end+1,:)=1:npt;
        end
        dnoisy=dref+noiseAmpl*(2*rand(npt,3)-1);
        pload{k+1}=dnoisy(trueperm(k,:),:);
    end
else % convert to cell if needed
    if ~iscell(data)
        for i=1:size(data,1)
            pload{i}=squeeze(data(i,:,:));
        end
    else
        pload=data;
    end
    nsets=numel(pload);
    npt=size(pload{1},1);
    checkresult=false
end

tic
irs{1}=1:npt;
ics{1}=1:npt;
for k = 2:nsets
    disp('In for loop, k = '); disp(k)
    [irs{k}, ics{k}] = sortpermByPairwiseDistance(pload{1},pload{k});
end
toc

if checkresult
    disp('checking for wrong matches: should all be empty.')
    for k=2:4
        [~,p]=sort(irs{k});
        find(ics{k}(p)~=trueperm(k-1,:))
    end
end


    function [ix1,ix2]=sortpermByPairwiseDistance(pl1,pl2)
        % process one pair of data sets:
        % We want to find/build a permutation of do2 that minimizes the distances
        % between the two data sets.

        t1=tic;
        %adj=zeros(npt,npt);
        d1=pl1;
        d2=pl2;
        d2_=d2;
        ix1=1:npt;
        ix2=ix1;
        d12=zeros(1,npt);
        for io=1:npt
            p1=pl1(io,:);
            [d,ix]=min(sum((d2_-repmat(p1,npt,1)).^2,2));
            d2(io,:)=d2_(ix,:);
            ix2(ix)=io;
            % or:
            %ix2(io)=ix;
            d12(io)=sqrt(d);
            d2_(ix,:)=NaN;
        end
        toc(t1)
    end

end

Thank you for your effort.

Unfortunately, this is not the case for my data…