Matrix shortest distance problem - Need suggestions to improve the performance

Hello All, I am new to Julia and working on below code in Julia and similar one in Python, Java.
But, Julia code is slower than Python and 10x slower than Java code. Can you please let me know how to optimize below code in Julia.

Problem statement:

Matrix (2D array) is filled with ones (represents personos) and zeros (represents bikes). A person can move top, bottom, left or right. We need to find out the nearest bike to him.

Solution that I wrote -

#1 - O(n2) - First iterate over matrix and prepare two seperate lists bikes and persons with their positions (row/x, col/y).
#2. O(n2) - In a nested for loop, iterate over persons and bikes list and find the distance between those two Positions to know the shortest distance between them and replace the person with least distance value.

Distance formula: abs(x1-x2) + abs(y1-y2) where Person=(x1, y1) and Bike=(x2, y2).

This solution take 5 sec in Java for 100x100 size, Python 15secs, Julia 16secs.
If I increase the size to 500, Java is 10x faster than Julia.

import Dates
import StatsBase.sample
import StatsBase.ProbabilityWeights

struct Position
    x::Int32
    y::Int32
    val::Int8
end

matrix = sample([0,1], ProbabilityWeights([1/6, 5/6]), (100, 100))
matrix_test = sample([0,1], ProbabilityWeights([1/6, 5/6]), (10, 10))

function prepare_persons_bikes(matrix)
    rows, cols = size(matrix)

    persons = []
    bikes = []
    for x in 1:rows
        for y in 1:cols
            pos = Position(x, y, matrix[x, y])
            if pos.val == 1
                push!(persons, pos)
            else
                push!(bikes, pos)
            end
        end
    end
    println("Size of persons=$(length(persons)), bikes=$(length(bikes))")
    return (persons, bikes)
end

function find_distance(matrix, persons, bikes)
    for (idx, person) in enumerate(persons)
        nearest_distance = abs(person.x-bikes[1].x) + abs(person.y-bikes[1].y)

        for bike in bikes[2:end]
            dist = abs(person.x-bike.x) + abs(person.y-bike.y)
            if dist < nearest_distance
                nearest_distance = dist
            end

            if nearest_distance == 1
                break
            end
        end
        matrix[person.x, person.y] = nearest_distance

        if idx % 1000 == 0
            println("Processed=$idx, $(Dates.now())")
            flush(stdout)
        end
    end
end

function find(matrix)
    rows, cols = size(matrix)

    if rows < 10 && cols < 10
        display(matrix)
    end

    persons, bikes = prepare_persons_bikes(matrix)
    @time begin
        find_distance(matrix, persons, bikes)
    end

    if rows < 10 && cols < 10
        display(matrix)
    end
end

find(matrix_test)
find(matrix)

persons and bikes is of type Vector{Any}. Since only Position types are pushed into it, you can initialize it to hold only Position values like Position[].

4 Likes

As said above, changing

function prepare_persons_bikes(matrix)
    rows, cols = size(matrix)

    persons = []
    bikes = []

to

function prepare_persons_bikes(matrix)
    rows, cols = size(matrix)

    persons = Position[]
    bikes = Position[]

will give a massive speedup, as that allows the compiler to infer the types of values in most places. Speedup I see here with that change is more than 25x (3.034878 to 0.117292 seconds)

3 Likes

Also, using global variable can be a performance-killer, better to put them in a function like main() and call that

In addition to the above, maybe use

import Dates
import StatsBase.sample
import StatsBase.ProbabilityWeights

struct Position
    x::Int32
    y::Int32
    val::Int8
end

matrix = sample([0,1], ProbabilityWeights([1/6, 5/6]), (100, 100))
matrix_test = sample([0,1], ProbabilityWeights([1/6, 5/6]), (10, 10))

function prepare_persons_bikes(matrix)
    rows, cols = size(matrix)

    persons = Position[]
    bikes = Position[]
    for x in 1:rows
        for y in 1:cols
            pos = Position(x, y, matrix[x, y])
            if pos.val == 1
                push!(persons, pos)
            else
                push!(bikes, pos)
            end
        end
    end
    println("Size of persons=$(length(persons)), bikes=$(length(bikes))")
    return (persons, bikes)
end

my_metric(p1::Position, p2::Position) = abs(p1.x - p2.x) + abs(p1.y - p2.y)

find_nearest(p1::Position, p2::Vector{Position}) = findmin(my_metric(p1,p2))

function build_distance_matrix!(D::AbstractMatrix, persons::Vector{Position}, bikes::Vector{Position})
    for p in persons
        D[p.x, p.y] = find_nearest(p, bikes)[1]
    end
end

function find_nearest_(p1::Position, p2::Vector{Position})
    dist = zero(Int32)
    best = Int32(1000000)
    for p in p2
        dist = my_metric(p1, p)
        dist < best ? (best = dist) : nothing
    end
    return best
end


function build_distance_matrix_!(D::AbstractMatrix, persons::Vector{Position}, bikes::Vector{Position})
    for p in persons
        D[p.x, p.y] = find_nearest_(p, bikes)
    end
end

function find_distance(matrix, persons, bikes)
    for (idx, person) in enumerate(persons)
        nearest_distance = abs(person.x-bikes[1].x) + abs(person.y-bikes[1].y)

        for bike in bikes[2:end]
            dist = abs(person.x-bike.x) + abs(person.y-bike.y)
            if dist < nearest_distance
                nearest_distance = dist
            end

            if nearest_distance == 1
                break
            end
        end

        matrix[person.x, person.y] = nearest_distance

    end
end

find(matrix_test)
find(matrix)
using BenchmarkTools
matrix = sample([0,1], ProbabilityWeights([1/6, 5/6]), (100, 100))
m1 = deepcopy(matrix)
m2 = deepcopy(matrix)
m3 = deepcopy(matrix)
persons, bikes = prepare_persons_bikes(matrix)
@btime find_distance(m1, persons, bikes)
@btime build_distance_matrix!(m2, persons, bikes)
@btime build_distance_matrix_!(m3, persons, bikes)

With the output

34.528 ms (16768 allocations: 155.69 MiB)
27.975 ms (8384 allocations: 52.71 MiB)
15.156 ms (0 allocations: 0 bytes)

From julia 1.6rc1 :wink:

5 Likes

For newcomers, a common mistake is the presumption of MATLAB style or Python style coding (meaning their direct translation) is equivalent in Julia style coding. Julia has different empty-array([]) meaning than Matlab. Also, Julia style elementwise computations use dotted-function style (like z .= abs.(x .- y) for already existing z). I recommend reading Noteworthy Differences from other Languages · The Julia Language and “Frequently Asked Questions · The Julia Language”

4 Likes

I guess that can be summarized as “write Python style code, get Python style performance” :wink:

4 Likes

Thank you all, trying your suggestions. I see that change bikes/persons from to Position increased the performance a lot.

Thanks a lot, with your changes now Julia code is 2x faster than Java @ 100x100 matrix.
Can you please help me understand the difference between the two below functions that you wrote?

find_nearest(p1::Position, p2::Vector{Position}) = findmin([my_metric(p1, p22) for p22 in p2])

function find_nearest_(p1::Position, p2::Vector{Position})
    dist = zero(Int32)
    best = Int32(1000000)
    for p in p2
        dist = my_metric(p1, p)
        dist < best ? (best = dist) : nothing
    end
    return best
end

Thanks for the swift reply, it improved the performance a lot. [Julius_Martensen] code change improved it further.

He is properly passing the variables as parameters to the functions, so that is not an issue there. (the “wrap everything” in a main() function suggestion goes against the purpose of a dynamic language, although it can be a quick workaround to check if one has mistakenly let some global variable misused around).

1 Like

Just a smll advice, it is better to make assignments outside of ? operator. Also it’s better to use typemax for “infinite” values.

function find_nearest_2(p1::Position, p2::Vector{Position})
    dist = zero(Int32)
    best = typemax(Int32)
    for p in p2
        dist = my_metric(p1, p)
        best = dist < best ? dist : best
    end
    return best
end

It doesn’t affect speed, but more readable and more friendly for the compiler.

But you can go further and notice, that

best = dist < best ? dist : best

is actually equivalent to

best = min(dist, best)

and after further consideration it is more or less obvious, that whole find_nearest function can be rewritten as

find_nearest_3(p1, p2) = mapreduce(x -> my_metric(x, p1), min, p2)

and this has a positive influence on the speed of calculations

using BenchmarkTools
using StableRNGs
import StatsBase.sample
import StatsBase.ProbabilityWeights

rng = StableRNG(2021)

matrix = sample(rng, [0,1], ProbabilityWeights([1/6, 5/6]), (100, 100))
persons, bikes = prepare_persons_bikes(matrix)

p1 = persons[1]

julia> @btime find_nearest_($p1, $bikes)
  2.236 ÎĽs (0 allocations: 0 bytes)
2

julia> @btime find_nearest_2($p1, $bikes)
  2.236 ÎĽs (0 allocations: 0 bytes)
2

julia> @btime find_nearest_3($p1, $bikes)
  808.820 ns (0 allocations: 0 bytes)
2

and we can verify it on the whole calculation (here other functions are the same as in your snippet)

m1 = deepcopy(matrix)
m2 = deepcopy(matrix)
m3 = deepcopy(matrix)

function build_distance_matrix_3!(D, persons, bikes)
    for p in persons
        D[p.x, p.y] = find_nearest_3(p, bikes)
    end
end

@btime find_distance($m1, $persons, $bikes)
  # 30.296 ms (16620 allocations: 161.42 MiB)
@btime build_distance_matrix_!($m2, $persons, $bikes)
  # 14.175 ms (0 allocations: 0 bytes)
@btime build_distance_matrix_3!($m3, $persons, $bikes)
  # 6.755 ms (0 allocations: 0 bytes)
3 Likes

Here, in [my_metric(p1, p22) for p22 in p2] you are creating a new intermediate vector with the distances. This intermediate vector is not necessary for your calculation and is avoided in the alternatives proposed. This is one important reason for the performance differences.

2 Likes

Maybe it is time to change the title? It was good to attract attention, but it is not true anymore, I presume.

I guess I’m missing some subtle issues when it comes to global variables. The first tip in the Performance Tips section of the manual is pretty clear: avoid global variables. It does say

Variables should be local, or passed as arguments to functions, whenever possible.

but that leaves open whether passing a global variable to a function could cause performance issues. But I guess you imply that that isn’t the case (or at least not here).

I don’t really see that. It’s clear that Julia is different from the average dynamic language, like Python, and that writing code in a style that usually fits dynamic languages can cause all sorts of (performance) issues in Julia. I also think approaching Julia as Python-but-with-a-compiler-bolted-on is one of the main sources of misconception and failed promises.

The Julia code is compiled at the function level, and specialized at that level for the variable types. That means that what is most important is that the type of a variable is known along the execution of a function, such that the compiler can optimize the code for that type of variable.

The problem with the non-constant global variables that are not passed as arguments is that they can change at any time, and, therefore, the compiler cannot know for certain that the type is constant during the function execution, such as:

julia> y = 20;

julia> function f(x)
          x + y
       end
f (generic function with 1 method)

julia> @code_warntype f(10)
Variables
  #self#::Core.Compiler.Const(f, false)
  x::Int64

Body::Any
1 ─ %1 = (x + Main.y)::Any
└──      return %1


x is known at compile time (it is an Int64), because the funciton was specialized for that type of variable. What is not known at compile time is y, because I could well run the function again changing y in the global scope.

If, instead, I define a function with y as a parameter as well:

julia> function g(x,y)
          x + y
       end
g (generic function with 1 method)

julia> @code_warntype g(10,y)
Variables
  #self#::Core.Compiler.Const(g, false)
  x::Int64
  y::Int64

Body::Int64
1 ─ %1 = (x + y)::Int64
└──      return %1

All types are known at compile time, because the function was optimized for type of x and y which were passed to the function. This is perfectly fine and performant. Indeed, if one changes the type of y now, what happens is that the function gets compiled again in a specialized version for the new type of y:

julia> y = 10.0;

julia> @code_warntype g(10,y)
Variables
  #self#::Core.Compiler.Const(g, false)
  x::Int64
  y::Float64

Body::Float64
1 ─ %1 = (x + y)::Float64
└──      return %1

The care that must be taken is, then, to not use non-constant global variables inside functions (in general, but most importantly in code that is critical for performance). But you can use Julia dynamically (that is the idea), for example, redefining your data at will at the global scope and calling functions to analyze, or do computations, with that data:

julia> my_data = # something that defines my data

julia> result = analyze(my_data) # all good with that, passed as parameter

julia> my_data = # change something in data

julia> result2 = analyze(my_data) # perfectly fine

Even if my_data changes type there (for example from being an array of floats to an array of integers, the analyze function will specialize to the type of data received and, therefore, be fast).

5 Likes

There is one caveat in global variable usage, it is slightly less obvious.

y0 = 20

function g(x, y = y0)
  x + y
end

@code_warntype g(10)
MethodInstance for g(::Int64)
  from g(x) in Main at REPL[2]:1
Arguments
  #self#::Core.Const(g)
  x::Int64
Body::Any
1 ─ %1 = (#self#)(x, Main.y0)::Any
└──      return %1

It’s more or less obvious when you know about its existence, but still, it is easy to make this mistake.

4 Likes

Side note: If you are planing to scale this problem you should probably use cell lists to find the nearest bike to every person. That scales with O(n) (you may need a fallback when a bike is not found within a reasonable cutoff distance from one person).

5 Likes

:slight_smile: Done

2 Likes

I observed that for num in array_num is way faster than for num in array_num[1:end] which is also one of main reason why my code was slower than Python. Any reason for this slowness with index based iteration? Is it creating a new copy of original array and then iterating?

# k is an array of 10k integers

@time j = [l for l in k[2:end]]
  0.104574 seconds (39.22 k allocations: 3.506 MiB)

@time j = [l for l in k]
  0.000666 seconds (3 allocations: 781.344 KiB)