Parallelization way

Hello, please help to parallelize the task, maybe with CUDA?

There are two point clouds A and B, the clouds have a different number of points.

In cloud A there can be several Y along the X axis,
In cloud B, there is only one Y for every X.

Data is ordered along the x-axis.

For each point of cloud A on the X-axis within [a_x; a_x + 10]
find the nearest point from cloud B whose sum of y-values is more than zero: (a_y + b_y) >= 0

I do it but it’s so slow:

## 
using Random

limit = 10

# points number
A_n = 100_000_000
B_n = 30_000_000

#
func(a_y, b_y) = (a_y + b_y) >= 0

# cloud A
A_x = Int32.(cumsum(rand(0:3, A_n)))
A_y = Int32.(rand(-2:2, A_n))

# cloud B
B_x = Int32.(cumsum(rand(1:9, B_n)))
B_y = Int32.(rand(-2:2, B_n))

# for results
R = zeros(Int32, A_n)

## calculations
b = 1
bb = 1
for a = 1 : A_n
    while B_x[b] < A_x[a]
        b += 1
        b == B_n && break
    end
    bb = b
    while B_x[bb] <= (A_x[a] + limit)
        if func(A_y[a], B_y[bb])
            R[a] = B_x[bb]
            break
        end
        bb == B_n && break
        bb += 1
    end
    a % 1_000_000 == 0 && println(a)  # progress
    b == B_n && break
end

Before that, check the performance tips of the manual.

First of all, put all the critical code in a function and do not use global variables.

https://docs.julialang.org/en/v1/manual/performance-tips/

2 Likes

Since the sizes are quite large, this is perfect for parallelisation. I would recommend taking a look at multithreading first - using Threads.@threads on the start of a for loop. This kind of problem is known as “embarrasingly” parallel and is a good chance to get to grips with parallel programming.

But I agree with @lmiq, putting the code in a function first is your best bet.

1 Like