# Parallelization way

There are two point clouds A and B, the clouds have a different number of points.

In cloud A there can be several Y along the X axis,
In cloud B, there is only one Y for every X.

Data is ordered along the x-axis.

For each point of cloud A on the X-axis within [a_x; a_x + 10]
find the nearest point from cloud B whose sum of y-values is more than zero: (a_y + b_y) >= 0

I do it but it’s so slow:

``````##
using Random

limit = 10

# points number
A_n = 100_000_000
B_n = 30_000_000

#
func(a_y, b_y) = (a_y + b_y) >= 0

# cloud A
A_x = Int32.(cumsum(rand(0:3, A_n)))
A_y = Int32.(rand(-2:2, A_n))

# cloud B
B_x = Int32.(cumsum(rand(1:9, B_n)))
B_y = Int32.(rand(-2:2, B_n))

# for results
R = zeros(Int32, A_n)

## calculations
b = 1
bb = 1
for a = 1 : A_n
while B_x[b] < A_x[a]
b += 1
b == B_n && break
end
bb = b
while B_x[bb] <= (A_x[a] + limit)
if func(A_y[a], B_y[bb])
R[a] = B_x[bb]
break
end
bb == B_n && break
bb += 1
end
a % 1_000_000 == 0 && println(a)  # progress
b == B_n && break
end
``````

Before that, check the performance tips of the manual.

First of all, put all the critical code in a function and do not use global variables.

https://docs.julialang.org/en/v1/manual/performance-tips/

2 Likes

Since the sizes are quite large, this is perfect for parallelisation. I would recommend taking a look at multithreading first - using `Threads.@threads` on the start of a for loop. This kind of problem is known as “embarrasingly” parallel and is a good chance to get to grips with parallel programming.

But I agree with @lmiq, putting the code in a function first is your best bet.

1 Like