Hello!

I have some code that has to deal with large arrays:

```julia
l = length(A)
m = 56^2
flag = trues(l)
flatten_connectivity = Array{Float32, 2}(undef, l, m)
# Build one sorted, flattened 56x56 submatrix of B per entry of A
for i = 1:l
    flatten_connectivity[i, :] = sort(vec(B[A[i], A[i]]))
end
# Mark as duplicates all rows that match an earlier row within tolerance
for i = 1:l
    flag[i] || continue
    for j = i+1:l
        flag[j] || continue
        if all(abs(flatten_connectivity[i, k] - flatten_connectivity[j, k]) <= 0.0001 for k = 1:m)
            flag[j] = false
        end
    end
end
A = A[flag]
```

This is the fastest code I was able to get for this task (without threading).

`A` is a `Vector{Vector{Int8}}` that can get quite large (100_000_000 vectors of length 56) and `B` is a `Matrix{Float32}` (120x120). In this extreme case I have to find a balance between speed and memory usage: the maximum memory usage is close to 1 TB.
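For reference, here is my back-of-the-envelope estimate for the `flatten_connectivity` matrix alone (my numbers, just to show where the ~1 TB comes from):

```julia
# Size of flatten_connectivity alone: l rows x m columns of Float32 (4 bytes each)
l = 100_000_000
m = 56^2                            # 3136
bytes = l * m * sizeof(Float32)     # 1_254_400_000_000 bytes
println(bytes / 2^40)               # ~1.14 TiB, before anything else is allocated
```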

I have access to an HPC cluster where each node has up to 512 GB of RAM, so I have to make sure to spread the data across at least 2 nodes.

I am not sure how to spread the work with the Distributed package; my first approach was:

```julia
flatten_connectivity = SharedArray{Float32}(l, m)
@distributed (+) for i = 1:l
    flatten_connectivity[i, :] = sort(vec(connectivity[li1s_li2s[i], li1s_li2s[i]]))
end
```

This works, but as soon as I leave the loop, all the memory is moved back onto the main process, which provokes an OOM error. I tried the same thing on the second loop, but I get “break or continue outside of for loop”…
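If I understand the error correctly, `@distributed` rewrites the loop body into a function, so syntactically `continue` is no longer inside a `for` loop; the guards can be written as plain `if` blocks instead. A runnable sketch of what I mean, on tiny stand-in data (note that `flag` must also be a `SharedArray` for workers to see the writes, that concurrent writers could race on `flag`, and that `SharedArray` only spans processes on a single node, so this alone would not get me to 2 nodes):

```julia
using Distributed, SharedArrays

# Tiny stand-in data so the pattern is runnable; the real arrays are far larger.
l, m = 4, 3
flatten_connectivity = SharedArray{Float32}(l, m)
flatten_connectivity .= Float32[1 2 3; 1 2 3; 5 6 7; 1 2 3]
flag = SharedArray{Bool}(l)    # shared, so workers' writes are visible everywhere
flag .= true

@sync @distributed for i = 1:l
    if flag[i]                 # plain `if` guard instead of `continue`
        for j = i+1:l
            if flag[j] && all(abs(flatten_connectivity[i, k] - flatten_connectivity[j, k]) <= 1f-4 for k = 1:m)
                flag[j] = false
            end
        end
    end
end
# rows 2 and 4 duplicate row 1, so flag ends up [true, false, true, false]
```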

Should I just split the array and use `@spawnat` so that each process deals with a smaller part of the array?
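What I have in mind is something like the sketch below: split `1:l` into one range per worker, let each worker build only its own rows locally, and fetch back only a small result (e.g. a `Bool` vector), never the big matrix. `split_range` is my own helper, not a library function, and here each worker just reports its chunk size as a placeholder for the real per-chunk work:

```julia
using Distributed

# My own helper (not a library call): split 1:l into nearly equal ranges.
function split_range(l::Int, nchunks::Int)
    step = cld(l, nchunks)
    [(1 + (k - 1) * step):min(k * step, l) for k in 1:nchunks]
end

# Each worker would receive one range, build only its rows of
# flatten_connectivity locally, and send back just a small result.
l = 10
ranges = split_range(l, nworkers())
futures = [@spawnat w length(r) for (w, r) in zip(workers(), ranges)]
println(sum(fetch.(futures)))   # only the small per-chunk results travel back
```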