Double for loop using filtered ranges


#1

I have some MWE below. What I want is to have a subsection of a range, interact with the rest of the range, but not itself.

Also, I posted this on stackoverflow … I should have put it here first I think…

For instance if the range is 1:100 , I want to have a for loop that will have each index in 4:6 , interact with all values of 1:100 BUT NOT 4:6 .

I want to do this using ranges/filters to avoid generating temporary arrays.

In my case the total range is the number of atoms in the system. The sub-range, is the atoms in a specific molecule. I need to do calculations where each atom in a molecule interacts with all other atoms, but not the atoms in the same molecule.

Further

I am trying to avoid using if statements because that can potentially complicate parallelizing the code later (such is my impression, correct me if I am wrong). Doing this with an if statement would be

for i=4:6
    for j = 1:100
        if j == 4 || j==5 || j==6
            continue
        end
        println(i, " ", j) #roughly 0.1-100 microseconds to run the real call here
    end
end

I have actual indexing in my code, I would never hardcode values like the above… But I want to avoid that if statement.

Trials

The following does what I want, but I now realize that using filter is bad when it comes to memory and the amount used scales linearly with b .

a = 4:6
b = 1:100
for  i in a
    for j in filter((b) -> !(b in a),b) 
        print(i, " ", j) 
    end
end

Is there a way to get the double for loop I want where the outer is a sub-range of the inner, but the inner does not include the outer sub-range and most importantly is fast and does not create alot of memory usage like filter does?


#2

A user on stackoverflow recommended the following package

It looks like it does what I am after, but it doesn’t seem to be registered :S


#3

You can use a generator to do the filtering. Below I also changed your inner function (print is not so nice for benchmarking) and made it return a summed value to be able to verify correctness:

func(i, j) = sin(i) * cos(j)

function loop(range, n)
    s = 0.0
    first, last = range[1], range[end]
    for i = range
        for j = (k for k = 1:n if k < first || k > last)
            s += func(i, j)
        end
    end
    s
end

Usage and timing:

julia> @btime loop(4:6, 100)
  5.140 μs (0 allocations: 0 bytes)
2.239502224477359

Could you explain a bit more how you want to parallelize this, and why using if statements or similar is not an option? Do you intend to parallelize the inner loop directly? What will the ranges be for your real code, and for how many molecules will you run this?


#4

Great stuff… I should get your business card one of these days.

I must start by saying I do not know much about parallel coding. I have used openMP before. My reason for not wanting if statements is that I perhaps naively think the early returns will interfere with the management of the threads. Edit: in this case, each pass has the same amount of if statements, so running things in parallel should not affect timing at all.

I would like to make my code GPU usable, rather than cpu parallel if I can. I don’t know what dragons come along with that choice. However, I think the easiest form of parallel coding is cpu openMP, which I can’t seem to find for Julia. I have prime algorithms for openMP but it must be able to handle some sort of reduction. @threads does not seem to do this, so it gives me wrong summations. Using mpi is an alright option, but I think gpu would be preferable.

Molecules… well, thousands, which can lead to upwards of 30,000 atoms. I don’t plan on making my code enter the millions of atoms range, but I think it should be able to do 30,000. To be honest though, I will get a great deal done with it using only 5000 atoms.

So long as it is serial I have to stick to around 3000 atoms or less. My monte carlo code is inevitable N^2, but the molecular dynamics can be made N \log N


#5

Shadowing both range, first and last?:face_with_raised_eyebrow: That seems, at least, like a dangerous habit :wink:

It could be more generic and shorter by using !(k in myrange).


#6

What do you mean by @threads not supporting reductions?

Do you intend to parallelize this on a single computer only (using all cores), or a cluster?

When I asked about the number of molecules I meant if you’ll be calling this loop function multiple times, or just a single time? Since if you’ll be calling it multiple times, it might be wiser to parallelize those calls, rather than parallelizing the inner-most loop.

Haha, yes good point. The only thing missing was calling it sum instead of s :slight_smile:

Again good point, I was initially thinking an implementation like this:

function loop2(molecule_range, n)
    s = 0.0
    first_idx, last_idx = molecule_range[1], molecule_range[end]
    for i = molecule_range
        for j = 1:first_idx-1
            s += func(i, j)
        end
        for j = last_idx+1:n
            s += func(i, j)
        end
    end
    s
end

(but probably cleaned up to remove the code duplication), which would be a lot faster for large ranges:

julia> @btime loop(2:99, 100)
  12.864 μs (0 allocations: 0 bytes)
-0.6483986089918431

julia> @btime loop2(2:99, 100)
  2.191 μs (0 allocations: 0 bytes)
-0.6483986089918431

But for small ranges it’s likely not worth it, in which case I agree that your suggestion is cleaner.