Most effective way to check if neighboring values in a matrix are the same

Suppose A is the following matrix.

 0.0  0.0  0.0
 1.0  0.0  0.0
 0.0  1.0  0.0
 1.0  1.0  0.0
 0.0  0.0  1.0
 1.0  0.0  1.0
 0.0  1.0  1.0
 1.0  1.0  1.0

I want to check each row of the matrix for the number of times an entry is bordered by different values. i.e. the number of instances where a 0 is surrounded by 1’s and a 1 is surrounded by zeros. For purposes of this test A[1,1] is considered to be bordered by both A[1,2] and A[1,3].

What is the most efficient way to go about this comparison?

From posts that I’ve seen it is recommended that I place ghost columns to correct for the circular element so I would end up with the matrix C = hcat(A[:,3] ,A ,A[:1])

0.0  0.0  0.0  0.0  0.0
0.0  1.0  0.0  0.0  1.0
0.0  0.0  1.0  0.0  0.0
0.0  1.0  1.0  0.0  1.0
1.0  0.0  0.0  1.0  0.0
1.0  1.0  0.0  1.0  1.0
1.0  0.0  1.0  1.0  0.0
1.0  1.0  1.0  1.0  1.0 

From there the only way I could think of doing it was something like this.

    for j in 2:n+1 # where n is the original number of columns  
        B[:,j-1] = C[:,j-1] .!= C[:,j] .!= C[:,j+1]
    end 
    neighbors = sum(B, dims = 2)

is this the correct approach for this type of comparison? I’m wondering whether there is a more efficient/direct way to approach this kind of matching question? Thank you very much for you insight.

1 Like

Realize that this allocates temporary arrays for the result of the broadcast operations, and even more if you don’t use @views. You can get rid of the extra allocations if you use .= (fusing broadcasted assignment):

@views function neighbors1(C)
    B = Array{Bool}(undef, size(C,1), size(C,2)-2)
    for j in 2:size(C,2)-1
        B[:,j-1] .= C[:,j-1] .!= C[:,j] .!= C[:,j+1]
    end 
    return sum(B, dims=2)
end

You could also just write a double loop:

function neighbors2(C)
    count = zeros(Int, size(C,1))
    for j = 2:size(C,2)-1, i = 1:size(C,1)
        count[i] += C[i,j-1] != C[i,j] != C[i,j+1]
    end
    return count
end
3 Likes

Hmm, for some reason the loop is still allocating, not sure why?

1 Like

By the way, is it possible to get better results with the CircularArrays.jl package than with the code below, which is an order of magnitude slower than Steve’s neighbors1() solution?

using CircularArrays

@views function circular1(a)
    B = Array{Bool}(undef, size(a))
    for j in axes(a, 2)
        @inbounds B[:,j] .= a[:,j-1] .!= a[:,j] .!= a[:,j+1]
    end
    return sum(B, dims=2)
end

# where:
a = CircularArray(A)
1 Like

thanks this was very helpful as was your previous post explaining @views. I am getting slightly improved speeds and reduced allocations when using the @views and .= operator. Does this look about right?

@btime originialfunc(20)
  292.932 ms (1048884 allocations: 1.04 GiB)
5.0

julia> @btime updatedfunc(20) # with @views and .= 
  268.725 ms (1048804 allocations: 1.04 GiB)
5.0

This is still way too many allocations. It looks like chained comparisons (a .!= b .!= c) aren’t fusing into a single loop? The following does better:

@views function neighbors3(C)
    B = Array{Bool}(undef, size(C,1), size(C,2)-2)
    for j in 2:size(C,2)-1
        @. B[:,j-1] = (C[:,j-1] != C[:,j]) & (C[:,j] != C[:,j+1])
    end 
    return sum(B, dims=2)
end

Note that if you want to use circular boundary conditions you could do:

@views function neighbors3c(C)
    B = Array{Bool}(undef, size(C)...)
    for j in 1:size(C,2)
        jp1 = j < size(C, 2) ? j+1 : 1
        jm1 = j > 1 ? j-1 : size(C,2)
        @. B[:,j] = (C[:,jm1] != C[:,j]) & (C[:,j] != C[:,jp1])
    end 
    return sum(B, dims=2)
end
1 Like

I can’t test it right now, but would an xor perform better? I.e.

        @. B[:,j-1] = (C[:,j-1] ⊻ C[:,j]) & (C[:,j] ⊻ C[:,j+1])

Since the array contains only 1’s and 0’s.

Edit: Just noticed the values are floats, however. Any conversion is probably costly enough to not be worth it.

1 Like

So this is a more unconventional approach, but I think its nice because you don’t have to create ghost columns:


#Overload the mod1 function to wrap indexing around 
Base.mod1(x::CartesianIndex{N}, i::Int...) where N = CartesianIndex(mod1.(x.I,i))

#Create a function to count differing neighbors in each row
function countDifferentNeighbors(A)

    Ξ”y = CartesianIndex(0,1)
    gridSize = size(A)
    B = falses(gridSize)

    for i in CartesianIndices(A)
 
        left = mod1(i-Ξ”y, gridSize...)
        right = mod1(i+Ξ”y, gridSize...)

        if A[left] β‰  A[i] β‰  A[right]
            B[i] = true
        end

    end

    return sum(B, dims=2)
end

It seems pretty efficient with allocations as well:

julia> A = rand([0.0,1.0], 100, 3);

julia> @benchmark countDifferentNeighbors($A)
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):  6.380 ΞΌs … 499.400 ΞΌs  β”Š GC (min … max): 0.00% … 97.61%
 Time  (median):     6.820 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   7.161 ΞΌs Β±   5.035 ΞΌs  β”Š GC (mean Β± Οƒ):  0.68% Β±  0.98%

  β–…β–ˆβ–„ β–†β–ˆβ–‡β–‚β–†β–‡β–„β–β–„β–„β–‚  ▁▁ β–‚β–‚  ▁  ▁   ▂▁  β–‚                        β–‚
  β–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–‡β–†β–ˆβ–ˆβ–‡β–ˆβ–ˆβ–†β–‡β–ˆβ–ˆβ–…β–ˆβ–ˆβ–ˆβ–„β–…β–ˆβ–ˆβ–…β–‚β–‡β–ˆβ–†β–„β–†β–ˆβ–†β–†β–ƒβ–…β–ˆβ–‡β–„β–ƒβ–‚β–‡ β–ˆ
  6.38 ΞΌs      Histogram: log(frequency) by time      10.9 ΞΌs <

 Memory estimate: 1.09 KiB, allocs estimate: 7.

You could probably get it a bit faster with @inbounds or @views or something.

3 Likes

Thank you all very much for the amazingly insightful responses! There’s so much to learn here and its very helpful. @RobertGregg would you mind very much explaining a little how the overload of mod1 function is working? still trying to understand it. When I benchmark with @stevengj neighbors3 these are the results that I get where D is 2^20 x 20 matrix.

@benchmark neighbors3(D)
BenchmarkTools.Trial: 237 samples with 1 evaluation.
 Range (min … max):  18.999 ms … 25.482 ms  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     21.342 ms              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   21.148 ms Β±  1.163 ms  β”Š GC (mean Β± Οƒ):  1.48% Β± 2.93%

           β–ˆβ–…            ▁▆▅ β–‚β–„β–‚                               
  β–ƒβ–β–β–„β–…β–…β–…β–…β–ƒβ–ˆβ–ˆβ–ˆβ–ƒβ–…β–ƒβ–ƒβ–ƒβ–ƒβ–β–ƒβ–ƒβ–β–…β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–…β–†β–„β–β–„β–ƒβ–ƒβ–β–ƒβ–β–β–ƒβ–ƒβ–ƒβ–ƒβ–β–β–ƒβ–β–β–ƒβ–β–β–β–β–ƒβ–β–„ β–ƒ
  19 ms           Histogram: frequency by time        24.6 ms <

 Memory estimate: 28.00 MiB, allocs estimate: 8.

@benchmark countDifferentNeighbors(D)
BenchmarkTools.Trial: 31 samples with 1 evaluation.
 Range (min … max):  160.966 ms … 165.997 ms  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     162.355 ms               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   162.759 ms Β±   1.553 ms  β”Š GC (mean Β± Οƒ):  0.13% Β± 0.49%

   β–ƒ β–ƒ  β–ˆ    β–ƒ    β–ˆ       β–ƒ                          β–ƒ           
  β–‡β–ˆβ–β–ˆβ–‡β–β–ˆβ–‡β–‡β–β–β–ˆβ–β–β–‡β–‡β–ˆβ–‡β–β–β–β–β–β–‡β–ˆβ–β–β–β–β–β–β–‡β–β–‡β–β–β–β–β–β–β–β–β–‡β–β–‡β–β–β–β–β–β–β–ˆβ–β–‡β–β–β–β–β–‡β–β–‡ ▁
  161 ms           Histogram: frequency by time          166 ms <

 Memory estimate: 10.75 MiB, allocs estimate: 9.

I’m not sure if I’m doing this right but it seems like there are significant memory efficiencies but at a meaningful speed cost.

For sure! It’s definitely a little perplexing at first. The function mod1(x,n) maps the value x to be between 1 and n. If x>n then it wraps around back to 1. Likewise, if x<1 then x wraps around and is mapped to n.

A CartesianIndex is basically just a Tuple of indices, so the when I overloaded the mod1 function I told it to map all the indices to be between 1 and n where n is the relevant array dimension. So using the matrix size of (2^20, 20) from your example:

julia> Ξ”y = CartesianIndex(0,1)
CartesianIndex(0, 1)

julia> i = CartesianIndex(1,20)
CartesianIndex(1, 20)

julia> i + Ξ”y
CartesianIndex(1, 21) #oops we only have 20 columns

julia> mod1(i + Ξ”y, (2^20,20)...) #wrap the 21 back to the 1st column
CartesianIndex(1, 1)

A few other notes:

  • When benchmarking with global variables you’ll want to use a $ (see the readme in BenchmarkTools.)
  • Are neighbors3() and countDifferentNeighbors() giving you the same result? I think neighbors3() requires the ghost columns and countDifferentNeighbors() does not, but I may be wrong. If that’s the case, the benchmark might be comparing different things.
  • I didn’t spend to much time optimizing, so you can probably get more speed out of countDifferentNeighbors() if you add @inbounds or make B a vector.

Hope this helps!

1 Like

I expose some measures (which I hope I have done correctly) waiting for the functioning of the function neighbors3(...) to be clarified, I compare only the following 3

julia> A=rand([0,1], 2^20, 20);
julia> neighbors3c(CircularArray(A))==neighbors3(A)
false
ulia> function countDifferentNeighbors(A)

           Ξ”y = CartesianIndex(0,1)
           gridSize = size(A)
           B = falses(gridSize)

           for i in CartesianIndices(A)

               left = mod1(i-Ξ”y, gridSize...)
               right = mod1(i+Ξ”y, gridSize...)

               if A[left] β‰  A[i] β‰  A[right]
                   B[i] = true


           end

           return sum(B, dims=2)
       end
countDifferentNeighbors (generic function with 1 method)     

julia> 
function cdn(A)
    rows, cols = size(A)
    B = falses(rows, cols)
    for c in axes(A,2)
        left = ifelse(c==1,cols, c-1)
        right = ifelse(c==cols, 1, c+1)
        for r in axes(A,1) 
            B[r,c] = A[r,left] == A[r,right] β‰  A[r,c]
        end
    end
    return sum(B, dims=2)
end
cdn (generic function with 1 method)

julia> @views function neighbors3c(C)
           B = Array{Bool}(undef, size(C)...)
           for j in 1:size(C,2)
               jp1 = j < size(C, 2) ? j+1 : 1
               jm1 = j > 1 ? j-1 : size(C,2)               
           @. B[:,j] = (C[:,jm1] != C[:,j]) & (C[:,j] != C[:,jp1])
           end
           return sum(B, dims=2)
       end
neighbors3c (generic function with 1 method)

julia> using CircularArrays, BenchmarkTools

julia> @btime countDifferentNeighbors(A);
  390.397 ms (9 allocations: 10.50 MiB)

julia> @btime cdn(A);
  126.136 ms (9 allocations: 10.50 MiB)

julia> @btime neighbors3c(CircularArray(A));
  247.197 ms (9 allocations: 28.00 MiB)

regarding the use of the mod1() function (*) I too have been curious to understand how it works, I used the help and I found a too formal explanation which would not hurt if one added that in essence mod1 (x, y) = mod (x, y) == 0? y: mod (x, y)

(*) I also thank @RobertGregg for having β€œexposed” it to our attention, as well as for its elegant solution to the problem that I was closely inspired by to propose a modification.
just an annotation instead of left =/= middle =/= right maybe it’s better left == right =/= middle, (even if in the case of only two values they are equivalent).

1 Like

You can use neighbors3c if you want the cyclic behavior. However, on my machine it is close to the same speed as neighbors3 (not surprising since the inner loop is identical).

1 Like

And this version (which also has cyclic boundaries) gets rid of the matrix allocation in B (it only allocates the final counts array), and is faster than neighbors3c on my machine:

function neighbors4c(C::AbstractMatrix)
    firstindex(C,1) == firstindex(C,2) == 1 || throw(ArgumentError("1-based array required"))
    c = zeros(Int, size(C,1))
    for j in 1:size(C,2)
        jp1 = j < size(C, 2) ? j+1 : 1
        jm1 = j > 1 ? j-1 : size(C,2)
        for i = 1:size(C,1)
            @inbounds c[i] += (C[i,jm1] != C[i,j]) & (C[i,j] != C[i,jp1])
        end
    end 
    return c
end

(The fastest thing in Julia is usually ultimately to write your own inner loops, if you know how to optimize them. You might be able to do even better by exploiting SIMD operations, e.g. with LoopVectorization.jl.)

using BenchmarkTools
C = rand(Bool, 1000, 200);
@btime neighbors3c($C);
@btime neighbors4c($C);
@btime countDifferentNeighbors($C);

gives

  54.754 ΞΌs (7 allocations: 203.44 KiB)
  48.908 ΞΌs (1 allocation: 7.94 KiB)
  5.491 ms (8 allocations: 32.55 KiB)
1 Like

Note that @RobertGregg’s countDifferentNeighbors is paying a big performance price for calling mod1 in the innermost loop (and calling it on all of the indices), whereas my neighbors3c and neighbors4c do not (because the cyclic behavior is only for the column indices, in the outer loop).

countDifferentNeighbors also uses a BitMatrix for B, which is slower (but uses 8x less memory) than a Matrix{Bool} (but the memory savings are even greater for neighbors4c, which doesn’t allocate B at all).

2 Likes

Just for fun, here is an even faster version that unrolls the column loop 4x to combine some of the column operations:

function neighbors5c(C::AbstractMatrix)
    firstindex(C,1) == firstindex(C,2) == 1 || throw(ArgumentError("1-based array required"))
    c = zeros(Int, size(C,1))
    local j::Int
    # unroll column loop by 4 to combine column comparisons
    for outer j in 1:4:size(C,2)-3
        jm1 = j > 1 ? j-1 : size(C,2)
        jp1 = j < size(C, 2) ? j+1 : 1
        jp2 = j+2 ≀ size(C, 2) ? j+2 : j+2 - size(C, 2)
        jp3 = j+3 ≀ size(C, 2) ? j+3 : j+3 - size(C, 2)
        jp4 = j+4 ≀ size(C, 2) ? j+4 : j+4 - size(C, 2)
        @inbounds for i = 1:size(C,1)
            c1 = C[i,j] != C[i,jp1]
            c2 = C[i,jp1] != C[i,jp2]
            c3 = C[i,jp2] != C[i,jp3]
            c4 = C[i,jp3] != C[i,jp4]
            c[i] += ((C[i,jm1] != C[i,j]) & c1) + (c1&c2) + (c2&c3) + (c3&c4)            
        end
    end 
    for j in j+4:size(C,2)
        jm1 = j > 1 ? j-1 : size(C,2)
        jp1 = j < size(C, 2) ? j+1 : 1
        @inbounds for i = 1:size(C,1)
            c[i] += (C[i,jm1] != C[i,j]) & (C[i,j] != C[i,jp1])
        end
    end 
    return c
end

However, much as optimizing these kinds of things is a fun exercise, I should caution you that:

  1. Optimizations (like unrolling above) that make the code less readable are only worth it if benchmarking shows that it’s really a critical operation in your code.
  2. Even if it is a performance-critical operation, you are often better served by not considering the code in isolation. Can you use a better data structure? Can you combine this code with preceding or subsequent operations on the data so that you aren’t doing individual small β€œvectorized” operations?
3 Likes

@stevengj makes some great points. It can be fun to optimize small code snippets into oblivion but its also good to take a step back and think whether or not you need every bit of performance at the cost of readability. For example, this version is very readable:

function neighbors6c(A)
    
    center = 1:size(A,2)
    left = circshift(center, 1)
    right = circshift(center, -1)


    return @views sum(A[:,left] .β‰  A[:,center] .β‰  A[:,right], dims=2)
end

But it can’t compete with the unrolled loops:

julia> A = rand([0.0,1.0],2^20,20);

julia> @btime neighbors5c($A);
  6.237 ms (2 allocations: 8.00 MiB)

julia> @btime neighbors6c($A);
  46.092 ms (16 allocations: 13.01 MiB)

So I guess (1) sorry for taking over your post to discuss the philosophy behind code optimization (2) I hope whatever you’re doing runs much faster :slight_smile:

2 Likes

I made some changes to the cdn() function, following @stevengj 's remarks on using a vector, rather than a matrix, to account for good triples.
I also made a version that does not have to check for each triplet if it is at the beginning or at the end of the row.
I would have expected for arrays with many columns a marked improvement of cdnznoif() over cdnz(). But it doesn’t seem like that.

julia> A=rand([0.0,1.0], 10^3, 10^4);

julia> function neighbors5c(C::AbstractMatrix)
           firstindex(C,1) == firstindex(C,2) == 1 || throw(ArgumentError("1-based array required"))
           c = zeros(Int, size(C,1))
           local j::Int
           # unroll column loop by 4 to combine column comparisons
           for outer j in 1:4:size(C,2)-3
               jm1 = j > 1 ? j-1 : size(C,2)
               jp1 = j < size(C, 2) ? j+1 : 1
               jp2 = j+2 ≀ size(C, 2) ? j+2 : j+2 - size(C, 2)
               jp3 = j+3 ≀ size(C, 2) ? j+3 : j+3 - size(C, 2)
               jp4 = j+4 ≀ size(C, 2) ? j+4 : j+4 - size(C, 2)
               @inbounds for i = 1:size(C,1)
                   c1 = C[i,j] != C[i,jp1]
                   c2 = C[i,jp1] != C[i,jp2]
                   c3 = C[i,jp2] != C[i,jp3]
                   c4 = C[i,jp3] != C[i,jp4]
                   c[i] += ((C[i,jm1] != C[i,j]) & c1) + (c1&c2) + (c2&c3) + (c3&c4)   

               end
           end 
           for j in j+4:size(C,2)
               jm1 = j > 1 ? j-1 : size(C,2)
               jp1 = j < size(C, 2) ? j+1 : 1
               @inbounds for i = 1:size(C,1)
                   c[i] += (C[i,jm1] != C[i,j]) & (C[i,j] != C[i,jp1])
               end
           end 
           return c
       end
neighbors5c (generic function with 1 method)

julia> function neighbors6c(A)

           center = 1:size(A,2)
           left = circshift(center, 1)
           right = circshift(center, -1)
       
           return @views sum(A[:,left] .β‰  A[:,center] .β‰  A[:,right], dims=2)
       end
neighbors6c (generic function with 1 method)

julia> function cdn(A)
           rows, cols = size(A)
           B = falses(rows, cols)
           for c in axes(A,2)
               left = ifelse(c==1,cols, c-1)
               right = ifelse(c==cols, 1, c+1)
               for r in axes(A,1)
                   B[r,c] = A[r,left] == A[r,right] β‰  A[r,c]
               end
           end
           return sum(B, dims=2)
       end
cdn (generic function with 1 method)

julia> function cdnz(A)
           rows, cols = size(A)
           B = zeros(Int, size(A,1))
           for c in 1:cols
               left = ifelse(c==1,cols, c-1)
               right = ifelse(c==cols, 1, c+1)
               for r in 1:rows
       @inbounds      B[r] += A[r,left] == A[r,right] β‰  A[r,c]
               end
           end
           return B
       end
cdnz (generic function with 1 method)

julia> function cdnznoif(A)
           rows, cols = size(A)
           B = zeros(Int, size(A,1))
           for r in axes(A,1)
       @inbounds        B[r] += A[r,cols] == A[r,2] β‰  A[r,1]
           end
           for c in 2:cols-1
               for r in axes(A,1)
       @inbounds        B[r] += A[r,c-1] == A[r,c+1] β‰  A[r,c]
               end
           end
           for r in axes(A,1)
       @inbounds        B[r] += A[r,cols-1] == A[r,1] β‰  A[r,cols]
           end
           return B
       end
cdnznoif (generic function with 1 method)

julia> using CircularArrays, BenchmarkTools

julia> neighbors6c(CircularArray(A))==cdn(A)
true

julia> neighbors5c(A)==cdn(A)[:]
true

julia> cdnz(A)==cdn(A)[:]
true

julia> cdnznoif(A)==cdn(A)[:]
true

julia> @btime neighbors6c(CircularArray(A));
  168.290 ms (18 allocations: 2.55 MiB)

julia> @btime neighbors5c(A);
  3.817 ms (1 allocation: 7.94 KiB)

julia> @btime cdn(A);
  61.634 ms (8 allocations: 1.20 MiB)

julia> @btime cdnz(A);
  4.764 ms (1 allocation: 7.94 KiB)

julia> @btime cdnznoif(A);
  4.762 ms (1 allocation: 7.94 KiB)
1 Like

The if statement in the column loop is not expensive because it is only in the outer loop, not the inner loop. That is, for an m \times n matrix, the if statement is only executed O(n) times, whereas the inner-loop body is executed O(mn) times, so the latter dominates the runtime when m \gg 1.

1 Like

It seems that even in the case of not very large n, expressions with ifelse do not weigh much.
I wonder if the compiler is not able to transform, in some way, the code of cdnz() into that of cdnznoif().

julia> A=rand([0.0,1.0], 10, 10^6);

julia> @btime cdnz(A);
  45.234 ms (1 allocation: 144 bytes)

julia> @btime cdnznoif(A);
  45.123 ms (1 allocation: 144 bytes)



julia> A=rand([0.0,1.0], 1, 10^6);

julia> @btime cdnznoif(A);
  6.456 ms (1 allocation: 64 bytes)

julia> @btime cdnz(A);
  6.253 ms (1 allocation: 64 bytes)
1 Like

I feel like I’ve just gone through some serious Jedi mind training so I just want to thank all the Yoda’s for providing such an amazing education. Not only the code itself but the philosophy behind it is really invaluable to someone who doesn’t know what they are doing…yet. So thank you! Just in case anyone’s understanding requires as much assistance as I did, I’m posting a walkthrough of the fastest solution (@stevengj 's neighbors5c) and the simplest solution (@RobertGregg 's neighbors6c). (Sorry it’s taking me so long to work through the code, I just saw @rocco_sprmnt21 solution and I still have to work through that one.) My intention is just to provide a reference for any novice still working through the Julia manual, but if I end up inadvertently lowering the collective Julia IQ with this post please let me know and I’ll remove it.

fast solution @stevengj 's neighbors5c

As a reminder C is a f x r array where size(C,1) is the number of permutations possible given size(C,2) entries. E.g. given 6 entries size(C,2) == 6, size(C,1) == 2^6. For benchmarking purposes C = 2^20x 20

@benchmark neighbors5c(C)
BenchmarkTools.Trial: 747 samples with 1 evaluation.
 Range (min … max):  5.863 ms … 15.767 ms  β”Š GC (min … max): 0.00% … 59.38%
 Time  (median):     6.413 ms              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   6.685 ms Β±  1.347 ms  β”Š GC (mean Β± Οƒ):  3.99% Β±  9.96%

     β–ˆβ–‡β–ƒ                                                      
  β–„β–β–‡β–ˆβ–ˆβ–ˆβ–„β–β–β–„β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–„β–β–β–β–…β–„β–…β–…β–β–β–…β–„β–„β–…β–β–…β–…β–β–… β–‡
  5.86 ms      Histogram: log(frequency) by time       14 ms <

 Memory estimate: 8.00 MiB, allocs estimate: 2.
function neighbors5c(C::AbstractMatrix)
    firstindex(C,1) == firstindex(C,2) == 1 || throw(ArgumentError("1-based array required"))
    # I may be wrong here but I think this is to check that C is a two dimensional array with 
    # 1 based indexing. 
    c = zeros(Int, size(C,1))
    # This creates an array of zeros to tally the number of entries with different neighbors in 
    # each permutation.  The values in the array will be of type Int.  
    # The length of the array is the number of possible permutations given n # of neighbors, i.e. size(C,1).     
    local j::Int
    # local j declares a new local variable in this scope, regardless of whether 
    # there is already a variable named j in an outer scope. `::` defines j to be type Int.    
    for outer j in 1:4:size(C,2)-3
    # unroll column loop by 4 to combine column comparisons
    # generally in julia you can only shadow an existing local variable 
    # (i.e. create a new variable with the same name as an existing variable)
    # by explicitly stating `local` as shown above.  
    # A for loop or comprehension iteration variable however is always a new variable. 
    # So even if  `j` were used again it would be a new variable rather than the `j` defined above.   
    # "for outer" re-use's the existing local variable j as the iteration variable.  
    # 1:4:size(C,2)-3 counts from 1:size(C,2)-3, in increments of 4.  Incidentally size(C,2) == size(C)[2]
    # Because ghost columns aren't being used, he defines who is considered a neighbor in the following manner      
        jm1 = j > 1 ? j-1 : size(C,2)  
        # a ? b:c is shorthand for if a, evaluabe b otherwise c i.e. 
        # if j > 1 then jm1 = the neighbor on your left, if you are the first element in the array 
        # then j would ==1, and the neighbor to your left is the last element in the array.  
        jp1 = j < size(C, 2) ? j+1 : 1
        # if j < number of entries, then the neighbor to the right = j + 1, for the last element in the array the neighbor to the right  = the first element. 
        jp2 = j+2 ≀ size(C, 2) ? j+2 : j+2 - size(C, 2)
        # likewise here if j+2 < = number of entries, then two players to the right = j+2, 
        # otherwise say you are n - 1,  then 2 players to the right of you is = 1 == j + 2 - n   
        jp3 = j+3 ≀ size(C, 2) ? j+3 : j+3 - size(C, 2)
        jp4 = j+4 ≀ size(C, 2) ? j+4 : j+4 - size(C, 2)
        # @inbounds eliminates bounds checking to improve performance.  
        @inbounds for i = 1:size(C,1)
            c1 = C[i,j] != C[i,jp1]   # if j != to right neighbor, c1 == 1
            c2 = C[i,jp1] != C[i,jp2] # if j's right neighbor != to his right neighbor, c2 == 1 
            c3 = C[i,jp2] != C[i,jp3]
            c4 = C[i,jp3] != C[i,jp4]
            c[i] += ((C[i,jm1] != C[i,j]) & c1) + (c1&c2) + (c2&c3) + (c3&c4) 
            # Here we tally all the times we have differing neighbors into the array we created above.
            # `(C[i,jm1] != C[i,j])` will yield a 1 for true and 0 for false the `&` here means bitwise &.  
            # A bitwise & B converts A and B to binary numbers and compare the bits of each number to its counterpart
            # when inputs A and B are true, output is true; otherwise output is false
            # A   B   A & B 
            # -------------
            # 0 | 0 | 0
            # 0 | 1 | 0
            # 1 | 0 | 0
            # 1 | 1 | 0
            #  now suppose you have an array of 6 values.  c1 == 1 != 2, c2 == 2 != 3, c3 ==  3!=4, c4 == c4 !=5 
            # `(C[i,jm1] != C[i,j]& c1)` checks if the 1st element is different from the 2nd and last element i.e. 6 != 1 & 1 !=2 
            #  likewise (c1&c2) checks 2 for different neighbors, (c2&c3) checks 3 for differing neighbors, (c3&c4) checks 4 for differing neighbors   
        end
    end 
    # in the above example where size(C,2) == 6 the above loop would check values 1:4 for differing neighbors. So we still 
    # have to check the remaining players in the same manner as above. 
    for j in j+4:size(C,2)
        jm1 = j > 1 ? j-1 : size(C,2)
        jp1 = j < size(C, 2) ? j+1 : 1
        @inbounds for i = 1:size(C,1)
            c[i] += (C[i,jm1] != C[i,j]) & (C[i,j] != C[i,jp1])
        end
    end 
    return c
end

simple solution @RobertGregg 's neighbors6c. This one is very cool because it accomplishes all of the above in only three lines of code but it takes a bit longer for large n.

@benchmark neighbors6c(E)
BenchmarkTools.Trial: 110 samples with 1 evaluation.
 Range (min … max):  43.807 ms … 57.498 ms  β”Š GC (min … max): 0.00% … 13.07%
 Time  (median):     44.483 ms              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   45.474 ms Β±  3.067 ms  β”Š GC (mean Β± Οƒ):  1.11% Β±  3.62%

  β–β–„β–ˆβ–…β–„                                                        
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–β–‡β–†β–β–β–β–β–β–β–…β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–…β–β–β–β–…β–β–β–β–…β–…β–β–β–β–…β–β–β–β–β–β–β–…β–β–β–…β–…β–β–… β–…
  43.8 ms      Histogram: log(frequency) by time      57.3 ms <

 Memory estimate: 13.01 MiB, allocs estimate: 16.
function neighbors6c(A)
    center = 1:size(A,2)
    # create an array counting the number of entries 
    left = circshift(center, 1)
    # rotate the array 1 to the left 
    right = circshift(center, -1)
    # rotate the array 1 to the right 
    return @views sum(A[:,left] .β‰  A[:,center] .β‰  A[:,right], dims=2)
    # since the indexes are shifted this compares A[1:1] !=A[1,2] != A[1,size(A,2)]
    # the `.` proceeding the `!=` indicated an element wise operator 
    # @stevengj has a really great post about @views here https://discourse.julialang.org/t/could-you-explain-what-are-views/17535/2
end
2 Likes