Most effective way to check if neighboring values in a matrix are the same

phantom · August 24, 2022, 9:41pm

Suppose A is the following matrix.

 0.0  0.0  0.0
 1.0  0.0  0.0
 0.0  1.0  0.0
 1.0  1.0  0.0
 0.0  0.0  1.0
 1.0  0.0  1.0
 0.0  1.0  1.0
 1.0  1.0  1.0

I want to check each row of the matrix for the number of times an entry is bordered by different values. i.e. the number of instances where a 0 is surrounded by 1’s and a 1 is surrounded by zeros. For purposes of this test A[1,1] is considered to be bordered by both A[1,2] and A[1,3].

What is the most efficient way to go about this comparison?

From posts that I’ve seen it is recommended that I place ghost columns to correct for the circular element so I would end up with the matrix C = hcat(A[:,3] ,A ,A[:1])

0.0  0.0  0.0  0.0  0.0
0.0  1.0  0.0  0.0  1.0
0.0  0.0  1.0  0.0  0.0
0.0  1.0  1.0  0.0  1.0
1.0  0.0  0.0  1.0  0.0
1.0  1.0  0.0  1.0  1.0
1.0  0.0  1.0  1.0  0.0
1.0  1.0  1.0  1.0  1.0

From there the only way I could think of doing it was something like this.

    for j in 2:n+1 # where n is the original number of columns  
        B[:,j-1] = C[:,j-1] .!= C[:,j] .!= C[:,j+1]
    end 
    neighbors = sum(B, dims = 2)

is this the correct approach for this type of comparison? I’m wondering whether there is a more efficient/direct way to approach this kind of matching question? Thank you very much for you insight.

stevengj · August 24, 2022, 10:04pm

Realize that this allocates temporary arrays for the result of the broadcast operations, and even more if you don’t use @views. You can get rid of the extra allocations if you use .= (fusing broadcasted assignment):

@views function neighbors1(C)
    B = Array{Bool}(undef, size(C,1), size(C,2)-2)
    for j in 2:size(C,2)-1
        B[:,j-1] .= C[:,j-1] .!= C[:,j] .!= C[:,j+1]
    end 
    return sum(B, dims=2)
end

You could also just write a double loop:

function neighbors2(C)
    count = zeros(Int, size(C,1))
    for j = 2:size(C,2)-1, i = 1:size(C,1)
        count[i] += C[i,j-1] != C[i,j] != C[i,j+1]
    end
    return count
end

stevengj · August 24, 2022, 10:15pm

Hmm, for some reason the loop is still allocating, not sure why?

rafael.guerra · August 24, 2022, 11:11pm

By the way, is it possible to get better results with the CircularArrays.jl package than with the code below, which is an order of magnitude slower than Steve’s neighbors1() solution?

using CircularArrays

@views function circular1(a)
    B = Array{Bool}(undef, size(a))
    for j in axes(a, 2)
        @inbounds B[:,j] .= a[:,j-1] .!= a[:,j] .!= a[:,j+1]
    end
    return sum(B, dims=2)
end

# where:
a = CircularArray(A)

phantom · August 24, 2022, 11:36pm

thanks this was very helpful as was your previous post explaining @views. I am getting slightly improved speeds and reduced allocations when using the @views and .= operator. Does this look about right?

@btime originialfunc(20)
  292.932 ms (1048884 allocations: 1.04 GiB)
5.0

julia> @btime updatedfunc(20) # with @views and .= 
  268.725 ms (1048804 allocations: 1.04 GiB)
5.0

stevengj · August 25, 2022, 12:27am

This is still way too many allocations. It looks like chained comparisons (a .!= b .!= c) aren’t fusing into a single loop? The following does better:

@views function neighbors3(C)
    B = Array{Bool}(undef, size(C,1), size(C,2)-2)
    for j in 2:size(C,2)-1
        @. B[:,j-1] = (C[:,j-1] != C[:,j]) & (C[:,j] != C[:,j+1])
    end 
    return sum(B, dims=2)
end

Note that if you want to use circular boundary conditions you could do:

@views function neighbors3c(C)
    B = Array{Bool}(undef, size(C)...)
    for j in 1:size(C,2)
        jp1 = j < size(C, 2) ? j+1 : 1
        jm1 = j > 1 ? j-1 : size(C,2)
        @. B[:,j] = (C[:,jm1] != C[:,j]) & (C[:,j] != C[:,jp1])
    end 
    return sum(B, dims=2)
end

digital_carver · August 25, 2022, 4:43am

I can’t test it right now, but would an xor perform better? I.e.

        @. B[:,j-1] = (C[:,j-1] ⊻ C[:,j]) & (C[:,j] ⊻ C[:,j+1])

Since the array contains only 1’s and 0’s.

Edit: Just noticed the values are floats, however. Any conversion is probably costly enough to not be worth it.

RobertGregg · August 25, 2022, 7:54am

So this is a more unconventional approach, but I think its nice because you don’t have to create ghost columns:


#Overload the mod1 function to wrap indexing around 
Base.mod1(x::CartesianIndex{N}, i::Int...) where N = CartesianIndex(mod1.(x.I,i))

#Create a function to count differing neighbors in each row
function countDifferentNeighbors(A)

    Δy = CartesianIndex(0,1)
    gridSize = size(A)
    B = falses(gridSize)

    for i in CartesianIndices(A)
 
        left = mod1(i-Δy, gridSize...)
        right = mod1(i+Δy, gridSize...)

        if A[left] ≠ A[i] ≠ A[right]
            B[i] = true
        end

    end

    return sum(B, dims=2)
end

It seems pretty efficient with allocations as well:

julia> A = rand([0.0,1.0], 100, 3);

julia> @benchmark countDifferentNeighbors($A)
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):  6.380 μs … 499.400 μs  ┊ GC (min … max): 0.00% … 97.61%
 Time  (median):     6.820 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.161 μs ±   5.035 μs  ┊ GC (mean ± σ):  0.68% ±  0.98%

  ▅█▄ ▆█▇▂▆▇▄▁▄▄▂  ▁▁ ▂▂  ▁  ▁   ▂▁  ▂                        ▂
  ███▇███████████████▇██▇▆██▇██▆▇██▅███▄▅██▅▂▇█▆▄▆█▆▆▃▅█▇▄▃▂▇ █
  6.38 μs      Histogram: log(frequency) by time      10.9 μs <

 Memory estimate: 1.09 KiB, allocs estimate: 7.

You could probably get it a bit faster with @inbounds or @views or something.

phantom · August 25, 2022, 9:02pm

Thank you all very much for the amazingly insightful responses! There’s so much to learn here and its very helpful. @RobertGregg would you mind very much explaining a little how the overload of mod1 function is working? still trying to understand it. When I benchmark with @stevengj neighbors3 these are the results that I get where D is 2^20 x 20 matrix.

@benchmark neighbors3(D)
BenchmarkTools.Trial: 237 samples with 1 evaluation.
 Range (min … max):  18.999 ms … 25.482 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     21.342 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   21.148 ms ±  1.163 ms  ┊ GC (mean ± σ):  1.48% ± 2.93%

           █▅            ▁▆▅ ▂▄▂                               
  ▃▁▁▄▅▅▅▅▃███▃▅▃▃▃▃▁▃▃▁▅███████▆▅▆▄▁▄▃▃▁▃▁▁▃▃▃▃▁▁▃▁▁▃▁▁▁▁▃▁▄ ▃
  19 ms           Histogram: frequency by time        24.6 ms <

 Memory estimate: 28.00 MiB, allocs estimate: 8.

@benchmark countDifferentNeighbors(D)
BenchmarkTools.Trial: 31 samples with 1 evaluation.
 Range (min … max):  160.966 ms … 165.997 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     162.355 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   162.759 ms ±   1.553 ms  ┊ GC (mean ± σ):  0.13% ± 0.49%

   ▃ ▃  █    ▃    █       ▃                          ▃           
  ▇█▁█▇▁█▇▇▁▁█▁▁▇▇█▇▁▁▁▁▁▇█▁▁▁▁▁▁▇▁▇▁▁▁▁▁▁▁▁▇▁▇▁▁▁▁▁▁█▁▇▁▁▁▁▇▁▇ ▁
  161 ms           Histogram: frequency by time          166 ms <

 Memory estimate: 10.75 MiB, allocs estimate: 9.

I’m not sure if I’m doing this right but it seems like there are significant memory efficiencies but at a meaningful speed cost.

RobertGregg · August 25, 2022, 10:26pm

For sure! It’s definitely a little perplexing at first. The function mod1(x,n) maps the value x to be between 1 and n. If x>n then it wraps around back to 1. Likewise, if x<1 then x wraps around and is mapped to n.

A CartesianIndex is basically just a Tuple of indices, so the when I overloaded the mod1 function I told it to map all the indices to be between 1 and n where n is the relevant array dimension. So using the matrix size of (2^20, 20) from your example:

julia> Δy = CartesianIndex(0,1)
CartesianIndex(0, 1)

julia> i = CartesianIndex(1,20)
CartesianIndex(1, 20)

julia> i + Δy
CartesianIndex(1, 21) #oops we only have 20 columns

julia> mod1(i + Δy, (2^20,20)...) #wrap the 21 back to the 1st column
CartesianIndex(1, 1)

A few other notes:

When benchmarking with global variables you’ll want to use a $ (see the readme in BenchmarkTools.)
Are neighbors3() and countDifferentNeighbors() giving you the same result? I think neighbors3() requires the ghost columns and countDifferentNeighbors() does not, but I may be wrong. If that’s the case, the benchmark might be comparing different things.
I didn’t spend to much time optimizing, so you can probably get more speed out of countDifferentNeighbors() if you add @inbounds or make B a vector.

Hope this helps!

rocco_sprmnt21 · August 26, 2022, 9:36am

I expose some measures (which I hope I have done correctly) waiting for the functioning of the function neighbors3(...) to be clarified, I compare only the following 3

julia> A=rand([0,1], 2^20, 20);
julia> neighbors3c(CircularArray(A))==neighbors3(A)
false

ulia> function countDifferentNeighbors(A)

           Δy = CartesianIndex(0,1)
           gridSize = size(A)
           B = falses(gridSize)

           for i in CartesianIndices(A)

               left = mod1(i-Δy, gridSize...)
               right = mod1(i+Δy, gridSize...)

               if A[left] ≠ A[i] ≠ A[right]
                   B[i] = true


           end

           return sum(B, dims=2)
       end
countDifferentNeighbors (generic function with 1 method)     

julia> 
function cdn(A)
    rows, cols = size(A)
    B = falses(rows, cols)
    for c in axes(A,2)
        left = ifelse(c==1,cols, c-1)
        right = ifelse(c==cols, 1, c+1)
        for r in axes(A,1) 
            B[r,c] = A[r,left] == A[r,right] ≠ A[r,c]
        end
    end
    return sum(B, dims=2)
end
cdn (generic function with 1 method)

julia> @views function neighbors3c(C)
           B = Array{Bool}(undef, size(C)...)
           for j in 1:size(C,2)
               jp1 = j < size(C, 2) ? j+1 : 1
               jm1 = j > 1 ? j-1 : size(C,2)               
           @. B[:,j] = (C[:,jm1] != C[:,j]) & (C[:,j] != C[:,jp1])
           end
           return sum(B, dims=2)
       end
neighbors3c (generic function with 1 method)

julia> using CircularArrays, BenchmarkTools

julia> @btime countDifferentNeighbors(A);
  390.397 ms (9 allocations: 10.50 MiB)

julia> @btime cdn(A);
  126.136 ms (9 allocations: 10.50 MiB)

julia> @btime neighbors3c(CircularArray(A));
  247.197 ms (9 allocations: 28.00 MiB)

regarding the use of the mod1() function (*) I too have been curious to understand how it works, I used the help and I found a too formal explanation which would not hurt if one added that in essence mod1 (x, y) = mod (x, y) == 0? y: mod (x, y)

(*) I also thank @RobertGregg for having “exposed” it to our attention, as well as for its elegant solution to the problem that I was closely inspired by to propose a modification.
just an annotation instead of left =/= middle =/= right maybe it’s better left == right =/= middle, (even if in the case of only two values they are equivalent).

stevengj · August 26, 2022, 12:57pm

You can use neighbors3c if you want the cyclic behavior. However, on my machine it is close to the same speed as neighbors3 (not surprising since the inner loop is identical).

stevengj · August 26, 2022, 1:07pm

And this version (which also has cyclic boundaries) gets rid of the matrix allocation in B (it only allocates the final counts array), and is faster than neighbors3c on my machine:

function neighbors4c(C::AbstractMatrix)
    firstindex(C,1) == firstindex(C,2) == 1 || throw(ArgumentError("1-based array required"))
    c = zeros(Int, size(C,1))
    for j in 1:size(C,2)
        jp1 = j < size(C, 2) ? j+1 : 1
        jm1 = j > 1 ? j-1 : size(C,2)
        for i = 1:size(C,1)
            @inbounds c[i] += (C[i,jm1] != C[i,j]) & (C[i,j] != C[i,jp1])
        end
    end 
    return c
end

(The fastest thing in Julia is usually ultimately to write your own inner loops, if you know how to optimize them. You might be able to do even better by exploiting SIMD operations, e.g. with LoopVectorization.jl.)

using BenchmarkTools
C = rand(Bool, 1000, 200);
@btime neighbors3c($C);
@btime neighbors4c($C);
@btime countDifferentNeighbors($C);

gives

  54.754 μs (7 allocations: 203.44 KiB)
  48.908 μs (1 allocation: 7.94 KiB)
  5.491 ms (8 allocations: 32.55 KiB)

stevengj · August 26, 2022, 1:12pm

Note that @RobertGregg’s countDifferentNeighbors is paying a big performance price for calling mod1 in the innermost loop (and calling it on all of the indices), whereas my neighbors3c and neighbors4c do not (because the cyclic behavior is only for the column indices, in the outer loop).

countDifferentNeighbors also uses a BitMatrix for B, which is slower (but uses 8x less memory) than a Matrix{Bool} (but the memory savings are even greater for neighbors4c, which doesn’t allocate B at all).

stevengj · August 26, 2022, 2:04pm

Just for fun, here is an even faster version that unrolls the column loop 4x to combine some of the column operations:

function neighbors5c(C::AbstractMatrix)
    firstindex(C,1) == firstindex(C,2) == 1 || throw(ArgumentError("1-based array required"))
    c = zeros(Int, size(C,1))
    local j::Int
    # unroll column loop by 4 to combine column comparisons
    for outer j in 1:4:size(C,2)-3
        jm1 = j > 1 ? j-1 : size(C,2)
        jp1 = j < size(C, 2) ? j+1 : 1
        jp2 = j+2 ≤ size(C, 2) ? j+2 : j+2 - size(C, 2)
        jp3 = j+3 ≤ size(C, 2) ? j+3 : j+3 - size(C, 2)
        jp4 = j+4 ≤ size(C, 2) ? j+4 : j+4 - size(C, 2)
        @inbounds for i = 1:size(C,1)
            c1 = C[i,j] != C[i,jp1]
            c2 = C[i,jp1] != C[i,jp2]
            c3 = C[i,jp2] != C[i,jp3]
            c4 = C[i,jp3] != C[i,jp4]
            c[i] += ((C[i,jm1] != C[i,j]) & c1) + (c1&c2) + (c2&c3) + (c3&c4)            
        end
    end 
    for j in j+4:size(C,2)
        jm1 = j > 1 ? j-1 : size(C,2)
        jp1 = j < size(C, 2) ? j+1 : 1
        @inbounds for i = 1:size(C,1)
            c[i] += (C[i,jm1] != C[i,j]) & (C[i,j] != C[i,jp1])
        end
    end 
    return c
end

However, much as optimizing these kinds of things is a fun exercise, I should caution you that:

Optimizations (like unrolling above) that make the code less readable are only worth it if benchmarking shows that it’s really a critical operation in your code.
Even if it is a performance-critical operation, you are often better served by not considering the code in isolation. Can you use a better data structure? Can you combine this code with preceding or subsequent operations on the data so that you aren’t doing individual small “vectorized” operations?

RobertGregg · August 26, 2022, 5:11pm

@stevengj makes some great points. It can be fun to optimize small code snippets into oblivion but its also good to take a step back and think whether or not you need every bit of performance at the cost of readability. For example, this version is very readable:

function neighbors6c(A)
    
    center = 1:size(A,2)
    left = circshift(center, 1)
    right = circshift(center, -1)


    return @views sum(A[:,left] .≠ A[:,center] .≠ A[:,right], dims=2)
end

But it can’t compete with the unrolled loops:

julia> A = rand([0.0,1.0],2^20,20);

julia> @btime neighbors5c($A);
  6.237 ms (2 allocations: 8.00 MiB)

julia> @btime neighbors6c($A);
  46.092 ms (16 allocations: 13.01 MiB)

So I guess (1) sorry for taking over your post to discuss the philosophy behind code optimization (2) I hope whatever you’re doing runs much faster

rocco_sprmnt21 · August 28, 2022, 1:57pm

I made some changes to the cdn() function, following @stevengj 's remarks on using a vector, rather than a matrix, to account for good triples.
I also made a version that does not have to check for each triplet if it is at the beginning or at the end of the row.
I would have expected for arrays with many columns a marked improvement of cdnznoif() over cdnz(). But it doesn’t seem like that.

julia> A=rand([0.0,1.0], 10^3, 10^4);

julia> function neighbors5c(C::AbstractMatrix)
           firstindex(C,1) == firstindex(C,2) == 1 || throw(ArgumentError("1-based array required"))
           c = zeros(Int, size(C,1))
           local j::Int
           # unroll column loop by 4 to combine column comparisons
           for outer j in 1:4:size(C,2)-3
               jm1 = j > 1 ? j-1 : size(C,2)
               jp1 = j < size(C, 2) ? j+1 : 1
               jp2 = j+2 ≤ size(C, 2) ? j+2 : j+2 - size(C, 2)
               jp3 = j+3 ≤ size(C, 2) ? j+3 : j+3 - size(C, 2)
               jp4 = j+4 ≤ size(C, 2) ? j+4 : j+4 - size(C, 2)
               @inbounds for i = 1:size(C,1)
                   c1 = C[i,j] != C[i,jp1]
                   c2 = C[i,jp1] != C[i,jp2]
                   c3 = C[i,jp2] != C[i,jp3]
                   c4 = C[i,jp3] != C[i,jp4]
                   c[i] += ((C[i,jm1] != C[i,j]) & c1) + (c1&c2) + (c2&c3) + (c3&c4)   

               end
           end 
           for j in j+4:size(C,2)
               jm1 = j > 1 ? j-1 : size(C,2)
               jp1 = j < size(C, 2) ? j+1 : 1
               @inbounds for i = 1:size(C,1)
                   c[i] += (C[i,jm1] != C[i,j]) & (C[i,j] != C[i,jp1])
               end
           end 
           return c
       end
neighbors5c (generic function with 1 method)

julia> function neighbors6c(A)

           center = 1:size(A,2)
           left = circshift(center, 1)
           right = circshift(center, -1)
       
           return @views sum(A[:,left] .≠ A[:,center] .≠ A[:,right], dims=2)
       end
neighbors6c (generic function with 1 method)

julia> function cdn(A)
           rows, cols = size(A)
           B = falses(rows, cols)
           for c in axes(A,2)
               left = ifelse(c==1,cols, c-1)
               right = ifelse(c==cols, 1, c+1)
               for r in axes(A,1)
                   B[r,c] = A[r,left] == A[r,right] ≠ A[r,c]
               end
           end
           return sum(B, dims=2)
       end
cdn (generic function with 1 method)

julia> function cdnz(A)
           rows, cols = size(A)
           B = zeros(Int, size(A,1))
           for c in 1:cols
               left = ifelse(c==1,cols, c-1)
               right = ifelse(c==cols, 1, c+1)
               for r in 1:rows
       @inbounds      B[r] += A[r,left] == A[r,right] ≠ A[r,c]
               end
           end
           return B
       end
cdnz (generic function with 1 method)

julia> function cdnznoif(A)
           rows, cols = size(A)
           B = zeros(Int, size(A,1))
           for r in axes(A,1)
       @inbounds        B[r] += A[r,cols] == A[r,2] ≠ A[r,1]
           end
           for c in 2:cols-1
               for r in axes(A,1)
       @inbounds        B[r] += A[r,c-1] == A[r,c+1] ≠ A[r,c]
               end
           end
           for r in axes(A,1)
       @inbounds        B[r] += A[r,cols-1] == A[r,1] ≠ A[r,cols]
           end
           return B
       end
cdnznoif (generic function with 1 method)

julia> using CircularArrays, BenchmarkTools

julia> neighbors6c(CircularArray(A))==cdn(A)
true

julia> neighbors5c(A)==cdn(A)[:]
true

julia> cdnz(A)==cdn(A)[:]
true

julia> cdnznoif(A)==cdn(A)[:]
true

julia> @btime neighbors6c(CircularArray(A));
  168.290 ms (18 allocations: 2.55 MiB)

julia> @btime neighbors5c(A);
  3.817 ms (1 allocation: 7.94 KiB)

julia> @btime cdn(A);
  61.634 ms (8 allocations: 1.20 MiB)

julia> @btime cdnz(A);
  4.764 ms (1 allocation: 7.94 KiB)

julia> @btime cdnznoif(A);
  4.762 ms (1 allocation: 7.94 KiB)

stevengj · August 28, 2022, 4:25pm

The if statement in the column loop is not expensive because it is only in the outer loop, not the inner loop. That is, for an m \times n matrix, the if statement is only executed O(n) times, whereas the inner-loop body is executed O(mn) times, so the latter dominates the runtime when m \gg 1.

rocco_sprmnt21 · August 28, 2022, 5:44pm

It seems that even in the case of not very large n, expressions with ifelse do not weigh much.
I wonder if the compiler is not able to transform, in some way, the code of cdnz() into that of cdnznoif().

julia> A=rand([0.0,1.0], 10, 10^6);

julia> @btime cdnz(A);
  45.234 ms (1 allocation: 144 bytes)

julia> @btime cdnznoif(A);
  45.123 ms (1 allocation: 144 bytes)



julia> A=rand([0.0,1.0], 1, 10^6);

julia> @btime cdnznoif(A);
  6.456 ms (1 allocation: 64 bytes)

julia> @btime cdnz(A);
  6.253 ms (1 allocation: 64 bytes)

phantom · August 30, 2022, 12:07am

I feel like I’ve just gone through some serious Jedi mind training so I just want to thank all the Yoda’s for providing such an amazing education. Not only the code itself but the philosophy behind it is really invaluable to someone who doesn’t know what they are doing…yet. So thank you! Just in case anyone’s understanding requires as much assistance as I did, I’m posting a walkthrough of the fastest solution (@stevengj 's neighbors5c) and the simplest solution (@RobertGregg 's neighbors6c). (Sorry it’s taking me so long to work through the code, I just saw @rocco_sprmnt21 solution and I still have to work through that one.) My intention is just to provide a reference for any novice still working through the Julia manual, but if I end up inadvertently lowering the collective Julia IQ with this post please let me know and I’ll remove it.

fast solution @stevengj 's neighbors5c

As a reminder C is a f x r array where size(C,1) is the number of permutations possible given size(C,2) entries. E.g. given 6 entries size(C,2) == 6, size(C,1) == 2^6. For benchmarking purposes C = 2^20x 20

@benchmark neighbors5c(C)
BenchmarkTools.Trial: 747 samples with 1 evaluation.
 Range (min … max):  5.863 ms … 15.767 ms  ┊ GC (min … max): 0.00% … 59.38%
 Time  (median):     6.413 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.685 ms ±  1.347 ms  ┊ GC (mean ± σ):  3.99% ±  9.96%

     █▇▃                                                      
  ▄▁▇███▄▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▅▄▅▅▁▁▅▄▄▅▁▅▅▁▅ ▇
  5.86 ms      Histogram: log(frequency) by time       14 ms <

 Memory estimate: 8.00 MiB, allocs estimate: 2.

function neighbors5c(C::AbstractMatrix)
    firstindex(C,1) == firstindex(C,2) == 1 || throw(ArgumentError("1-based array required"))
    # I may be wrong here but I think this is to check that C is a two dimensional array with 
    # 1 based indexing. 
    c = zeros(Int, size(C,1))
    # This creates an array of zeros to tally the number of entries with different neighbors in 
    # each permutation.  The values in the array will be of type Int.  
    # The length of the array is the number of possible permutations given n # of neighbors, i.e. size(C,1).     
    local j::Int
    # local j declares a new local variable in this scope, regardless of whether 
    # there is already a variable named j in an outer scope. `::` defines j to be type Int.    
    for outer j in 1:4:size(C,2)-3
    # unroll column loop by 4 to combine column comparisons
    # generally in julia you can only shadow an existing local variable 
    # (i.e. create a new variable with the same name as an existing variable)
    # by explicitly stating `local` as shown above.  
    # A for loop or comprehension iteration variable however is always a new variable. 
    # So even if  `j` were used again it would be a new variable rather than the `j` defined above.   
    # "for outer" re-use's the existing local variable j as the iteration variable.  
    # 1:4:size(C,2)-3 counts from 1:size(C,2)-3, in increments of 4.  Incidentally size(C,2) == size(C)[2]
    # Because ghost columns aren't being used, he defines who is considered a neighbor in the following manner      
        jm1 = j > 1 ? j-1 : size(C,2)  
        # a ? b:c is shorthand for if a, evaluabe b otherwise c i.e. 
        # if j > 1 then jm1 = the neighbor on your left, if you are the first element in the array 
        # then j would ==1, and the neighbor to your left is the last element in the array.  
        jp1 = j < size(C, 2) ? j+1 : 1
        # if j < number of entries, then the neighbor to the right = j + 1, for the last element in the array the neighbor to the right  = the first element. 
        jp2 = j+2 ≤ size(C, 2) ? j+2 : j+2 - size(C, 2)
        # likewise here if j+2 < = number of entries, then two players to the right = j+2, 
        # otherwise say you are n - 1,  then 2 players to the right of you is = 1 == j + 2 - n   
        jp3 = j+3 ≤ size(C, 2) ? j+3 : j+3 - size(C, 2)
        jp4 = j+4 ≤ size(C, 2) ? j+4 : j+4 - size(C, 2)
        # @inbounds eliminates bounds checking to improve performance.  
        @inbounds for i = 1:size(C,1)
            c1 = C[i,j] != C[i,jp1]   # if j != to right neighbor, c1 == 1
            c2 = C[i,jp1] != C[i,jp2] # if j's right neighbor != to his right neighbor, c2 == 1 
            c3 = C[i,jp2] != C[i,jp3]
            c4 = C[i,jp3] != C[i,jp4]
            c[i] += ((C[i,jm1] != C[i,j]) & c1) + (c1&c2) + (c2&c3) + (c3&c4) 
            # Here we tally all the times we have differing neighbors into the array we created above.
            # `(C[i,jm1] != C[i,j])` will yield a 1 for true and 0 for false the `&` here means bitwise &.  
            # A bitwise & B converts A and B to binary numbers and compare the bits of each number to its counterpart
            # when inputs A and B are true, output is true; otherwise output is false
            # A   B   A & B 
            # -------------
            # 0 | 0 | 0
            # 0 | 1 | 0
            # 1 | 0 | 0
            # 1 | 1 | 0
            #  now suppose you have an array of 6 values.  c1 == 1 != 2, c2 == 2 != 3, c3 ==  3!=4, c4 == c4 !=5 
            # `(C[i,jm1] != C[i,j]& c1)` checks if the 1st element is different from the 2nd and last element i.e. 6 != 1 & 1 !=2 
            #  likewise (c1&c2) checks 2 for different neighbors, (c2&c3) checks 3 for differing neighbors, (c3&c4) checks 4 for differing neighbors   
        end
    end 
    # in the above example where size(C,2) == 6 the above loop would check values 1:4 for differing neighbors. So we still 
    # have to check the remaining players in the same manner as above. 
    for j in j+4:size(C,2)
        jm1 = j > 1 ? j-1 : size(C,2)
        jp1 = j < size(C, 2) ? j+1 : 1
        @inbounds for i = 1:size(C,1)
            c[i] += (C[i,jm1] != C[i,j]) & (C[i,j] != C[i,jp1])
        end
    end 
    return c
end

simple solution @RobertGregg 's neighbors6c. This one is very cool because it accomplishes all of the above in only three lines of code but it takes a bit longer for large n.

@benchmark neighbors6c(E)
BenchmarkTools.Trial: 110 samples with 1 evaluation.
 Range (min … max):  43.807 ms … 57.498 ms  ┊ GC (min … max): 0.00% … 13.07%
 Time  (median):     44.483 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   45.474 ms ±  3.067 ms  ┊ GC (mean ± σ):  1.11% ±  3.62%

  ▁▄█▅▄                                                        
  ██████▁▇▆▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▅▁▁▁▅▅▁▁▁▅▁▁▁▁▁▁▅▁▁▅▅▁▅ ▅
  43.8 ms      Histogram: log(frequency) by time      57.3 ms <

 Memory estimate: 13.01 MiB, allocs estimate: 16.

function neighbors6c(A)
    center = 1:size(A,2)
    # create an array counting the number of entries 
    left = circshift(center, 1)
    # rotate the array 1 to the left 
    right = circshift(center, -1)
    # rotate the array 1 to the right 
    return @views sum(A[:,left] .≠ A[:,center] .≠ A[:,right], dims=2)
    # since the indexes are shifted this compares A[1:1] !=A[1,2] != A[1,size(A,2)]
    # the `.` proceeding the `!=` indicated an element wise operator 
    # @stevengj has a really great post about @views here https://discourse.julialang.org/t/could-you-explain-what-are-views/17535/2
end

Topic		Replies	Views
Optimal way of checking neighboring elements in matrix New to Julia indexing	3	1802	October 28, 2019
Find neighbored segments/labels in Matrix General Usage images , array	7	974	July 6, 2021
How to efficiently find columns of the matrix which are the same? New to Julia question , optimization	10	1113	November 7, 2023
Fastest way possible to find index of value equals 1 across a matrix column Performance question , performance , speed-optimization	7	984	October 6, 2023
Simple method for getting indices of a sub array along top, bottom or side of larger matrix A? General Usage	2	237	March 18, 2022

Most effective way to check if neighboring values in a matrix are the same

Related topics