Compare permutaion of two numbers efficiently?

I’m trying to compare two numbers to check if they have identical digits but in different orders. I extracted the digits and sorted them, then I compare the two extracted vectors to check equality. This works fine, but I feel there is a more efficient way, any ideas? Thanks.

function isPerm(n,phi)
    n_digits = sort(digits(n))
    phi_digits = sort(digits(phi))
    n_digits == phi_digits
end

x = rand(9485221:9485229,1000);
y = rand(9485221:9485229,1000);

using BenchmarkTools

@benchmark isPerm.($x,$y)
BenchmarkTools.Trial:
  memory estimate:  566.92 KiB
  allocs estimate:  4003
  --------------
  minimum time:     320.700 μs (0.00% GC)
  median time:      327.600 μs (0.00% GC)
  mean time:        337.926 μs (1.59% GC)
  maximum time:     1.079 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Algorithmically, this should be fine; perhaps you can bail early if the numbers have a different number of digits because then the condition cannot hold. This in turn can be checked with eg ndigits before calling digits. Benchmark, YMMV depending on how of then you expect this.

julia> x = rand(9485221:9485229,1000);

julia> y = rand(9485221:9485229,1000);

julia> @benchmark isPerm2.($x,$y)
BenchmarkTools.Trial:
  memory estimate:  160.67 KiB
  allocs estimate:  1003
  --------------
  minimum time:     62.100 μs (0.00% GC)
  median time:      63.400 μs (0.00% GC)
  mean time:        76.013 μs (3.00% GC)
  maximum time:     1.122 ms (87.89% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark isPerm_old.($x,$y)
BenchmarkTools.Trial:
  memory estimate:  566.92 KiB
  allocs estimate:  4003
  --------------
  minimum time:     303.400 μs (0.00% GC)
  median time:      310.600 μs (0.00% GC)
  mean time:        337.907 μs (2.39% GC)
  maximum time:     1.221 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Doesn’t look too much better, right? But don’t worry, that’s because your initial numbers are small:

julia> x = rand(94852219876975:9485229765865123,1000);

julia> y = rand(94852219876975:9485229765865123,1000);

julia> @benchmark isPerm2.($x,$y)
BenchmarkTools.Trial:
  memory estimate:  160.67 KiB
  allocs estimate:  1003
  --------------
  minimum time:     81.700 μs (0.00% GC)
  median time:      83.000 μs (0.00% GC)
  mean time:        94.922 μs (2.26% GC)
  maximum time:     878.900 μs (86.01% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark isPerm_old.($x,$y)
BenchmarkTools.Trial:
  memory estimate:  819.28 KiB
  allocs estimate:  4020
  --------------
  minimum time:     807.300 μs (0.00% GC)
  median time:      814.300 μs (0.00% GC)
  mean time:        857.359 μs (1.36% GC)
  maximum time:     2.376 ms (41.96% GC)
  --------------
  samples:          5811
  evals/sample:     1

Here’s the code:

function isPerm2(a,b)
    test = fill(0, 10)
    while a > 0 && b > 0
        a, a_ind = divrem(a, 10)
        b, b_ind = divrem(b, 10)
        test[a_ind + 1] += 1
        test[b_ind + 1] -= 1
    end
    a != 0 || b != 0 && return false
    !any(!=(0), test)
end

Now if only the allocation of the test array was elided…

I just noticed, this doesn’t take negative numbers into account :thinking:

2 Likes

Actually, checking the lengths early had no noticable difference, I guess ndigits is not very light?. I’m thinking of an adhoc algorithm to check this equality using a loop or something that can exit early with the first difference. This function is called repeatedly in a hot loop.

That’s hard to do! The first difference isn’t enough, since that might be fixed later on. You could check the remaining number of digits against the remaining number of inconsistencies though.

Thanks, that definitely is significantly faster than original function. And the numbers are guaranteed to be positive integers, so no problem for negatives. Only thing I don’t like is the test array with the fixed length of 10.

Hm, I’m not even sure the allocations come from the test array. Its statically sized and should be elided.

The size is 10 since there are 10 digits - the algorithm works by getting the current digit via modulo and indexing into the array.

If all your test numbers have the same number of digits (as they did in your benchmark), there’s no gain.

3 Likes

This was really hard to read for some reason. You mean

all(==(0), test)

right? (or all(iszero, test))

1 Like

No, that was deliberate - in theory, any can bail early, whereas all has to check… well all elements :smiley: The array is only 10 elements large, so it shouldn’t matter too much, but I did see smaller average maximum time compared to the same code with all. We don’t care if there’s more than one digit that caused a mismatch, we just care for any mismatch to occur.

But yes, the result of !any(!=(0), test) the same as with all(==(0), test).

Are you sure? all should bail immediately if it finds one false.

2 Likes

What happens if you use StaticArrays.MVector?

2 Likes

Yes, StaticArrays.MVector is 2X faster on my machine. Surprisingly, allocations are the same.

Well, it’s what I see on my machine :man_shrugging:

High Maximum with all
julia> function isPerm2(a,b)
           test = zeros(UInt, 10)
           while a > 0 && b > 0
               a, a_ind = divrem(a, 10)
               b, b_ind = divrem(b, 10)
               test[a_ind + 1] += one(eltype(test))
               test[b_ind + 1] -= one(eltype(test))
           end
           a != 0 || b != 0 && return false
           all(==(0), test)
       end
isPerm2 (generic function with 1 method)

julia> @benchmark isPerm2.($x,$y)
BenchmarkTools.Trial:
  memory estimate:  160.67 KiB
  allocs estimate:  1003
  --------------
  minimum time:     82.400 μs (0.00% GC)
  median time:      83.600 μs (0.00% GC)
  mean time:        96.823 μs (3.53% GC)
  maximum time:     2.159 ms (83.55% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark isPerm2.($x,$y)
BenchmarkTools.Trial:
  memory estimate:  160.67 KiB
  allocs estimate:  1003
  --------------
  minimum time:     82.500 μs (0.00% GC)
  median time:      83.900 μs (0.00% GC)
  mean time:        97.933 μs (3.68% GC)
  maximum time:     1.457 ms (90.87% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark isPerm2.($x,$y)
BenchmarkTools.Trial:
  memory estimate:  160.67 KiB
  allocs estimate:  1003
  --------------
  minimum time:     82.700 μs (0.00% GC)
  median time:      89.400 μs (0.00% GC)
  mean time:        117.816 μs (4.09% GC)
  maximum time:     4.955 ms (93.30% GC)
  --------------
  samples:          10000
  evals/sample:     1

Faster overall, but this comes at a cost of larger maximum time, significantly so:

julia> function isPerm2(a,b)
           test = zeros(MVector{10,UInt})
           while a > 0 && b > 0
               a, a_ind = divrem(a, 10)
               b, b_ind = divrem(b, 10)
               test[a_ind + 1] += one(eltype(test))
               test[b_ind + 1] -= one(eltype(test))
           end
           a != 0 || b != 0 && return false
           !any(!=(0), test)
       end
isPerm2 (generic function with 1 method)

julia> @benchmark isPerm2.($x,$y)
BenchmarkTools.Trial:
  memory estimate:  98.17 KiB
  allocs estimate:  1003
  --------------
  minimum time:     56.600 μs (0.00% GC)
  median time:      58.000 μs (0.00% GC)
  mean time:        69.160 μs (7.61% GC)
  maximum time:     2.781 ms (95.85% GC)
  --------------
  samples:          10000
  evals/sample:     1

@inbounds to the rescue!

Version with any
julia> function isPerm2(a,b)
           test = zeros(MVector{10,UInt})
           @inbounds while a > 0 && b > 0
               a, a_ind = divrem(a, 10)
               b, b_ind = divrem(b, 10)
               test[a_ind + 1] += one(eltype(test))
               test[b_ind + 1] -= one(eltype(test))
           end
           a != 0 || b != 0 && return false
           !any(!=(0), test)
       end
isPerm2 (generic function with 1 method)

julia> @benchmark isPerm2.($x,$y)
BenchmarkTools.Trial:
  memory estimate:  4.42 KiB
  allocs estimate:  3
  --------------
  minimum time:     52.900 μs (0.00% GC)
  median time:      53.200 μs (0.00% GC)
  mean time:        55.344 μs (0.00% GC)
  maximum time:     227.600 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

And now, the difference between the extra negation from any and no negation from all is important:

Version with all
julia> function isPerm2(a,b)
           test = zeros(MVector{10,UInt})
           @inbounds while a > 0 && b > 0
               a, a_ind = divrem(a, 10)
               b, b_ind = divrem(b, 10)
               test[a_ind + 1] += one(eltype(test))
               test[b_ind + 1] -= one(eltype(test))
           end
           a != 0 || b != 0 && return false
           all(==(0), test)
       end
isPerm2 (generic function with 1 method)

julia> @benchmark isPerm2.($x,$y)
BenchmarkTools.Trial:
  memory estimate:  4.42 KiB
  allocs estimate:  3
  --------------
  minimum time:     52.800 μs (0.00% GC)
  median time:      53.200 μs (0.00% GC)
  mean time:        54.569 μs (0.00% GC)
  maximum time:     158.900 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Notably, @inbounds doesn’t help if the array isn’t an MVector:

Regular zeros call
julia> function isPerm2(a,b)
           test = zeros(UInt, 10)
           @inbounds while a > 0 && b > 0
               a, a_ind = divrem(a, 10)
               b, b_ind = divrem(b, 10)
               test[a_ind + 1] += one(eltype(test))
               test[b_ind + 1] -= one(eltype(test))
           end
           a != 0 || b != 0 && return false
           all(==(0), test)
       end
isPerm2 (generic function with 1 method)

julia> @benchmark isPerm2.($x,$y)
BenchmarkTools.Trial:
  memory estimate:  160.67 KiB
  allocs estimate:  1003
  --------------
  minimum time:     78.500 μs (0.00% GC)
  median time:      79.800 μs (0.00% GC)
  mean time:        93.693 μs (4.20% GC)
  maximum time:     3.420 ms (93.06% GC)
  --------------
  samples:          10000
  evals/sample:     1

For the ultimate speed, switch from broadcasting to map to save 2 more allocations, leaving only the allocation of the result array:

Map for speed
julia> function isPerm2(a,b)
           test = zeros(MVector{10,UInt})
           @inbounds while a > 0 && b > 0
               a, a_ind = divrem(a, 10)
               b, b_ind = divrem(b, 10)
               test[a_ind + 1] += one(eltype(test))
               test[b_ind + 1] -= one(eltype(test))
           end
           a != 0 || b != 0 && return false
           all(==(0), test)
       end
isPerm2 (generic function with 1 method)

julia> @benchmark map(a -> isPerm2(a...), zip($x,$y))
BenchmarkTools.Trial:
  memory estimate:  1.06 KiB
  allocs estimate:  1
  --------------
  minimum time:     51.600 μs (0.00% GC)
  median time:      51.900 μs (0.00% GC)
  mean time:        54.417 μs (0.00% GC)
  maximum time:     142.000 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
1 Like

I don’t buy it. There’s something fishy going on. all shouldn’t be slower than any. They definitely have the same bailout behaviour.

Edit: I can’t see any difference between any and all in your benchmark.

It’s the maximum time - 227µs vs 158µs. In the latest version with @inbounds, I think that’s due to the extra ! the any version has to do.

In the versions before that, I don’t have better statistics myself :man_shrugging: It’s just what I saw across repeated runs, I don’t have an explanation for that either.

That’s just some random fluctuation from the OS or whatever, no? Anyway, in that case, it’s the all version that is faster.

For clarity, you can write the return statement as

return iszero(a) && iszero(b) && all(iszero, test)

or

return a == 0 && b == 0 && all(==(0), test)

if you prefer.

1 Like