I’m trying to compare two numbers to check whether they have identical digits in a different order. I extract the digits and sort them, then compare the two resulting vectors for equality. This works fine, but I feel there is a more efficient way. Any ideas? Thanks.
function isPerm(n, phi)
    n_digits = sort(digits(n))
    phi_digits = sort(digits(phi))
    return n_digits == phi_digits
end
x = rand(9485221:9485229,1000);
y = rand(9485221:9485229,1000);
using BenchmarkTools
@benchmark isPerm.($x,$y)
BenchmarkTools.Trial:
memory estimate: 566.92 KiB
allocs estimate: 4003
--------------
minimum time: 320.700 μs (0.00% GC)
median time: 327.600 μs (0.00% GC)
mean time: 337.926 μs (1.59% GC)
maximum time: 1.079 ms (0.00% GC)
--------------
samples: 10000
evals/sample: 1
Algorithmically, this should be fine; perhaps you can bail out early if the numbers have a different number of digits, because then the condition cannot hold. This in turn can be checked with e.g. ndigits before calling digits. Benchmark below; YMMV depending on how often you expect this case.
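As a quick sketch, the ndigits early exit bolted onto the original sorting approach could look like this (isPermEarly is a hypothetical name; the isPerm2 shown further down takes a different, divrem-based approach):

```julia
# Hypothetical sketch: bail out early when the digit counts differ,
# before paying for the allocations in digits() and sort().
function isPermEarly(n, phi)
    ndigits(n) == ndigits(phi) || return false
    return sort(digits(n)) == sort(digits(phi))
end
```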
julia> x = rand(9485221:9485229,1000);
julia> y = rand(9485221:9485229,1000);
julia> @benchmark isPerm2.($x,$y)
BenchmarkTools.Trial:
memory estimate: 160.67 KiB
allocs estimate: 1003
--------------
minimum time: 62.100 μs (0.00% GC)
median time: 63.400 μs (0.00% GC)
mean time: 76.013 μs (3.00% GC)
maximum time: 1.122 ms (87.89% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark isPerm_old.($x,$y)
BenchmarkTools.Trial:
memory estimate: 566.92 KiB
allocs estimate: 4003
--------------
minimum time: 303.400 μs (0.00% GC)
median time: 310.600 μs (0.00% GC)
mean time: 337.907 μs (2.39% GC)
maximum time: 1.221 ms (0.00% GC)
--------------
samples: 10000
evals/sample: 1
Doesn’t look too much better, right? But don’t worry, that’s because your initial numbers are small:
julia> x = rand(94852219876975:9485229765865123,1000);
julia> y = rand(94852219876975:9485229765865123,1000);
julia> @benchmark isPerm2.($x,$y)
BenchmarkTools.Trial:
memory estimate: 160.67 KiB
allocs estimate: 1003
--------------
minimum time: 81.700 μs (0.00% GC)
median time: 83.000 μs (0.00% GC)
mean time: 94.922 μs (2.26% GC)
maximum time: 878.900 μs (86.01% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark isPerm_old.($x,$y)
BenchmarkTools.Trial:
memory estimate: 819.28 KiB
allocs estimate: 4020
--------------
minimum time: 807.300 μs (0.00% GC)
median time: 814.300 μs (0.00% GC)
mean time: 857.359 μs (1.36% GC)
maximum time: 2.376 ms (41.96% GC)
--------------
samples: 5811
evals/sample: 1
Here’s the code:
function isPerm2(a, b)
    test = fill(0, 10)
    while a > 0 && b > 0
        a, a_ind = divrem(a, 10)
        b, b_ind = divrem(b, 10)
        test[a_ind + 1] += 1
        test[b_ind + 1] -= 1
    end
    # note the parentheses: `&&` binds tighter than `||`, so without them
    # this would parse as `a != 0 || (b != 0 && return false)`
    (a != 0 || b != 0) && return false
    return !any(!=(0), test)
end
Now if only the allocation of the test array was elided…
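One way to elide the array entirely (my own sketch, not code from the thread; isPerm3 is a hypothetical name) is to pack the ten per-digit counts into a single UInt64: an Int64 has at most 19 decimal digits, so every count fits in a 6-bit field, and 10 fields × 6 bits = 60 bits fit in one machine word.

```julia
# Sketch of an allocation-free variant: pack per-digit counts into a UInt64.
# Assumes nonnegative inputs, as the thread states.
function isPerm3(a, b)
    ndigits(a) == ndigits(b) || return false
    ca = zero(UInt64)
    cb = zero(UInt64)
    while a > 0
        a, d = divrem(a, 10)
        ca += UInt64(1) << (6d)   # 6-bit field per digit value; counts stay < 64
    end
    while b > 0
        b, d = divrem(b, 10)
        cb += UInt64(1) << (6d)
    end
    # since no field can overflow into its neighbor, equal packed words
    # mean equal digit multisets
    return ca == cb
end
```

Counting each number into its own word avoids the negative-count problem that bit-packing would otherwise have with the increment/decrement scheme above.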
I just noticed, this doesn’t take negative numbers into account
Actually, checking the lengths early made no noticeable difference; I guess ndigits is not very cheap? I’m thinking of an ad hoc algorithm that checks this equality with a loop and can exit early at the first difference. This function is called repeatedly in a hot loop.
That’s hard to do! The first difference isn’t enough, since that might be fixed later on. You could check the remaining number of digits against the remaining number of inconsistencies though.
Thanks, that is definitely significantly faster than the original function. And the numbers are guaranteed to be positive integers, so negatives are not a problem. The only thing I don’t like is the test array with its fixed length of 10.
No, that was deliberate: in theory, any can bail early, whereas all has to check… well, all elements. The array is only 10 elements long, so it shouldn’t matter too much, but I did see a smaller average maximum time compared to the same code with all. We don’t care whether more than one digit caused a mismatch; we just care whether any mismatch occurred.
But yes, the result of !any(!=(0), test) is the same as that of all(==(0), test).
It’s the maximum time: 227 µs vs. 158 µs. In the latest version with @inbounds, I think that’s due to the extra ! the any version has to do.
For the versions before that, I don’t have better statistics myself; it’s just what I saw across repeated runs, and I don’t have an explanation for it either.