Compare permutaion of two numbers efficiently?

There’s something else going on as well, something shouldn’t be cached here, I think:

julia> @benchmark map(a -> isPerm2(a...), zip($x,$y))
BenchmarkTools.Trial:
  memory estimate:  1.06 KiB
  allocs estimate:  1
  --------------
  minimum time:     51.200 μs (0.00% GC)
  median time:      51.400 μs (0.00% GC)
  mean time:        53.868 μs (0.00% GC)
  maximum time:     153.700 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark map(a -> isPerm2(a...), zip(g,h)) setup=(g = rand(94852219876975:9485229765865123,1000); h = rand(94852219876975:9485229765865123,1000))
BenchmarkTools.Trial:
  memory estimate:  1.06 KiB
  allocs estimate:  1
  --------------
  minimum time:     51.400 μs (0.00% GC)
  median time:      51.800 μs (0.00% GC)
  mean time:        53.125 μs (0.27% GC)
  maximum time:     1.550 ms (93.81% GC)
  --------------
  samples:          10000
  evals/sample:     1

Why are you using UInt as eltype in the test vector?

That is an elegant solution.
It seems to be a count sort in disguise.
It should be possible to add an additional entry for the sign, if you care about negative numbers.

There’s no special reason, other than me writing it that way. Both UInt and Int are equally fast. Conceptually, having a negative number of a “wrong” digit could indicate which of the two input numbers has more of that digit, but since test isn’t returned either way, it didn’t matter to me.

One thing I did deliberately was one(eltype(test)), to make sure whatever we add is of the same type as the place we add it onto.

1 Like

This is the fastest I got so far, pretty impressive! Thank you all for the useful discussions.

function isPerm2(a,b)
   test = @MVector fill(0, 10)
   while a > 0 && b > 0
       a, a_ind = divrem(a, 10)
       b, b_ind = divrem(b, 10)
       test[a_ind + 1] += 1
       test[b_ind + 1] -= 1
   end
   a == 0 && b == 0 && all(iszero, test)
end

@benchmark isPerm2.($x,$y)
BenchmarkTools.Trial:
  memory estimate:  4.42 KiB
  allocs estimate:  3
  --------------
  minimum time:     25.399 μs (0.00% GC)
  median time:      25.701 μs (0.00% GC)
  mean time:        27.687 μs (0.00% GC)
  maximum time:     195.100 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
1 Like

Comparing with your initial post, that’s a 12x speedup in both the minimum and mean time and a 5.5x speedup in the maximum time - that’s pretty neat :smiley:

1 Like

You didn’t use @inbounds. On my computer it makes a difference, but apparently not on yours?

Oh, you’re right! Maybe it’s the version?

julia> versioninfo()
Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 4

depending on the size of your vectors and the size of the numbers it may be worth preallocating test

this does not reflect the updates above though


function isperm3(test,a,b)
    fill!(test,0)
    while a > 0 && b > 0
        a, a_ind = divrem(a, 10)
        b, b_ind = divrem(b, 10)
        test[a_ind + 1] += 1
        test[b_ind + 1] -= 1
    end
    a != 0 || b != 0 && return false
    !any(!=(0), test)
end

function isperm3(a::Vector,b::Vector)
    sz=length(a)
    res = falses(sz)
    test = fill(0, 10)
    @inbounds for i=1:sz
        res[i] = _isperm(test,a[i],b[i]) 
    end
    return res
end

@btime isPerm2.($x,$y); # 103 micro seconds
@btime isperm3($x,$y) #70 micro seconds

Yes, of course, I prefer the --check-bounds=no flag since I tend to forget annotating my code @inbounds and it helps reduce code clutter.

Does that completely disable bounds checks in all cases? That sounds pretty risky. I thought @inbounds was for cases where you are absolutely positive that you are guaranteed to not go out of bounds.

1 Like

I usually run two different instances of the REPL during development. One with almost all optimizations turned off and a nother with full optimizations. These are called from a .bat file. In 99% of the time, I notice no need for bounds-checking. This way, I free my mind from the burden of annotating code and focus on the algorithm. I also don’t use @simd as I add --inline=yes in the production version. Not only this, I also used to turn on fast-math in final versions without problems other than known issues like sinpi, cospi. I was sad when they lately decided to disable the --math-mode=fast flag, why then there is an /Ofast in C and Fortran? Sometimes one wants to squeeze that last drop of performance out of their code.

It may be worth terminating early if the number of digits do not match (no need to perform any divisions). For the median time this is a bit faster on my machine.
Of course, depending on your input / use case, this may not be meaningful


function isPerm2b(a,b)
    ndigits(a) == ndigits(b) || return false
    test = @MVector fill(0, 10)
    while a > 0 && b > 0
        a, a_ind = divrem(a, 10)
        b, b_ind = divrem(b, 10)
        test[a_ind + 1] += 1
        test[b_ind + 1] -= 1
    end
    a == 0 && b == 0 && all(iszero, test)
end