Any idea on how to improve the following code for computing the permanent of a matrix?

I used the Glynn formula where delta is obtained via Gray code.

```
# from https://stackoverflow.com/questions/34236332/c-code-for-generating-next-bit-to-flip-in-a-gray-code
function grayBitToFlip(n::Int)
n1 = (n-1) ⊻ ((n-1)>>1)
n2 = n ⊻ (n>>1)
d = n1 ⊻ n2
j = 0
while d>0
d >>= 1
j += 1
end
j
end
function perm(A::AbstractMatrix{T}) where T
n,m = size(A)
if (n == m)
D = Int8(1)
δ = [Int8(2) for i ∈ 1:n]
v = sum(A,2)
p = prod(v)
@inbounds for i ∈ 1:(2^(n-1)-1)
a = grayBitToFlip(i)+1
@views v .-= δ[a].*A[:,a]
δ[a] = -δ[a]
D = -D
p += D*prod(v)
end
p * 2.0^(1-n)
else
throw(ArgumentError("perm: argument must be a square matrix"))
end
end
```

A few details of the code. First of all, technically I compute perm(transpose(A)), which is equal to perm(A), but expressions are faster because of column-major ordering. D is the product of deltas. Because of the Gray code, at each step D changes its sign. The vector v is the sum of delta*A. The above formula comes from calculating how v changes after a step i, and including the resulting overall factor 2 into the definition of delta.