How to compute gradient with respect to a circuit parameter?

Hi everyone!

I’m trying to compute the gradient with respect to one particular circuit parameter, and I would like to do it efficiently. I know there are several efficient methods for this in Yao, e.g. faithful_grad() and expect’(). However, as far as I understand, these compute the gradient with respect to all circuit parameters, and I’m only interested in a single one. At the moment, I’m doing it as follows:

const SINGLE_GATES = [Rx, Ry, Rz]             # pool of single-qubit rotations to sample from
@const_gate ZZ = mat(kron(Z, Z))
rng = MersenneTwister()
randG = Random.Sampler(rng, SINGLE_GATES)     # sampler that draws one of the gate constructors

function circuit_layer(n::Int, β::Vector{Float64}, eps::Float64)
    U = chain(n)
    # one random single-qubit rotation per site, with a random angle in [0, 2π*eps)
    append!(U, chain(n, put(n, loc=>rand(rng, randG)((2*rand())*eps*π)) for loc = 1:n))
    # nearest-neighbour ZZ rotations with fixed angles β
    for j = 1:n-1
        append!(U, chain(n, put(n, (j, j+1)=>rot(ZZ, β[j]))))
    end
    return U
end

function cgrad_flayer(n::Int, β::Vector{Float64}, eps::Float64, θ::Vector{Float64}, pm::Int)
    U = chain(n)
    θs = copy(θ)          # copy, so the ±π/2 shifts don't accumulate in the caller's θ
    θs[1] += pm*(π/2)     # parameter shift on the first single-qubit angle
    # note: each call draws fresh random gate types, so the sampler state must be
    # identical between the two shifted calls for the circuits to match
    append!(U, chain(n, put(n, loc=>rand(rng, randG)(θs[loc])) for loc = 1:n))
    for j = 1:n-1
        append!(U, chain(n, put(n, (j, j+1)=>rot(ZZ, β[j]))))
    end
    return U
end

function get_gradients(pmax::Int, spacing::Float64, n::Int, eps::Float64, O::PutBlock)
    fp = trunc(log2(pmax))
    # log-spaced sampling times, truncated and deduplicated
    times = union(trunc.([2^i for i in 1:spacing:fp]))

    ψ1 = Yao.uniform_state(n)
    ψ2 = Yao.uniform_state(n)

    β = (π*0.5)*(2.0*rand(n-1) .- 1.0)
    θ = (2*π*eps)*rand(n)
    grads = zeros(length(times))
    count = 0

    # first layer with θ[1] shifted by ±π/2 (parameter-shift rule)
    ψ1 |> cgrad_flayer(n, β, eps, θ, 1)
    ψ2 |> cgrad_flayer(n, β, eps, θ, -1)

    for t in 1:last(times)
        U = circuit_layer(n, β, eps)
        ψ1 |> U
        ψ2 |> U
        if t in times
            count += 1
            # parameter-shift rule: the gradient is half the shifted difference
            grads[count] = (real(expect(O, ψ1)) - real(expect(O, ψ2))) / 2
        end
    end
    return grads
end

Here I’m only computing the gradient with respect to the first parameter of the circuit. The code seems to work, but it is much slower than the analogous code that computes the entropy instead. I would really appreciate any advice or suggestions.
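
For reference, the all-parameter call I mentioned above would look something like this (a minimal sketch with a hypothetical circuit; as far as I understand, expect’ returns the register gradient together with the gradients of all circuit parameters):

using Yao

n = 4
c = chain(n, put(n, j=>Rx(2π*rand())) for j in 1:n)   # hypothetical circuit, one Rx per qubit
# gradient w.r.t. the register, and w.r.t. *all* parameters of c
_, allgrads = expect'(put(n, 1=>Z), zero_state(n) => c)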

In reverse-mode differentiation you can use NoParams to mark blocks that don’t contain trainable parameters. But if you only have one parameter, you should use the parameter-shift rule, which is what you currently implement: it has lower complexity when the number of parameters is small. For a rotation gate R(θ) = exp(-iθP/2) with P² = I, the rule reads ∂⟨O⟩/∂θ = (⟨O⟩(θ+π/2) − ⟨O⟩(θ−π/2))/2.
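
A minimal sketch of the NoParams idea, assuming YaoExtensions’ NoParams wrapper (everything except the Rz we differentiate is frozen, so only that one gradient is returned):

using Yao, YaoExtensions

n = 4
c = chain(n,
    NoParams(chain(n, put(n, j=>Rx(0.1)) for j in 1:n)),  # frozen layer: parameters hidden from AD
    put(n, 1=>Rz(0.3)),                                   # the single parameter we differentiate
)
_, grad = expect'(put(n, 1=>Z), zero_state(n) => c)       # grad should have length 1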

help?> dispatch!
search: dispatch! isdispatchtuple

  dispatch!(x::AbstractBlock, collection)

  Dispatch parameters in collection to block tree x.

  │ Note
  │
  │  it will try to dispatch the parameters in collection first.

From a first glance: if you are using the same circuit structure, consider using dispatch! to dispatch new parameters in place instead of creating a new circuit every time inside the loop. But I’m not able to test your program since I don’t know what the arguments are; my naive attempt to run your implementation errors out.
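
A sketch of the pattern I mean, with a hypothetical circuit just to show the in-place update:

using Yao

n = 4
U = chain(n, put(n, j=>Rx(0.0)) for j in 1:n)   # build the circuit once
reg = zero_state(n)
for step in 1:10
    dispatch!(U, 2π .* rand(nparameters(U)))    # overwrite the angles in place
    reg |> U                                    # apply without rebuilding U
end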

Can you elaborate on how to use your implementation and what the parameters mean? It would be nice if you could provide an executable MWE (minimal working example).

If you are computing the gradient of only one parameter, you can just use ForwardDiff:

julia> using ForwardDiff, Yao

julia> function f(x::T) where T
           reg = zero_state(Complex{T}, 3)
           c = put(3, 2=>Rx(x))
           real(expect(put(3, 2=>Z), reg |> c))
       end
f (generic function with 1 method)

julia> f(ForwardDiff.Dual(2.0, 1.0))
Dual{Nothing}(-0.4161468365471423,-0.9092974268256818)

# Or vectorized version
julia> function fv(x::AbstractVector{T}) where T
           reg = zero_state(Complex{T}, 3)
           c = put(3, 2=>Rx(x[1]))
           real(expect(put(3, 2=>Z), reg |> c))
       end
fv (generic function with 1 method)

julia> ForwardDiff.gradient(fv, [2.0])
1-element Vector{Float64}:
 -0.9092974268256818

Note: the overhead of ForwardDiff is linear in the number of input parameters, so for a few parameters (say, fewer than 10) it should be faster than the built-in reverse-mode AD.
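
As a quick sanity check, the derivative above can be compared against the parameter-shift rule, reusing the f defined above (a sketch):

# should match -sin(2.0) ≈ -0.9093, the value ForwardDiff returned
(f(2.0 + π/2) - f(2.0 - π/2)) / 2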

First of all, thanks for your answer. I tried to run the same code I uploaded and it works; I just missed the imports last time, which are simply

using Yao, Random, YaoExtensions

With that, it should work. Now, the parameters of the function get_gradients are as follows:

  1. pmax = maximum circuit depth. Since I want the numerical points to be equally spaced on a log scale along the x-axis, I defined the times vector, which depends on the spacing parameter. Take, for example, pmax = 2000 and spacing = 0.15 (see the quick check after this list).

  2. n=number of qubits.

  3. eps controls the strength of the single-qubit rotations => eps = 0.1

  4. O is the operator I use to compute the specific cost function. Take, for example, O=put(n, (1,2)=>ZZ)
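
For concreteness, this is roughly how the time grid comes out for those example values:

pmax, spacing = 2000, 0.15
fp = trunc(log2(pmax))                               # 10.0 for pmax = 2000
times = union(trunc.([2^i for i in 1:spacing:fp]))   # log-spaced values from 2.0 up to 1024.0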

I hope things are clearer now. Below I attach the working code so that it can be tested easily:

using Yao, Random, YaoExtensions

const SINGLE_GATES = [Rx, Ry, Rz]             # pool of single-qubit rotations to sample from
@const_gate ZZ = mat(kron(Z, Z))
rng = MersenneTwister()
randG = Random.Sampler(rng, SINGLE_GATES)     # sampler that draws one of the gate constructors

function circuit_layer(n::Int, β::Vector{Float64}, eps::Float64)
    U = chain(n)
    # one random single-qubit rotation per site, with a random angle in [0, 2π*eps)
    append!(U, chain(n, put(n, loc=>rand(rng, randG)((2*rand())*eps*π)) for loc = 1:n))
    # nearest-neighbour ZZ rotations with fixed angles β
    for j = 1:n-1
        append!(U, chain(n, put(n, (j, j+1)=>rot(ZZ, β[j]))))
    end
    return U
end

function cgrad_flayer(n::Int, β::Vector{Float64}, eps::Float64, θ::Vector{Float64}, pm::Int)
    U = chain(n)
    θs = copy(θ)          # copy, so the ±π/2 shifts don't accumulate in the caller's θ
    θs[1] += pm*(π/2)     # parameter shift on the first single-qubit angle
    # note: each call draws fresh random gate types, so the sampler state must be
    # identical between the two shifted calls for the circuits to match
    append!(U, chain(n, put(n, loc=>rand(rng, randG)(θs[loc])) for loc = 1:n))
    for j = 1:n-1
        append!(U, chain(n, put(n, (j, j+1)=>rot(ZZ, β[j]))))
    end
    return U
end

function get_gradients(pmax::Int, spacing::Float64, n::Int, eps::Float64, O::PutBlock)
    fp = trunc(log2(pmax))
    # log-spaced sampling times, truncated and deduplicated
    times = union(trunc.([2^i for i in 1:spacing:fp]))

    ψ1 = Yao.uniform_state(n)
    ψ2 = Yao.uniform_state(n)

    β = (π*0.5)*(2.0*rand(n-1) .- 1.0)
    θ = (2*π*eps)*rand(n)
    grads = zeros(length(times))
    count = 0

    # first layer with θ[1] shifted by ±π/2 (parameter-shift rule)
    ψ1 |> cgrad_flayer(n, β, eps, θ, 1)
    ψ2 |> cgrad_flayer(n, β, eps, θ, -1)

    for t in 1:last(times)
        U = circuit_layer(n, β, eps)
        ψ1 |> U
        ψ2 |> U
        if t in times
            count += 1
            # parameter-shift rule: the gradient is half the shifted difference
            grads[count] = (real(expect(O, ψ1)) - real(expect(O, ψ2))) / 2
        end
    end
    return grads
end

@time data = get_gradients(2000, 0.15, 10, 0.1, put(10, (1,2)=>ZZ))

I didn’t know about that, thanks! Maybe I got it wrong, but how would this work in my case? Suppose I update the register in each iteration and don’t keep track of the whole unitary circuit, which is what I’m currently doing. How would I then compute the gradient using ForwardDiff if, say, I’m interested in the first parameter of the circuit, θ_1?

  1. I think there is not much room for improvement if the overhead is only about 2x.

  2. In the ForwardDiff implementation below, the correctness of the gradients is not checked.

using Yao, Random, YaoExtensions
using ForwardDiff: Dual
using LinearAlgebra

const SINGLE_GATES = [Rx, Ry, Rz]
@const_gate _ZZ = mat(kron(Z, Z))
rng = MersenneTwister()
randG = Random.Sampler(rng, SINGLE_GATES)

# an ugly patch: widen LinearAlgebra.__normalize! so it also accepts Dual-valued vectors
@inline function LinearAlgebra.__normalize!(v::AbstractVector, nrm)
    # The largest positive floating point number whose inverse is less than infinity
    δ = inv(prevfloat(typemax(nrm)))

    if nrm ≥ δ # Safe to multiply with inverse
        invnrm = inv(nrm)
        rmul!(v, invnrm)

    else # scale elements to avoid overflow
        εδ = eps(one(nrm))/δ
        rmul!(v, εδ)
        rmul!(v, inv(nrm*εδ))
    end

    v
end

function circuit_layer(n::Int, β::Vector, eps::Float64)
    U = chain(n)
    append!(U, chain(n, put(n, loc=>rand(rng, randG)((2*rand())*eps*π)) for loc = 1:n))
    for j = 1:n-1
        append!(U, chain(n, put(n, (j, j+1)=>rot(_ZZ, β[j]))))
    end
    return U
end

# first layer with explicit angles θ; no shift is needed here, since the
# Dual numbers seeded in β carry the derivative through the simulation
function cgrad_flayer(n::Int, β::Vector, eps::Float64, θ::Vector{Float64})
    U = chain(n)
    append!(U, chain(n, put(n, loc=>rand(rng, randG)(θ[loc])) for loc = 1:n))
    for j = 1:n-1
        append!(U, chain(n, put(n, (j, j+1)=>rot(_ZZ, β[j]))))
    end
    return U
end

_onehot(n, i) = (x = zeros(n); x[i] = 1; x)   # seed vector for the Dual partials

function get_gradients(pmax::Int, spacing::Float64, n::Int, eps::Float64, O::PutBlock)
    fp = trunc(log2(pmax))
    times = union(trunc.([2^i for i in 1:spacing:fp]))

    # seed the derivative on β[1]; the Duals propagate through the state evolution
    β = Dual.((π*0.5)*(2.0*rand(n-1) .- 1.0), _onehot(n-1, 1))
    ψ1 = Yao.uniform_state(Complex{eltype(β)}, n)
    θ = (2*π*eps)*rand(n)
    grads = zeros(length(times))
    count = 0

    ψ1 |> cgrad_flayer(n, β, eps, θ)

    for t in 1:last(times)
        U = circuit_layer(n, β, eps)
        ψ1 |> U
        if t in times
            count += 1
            grads[count] = real(expect(O, ψ1)).partials[1]   # extract d⟨O⟩/dβ[1]
        end
    end
    return grads
end

@time data = get_gradients(2000, 0.15, 10, 0.1, put(10, (1,2)=>_ZZ))

Hi,
Sorry for the late reply. That worked, thanks a lot!
It is true, though, that the difference in performance is not very noticeable, but it is always good to know 🙂