How to work with an array of reals in HMC

Hi @Tamas_Papp, I’m sorry if this is a simple question.

I am trying to use your DynamicHMC package to sample from R^n. I have set up my code similarly to https://github.com/tpapp/DynamicHMCExamples.jl/blob/master/src/example_logistic_regression.jl .

However, when I inspect the type of β after line 23 of that example, I get something like:
Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##34#35")){TransformedLogDensity{TransformVariables.TransformTuple{NamedTuple{(:h,),Tuple{TransformVariables.ArrayTransform{TransformVariables.Identity,1}}}},NustarProblem}},Float64},Float64,10},1}

How can I make it so that I am working with an array of reals after this point? I keep running into “no method matching” errors such as:

Float64(::ForwardDiff.Dual{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##34#35")){TransformedLogDensity{TransformVariables.TransformTuple{NamedTuple{(:h,),Tuple{TransformVariables.ArrayTransform{TransformVariables.Identity,1}}}},NustarProblem}},Float64},Float64,10})

EDIT (include code):

using TransformVariables, LogDensityProblems, DynamicHMC,
    DynamicHMC.Diagnostics, Parameters, Statistics, Random,
    Distributions

const N_SOURCES_TRUTH = 3
const NUSTAR_IMAGE_LENGTH = 64
const PSF_IMAGE_LENGTH = 1300

# In radians/pixel
const NUSTAR_PIXEL_SIZE =  5.5450564776903175e-05
const PSF_PIXEL_SIZE = 2.9793119397393605e-06

const XY_MIN, XY_MAX = -1.1 * PSF_IMAGE_LENGTH/2.0 * PSF_PIXEL_SIZE, 1.1 * PSF_IMAGE_LENGTH/2.0 * PSF_PIXEL_SIZE

const P_SOURCE_XY = Uniform(XY_MIN, XY_MAX)
const P_SOURCE_B = Uniform(exp(0), exp(7))

struct NustarProblem
    observed_image::Array{Int64,2}
end


function apply_psf_transformation(x, y, b)
    x_loc_pixels, y_loc_pixels = -x/NUSTAR_PIXEL_SIZE, -y/NUSTAR_PIXEL_SIZE
    psf_half_length = NUSTAR_IMAGE_LENGTH/2
    function power_law(row_col)
        i, j = row_col
        distance = sqrt(
            ((psf_half_length - i) - y_loc_pixels)^2 +
            ((psf_half_length - j) - x_loc_pixels)^2
        )
        return 1.0/(1 + 0.1 * distance^4)
    end
    psf = map(power_law, ((i, j) for i in 1:NUSTAR_IMAGE_LENGTH, j in 1:NUSTAR_IMAGE_LENGTH))
    return psf * 1.0/sum(psf) * exp(b) # normalize, then scale by exp(b) (b is the log brightness)
end


function compose_mean_image(sources)
    return sum(
        apply_psf_transformation(source[1], source[2], source[3])
        for source in sources
    )
end


function log_prior(θ)
    # all three priors are uniform over a finite range (the third parameter is log(b))
    return sum(
        log(pdf(P_SOURCE_XY, source[1])) +
        log(pdf(P_SOURCE_XY, source[2])) +
        log(pdf(P_SOURCE_B, exp(source[3])))
        for source in θ
    )
end


function log_likelihood(θ, observed_image)
    model_rate_image = compose_mean_image(θ)
    return sum(
        logpdf(Poisson(model_rate_image[i]), observed_image[i])
        for i in eachindex(observed_image)
    )
end


function (problem::NustarProblem)(θ)
    @unpack h = θ
    head = [(h[i], h[i+1], h[i+2]) for i in 1:3:length(h)]
    return log_likelihood(head, problem.observed_image) + log_prior(head)
end


function random_sources(n_sources)
    sources_x = rand(P_SOURCE_XY, n_sources)
    sources_y = rand(P_SOURCE_XY, n_sources)
    sources_b = rand(P_SOURCE_B, n_sources)
    return [(sources_x[i], sources_y[i], log(sources_b[i])) for i in 1:n_sources]
end


sources_truth = random_sources(N_SOURCES_TRUTH)

mean_image =  compose_mean_image(sources_truth)
observed_image = [rand(Poisson(λ)) for λ in mean_image]

θ_init = random_sources(N_SOURCES_TRUTH)
q_init = vcat([[t[1], t[2], t[3]] for t in θ_init]...)
q_transformed = (h = q_init,)

p = NustarProblem(observed_image)
println("logp init: ", p(q_transformed))

t = as((h = as(Array, 3 * N_SOURCES_TRUTH), ))
Pr = TransformedLogDensity(t, p)

grad_P = ADgradient(:ForwardDiff, Pr)

results = mcmc_with_warmup(Random.GLOBAL_RNG, grad_P, 1000; initialization = (q = q_init, ),)

The model is basically a Poisson mixture model in which each component has 3 parameters. I had written my own Metropolis-Hastings sampler, and there I represented each sample as an array of 3-tuples. Perhaps there is a way to write the transformation with that structure directly, but to keep things simple I was hoping to work with a plain array of reals and construct the array of tuples from it explicitly with the line head = [(h[i], h[i+1], h[i+2]) for i in 1:3:length(h)]. I was expecting typeof(h) to be Array{Float64,1}, so I am not understanding how this works.
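For example, something like the following (just an untested sketch on my end) might keep the per-source grouping in the transformation itself, with one source per column of h:

# Untested sketch: transform to a 3×N matrix so each column is one source,
# and build the tuples column-wise instead of striding a flat vector.
t_mat = as((h = as(Array, 3, N_SOURCES_TRUTH),))

function (problem::NustarProblem)(θ)   # would replace the method defined above
    @unpack h = θ
    head = [(h[1, j], h[2, j], h[3, j]) for j in 1:size(h, 2)]
    return log_likelihood(head, problem.observed_image) + log_prior(head)
end

The unconstrained vector q passed to the sampler would still have length 3 * N_SOURCES_TRUTH; only the shape that h comes back in would change.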

EDIT 2: The code above has been updated to include the fixes; the fix involved rewriting the log likelihood in a functional way. The code still fails to find an initial optimum.

Please post the code you’ve tried.

Based on the error, I think your transformation should be something like:

tr = as((h = as_real,))

Sorry, I should have included this earlier. With that transformation, however, I would only be working with a single number.

I can’t replicate the error because your code is not self-contained, but my guess is that you are using pre-allocated arrays, which cannot accommodate the dual number types of ForwardDiff.

The solution is to use a functional approach (do not preallocate; just use plain operations, map, comprehensions, etc.).
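To illustrate with a generic toy example (not your code): ForwardDiff evaluates the log density with dual numbers, so any buffer whose element type is hard-coded to Float64 throws exactly the error you quoted, while a functional version is indifferent to the element type.

using ForwardDiff

# Breaks under AD: the preallocated Float64 buffer cannot store ForwardDiff.Dual.
function sum_of_squares_prealloc(x)
    out = zeros(Float64, length(x))        # element type fixed to Float64
    for i in eachindex(x)
        out[i] = x[i]^2                    # MethodError: no method matching Float64(::Dual{...})
    end
    return sum(out)
end

# Works under AD: no preallocation, so the element type follows the input.
sum_of_squares(x) = sum(xi^2 for xi in x)

ForwardDiff.gradient(sum_of_squares, ones(3))            # fine
# ForwardDiff.gradient(sum_of_squares_prealloc, ones(3)) # throws the error above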

Hi Tamas,

Thanks for the reply. I’ve edited the post above to make the code entirely self-contained; it’s ~100 lines, so I didn’t want to include it in full initially.

You were exactly right, that was the issue! After changing the likelihood function to the version above, it appears to work; however, the initial position is outside the support of the prior distribution, so the initial likelihood is non-finite. I suppose I could fix this by providing an initial position, but I cannot figure out how. I’ve tried un-commenting the initialization keyword in the last line above:

results = mcmc_with_warmup(Random.GLOBAL_RNG, grad_P, 1000; initialization = (q = q_init, ),)

I think I’m probably doing this incorrectly?

Thanks again.

EDIT: I realized that the initial position is supposed to be untransformed! Sorry. Things are running now; however, the warmup fails to find an initial optimum within the maximum number of iterations. Do you have any advice here? In the past I asked a similar question, but I was using external image operations, so I struggled to provide a completely self-contained MWE. The code above has removed those.

I am not familiar with the kind of model you are working with, but there seems to be a large variation in the log density around q_init, so I get

julia> results = mcmc_with_warmup(Random.GLOBAL_RNG, grad_P, 1000;
       initialization = (q = q_init, ))
[ Info: finding initial optimum
ERROR: Reached maximum number of iterations while bisecting interval for ϵ.

If the likelihood is well-behaved, then providing a q_init should not be necessary either. The fact that without it you get

ERROR: ArgumentError: Value and slope at step length = 0 must be finite.
Stacktrace:
 [1] (::LineSearches.HagerZhang{Float64,Base.RefValue{Bool}})(::Function, ::LineSearches.var"#ϕdϕ#6"{Optim.ManifoldObjective{NLSolversBase.OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}},Array{Float64,1},Array{Float64,1},Array{Float64,1}}, ::Float64, ::Float64, ::Float64) at /home/tamas/.julia/packages/LineSearches/WrsMD/src/

suggests the same.
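One quick way to check this directly (a sketch using the LogDensityProblems interface your script already loads) is to evaluate the log density and its gradient at the starting point before sampling:

# Sketch: a -Inf/NaN value or an enormous gradient at q_init points to the
# support/curvature problem without running the sampler at all.
ℓq, ∇ℓq = LogDensityProblems.logdensity_and_gradient(grad_P, q_init)
@show ℓq
@show maximum(abs, ∇ℓq)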

You can of course disable both stepsize search and local optimization, and go with

results = mcmc_with_warmup(Random.GLOBAL_RNG, grad_P, 1000;
                           initialization = (ϵ = 0.1, q = q_init),
                           warmup_stages = default_warmup_stages(; local_optimization = nothing,
                                                                 stepsize_search = nothing))

but then you get adapted to a very, very small stepsize. And if you look at the tree statistics,

julia> summarize_tree_statistics(results.tree_statistics)
Hamiltonian Monte Carlo sample of length 1000
  acceptance rate mean: 0.81, 5/25/50/75/95%: 0.0 0.8 0.9 0.94 0.96
  termination: divergence => 100%, max_depth => 0%, turning => 0%
  depth: 0 => 6%, 1 => 11%, 2 => 19%, 3 => 32%, 4 => 29%, 5 => 2%

you will see that everything ended with divergence, which again suggests the same problem: either a miscoded posterior, or one with very high or variable curvature.

I would recommend checking the code, simplifying the problem, or if all else fails, mitigating the above.

Thank you Tamas. The likelihood surface definitely has very steep variations.

I have tried drastically smoothing the likelihood surface, as well as initializing with the true parameters, and I’m still left with a very small stepsize and 100% divergence.

I am familiar with how HMC works, but what exactly does the divergence statistic mean? Does it mean that the step size was too large, so the final position of a proposal diverged from the minimum?

It may be that this type of model does not work well with HMC, but I will keep trying.

HMC takes small steps along the trajectory of a differential equation that has a conservation property (the total energy, or Hamiltonian), but this only holds approximately because of the discretization into steps.
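A toy illustration of what I mean (this is not DynamicHMC's internal code, just the idea): a leapfrog integrator approximately conserves the Hamiltonian H(q, p) = -logdensity(q) + p^2/2, and the drift in H along a trajectory is what gets flagged.

# Toy 1D leapfrog for a standard normal target (illustration only, not
# DynamicHMC internals). The drift ΔH should stay near zero for a well-chosen
# step size; a large ΔH is what gets reported as a divergence.
neg_logpi(q) = 0.5 * q^2               # -log density of a standard normal, up to a constant
grad_neg_logpi(q) = q

function leapfrog(q, p, ϵ, L)
    p -= 0.5 * ϵ * grad_neg_logpi(q)   # initial half step for momentum
    for _ in 1:L-1
        q += ϵ * p                     # full step for position
        p -= ϵ * grad_neg_logpi(q)     # full step for momentum
    end
    q += ϵ * p
    p -= 0.5 * ϵ * grad_neg_logpi(q)   # final half step for momentum
    return q, p
end

H(q, p) = neg_logpi(q) + 0.5 * p^2
q1, p1 = leapfrog(1.0, 1.0, 0.1, 10)
ΔH = H(q1, p1) - H(1.0, 1.0)           # small here; blows up when ϵ is too large
                                       # for the local curvature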

Divergence means that this approximation has become very bad and the conservation property was violated. If you are interested, I would recommend reading more about how the underlying integrator works.

Numerical problems like this usually indicate a problem with how the model fits the data, and/or with the implementation. I would recommend learning HMC gradually, building up the model from something very, very simple in an iterative way.
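For instance (a sketch with a made-up toy target, reusing the packages already loaded above), checking that the same pipeline samples a 3-parameter standard normal cleanly separates sampler issues from model issues:

# Sketch of "start very simple": same pipeline, standard normal target.
# The tree statistics here should show essentially no divergences.
simple_logdensity(θ) = -0.5 * sum(abs2, θ.h)     # standard normal, up to a constant
t_simple = as((h = as(Array, 3),))
P_simple = TransformedLogDensity(t_simple, simple_logdensity)
∇P_simple = ADgradient(:ForwardDiff, P_simple)
res_simple = mcmc_with_warmup(Random.GLOBAL_RNG, ∇P_simple, 1000)
summarize_tree_statistics(res_simple.tree_statistics)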