The summary from importance sampling is very different from the one NUTS gives, and seems to summarize the prior distribution rather than the posterior (a quick check against the prior moments follows the output below):
using Turing

@model function gdemo(x, y)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    x ~ Normal(m, sqrt(s²))
    y ~ Normal(m, sqrt(s²))
end

chn1 = sample(gdemo(1.5, 2), NUTS(), 10_000, progress=false)
chn2 = sample(gdemo(1.5, 2), IS(), 10_000, progress=false)
julia> chn1 = sample(gdemo(1.5, 2), NUTS(), 10_000, progress=false)
┌ Info: Found initial step size
└   ϵ = 3.2
Chains MCMC chain (10000×14×1 Array{Float64, 3}):

Iterations        = 1001:1:11000
Number of chains  = 1
Samples per chain = 10000
Wall duration     = 0.55 seconds
Compute duration  = 0.55 seconds
parameters        = s², m
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64       Float64

          s²    2.0417    2.0415    0.0323   5093.6770   5398.2521    1.0000     9227.6757
           m    1.1658    0.8129    0.0116   5234.5603   5015.1256    1.0001     9482.8991

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

          s²    0.5564    1.0201    1.4965    2.3454    6.5310
           m   -0.4584    0.6915    1.1598    1.6477    2.8359
julia> chn2 = sample(gdemo(1.5, 2), IS(), 10_000, progress=false)
Chains MCMC chain (10000×3×1 Array{Float64, 3}):

Log evidence      = -3.716418865604326
Iterations        = 1:1:10000
Number of chains  = 1
Samples per chain = 10000
Wall duration     = 0.64 seconds
Compute duration  = 0.64 seconds
parameters        = s², m
internals         = lp

Summary Statistics
  parameters      mean       std      mcse      ess_bulk      ess_tail      rhat   ess_per_sec
      Symbol   Float64   Float64   Float64       Float64       Float64   Float64       Float64

          s²    2.9386    4.5679    0.0445    10261.6859     9964.4439    0.9999    15934.2949
           m    0.0107    1.7156    0.0170    10143.6700    10127.8010    1.0001    15751.0404

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

          s²    0.5431    1.1107    1.7913    3.1765   12.6045
           m   -3.3421   -0.9112    0.0133    0.9038    3.4850
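For reference, here is a minimal sketch that draws from the prior alone (using Distributions and Statistics directly, not Turing). The moments line up with the IS summary above, since InverseGamma(2, 3) has prior mean 3/(2 - 1) = 3 and m is centered at 0 a priori:

using Distributions, Statistics

# Draw s² from its prior, then m from its conditional prior, and compare
# the sample moments to the IS summary above.
prior_s² = rand(InverseGamma(2, 3), 10_000)
prior_m  = [rand(Normal(0, sqrt(s))) for s in prior_s²]
mean(prior_s²), mean(prior_m)   # roughly (3.0, 0.0) up to heavy-tailed noise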
I understand that sampling from the prior is part of the importance sampling algorithm, but I would have expected the importance weights to be used when computing the summary statistics, so that they describe the posterior distribution.
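Concretely, something like the following sketch is what I had in mind. It assumes each draw's log importance weight is what IS stores in the lp internal (the reported log evidence looks consistent with logsumexp(lp) - log(10_000), which is what suggests that reading):

using StatsFuns: logsumexp

# Assumption: for IS, each draw's :lp entry is its log importance weight.
logw = vec(chn2[:lp])
w = exp.(logw .- logsumexp(logw))   # self-normalized importance weights

# Self-normalized importance-sampling estimates of the posterior means;
# if the weights were applied, these should land near the NUTS estimates
# (m ≈ 1.17, s² ≈ 2.04) rather than the prior means.
sum(w .* vec(chn2[:m])), sum(w .* vec(chn2[:s²]))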