I have a hierarchical model whose parameters live on rather different scales. Some parameters are in [0,1], while others are roughly in [0,60], with most of the mass around 20-30.
a_0 ~ Beta(8,2)
b_0 ~ truncated(Normal(20, 10); lower=0, upper=60)
a ~ filldist(Beta(a_0 * 10, (1 - a_0) * 10), 100)
b ~ filldist(truncated(Normal(b_0, 10); lower=0, upper=60), 100)
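For completeness, here is the whole thing as a self-contained model; the observation part (y and its Normal likelihood) is just a made-up placeholder standing in for my actual data model:

using Turing, Distributions
using ReverseDiff  # needed so AutoReverseDiff can be used as the AD backend

# Sketch of the hierarchical model above. The likelihood on y is a placeholder;
# my real observation model is different but irrelevant to the question.
@model function hierarchical_model(y)
    a_0 ~ Beta(8, 2)
    b_0 ~ truncated(Normal(20, 10); lower=0, upper=60)
    a ~ filldist(Beta(a_0 * 10, (1 - a_0) * 10), 100)
    b ~ filldist(truncated(Normal(b_0, 10); lower=0, upper=60), 100)
    for i in 1:100
        y[i] ~ Normal(a[i] * b[i], 1.0)  # placeholder likelihood
    end
end

posterior = hierarchical_model(randn(100) .* 5 .+ 25)  # dummy data, just to make this runnable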
I would like to reparametrize this manually to make sampling easier. However, according to this thread, NUTS already runs a default number of adaptation steps (n=1000?) to estimate the mass matrix. Even so, sampling is pretty slow and the ESS is quite low.
This is the line I’m using to get 100 samples from the posterior:
chain_inferred = sample(posterior, NUTS(; adtype=AutoReverseDiff(; compile=true)), 100)
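For reference, this is how I'm judging the ESS (via the MCMCChains summary; "quite low" is my eyeballing of the ESS columns):

using MCMCChains

# ESS check on the 100-sample chain; the ESS columns in the summary come out
# well below the number of draws, which is what I mean by "quite low".
summarystats(chain_inferred)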
I also wanted to know how the n_adapts passed to NUTS differs from the num_warmup and discard_initial keywords passed to sample directly. It seems like the default value for num_warmup is half the number of samples, although I couldn't find this officially documented anywhere. I'm assuming that doing something like
chain_inferred = sample(posterior, NUTS(0, 0.65; adtype=AutoReverseDiff(; compile=true)), 100; num_warmup=100)
means that NUTS gets no time for step-size adaptation or mass-matrix estimation, but the sampler will nevertheless discard the first 100 samples, whereas
chain_inferred = sample(posterior, NUTS(100, 0.65; adtype=AutoReverseDiff(; compile=true)), 100; num_warmup=0)
will return the same total number of samples but allow NUTS to adapt the step size and mass matrix during its warmup.
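To make the question concrete, here is the fully spelled-out call I'm trying to understand, with all three knobs named (the specific values are arbitrary examples, and the comments reflect my possibly-wrong reading):

# The three knobs I'm asking about, all in one call:
# NUTS's first positional argument vs. sample's num_warmup vs. discard_initial.
chain_inferred = sample(posterior,
                        NUTS(100, 0.65; adtype=AutoReverseDiff(; compile=true)),  # sampler-side adaptation steps
                        100;                    # number of posterior samples
                        num_warmup=100,         # warmup iterations handled by sample/AbstractMCMC?
                        discard_initial=100)    # leading draws dropped from the returned chain?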
Firstly, I'd appreciate it if someone could break down the differences between NUTS's own warmup (its n_adapts argument), sample's num_warmup, and discard_initial; my current reading is spelled out in the snippet above. Secondly, is manual reparametrization (e.g. confining all parameters to [0,1], as in the sketch below) worth it?
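To clarify what I mean by the second question, this is the kind of manual reparametrization I have in mind (a sketch; the names and the placeholder likelihood are mine):

# Sketch of "confining all parameters to [0,1]": dividing a normal truncated to
# [0, 60] by 60 gives a normal truncated to [0, 1] with mean and sd also divided
# by 60, so the priors below should match the original model exactly.
@model function hierarchical_model_scaled(y)
    a_0 ~ Beta(8, 2)
    b_0_scaled ~ truncated(Normal(20 / 60, 10 / 60); lower=0, upper=1)
    a ~ filldist(Beta(a_0 * 10, (1 - a_0) * 10), 100)
    b_scaled ~ filldist(truncated(Normal(b_0_scaled, 10 / 60); lower=0, upper=1), 100)
    b = b_scaled .* 60                      # back on the original [0, 60] scale
    for i in 1:100
        y[i] ~ Normal(a[i] * b[i], 1.0)     # same placeholder likelihood as above
    end
end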
Thank you!