NUTS mass matrix estimation and num_warmup

I have a hierarchical model whose parameters live on quite different scales. Some are in [0,1], while others are roughly in [0,60], with most of the mass around 20-30.

a_0 ~ Beta(8,2)
b_0 ~ truncated(Normal(20, 10); lower=0, upper=60)
a ~ filldist(Beta(a_0*10,(1-a_0)*10),100)
b ~ filldist(truncated(Normal(b_0, 10); lower=0, upper=60),100)
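For completeness, the priors above can be wrapped into a runnable Turing @model like this (prior part only; the likelihood and data are omitted, and the model name is just a placeholder):

using Turing, Distributions

# Prior-only sketch of the hierarchical structure shown above.
@model function hier_prior()
    a_0 ~ Beta(8, 2)
    b_0 ~ truncated(Normal(20, 10); lower=0, upper=60)
    a ~ filldist(Beta(a_0 * 10, (1 - a_0) * 10), 100)
    b ~ filldist(truncated(Normal(b_0, 10); lower=0, upper=60), 100)
end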

I would manually reparametrize this to make sampling easier. However, according to this thread, NUTS seems to run a default initial number (n=1000?) of steps to estimate the mass matrix. Nevertheless, sampling is pretty slow and the ESS is quite low.

This is the line I’m using to get 100 samples from the posterior:
chain_inferred = sample(posterior, NUTS(), adtype=AutoReverseDiff(; compile=true), 100)

I also wanted to know how the n_adapts passed to the NUTS constructor differs from the num_warmup and discard_initial keywords passed to sample directly. It seems like the default value for num_warmup is half the number of samples, although I didn’t find this officially documented anywhere. I’m assuming that doing something like

chain_inferred = sample(posterior, NUTS(0, 0.65), adtype=AutoReverseDiff(; compile=true), num_warmup=100, 100)

means that NUTS does not have time for step size estimation or mass matrix calculation, but the sampler will nevertheless discard the 100 initial samples, whereas

chain_inferred = sample(posterior, NUTS(100, 0.65), adtype=AutoReverseDiff(; compile=true), num_warmup=0, 100)

will effectively get the same number of total samples but allow NUTS to calculate mass matrix etc. during warmup.

Firstly, I’d appreciate it if someone could break down the differences between NUTS warmup, sample num_warmup, and discard_initial. Secondly, is manual reparametrization (e.g. confining all parameters to [0,1]) worth it?

Thank you!

Hi @contejuz, unfortunately the NUTS keyword argument interface is a mess.

Here is the lowdown (as of Turing v0.41):

  • Don’t use num_warmup=100. It will make NUTS do extra steps, but it doesn’t actually perform adaptation. Just leave num_warmup at 0, or omit it (which is the same). Yeahhh, don’t use the officially documented keywords.
  • Instead, use the keyword argument nadapts=100. (You can also put this inside the NUTS constructor, which has the same effect.)
  • You can use discard_initial=100 to discard the adaptation steps. (See the sketch after this list for how the pieces fit together.)
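Putting those together, a sketch of what the suggested call might look like (assuming Turing ≥ 0.41.3, with posterior being the conditioned model from the original post; note that the adtype can be given to the NUTS constructor rather than to sample):

# Sketch only: 100 adaptation steps, 100 retained draws, adaptation draws discarded.
chain_inferred = sample(
    posterior,
    NUTS(; adtype=AutoReverseDiff(; compile=true)),
    100;
    nadapts=100,          # number of adaptation steps (step size + mass matrix)
    discard_initial=100,  # drop the adaptation draws from the returned chain
)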

BTW, due to a bug, Turing’s NUTS didn’t actually perform mass matrix adaptation until fairly recently; you’d need 0.41.3 (released last week) for that to work.

Fixing this is on the to-do list (Use `step_warmup` for AdaptiveHamiltonian rather than `nadapts` · Issue #2678 · TuringLang/Turing.jl), but I don’t know when we will be able to get to it.

Wow, thank you! What a coincidence.

It looks like NUTS automatically adds half the number of samples as adaptation steps. For example, if I ask for 1000 samples without any other kwargs, it does 1500 samples and discards the first 500. Does this mean that both nadapts and discard_initial are set to half the number of samples by default?

Also, I’m trying to update to Turing@0.41.3 and it tells me that I’m restricted to 0.39.6. I get the same error when I add the version directly:

(turing) pkg> add Turing@0.41.3
   Resolving package versions...
ERROR: Unsatisfiable requirements detected for package Turing [fce5fe82]:
 Turing [fce5fe82] log:
 ├─possible versions are: 0.5.0-0.41.3 or uninstalled
 ├─restricted to versions 0.41.3 by an explicit requirement, leaving only versions: 0.41.3
 └─restricted by julia compatibility requirements to versions: 0.5.0-0.39.6 or uninstalled — no versions left

I’m using a dedicated environment for testing with the following packages:

[deps]
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
GLMakie = "e9467ef8-e4e7-5192-8a1a-b1aee30e663a"
ReverseDiff = "37e2e3b7-166d-5795-8a7a-e32c996b4267"
Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"

I’m running Julia 1.10.5. I assume I need to update that as well? I’d rather not because I’m worried it might break other things.

Edit: fixed num_adapts to nadapts to avoid future confusion

Okay, I updated my Julia version to 1.12.2, ran ] update, and it updated Turing to 0.41.3. Perhaps it’s worth changing the readme to reflect that the newest version is not compatible with Julia 1.10.5 and below? I didn’t test any Julia releases in between, but I can if it’d be helpful.

Does this mean that both nadapts and discard_initial are set to half the number of samples by default?

The default nadapts is N/2 or 1000, whichever is smaller. And yup, the default discard_initial is nadapts.
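So for a run of N requested draws, the rule works out to something like this (a throwaway helper just to spell it out; not part of the Turing API):

default_nadapts(N) = min(N ÷ 2, 1000)   # half the requested draws, capped at 1000
default_nadapts(1000)    # 500, matching the 1500-total / 500-discarded run above
default_nadapts(10_000)  # 1000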

Sure, I can update the readme! The minimum Julia version is 1.10.8, so if you want to stay on the LTS that’s fine too. IIRC, there were base Julia bugs that caused builds to fail on < 1.10.8. I think Turing itself is fine, but multiple subdeps don’t work correctly on those versions.

Perfect, thanks so much for your help! Truly a lifesaver. I’ll use 1.10.8.

Just to be sure, the current LTS is 1.10.10, so there isn’t really a reason to use 1.10.8, and sticking with the latest LTS minimises issues like the one you’ve encountered here.

If you’re using juliaup, you can just run juliaup add lts, and then juliaup up will get you the latest LTS whenever there’s a new release.

Yes, I forgot that I’d manually installed 1.10.8 so I just added the lts channel. Thanks for the tip!
