Turing: help with slow model

In my opinion, vectorisation can sometimes even simplify the model formulation and make it easier to read.


So running that in VS Code (which I really like) gives an error

ERROR: LoadError: MethodError: no method matching sample(::DynamicPPL.Model{var"#8#9",(:sᵇ, :nᵇ, :sⁿ, :nⁿ, :T),(:T,),(),Tuple{Array{Int64,1},Int64,Array{Int64,1},Int64,Type{Float64}},Tuple{Type{Float64}}}, ::NUTS{Turing.Core.ReverseDiffAD{false},(),AdvancedHMC.DiagEuclideanMetric}, ::MCMCThreads, ::Int64, ::typeof(chains))

I tried doing it in the REPL and it doesn’t error, but it gives a warning that it’s not actually sampling in parallel.

Warning: Only a single thread available: MCMC chains are not sampled in parallel

which is consistent with what my activity monitor says.
Following the trail of instructions, it looks like you have to start your Julia session specifying the number of threads. But doing that with julia --threads 4 gives the error

zsh: command not found: julia

I’m not much of a command line ninja, but I think this means I need to set up julia environment variables.

Am I missing something here? Seems like it should be a bit easier than this?
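Not a Turing issue as such: zsh only finds `julia` if the directory containing the binary is on your PATH (or you define an alias). A minimal sketch for the official macOS binaries; note the app version in the path below is an assumption, so adjust it to whatever is actually under /Applications:

```shell
# Put the Julia binary directory on the PATH (macOS official install).
# NOTE: "Julia-1.6.app" is an assumed version -- check your /Applications folder.
JULIA_BIN="/Applications/Julia-1.6.app/Contents/Resources/julia/bin"
export PATH="$JULIA_BIN:$PATH"

# The shell now searches that directory first when resolving commands:
echo "$PATH"
```

Put the `export PATH=...` line in your shell's startup file to make it permanent.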

This looks like you haven't set an alias for julia, so your command line cannot find the command. Do you know where the Julia binary is located?

Evidently, years ago I set up environment variables and forgot about it. Once you set it up, you do not have to set it again. On Mac and Linux, there is a hidden file called .bashrc in your home directory. Add the following line to it:

export JULIA_NUM_THREADS=4

Close VS Code and reopen it. Type the following to verify the number of threads:

Threads.nthreads()

Once you set that variable, it should always default to the specified number of threads.

I managed to get the alias working on MacOS by following the instructions here Platform Specific Instructions for Official Binaries.

Manually starting with julia --threads 8 then running the code works!

Now I feel better about spending that money on my 8 core iMac :slight_smile:

But then to get that working all the time, I edited my .bashrc file to add the line export JULIA_NUM_THREADS=8. Then I quit the terminal and VS Code.

Running Threads.nthreads() in the REPL and in terminal in VS Code gives the answer 1 :frowning: Maybe I need a computer restart?
EDIT: No, that doesn’t seem to work.


There’s a setting for the VS Code Julia extension. Type Cmd + , to enter the settings viewer, then look for

Extension > Julia > Num Threads
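If you prefer editing settings.json directly, the same setting can (to my knowledge) be written as:

```json
{
    "julia.NumThreads": 8
}
```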

Thanks @CameronBieganek, that works just great for VS Code.

It doesn’t recognise the export JULIA_NUM_THREADS=8 in the .bashrc file when manually running julia from the terminal. But I don’t do that very often, so it’s not a problem.

Thanks for all the suggestions!

Since you’re on Mac, I think you need to use a .bash_profile file instead of a .bashrc file.

Sorry, that was a typo. I did actually edit .bash_profile. So it’s a bit of a head scratcher.

Did you restart the terminal after that? Do you mean launching Julia from Terminal.app, or launching Julia from the integrated bash terminal in VS Code?

Yes, I quit the Terminal app. Opened the Terminal app, typed julia (my alias now works!), but Threads.nthreads() still returns 1.

In terms of VS Code, I get 8 threads if I run some Julia code from the editor which invokes a julia session in the terminal.

But if I manually open a julia session from the terminal inside VS Code, then the same thing happens as above from the Terminal… 1 thread.
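One thing worth double-checking when a terminal-launched session still reports 1 thread: the variable only reaches Julia if it is exported by a file your shell actually reads at startup. A small self-contained sketch of the difference between a plain assignment and an exported one (variable names here are just for illustration):

```shell
# A plain assignment stays local to the current shell...
MY_LOCAL_VAR=8
sh -c 'echo "child sees: ${MY_LOCAL_VAR:-unset}"'        # child sees: unset

# ...while an exported variable is inherited by child processes,
# such as a julia session started from this shell.
export JULIA_NUM_THREADS=8
sh -c 'echo "child sees: ${JULIA_NUM_THREADS:-unset}"'   # child sees: 8
```

So if the startup file sets the variable without `export`, or the shell never reads that file, Julia starts with the default of one thread.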

Huh, that’s weird. I’m not sure why that would be.

Yeah, when I want to run Julia code in VS Code I either run the code from the editor with one of the keyboard shortcuts, or, more often, I first launch a Julia session by hitting Cmd + Shift + P and then running the Julia: Start REPL command (which does have a keyboard shortcut, but it’s more complicated).


But it’s basically working in VS Code now using my regular workflow, so that’s great. Thanks for the tips :+1:


From what I can tell you are not using bash but zsh. So you will need to edit the zsh profile file, not the one for bash. :wink:


In my defence, I grew up on Matlab and I’m not a computer scientist :stuck_out_tongue:

FYI for anyone else…

In a terminal window, type open ~/.zshrc to open your .zshrc file, then add the following line:

export JULIA_NUM_THREADS=8

Save, close. Quit and restart terminal, and you’re set.
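For what it’s worth, you can also pick up the change without restarting the terminal by running source ~/.zshrc. A self-contained sketch of the mechanism, using a temporary file as a stand-in for ~/.zshrc:

```shell
# Stand-in for ~/.zshrc; a real setup would append to the actual file.
rcfile="$(mktemp)"
echo 'export JULIA_NUM_THREADS=8' >> "$rcfile"

# '.' (a.k.a. 'source') runs the file in the *current* shell,
# so the export takes effect immediately.
. "$rcfile"
echo "$JULIA_NUM_THREADS"    # prints: 8

rm -f "$rcfile"
```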


Doing some final tests on this I’m finding the fully vectorized code to take ~60 seconds, but the partially vectorized code to take ~120 seconds. Weird.

I must have mixed up the code somehow. :roll_eyes: I can reproduce your findings on my computer. The fully vectorized model is about twice as fast. It would be nice if both versions performed similarly. Sometimes vectorized code is easier to read and write. In other cases, it makes the code awkward.

On the surface it looks like the model with the loop should have fewer memory allocations, but in practice it has about 3x more. It must be related to the way that ReverseDiff computes the gradients under the hood.


I also started using the MCMCThreads approach, and it seems to work. However, I notice that for a long time there is no ETA, so I have no idea how far along the sampling is. Is this ‘expected’ behavior? Once I noticed it suddenly jump to 50%.

This could happen on the first execution when you start a new session. Unfortunately, compilation latency is long on the first execution. Do you still experience this issue on the second execution?

It seems quite consistent, unrelated to the first execution, weirdly enough.
It really seems to only show progress whenever one entire thread is done, because now I saw it jump to 25%, for example (and I currently have 4 threads sampling).