StackOverflowError in Bayesian Neural Networks Tutorial

Hi there,

I am learning Bayesian Neural Networks (BNN) using Turing. I copied the code from the tutorial at https://turing.ml/dev/tutorials/3-bayesnn/.

The original code trains a BNN model on a synthetic dataset with 80 rows. The step “ch = sample(bayes_nn(hcat(xs...), ts), HMC(0.05, 4), N);” takes 0:02:03 on my machine. If I change “N = 80” to N=800, it takes 0:03:25. Pretty fast! However, if I change it to N=8000, it throws a StackOverflowError. I have copied part of the detailed error output at the bottom of this post.

I want to build a BNN model to predict Admission Yield. The dataset has about 40,000 rows and 90 variables, so I need to learn how to train a BNN model on a relatively large dataset. Could you please help me resolve this error? Please let me know if I need to provide any other information.

Thanks,
Chuan

StackOverflowError:
in top-level scope at Learn Turing_20200316.jl:105
in sample at Turing\azHIm\src\inference\Inference.jl:136
in #sample#1 at Turing\azHIm\src\inference\Inference.jl:136
in sample at Turing\azHIm\src\inference\Inference.jl:148
in #sample#2 at Turing\azHIm\src\inference\Inference.jl:149
in Sampler at Turing\azHIm\src\inference\hmc.jl:302
in DynamicPPL.Sampler at Turing\azHIm\src\inference\hmc.jl:310
in Turing.Inference.HMCState at Turing\azHIm\src\inference\hmc.jl:533
in #HMCState#52 at Turing\azHIm\src\inference\hmc.jl:562
in sample_init at AdvancedHMC\haUrH\src\sampler.jl:13
in phasepoint at AdvancedHMC\haUrH\src\hamiltonian.jl:129
in phasepoint at AdvancedHMC\haUrH\src\hamiltonian.jl:59
in ∂H∂θ at AdvancedHMC\haUrH\src\hamiltonian.jl:28
in ∂logπ∂θ at Turing\azHIm\src\inference\hmc.jl:401
in gradient_logp at Turing\azHIm\src\core\ad.jl:73
in gradient_logp_reverse at Turing\azHIm\src\core\ad.jl:141
in at Tracker\cpxco\src\back.jl:149
in #18 at Tracker\cpxco\src\back.jl:140
in back at Tracker\cpxco\src\back.jl:125
in back_ at Tracker\cpxco\src\back.jl:113
in foreach at base\abstractarray.jl:1921
in #16 at Tracker\cpxco\src\back.jl:113
in back at Tracker\cpxco\src\back.jl:125
in back_ at Tracker\cpxco\src\back.jl:113
in foreach at Tracker\cpxco\src\back.jl:113
in back at Tracker\cpxco\src\back.jl:125
in back_ at Tracker\cpxco\src\back.jl:113
in foreach at Tracker\cpxco\src\back.jl:113
in back at Tracker\cpxco\src\back.jl:125
in back_ at Tracker\cpxco\src\back.jl:113
in foreach at Tracker\cpxco\src\back.jl:113
in back at Tracker\cpxco\src\back.jl:125
in back_ at Tracker\cpxco\src\back.jl:113
in foreach at Tracker\cpxco\src\back.jl:113
in back at Tracker\cpxco\src\back.jl:125
in back_ at Tracker\cpxco\src\back.jl:113
in foreach at Tracker\cpxco\src\back.jl:113
in back at Tracker\cpxco\src\back.jl:125
in back_ at Tracker\cpxco\src\back.jl:113
in foreach at Tracker\cpxco\src\back.jl:113
in back at Tracker\cpxco\src\back.jl:125
in back_ at Tracker\cpxco\src\back.jl:113
in foreach at Tracker\cpxco\src\back.jl:113
in back at Tracker\cpxco\src\back.jl:125
in back_ at Tracker\cpxco\src\back.jl:113
in foreach at Tracker\cpxco\src\back.jl:113
in back at Tracker\cpxco\src\back.jl:125
in back_ at Tracker\cpxco\src\back.jl:113

I have seen this error before, and it seems to be a Tracker issue with large loops. Zygote doesn’t have this problem. If you switch to Turing#master, you can use Zygote for AD with:

using Zygote, Turing; Turing.setadbackend(:zygote)

However, Zygote will take a lot of memory when compiling the gradient the first time.
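
The repeated back / back_ / foreach frames in the trace above suggest why loop length matters: a backward pass that recurses once per recorded operation needs a call stack as deep as the loop is long. Below is a toy sketch of that idea in plain Julia. It is not Tracker's actual implementation; Node, build_chain, back_recursive!, and back_iterative! are names made up for this illustration.

```julia
# Toy computation-graph node: a gradient accumulator, the local derivative
# with respect to its parent, and a link to the parent (nothing for the input).
mutable struct Node
    grad::Float64
    local_deriv::Float64
    parent::Union{Node,Nothing}
end

# Record y = x * w * w * ... (n multiplications), the way a long loop
# would add one node per iteration to an AD graph.
function build_chain(n, w)
    input = Node(0.0, 1.0, nothing)
    node = input
    for _ in 1:n
        node = Node(0.0, w, node)
    end
    return input, node
end

# Recursive backward pass: one stack frame per node, so the call depth
# grows linearly with the number of loop iterations.
function back_recursive!(node::Node, g)
    node.grad += g
    node.parent === nothing && return nothing
    return back_recursive!(node.parent, g * node.local_deriv)
end

# Iterative backward pass: same arithmetic, constant call depth.
function back_iterative!(node::Node, g)
    current = node
    while current !== nothing
        current.grad += g
        g *= current.local_deriv
        current = current.parent
    end
    return nothing
end
```

For a short chain both passes give the same gradient at the input. For a very long chain, the recursive version is the one that can run out of stack, while the iterative version only needs more heap memory.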

Thanks, Mohamed. I have added Turing#master and am testing it with Turing.setadbackend(:zygote). It runs!

However, for 8000 rows the estimated run time is 10:39:00, which is much longer than the 0:03:25 for N=800.

Thanks,
Chuan

Another question I want to ask is about ForwardDiff. If I use Turing.setadbackend(:forward_diff), the program runs quickly with 8000 rows, taking 0:05:40. However, the acceptance rate is constantly 0 across the 5000 HMC samples, so the std of nn_params is exactly 0. This is not the case with 80 rows: there, the std is larger than 0 for each of the nn_params in the chain ch from “ch = sample(bayes_nn(hcat(xs...), ts), HMC(0.05, 4), N);”.

With HMC, as you increase the number of data points, you need to lower the step size. Otherwise, just use NUTS.
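
To see why the step size has to shrink as the dataset grows, here is a minimal sketch on a toy problem rather than the tutorial's BNN. It assumes x_i ~ Normal(mu, 1) with a flat prior on mu, so the posterior is Normal(xbar, 1/n) and gets narrower as n grows. For this quadratic potential with unit mass, the leapfrog integrator is only stable when step_size < 2/sqrt(n), which is about 0.22 for n = 80 but about 0.022 for n = 8000. The names potential, grad_potential, leapfrog, and acceptance_rate are made up for this illustration and are not part of Turing or AdvancedHMC.

```julia
using Random, Statistics

# Negative log posterior (up to a constant) and its gradient for the toy
# model x_i ~ Normal(mu, 1) with a flat prior on mu (an assumed stand-in
# for the BNN posterior).
potential(mu, n, xbar) = 0.5 * n * (mu - xbar)^2
grad_potential(mu, n, xbar) = n * (mu - xbar)

# One HMC trajectory: n_leapfrog leapfrog steps of size step_size, unit mass.
function leapfrog(mu, p, step_size, n_leapfrog, n, xbar)
    p -= 0.5 * step_size * grad_potential(mu, n, xbar)    # initial half step
    for step in 1:n_leapfrog
        mu += step_size * p                               # full position step
        if step < n_leapfrog
            p -= step_size * grad_potential(mu, n, xbar)  # full momentum step
        end
    end
    p -= 0.5 * step_size * grad_potential(mu, n, xbar)    # final half step
    return mu, p
end

# Run plain HMC with a Metropolis correction and return the fraction of
# accepted proposals.
function acceptance_rate(n; step_size=0.05, n_leapfrog=4, n_samples=2000, seed=1)
    rng = MersenneTwister(seed)
    xbar = mean(randn(rng, n))   # synthetic data with true mean 0
    mu = xbar                    # start at the posterior mode
    accepted = 0
    for _ in 1:n_samples
        p = randn(rng)
        mu_new, p_new = leapfrog(mu, p, step_size, n_leapfrog, n, xbar)
        h_old = potential(mu, n, xbar) + 0.5 * p^2
        h_new = potential(mu_new, n, xbar) + 0.5 * p_new^2
        if log(rand(rng)) < h_old - h_new
            mu = mu_new
            accepted += 1
        end
    end
    return accepted / n_samples
end
```

With the defaults (step size 0.05 and 4 leapfrog steps, as in HMC(0.05, 4)), acceptance_rate(80) should come out close to 1, while acceptance_rate(8000) should be close to 0 because 0.05 is above the stability limit. Calling acceptance_rate(8000; step_size=0.005) brings it back up. A BNN posterior is not Gaussian, so the exact threshold will differ, but the same scaling is a plausible explanation for the zero acceptance rate reported above.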

Thanks for your reply, Mohamed. I will lower the step size.