I've been playing around with Gen.jl for a couple of weeks, and one of the things I did was compare its performance to a hand-coded model in order to measure the overhead of the underlying trace data structure, the dynamic graph machinery, and so on.

The example I chose to post here, for no particular reason other than simplicity, is a random-walk Metropolis-Hastings (MH) algorithm where the target is a standard normal distribution and the proposal is a uniform distribution centered on the current state: x_t ~ Uniform(x_{t-1} - d, x_{t-1} + d), with d = 0.25.
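For reference, here is the acceptance rule this setup implies, written out as a sketch. The symbols $x'$ (the proposed state), $\pi$ (the unnormalized target density) and $u$ (a uniform draw) are notation introduced here, not taken from Gen. Because the uniform proposal is symmetric, the proposal densities cancel and only the target ratio remains:

```latex
\begin{aligned}
\alpha(x_{t-1}, x') &= \min\!\left(1, \frac{\pi(x')}{\pi(x_{t-1})}\right),
\qquad \log \pi(x) = -\tfrac{1}{2} x^2 + \text{const}, \\
\text{accept } x' &\iff \log u < \log \pi(x') - \log \pi(x_{t-1}),
\qquad u \sim \mathrm{Uniform}(0, 1).
\end{aligned}
```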

## no Gen

```
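using BenchmarkTools

# Log-density of the standard normal target, up to an additive constant
logpdf = x -> -.5 * x^2
@benchmark begin
    nowAt = 0.1
    M = 10
    trace = Array{Float64, 1}(undef, M)
    trace[1] = nowAt
    for i = 2:M
        # Uniform(nowAt - 0.25, nowAt + 0.25) proposal
        nextMaybe = nowAt + (rand()-.5)/2
        # Accept with probability min(1, p(nextMaybe) / p(nowAt))
        if logpdf(nextMaybe) - logpdf(nowAt) > log(rand())
            nowAt = nextMaybe
        end
        trace[i] = nowAt
    end
end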
```

Benchmarking:

```
BenchmarkTools.Trial: 10000 samples with 63 evaluations.
 Range (min … max):  908.111 ns … 86.429 μs  ┊ GC (min … max): 0.00% … 98.50%
 Time  (median):     977.913 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.156 μs ±  2.299 μs  ┊ GC (mean ± σ):  5.59% ±  2.79%

 908 ns      Histogram: log(frequency) by time      2.01 μs <

 Memory estimate: 1.00 KiB, allocs estimate: 55.
```
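The 55 allocations reported for a 10-step loop may partly reflect that `logpdf` is a non-constant global, which Julia cannot specialize on, rather than the cost of the sampler itself. A minimal sketch of a function-wrapped baseline follows, assuming the goal is to time only the algorithm. The names `log_target` and `rw_mh` and the keyword arguments are hypothetical, introduced here for illustration:

```julia
using Random

# Log-density of the standard normal target, up to an additive constant.
log_target(x) = -0.5 * x^2

# Hypothetical helper: random-walk MH with a Uniform(x - d, x + d) proposal.
function rw_mh(M::Integer; d::Float64 = 0.25, x0::Float64 = 0.1,
               rng::AbstractRNG = Random.default_rng())
    chain = Vector{Float64}(undef, M)
    x = x0
    chain[1] = x
    for i in 2:M
        # Symmetric proposal, so the proposal densities cancel in the ratio.
        x_new = x + d * (2 * rand(rng) - 1)
        # Accept with probability min(1, target(x_new) / target(x)).
        if log_target(x_new) - log_target(x) > log(rand(rng))
            x = x_new
        end
        chain[i] = x
    end
    return chain
end

chain = rw_mh(10)
```

With BenchmarkTools loaded, `@benchmark rw_mh(10)` would time this version. Whether it changes the comparison with Gen much is something to measure rather than assume.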

## yes Gen

```
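using Gen, BenchmarkTools

# Target: standard normal over the single address :x
@gen function normalModel()
    x ~ normal(0, 1)
end;
# Proposal: uniform window of half-width d around the current value of :x
@gen function proposal(nowAt, d)
    x ~ uniform(nowAt[:x] - d, nowAt[:x] + d)
end;
initTrace, _ = generate(normalModel, ());
@benchmark let nowAt = initTrace
    M = 10
    trace = Array{Float64, 1}(undef, M)
    trace[1] = nowAt[:x]
    for i = 2:M
        # mh returns the (possibly unchanged) trace and an accepted flag
        nowAt, _ = mh(nowAt, proposal, (.25,))
        trace[i] = nowAt[:x]
    end
end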
```

Benchmarking:

```
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  56.838 μs …   7.063 ms  ┊ GC (min … max): 0.00% … 98.10%
 Time  (median):     59.847 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   72.540 μs ± 186.232 μs  ┊ GC (mean ± σ):  7.61% ±  2.95%

 56.8 μs      Histogram: log(frequency) by time      132 μs <

 Memory estimate: 75.45 KiB, allocs estimate: 993.
```

I get that Gen wasn't designed to be competitive on raw speed, nor to be used for such a simple algorithm, but is a slowdown of this size (roughly 60x at the median, 59.8 μs vs. 978 ns) to be expected? I tried profiling but can't properly interpret the results.

How is this overhead related to the complexity of the inference algorithm? Does it increase when using block updates of various kinds, in transdimensional algorithms, etc.?