Why/where does this allocate?

EDIT: I get my 0 allocations for tuples (which doesn’t help with the main issue of the thread) by putting @generated in front of the function here, which gets me an NTuple! It’s my first use of that obscure feature. But it still doesn’t help me enough: it’s often OK, but not here, where I need a pointer to it for ccall.

Are tuples (and NTuple) inherently slow, and do they allocate more?

julia> mytuple_200() = ([UInt8(0) for i = 1:200]...,)

julia> @time mytuple_200()  # not always 5 allocations for first...?!
  0.000057 seconds (5 allocations: 2.109 KiB)

next time around:

julia> @btime mytuple_200()
  8.320 μs (3 allocations: 464 bytes)

julia> @btime my_smallertuple_()
  229.241 ns (3 allocations: 80 bytes)

Always at least 3 allocations, independent of size.
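My best guess at where those come from (not verified against the compiler output): the comprehension builds a heap-allocated Vector, and splatting a Vector gives a tuple whose length isn’t known at compile time, so the result ends up boxed. A minimal sketch comparing that with passing the length as a Val, so it becomes part of the type; the names mytuple_splat and mytuple_val are just made up for illustration:

```julia
# Original approach: the comprehension allocates a Vector, and splatting it
# produces a tuple whose length the compiler cannot know from the types.
mytuple_splat() = ([UInt8(0) for i = 1:200]...,)

# Alternative: pass the length as Val(200), so the length is a compile-time
# constant and the result type is NTuple{200,UInt8}.
mytuple_val() = ntuple(_ -> UInt8(0), Val(200))

t1 = mytuple_splat()
t2 = mytuple_val()

println(t1 === t2)               # same contents either way
println(typeof(t2))              # the concrete tuple type
println(isbitstype(typeof(t2)))  # plain bits, so it need not live on the heap
```

Running @btime on both versions should show whether the allocations actually go away on a given Julia version.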

I was looking into SmallVector at:

and it’s based on NTuple, and my code is based on a simplified ntuple, or rather on the Base._ntuple it calls.

I realize I can’t get a pointer to it, so for me it’s not a substitute for Memory or (any) Vector type. But at least I thought I would get fewer allocations that way, and could then worry about whether getting a pointer to it would be possible with a hack.
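One non-hack route that might work here is wrapping the tuple in a Ref and declaring the ccall argument as Ref{...}, which passes a pointer to the tuple data. A rough sketch, assuming libc’s memset as a stand-in for the real C function; Buf and fill_buffer are hypothetical names, and the Ref itself may well cost one allocation:

```julia
# Hypothetical alias for the fixed-size buffer type.
const Buf = NTuple{200,UInt8}

function fill_buffer()
    # Ref(...) gives a mutable RefValue{Buf} with a stable address.
    r = Ref(ntuple(_ -> 0x00, Val(200)))
    # With the argument declared as Ref{Buf}, ccall passes a pointer to the
    # tuple data and keeps `r` alive for the duration of the call.
    ccall(:memset, Ptr{Cvoid}, (Ref{Buf}, Cint, Csize_t), r, 1, sizeof(Buf))
    return r[]   # read the (now modified) tuple back out
end

filled = fill_buffer()
println(all(==(0x01), filled))
```

The tuple value itself stays immutable; only the Ref’s storage is written to, and r[] copies the result back out.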

I was hoping this was about global scope, but no:

julia> function my_code()
         mytuple_200()
         mytuple_200()
         nothing
       end
my_code (generic function with 1 method)

julia> @btime my_code()
  16.614 μs (6 allocations: 928 bytes)

vs.

julia> @benchmark Memory{UInt8}(undef, 200)
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min … max):  19.397 ns …  28.090 μs  ┊ GC (min … max):  0.00% … 99.75%
 Time  (median):     29.724 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   73.294 ns ± 537.958 ns  ┊ GC (mean ± σ):  27.37% ±  6.50%