These allocations should be a warning sign! There shouldn't be any. Remember to interpolate your variables when benchmarking. Here's how to rewrite your test in a more efficient, allocation-free way (including the original code, for self-containment):

```
create_new_vec(x) = begin
    n = length(x)
    new_vec = typeof(x)(undef, Int((n*n - n)/2) + n)
    upto = 0
    @inbounds for i = 1:n
        for j = i:n
            upto += 1
            new_vec[upto] = x[i]*x[j]
        end
    end
    new_vec
end

struct MyVec{T} x::Vector{T} end

@inline function Base.getindex(v::MyVec, I)
    (i, j) = ind(I, length(v.x))
    v.x[i]*v.x[j]
end

# Invert the packed linear index I back to (i, j) by solving a quadratic
@inline function ind(I, N)
    b = -(2N + 3)
    c = 2(I + N)
    delta = b^2 - 4c
    i = -0.5*(b + sqrt(delta)) |> floor |> Int
    j = I - N*(i - 1) + div((i - 1)*i, 2)
    (i, j)
end
```
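Where does the quadratic in `ind` come from? The loop above stores the product of `(i, j)` (with `j ≥ i`) at linear position `I = j + N*(i-1) - (i-1)*i÷2`, so recovering `i` from `I` means solving for the row whose starting index is at most `I`, which is exactly the quadratic with the `b` and `c` above. A small sketch checking the round trip (`lin` is a helper name I'm introducing here; `ind` is repeated so the block is self-contained):

```julia
# Linear position of (i, j), j >= i, in the packed upper triangle (row-major)
lin(i, j, N) = j + N*(i - 1) - (i - 1)*i ÷ 2

# Same inverse mapping as in the post above
function ind(I, N)
    b = -(2N + 3)
    c = 2(I + N)
    delta = b^2 - 4c
    i = -0.5*(b + sqrt(delta)) |> floor |> Int
    j = I - N*(i - 1) + div((i - 1)*i, 2)
    (i, j)
end

# Round-trip check for a small N: every (i, j) survives lin -> ind
N = 50
for i in 1:N, j in i:N
    @assert ind(lin(i, j, N), N) == (i, j)
end
```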

Testing it:

```
julia> using BenchmarkTools
julia> x = rand(Float32, 1000); y1 = create_new_vec(x); y2 = MyVec(x);
julia> test(v) = @btime (s = 0f0; @inbounds for i=1:length($y1) s += $v[i] end; s);
julia> all(y1[i] == y2[i] for i in 1:length(y1))
true
julia> test(y1)
557.438 μs (0 allocations: 0 bytes)
127086.86f0
julia> test(y2)
2.883 ms (0 allocations: 0 bytes)
127086.86f0
```

So it's a lot faster overall, but the difference from the lookup table is now huge – around 5 times slower in this test. One way to speed it up is to avoid the costly square root by building an enormous hard-coded binary tree. Bear with me, this is a major hack:

```
function bin(a, b, n)
    idx(k) = 1 + (k - 1)*(2n + 2 - k) ÷ 2
    if a == b
        "($a,k-$(idx(a)-a))"
    else
        m = 1 + (a + b) ÷ 2
        "k<$(idx(m)) ? $(bin(a, m-1, n)) : $(bin(m, b, n))"
    end
end
eval(Meta.parse("ind_bin(k) = $(bin(1, 1000, 1000))"))
```
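To sanity-check the generated tree without wading through the 1000-row version, the same generator can be run on a small case and compared against the arithmetic layout (`ind_bin_small` and `lin` are names I'm introducing for this check; `bin` is repeated for self-containment):

```julia
function bin(a, b, n)
    idx(k) = 1 + (k - 1)*(2n + 2 - k) ÷ 2
    if a == b
        "($a,k-$(idx(a)-a))"
    else
        m = 1 + (a + b) ÷ 2
        "k<$(idx(m)) ? $(bin(a, m-1, n)) : $(bin(m, b, n))"
    end
end

# Generate the tree for n = 10 instead of 1000
eval(Meta.parse("ind_bin_small(k) = $(bin(1, 10, 10))"))

# Closed-form linear index of (i, j) in the packed upper triangle
lin(i, j, n) = j + n*(i - 1) - (i - 1)*i ÷ 2

# The generated branches must invert lin for every valid k
for i in 1:10, j in i:10
    @assert ind_bin_small(lin(i, j, 10)) == (i, j)
end
```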

This creates a method which looks like this:

```
```
ind_bin(k) = k<375251 ? k<218876 ? k<117251 ? k<61048 ? k<31505 ?
k<15881 ? k<7973 ? k<3995 ? k<2000 ? k<1001 ? (1,k-0) : (2,k-999) :
k<2998 ? (3,k-1997) : (4,k-2994) : k<5986 ? k<4991 ... (27500 chars omitted)
```

Then add the boilerplate code:

```
struct MyVecBin{T} x::Vector{T} end

@inline function Base.getindex(v::MyVecBin, I)
    (i, j) = ind_bin(I)
    v.x[i]*v.x[j]
end
```

Testing it:

```
julia> y3 = MyVecBin(x);
julia> all(y1[i] == y3[i] for i in 1:length(y1))
true
julia> test(y3)
2.178 ms (0 allocations: 0 bytes)
127086.86f0
```

So, faster than square root, but still a lot slower than the lookup table. I'm sure you can do better. (Notably, if you're actually iterating over all values, as opposed to random access, the indexes can be computed easily.)
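For instance, a full sweep never needs the inverse mapping at all: iterating `(i, j)` directly visits the packed triangle in order. A sketch (`sum_packed` is a name I'm introducing for illustration):

```julia
# Sum all products of the packed upper triangle without ever inverting an index
function sum_packed(x)
    n = length(x)
    s = zero(eltype(x))
    @inbounds for i in 1:n, j in i:n
        s += x[i]*x[j]
    end
    s
end

# Products for [1, 2, 3] are 1, 2, 3, 4, 6, 9
sum_packed(Float32[1, 2, 3])  # -> 25.0f0
```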