# Creating a performant function using metaprogramming

I have a function that constructs a NamedTuple out of a vector of numbers subject to some transformation, something like

``````function vec2tuple(x)
( a = x, b = 1 + exp(x), c = 2 + exp(x), d = exp(x) / (1 + exp(x))
end
``````

This function does 2 things. It transforms individual elements of `x` an appropriate transformation and then assembles the results into a named tuple with keys `a`, `b`, `c`, and `d`.

`vec2tuple` runs very fast and doesn’t allocate. But the downside is code duplication, which gets tedious as the size of the tuple increases. Also, vec2tuple is kind of hard to read b/c it doesn’t make explicit the logic that informs the choice of the transformation. Ultimately, every `xi` in `x` will get transformed based on a known range of values each element of the NamedTuple can take. In the example above, a can take any value on the real line, b must be in (1,Inf), c must be in (2,Inf), and d must be in (0,1).

What I really want is to specify a set of rules in a legible way and then have these transformations be constructed accordingly. Here is an example for how to express these rules, but I’m not wedded to this.

``````bounds = (a = (-Inf,Inf), b = (1,Inf), c = (2,Inf), d = (0,1) )
``````

I first decided to tackle this using functional programming. I wrote a function that returns anonymous functions to perform the transformation for any kind of open interval:

``````function get_transform_function(arg)
lb, ub = arg
if lb>=ub
throw(ArgumentError("Upper bound must be strictly greater than lower bound"))
end
if lb==-Inf && ub==Inf
return x -> x
elseif lb!=Inf && ub==Inf
return x -> lb + exp(x)
elseif lb==-Inf && ub!=Inf
return x -> ub-exp(-x)
else
return x -> lb + (ub-lb)*exp(x)/(1+exp(x))
end
end
``````

I then constructed rules for each of a,b, c, and d:

``````rules = map(x -> get_transform_function(x), bounds)
``````

And wrote `transform` to call instead `vec2tuple`:

``````function transform(rules::NamedTuple{Names,<:Tuple}, vec::AbstractVector) where Names
N = length(rules)
return NamedTuple{Names}(ntuple(i -> rules[i](vec[i]), N))
end
``````

Problem is, `transform` is about 20x slower than `vec2tuple` because it allocates a lot.

It seems like the solution should be in meta-programming. Something as performant as `vec2tuple` should be generated once based on the information in `bounds` before being called many times.

I tried rewriting `get_transform_function` so that it takes an additional argument for the value to be transformed and returns expressions instead of anonymous functions, and then construct a transformation as follows. Note that I’m passing an expression as an argument to `get_transform_function` so it gets interpolated as `x[i]`, not any specific value.

``````names=keys(bounds)
args=[get_transform_function(:(x[\$i]), v) for (i,v) in enumerate(bounds)]
tup=:(\$(Expr(:tuple, arguments...)))
transform=:(NamedTuple{\$names}(  :(\$(Expr(:tuple, arguments...))) ))
``````

But `transform` is an expression that needs to be evaluated for a given value of `x` when `x` is in scope. This sounds like it should be a macro that returns a `vec2tuple(x)`-style function, but I can’t quite figure out how to write this.

And furthermore, what if I don’t know `bounds` at compile-time? I can still know it before I need to call `vec2tuple` a whole bunch of time, so I don’t mind paying a price to construction this function once during runtime, as long as it’s fast when I call it.

Sorry for the long post and a less-than-specific question, but any ideas are really welcome! Thanks.

If the vector can be a static vector, or of you pass the size of the vector, N, as a type parameter, that version should be fast. I would totally opt for that (the key is: N must be known at compile time, which means at the moment you call the function).

MWE:

``````julia> f(x,y,::Val{N}) where N = ntuple(i -> x[i] => y[i], N)
f (generic function with 1 method)

julia> x = [:a,:b]; y = [1,2]; N=2;

julia> @btime f(\$x,\$y,\$(Val(N)))
3.181 ns (0 allocations: 0 bytes)
(:a => 1, :b => 2)

julia> f(x,y,Val(N))
(:a => 1, :b => 2)

``````

or

``````julia> using StaticArrays

julia> f(x,y::SVector{N}) where N = ntuple(i -> x[i] => y[i], N)
f (generic function with 1 method)

julia> x = [:a,:b,:c]; y = SVector{3,Float64}(1,2,3);

julia> @btime f(\$x,\$y)
3.493 ns (0 allocations: 0 bytes)
(:a => 1.0, :b => 2.0, :c => 3.0)

``````

I get about the same performance for both functions:

``````julia> using BenchmarkTools

julia> @btime vec2tuple([1.0,2.0,3.0,4.0]);
61.417 ns (1 allocation: 112 bytes)

julia> @btime transform(\$rules, [1.0,2.0,3.0,4.0]);
64.351 ns (1 allocation: 112 bytes)
``````

(This is on Julia 1.6.2 on an M1 using rosetta.)

Edit: I also tried Julia 1.7.0-beta4 M1-native, which cut the times in half, but still no difference between the two functions.

This is what you need IMO:

``````julia> function transform(rules::NamedTuple, vec::AbstractVector, ::Val{N}) where N
return ntuple(i -> rules[i](vec[i]), N)
end
transform (generic function with 1 method)

julia> @btime transform(\$rules, \$[1.0,2.0,3.0,4.0], \$(Val(4)))
26.567 ns (0 allocations: 0 bytes)
(1.0, 8.38905609893065, 22.085536923187668, 0.9820137900379085)

``````

Sorry for the late response, guys. Crazy week and only now getting a chance to get back to this.

I can replicate your results for 4-element vectors. But something changes drastically between 10 and 11. Here’s a self-contained MWE:

``````function transform(rules::NamedTuple, vec::AbstractVector, ::Val{N}) where N
return ntuple(i -> rules[i](vec[i]), N)
end

keynames = [Symbol(Char(n+'0')) for n=49:(49+25)] # creates :a, :b, :c, ... :z

N = 10
x = rand(N)
rules = NamedTuple{tuple(keynames[1:N]...)}(tuple([exp for i in 1:N]...))
@btime transform(\$rules,\$x,\$(Val(N)))

N = 11
x = rand(N)
rules = NamedTuple{tuple(keynames[1:N]...)}(tuple([exp for i in 1:N]...))
@btime transform(\$rules,\$x,\$(Val(N)))
``````

Output is

``````  64.257 ns (0 allocations: 0 bytes)
566.486 ns (13 allocations: 448 bytes)
``````

Just to confirm the discontinuity at 11, I ran it for the whole alphabet:

``````times = Vector{Float64}(undef,26)
allocs = Vector{Float64}(undef,26)
memory = Vector{Float64}(undef,26)
for N=1:26
x = rand(N)
rules = NamedTuple{tuple(keynames[1:N]...)}(tuple([exp for i in 1:N]...))
bres = @benchmark transform(\$rules,\$x,\$(Val(N)))
times[N] = mean(bres.times)
allocs[N] = mean(bres.allocs)
memory[N] = mean(bres.memory)
end
``````

Allocations: Time: Note that the discontinuity isn’t just in levels but also in slope (flat when N<=10, increasing linearly when N>10). Same thing happens if I make `x` an SVector and then dispatch on `vec::SVector{N} where N`. Of course, this degradation does not happen with `vec2tuple`.

In my application, I don’t expect N to be arbitrarily large, but values in the range of 11-30 are possible, so it would be nice to avoid this degradation. Sorry about using N=4 in my original example. It was just an example, so I didn’t realize your suggestions would be optimized for N<=10.

Take a look at this thread: Efficient recursive tuple construction

it is the same problem. At some point you just have to use standard arrays.

Indeed. Just looked at the code for `ntuple()` and it’s just manually optimized for N=1,…10:
julia/ntuple.jl at master · JuliaLang/julia (github.com)

2 Likes

At least it is easy to optimize it to other sizes 1 Like

Actually, it looks like the manual optimization is just for the standard call `ntuple(f,N)`. But you can avoid the N=11 cliff by calling `ntuple(f,Val(N))` (just learned that you can do this). So basically just do propagate your suggested improvement for my function down to `ntuple`.

1 Like

That’s so funny. The cause of the performance dropoff is totally obvious if you take a second to look for it. I would never have known.