Subset tuples with macros

The problem: I have some data stored in tuples and want to access it in a type-stable way:

x = ntuple(i -> UInt8(i), 80)
f(y) = y[1:4]
@code_warntype f(x)

This returns a tuple whose length is not known to the compiler.
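
For reference, the inferred return type reported by @code_warntype looks roughly like this (the exact output depends on the Julia version):

Body::Tuple{Vararg{UInt8,N} where N}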

f(y) = (y[1], y[2], y[3], y[4])

would be type stable, but it is tedious to type if the resulting tuples are long.
Is it possible to create a macro (without using eval) to save some code?

E.g. something like:

julia> @ntuple(y, 1:4)

should expand to

:((y[1], y[2], y[3], y[4]))

Hi; I’m a Julia beginner myself, so this is also more of a question of whether this macro is any good…

macro ntplaccess(sym, idx)
    elements = []
    for i in (@eval $idx)
        push!(elements, :($sym[$i]))
    end

    return :( ($(elements...),) )
end

f(y) = @ntplaccess y 1:4

Oh and @code_warntype gives me

Body::NTuple{4,Any}
12 1 ─ %1 = (Base.getindex)(Main.y, 1)::Any
   │   %2 = (Base.getindex)(Main.y, 2)::Any
   │   %3 = (Base.getindex)(Main.y, 3)::Any
   │   %4 = (Base.getindex)(Main.y, 4)::Any
   │   %5 = (Core.tuple)(%1, %2, %3, %4)::NTuple{4,Any}
   └──      return %5

but with everything that is red in my REPL shown as blue here. Why is it Any here, instead of UInt8 as when I explicitly write out the tuple?

If you are always using a simple range (like 1:4, or even 11:14), why not use the ntuple method again, like this?

f(y) = ntuple(i->y[i], 4)
julia> @code_warntype f(x)
Body::NTuple{4,UInt8}
1 ─ %1 = (Base.getfield)(y, 1, true)::UInt8
│   %2 = (Base.getfield)(y, 2, true)::UInt8
│   %3 = (Base.getfield)(y, 3, true)::UInt8
│   %4 = (Base.getfield)(y, 4, true)::UInt8
│   %5 = (Core.tuple)(%1, %2, %3, %4)::NTuple{4,UInt8}
└──      return %5

And if you want this to work for arbitrary iterable sequences of indices (not just ranges like i:j), then I don’t know how a macro could evaluate this sequence (assuming it is const) to generate the code, without resorting to using eval or the like.
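
(One eval-free possibility, sketched here as an aside rather than taken from the posts: carry the indices in the type domain via Val and let a generated function unroll the accesses. The name subset is made up.)

@generated function subset(x, ::Val{idxs}) where {idxs}
    # idxs is a compile-time constant tuple of indices
    exprs = [:(x[$i]) for i in idxs]
    return :(($(exprs...),))
end

subset(x, Val((3, 7, 12)))  # unrolls to (x[3], x[7], x[12])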


Also, consider proper boundscheck elision.

Base.@propagate_inbounds function load_ntup(x, offset, ::Val{N}) where N
    @boundscheck checkbounds(x, offset:offset+N)
    return ntuple(i -> (@inbounds x[i+offset]), N)
end

That can become a single vector-load instruction (also for extracting a tuple from an Array). Bounds checks inside the loop prevent SIMD, so it is better to do them once, outside.
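
To illustrate the intended usage, here is a small sketch (assuming x is an Array, so that checkbounds applies; the function names are made up):

v = collect(UInt8, 1:80)

checked(v)   = load_ntup(v, 4, Val(4))            # bounds checked once, up front
unchecked(v) = @inbounds load_ntup(v, 4, Val(4))  # the caller's @inbounds propagates in and elides the check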


Two things:

  • Use esc(sym). You can see in your @code_warntype output that it's not using the y that is the argument to f, but Main.y instead. This is macro hygiene at work.
  • Don't @eval idx. Instead, dig into the AST of 1:4 manually and grab the endpoints. Of course, you'll want to add error handling to ensure callers don't give you what you don't want. The Meta module can be helpful as you explore what the AST looks like:
julia> idx = :(1:4)
:(1:4)

julia> Meta.show_sexpr(idx)
(:call, :(:), 1, 4)

julia> idx.head
:call

julia> idx.args
3-element Array{Any,1}:
  :(:)
 1
 4

Thanks everyone! Playing around with the answers, I tried:

f1(x) = (x[4], x[5], x[6], x[7])
f2(x) = ntuple(i -> x[i + 3], 4)
Base.@propagate_inbounds function f3(x, offset, ::Val{N}) where N
    # this line causes a runtime error (presumably because checkbounds has no method for tuples)
    # @boundscheck checkbounds(x, offset:offset+N)
    return ntuple(i -> (@inbounds x[i+offset - 1]), N)
end
f4(x, from, to) = ntuple(i -> x[i + from - 1], to - from + 1)
f5(x, from, to) = ntuple(i -> (@inbounds x[i + from - 1]), to - from + 1)
f6(x, ::Val{from}, ::Val{to}) where from where to = ntuple(i -> x[i + from - 1], to - from + 1)

macro ntplaccess(sym, idx)
    elements = []
    for i in (idx.args[2]:idx.args[3])
        push!(elements, :($(esc(sym))[$i]))
    end

    return :( ($(elements...),) )
end
@ntplaccess(x, 1:4)
f7(x) = @ntplaccess x 4:7

where f1, f2, f6, and f7 give the best @code_warntype:

Body::NTuple{4,UInt8}
1 ─ %1 = (Base.getfield)(x, 4, true)::UInt8
│   %2 = (Base.getfield)(x, 5, true)::UInt8
│   %3 = (Base.getfield)(x, 6, true)::UInt8
│   %4 = (Base.getfield)(x, 7, true)::UInt8
│   %5 = (Core.tuple)(%1, %2, %3, %4)::NTuple{4,UInt8}
└──      return %5

Speed-wise they are all more or less the same, at about 0.017 ns. I like the ones that use value types the most.

I like the macro version. Why? Because you can also check (in the macro) which type of indexing is used. That is, one could call the macro with a single index, range indexing, an array of indices, …

Then I was thinking; why specify the symbol and the indexing as two arguments? There’s the @view macro, which you can also use as @view x[1:3]. I didn’t look up how it is implemented but wanted to come up with my own solution (as an exercise), and it works!

macro ntplaccess(expr)

    sym = expr.args[1] # the symbol
    idx = expr.args[2] # the index, or the indexing expression

    if idx isa Int
        # single index
        return :($(esc(sym))[$idx])

    else
        # range, array, tuple
        elements = []

        if idx.head == :call && idx.args[1] == :(:)
            # range
            for i in (idx.args[2]:idx.args[3])
                push!(elements, :($(esc(sym))[$i]))
            end

        elseif idx.head == :tuple || idx.head == :vect
            # tuple or array
            for i in idx.args
                push!(elements, :($(esc(sym))[$i]))
            end
        end

        return :( ($(elements...),) )
    end

end

# can do...
f(y) = @ntplaccess y[1]
f(y) = @ntplaccess y[[3,5]]
f(y) = @ntplaccess y[(1,4,7)]
f(y) = @ntplaccess y[1:4]

I would not worry too much about enclosing (from, to) indices in value types, since I would expect that these constructs would always (or at least most of the time) be used in larger functions, with possibly variable data, but always the same indices.

Say you define f4 in the same way as above:

f4(x, from, to) = ntuple(i -> x[i + from - 1], to - from + 1)

Of course a top-level call is not type stable:

julia> @code_warntype f4(x, 1, 4)
Body::Tuple{Vararg{UInt8,N} where N}
[...]

But as soon as you enclose it in a larger function, if the index arguments are constant (as is the case above and in all examples in this thread so far), then constant propagation simplifies everything and you're type-stable again:

julia> big_function(x) = f4(x, 1, 4);
julia> @code_warntype big_function(x)
Body::NTuple{4,UInt8}
[...]

From there, regular compiler optimizations further simplify the code and essentially remove all integer operations on the indices, so that what you get in the end performs about as well as the by-hand tuple construction:

julia> @btime big_function($x)
  0.024 ns (0 allocations: 0 bytes)
(0x01, 0x02, 0x03, 0x04)

julia> by_hand(x) = (x[1], x[2], x[3], x[4]);
julia> @btime by_hand($x)
  0.025 ns (0 allocations: 0 bytes)
(0x01, 0x02, 0x03, 0x04)

I tend to like macros too. Still, this is probably not what I would do here: the polymorphism you introduce in your macro is essentially a manual implementation of some sort of dispatching depending on the index type.

Therefore, I would probably define a function to perform what I need, with multiple methods depending on the index type. For example, this is what a minimal implementation for unit ranges could look like:

function ntpl(x, idx::UnitRange)
    from = idx.start
    to = idx.stop
    ntuple(i->x[i+from-1], to - from + 1)
end
julia> f(x) = ntpl(x, 2:4)
f (generic function with 1 method)

julia> @code_warntype f(x)
Body::Tuple{UInt8,UInt8,UInt8}
[...]

julia> @btime f($x)
  0.024 ns (0 allocations: 0 bytes)
(0x02, 0x03, 0x04)

For tuples of indices, the implementation could be:

function ntpl(x, idx::Tuple)
    ntuple(i->x[idx[i]], length(idx))
end
julia> g(x) = ntpl(x, (2,3,4))
g (generic function with 1 method)

julia> @code_warntype g(x)
Body::Tuple{UInt8,UInt8,UInt8}
[...]

julia> @btime g($x)
  0.024 ns (0 allocations: 0 bytes)
(0x02, 0x03, 0x04)

And a new method could be added for each useful type of index collection (StepRange and so on).
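
For example, a method for step ranges might look like this (just a sketch, not something defined above; type stability again relies on constant propagation of the index argument):

function ntpl(x, idx::StepRange)
    # same idea as the UnitRange method, but honoring the step
    ntuple(i -> x[first(idx) + (i - 1) * step(idx)], length(idx))
end

ntpl(x, 1:2:7)  # (x[1], x[3], x[5], x[7])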


If you want a bracket-like syntax (without overloading the standard Base.getindex), you can then implement only the syntax-to-syntax transformation in a macro:

macro ntpl(expr)
    sym = expr.args[1]
    idx = expr.args[2]
    quote
        ntpl($(esc(sym)), $(esc(idx)))
    end
end
julia> @macroexpand @ntpl x[2:4]
quote
    #= REPL[25]:5 =#
    (Main.ntpl)(x, 2:4)
end

julia> h(x) = @ntpl x[2:4]
h (generic function with 1 method)

julia> @code_warntype h(x)
Body::Tuple{UInt8,UInt8,UInt8}
[...]

julia> @btime h($x)
  0.024 ns (0 allocations: 0 bytes)
(0x02, 0x03, 0x04)

This has the advantage (IMO) that specific implementations of the NTuple extraction can be added for many types of indices, without having to change the macro itself.
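
For instance (a small sketch), the tuple-of-indices case is already covered through the ntpl(x, idx::Tuple) method above, without touching the macro:

julia> k(x) = @ntpl x[(2, 4, 6)]
k (generic function with 1 method)

julia> k(x)
(0x02, 0x04, 0x06)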


FWIW, there’s a similar utility in Unrolled.jl:

julia> using Unrolled
[ Info: Precompiling Unrolled [9602ed7d-8fef-5bc8-8597-8f21381861e8]

julia> @code_warntype (1,2,3)[@fixed_range 2:3]
Body::Tuple{Int64,Int64}
1 ─ %1 = (Base.getfield)(seq, 2, true)::Int64
│   %2 = (Base.getfield)(seq, 3, true)::Int64
│   %3 = (Unrolled.tuple)(%1, %2)::Tuple{Int64,Int64}
└──      return %3

help?> @fixed_range
  @fixed_range 3:10 behaves like the standard range 3:10, but is stored within
  the type system, so that some_tuple[@fixed_range 3:10] is type-stable. Also
  supports some_tuple[@fixed_range 3:end-5] 
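
So, as a sketch, the subsetting from the original question could be written like this (the name sub4 is made up):

sub4(y) = y[@fixed_range 1:4]   # should be inferred as NTuple{4,UInt8} for the x defined at the top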

Note that timings this low are benchmarking artifacts. This occurs when the expressions are replaced with constants during compilation, and you end up benchmarking nothing at all. You can tell that this is the case because one CPU clock cycle is ~0.3 ns on a 3 GHz computer, so 0.017 ns would be a tiny fraction of a clock cycle, which is not realistic.
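
A common way to guard against this artifact (a sketch using the usual BenchmarkTools idiom, not taken from the posts above) is to hide the input behind a Ref so the call cannot be constant-folded away:

using BenchmarkTools

@btime by_hand($(Ref(x))[])   # the value is only read inside the benchmark loop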
