Subset tuples with macros

The problem: I have some data stored in tuples and want to access it in a type-stable way:

x = ntuple(i -> UInt8(i), 80)
f(y) = y[1:4]
@code_warntype f(x)

This returns a tuple whose length is not known to the compiler.
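
For reference, the inferred return type reported by @code_warntype looks roughly like this (the exact output depends on the Julia version):

Body::Tuple{Vararg{UInt8,N} where N}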

f(y) = (y[1], y[2], y[3], y[4])

would be type stable, but it is tedious to type if the resulting tuples are long.
Is it possible to create a macro (without using eval) to save some code?

E.g. something like:

julia> @ntuple(y, 1:4)

should expand to

:((y[1], y[2], y[3], y[4]))

Hi; I’m a Julia beginner myself, so this is also more of a question of whether this macro is any good…

macro ntplaccess(sym, idx)
    elements = []
    for i in (@eval $idx)
        push!(elements, :($sym[$i]))
    end

    return :( ($(elements...),) )
end

f(y) = @ntplaccess y 1:4

Oh and @code_warntype gives me

Body::NTuple{4,Any}
12 1 ─ %1 = (Base.getindex)(Main.y, 1)::Any
   │   %2 = (Base.getindex)(Main.y, 2)::Any
   │   %3 = (Base.getindex)(Main.y, 3)::Any
   │   %4 = (Base.getindex)(Main.y, 4)::Any
   │   %5 = (Core.tuple)(%1, %2, %3, %4)::NTuple{4,Any}
   └──      return %5

but with everything that is red in my REPL shown as blue here. Why is it Any here, instead of UInt8 as when I explicitly write out the tuple?

If you are always using a simple range (like 1:4, or even 11:14), why not use the ntuple method again, like this?

f(y) = ntuple(i->y[i], 4)
julia> @code_warntype f(x)
Body::NTuple{4,UInt8}
1 ─ %1 = (Base.getfield)(y, 1, true)::UInt8
│   %2 = (Base.getfield)(y, 2, true)::UInt8
│   %3 = (Base.getfield)(y, 3, true)::UInt8
│   %4 = (Base.getfield)(y, 4, true)::UInt8
│   %5 = (Core.tuple)(%1, %2, %3, %4)::NTuple{4,UInt8}
└──      return %5

And if you want this to work for arbitrary iterable sequences of indices (not just ranges like i:j), then I don’t know how a macro could evaluate this sequence (assuming it is const) to generate the code, without resorting to using eval or the like.
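
(One eval-free possibility, sketched here as an aside rather than taken from the posts: carry the indices in the type domain via Val and let a generated function unroll the accesses. The name subset is made up.)

@generated function subset(x, ::Val{idxs}) where {idxs}
    # idxs is a compile-time constant tuple of indices
    exprs = [:(x[$i]) for i in idxs]
    return :(($(exprs...),))
end

subset(x, Val((3, 7, 12)))  # unrolls to (x[3], x[7], x[12])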


Also, consider proper boundscheck elision.

Base.@propagate_inbounds function load_ntup(x, offset, ::Val{N}) where N
    @boundscheck checkbounds(x, offset:offset+N)
    return ntuple(i -> (@inbounds x[i+offset]), N)
end

That can become a single vector-load instruction (also for extracting a tuple from an Array). Bounds checks inside the loop prevent SIMD, so it is better to do them once, outside.
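
To illustrate the intended usage, here is a small sketch (assuming x is an Array, so that checkbounds applies; the function names are made up):

v = collect(UInt8, 1:80)

checked(v)   = load_ntup(v, 4, Val(4))            # bounds checked once, up front
unchecked(v) = @inbounds load_ntup(v, 4, Val(4))  # the caller's @inbounds propagates in and elides the check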


Two things:

  • Use esc(sym). You can see in your @code_warntype output that it's not using the y that is the argument to f, but Main.y instead. This is macro hygiene at work.
  • Don't @eval idx. Instead, dig into the AST of 1:4 manually and grab the endpoints. Of course, you'll want to add error handling to ensure callers don't give you what you don't want. The Meta module can be helpful as you explore what the AST looks like:
julia> idx = :(1:4)
:(1:4)

julia> Meta.show_sexpr(idx)
(:call, :(:), 1, 4)

julia> idx.head
:call

julia> idx.args
3-element Array{Any,1}:
  :(:)
 1
 4

Thanks everyone! Playing around with the answers, I tried:

f1(x) = (x[4], x[5], x[6], x[7])
f2(x) = ntuple(i -> x[i + 3], 4)
Base.@propagate_inbounds function f3(x, offset, ::Val{N}) where N
    # this line causes a runtime error (presumably because checkbounds has no method for tuples)
    # @boundscheck checkbounds(x, offset:offset+N)
    return ntuple(i -> (@inbounds x[i+offset - 1]), N)
end
f4(x, from, to) = ntuple(i -> x[i + from - 1], to - from + 1)
f5(x, from, to) = ntuple(i -> (@inbounds x[i + from - 1]), to - from + 1)
f6(x, ::Val{from}, ::Val{to}) where from where to = ntuple(i -> x[i + from - 1], to - from + 1)

macro ntplaccess(sym, idx)
    elements = []
    for i in (idx.args[2]:idx.args[3])
        push!(elements, :($(esc(sym))[$i]))
    end

    return :( ($(elements...),) )
end
@ntplaccess(x, 1:4)
f7(x) = @ntplaccess x 4:7

where f1, f2, f6, and f7 give the best @code_warntype:

Body::NTuple{4,UInt8}
1 ─ %1 = (Base.getfield)(x, 4, true)::UInt8
│   %2 = (Base.getfield)(x, 5, true)::UInt8
│   %3 = (Base.getfield)(x, 6, true)::UInt8
│   %4 = (Base.getfield)(x, 7, true)::UInt8
│   %5 = (Core.tuple)(%1, %2, %3, %4)::NTuple{4,UInt8}
└──      return %5

Speed-wise they are all more or less the same, at about 0.017 ns. I like the ones that use value types the most.

I like the macro version. Why? Because you can also check (in the macro) which type of indexing is used. That is, one could call the macro with a single index, range indexing, an array of indices, …

Then I was thinking; why specify the symbol and the indexing as two arguments? There’s the @view macro, which you can also use as @view x[1:3]. I didn’t look up how it is implemented but wanted to come up with my own solution (as an exercise), and it works!

macro ntplaccess(expr)

    sym = expr.args[1] # the symbol
    idx = expr.args[2] # the index, or the indexing expression

    if idx isa Int
        # single index
        return :($(esc(sym))[$idx])

    else
        # range, array, tuple
        elements = []

        if idx.head == :call && idx.args[1] == :(:)
            # range
            for i in (idx.args[2]:idx.args[3])
                push!(elements, :($(esc(sym))[$i]))
            end

        elseif idx.head == :tuple || idx.head == :vect
            # tuple or array
            for i in idx.args
                push!(elements, :($(esc(sym))[$i]))
            end
        end

        return :( ($(elements...),) )
    end

end

# can do...
f(y) = @ntplaccess y[1]
f(y) = @ntplaccess y[[3,5]]
f(y) = @ntplaccess y[(1,4,7)]
f(y) = @ntplaccess y[1:4]

I would not worry too much about enclosing (from, to) indices in value types, since I would expect that these constructs would always (or at least most of the time) be used in larger functions, with possibly variable data, but always the same indices.

Say you define f4 in the same way as above:

f4(x, from, to) = ntuple(i -> x[i + from - 1], to - from + 1)

Of course a top-level call is not type stable:

julia> @code_warntype f4(x, 1, 4)
Body::Tuple{Vararg{UInt8,N} where N}
[...]

But as soon as you enclose it in a larger function, if the index arguments are constant (as is the case above and in all examples in this thread so far), then constant propagation simplifies everything and you're type-stable again:

julia> big_function(x) = f4(x, 1, 4);
julia> @code_warntype big_function(x)
Body::NTuple{4,UInt8}
[...]

From there, regular compiler optimizations further simplify the code and essentially remove all integer operations on the indices, so that what you get in the end performs about as well as the by-hand tuple construction:

julia> @btime big_function($x)
  0.024 ns (0 allocations: 0 bytes)
(0x01, 0x02, 0x03, 0x04)

julia> by_hand(x) = (x[1], x[2], x[3], x[4]);
julia> @btime by_hand($x)
  0.025 ns (0 allocations: 0 bytes)
(0x01, 0x02, 0x03, 0x04)

I tend to like macros too. Still, this is probably not what I would do here: the polymorphism you introduce in your macro is essentially a manual implementation of some sort of dispatching depending on the index type.

Therefore, I would probably define a function to perform what I need, with multiple methods depending on the index type. For example, this is what a minimal implementation for unit ranges could look like:

function ntpl(x, idx::UnitRange)
    from = idx.start
    to = idx.stop
    ntuple(i->x[i+from-1], to - from + 1)
end
julia> f(x) = ntpl(x, 2:4)
f (generic function with 1 method)

julia> @code_warntype f(x)
Body::Tuple{UInt8,UInt8,UInt8}
[...]

julia> @btime f($x)
  0.024 ns (0 allocations: 0 bytes)
(0x02, 0x03, 0x04)

For tuples of indices, the implementation could be:

function ntpl(x, idx::Tuple)
    ntuple(i->x[idx[i]], length(idx))
end
julia> g(x) = ntpl(x, (2,3,4))
g (generic function with 1 method)

julia> @code_warntype g(x)
Body::Tuple{UInt8,UInt8,UInt8}
[...]

julia> @btime g($x)
  0.024 ns (0 allocations: 0 bytes)
(0x02, 0x03, 0x04)

And a new method could be added for each useful type of index collection (StepRange and so on).
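
For example, a method for step ranges might look like this (just a sketch, not something defined above; type stability again relies on constant propagation of the index argument):

function ntpl(x, idx::StepRange)
    # same idea as the UnitRange method, but honoring the step
    ntuple(i -> x[first(idx) + (i - 1) * step(idx)], length(idx))
end

ntpl(x, 1:2:7)  # (x[1], x[3], x[5], x[7])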


If you want a bracket-like syntax (without overloading the standard Base.getindex), you can then implement only the syntax-to-syntax transformation in a macro:

macro ntpl(expr)
    sym = expr.args[1]
    idx = expr.args[2]
    quote
        ntpl($(esc(sym)), $(esc(idx)))
    end
end
julia> @macroexpand @ntpl x[2:4]
quote
    #= REPL[25]:5 =#
    (Main.ntpl)(x, 2:4)
end

julia> h(x) = @ntpl x[2:4]
h (generic function with 1 method)

julia> @code_warntype h(x)
Body::Tuple{UInt8,UInt8,UInt8}
[...]

julia> @btime h($x)
  0.024 ns (0 allocations: 0 bytes)
(0x02, 0x03, 0x04)

This has the advantage (IMO) that specific implementations of the NTuple extraction can be added for many types of indices, without having to change the macro itself.
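
For instance (a small sketch), the tuple-of-indices case is already covered through the ntpl(x, idx::Tuple) method above, without touching the macro:

julia> k(x) = @ntpl x[(2, 4, 6)]
k (generic function with 1 method)

julia> k(x)
(0x02, 0x04, 0x06)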


FWIW, there’s a similar utility in Unrolled.jl:

julia> using Unrolled
[ Info: Precompiling Unrolled [9602ed7d-8fef-5bc8-8597-8f21381861e8]

julia> @code_warntype (1,2,3)[@fixed_range 2:3]
Body::Tuple{Int64,Int64}
1 ─ %1 = (Base.getfield)(seq, 2, true)::Int64
│   %2 = (Base.getfield)(seq, 3, true)::Int64
│   %3 = (Unrolled.tuple)(%1, %2)::Tuple{Int64,Int64}
└──      return %3

help?> @fixed_range
  @fixed_range 3:10 behaves like the standard range 3:10, but is stored within
  the type system, so that some_tuple[@fixed_range 3:10] is type-stable. Also
  supports some_tuple[@fixed_range 3:end-5] 
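
So, as a sketch, the subsetting from the original question could be written like this (the name sub4 is made up):

sub4(y) = y[@fixed_range 1:4]   # should be inferred as NTuple{4,UInt8} for the x defined at the top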

Note that timings this low are benchmarking artifacts. This occurs when the expressions are replaced with constants during compilation, and you end up benchmarking nothing at all. You can tell that this is the case because one CPU clock cycle is ~0.3 ns on a 3 GHz computer, so 0.017 ns would be a tiny fraction of a clock cycle, which is not realistic.
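
A common way to guard against this artifact (a sketch using the usual BenchmarkTools idiom, not taken from the posts above) is to hide the input behind a Ref so the call cannot be constant-folded away:

using BenchmarkTools

@btime by_hand($(Ref(x))[])   # the value is only read inside the benchmark loop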
