Better way to improve a function with splat operator used in the input

I have a multi-variate function like

function f(input::Vararg{Int, N}) where {N}
# some stuff
end

I need to call it using the form f(x1..., xi, xn...), where x1 and xn are vectors determined in the runtime.

The splat operator is convenient, but seems not performant. With some simple experiments, I noticed that it takes >100ns on my machine.

I’m thinking about writing a function like f(x1, xi, xn), but the lengths of x1 and xn have to be specialized in different situations, because they are not fixed in the problem. Is this possible? Or is there a better way to handle this issue?

Thanks.

Why do you need to specialize on the lengths of x1 and xn?
I would reconsider whether there is a way that avoids specialization.

If it is truly unavoidable, but x1 and xn have only a small number of lengths that they can be, I would consider dispatching on a Tuple or a SVector (from StaticArrays) instead of (or in addition to) splatting.

1 Like

Maybe I misused that word. Do you mean I just need a function with signature like (Tuple, Number, Tuple)? Is the length of the tuples not important?

In addition, currently I use Vector to store the partial input. Why are Tuple and SVector better in this case?

Thank you.

The compiler knows the length of a Tuple or SVector, so it can compile specialized code — if x = (a,b,c), then f(x...) compiles to exactly the same thing as f(a,b,c).

The compiler doesn’t know the length of a Vector, so at runtime it needs to construct the function call and dynamically dispatch to the correct method. This is expensive.

If you always know the size of your container statically (at compile time), then use a Tuple, or a StaticArray if you want to perform vector operations. Otherwise, don’t splat — just pass your array directly, and declare your function as function f(input::AbstractVector{<:Integer}) or similar.

4 Likes

Thank you for the explanation and suggestion.

I have a following struct which contains the sets of indices.

struct IndexSet
I::Vector{Vector{Vector{Int}}}
end 

I[1] == [[1], [2], [3]]
I[2] == [[1,2], [3,4]]
I[3] == [[1,2,3],[2,3,4],[3,4,5],[5,6,7]]

That is, the i-th element of I is a index set with each element of length i.
If I can rewrite this to something like Vector{Vector{NTuple{i, Int}}}, then I think the splat operator can be compiled to the most efficient code.

The problem is then if I can make I to contain Tuples with different lengths for different i. Could you please give some hints about if this is possible? Thanks.

I just played with that idea. It seems Vector{Vector}} is a concrete type and I can put vectors of NTuple in it.

It is a concrete type, but probably not in the way that you think. I.e. if v isa Vector{Vector}, then v[i][j] is not type stable, because the compiler doesn’t know what sort of things the internal vector can hold. So it’s concrete in the sense that it can exist, but it’s really no better than a Vector{Any}

What is IndexSet used for?

2 Likes

Thank you! So is Vector{Vector} like an array of pointers in c?

The IndexSet is used for storing a sequence of sets of indices. The element I[k] contains a bunch of indices of length k. You can check the example given in my previous post.

For a fixed k, I[k] has a stable type. Can I make the compiler use the information of k? Or is there a better way to store heterogeneous elements?

Maybe the simplest way is using Tuple for I, instead of Vector, so that I can have an I like Tuple{Vector{Tuple{Int64}}, Vector{Tuple{Int64, Int64}}}. Does this work?

The length of I can be determined in the compile time from some constant.

I found Tuple{(Vector{NTuple{k, Int}} for k=1:N)...} is valid, for a given N. For example, if N=2 I get

Tuple{Vector{Tuple{Int64}}, Vector{Tuple{Int64, Int64}}}

However, if I put it in a struct, like

 struct IndexSetN{N}
     I::Tuple{(Vector{NTuple{k, Int}} for k=1:N)...}
     function IndexSetN{N}(I) where N
          new{N}(I)
     end
end

I got

ERROR: MethodError: no method matching (::Colon)(::Int64, ::TypeVar)
Closest candidates are:
  (::Colon)(::T, ::Any, ::T) where T<:Real at range.jl:41
  (::Colon)(::A, ::Any, ::C) where {A<:Real, C<:Real} at range.jl:10
  (::Colon)(::T, ::Any, ::T) where T at range.jl:40
  ...
Stacktrace:
 [1] top-level scope
   @ REPL[5]:1

Does anybody know how to make this work? Or is it just impossible to define a struct like that?

Thanks.

This is not possible. But you could simply do

struct IndexSetN{T}
# or `IndexSet{N,T}` if you still want the `N` parameter for other purposes
  I::T
  # ...
end

and then pass your Tuple type directly as T. You can use the inner constructor to validate it, if necessary.

1 Like

Thank you. Eventually I have this

struct ISetTuple{IndexSet}
    I::IndexSet

    function ISetTuple{N}() where N
        IndexSet = Tuple{(Vector{NTuple{k, Int}} for k=1:N)...}
        I = Tuple(NTuple{k, Int}[] for k in 1:N)
        new{IndexSet}(I, N)
    end
end

Then I can call like ISetTuple{3}() to construct a object. Julia is very expressive!

Unfortunately, I just noticed that this is very unlikely to make the compiler know how to optimize the splat operator, which is my initial question, because which element of I is being used is only known in the runtime.
This problem is due to my poor design of the code.

Thanks for everyone here. I learned a lot!

Not quite. You can’t actually create an instance of a Vector{Vector}, even though isconcretetype. An array of pointer arrays is instead a Vector{Vector{Any}}. (Note that Vector{Vector} is equal to Vector{Vector{<:Any}}.)

Example:

julia> let x=[Any[]]
           @show typeof(x)
           @show isconcretetype(typeof(x))
           @show x isa Vector{Vector{Any}}
           @show x isa Vector{Vector}
           @show x isa Vector{<:Vector}
       end;
typeof(x) = Vector{Vector{Any}}
isconcretetype(typeof(x)) = true
x isa Vector{Vector{Any}} = true
x isa Vector{Vector} = false
x isa Vector{<:Vector} = true

Are the indices always sequential? in which case you could use unitranges 1:3, 2:4, etc.

Thanks for the example.

They are not sequential in general. Sorry my example is misleading.

Sure you can:

julia> v = Vector[ [1,2,3], ["a","b","c"] ]
2-element Vector{Vector}:
 [1, 2, 3]
 ["a", "b", "c"]

A Vector{Vector} is an array whose elements can be different types of Vector, e.g. Vector{Int} and Vector{String} in this example.

Under the hood, I think it’s stored in much the same way as Vector{Any} — an array of jl_value_t pointers, which point to a “boxed” value that includes both a type tag and the actual data (or a pointer thereto).

5 Likes

Touché.