Creating Sparse NTuple

I have a problem where I want to make an NTuple of length N where I’m given indices and values for the non-zero components.

The equivalent function for creating a Vector would look something like

function foo(N, nzindices, nzvalues)
    bar = zeros(N)
    bar[nzindices] .= nzvalues
    return bar

How would I make an equivalent function that returns a Tuple of length N without allocating a Vector?

You may broadcast Base.setindex

function foo(::Val{N}, nzindices::Vector, nzvalues::Vector) where N
    function f(i)
        idx = findfirst(isequal(i),nzindices)
        if isnothing(idx)
            return zero(eltype(nzvalues))
            return nzvalues[idx]
    return ntuple(f,Val(N))

that works like this?

julia> foo(Val{8}(),[1,4],[10,40])
(10, 0, 0, 40, 0, 0, 0, 0)

another option, with Base.setindex:

function foo2(::Val{N}, nzindices, nzvalues) where N
    res = ntuple(i -> zero(eltype(nzvalues)),Val{N}())
    for i in 1:length(nzindices)
        res = Base.setindex(res,nzvalues[i],nzindices[i])
    return res

Quick question: what is the difference between Val(N) and Val{N}() in your example above?

Also, I did a little benchmarking of the two that I thought I would share:

N = 10
Nnz = 4
nzindices = rand(1:N, Nnz)
nzvalues = rand(Nnz)

@btime foo($(Val(N)), $nzindices, $nzvalues)  # 21.606 ns (0 allocations: 0 bytes)
@btime foo2($(Val(N)), $nzindices, $nzvalues)  # 8.583 ns (0 allocations: 0 bytes)

N = 100
Nnz = 27
nzindices = rand(1:N, Nnz)
nzvalues = rand(Nnz)

@btime foo($(Val(N)), $nzindices, $nzvalues)  # 867.103 ns (0 allocations: 0 bytes)
@btime foo2($(Val(N)), $nzindices, $nzvalues)  # 37.958 μs (2808 allocations: 86.48 KiB)

So it appears that the first method might be faster, at least until N gets large and starts allocating. Do you know why this would be the case?

1 Like

I suppose that the first method creates an NTuple once, but calls a function with conditionals. at low N, the cost of the function might be dominated by that call.
The second method creates nz NTuples, that may be fast at low N, but i suppose there is a limit on that. if you want the best of both worlds, you can combine both, if you find the N threshold that most suits your enviroment