Mutate NTuple at position


#1

I want to switch out an NTuple’s element, like this:

julia> function switch_k(tup::NTuple{N,T}, elem::T, pos::Integer) where {N,T}
       # _assume(0 < pos <= N)
       return ntuple(i -> (pos != i) ? tup[i] : elem, N)
       end

julia> tup = ntuple(i->i+0.0, 10)
(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0)

julia> switch_k(tup, -1.5, 2)
(1.0, -1.5, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0)

The position is computed at run time (and N is behind a function boundary for local type stability). Now the proper way of doing this is to memcpy all of tup, and then use pos as an offset into the stack memory to mutate it. Unfortunately, I cannot seem to obtain this when looking at julia> @code_llvm switch_k(tup, -1.5, 2) (or @code_native).

Instead, llvm tries to be clever and runs a vectorized compare/select/shuffle. OK, how should llvm know that I am really exchanging a single element? That only works if pos is in bounds. So let's tell llvm about that:

julia> @inline function _assume(b::Bool)
       Base.llvmcall(("declare void @llvm.assume(i1)",
       "%assumption = icmp ne i8 %0, 0
        call void @llvm.assume(i1 %assumption)
        ret void"), Nothing, Tuple{Bool}, b)
       end
julia> function switch_k(tup::NTuple{N,T}, elem::T, pos::Integer) where {N,T}
       _assume(0 < pos <= N)
       return ntuple(i -> (pos != i) ? tup[i] : elem, N)
       end

This does not help. So what is the right way of obtaining this? I am definitely happy with a @generated or llvmcall solution. The use case is in-place manipulation: tup = switch_k(tup, elem, pos) with run-time computed pos. Basically, I want to use tuples as mutable, stack-allocated, C-like arrays (and hope that the optimizer will turn this into in-place modification that uses run-time computed memory offsets; stack-based buffer overflows FTW).


#2

I realize this doesn’t directly answer your question, but have you considered just using an MVector from StaticArrays? They are already implemented as NTuples, with mutation done by replacing the entire tuple (on the assumption that LLVM can avoid actually copying the unchanged elements).
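For reference, a minimal sketch of that approach (assuming StaticArrays is installed; values match the example above):

```julia
using StaticArrays  # assumed to be available

mv = MVector{10,Float64}(ntuple(i -> i + 0.0, 10))
mv[2] = -1.5        # ordinary in-place setindex!
tup = Tuple(mv)     # convert back to an NTuple if one is needed
```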


#3

I think mutation of MVectors is implemented by pointer arithmetic / unsafe_store! on pointer_from_objref?

But this is essentially what I do at the moment: write the tuple into a heap location, mutate it in place, and read it back. That way I can reuse the heap location (no allocations; but I effectively use part of an Array instead of an MArray). The ntuple approach is an alternative, but turns out worse in practice.
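For concreteness, that round-trip workaround could be sketched roughly like this (my own names; assumes T is a bits type, and that buf is allocated once and reused):

```julia
# Reusable one-slot heap buffer: write the tuple in, poke one element
# through a raw pointer at a run-time offset, read the whole tuple back.
function switch_via_buffer!(buf::Vector{NTuple{N,T}}, tup::NTuple{N,T},
                            elem::T, pos::Integer) where {N,T}
    buf[1] = tup
    GC.@preserve buf begin
        p = Ptr{T}(pointer(buf))    # view the buffer as N consecutive T's
        unsafe_store!(p, elem, pos)
    end
    return buf[1]
end
```

Usage would be something like buf = Vector{NTuple{10,Float64}}(undef, 1), kept alive for the duration of the recursion.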

But this work-around is pretty silly.

What I am actually working with are “static bitvectors” (with N the number of 64-bit chunks). Occasionally (rarely) I need to flip a single bit. I am almost tempted to just carry along a Vector{NTuple{N,UInt64}} such that V[i] has only bit i set.
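That mask-table idea could be sketched as follows (hypothetical helper names; xor-ing with a one-bit mask flips exactly that bit):

```julia
# Precompute, for each bit position 1:64N, an NTuple mask with only that bit set.
make_masks(::Val{N}) where {N} =
    [ntuple(k -> k == (i - 1) ÷ 64 + 1 ? UInt64(1) << ((i - 1) % 64) : UInt64(0), N)
     for i in 1:64N]

# Flipping bit i is then a chunkwise xor against the precomputed mask.
flipbit(chunks::NTuple{N,UInt64}, masks, i::Integer) where {N} =
    map(⊻, chunks, masks[i])
```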

I am running a bad recursion on such a static k x k bitmatrix encoding a graph (the problem is almost surely NP-complete in the worst case), so fast operations on (N-1)*64 <= k <= N*64 bit vectors are important, and I gladly pay for the dynamic dispatch on N on entry. I have already replaced tail calls by @goto, which is really useful; next I'd like to get all the recursion state onto the stack if possible (I register all the found needles from the recursive haystack in some array, so there are no return values).


#4

StaticArrays.jl has setindex for this, which is a non-mutating version that returns a new object with the k-th element changed.
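For illustration, recent Julia versions also have an analogous non-mutating Base.setindex for plain tuples (StaticArrays' setindex behaves the same way on its own types):

```julia
t  = (1.0, 2.0, 3.0, 4.0)
t2 = Base.setindex(t, -1.5, 2)   # new tuple; t is left untouched
```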


#5

That emits roughly the same as my switch_k :frowning:

But, on the bright side, I promise to make a PR to StaticArrays if someone here helps me figure this out.


#6

Too bad insertvalue needs constant indices.

If we could convert the stack object to a pointer in llvm, this would be easy to solve via a simple store, and it would also solve my earlier problem about masked load/store operations for stack-allocated objects.
This is obviously possible:

julia> @code_llvm identity(tup)

; Function identity
; Location: operators.jl:475
define void @julia_identity_-1618924128([10 x double]* noalias nocapture sret, [10 x double] addrspace(11)* nocapture nonnull readonly dereferenceable(80)) {
top:
  %2 = bitcast [10 x double]* %0 to i8*
  %3 = bitcast [10 x double] addrspace(11)* %1 to i8 addrspace(11)*
  call void @llvm.memcpy.p0i8.p11i8.i64(i8* %2, i8 addrspace(11)* %3, i64 80, i32 8, i1 false)
  ret void
}

julia> f(x) = x
f (generic function with 1 method)

julia> @code_llvm f(tup)

; Function f
; Location: REPL[18]:1
define void @julia_f_-1618924070([10 x double]* noalias nocapture sret, [10 x double] addrspace(11)* nocapture nonnull readonly dereferenceable(80)) {
top:
  %2 = bitcast [10 x double]* %0 to i8*
  %3 = bitcast [10 x double] addrspace(11)* %1 to i8 addrspace(11)*
  call void @llvm.memcpy.p0i8.p11i8.i64(i8* %2, i8 addrspace(11)* %3, i64 80, i32 8, i1 false)
  ret void
}

julia> function llvmcalltest(tup::NTuple{10,Float64})
           Base.llvmcall(("",
           "  %2 = bitcast [10 x double]* %0 to i8*
         %3 = bitcast [10 x double] addrspace(11)* %1 to i8 addrspace(11)*
         call void @llvm.memcpy.p0i8.p11i8.i64(i8* %2, i8 addrspace(11)* %3, i64 80, i32 8, i1 false)
         ret void
       "), NTuple{10, Float64}, Tuple{NTuple{10,Float64}}, tup)
       end
llvmcalltest (generic function with 1 method)

julia> llvmcalltest(tup)
ERROR: error compiling llvmcalltest: Failed to parse LLVM Assembly: 
julia: llvmcall:3:31: error: '%0' defined with type '[10 x double]'
  %2 = bitcast [10 x double]* %0 to i8*
                              ^

Stacktrace:
 [1] top-level scope at none:0

Then we’d just need getelementptr and store. But I don’t know how to actually get the pointer. Normal lowering seems to have no problem doing that, so the answer must be somewhere in Base. Maybe passing the type as something other than Tuple{NTuple{10,Float64}}?
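One way to get a usable pointer from the Julia side, rather than inside llvmcall, is to box the tuple in a Ref and store through it (a sketch with my own names; whether the Ref actually stays off the heap after optimization is exactly the open question here):

```julia
function switch_via_ref(tup::NTuple{N,T}, elem::T, pos::Integer) where {N,T}
    r = Ref(tup)
    GC.@preserve r begin
        p = Ptr{T}(Base.unsafe_convert(Ptr{NTuple{N,T}}, r))
        unsafe_store!(p, elem, pos)   # plain store at a run-time offset
    end
    return r[]
end
```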

I think mutation of MVectors is implemented by pointer arithmetic / unsafe_store! on pointer_from_objref ?

I’m also planning on making a PR so that getindex works this way on MArrays. The x.data[i] version’s compile time is a function of the size of the object. I would like to go through StaticArrays and replace all the functions that compile slowly for large MArrays…
If/when Julia gets more support for manipulating stack objects, it’d be worth doing the same for SArrays too. But, for now, no one has any reason to want big SArrays anyway.

Anyone know how Fortran’s -fstack-arrays works?

-fstack-arrays

Adding this option will make the Fortran compiler put all arrays of unknown size and array temporaries onto stack memory. If your program uses very large local arrays it is possible that you will have to extend your runtime limits for stack memory on some operating systems. This flag is enabled by default at optimization level -Ofast unless -fmax-stack-var-size is specified.


#7

Yes, the fact that the calling conventions for @code_llvm and llvmcall differ is super annoying. Otherwise, reading @code_llvm would be the obvious approach to reverse-engineering the internal Julia calling convention. It would be nice if it were possible to copy-paste @code_llvm output into llvmcall (the documentation for the internal calling convention is somewhat lacking, but living documentation in the REPL is almost preferable anyway).


#8

I am not sure why you say this. Tuples are (potentially) heterogeneous collections, and most of the code is optimized for that. If you have homogeneous elements, use a Vector or a StaticArrays.MVector.