Push!, and interfacing to the runtime library

I think using this code would be a terrible idea, in view of yuyichao’s comments in https://discourse.julialang.org/t/why-so-many-internals-in-c/6119, and in view of https://discourse.julialang.org/t/unsafe-store-sometimes-slower-than-arrays-setindex/7166.

That is, my example is just a PoC showing that a speedup of at least 30% for push! is available if we manage to inline the fast path (avoid the ccall, and avoid checking array flags, because type inference already tells us everything). The same should apply to pop!.
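To make the fast-path/slow-path split concrete, here is a minimal sketch in plain Julia. It does not touch Base's Vector internals and is not the PoC code: GrowVec, grow! and fastpush! are hypothetical names made up for illustration, and the doubling growth policy is an assumption.

```julia
# Hypothetical growable buffer; NOT how Base implements Vector.
mutable struct GrowVec{T}
    data::Vector{T}   # backing storage; length(data) is the capacity
    len::Int          # number of slots currently in use
end

# Start with a small capacity (4 is an arbitrary choice for this sketch).
GrowVec{T}() where {T} = GrowVec{T}(Vector{T}(undef, 4), 0)

# Slow path: kept out of line so that the fast path stays small.
@noinline function grow!(v::GrowVec)
    resize!(v.data, 2 * length(v.data))   # assumed policy: double the capacity
    return v
end

# Fast path: one comparison, one increment, one store.
@inline function fastpush!(v::GrowVec{T}, x) where {T}
    v.len == length(v.data) && grow!(v)
    v.len += 1
    @inbounds v.data[v.len] = convert(T, x)
    return v
end

v = GrowVec{Int}()
for i in 1:10
    fastpush!(v, i)
end
```

The point is only the shape: the common case compiles down to a few instructions at the call site, and the rare reallocation is an ordinary function call.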

We gain even more if we acknowledge that Vector and higher-dim Arrays are different:
Vectors no longer need to store the length twice (length = nrows), and tiny vectors (up to 4*sizeof(Ptr) bytes) can be stored inline. They then fit into the same cache line, so a miss is avoided when we need to iterate over a large number of Vectors, many of which are small. This is actually something I do a lot: variable-arity trees where most nodes have very few or zero children. Alternatively, one could use the shrunk vector_T to store two of them per cache line (but I think inline storage gives the bigger win).
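As a rough illustration of the inline-storage idea at the Julia level (the real change would have to happen in the C runtime), here is a sketch. SmallVec is a hypothetical type, the four inline slots are an assumption chosen to match 4*sizeof(Ptr) bytes for 8-byte elements on a 64-bit system, and it only works for element types that have a zero(T).

```julia
# Hypothetical small-vector type: the first 4 elements live inside the object,
# a heap Vector is allocated only once those slots overflow.
mutable struct SmallVec{T}
    len::Int
    inline::NTuple{4,T}               # inline slots, stored in the object itself
    heap::Union{Nothing,Vector{T}}    # overflow storage, created lazily
end

# Unused inline slots are filled with zero(T) (assumes zero(T) is defined).
SmallVec{T}() where {T} = SmallVec{T}(0, ntuple(_ -> zero(T), 4), nothing)

function Base.push!(v::SmallVec{T}, x) where {T}
    if v.len < 4
        # Tuples are immutable: build an updated copy and store it back.
        v.inline = Base.setindex(v.inline, convert(T, x), v.len + 1)
    else
        heap = v.heap
        if heap === nothing
            heap = T[]
            v.heap = heap
        end
        push!(heap, convert(T, x))
    end
    v.len += 1
    return v
end

function Base.getindex(v::SmallVec, i::Int)
    1 <= i <= v.len || throw(BoundsError(v, i))
    return i <= 4 ? v.inline[i] : (v.heap::Vector)[i - 4]
end

s = SmallVec{Int}()
for i in 1:6
    push!(s, 10i)
end
```

A node with at most four children never allocates a separate buffer here, which is the case I care about for trees. A real implementation would probably move everything to the heap on overflow instead of splitting the data like this.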

Higher-dim arrays free up the offset for an additional 32 bits of info. I'm not sure what to do with them; maybe store ndims, so that nospecialize on the number of dimensions becomes possible? Or maybe make them available to users for locks?

Unfortunately, I don’t feel comfortable enough with the C code base to even attempt to separate array and vector, or to do a clean improvement.

I think the most pragmatic way to get a fastpush would be to handle grow_at_end and shrink_at_end the same way setindex!-non-bitstype and length are handled today (that is, a special case in codegen emits the fast path for inlining, and if the fast path fails we do a library call, as is done for rooting of the setindex!ed object).

PS. Also, this needs to be different on 32-bit systems.
