Resize!(matrix)

@foobar_lv2, what’s you opinion on the alignment issue? Julia seems to aim for 64-byte alignments (because it’s the cache-line size?), but it looks like calloc will only guarantee 16-byte alignment. Since unsafe_wrap doesn’t allow for specifying an offset, I’m not sure how best to handle this.

Giant-sized allocations “should ™” give you memory at the start of a page, and should also give you a larger page-size (which saves you TLB misses). You could also use malloc, if you don’t need guaranteed zero initialization (you “should” get virgin zero-initialized lazy virtual memory anyway, for giant allocations).

As far as I understand, the worse alignment should not be a problem, normally, and the wrapped array should work just fine. Except if your code is very hand-crafted to exploit the cache optimally (say e.g. judy arrays).

If you get unaligned memory, and your application gets too slow with this, you can try another dirty (untested) trick (only if you wrap a vector instead of a higher-dim array): You increment the pointer until it is well-aligned, and wrap around that one. Finally, when you are done and want to wrap your “good” vector that takes ownership of the array, you wrap around the incremented pointer, and then set the “offset” field of your array. That way, the array will look like you removed the first couple of elements. And, ta-da, free() will be called by the garbage collector with the correct address (pointer(vector) - offset) when the giant-vector is finally collected.

Yes, I was wondering if that was possible - how user-accessible is the array offset? Are there any Julia-internal C-functions to manipulate it, or do I have to fiddle around with the array struct directly (which, I guess, is not guaranteed to stay unchanged over time)?

I would just write to it, via

ap = pointer_from_objref(some_vec)
unsafe_store!(convert(Ptr{Int32},ap+20), offset_new)

All the dirty tricks will break anyway, whenever julia.h changes sufficiently. Of course, it would be preferable to use something (cxx.jl?) that parses julia.h instead of you looking into julia.h and hand-coding everything into your source file (after hand-computing all offsets).

Thanks - well, I guess the array struct will stay unchanged at least for a while, and one can always check the Julia version.

I’d like to ping/reopen the discussion. Resizing matrices in-place used to be fundamentally impossible, because back in the day when arrays were implemented in C, julia assumed that non-Vector Array had constant size and memory (useful “this is constant” annotations in codegen / for llvm).

We have since moved on to pure julia arrays, via Memory. This means that we have lost the performance upside of LLVM knowing that matrices cannot resize / move their pointers. Apparently this was acceptable.

Now that we paid the price, we could offer resizing of matrices. Adding/removing rows would be a bad operation, because it always requires shuffling data around. But we could without issue add or remove columns from front or back, using the same logic as for vectors.

Do we want that? This would be very useful.

There is a trade-off:

This would forever preclude making size and pointer/ref of higher-order Array constant again.

Currently we cannot make that constant again, because we have no @generated struct, i.e. the julia language cannot express “oh, but if N is larger than zero then ref and size are const” in the definition of mutable struct Array{T,N}.

If we added that language feature in the future we could try to recover that optimization (hoping that nobody started resizing matrices in-place all over the ecosystem).

1 Like

It could be implemented with strided arrays, a typical fortran way to do it. BLAS typically supports strided matrices, and SIMD will work better than a vector of vectors. I’m not familiar with StrideArrays.jl, but I suppose that’s where to find it.

I believe const-ness of N-dimensional Array is special-cased in the compiler - at least the original Memory proposal mentioned that.

Semantically, there have been a lot of discussion about distinguishing resizable and non-resizable arrays. Doing so would allow much more explicit and non-hacky const sizes of the non-resizable ones.

So, I would support moving FixedSizeArrays to Base, and then enabling resizing of all Array dimensionalities.

1 Like

Afaiu it doesn’t work, though. Consider

barrier(recurse=true) = recurse && Base.invokelatest(barrier, false)

function optimizeMe(A)
    p0 = pointer(A)
    barrier()
    p1 = pointer(A)
    return p0 == p1
end

This function does not necessarily return true on Vector. (you could put a reference to the vector into a global, update barrier to grab that global and resize! it, and then run in an old world that predates the change of barrier definition)

Before Memory, the compiler optimized the return value to “known true” for matrices. Today it does not.

2 Likes

Sure. However, this is also not necessary: with the introduction of Memory and the related changes, Array is not very special any more. Just make your own AbstractArray subtype to try things out.

Note that FixedSizedArrays.jl gets this optimization for free (since it doesn’t use a mutable struct to store the size or Memory)

1 Like

Array is still the default: stdlib returns it, magical [1,2,3] syntax, etc.

But sure I could try it out by simply typing function Base.resize!(v::Matrix ... end into the REPL.

The specific question was: Now that Matrix is de-facto resizeable, do we want it to be officially-supported resizeable?

This is somewhat offtopic, but FixedSizedArrays.jl should imo store a MemoryRef instead of a Memory: Sure, this costs 8 extra bytes in size, but you end up saving one level of pointer indirection on each access.

PS. written in C this is the difference between

typedef struct {double* ptr;} Memory; // passed by-reference, has object header, has more fields, immutable
typedef struct {double *ptr; Memory* mem;} MemoryRef; // passed by value, no object header.
typedef struct {MemoryRef ref; size_t len;} JuliaVector; // passed by-reference, has object header, mutable!
typedef struct {Memory* mem; size_t len;} FixedSizeVector; // passed by value, no object header
typedef struct {MemoryRef ref; size_t len;} BetterFixedSizeVector; // passed by value, no object header
3 Likes