resize!(matrix)

I may have this wrong, but I think RecursiveArrayTools and ArraysOfArrays will be somewhat orthogonal - it may even be possible to use RecursiveArrayTools on top of ArraysOfArrays.

Ahh I see. So yup totally different.

But if you’re working with an array of arrays, like growing a matrix a column at a time, why would you want to spend the effort to make it contiguous at each resize!? For our purposes we do the discontiguous form because if you ever need it contiguous, Array(VA) builds the matrix and IMO it should only be done once (right before you do any analysis on the full matrix), or VA' builds the transposed matrix all at once (for the timeseries). So I’m curious what kind of problem this architecture helps with. Or is it more like ElasticArrays.jl 2.0, which allows resizing both rows and columns?
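For context, a minimal sketch of the grow-by-columns pattern described above, assuming RecursiveArrayTools' documented VectorOfArray API (the variable names are mine):

```julia
using RecursiveArrayTools

# Grow "a matrix" one column at a time as a vector of vectors:
cols = [rand(3) for _ in 1:4]
VA = VectorOfArray(cols)

push!(VA, rand(3))   # appending a column is just a cheap vector push
M = Array(VA)        # build the contiguous 3×5 matrix once, at the end
```

The point is that the contiguous copy happens exactly once, right before the full-matrix analysis.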

But if you’re working with an array of arrays, like growing a matrix a column at a time, why would you want to spend the effort to make it contiguous at each resize

Well, for large amounts of data, one would still use some kind of blocking, not put everything in one array. But those blocks need to have a representation, too. I very often have use cases where data of unknown size comes in, or where I only know its maximum size (so I can use sizehint! to avoid frequent reallocation). These array types will keep track of the actual size for me (if I pre-allocate that max size, it might end up not being filled completely).

A typical use case I have is signal waveforms of the same length, which need to be stored in a vector-of-vectors. But when I want to run an FFT, I want the matrix view (esp. when offloading to a GPU). For such use cases, I intend to use VectorOfSimilarVectors backed by an ElasticArray.
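Roughly like this, assuming the ArraysOfArrays and ElasticArrays APIs work as documented (waveform length and names are made up for illustration):

```julia
using ArraysOfArrays, ElasticArrays

# 256-sample waveforms, stored column-wise in one contiguous, growable matrix:
storage = ElasticArray{Float64}(undef, 256, 0)
waveforms = VectorOfSimilarVectors(storage)

push!(waveforms, rand(256))   # appends one waveform = one new column
push!(waveforms, rand(256))

M = flatview(waveforms)       # the 256×2 matrix view, e.g. to hand to an FFT
```

So element access looks like a vector-of-vectors, while the FFT (or a GPU transfer) sees one contiguous matrix.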

On the other hand, I have use cases where my data consists of very short vectors (most of length 1 to 3, but some of length over 100), and I have 10^7 or more of them that need to be stored … that’s where VectorOfVectors will hopefully come in handy, so I don’t drown in memory allocations - while UnsafeArrays.@uview will make sure I have an allocation-free getindex (otherwise, no multi-threading scalability).
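A small sketch of that ragged-vectors case, assuming VectorOfVectors behaves as documented in ArraysOfArrays:

```julia
using ArraysOfArrays

# Millions of short vectors share one flat backing array plus an offset table,
# instead of one heap allocation per vector:
V = VectorOfVectors{Float64}()
push!(V, [1.0])
push!(V, [2.0, 3.0, 4.0])

V[2]          # a view into the flat storage, no per-element allocation
flatview(V)   # all elements live in a single contiguous vector
```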


I see. Yes, this could be very useful for matrix coloring algorithms and things like that. I’ll be happy to see where this goes. Thanks for the explanation!

(It’s interesting how many different architectures for “a matrix” you need to think about if you really want a complex application to be fast!)


One somewhat dirty trick on 64bit is to calloc some giant amount of virtual memory, and unsafe_wrap your vector around it (without giving julia control over the memory, otherwise the gc goes haywire).

Then you can keep writing and the kernel / page-fault handler will do all the dirty work (you only consume as much physical memory as you actually write, plus the unfilled remainder of the last page).

Afterwards, you unsafe_wrap again, up to the length you actually used, and hand the memory to julia. You now have wasted the rest of the last memory page of your array (<2MB).

This allows you to just write and never check upper bounds when writing, safe in the knowledge that your process will be killed for memory usage before it manages to corrupt memory.
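A minimal sketch of the trick as described above (a modest size here for illustration - the real use case would reserve a far larger virtual range - and all names are mine):

```julia
# Reserve a large zero-initialized range; the kernel commits physical pages
# only as they are actually written.
n_max = 2^30   # 1 GB of virtual address space (use something huge in practice)
ptr = Ptr{UInt8}(Libc.calloc(n_max, 1))
buf = unsafe_wrap(Vector{UInt8}, ptr, n_max; own = false)  # GC keeps hands off

# ... write buf[1:n_used] freely, without checking an upper bound ...
buf[1] = 0x2a
n_used = 1

# Re-wrap just the used prefix and hand ownership to Julia's GC, which will
# eventually free() the whole calloc'd block. Don't touch `buf` after this.
result = unsafe_wrap(Vector{UInt8}, ptr, n_used; own = true)
```

Note that after the second unsafe_wrap the original oversized wrapper must never be used again, since both alias the same calloc'd memory.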

edit: This dirtiness is only necessary because the garbage collector accounts for not-yet-used virtual memory when calculating memory pressure. I guess changing that would be quite a lot of work, though.


So true!

Very interesting, thanks!

Which operating systems does this trick work on? 64-bit only? Linux only?

On 32bit, virtual memory is precious (only 4GB). So you probably shouldn’t do this on 32bit.

On linux, malloc and calloc both do the trick (a giant malloc will be on-demand zero-initialized anyway).

I never tried this on other systems.

Not entirely sure about windows, but you can probably do something with VirtualAlloc, or just give malloc/calloc a try. I think BSD and OSX should work without problems, but don’t take my word on it.

What this boils down to is the fact that we have hardware support we should use (a rarely triggered trap is faster than a predicted branch, and we have an MMU: contiguous memory is a convenient lie).

Thanks for the explanations! Hm, it would be nice to have another wrapper than unsafe_wrap, one that supports resize! (up to the amount of calloced memory), so that the array can start at size zero. Then one could wrap that in an ElasticArray and have a multi-dimensional array that can grow via page faults without reallocation (at least up to the huge virtually allocated size) …

You can, in principle, resize! any array by hand, regardless of whether it is shared or multidimensional, using something like https://github.com/JuliaLang/julia/issues/24909. That is, an Array is internally something like Ref{jl_array_t}, and you can modify the jl_array_t as much as you like, using pointer_from_objref(some_array), as long as you don’t cause segfaults :wink:

So, I’d go for unsafe_resize!. But note that you don’t get to handle the page fault.

I think the only thing that would be very hard to replicate in pure Julia, without using array.c, is the fast pool allocator and the memory pressure calculation (which we don’t want for such shenanigans).

That is, small allocations in Julia are freakishly fast. Otherwise, you could let ElasticArray just wrap a raw pointer and use calloc / realloc directly, with free in a finalizer.

I wonder why we can’t simply allocate a zero-sized vector and do a huge sizehint!, then - that would be so elegant. I tried, ended up out of memory - doesn’t Julia use malloc or calloc? Or does Julia actually touch all the memory reserved by sizehint! somehow?

Stumbled on this post. Apparently something like this would work with numpy - I wonder why Julia doesn’t seem to use calloc (for large arrays)? Would it interfere with the memory pressure calculation?

Yes, I was thinking about something just like that. Either with a finalizer, or via a try/finally-based environment.

Found this: implement zeros() by calling calloc · Issue #130 · JuliaLang/julia · GitHub

Also interesting, maybe: https://github.com/numpy/numpy/issues/8598

Nice blogpost, and thanks for the issue tracker link.

I guess my favorite solution would be to use a different custom allocator for giant arrays, which would use different code for tracking memory pressure (giant meaning e.g. >128MB or >1GB). That way you can also make growing cheap, by just remapping memory instead of copying it. Also, these could support copy-on-write, which can be really good for some algorithms.

edit: Different allocator because we now are allowed to be slow; a couple of micro-seconds don’t matter when dealing with such a big slab of memory, so we can do more fancy stuff.

I found out it actually does work in Julia - just not with Int8:

A = Vector{Int8}()
sizehint!(A, 8 * 1024^3)

results in an instant allocation of 8GB of RAM (measured via the shell command free). But

A = Vector{Float64}()
sizehint!(A, 1024^3)

allocates no memory, initially. So here calloc or the like is used - but why not for Int8?

@foobar_lv2, in this case the manual calloc won’t be necessary, right? Or would it still be preferable, would Julia’s memory pressure calculation get confused by having seemingly huge arrays around?

That was my problem, yes; the seemingly huge vectors sent the garbage collector into a frenzy (it runs too often, slowing everything else down).

I preferred my memory zero-initialized (that’s the difference between malloc and calloc) and needed it accessible right away (in order to avoid the slow push!; even after sizehint!, push! is really slow compared to unsafe_store! + integer-increment).
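As a rough illustration of that gap (timings are machine- and version-dependent; this is only a sketch, and the function names are mine):

```julia
function fill_push(n)
    v = Float64[]
    sizehint!(v, n)
    for i in 1:n
        push!(v, Float64(i))   # still checks capacity and bumps length each time
    end
    return v
end

function fill_store(n)
    v = Vector{Float64}(undef, n)   # pre-allocated, so we can write directly
    GC.@preserve v begin
        p = pointer(v)
        for i in 1:n
            unsafe_store!(p, Float64(i), i)
        end
    end
    return v
end

n = 10^7
@time fill_push(n)
@time fill_store(n)
```

Both produce the same vector; the difference is purely the per-element bookkeeping that push! has to do.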

edit: But the wrapping only really works if your workflow is overallocate - write once - shrink to correct size, hand memory to julia’s gc. Pretty situational.

It’ll be a very common workflow for me. :slight_smile: