I may have this wrong, but I think RecursiveArrayTools and ArraysOfArrays will be somewhat orthogonal - it may even be possible to use RecursiveArrayTools on top of ArraysOfArrays.
Ahh I see. So yup totally different.
But if you're working with an array of arrays, like growing a matrix a column at a time, why would you want to spend the effort to make it contiguous at each resize!? For our purposes we do the discontiguous form because if you ever need it contiguous, `Array(VA)` builds the matrix, and IMO that should only be done once (right before you do any analysis on the full matrix), or `VA'` builds the transposed matrix all at once (for the timeseries). So I'm curious what kind of problem this architecture helps with. Or is it more like ElasticArrays.jl 2.0, which allows resizing both rows and columns?
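For concreteness, a minimal sketch of that pattern (assuming the RecursiveArrayTools.jl API; the sizes and variable names are made up):

```julia
using RecursiveArrayTools

# Grow a "matrix" one column (timestep) at a time, kept discontiguous:
cols = [rand(3) for _ in 1:5]   # five length-3 state vectors
VA = VectorOfArray(cols)        # behaves like a 3×5 matrix, no copying

# Materialize the contiguous matrix once, right before the analysis:
M = Array(VA)                   # a 3×5 Matrix{Float64}
# (and, as described above, VA' builds the transposed matrix all at once)
```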
But if you're working with an array of arrays, like growing a matrix a column at a time, why would you want to spend the effort to make it contiguous at each resize
Well, for large amounts of data, one would still use some kind of blocking, not put everything in one array. But those blocks need to have a representation, too. I very often have use cases where data of unknown size comes in, or where I only know its maximum size (so I can use `sizehint!` to avoid frequent reallocation). These array types will keep track of the actual size for me (if I pre-allocate that max size, it might end up not being filled completely).
A typical use case I have is signal waveforms of the same length, which need to be stored in a vector-of-vectors. But when I want to run an FFT, I want the matrix view (esp. when offloading to a GPU). For such use cases, I intend to use `VectorOfSimilarVectors` backed by an `ElasticArray`.
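A hedged sketch of that combination (assuming the ArraysOfArrays.jl and ElasticArrays.jl APIs; the waveform length is made up):

```julia
using ArraysOfArrays, ElasticArrays

# Fixed-length waveforms stored column-wise in a resizable matrix:
buf = ElasticArray{Float64}(undef, 256, 0)   # 256 samples, 0 waveforms yet
waveforms = VectorOfSimilarVectors(buf)

push!(waveforms, rand(256))   # appends a column to the backing ElasticArray
push!(waveforms, rand(256))

M = flatview(waveforms)       # the 256×2 matrix view, e.g. for a batched FFT
```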
On the other hand, I have use cases where my data consists of very short vectors (most of length 1 to 3, but some of length over 100), and I have 10^7 or more of them that need to be stored … that's where `VectorOfVectors` will hopefully come in handy, so I don't drown in memory allocation, while `UnsafeArrays.@uview` will make sure I have an allocation-free `getindex` (otherwise, no multi-threading scalability).
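Roughly what that looks like (a sketch assuming the ArraysOfArrays.jl API):

```julia
using ArraysOfArrays

# Many short, variable-length vectors share one flat backing vector,
# so storing 10^7 of them doesn't cost 10^7 separate heap allocations:
VV = VectorOfVectors{Float64}()
push!(VV, [1.0])
push!(VV, [1.0, 2.0, 3.0])

VV[2]          # a view into the flat storage, not a freshly allocated copy
flatview(VV)   # the length-4 flat backing vector
```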
I see. Yes, this could be very useful for matrix coloring algorithms and things like that. I'll be happy to see where this goes. Thanks for the explanation!
(It's interesting how many different architectures for "a matrix" one needs to think about if you really want a complex application to be fast!)
One somewhat dirty trick on 64-bit is to `calloc` some giant amount of virtual memory and `unsafe_wrap` your vector around it (without giving Julia control over the memory, otherwise the GC goes haywire).
Then you can keep writing and the kernel / page-fault handler will do all the dirty work (you only consume as much physical memory as you actually write, plus the unfilled remainder of the last page).
Afterwards, you `unsafe_wrap` again, up to the length you actually used, and hand the memory to Julia. You have now wasted the rest of the last memory page of your array (<2 MB).
This allows you to just write and never check upper bounds when writing, safe in the knowledge that your process will be killed for memory usage before it manages to corrupt memory.
edit: This dirtiness is only needed because the garbage collector accounts for not-yet-used virtual memory when calculating memory pressure. I guess changing that would be quite a lot of work, though.
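A minimal sketch of the trick described above, using `Libc.calloc` and `unsafe_wrap` (the size here is kept modest so the example is safe to run; in practice you would reserve far more virtual memory than you expect to need):

```julia
maxlen = 1_000_000
p = Ptr{Float64}(Libc.calloc(maxlen, sizeof(Float64)))

# Wrap without giving Julia ownership, so the GC doesn't account for it:
A = unsafe_wrap(Vector{Float64}, p, maxlen; own = false)

# Write without bounds worries, tracking the used length ourselves:
data = (1.0, 2.0, 3.0)          # stand-in for the real incoming data stream
for (i, x) in enumerate(data)
    @inbounds A[i] = x
end
used = length(data)

# Re-wrap at the length actually used and hand ownership to Julia's GC,
# which will free() the memory eventually:
B = unsafe_wrap(Vector{Float64}, p, used; own = true)
```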
So true!
Very interesting, thanks!
Which operating systems does this trick work on? 64-bit only? Linux only?
On 32-bit, virtual memory is precious (only 4 GB of address space). So you probably shouldn't do this on 32-bit.
On Linux, `malloc` and `calloc` both do the trick (a giant `malloc` will be on-demand zero-initialized anyway).
I never tried this on other systems. Not entirely sure about Windows, but you can probably do something with `VirtualAlloc`, or just give `malloc`/`calloc` a try. I think BSD and macOS should work without problems, but don't take my word on it.
What this boils down to is the fact that we have hardware support we should use (a rarely triggered trap is faster than a predicted branch, and we have an MMU: contiguous memory is a convenient lie).
Thanks for the explanations! Hmm, it would be nice to have another wrapper than `unsafe_wrap`, one that supports `resize!` (up to the amount of `calloc`ed memory), so that the array can start at size zero. Then one could wrap that in an `ElasticArray` and have a multi-dimensional array that can grow via page faults, without reallocation (at least up to the huge virtually allocated size) …
You can, in principle, `resize!` any array by hand, regardless of whether it is shared or multidimensional, using something like https://github.com/JuliaLang/julia/issues/24909. That is, an `Array` is internally something like `Ref{jl_array_t}`, and you can modify the `jl_array_t` as much as you like, using `pointer_from_objref(some_array)`, as long as you don't cause segfaults.
So, I'd go for `unsafe_resize!`. But note that you don't get to handle the page fault.
I think the only thing that would be very hard to replicate in pure Julia, without using `array.c`, is the fast pool allocator and the memory pressure calculation (which we don't want for such shenanigans). That is, small allocations in Julia are freakishly fast. Otherwise, you could let `ElasticArray` just wrap a raw pointer, use `calloc`/`realloc` directly, and `free` in a finalizer.
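A hedged sketch of that last idea; the type and function names here are made up for illustration:

```julia
# A minimal buffer that owns a raw calloc'ed pointer, grows via realloc,
# and frees the memory in a finalizer (hypothetical names):
mutable struct RawBuffer
    ptr::Ptr{Float64}
    len::Int
    function RawBuffer(n::Integer)
        p = Ptr{Float64}(Libc.calloc(n, sizeof(Float64)))
        buf = new(p, n)
        finalizer(b -> Libc.free(b.ptr), buf)
        return buf
    end
end

function grow!(b::RawBuffer, n::Integer)
    b.ptr = Ptr{Float64}(Libc.realloc(b.ptr, n * sizeof(Float64)))
    b.len = n
    return b
end

Base.getindex(b::RawBuffer, i::Int) = unsafe_load(b.ptr, i)
Base.setindex!(b::RawBuffer, x, i::Int) = unsafe_store!(b.ptr, x, i)
```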
I wonder why we can't simply allocate a zero-sized vector and do a huge `sizehint!`, then - that would be so elegant. I tried, and ended up out of memory - doesn't Julia use `malloc` or `calloc`? Or does Julia actually touch all the memory reserved by `sizehint!` somehow?
Stumbled on this post. Apparently something like this would work with numpy - I wonder why Julia doesn't seem to use `calloc` (for large arrays)? Would it interfere with the memory pressure calculation?
Yes, I was thinking about something just like that. Either with a finalizer, or via a `try`/`finally`-based environment.
Also interesting, maybe: https://github.com/numpy/numpy/issues/8598
Nice blogpost, and thanks for the issue tracker link.
I guess my favorite solution would be to use a different custom allocator for giant arrays, which would use different code for tracking memory pressure (giant meaning e.g. >128 MB or >1 GB). That way you can also make the growing cheap, by just remapping memory instead of copying it. Also, these could support copy-on-write, which can be really good for some algorithms.
edit: A different allocator because we are then allowed to be slow; a couple of microseconds don't matter when dealing with such a big slab of memory, so we can do fancier stuff.
I found out it actually does work in Julia - just not with `Int8`:

```julia
A = Vector{Int8}()
sizehint!(A, 8 * 1024^3)
```

results in instant allocation of 8 GB of RAM (measured via the shell command `free`). But

```julia
A = Vector{Float64}()
sizehint!(A, 1024^3)
```

allocates no memory initially. So here `calloc` or something like it is used - but why not for `UInt8`?
@foobar_lv2, in this case the manual `calloc` won't be necessary, right? Or would it still be preferable - would Julia's memory pressure calculation get confused by having seemingly huge arrays around?
That was my problem, yes; the seemingly huge vectors sent the garbage collector into a frenzy (it runs too often, slowing everything else down).
I preferred my memory zero-initialized (that's the difference between `malloc` and `calloc`) and needed it accessible right away (in order to avoid the slow `push!`; even after `sizehint!`, `push!` is really slow compared to `unsafe_store!` + integer increment).
edit: But the wrapping only really works if your workflow is: overallocate, write once, shrink to the correct size, then hand the memory to Julia's GC. Pretty situational.
It'll be a very common workflow for me.