RAM needed to initialise large matrices

fipelle · March 22, 2022, 8:04pm

Hi, I have a basic question about memory management. I am currently working on a high-dimensional problem in which I would need to use a handful of matrices of size 500,000 x 500,000. I was wondering how much RAM is needed to initialise matrices of floats of this size.

I have tried using varinfo() after having initialised a matrix A=zeros(100000, 100000) to get a rough idea, but I am not sure I understand the output. In fact, it says that A requires about 74.506 GiB (which sounds odd to me, considering that I am testing it on a MacBook Air with substantially less RAM).

How should I read the output of varinfo()? Also, is it enough to multiply it by 5 to get a rough idea of the RAM required for some matrix B=zeros(500000, 500000)?

goerch · March 22, 2022, 8:09pm

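You won’t get me to try this on Windows. A back-of-the-envelope calculation gives me a memory requirement of 80 GB for 100,000 x 100,000 (10^10 entries at 8 bytes each). Since memory grows with the square of the dimension, for 500,000 x 500,000 you should multiply by 25, not 5.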

Edit: checked it: clearly a troll.

Edit: @fipelle: you can’t risk the system integrity of unsuspecting observers.

jling · March 22, 2022, 8:13pm

julia> A=zeros(100, 100);

julia> varinfo()
  name                    size summary                               
  –––––––––––––––– ––––––––––– ––––––––––––––––––––––––––––––––––––––
  A                 78.164 KiB 100×100 Matrix{Float64} 

julia> 100*100*64/8/1024
78.125
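A minimal sketch of where the small gap between 78.125 KiB and 78.164 KiB comes from, assuming (as far as I can tell from the InteractiveUtils source) that varinfo() reports Base.summarysize: sizeof counts only the element data, while summarysize also counts the array object's own header. The exact overhead may differ between Julia versions.

```julia
A = zeros(100, 100)

# Element data only: 100 * 100 elements * 8 bytes per Float64 = 80_000 bytes
data_bytes = sizeof(A)

# Data plus the array object's own bookkeeping (header, dimensions)
total_bytes = Base.summarysize(A)

println("data only:   ", data_bytes / 1024, " KiB")   # 78.125 KiB
println("with header: ", total_bytes / 1024, " KiB")
```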

stevengj · March 22, 2022, 8:18pm

In double precision, an $n \times n$ matrix requires $8n^2$ bytes.

In short, you won’t be able to handle n = 500,000 (without a huge supercomputer), since that corresponds to $8 \times (5 \times 10^5)^2 = 2 \times 10^{12}$ bytes, i.e. 2 TB of memory. People working with such large matrices almost invariably exploit some special structure, e.g. sparsity (if your matrix is mostly zero). We could give you more specific advice, but we’d need to know more about your problem.
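The $8n^2$ rule as a small Julia sketch; `dense_bytes` and `GiB` are hypothetical helper names. It reproduces the 74.506 GiB that varinfo() showed for n = 100,000 (varinfo prints binary units, 1 GiB = 2^30 bytes) and shows that going from n = 100,000 to n = 500,000 multiplies the memory by 5^2 = 25.

```julia
# Bytes needed for the element data of a dense n×n matrix with element type T
dense_bytes(n::Integer, ::Type{T}=Float64) where {T} = sizeof(T) * n^2

# Convert bytes to binary gibibytes (the unit varinfo() prints)
GiB(bytes) = bytes / 2^30

for n in (100_000, 500_000)
    b = dense_bytes(n)
    println("n = ", n, ": ", b, " bytes = ", round(GiB(b); digits=3), " GiB")
end

# Memory grows with n^2, so a 5x larger n means 25x more memory
ratio = dense_bytes(500_000) ÷ dense_bytes(100_000)

# Even one byte per entry (Matrix{Bool}) is far too much at this size
bool_bytes = dense_bytes(500_000, Bool)
```

With Float64 this gives 8×10^10 bytes (about 74.506 GiB) and 2×10^12 bytes (about 1862.6 GiB, the 2 TB above); a Matrix{Bool} of size 500,000 x 500,000 would still need 2.5×10^11 bytes.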

fipelle · March 22, 2022, 9:47pm

Thank you!

That’s exactly what I am trying to do for handling the last huge matrix in my problem - I did manage to reduce the other matrices into smaller blocks. The last one is a selection matrix (ones and zeros, mostly zeros) of size 500,000 x 500,000.

Any suggestions? I am trying to use Julia’s SparseArrays (the "Sparse Arrays" page of the Julia manual), but a specialised structure for selection matrices would certainly be better.
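A minimal sketch with the SparseArrays standard library, assuming the selection matrix has a single 1 in each selecting row (row i picks entry idx[i] of a vector); `selection_matrix` and the indices below are hypothetical. Only the ones are stored, plus one column pointer per column, so even a square 500,000 x 500,000 version takes a few megabytes rather than terabytes.

```julia
using SparseArrays

# Hypothetical helper: an m×n selection matrix S with S[i, idx[i]] = 1 for
# i = 1:length(idx), and zeros everywhere else. Only the ones are stored.
function selection_matrix(idx::AbstractVector{<:Integer}, n::Integer; m::Integer = length(idx))
    k = length(idx)
    k <= m || throw(ArgumentError("need at least length(idx) rows"))
    rows = collect(1:k)             # row i ...
    cols = Int.(idx)                # ... selects column idx[i]
    vals = ones(Float64, k)         # the stored entries (all ones)
    return sparse(rows, cols, vals, m, n)
end

n   = 500_000
idx = [1, 10, 42, 499_999]          # hypothetical selected positions
S   = selection_matrix(idx, n)      # 4 x 500,000 with 4 stored entries

x = rand(n)
y = S * x                           # same values as x[idx]

# Square 500,000 x 500,000 version: the extra rows are simply all zero
Ssq = selection_matrix(idx, n; m = n)

println(nnz(Ssq), " stored entries, ", Base.summarysize(Ssq), " bytes")
```

If the matrix is only ever applied to vectors, plain indexing x[idx] gives the same values without building a matrix at all; the sparse form is mainly useful when it has to be combined with other matrices.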

goerch · March 22, 2022, 9:51pm

That is not what you are trying to do with zeros.

fipelle · March 22, 2022, 9:52pm

No, I was trying to get a rough idea of the memory needed for a matrix with size 500,000 x 500,000 - as indicated in the OP.

goerch · March 22, 2022, 9:53pm

But others might try that on their system?

goerch · March 22, 2022, 9:54pm

Forcing system instability due to OOM. In my eyes that is not OK.

fipelle · March 22, 2022, 10:01pm

It should be evident from the history of my account that I am using this Discourse properly, and certainly not to troll other people, as I have just noticed you suggested above. Oddly enough, my MacBook does not crash when initialising that matrix, and I am still wondering why, btw.

If you have relevant suggestions, I could really use some help. However, I do not think that polluting this thread with conspiracy theories like this is of any use.

goerch · March 22, 2022, 10:04pm

This doesn’t apply to Windows as far as I can tell (I checked). I accept this as kind of an apology.

Edit: but this could be quite a serious difference in memory management between the supported platforms.

Another edit: I didn’t really wait for my system to crash, only observed system monitor showing more and more memory being allocated.

Jeff_Emanuel · March 22, 2022, 10:29pm

Virtual memory, plus a swap file sufficiently larger than physical RAM.

fipelle · March 22, 2022, 10:30pm

Thank you Jeff! Would you please expand on that?

Oscar_Smith · March 22, 2022, 10:37pm

What’s happening here is that you are requesting the memory, but the OS only actually gives you physical memory when you write to it. Calling zeros doesn’t write to memory, so the OS doesn’t actually have to give you any memory.

Jeff_Emanuel · March 22, 2022, 10:39pm

In modern computer systems, a process’s memory address space is mapped to physical ram through page tables. Blocks of memory (aka pages) can be offloaded to disk in the swap or paging file, and reloaded later as needed.

goerch · March 22, 2022, 10:39pm

So @fipelle should see indications of this in his system monitor, too? (i.e. not that big a difference in memory management between Windows and Mac OS, then)

fipelle · March 22, 2022, 10:42pm

I see. I am still going to use a sparse representation, but just for the sake of clarity: as long as there’s enough space on my hard disk it should work, right?

Jeff_Emanuel · March 22, 2022, 10:45pm

It depends on the swap file configuration. The OS usually puts a limit on the swap file size. Paging will lead to very poor performance.
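Since paging makes things very slow (and, as noted above, can destabilise the machine), one option is to check the requested size against free physical memory before allocating. A minimal sketch; `checked_zeros` is a hypothetical helper, and `Sys.free_memory()` reports what the OS currently considers free, which may be less than what could actually be reclaimed (e.g. file caches), so treat it as a rough guard rather than an exact limit.

```julia
# Hypothetical helper: allocate a dense m×n zero matrix only if its element data
# fits in (a fraction of) the currently free physical RAM, ignoring swap.
function checked_zeros(::Type{T}, m::Integer, n::Integer; headroom::Real = 0.8) where {T}
    needed = sizeof(T) * Int128(m) * Int128(n)   # Int128 avoids overflow for huge sizes
    avail  = Sys.free_memory()                   # free physical memory in bytes
    if needed > headroom * avail
        error("dense ", m, "×", n, " ", T, " matrix needs ",
              round(needed / 2^30; digits = 2), " GiB, but only ",
              round(avail / 2^30; digits = 2), " GiB of RAM is free")
    end
    return zeros(T, m, n)
end

A = checked_zeros(Float64, 1_000, 1_000)     # small: allocates normally
# checked_zeros(Float64, 500_000, 500_000)   # would throw instead of swapping
```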

goerch · March 22, 2022, 10:45pm

Do we have a difference in demand paging here?

Jeff_Emanuel · March 22, 2022, 10:47pm

From the sources:

https://answers.microsoft.com/en-us/windows/forum/all/physical-and-virtual-memory-in-windows-10/e36fb5bc-9ac8-49af-951c-e7d39b979938

Related topics

Topic	Category (tags)	Replies	Views	Last activity
Matrix{Bool}(1000000, 1000000) really?	General Usage	2	651	December 26, 2018
Huge sparse array construction	General Usage (sparse)	9	867	April 12, 2020
Best practices - initialize a matrix	New to Julia	4	814	October 28, 2018
Is varinfo() designed to not report size of an array of sparse matrices?	Performance (memory-allocation, sparse)	2	344	January 22, 2021
How to efficiently construct a large SparseArray? Packages for this?	Performance (package, performance, parallel, sparse)	20	1760	May 15, 2022