Size of Multi-dimensional SharedArray

Is this a bug in Base.summarysize, or is something very inefficient about SharedArray? A 1D SharedArray has only a small overhead, but a 2D SharedArray appears to use 2x the memory:

julia> Base.summarysize(rand(1000000))
8000000

julia> Base.summarysize(SharedArray{Float64}(rand(1000000)))
8000213

julia> Base.summarysize(rand(1000000,2))
16000000

julia> Base.summarysize(SharedArray{Float64}(rand(1000000,2)))
32000221

I’d take anything that summarysize says with a large grain of salt.
It doesn’t count the header (40 bytes on a 64-bit platform), nor the actual amount allocated; it just reports the sizeof of the vector (what is used).

Thanks for the insight. Do you know of a good way to accurately measure how much memory it is using? I have a use case where I could load up to 100 GiB into shared memory, and it would be a bummer if it failed miserably because of my lack of understanding :slight_smile:
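For a SharedArray of bits-type elements, one rough but reliable estimate is simply element size times element count, since the data lives in a single mmapped segment and everything on top of that (object headers, the pid list, etc.) is tiny by comparison. A minimal sketch (the helper name `est_bytes` is made up here, not part of any API):

```julia
# Lower-bound estimate of the shared segment: element size × number of
# elements. This ignores per-object headers and any page-alignment padding
# in the mmapped segment, which are negligible at GiB scale.
est_bytes(T, dims...) = sizeof(T) * prod(dims)

est_bytes(Float64, 1_000_000, 2)   # 16000000 bytes, i.e. ~15.3 MiB
```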

I am getting the following on v0.6.2:

julia> Base.summarysize(rand(1000000))
8000000

julia> Base.summarysize(SharedArray{Float64}(rand(1000000)))
8000341

julia> Base.summarysize(rand(1000000,2))
16000000

julia> Base.summarysize(SharedArray{Float64}(rand(1000000,2)))
16000349

There is also Base.shmem_rand btw.

Interesting, what OS/platform are you using? I also have v0.6.2:

$ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.2 (2017-12-13 18:08 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-apple-darwin14.5.0

julia> Base.summarysize(SharedArray{Float64}(rand(1000000,2)))
32000221

Good old Windows.

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17* (2017-12-13 18:08 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Prescott)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

On Linux with v0.6.1 I am getting the same result as you, but on nightly I am getting the right result.

On Windows with latest nightly (downloaded today), I am getting:

julia> using SharedArrays

julia> Base.summarysize(SharedArray{Float64}(rand(1000_000,2)))
32000205

This really seems like a bug.

I think it’s just double counting, because of how summarysize behaves: it recursively counts the memory used by every reachable object. A SharedArray has a field loc_subarr_1d and another field s. The latter holds the whole array and the former holds a 1-d view of it. Mutating one mutates the other, so the same memory is being counted twice. It’s probably doing something like:

# simplified sketch – the real summarysize also tracks already-visited objects
mysizeof(f) = sum((sizeof(f), (mysizeof(getfield(f, i)) for i in 1:nfields(f))...))

But it’s weird that it’s not double counting in the 1D case :confused:. Anyway, I think it deserves some attention from the devs.
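One way to check the aliasing claim is to mutate through one field and read through the other. This is a sketch only: s and loc_subarr_1d are undocumented internals of SharedArray, so it may break across versions (and `using SharedArrays` applies on nightly/1.0, where SharedArray moved out of Base):

```julia
using SharedArrays   # stdlib on nightly/1.0; on v0.6 SharedArray is in Base

S = SharedArray{Float64}(1000, 2)
S.s[1, 1] = 42.0            # write via the full 2-d array field
S.loc_subarr_1d[1] == 42.0  # true: the 1-d view aliases the same memory

# Measuring just the backing array sidesteps the double count:
Base.summarysize(S.s)
```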

Maybe adding a bug label or something will get it attention more quickly. In the meantime, I think Task Manager on Windows or the system monitor on Linux will give you a rough idea of the memory used, for large enough data.

If you haven’t already, you should consider opening an issue so that it does not get lost.

Done https://github.com/JuliaLang/julia/issues/25367.
