Is there a package to list memory consumption of selected data objects?

I could use a package that could look at a variable and compute the true memory used by the pointed-to data object.

Is there anything like this?


Sounds just like Base.summarysize, but are there reasons that’s not enough? PS I have no idea what the chargeall argument does from its description.
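
For reference, the docstring lists an exclude keyword alongside chargeall: exclude names types to skip during the traversal, and chargeall names types whose fields are always charged in full even if they would otherwise be excluded. Here is a minimal sketch of exclude only, assuming it behaves as that docstring describes. Byte counts vary across Julia versions, so only the comparison matters here:

```julia
x = collect(1:1000)
r = Ref(x)

# Default traversal: the Ref plus the array it points to.
full = Base.summarysize(r)

# Skip any Vector{Int} reached during traversal (assumed behavior of `exclude`).
shallow = Base.summarysize(r; exclude=Vector{Int})

println((full, shallow))
```

I have not tried chargeall myself, so treat the description above as a paraphrase of the docs rather than tested behavior.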


That looks promising. I will check it out. Thanks.

Just beware of adding up the numbers reported by summarysize unless you can rule out overlap. In the example below, you shouldn’t simply add the memory usage of x, y, and z, because all three share the same 8040-byte array.

julia> x = collect(1:1000); y = Ref(x); z = Ref(x); a = 1+1im;

julia> Base.summarysize(x), Base.summarysize(y), Base.summarysize(z), Base.summarysize(a)
(8040, 8048, 8048, 16)

One thing that might do what you’re looking for is the function varinfo():

julia> x = randn(100,100);
julia> y = randn(1000,1000);
julia> varinfo()
  name                    size summary                  
  –––––––––––––––– ––––––––––– –––––––––––––––––––––––––
  Base                         Module                   
  CGFuns            30.709 KiB Module                   
  Core                         Module                   
  InteractiveUtils 254.409 KiB Module                   
  Main                         Module                   
  ans                7.629 MiB 1000×1000 Matrix{Float64}
  x                 78.164 KiB 100×100 Matrix{Float64}  
  y                  7.629 MiB 1000×1000 Matrix{Float64}

I also have this function in my ~/.julia/config/startup.jl:

readablesize(x) =  Base.format_bytes(Base.summarysize(x))

which then prints the size of objects in a nicer format than just summarysize.
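
A quick usage sketch of that helper. Note that Base.format_bytes is unexported, so its exact output format is an assumption that may change between Julia versions; the Int case assumes a 64-bit machine:

```julia
# Same one-liner as above, repeated so the snippet runs on its own.
readablesize(x) = Base.format_bytes(Base.summarysize(x))

println(readablesize(1))                 # a single Int
println(readablesize(randn(100, 100)))   # a 100×100 Matrix{Float64}
```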


So I just figured out Base.summarysize((x, y, z, a)) counts shared memory once. There is a bit of overhead from Tuple pointers: 8096 (total) = 8040 (x) + 8 (y) + 8 (z) + 16 (a) + 24 (pointers to x,y,z). sizeof((x,y,z,a)) is 40 because the Tuple stores 3 pointers and a copy of the immutable a, so simply subtracting sizeof(_) isn’t right. Would need some way to figure out if an element is implemented as a pointer or inline copy; just checking semantic mutability does not always work e.g. Strings are immutable yet not stored inline. (But for some reason, ismutabletype(String) and ismutabletype(DataType) return true.)
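
To make the difference concrete, here is a small sketch comparing the naive sum against the joint measurement. The exact byte counts depend on the Julia version, so it only prints them and relies on the comparison:

```julia
x = collect(1:1000)
y = Ref(x)
z = Ref(x)

# Naive sum: the array behind x is counted three times.
separate = Base.summarysize(x) + Base.summarysize(y) + Base.summarysize(z)

# Joint measurement: the shared array is counted once, plus the Tuple's own fields.
joint = Base.summarysize((x, y, z))

println("sum of individual sizes: ", separate)
println("joint size:              ", joint)
```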

Really not sure if there’s a way to do this for all live variables or references without explicitly writing them. There is a Base.gc_live_bytes() to report all live memory, but that also includes memory that is no longer referenced but has yet to be garbage collected, as well as allocations for hidden implementation mechanisms.


It seems to me that the varinfo function cannot be used inside an arbitrary function to inspect the local variables. That is what I really need. I would like to inspect the sizes of the local variables in this easily grokkable way.

You could use Base.@locals + Base.summarysize perhaps
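
Something along these lines, as a rough sketch. The names local_sizes and demo are made up for illustration, and the reported sizes do not account for data shared between locals:

```julia
# Hypothetical helper: map each local's name to its summarysize in bytes.
local_sizes(locals::Dict{Symbol,Any}) =
    Dict(name => Base.summarysize(val) for (name, val) in locals)

function demo(n)
    x = zeros(n)
    # Base.@locals captures the locals defined so far at this point: n and x.
    sizes = local_sizes(Base.@locals)
    return sizes, sum(x)
end

sizes, _ = demo(1000)
for (name, bytes) in sizes
    println(name, ": ", bytes, " bytes")
end
```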


Wonderful!


If you’re digging into local variables, just be aware that you could be interfering with optimizations, including those concerning memory usage.

julia> function f()
        x = Ref(Int16(1))
        y = 3.5
        x[]+y
       end
f (generic function with 1 method)

julia> function f2()
        x = Ref(Int16(1))
        xsize = Base.summarysize(x)
        y = 3.5
        ysize = Base.summarysize(y)
        x[]+y
       end
f2 (generic function with 1 method)

julia> @code_llvm f()
;  @ REPL[1]:1 within `f`
define double @julia_f_172() #0 {
top:
;  @ REPL[1]:4 within `f`
  ret double 4.500000e+00
}

julia> @code_llvm f2()
;  @ REPL[2]:1 within `f2`
define double @julia_f2_184() #0 {
top:
  %gcframe7 = alloca [3 x {}*], align 16
### I'll omit the Ref and summarysize parts
; ┌ @ refvalue.jl:56 within `getindex`
; │┌ @ Base.jl:42 within `getproperty`
    %19 = load i16, i16* %12, align 2
; └└
; ┌ @ promotion.jl:379 within `+`
; │┌ @ promotion.jl:350 within `promote`
; ││┌ @ promotion.jl:327 within `_promote`
; │││┌ @ number.jl:7 within `convert`
; ││││┌ @ float.jl:146 within `Float64`
       %20 = sitofp i16 %19 to double
; │└└└└
; │ @ promotion.jl:379 within `+` @ float.jl:399
   %21 = fadd double %20, 3.500000e+00
   %22 = load {}*, {}** %5, align 8
   %23 = bitcast {}*** %2 to {}**
   store {}* %22, {}** %23, align 8
; └
  ret double %21
}

PS really not sure why there are the %22 and %23 lines in the addition part of @code_llvm f2() when it just ends up returning %21


Super neat, I didn’t know about that macro! I’m totally going to add a macro like this into my startup.jl file:

macro show_locals()
  quote 
    locals = Base.@locals
    println("\nIndividual sizes (does not account for overlap):")
    for (name, refval) in locals
      println("\t$name: $(Base.format_bytes(Base.summarysize(refval)))")
    end
    print("Joint size: ")
    println("$(Base.format_bytes(Base.summarysize(values(locals))))\n")
  end
end

# example use:
function tester(n)
  x = randn(n,n)
  @show_locals
  sum(x)
end

With example output:

julia> tester(100)

Individual sizes (does not account for overlap):
	n: 8 bytes
	x: 78.164 KiB
Joint size: 78.625 KiB

94.23373017998107

I love threads like this where I learn some nifty trick. I was just wishing a month or two ago to do something like this, but I didn’t really take the initiative to do something about it and try to figure out a solution.


One thing to consider is that if it isn’t easy to eyeball how big the locals are, you probably should split your function up more.

I wonder if @locals makes copies of the data? Not easy to tell by looking at the code:

macro locals()
    return Expr(:locals)
end

This also has overhead, more than the Tuple example it seems:

julia> function f()
         x = 1
         @show_locals
       end
f (generic function with 1 method)

julia> f()

Individual sizes (does not account for overlap):
	x: 8 bytes
Joint size: 472 bytes

It’s negligible when your instances are large enough to take up most of the reported memory, like your example with a 100x100 matrix, but that’s not always the case. I’m not sure if there is a way to separate the memory of the values iterator and the referenced @locals dictionary from the memory of the contained instances, though. sizeof(locals) is a constant for any size, so one would probably have to dig into some internals.

@locals makes a Dict{Symbol, Any}, so it would involve how those instances are boxed. Hypothetically the boxes on the heap could just point to the existing instances, but I don’t actually know how boxes are implemented; as far as I know, copying could happen for many immutables. In any case, @locals or any other container would only contain 1 of the copies, so summarysize wouldn’t count twice.
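
For heap-allocated mutable values at least, an identity check suggests the dictionary refers to the same object rather than a copy. A small sketch (the function name locals_alias_check is made up for illustration):

```julia
function locals_alias_check()
    x = [1, 2, 3]
    d = Base.@locals
    same_object = d[:x] === x   # identity check, not just equality
    d[:x][1] = 99               # mutate through the dictionary entry
    return same_object, x[1]    # the change is visible through x as well
end

println(locals_alias_check())
```

This says nothing about immutables such as an Int or a Float64: === compares those by value, so it cannot distinguish a boxed copy from the original.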


I couldn’t find out how to measure dictionary memory, and I wasn’t comfortable with large heterogeneous tuples that can store their elements in various ways, so I went with a simpler Vector{Any} as the container.

It appears that Base.summarysize doesn’t actually report all the memory used by a Vector{Any}. I know there should be boxes containing element type information, but it only seems to count the elements and the vector’s pointers to them. Also, if you allocate v = collect(1:100000) and then call empty!(v), the underlying buffer does not shrink if I recall correctly, but sizeof and summarysize report a reduction to minimal memory. So summarysize might actually be unsuitable for measuring allocated heap memory; maybe we could say it measures the portion of allocated memory that represents accessible data?

Still, this might make it easier to remove the overhead of the Vector{Any} containing the @locals instances. Bear in mind the following only worked on a few small examples; I have not rigorously tested this and do not know how to.

julia> function valuessize(v::Vector{Any})
           (Base.summarysize(v) # doesn't seem to count boxes
           - Base.summarysize(Any[]) # Vector overhead
           - sizeof(Int)*length(v) ) # element pointers
       end
valuessize (generic function with 1 method)

julia> function valuessize(d::Dict{Symbol, Any})
           valuessize(collect(values(d)))
       end
valuessize (generic function with 2 methods)

julia> x = Dict{Symbol, Any}(:x => 1, :y => 2.3, :z => [3])
Dict{Symbol, Any} with 3 entries:
  :y => 2.3
  :z => [3]
  :x => 1

julia> valuessize(x)
64

julia> sum(Base.summarysize.(values(x))) # elements don't share data
64

julia> Base.summarysize(collect(values(x))) # plus Vector{Any} overhead
128

julia> Base.summarysize(values(x)) # plus values/Dict{Symbol,Any} overhead
528