Can the result of @time measure the space complexity of the algorithm?

julia> function myfunc(n::Int64)
           D=20
           a=rand(D,D)
           for i in 1:n
               a*a
           end
       end
myfunc (generic function with 1 method)

julia> @time myfunc(100)
  0.000193 seconds (101 allocations: 328.250 KiB)

julia> @time myfunc(1000)
  0.003475 seconds (1.00 k allocations: 3.177 MiB)

julia> @time myfunc(10000)
  0.070506 seconds (10.00 k allocations: 31.741 MiB, 53.73% gc time)

julia> @time myfunc(100000)
  0.398272 seconds (100.00 k allocations: 317.386 MiB, 48.57% gc time)

julia> @time myfunc(1000000)
  1.612816 seconds (1.00 M allocations: 3.099 GiB, 5.19% gc time)

julia> @time myfunc(10000000)
 15.534450 seconds (10.00 M allocations: 30.994 GiB, 4.64% gc time)

This example seems to indicate that the memory figure @time reports is cumulative and does not reflect the maximum memory allocated at any one moment during the program’s execution (that is, its memory complexity). So what operations can measure the memory complexity of a function?

I think I see what you mean, but just in case, I’m interpreting “maximum memory allocation”/“memory complexity” to mean the amount of memory needed for the method call to work. For example, if I allocate and free 100 bytes 5 times in a row, the cumulative memory consumption is 500 bytes, but I only need 100 bytes to do the work.

Here’s the problem with what you’re asking: you cannot manually free memory, and there’s no guarantee when the garbage collector runs. Although myfunc theoretically only needs heap memory for 2 arrays, a and a*a, the garbage collector probably isn’t running in each iteration of the for-loop; for all we know, it might only run after the method finishes and free 1+n arrays at once (@time reflects the numbers in this case). That’s not even the least responsive scenario: maybe it runs after 3 of these method calls and frees 3*(1+n) arrays.

If you really need to save memory and don’t mind slowing the method to a crawl, you could try calling GC.gc() after reassignments. But the practical solution is to only allocate what you need and use in-place methods, if available. In this case, there is LinearAlgebra.mul!; below is a quick edit of your method:

julia> using LinearAlgebra

julia> function myfunc2(n::Int64)
           D=20
           a=rand(D,D)
           aa=zeros(D,D)
           for i in 1:n
               mul!(aa, a, a)
           end
       end
myfunc2 (generic function with 1 method)

julia> @time myfunc2(100)
  0.000103 seconds (2 allocations: 6.500 KiB)

julia> @time myfunc2(1000)
  0.000930 seconds (2 allocations: 6.500 KiB)

julia> @time myfunc2(10000)
  0.009923 seconds (2 allocations: 6.500 KiB)

julia> @time myfunc2(100000)
  0.093000 seconds (2 allocations: 6.500 KiB)

Incidentally, I think dead code elimination isn’t happening, despite none of the computations being returned, because 1:n could be defined to mutate n, and the n::Int64 annotation is just a subtype restriction; it cannot tell the compiler the argument is immutable.

Oh, so the memory allocation reported by @time refers to the maximum memory required by myfunc(), which can be used to express its space complexity, right?

I don’t know why my code doesn’t free some memory when it calls GC.gc():

julia> function myfunc1(n::Int64)
           D=20
           a=rand(D,D)
           for i in 1:n
               a*a
               GC.gc()
           end
       end
myfunc1 (generic function with 1 method)

julia> function myfunc2(n::Int64)
           D=20
           a=rand(D,D)
           for i in 1:n
               a*a
           end
       end
myfunc2 (generic function with 1 method)

julia> @time myfunc2(100)
  0.000273 seconds (101 allocations: 328.250 KiB)

julia> @time myfunc2(1000)
  0.002338 seconds (1.00 k allocations: 3.177 MiB)

julia> @time myfunc1(100)
  6.571333 seconds (101 allocations: 328.250 KiB, 99.97% gc time)

julia> @time myfunc1(1000)
 64.667089 seconds (1.00 k allocations: 3.177 MiB, 99.98% gc time)

Of course, it takes up quite a bit of time.

This works well for the example. Are there any other general methods? What do I do when I’m not multiplying matrices in the loop? GC.gc() takes too long.

You’re still misunderstanding what @time is measuring. Let’s say you have a method that allocates 3 times: the 1st allocation is 5 bytes, the 2nd 4 bytes, the 3rd 8 bytes. @time will report 3 allocations and 5+4+8=17 bytes. If you @time the first call, which triggers JIT compilation, @time will also count all the allocations done for compilation. @time is just not designed to measure space complexity or “maximum memory”; the latter baselessly assumes the garbage collector somehow runs right before the method call and then right after. I’m guessing you might think that the garbage collector frees memory specific to a method call; it does not. It works on the entire heap and can free allocations from multiple method calls that were long over.

For the same reason, you will not see the effects of GC.gc() reflected in @time. The count of allocations and the sum of memory per allocation can only go up; garbage collection doesn’t decrease them. Since you are triggering garbage collection, you can be sure that there are fewer live allocations and less allocated memory at any given point during the method call than what @time eventually prints.
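As a minimal sketch of this point (function name and sizes are hypothetical, not from the thread): even if every allocation becomes garbage immediately and a collection is forced each iteration, @time’s tally only ever accumulates:

```julia
# hypothetical example: each iteration allocates a fresh array and
# immediately makes it garbage, then forces a collection
function alloc_and_discard(k)
    for _ in 1:k
        x = zeros(100)   # one heap allocation of ~800 bytes per iteration
        GC.gc()          # frees the garbage, but @time's tally doesn't go down
    end
end

# @time alloc_and_discard(5) would still report the sum over all 5
# allocations, even though at most one of the arrays was live at a time
```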

Since Julia cares a lot about performance and needless allocations hurt performance, it is common to implement in-place methods; reading the documentation or asking in forums can help you find them. However, that’s not a guarantee; it is still possible to find an allocating method with no corresponding in-place version, even if it would make sense to have one.
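Even when no dedicated in-place method exists, broadcast assignment with .= is a general way to reuse a preallocated buffer. A hypothetical sketch (function names are mine, not from the thread):

```julia
# allocating version: each iteration creates a brand-new result array
function accumulate_alloc(a, b, n)
    for _ in 1:n
        c = a .+ b          # allocates a new array every iteration
    end
end

# in-place version: write results into a preallocated buffer instead
function accumulate_inplace!(c, a, b, n)
    for _ in 1:n
        c .= a .+ b         # fused broadcast, no per-iteration allocation
    end
end
```

@time on the in-place version should report a small, constant allocation count regardless of n, much like myfunc2 above.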

Huh, I get it. So now I’m interested in measuring the space complexity of the algorithm. Although we can infer the space complexity, I want to know if there’s a function that gives us the maximum amount of memory needed at a particular point during runtime.

Base.gc_live_bytes() will give you the total amount of live heap memory.

Well, that’s a good idea. Thanks.

Also note the existence of the memory profiler in Julia 1.8: Profiling · The Julia Language
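For reference, a rough sketch of how the allocation profiler might be applied to the function from this thread (API per Julia ≥ 1.8; see the linked docs for the authoritative usage):

```julia
using Profile

function myfunc(n::Int64)
    D = 20
    a = rand(D, D)
    for i in 1:n
        a*a
    end
end

Profile.Allocs.clear()
# sample_rate=1 records every allocation (slower, but complete)
Profile.Allocs.@profile sample_rate=1 myfunc(100)

results = Profile.Allocs.fetch()
# each recorded allocation carries its size, type, and stack trace,
# so you can attribute memory use to specific lines of code
total = sum(a.size for a in results.allocs)
```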


To demonstrate how the garbage collector’s inconsistent timing throws off any measure of memory usage, here’s a version of the original function where Base.gc_live_bytes is used to measure the change in heap memory occupation:

julia> function myfunc_checkgc(n::Int64)
           start = Base.gc_live_bytes()
           D=20
           a=rand(D,D)
           for i in 1:n
               a*a
           end
           Base.gc_live_bytes() - start
       end
myfunc_checkgc (generic function with 1 method)

julia> @time myfunc_checkgc(100)
  0.000166 seconds (101 allocations: 328.250 KiB)
336128

julia> @time myfunc_checkgc(100)
  0.000140 seconds (101 allocations: 328.250 KiB)
336128

julia> @time myfunc_checkgc(1000000)
  2.711456 seconds (1.00 M allocations: 3.099 GiB, 33.98% gc time)
-29624217

julia> @time myfunc_checkgc(1000000)
  2.403197 seconds (1.00 M allocations: 3.099 GiB, 31.69% gc time)
24925936

julia> @time myfunc_checkgc(1000000)
  2.410707 seconds (1.00 M allocations: 3.099 GiB, 31.21% gc time)
24845856

When the number of allocations is small enough, the garbage collector often doesn’t run over the course of the method call and you get mostly consistent results, but when the garbage collector triggers in the middle of the method call, you can’t know how much memory was actually occupied. You can even end up with less occupied heap memory by the end.

The @time number would reflect how much more heap memory was occupied by the method call if the garbage collector weren’t triggered, but that’s just not something you can rely on. You could treat it as a very generous upper bound for 1 call.
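If you do want a rough estimate of the peak, one option is to sample Base.gc_live_bytes inside the loop and track the maximum. A sketch (with the caveat that gc_live_bytes counts not-yet-collected garbage as live, so this overestimates between collections):

```julia
function myfunc_peak(n::Int64)
    base = Base.gc_live_bytes()
    peak = 0
    D = 20
    a = rand(D, D)
    for i in 1:n
        a*a
        # sample the live heap once per iteration and keep the maximum;
        # this includes garbage the collector hasn't reclaimed yet
        peak = max(peak, Base.gc_live_bytes() - base)
    end
    peak
end
```

Sampling once per iteration can also miss transient peaks inside a single iteration, so treat the result as an approximation, not a measurement.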
