Profiling allocations?

Hello,

I wrote a simple function recursively and when testing it I found that it allocates a lot of memory.

julia> @time brute_force(10,63)
[...]
 32.055675 seconds (392.13 M allocations: 8.765 GiB, 28.16% gc time)

I thought this probably was due to all those function calls, so I wrote the same, but iteratively, and to my surprise the allocated memory was even larger:

julia> @time brute_force_it(10,63)
 41.741401 seconds (522.84 M allocations: 11.686 GiB, 29.09% gc time)

I have some possible candidates for these allocations, but since I don’t know exactly what triggers memory allocation in Julia, I want to avoid guessing and would like to profile the code for allocations. What is the recommended way to get this information?

(If I write this iterative code in Fortran/C I know that the memory allocations would be minimal, so I want to understand what Julia is doing)

Many thanks,
AdV

1 Like

The manual has a short section on this:

https://docs.julialang.org/en/v1/manual/profile/#Memory-allocation-analysis-1

1 Like

You can also try TimerOutputs.jl package. With proper usage it can give you good insights of your code performance.

2 Likes

Thanks. I tried the way specified in the manual (I had overlooked this, sorry), and while it is a bit cumbersome, it works nice and gave me good allocation profiling for each line in the function of interest.

It turns out that most of the allocation comes from innocent-looking lines like:

c += 1

but where that c is of type BigInt.

So, (and perhaps this should be a different question somewhere else), do you know of any package to work with really big integers but without the overhead of BigInt’s? (I don’t need arbitrarily big ints and all the overhead that comes with it. I know the maximum numbers I have to deal with, so it would be sufficient to specify that I need, say, 50 digits, and then all the numbers in the code will be in that range, and if not, it would be sufficient to have wraparound behaviour as for the predefined integer types).

Any pointers appreciated
AdV

1 Like

Try the forum search, I remember a couple of really long discussions about different ways to tackle BigInt’s in a highly performant ways - unfortunately I didn’t understand any of it as I never work with large integers, so I couldn’t summarise it for you, but there’s a lot of stuff out there.

BitIntegers.jl has a Int256 with a typemax of 57896044618658097711785492504343953926634992332820282019728792003956564819967. I haven’t used it myself, so I can’t speak to the performance; also be aware of the caveats in the README. Are you sure that the 39 digits of Int128 are insufficient for your use-case? If you’re not sure, you could also use SaferIntegers.jl to see if any errors are thrown. There may be other packages in this space that I’m not aware of.

1 Like

Many thanks for the pointers. BitIntegers.jl has Int256, Int512 and Int1024 types! That’s great. I think I need Int512, and the first test was very nice (for the recursive function the allocated memory, and running time, went down dramatically).

With BigInt:

julia> @time brute_force(10,63)
:BigInt[59, 53, 51, 48, 47, 43, 42, 41, 37, 35, 33, 31, 26, 25, 23, 22, 20, 19, 18, 17, 14, 9, 6, 5, 1] 9.849302918817908e17 0
 32.820850 seconds (392.13 M allocations: 8.765 GiB, 27.37% gc time)
false

With Int512:

julia> @time brute_force(10,63)
:Int512[59, 53, 51, 48, 47, 43, 42, 41, 37, 35, 33, 31, 26, 25, 23, 22, 20, 19, 18, 17, 14, 9, 6, 5, 1] 9.849302918817908e17 0
  5.154979 seconds (5.38 k allocations: 331.125 KiB)
false

For the iterative function I still need to check what is going on, because the running time goes down from 52 to 24 seconds, but the allocated memory actually increases to 58GiB.

But overall very promising, thanks.

(And yes, I do need all those digits :slight_smile: Well, at some point I will have to rethink the algorithm and perhaps then I can do it some other way, but at this moment I need to perform accurate addition/subtraction of really big numbers, with totals of about 100 digits).

Depending on what you do more exactly and how much output you have in the end, an option can be to do the computations with modular arithmetics, for a modulo that fits in Int64. The obvious drawback is that you will only get the result up to this modulo, but if you repeat it a number of times with different modulo you can recover the full result with the Chinese Remainder Theorem. This trick was used to compute the 171 digit number of legal positions on a 19x19 go board, a very memory-intensive computation. See section 5.5 of https://tromp.github.io/go/gostate.pdf for details.

1 Like

Interesting. I will check it up. Many thanks