Best way to store integer partitions

This is just a toy example but I was wondering what the best (memory and performance wise) way is to store partitions of a fixed integer n, i.e. lists of positive integers which sum up to n.

Currently, I’m using Array{Int8,1} for the partitions and Array{Array{Int8,1},1} to store all the partitions. But I think this is not optimal.

julia> @time P = partitions(10)
0.000011 seconds (53 allocations: 5.328 KiB)

julia> @time P = partitions(90)
4.301507 seconds (56.63 M allocations: 6.239 GiB, 34.52% gc time)

This is quite a lot of memory allocated compared to what I actually have to store. The number of partitions of 90 for example is 56634173, so this is much less than 56634173*90/1000^3 = 5.1GiB of data (ignoring pointers etc.).

What’s the best way to do this?

If this fits your needs better:

 n = 100: 167.598273 seconds (15 allocations: 4.813 KiB)

then look at
https://github.com/OpenLibMathSeq/IntegerSequences.jl/blob/master/src/Partitions.jl

Interesting, thank you. But I fear this is useless. This implementation does not store the partitions, it just prints (visits) them. In fact, in the code for the benchmark you find the comment “# – first comment out println in VISIT”.

This would be equivalent to not push the partitions into the array. If I just do the visit, I’m a bit more efficient:

julia> @time P = partitions(100)
0.250746 seconds (8 allocations: 576 bytes)

But that’s not the point.

What I don’t really understand is that you show the results from @time but that gives the total number of allocations in the algorithm that creates P but doesn’t say much about the size of P itself. Maybe you just do a bunch of non-needed allocations in partitions?

1 Like

How can I get the actual size of P? I tried

julia> P=partitions(2)
2-element Array{Array{Int8,1},1}:
[2]
[1, 1]

julia> sizeof( P)
16

julia> Base.summarysize( P)
139

So, maybe it’s 16 bytes for 3 8-bit integers plus pointers etc!? I don’t know what summarysize is.

I checked allocations using julia --track-allocation=user and as far as I can tell the only allocations came from the push!(P,part) for each partition found.

Instead of storing things as [[1,1,1], [1, 2], [3]] you could store it as

data = [1, 1, 1, 1, 2, 3]
offsets = [1, 4, 6, 7]

and then you get the offsets at i as data[offsets[i]:offsets[i+1]-1]. Would save you all the overhead from a bunch of small arrays.

8 Likes

This sounds like a very good idea, cheers.