Serializing a tree requires a huge amount of storage

This is the code I used for testing

julia> struct Tree
           left::Tree
           right::Tree
           Tree() = new()
           Tree(l, r) = new(l, r)
       end

julia> a = Tree()
Tree(#undef, #undef)

julia> for i=1:20
           a = Tree(a, a)
       end

julia> using Serialization

julia> serialize("s.dat", a)

The resulting file has size 10MB. It does not resolve the objectid to save space I guess.
Is there some trick to circumvent this issue?

We had a similar discussion recently. But interestingly there we talked about time, not space…

1 Like

Good point, they are probably the same thing. My temporary solution is keeping a dictionary of objects, and then dump this dictionary. Wondering if it can be made more elegant.

This seems pretty reasonable for a tree with 2^20 nodes?

The actual storage is 20 x (the object size of Tree type). Because we only create this many objects. They are conceptually big, but in the computer memory, they are just 20 linked addresses with a lot memory sharing.

How does it scale if you double the tree nodes? Maybe other things take up the space.

Another thing might be a hidden copy of data. Did you try to use Ref instead?

julia> a = Tree()
Tree(#undef, #undef)

julia> for i=1:21
           a = Tree(a, a)
       end

julia> serialize("s.dat", a)

shell> ls s.dat -l
-rw-rw-r-- 1 leo leo 20971537 Mar 15 19:55 s.dat

julia> a = Tree()
Tree(#undef, #undef)

julia> for i=1:20
           a = Tree(a, a)
       end

julia> serialize("s.dat", a)

shell> ls s.dat -l
-rw-rw-r-- 1 leo leo 10485777 Mar 15 19:55 s.dat

julia> objectid(a.left)
0xb6935cdf82ed648e

julia> objectid(a.right)
0xb6935cdf82ed648e

So, we have

  1. The size is exponential increasing as the depth,
  2. The object id of left and right branch is the same.

Did you check the object IDs after deserialization ?

Your struct is immutable, so has copy semantics, so indeed you have about 2^21 objects to serialise.

If you define it to be mutable, which has reference semantics, it should serialise to a small file.

2 Likes

Yes, the objectid of left and right branches after deserialization are the same.

I think immutable types do not always copy, otherwise the following code would blow up my memory:

julia> struct Tree
           left::Tree
           right::Tree
           Tree() = new()
           Tree(l, r) = new(l, r)
       end

julia> a = Tree()
Tree(#undef, #undef)

julia> for i=1:200
           a = Tree(a, a)
       end

But I do tried your suggestion of making it mutable, and the serialized file is reasonablly small.

Semantically they copy, though in this case the compiler can optimise things by using references instead (probably because you have possibly undefined fields), which is why your example doesn’t blow up.

2 Likes

Yup:

julia> mutable struct Tree
                  left::Tree
                  right::Tree
                  Tree() = new()
                  Tree(l, r) = new(l, r)
              end

julia> a = Tree()
Tree(#undef, #undef)

julia> for i=1:20
           a = Tree(a, a)
       end

julia> using Serialization

julia> serialize("s.dat", a)

julia> stat("s.dat")
StatStruct for "s.dat"
   size: 167 bytes