What is the most efficient way to copy a tree *with shared children*? While this is technically a graph rather than a tree, here there is a clear root node and directionality, so it’s more intuitive to think of it as a tree with shared children nodes.

I am attempting to go from expression trees to expression graphs in SymbolicRegression.jl (pull request here).

Here is a simplified version of my data structure for a tree (full version here):

```
mutable struct Node
degree::Int # 0 for constant/variable, 1 for cos/sin, 2 for +/* etc.
val::Float32 # If is a leaf, this stores the value
op::Int # enum over operators
l::Node # Left child node. Only defined for degree=1 or degree=2.
r::Node # Right child node. Only defined for degree=2.
Node(val::Float32) = new(0, val)
Node(op::Int, l::Node) = new(1, 0f0, op, l)
Node(op::Int, l::Node, r::Node) = new(2, 0f0, op, l, r)
end
```

One can create a tree that has multiple nodes with the **same** child node. For example, the expression

\cos(x - 3.2 y) + \cos(2 (x - 3.2 y)) could be stored as the following tree:

Here, both the `cos`

on the left branch and the `*`

on the right branch would link to this shared `-`

node. Now, since this is implemented as a tree, how should this be copied? The normal way of copying this tree would be:

```
function copy_node(tree::Node)::Node
if tree.degree == 0
Node(copy(tree.val))
elseif tree.degree == 1
Node(copy(tree.op), copy_node(tree.l))
else
Node(
copy(tree.op),
copy_node(tree.l),
copy_node(tree.r),
)
end
end
```

The problem with this is that it breaks the topology: shared child nodes are duplicated! For example:

```
base_tree = Node(1, Node(1f0)) # e.g., if 1 is cos, cos(1.0)
tree = Node(1, base_tree, base_tree) # e.g., if 1 is +, cos(1.0) + cos(1.0)
objectid(tree.l) == objectid(tree.r) # true!
ctree = copy_node(tree)
objectid(ctree.l) == objectid(ctree.r) # false!
```

So, I use the following trick with `IdDict`

:

```
function copy_node(tree::Node, id_map::IdDict{Node,Node})::Node
get!(id_map, tree) do
if tree.degree == 0
Node(copy(tree.val))
elseif tree.degree == 1
Node(copy(tree.op), copy_node(tree.l, id_map))
else
Node(
copy(tree.op),
copy_node(tree.l, id_map),
copy_node(tree.r, id_map),
)
end
end
end
```

what this basically does is store a dictionary mapping `objectid(node) => copied_node`

. Thus, if a particular node (referenced by its ID) already has been copied, it simply returns that copy instead. This works!

```
ctree = copy_node(tree, IdDict{Node,Node}())
objectid(ctree.l) == objectid(ctree.r) # true
```

However, this is about significantly slower than a normal copy. I don’t understand the reason for this, because it’s only storing the reference information in an `IdDict`

. If anything I almost think it should be faster, since it results in fewer allocations. Is there a faster way I can copy a tree, preserving its topology?

Here’s a benchmark:

```
using BenchmarkTools
base_tree = Node(8, Node(1f0), Node(2, Node(3f0)))
tree = Node(2, base_tree, Node(2, Node(1, base_tree, base_tree), Node(2f0)))
@btime [copy_node($tree) for i=1:1_000];
# 101.292 μs (16001 allocations: 757.94 KiB)
@btime [copy_node($tree, IdDict{Node,Node}()) for i=1:1_000];
# 300.042 μs (10001 allocations: 711.06 KiB)
```

Even with fewer total allocations, for some reason this topology-preserving copy is 3x slower. Some part of this may just be constructing `IdDict{Node,Node}()`

- it seems like that part accounts for 30us. But, is there a more efficient way to do this?

Thanks!

Miles