I think this discussion has strayed a bit into the weeds since we donāt have a MWE with a clear statement of your requirements. The manual says
Although it is generally a good idea to return a fully initialized object from an inner constructor, it is possible to return incompletely initialized objects.
I donāt think your situation is exotic enough to warrant a departure from the recommendation of returning fully initialized objects from your inner constructors.
Based on my partial understanding of the high level problem you are addressing, I think there are a few possible approaches:
1.
Youāve probably already considered this, but one way to represent a tree with immutable structs is to use a few parametric types, like this:
struct Constant{T}
x::T
end
struct Feature
x::UInt16
end
struct UnivariateOp{A}
op::UInt8
arg::A
end
struct BivariateOp{L, R}
op::UInt8
l::L
r::R
end
julia> BivariateOp(0x04,
UnivariateOp(0x02,
Constant(3.14)
),
BivariateOp(0x09,
Feature(1),
Feature(2)
)
)
BivariateOp{UnivariateOp{Constant{Float64}}, BivariateOp{Feature, Feature}}(0x04, UnivariateOp{Constant{Float64}}(0x02, Constant{Float64}(3.14)), BivariateOp{Feature, Feature}(0x09, Feature(0x0001), Feature(0x0002)))
I think you could write functions that take partial trees and return a new tree with a new node attached. (You might need some types that represent incomplete nodes?) One possible downside of this approach is that if you have large trees you end up with complex, nested parametric types, which might cause poor compile times. (I havenāt really tested that.)
2.
This is pretty speculative, but I wonder if you could use an alternative approach to storing your data. You could use a vector to represent the expression tree in terms of node IDs (integers), and then you could use node IDs to look up related data like the operation, feature, or constant value.
To represent the tree
1
āāā 2
ā āāā 3
ā āāā 4
āāā 5
āāā 6
you could use a vector like this:
[2, 5, 3, 4, 0, 0, 0, 0, 6, 0, 0, 0]
The first number is the node ID for the left child of node 1, the second number is the node ID for the right child of node 1, the third number is the node ID for the left child of node 2, the fourth number is the node ID for the right child of node 2, etc. Zeros indicate that there is no child node in that location.
When you look at elements (3-1)*2 + 1
and (3-1)*2 + 2
in the vector, you see that they are both 0
, and thus node 3 is either a constant or a feature. Perhaps you could fill those elements with -1 if node 3 is a feature, and fill with 0 if node 3 is a constant, so that you can easily distinguish between feature nodes and constant nodes.
I think this representation is also capable of representing a forest of trees, like this:
1
āāā 2
ā āāā 3
ā āāā 4
āāā 5
āāā 6
7
āāā 8
āāā 9
Then your vector would be
[
2, 5,
3, 4,
0, 0,
0, 0,
6, 0,
0, 0,
8, 9,
0, 0,
0, 0
]
And then you could swap out node 6 for node 7 by mutating your vector like this:
[
2, 5,
3, 4,
0, 0,
0, 0,
7, 0,
0, 0,
8, 9,
0, 0,
0, 0
]
(Note that node 6 is now an isolated nodeāit has no parents or children.)
3.
If you want to stick with sum types, you could experiment with the SumTypes.jl package.