I have a struct “parts” and a different struct (“collection”) which contains a Set of parts. I want to check if a part is contained in the Set of parts. I tried to overload the isequal function. Because in the Documentation (under Iterable Collections/ Base.in) it says […] For example, Sets check whether the item isequal to one of the elements […]. In the example below i want x_prime in X.parts to evaluate to true.
I’m running Julia 1.10.4 in a jupyter notebook on linux. Below is a (hopefully) minimal working example. Any help would be greatly appreciated.
Thanks in advance
Sven
Input:
struct part
x::Array{BigInt}
end
Base.:isequal(part1::part,part2::part) = isequal(part1.x,part2.x)
struct collection
num_of_parts::BigInt
parts::Set{part}
end
x = part([1,2,3])
y = part([2,3,4])
z = part([3,4,5])
X = collection(3,Set{part}([x,y,z]))
x_prime = deepcopy(x)
println("x' is equal to x: ",isequal(x_prime,x))
println("x is a part of X.parts: ",x in X.parts)
println("But x' isn't: ", x_prime in X.parts)
Output:
x' is equal to x: true
x is a part of X.parts: true
But x' isn't: false
The short answer is that you should also implement hash for your part struct. Citing isequal’s documentation:
This typically means that types for which a custom == or isequal method exists must implement a corresponding hash method (and vice versa).
If you add Base.hash(p::part) = hash(p.x), then x_prime in X.parts now indeed evaluates to true.
If you want some more context, Julia’s Sets are basically Dicts with Nothing values. In turn Dicts rely on hash tables. When you are then using in(x, X.parts), you’re checking that x is a key in X.parts.dict. To do this, you
Compute the hash index of x in the Dict’s hash table.
Check that this slot is not empty.
If a value in this slot indeed exists, compare it to x using isequal.
In your example the slot for x_prime not the same as that of x due to differing hashes. In fact, the spot is empty, so in will immediately return false, without ever calling isequal. So even if you had written Base.:isequal(part1::part,part2::part) = true (and did not overwrite the default hash), you would find that x_prime in X.parts evaluates to false. Now, the documentation of in does state that
Sets check whether the item isequal to one of the elements;
which is then actually a bit too simplistic, so the confusion is certainly understandable.
Firstly, it is conventional to use Capitalized names for types, i.e. Part and Collection.
Secondly, Array{BigInt} is an abstract type, which makes your code type unstable, and could hurt performance. If you use Array{BigInt, 1} or its alias Vector{BigInt}, it should help.
Thirdly, BigInts are slow. If you need them, they are fine, otherwise you can consider using native (and fast) Int.
Fourthly, in your collection type, you have a separate field, num_of_parts. Normally, you would not store that, since the number of parts is actually stored inside the parts field, you can find it by doing length(X.parts), also, unless you are planning to store 10 billion billion arrays of BigInts, it probably not necessary to make num_of_parts a BigInt