How should I compare data structures that involve structs? For example:
struct Type1
    a::Vector{Int}
    b::Float64
end

struct Type2
    c::Vector{Type1}
    d::Dict{Tuple{Int, Int}, Int}
end

obj1_type1 = Type1([1,2], 3.4)
obj2_type1 = Type1([3,6,5], 2.3)

obj1_type2 = Type2([obj1_type1, obj2_type1],
                   Dict((1,2)=>4, (4,5)=>0)
                   )
obj2_type2 = Type2([obj1_type1],
                   Dict((1,4)=>3)
                   )
obj3_type2 = Type2([obj1_type1, obj2_type1],
                   Dict((1,2)=>4, (4,5)=>0)
                   )
obj4_type2 = Type2([obj1_type1],
                   Dict((1,4)=>3)
                   )

[obj1_type2, obj2_type2] == [obj3_type2, obj4_type2] # should return true
Note that obj1_type2 and obj3_type2 are the same. Similarly, obj2_type2 and obj4_type2 are the same. Hence, the last statement in the code above should return true.
What’s the most efficient way to compare structs like this? In my project I have some functions that return vectors of such objects, and I am not sure how to write unit tests for them.
Any help is appreciated. Thanks!
By default, == dispatches to the method ==(x, y) = x === y (just try @less obj1_type2 == obj3_type2 from the REPL), i.e., it requires x and y to be identical. In order to get value equality, you need to define an == method for your type:
julia> obj1_type2 == obj3_type2
false
julia> Base.:(==)(x::Type1, y::Type1) = x.a == y.a && x.b == y.b
julia> Base.:(==)(x::Type2, y::Type2) = x.c == y.c && x.d == y.d
julia> obj1_type2 == obj3_type2
true
Be careful, though, to also define a new method for hash, as otherwise two equal objects will have different hashes, which could break existing code relying on this. From the docs of isequal:
The default implementation of isequal calls ==, so a type that does not involve floating-point values generally only needs to define ==.
isequal is the comparison function used by hash tables (Dict). isequal(x,y) must imply that hash(x) == hash(y).
This typically means that types for which a custom == or isequal method exists must implement a corresponding hash method (and vice versa). Collections typically implement isequal by calling isequal recursively on all contents.
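For the unit-testing part of the question, these definitions can be exercised with the Test standard library. The following is only a minimal sketch: make_objects is a hypothetical stand-in for a function under test, the testset name is made up, and the hash methods are included just to keep == and hash consistent, as the quoted docs require.

```julia
using Test

struct Type1
    a::Vector{Int}
    b::Float64
end

struct Type2
    c::Vector{Type1}
    d::Dict{Tuple{Int, Int}, Int}
end

# Value equality: compare field by field instead of by identity.
Base.:(==)(x::Type1, y::Type1) = x.a == y.a && x.b == y.b
Base.:(==)(x::Type2, y::Type2) = x.c == y.c && x.d == y.d

# Matching hash methods, so that equal objects also hash equally.
Base.hash(x::Type1, h::UInt) = hash(Type1, hash(x.a, hash(x.b, h)))
Base.hash(x::Type2, h::UInt) = hash(Type2, hash(x.c, hash(x.d, h)))

# Hypothetical stand-in for a function under test (not from the original post).
make_objects() = [
    Type2([Type1([1, 2], 3.4), Type1([3, 6, 5], 2.3)], Dict((1, 2) => 4, (4, 5) => 0)),
    Type2([Type1([1, 2], 3.4)], Dict((1, 4) => 3)),
]

@testset "Type2 value equality" begin
    expected = make_objects()
    actual = make_objects()          # freshly allocated, so not identical (===)
    @test actual == expected         # vectors compare element-wise with ==
    @test hash(actual) == hash(expected)
    @test actual[1] != actual[2]     # different contents stay unequal
end
```

Because Vector and Dict already implement == recursively, the two one-line methods are enough for the vector comparison in the tests to pass.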
Thanks @bertschi for the reply. Can you please elaborate on creating a new method for hash (maybe via an example)? I am not sure I get this part. Also, how are dictionaries compared if they are fields in user-defined structs? And what if these dictionaries have some user-defined type as their keys? Thanks.
A hash function basically maps any object to a fixed-size number. Small changes in the input should lead to pseudo-random changes in the output, i.e., ideally produce an almost uniform distribution on the output space, no matter whether the inputs are sequential, close together or already random. Hash functions have multiple applications, e.g., for implementing hash tables (dictionaries) or in cryptography.
In Julia, to define a hash function for your type, you need to provide a method hash(x::YourType, h::UInt) which mixes the information in your type into an existing hash h and returns the new hash. Here is an example for your types above:
function Base.hash(x::Type1, h::UInt)
    # Hash the type itself to ensure that a struct with the same data,
    # but a different type, has a different hash
    ht = hash(Type1, h)
    ha = hash(x.a, ht)
    hb = hash(x.b, ha)
    return hb
end

# or shorter (for Type2)
Base.hash(x::Type2, h::UInt) = hash(Type2, hash(x.c, hash(x.d, h)))
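To see what such a hash method buys you, here is a small sketch (restricted to Type1 to keep it short) checking that hash-based collections treat equal values as the same element. The variable names p and q are purely illustrative.

```julia
struct Type1
    a::Vector{Int}
    b::Float64
end

Base.:(==)(x::Type1, y::Type1) = x.a == y.a && x.b == y.b
Base.hash(x::Type1, h::UInt) = hash(Type1, hash(x.a, hash(x.b, h)))

p = Type1([1, 2], 3.4)
q = Type1([1, 2], 3.4)   # same contents, separately allocated vector

@assert p !== q                        # not identical ...
@assert p == q                         # ... but equal by value
@assert hash(p) == hash(q)             # equal objects hash equally
@assert length(Set([p, q])) == 1       # Set keeps a single element
@assert length(unique([p, q])) == 1    # unique relies on isequal and hash
@assert haskey(Dict(p => "found"), q)  # lookup with an equal key succeeds
```

One caveat worth checking for your data, since Type1 has a Float64 field: 0.0 == -0.0 is true but isequal(0.0, -0.0) is false, so if signed zeros or NaN can occur in b, you may also want a field-wise isequal method.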
Dictionaries implement a method for equality == which checks the equality of all keys and values recursively. I.e., if your user-defined type has proper equality semantics (its own method for ==), equality on dictionaries will use that and just work.
In any case, just check the definition from the REPL:
julia> d = Dict(:a => 1)
Dict{Symbol, Int64} with 1 entry:
:a => 1
julia> @less d == d
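On the question about dictionaries whose keys are a user-defined type: as far as I can tell from that definition, dictionary equality looks each key of one dictionary up in the other, which goes through hash and isequal, and then compares the values with ==. So for key types the hash method matters as much as ==. A minimal sketch, where Cell and Board are hypothetical types made up for illustration:

```julia
# Hypothetical key type with a Vector field, so the default == would compare by identity.
struct Cell
    coords::Vector{Int}
end

Base.:(==)(x::Cell, y::Cell) = x.coords == y.coords
Base.hash(x::Cell, h::UInt) = hash(Cell, hash(x.coords, h))

# Hypothetical container type holding a dictionary keyed by Cell.
struct Board
    entries::Dict{Cell, Int}
end

Base.:(==)(x::Board, y::Board) = x.entries == y.entries
Base.hash(x::Board, h::UInt) = hash(Board, hash(x.entries, h))

b1 = Board(Dict(Cell([1, 2]) => 4, Cell([4, 5]) => 0))
b2 = Board(Dict(Cell([4, 5]) => 0, Cell([1, 2]) => 4))  # same pairs, different insertion order
b3 = Board(Dict(Cell([1, 2]) => 5, Cell([4, 5]) => 0))  # one value differs

@assert b1 == b2
@assert hash(b1) == hash(b2)
@assert b1 != b3
```

If the keys are mutated after insertion, lookups can fail, so key types like this are best treated as read-only once stored.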
Great! That’s very clear. Just one last question: for hash, can I define it this way:
function Base.hash(x::Type1, h::UInt)
    return hash((Type1, x.a, x.b), h)
end
Thanks!
That should also work. It might lead to hash collisions, though, if your code frequently uses tuples of the same form:
julia> struct Type1
           a::Vector{Int}
           b::Float64
       end

julia> function Base.hash(x::Type1, h::UInt)
           return hash((Type1, x.a, x.b), h)
       end

julia> hash(Type1([1,2,3], 4.0)) == hash((Type1, [1,2,3], 4.0))
true

julia> Base.hash(x::Type1, h::UInt) = hash(Type1, hash(x.a, hash(x.b, h)))

julia> hash(Type1([1,2,3], 4.0)) == hash((Type1, [1,2,3], 4.0))
false
That’s very useful. Thanks for mentioning this package.