Ad-hoc equality test for struct after definition

I am trying to streamline some unit tests. Most of them work by generating a value and comparing it to a known value. I want something like ==, but I only want to define it in the tests, not the package source, since equality is not really useful for the struct per se, just unit testing (if I wanted to define it in general, I would use something like AutoHashEquals.jl).

So I just want to define an ad-hoc, ==-like comparison operator which does the following:

  1. if arguments have the same type which is a composite type, disregarding parameters, just compare all fields with ==
  2. otherwise compare the values themselves with ==.

Currently I am using

@generated function ≂(x, y)
    if !isempty(fieldnames(x)) && x == y
        mapreduce(n -> :(x.$n == y.$n), (a,b)->:($a && $b), fieldnames(x))
    else
        :(x == y)
    end
end

which works OK, except that I want it to be less strict about parametric types, eg

struct Foo{T}
    a::T
    b::T
end
Foo(1,2) ≂ Foo(1.0,2.0) # should be true, currently false

How can I do this? Also, I don’t insist on a generated function: this is what I could come up with, but better suggestions are appreciated.

Why do you use @generated functions?

This line confuses me: why do you need the && x == y part of the condition if that case is already covered by the else clause?

EDIT: I just deleted my proposal for a faster version. If you remove && x == y, your code works as you intended.

2nd Edit: I finally understand @generated functions :slight_smile:
If you follow my first suggestion, you might compare two different types as equal if they have the same number of fields with the same names.
This solves the issue:

@generated function ≂(x, y)
    if !isempty(fieldnames(x)) && (x.name == y.name)
        mapreduce(n -> :(x.$n == y.$n), (a,b)->:($a && $b), fieldnames(x))
    else
        :(x == y)
    end
end

In a generated function, anything that is not quoted sees each argument as its inferred datatype rather than its value. Got it now.
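To make that concrete, here is a minimal sketch (argument_info is a made-up name, not part of the thread's code). The unquoted part of the body runs once per combination of argument types and sees x as a type; the returned expression runs at call time and sees x as a value.

```julia
# `argument_info` is a hypothetical name used only for this illustration.
@generated function argument_info(x)
    # Generator part (unquoted): `x` is bound to the argument's *type*,
    # so reflection such as fieldnames(x) works here.
    nfields_of_type = length(fieldnames(x))
    # Returned expression (quoted): evaluated at call time, where `x` is the value.
    return :((typeof(x), $nfields_of_type, x))
end

# Returns the argument's type, its number of fields, and the value itself.
argument_info(1 => 2.0)
```

This is also why x.name in the snippet above is a property of the type, not a field lookup on the value.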

Came here for the same reason. This is pretty cool. Maybe it should be included in Test somehow, since the need for comparing only in the tests is pretty general. Like an AutoHashEquals just for Test.

The following behavior seems weird:

struct Bar
    a::Int
    c::Vector{String}
end
x = [Bar(i,string.(1:i)) for i in 1:5]
y = [Bar(i,string.(1:i)) for i in 1:5]
x ≂ y # -> false!
all(x .≂ y) # -> true
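My guess at the cause, assuming Bar has no == method of its own: for arrays, ≂ falls through to ==, which compares the elements with ==, and the fallback == for structs is ===. Two Bar values holding distinct (but equal) vectors are not identical. A small sketch of just that part:

```julia
struct Bar
    a::Int
    c::Vector{String}
end

b1 = Bar(1, ["1"])
b2 = Bar(1, ["1"])

# Fallback == for structs is ===, and the two `c` vectors are different objects.
default_equal = (b1 == b2)                        # false
# Comparing field by field with == gives the answer we want in tests.
fieldwise_equal = (b1.a == b2.a && b1.c == b2.c)  # true
# Array == uses the element ==, so it inherits the fallback behaviour.
array_equal = ([b1] == [b2])                      # false
```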

Also, why not replace the == in @TsurHerman’s solution with ≂ to allow for nested structs:

        mapreduce(n -> :(x.$n ≂ y.$n), (a,b)->:($a && $b), fieldnames(x))

I might be wrong.

Try something like

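@generated function ≂(x, y)
    # Inside the generator, x and y are the argument types.
    # Arrays are checked first so they are compared elementwise, not by their fields.
    if (x <: Array) && (y <: Array)
        :(size(x) == size(y) && all(x .≂ y))
    elseif Base.isstructtype(x) && Base.isstructtype(y) && !isempty(fieldnames(x)) && (x.name == y.name)
        # Julia 1.x takes the initial value as the `init` keyword, not a positional argument.
        mapreduce(n -> :(x.$n ≂ y.$n), (a,b)->:($a && $b), fieldnames(x); init = :(true))
    else
        :(x == y)
    end
end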

which of course still does not catch all cases. A properly implemented version could dispatch on type, and only use the generated function where it is needed (composite types).
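A rough sketch of what such a dispatch-based version might look like. The names testeq and _fieldwise are made up for this post, and it is not a complete solution: containers such as Dict, or types holding internal pointers such as BigInt, would need their own methods.

```julia
# All names here (`testeq`, `_fieldwise`) are hypothetical, chosen for this sketch.

# Arrays: same shape and elementwise equal.
testeq(x::AbstractArray, y::AbstractArray) = size(x) == size(y) && all(testeq.(x, y))

# Everything else: fieldwise comparison for structs of the same family
# (type parameters ignored), plain == otherwise.
function testeq(x::S, y::T) where {S, T}
    if Base.isstructtype(S) && fieldcount(S) > 0 &&
       Base.typename(S) === Base.typename(T) && fieldnames(S) == fieldnames(T)
        return _fieldwise(x, y)
    end
    return x == y
end

# The generated function is only reached for composite types with at least one field.
@generated function _fieldwise(x, y)
    checks = [:(testeq(getfield(x, $(QuoteNode(n))), getfield(y, $(QuoteNode(n)))))
              for n in fieldnames(x)]
    # Chain the per-field checks with && into a single expression.
    return foldl((a, b) -> :($a && $b), checks)
end

struct Foo{T}
    a::T
    b::T
end

testeq(Foo(1, 2), Foo(1.0, 2.0))  # true: parameters differ, fields compare equal
```

The fieldcount(S) > 0 guard matters: types like String have no fields, so without it the fieldwise branch would report any two strings as equal.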

I find these things useful so I may package it up when I have time.

Awesome, I’ll use that for sure.

I'll check this version later tonight. Edit: checked, it works for me!

and thank you.

I’ve obviously been thinking about this 24/7 for over three years because for unit testing I have to check arbitrary structs coming over a message queue. I didn’t see the above post until today but I thought I would share my solution. It uses approximate equality for subtypes of AbstractFloat but == for almost everything else. Its output is a list of true/false values indicating whether the test has passed for each field.

I am sure I should be making better use of multiple dispatch instead of all those horrible if statements. But as it’s only for unit testing I’ll put it where people don’t normally look.

test = Vector{Bool}(undef,0)
function chkFieldNames!(x::S, y::T, test::Vector{Bool}, icount=0) where {S, T}
    atol = 1e-5 # Resisted temptation to make an argument that is recursively passed around.
    tx = typeof(x)
    ty = typeof(y)
    if tx == ty
        for i in fieldnames(tx)  
            gx = getfield(x, i)
            gy = getfield(y, i)
            tgx = typeof(gx)
            if isprimitivetype(tgx)
                println("Primitive type: ", tgx)
                if tgx <: AbstractFloat
                    push!(test, isapprox(gx, gy, atol=atol))
                else
                    push!(test, gx == gy)
                end
            elseif tgx <: AbstractArray
                println("AbstractArray type: ", tgx)
                if size(gx) == size(gy) # isapprox throws on arrays of different sizes
                    push!(test, isapprox(gx, gy, atol=atol))
                else
                    push!(test, false)
                end
            elseif tgx <: String
                println("String type: ", tgx)
                push!(test, gx == gy)
            else
                println("Non-primitive type: ", tgx)
                chkFieldNames!(gx, gy, test, icount)
            end
        end
    else
        println("Unequal types: ", tx, " and ", ty)
        push!(test, false)
    end
    icount += length(test)
    println("Count = ", icount)
    return test
end           
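For what it's worth, here is a sketch of how the same checks might look with multiple dispatch instead of the if chain. The names fieldcheck!, ATOL and Msg are made up, the debugging printouts are dropped, and unlike the function above it uses == for non-numeric arrays. It has the same limitation regarding self-referential values.

```julia
using LinearAlgebra  # makes sure isapprox is available for arrays

const ATOL = 1e-5    # same tolerance as above

# Hypothetical name `fieldcheck!`; each method handles one kind of value.
fieldcheck!(test, x::AbstractFloat, y::AbstractFloat) = push!(test, isapprox(x, y, atol=ATOL))
fieldcheck!(test, x::Number, y::Number) = push!(test, x == y)
fieldcheck!(test, x::AbstractString, y::AbstractString) = push!(test, x == y)
fieldcheck!(test, x::AbstractArray{<:Number}, y::AbstractArray{<:Number}) =
    push!(test, size(x) == size(y) && isapprox(x, y, atol=ATOL))
fieldcheck!(test, x::AbstractArray, y::AbstractArray) = push!(test, x == y)

# Fallback: recurse into the fields of composite types.
function fieldcheck!(test, x, y)
    if typeof(x) != typeof(y)
        push!(test, false)
    elseif fieldcount(typeof(x)) == 0
        push!(test, x == y)
    else
        for name in fieldnames(typeof(x))
            fieldcheck!(test, getfield(x, name), getfield(y, name))
        end
    end
    return test
end

# Hypothetical message type for trying it out.
struct Msg
    id::Int
    temp::Float64
    tags::Vector{String}
    vals::Vector{Float64}
end

fieldcheck!(Bool[], Msg(1, 1.0, ["a"], [1.0, 2.0]), Msg(1, 1.0 + 1e-7, ["a"], [1.0, 2.0]))
```

Each method appends exactly one Bool for a leaf value, so the result has one entry per leaf field, in field order.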

Uhhh, does your function deal with a data structure that may contain itself? Or, more generally, if you look at each object as a vertex and each field of a composite type as an edge, does your function deal with cyclic graphs?

@Henrique_Becker Thx for your interesting questions. Russell’s paradox grew from sets that contained themselves but it had not occurred to me that Julia could support a self-referential struct. So ‘no, not intended’ is the answer to the first question.

More generally, I would see each vertex as a type (in XML-speak this would either be a “simple-type” or a “complex-type”) and each leaf node (or leaf vertex) as a “simple-type” (not necessarily a Julia primitive type, e.g. String is not a primitive type). The edges are the relationships, i.e. “complex-type A is parent of simple-type B” or conversely “a child-of” relationship when thinking in the other direction. Apologies for speaking XML, I hope you can understand what I’m trying to say. (I was marooned on planet XML for a while.)

Furthermore, if two complex-types (i.e. types that have child types, either complex-types or simple-types) happen to have the same type of child, this is not the same as saying they have the same leaf node and therefore does not imply the struct is cyclic. At least, that is my understanding and it’s on that basis that I have hacked this paltry function together. I.e. “no” to your second question as well.

Julia allows defining self-referential structs (see Manual > Constructors > Incomplete Initialization). Not that you need them to have a cycle, you can just do:

julia> a = []
Any[]

julia> push!(a, a)
1-element Array{Any,1}:
 1-element Array{Any,1}:#= circular reference @-1 =#

In other words, any parametric container that can take Any as type parameter can contain itself. You can see Julia printing utilities are already prepared to deal with such cases.

In your case, your function has no problem with recursive arrays because your if does not recurse over AbstractArray, but it does so for the example of self-referential type found in the manual.

julia> mutable struct SelfReferential
           obj::SelfReferential
           SelfReferential() = (x = new(); x.obj = x)
       end

julia> sr = SelfReferential()
SelfReferential(SelfReferential(#= circular reference @-1 =#))

julia> test = Vector{Bool}(undef,0);

julia> chkFieldNames!(sr, sr, test)
Non-primitive type: SelfReferential
Non-primitive type: SelfReferential
Non-primitive type: SelfReferential
...
Non-primitive type: SelfReferential
Non-primitive type: SelfReferential
Non-primitive type: SelfReferentialERROR: StackOverflowError:
Stacktrace:
 [1] poptask(::Base.InvasiveLinkedListSynchronized{Task}) at ./task.jl:704
 [2] wait at ./task.jl:712 [inlined]
 [3] uv_write(::Base.TTY, ::Ptr{UInt8}, ::UInt64) at ./stream.jl:933
 [4] unsafe_write(::Base.TTY, ::Ptr{UInt8}, ::UInt64) at ./stream.jl:1005
 [5] unsafe_write at ./io.jl:622 [inlined]
 [6] write at ./io.jl:686 [inlined]
 [7] print at ./show.jl:210 [inlined]
 [8] show_sym(::Base.TTY, ::Symbol; allow_macroname::Bool) at ./show.jl:1086
 [9] show_sym at ./show.jl:1085 [inlined]
 [10] show_type_name(::Base.TTY, ::Core.TypeName) at ./show.jl:591
 [11] show_datatype(::Base.TTY, ::DataType) at ./show.jl:618
 [12] show(::Base.TTY, ::Type) at ./show.jl:497
 [13] print(::Base.TTY, ::Type{T} where T) at ./strings/io.jl:35
 [14] print(::Base.TTY, ::String, ::Type{T} where T, ::Vararg{Any,N} where N) at ./strings/io.jl:46
 [15] println(::Base.TTY, ::String, ::Vararg{Any,N} where N) at ./strings/io.jl:73
 [16] println(::String, ::Type{T} where T) at ./coreio.jl:4
 [17] chkFieldNames!(::SelfReferential, ::SelfReferential, ::Array{Bool,1}, ::Int64) at ./REPL[4]:28
 [18] chkFieldNames!(::SelfReferential, ::SelfReferential, ::Array{Bool,1}, ::Int64) at ./REPL[4]:29 (repeats 37315 times)

So, it is good to keep this in mind. With sufficiently exotic types your function will break.
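If you ever need to handle such cases, one common approach is to remember which pairs of mutable objects are already being compared and to stop when a pair shows up again. A minimal sketch, with the made-up name cyclesafe_eq, assuming Julia 1.5 or later for ismutable, and leaving arrays to plain ==:

```julia
# Hypothetical helper: fieldwise equality that tolerates self-referential structs.
function cyclesafe_eq(x, y, seen = Set{Tuple{UInt,UInt}}())
    typeof(x) == typeof(y) || return false
    T = typeof(x)
    # Leaves and arrays are compared with ==, as a simplification.
    (fieldcount(T) == 0 || x isa AbstractArray) && return x == y
    if ismutable(x)
        key = (objectid(x), objectid(y))
        # This pair is already being compared further up the call stack:
        # treat it as equal instead of recursing forever.
        key in seen && return true
        push!(seen, key)
    end
    return all(cyclesafe_eq(getfield(x, n), getfield(y, n), seen) for n in fieldnames(T))
end

mutable struct SelfReferential
    obj::SelfReferential
    SelfReferential() = (x = new(); x.obj = x)
end

sr = SelfReferential()
cyclesafe_eq(sr, sr)  # terminates and returns true
```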

:grinning: That’s great! Many thanks! I had forgotten about SelfReferential types, but now you come to mention it, I did read about them in the manual a few years ago, didn’t completely understand the concept, and have since blanked it. Luckily, my little function is only for unit testing messages I have designed myself (but which other software may have implemented), and which are sent over a message queue. So I can safely say I won’t hit that limitation. I could put a guard in to stop it self-referencing too deeply I suppose. Or hold up a placard to say “NO” to “Self Referencing Types”! (Even if it does sound discriminatory.)

Just for the sake of it, this is what has gone into my runtests.jl for checking whether two structs are the same (or nearly the same if they have Floats). It’s ugly, doesn’t cover all types, but it only takes a few μs to run. (You don’t need to initialize test outside the function anymore.)

function chkFieldNames!(x::S, y::T, icount=0, depth=0; test::Vector{Bool} = Vector{Bool}(undef,0), max_call=50) where {S, T}
    atol = 1e-5 # Resisted temptation to make an argument that is recursively passed around.
    # No explicit initialisation is needed: the keyword default creates a fresh
    # `test` vector for every top-level call, and recursive calls pass it along.
    tx = typeof(x)
    ty = typeof(y)
    if tx == ty && depth < max_call
        for i in fieldnames(tx)
            gx = getfield(x, i)
            gy = getfield(y, i)
            tgx = typeof(gx)
            if isprimitivetype(tgx)
                #println("Primitive type: ", tgx)
                if tgx <: AbstractFloat
                    push!(test, isapprox(gx, gy, atol=atol))
                else
                    push!(test, gx == gy)
                end
            elseif tgx <: AbstractArray
                #println("AbstractArray type: ", tgx)
                if size(gx) == size(gy) # isapprox throws on arrays of different sizes
                    push!(test, isapprox(gx, gy, atol=atol))
                else
                    push!(test, false)
                end
            elseif tgx <: String
                #println("String type: ", tgx)
                push!(test, gx == gy)
            else
                # Pass depth + 1 rather than mutating depth, so that sibling
                # fields do not count towards the recursion depth.
                chkFieldNames!(gx, gy, icount, depth + 1, test=test, max_call=max_call)
            end
        end
    else
        if depth >= max_call
            @warn "Run-away search -- self referential types?"
            return test
        else
            println("Unequal types: ", tx, " and ", ty)
            push!(test, false)
        end
    end
    icount += length(test)
    #icount != 0 && println("Count = ", icount)
    return test
end

Wouldn’t this fail for parametric types having different parameters? eg

struct Foo{T}
    x::T
end

then comparing Foo(1.0) and Foo(1). Also, tx === S and ty === T, so I am not sure what motivates the use of typeof.
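A small sketch of both points. The helper names sametypes and samefamily are made up; Base.typename is what I would reach for to compare types while ignoring their parameters.

```julia
struct Foo{T}
    x::T
end

# `S` and `T` are already bound to the concrete argument types,
# so calling typeof on the arguments is redundant.
sametypes(x::S, y::T) where {S, T} = (S === typeof(x)) && (T === typeof(y))

# Hypothetical helper: compare the type "family", ignoring parameters.
samefamily(x::S, y::T) where {S, T} = Base.typename(S) === Base.typename(T)

redundant = sametypes(Foo(1.0), Foo(1))           # true
strict    = typeof(Foo(1.0)) == typeof(Foo(1))    # false: Foo{Float64} vs Foo{Int}
relaxed   = samefamily(Foo(1.0), Foo(1))          # true
```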

That said, I would not worry about recursive structures etc unless your application has them. All concepts of equality are somewhat ad hoc, it really depends on what you need.

@Tamas_Papp Hi, thank you! Use-case: I am receiving messages from sensors and I need a way to unit test my system. The source is a sensor system (a sort of SCADA) and lives on another computer. It sends messages over 0MQ into my code. For the sake of unit testing I need to do some round trips, i.e. sending a message to the SCADA and receiving it back. So all message types will be concrete (they are similar to ROS message types but I am not using ROS).

The function takes two different composite types (the message types) and outputs a Vector{Bool}. If there is a single false in that vector the unit test fails. That’s all. I have written it so that I can find out which field is the culprit.
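As a sketch of how the culprit can be located for a flat message type (one Bool per field, no nested structs): Reading and the result vector below are made-up stand-ins, not my real messages.

```julia
# Hypothetical flat message type with two fields.
struct Reading
    id::Int
    value::Float64
end

# Stand-in for the Vector{Bool} returned by the checker: the second check failed.
result = [true, false]

# Positions of the failed checks, mapped back to field names.
failed = [fieldnames(Reading)[i] for i in findall(!, result)]
```

In runtests.jl the pass/fail condition is then simply that all entries of result are true, i.e. that failed is empty.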

Sorry the code is so badly written! I’ll get the hang of Julia one day. Oh, and it never occurred to me to actually use S and T directly. :astonished: I’ll get rid of typeof().