Check that two Julia files lead to the same code

Let’s say I have these two files: a.jl

function foo()
  println("hello")
end

and b.jl

"""
    foo()
A function
"""
function foo()
  println("hello")
end

Is there an easy way to assess that these two files are “the same”, in the sense that they both define identical objects apart from comments, whitespace and docstrings?
I tried with a filtered Meta.parseall, but it seemed ugly, so I thought I’d ask here first.

Thanks

I would write tests that check that the behavior is the same, and then run the test suite on both versions of the code.

Hmm, thanks, but that won’t work: the files are not generated by me, and I can’t make assumptions about what they may define.

What should work is a big regex that removes docstrings, comments and whitespace, followed by a check that the resulting strings match. But I was wondering if there’s something nicer that could be done.

This is the rough solution I have. It seems to do what I want; feedback to make it better is welcome.

is_code_equal(s1::AbstractString, s2::AbstractString) =
    is_code_equal(Meta.parseall.((s1, s2))...)

is_code_equal(c1, c2) = (c1 == c2)
# compare heads too, otherwise e.g. `a && b` and `a || b` would match
is_code_equal(e1::Expr, e2::Expr) =
    e1.head == e2.head && is_code_equal(e1.args, e2.args)

function is_code_equal(a::Vector, b::Vector)
    a, b = trim_args.((a, b))
    length(a) == length(b) || return false
    for (ai, bi) in zip(a, b)
        is_code_equal(ai, bi) || return false
    end
    return true
end

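# is `e` a docstring block attached by the parser (a `Core.@doc` macrocall)?
is_docstring(e) = e isa Expr && e.head == :macrocall &&
    e.args[1] isa GlobalRef && e.args[1].mod === Core &&
    e.args[1].name === Symbol("@doc")

# unwrap docstring blocks (keeping only the documented expression) and
# remove LineNumberNodes; other macro calls and string literals are kept,
# so `println("a")` and `println("b")` are not considered equal
function trim_args(a::Vector)
    r = []
    for e in a
        if is_docstring(e)
            push!(r, e.args[end])
        else
            push!(r, e)
        end
    end
    filter!(x -> !(x isa LineNumberNode), r)
    return r
end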

With this

s1 = """
    function foo()
        return 0
    end
    """

s2 = """
    function foo()
        # hello
        return 0

    end
    """

s3 = """
    function foo()::Nothing
        return 0
    end
    """

s4 = """
    \"\"\"
        foo
    Function
    \"\"\"
    function foo()
        # abc
        return 0
    
    end
    """

is_code_equal(s1, s2) # true
is_code_equal(s1, s3) # false (return type annotation differs)
is_code_equal(s1, s4) # true (docstrings are ignored)

If I tried to do something like this, I would start by checking whether the CSTParser package could be of help.

Could you evaluate the files into different modules and then check setdiff of a list of names in the respective module namespaces, plus maybe checking typeof or fieldnames(typeof()) of those defined objects (or something similar depending on your needs)?
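
A minimal sketch of that idea, assuming both sources are trusted enough to be evaluated (evaluation runs any top-level code they contain). The helper name `defined_names` is hypothetical; it loads a source string into a fresh anonymous module and collects the names bound there. Comparing names alone says nothing about method bodies, so this can only show that two files differ, not that they agree.

```julia
# Hypothetical helper: evaluate `src` in a fresh module and return the names it binds.
function defined_names(src::AbstractString)
    m = Module()                      # fresh anonymous module
    Base.include_string(m, src)       # runs the top-level code in `src`
    # invokelatest so that bindings created just above are visible
    return Set(Base.invokelatest(names, m; all=true))
end

a = """
function foo()
  println("hello")
end
"""

b = """
function bar()
  println("hello")
end
"""

# Names defined by one source but not by the other.
only_in_a = setdiff(defined_names(a), defined_names(b))
only_in_b = setdiff(defined_names(b), defined_names(a))
```

Here `only_in_a` should contain `:foo` and `only_in_b` should contain `:bar`; a further pass over the shared names (for instance with `typeof` or `fieldnames`) would be needed to compare what those names actually refer to.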

I’d like something very lightweight, and it can be conservative: it’s fine if it returns false for code that would actually have the same effect, but not the other way around.

The context for this is Franklin: while the server is running, modifying a specific file (utils.jl) triggers a full build, which may be slow. If the change to utils.jl is frivolous (e.g. a docstring, whitespace or comment change), it’s better to avoid that full rebuild.

So I’d like something that can fairly quickly (i.e. not much more than the time it takes to read the file) assess whether the change might be significant or not.
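
A rough sketch of how such a check could gate the rebuild, assuming the previous and new contents of utils.jl are both available as strings. The names `normalize` and `needs_rebuild` are hypothetical and not part of Franklin. The sources are only parsed, never evaluated, and the comparison is conservative: anything other than line numbers and parser-attached docstrings counts as a difference.

```julia
# Recursively drop LineNumberNodes and unwrap parser-attached docstrings.
normalize(x) = x
function normalize(ex::Expr)
    if ex.head === :macrocall && ex.args[1] isa GlobalRef &&
            ex.args[1].mod === Core && ex.args[1].name === Symbol("@doc")
        # docstring block: keep only the documented expression
        return normalize(ex.args[end])
    end
    args = Any[normalize(a) for a in ex.args if !(a isa LineNumberNode)]
    return Expr(ex.head, args...)
end

# Hypothetical helper: does the edit from `old_src` to `new_src` change the code?
function needs_rebuild(old_src::AbstractString, new_src::AbstractString)
    old_src == new_src && return false   # identical text, nothing to do
    return normalize(Meta.parseall(old_src)) != normalize(Meta.parseall(new_src))
end

old_src = """
function foo()
  println("hello")
end
"""

# Same code with a docstring and a comment added.
cosmetic_edit = """
\"\"\"
    foo()
A function
\"\"\"
function foo()
  # say hello
  println("hello")
end
"""

# Same structure, but a different string literal.
real_edit = """
function foo()
  println("bye")
end
"""

skip_rebuild = !needs_rebuild(old_src, cosmetic_edit)
must_rebuild = needs_rebuild(old_src, real_edit)
```

The string comparison up front keeps the common case cheap; parsing only happens when the text actually changed.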