```julia
"""
    foo()
A function
"""
function foo()
    println("hello")
end
```
Is there an easy way to check that these two files are “the same”, in the sense that they define identical objects, ignoring comments, whitespace and docstrings?
I tried filtering the output of `Meta.parseall`, but it seemed ugly, so I thought I'd ask here first.
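To see what a filtered `Meta.parseall` has to deal with, here is a small sketch that parses a file like the one above and inspects the result. Comments never make it into the AST, a docstring becomes a `Core.@doc` macrocall wrapping the documented definition, and `LineNumberNode`s are interleaved with the statements; the last two are what a filter has to remove. The exact placement of `LineNumberNode`s may vary between Julia versions, so the sketch only relies on the parts described here.

```julia
src = """
\"\"\"
    foo()
A function
\"\"\"
function foo()
    # a comment
    println("hello")
end
"""

ex = Meta.parseall(src)   # an Expr with head :toplevel
ex.args[1]                # a LineNumberNode for the first statement
doc = ex.args[2]          # the docstring wrapper
doc.head                  # :macrocall
doc.args[1]               # GlobalRef to Core.@doc
fdef = doc.args[end]      # the documented `function foo() ... end`
body = fdef.args[2]       # the :block holding the function body

# The comment left no trace: apart from LineNumberNodes, only the call remains.
filter(x -> !(x isa LineNumberNode), body.args)
```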
Hmm, thanks, but that won't work: the files aren't generated by me, so I can't make assumptions about what they may define.
What should work is a big regex that strips docstrings, comments and whitespace, followed by a check that the resulting strings match. But I was wondering whether there's something nicer.
This is the rough solution I have; it seems to do what I want. Feedback on how to make it better is welcome.
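```julia
# Entry point: compare two source strings by their parsed ASTs. Comments
# never reach the AST; line numbers and docstrings are dropped by `trim_args`.
# Recursion goes through `ast_equal`, so string literals inside the code are
# compared as values instead of being re-parsed as code.
is_code_equal(s1::AbstractString, s2::AbstractString) =
    ast_equal(Meta.parseall(s1), Meta.parseall(s2))

# Leaves (Symbols, literals, QuoteNodes, ...): `===` rather than `==`, so that
# e.g. `1` vs `1.0` or `0.0` vs `-0.0` count as different.
ast_equal(c1, c2) = c1 === c2

# Heads must match too: `f(x)` (:call) and `[f, x]` (:vect) have the same args.
ast_equal(e1::Expr, e2::Expr) =
    e1.head === e2.head && ast_equal(e1.args, e2.args)

function ast_equal(a::Vector, b::Vector)
    a, b = trim_args(a), trim_args(b)
    length(a) == length(b) || return false
    for (ai, bi) in zip(a, b)
        ast_equal(ai, bi) || return false
    end
    return true
end

# A docstring parses to a 4-argument macrocall:
# (GlobalRef(Core, Symbol("@doc")), LineNumberNode, docstring, documented expr)
is_docstring(e) =
    e isa Expr && e.head === :macrocall && length(e.args) == 4 &&
    e.args[1] isa GlobalRef && e.args[1].mod === Core &&
    e.args[1].name === Symbol("@doc")

# Replace docstring blocks by the expression they document and drop
# LineNumberNodes. Other macrocalls and string literals are kept, so that
# `println("a")` and `println("b")` are not reported as equal.
function trim_args(a::Vector)
    r = Any[is_docstring(e) ? e.args[end] : e for e in a]
    return filter!(x -> !(x isa LineNumberNode), r)
end
```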
With this:

```julia
s1 = """
function foo()
    return 0
end
"""
s2 = """
function foo()
    # hello
    return 0
end
"""
s3 = """
function foo()::Nothing
    return 0
end
"""
s4 = """
\"\"\"
foo
Function
\"\"\"
function foo()
    # abc
    return 0
end
"""
is_code_equal(s1, s2) # true
is_code_equal(s1, s3) # false (type signature difference)
is_code_equal(s1, s4) # true (docstrings are ignored)
```
Could you evaluate the files into separate modules and then compare (e.g. with `setdiff`) the lists of names defined in each module's namespace, perhaps also checking `typeof` or `fieldnames(typeof(...))` of the defined objects (or something similar, depending on your needs)?
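The module-based suggestion could be sketched as follows, assuming each file can be evaluated on its own. `defined_names` and `same_names` are hypothetical helpers; names starting with `#` are skipped because they are compiler-generated and may not line up between modules. Note that this executes the code, and it only compares which names exist, so a change inside a function body goes unnoticed.

```julia
# Hypothetical helper: evaluate `code` in a fresh anonymous module and
# return the set of names it defines, skipping compiler-generated ones
# (those starting with '#').
function defined_names(code::AbstractString)
    m = Module()              # fresh module, so the two files cannot clash
    include_string(m, code)   # NOTE: this runs the code
    return Set(n for n in names(m; all = true) if !startswith(string(n), "#"))
end

# Same set of names in both files? Says nothing about the definitions' bodies.
same_names(code1, code2) = defined_names(code1) == defined_names(code2)

same_names("foo() = 0", "# comment\nfoo() = 1")   # true, although foo changed
same_names("foo() = 0", "bar() = 0")              # false
```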
I'd like something very lightweight, and it can be conservative: it's fine if it returns false for code that would actually have the same effect, but not the other way around.
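For an AST comparison, being conservative largely comes down to how the leaves are compared. A short illustration of why `==` (and `isequal`) can be too lenient for literals that behave differently in code, while `===` tells them apart:

```julia
a, b = Meta.parse("1"), Meta.parse("1.0")   # an Int literal and a Float64 literal
a == b          # true:  numerically equal
isequal(a, b)   # true:  isequal also compares by value here
a === b         # false: different types, so not identical

z, nz = 0.0, -0.0   # 1/z is Inf, 1/nz is -Inf
z == nz         # true
z === nz        # false: different bit patterns
```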
The context for this: in Franklin, when the server is running, modifying a specific file (`utils.jl`) triggers a full build, which may be slow. If the change in `utils.jl` is frivolous (e.g. a docstring, whitespace or comment change), it's better to avoid that full rebuild.
So I'd like something that can quickly (i.e. in not much more than the time it takes to read the file) assess whether the change might be significant.
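For the Franklin use case, one option is to store a fingerprint of `utils.jl` at each full build and compare against it when the file changes, so the old source does not need to be kept around. The sketch below swaps the side-by-side tree walk for a different technique: it strips docstrings and `LineNumberNode`s from the parsed AST, serializes the result with the `Serialization` standard library and compares the bytes. `strip_ast`, `code_fingerprint` and `needs_full_build` are hypothetical names, not Franklin functions. Equal bytes imply equal stripped ASTs; identical ASTs are expected to give identical bytes, and any corner case where they do not would only cause an extra rebuild.

```julia
using Serialization

# Hypothetical helpers for the Franklin use case; not part of Franklin's API.

# A docstring parses to a 4-argument macrocall:
# (GlobalRef(Core, Symbol("@doc")), LineNumberNode, docstring, documented expr)
isdocstring(e) =
    e isa Expr && e.head === :macrocall && length(e.args) == 4 &&
    e.args[1] isa GlobalRef && e.args[1].mod === Core &&
    e.args[1].name === Symbol("@doc")

# Return a copy of the AST without docstrings or LineNumberNodes
# (comments never reach the AST in the first place).
strip_ast(x) = x
function strip_ast(e::Expr)
    isdocstring(e) && return strip_ast(e.args[end])
    args = Any[strip_ast(a) for a in e.args if !(a isa LineNumberNode)]
    return Expr(e.head, args...)
end

# Fingerprint: the serialized bytes of the stripped AST. Serialization
# records types and exact values, so `1` vs `1.0` or `0.0` vs `-0.0` differ.
function code_fingerprint(src::AbstractString)
    io = IOBuffer()
    serialize(io, strip_ast(Meta.parseall(src)))
    return take!(io)
end

# Given the fingerprint stored at the last full build (`nothing` if none),
# decide whether `src` needs a full build; also return the new fingerprint
# so the caller can store it.
function needs_full_build(last::Union{Nothing,Vector{UInt8}}, src::AbstractString)
    fp = code_fingerprint(src)
    return (last === nothing || fp != last, fp)
end
```

In a file watcher, the fingerprint from the previous full build would be passed as `last`, and a `false` result means the rebuild can be skipped. The cost is one parse of the file; none of the code in `utils.jl` is executed.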