How to obtain the result of a diff between 2 files in a loop?

Hello,

I would like to compare files in a loop and obtain the result as an integer (1=OK, 0=FALSE).

I use the following code:

for n in 1:10
   run(`diff file_a file_b`)
end

If the 2 files file_a and file_b are identical, the code runs fine. If the 2 files are different, the code throws an error.

Is it possible to write “clean” code which returns an Int or a Bool depending on the result of the shell diff command in the loop above?

Or is there another way to code such a test?

Thanks.

You’ll want to use success to test command success, rather than run:

success(`diff file_a file_b`)

Note that if all you want to know is whether they differ or not, you may want to use the UNIX cmp command instead:

if success(`cmp --quiet file_a file_b`)
    # they are the same
else
    # they are different
end
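
To get the integer asked for in the original question, the Bool returned by success can be converted with Int. Here is a minimal sketch, assuming a diff executable is on the PATH; files_match and the temporary file names are hypothetical, made up for the demo:

```julia
# Hypothetical helper: 1 if `diff` reports no differences, 0 otherwise.
# diff exits with status 1 when the files differ and with a higher status
# on trouble (e.g. a missing file); both cases map to 0 here.
files_match(a, b) = Int(success(`diff $a $b`))

# Demo on throwaway files (placeholder names, not from the question).
dir = mktempdir()
ref = joinpath(dir, "ref.txt")
write(ref, "same\n")

results = Int[]
for n in 1:3
    f = joinpath(dir, "file_$n.txt")
    # Only the second file gets different content.
    write(f, n == 2 ? "changed\n" : "same\n")
    push!(results, files_match(ref, f))
end
@show results
```

Since a missing file also gives 0, you may want to check isfile first if you need to tell "different" apart from "could not compare".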

Thanks!

--quiet is a GNU-ism; if you want it to work with any POSIX cmp you should use -s.
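
For example, the POSIX spelling could be wrapped like this. It is only a sketch: same_bytes is a hypothetical name, and the Sys.which check is my own addition to give a clearer error when no cmp is installed:

```julia
# Hypothetical wrapper around the POSIX form `cmp -s`
# (-s: print nothing, report the result through the exit status only).
function same_bytes(a::AbstractString, b::AbstractString)
    # Assumption: fail loudly if there is no cmp executable on the PATH.
    Sys.which("cmp") === nothing && error("no cmp executable found on PATH")
    return success(`cmp -s $a $b`)
end
```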

Of course, this won’t work on Windows. It would be pretty easy to write an equivalent Julia function, though, like:

function filecmp(path1::AbstractString, path2::AbstractString)
    stat1, stat2 = stat(path1), stat(path2)
    if !(isfile(stat1) && isfile(stat2)) || filesize(stat1) != filesize(stat2)
        return false # or should it throw if a file doesn't exist?
    end
    stat1 == stat2 && return true # same file
    open(path1, "r") do file1
        open(path2, "r") do file2
            buf1 = Vector{UInt8}(undef, 32768)
            buf2 = similar(buf1)
            while !eof(file1) && !eof(file2)
                n1 = readbytes!(file1, buf1)
                n2 = readbytes!(file2, buf2)
                n1 != n2 && return false
                # Base._memcmp is an internal, unexported helper and may change between Julia versions
                0 != Base._memcmp(buf1, buf2, n1) && return false
            end
            return eof(file1) == eof(file2)
        end
    end
end

Not only is this more portable, it is also much faster than executing an external program like cmp. On my Mac laptop it is about 1000× faster in the common case where the file sizes differ, and about 60× faster for a 20kB file when the files match (so that the whole files need to be read).
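
If relying on the internal Base._memcmp is a concern, here is a variant sketch that only uses exported functions. It is not the function benchmarked above: filecmp_portable and the bufsize keyword are hypothetical names, it skips the "same file" shortcut, and comparing views element by element is probably somewhat slower than memcmp:

```julia
# Sketch of a byte-for-byte file comparison using only exported functions.
function filecmp_portable(path1::AbstractString, path2::AbstractString;
                          bufsize::Integer = 32768)
    # Missing or non-regular files count as "not equal" (they could throw instead).
    (isfile(path1) && isfile(path2)) || return false
    # Files of different sizes can never match, so skip reading them.
    filesize(path1) == filesize(path2) || return false
    open(path1, "r") do file1
        open(path2, "r") do file2
            buf1 = Vector{UInt8}(undef, bufsize)
            buf2 = similar(buf1)
            while !eof(file1) && !eof(file2)
                n1 = readbytes!(file1, buf1)
                n2 = readbytes!(file2, buf2)
                n1 == n2 || return false
                # Compare only the bytes that were actually read.
                view(buf1, 1:n1) == view(buf2, 1:n1) || return false
            end
            # Equal only if both files ended at the same time.
            return eof(file1) == eof(file2)
        end
    end
end
```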

(Python provides filecmp.cmp in its standard library, I wonder if Julia should too?)

Probably a good idea. Worth opening an issue for.

Are you sure this should be in stdlib? Just a package would be fine IMO.

True, it’s on the border between basic enough to include in stdlib and not-needed-often-enough to belong in stdlib. On the other hand, what I like about it is that the API is absolutely crystal clear and won’t change—comparing two files to see if they’re bit-for-bit identical is always going to mean the same thing.

Until feature requests are opened for perfectly reasonable extensions ;) E.g., even the dead simple shell command cmp can ignore initial regions, compare up to a given number of bytes, etc.

I don’t intend to nitpick here; it is just that, given how labor-intensive extending anything in Base & the standard libraries is (ramifications, reviews), I am very much against making something a stdlib unless there is a compelling reason.

In any case, the effort can certainly start as a package.

All fair points!

There are a couple of variations of what it can mean for files to be “identical”:

  • a and b are different files, that happen to contain identical data
  • a and b are hard-linked to the same inode
  • a is a symlink to b, or vice versa
  • a and b are actually the same file (but the path might be different, e.g. due to symlinked directories).

All of these scenarios would count as “bit-for-bit identical” but a filecmp function should offer the possibility to distinguish between them.
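
To illustrate, the scenarios in the list above could be told apart roughly like this. This is only a sketch and not the code from the repository linked below: file_relation and its return symbols are hypothetical, the inode check assumes a platform where stat reports meaningful inode numbers (I would not rely on it on Windows), and read(a) == read(b) loads both files into memory:

```julia
# Hypothetical classifier for how two paths relate to each other.
function file_relation(a::AbstractString, b::AbstractString)
    (ispath(a) && ispath(b)) || return :missing
    same_target = realpath(a) == realpath(b)
    # One of the paths is itself a symlink that resolves to the other file.
    same_target && (islink(a) || islink(b)) && return :symlink
    # Same file reached through different paths (e.g. symlinked directories).
    same_target && return :same_file
    sa, sb = stat(a), stat(b)
    # Different directory entries that share one inode on one device.
    sa.inode == sb.inode && sa.device == sb.device && return :hard_link
    (isfile(a) && isfile(b)) || return :different
    # Separate files: fall back to comparing the data itself.
    filesize(a) == filesize(b) && read(a) == read(b) && return :same_content
    return :different
end
```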

I wrote some code for this kind of thing once upon a time. MIT license, if anyone wants to copy from it: https://github.com/perrutquist/DeduplicateFiles.jl

How can I capture the output of diff (the lines that are different) in a Julia variable?

I tried using read(command::Cmd, String) but it fails since diff exits with an error whenever the two files are different.

Edit: I found a way:

diff_output = read(ignorestatus(`diff $file1 $file2`), String)
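
If you also need the exit status (diff uses 0 for no differences, 1 for differences, and a higher value for trouble such as a missing file), something like the following should work. It is a sketch assuming diff is on the PATH; run_diff is a hypothetical name:

```julia
# Hypothetical helper returning both the exit status and the text printed by diff.
function run_diff(file1::AbstractString, file2::AbstractString)
    # ignorestatus keeps a nonzero exit status from being treated as an error.
    proc = open(ignorestatus(`diff $file1 $file2`), "r")
    output = read(proc, String)   # read diff's stdout until it is closed
    wait(proc)                    # make sure the exit status is available
    return proc.exitcode, output
end
```

An empty output alone cannot distinguish "identical" from "diff failed", which is why the status is returned as well.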

In general, people looking at this question are likely interested in DeepDiffs.jl, which is much older than this question, so I am surprised it hasn’t been mentioned already.

Applying it to files can be done by comparing eachline(file1) with eachline(file2).
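
For readers without the package at hand, here is a dependency-free sketch of the same line-by-line idea. To be clear, this is not DeepDiffs.jl and not a real diff algorithm: differing_lines is a hypothetical helper that compares lines purely by position, so one inserted line makes every later line count as different:

```julia
# Hypothetical helper: line numbers at which two files differ, position by position.
function differing_lines(file1::AbstractString, file2::AbstractString)
    lines1, lines2 = readlines(file1), readlines(file2)
    n = max(length(lines1), length(lines2))
    # get(v, i, nothing) yields `nothing` past the end of the shorter file,
    # so surplus lines in the longer file are reported as differences.
    return [i for i in 1:n if get(lines1, i, nothing) != get(lines2, i, nothing)]
end
```

An empty result means the two files have the same lines.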

Or also