Base.Filesystem.samefile inconsistent and too restrictive?

Hi,

I am toying around with a simple struct type that contains a file system entry’s path and its StatStruct. Defining stat(x::FsEntry) = x.st then allows using much of Base.Filesystem’s API without repeated calls to the actual stat() of the underlying files. (The use case is interactive filtering of many file system entries via isfile, isdir,…, and then creating a hardlink shadow hierarchy for virtually reorganizing files based on their file and Exif characteristics.) So we just leverage the common pattern of

isfile(st::StatStruct) = ...
isfile(path) = isfile(stat(path))  # generic wrapper

At least one function, samefile, behaves a bit differently. First, it seems to be unnecessarily restricted to AbstractStrings in the generic case (which prohibits a “pure” injection). Also, it seems to be inconsistent in a border case.

Assume two files “A” and “B” do not exist:

julia> using Base.Filesystem

julia> samefile(stat("A"), stat("B"))
true

julia> samefile("A", "B")
false

I never made a PR here, and wanted to test drive it in this forum. Any idea if the following change seems feasible?

Change from current =

# samefile can be used for files and directories: #11145#issuecomment-99511194
samefile(a::StatStruct, b::StatStruct) = a.device==b.device && a.inode==b.inode

"""
    samefile(path_a::AbstractString, path_b::AbstractString)
Check if the paths `path_a` and `path_b` refer to the same existing file or directory.
"""
function samefile(a::AbstractString, b::AbstractString)
    infoa = stat(a)
    infob = stat(b)
    if ispath(infoa) && ispath(infob)
        samefile(infoa, infob)
    else
        return false
    end
end

to the new =

"""
    samefile(patha, pathb) -> Bool
Check if `patha` and `pathb` refer to the same existing file or directory.
"""
function samefile(sta::StatStruct, stb::StatStruct)
    ispath(sta) || return false
    ispath(stb) || return false
    return sta.device == stb.device && sta.inode == stb.inode
end

samefile(patha, pathb) = samefile(stat(patha), stat(pathb))

I wanted to stay as close to the current logic as possible and, e.g., not throw an exception in case of files not existing.

Changes:

  1. If one or both files do not exist, they are also not the same file – now in both specific and generic function variants. Also, for non-existing “A”, samefile of “A” and “A” returns false in both cases.
  2. Generic variant no longer restricted to AbstractString, just as isfile.
  3. I moved the doc string to the specific variant, like ispath et al. do.
  4. I tried to stick with the naming there also: doc string contains “path”-named args; args themselves are called "st"s.
  5. The generic function’s args are called “path”-y, as do their other API counterparts.
  6. Personally, I would probably prefer (!ispath(sta) || !ispath(stb)) && return false (i.e., have error exit strongly connected to the condition, instead of the assert-like style above), but the above seems to be more in line with code I see.
  7. I like an explicit return at a function’s end, but could part with it.

Did I overlook something? Is this too small a fish to fry? Any other how-to-PR hints? (I found Making a first Julia pull request | Katharine Hyatt - maybe you have additional suggestions?) (And frankly, if the change were to be OK, I would more than gladly take any help in submitting this from someone who is steeped in the process, has already forked, etc.)

Thanks!

2 Likes

Related issue: Export samefile, make it smarter for nonexistent inputs · Issue #9436 · JuliaLang/julia · GitHub

It does seem like the false output was the desired one

Yep, let’s just leave it that if EITHER file doesn’t exist, (including the case where BOTH don’t exist) we return false.

Based on the documentation being specific about ::AbstractString arguments, and the code update being done only on that method, it seems like the StatStruct-accepting method is intended to be an internal one. And being internal, it’s just assumed that a StatStruct would be for a valid file - though that assumption seems a bug waiting to happen in itself.
(Honestly IMO, the major problematic part here is that stat returns a struct with 0 values for non-existent files, rather than erroring or returning nothing. That seems a pretty weird choice, made worse by the fact that it’s undocumented.)

It’s worth at least opening an Issue about your suggestions, with what you’ve posted here. I’ll save further thoughts for discussion on the issue (please leave a link here if/when you open the issue).

2 Likes

Agree that this seems like a bug/oversight and returning true for two non-existent files is wrong.

1 Like

Thanks for your feedback!
@digital_carver – I opened issue 46830

(Not sure if the separate subissues of samefile-result-on-inexistence and samefile-type-restrictions go together well, or should have been separated; will learn…)