Better handling of pathnames

I’m new to Julia and loving it. Coming from Common Lisp, I kinda miss a pathname structure. Of course I understand that strings are easier to deal with, yet they lead to mistakes. The package FilePaths.jl is nice, yet it is weird to have to call string(path) when using other packages like OdsIO or CSV. I also don’t think it’s good practice to define methods of someone else’s function for someone else’s types (e.g. csv_read(file::Path; kwargs...) = CSV.read(string(file); kwargs...)), since a third party could define them differently.

One approach would be to create methods for functions that expect strings, e.g.

csv_read(file::AbstractPath; kwargs...) =
    CSV.read(string(file); kwargs...)
csv_write(file::AbstractPath, table; kwargs...) =
    CSV.write(string(file), table; kwargs...)
# etc

I don’t like that very much.

For that reason, I was thinking about creating a new package with pathnames as a subtype of AbstractString, specifying a unique string representation for each pathname (like Tcl does, what a nice language!). That way, one can deal with pathnames as structures and still use them in methods that expect strings. Plus, it would be easy to extend this interface to, e.g., URLs, sockets, etc., and have them behave appropriately.

What do you guys/gals think?

I feel the same way occasionally, even though I used to complain about it being a bit baroque back when I was using CL. I guess that’s karma for me :wink:

I think this could be a neat approach; the only caveat I see is supporting all of the string interface, as the set of valid pathnames may not be closed under all string operations. E.g., what happens for your type * an arbitrary string (which may or may not be a pathname)?

But in any case, I think this is worth experimenting with.

By the way, you are 100% right about this. The term used in Julia for defining someone else’s function on someone else’s type is “type piracy”.
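For anyone who hasn’t run into the term, here is a rough sketch of where the line is usually drawn. MyPaths, MyPath and readtext are names I made up for this post, not from any package.

```julia
module MyPaths

# Hypothetical path type owned by this module.
struct MyPath
    s::String
end

# Fine: we own the type, so adding a method to Base.string is not piracy.
Base.string(p::MyPath) = p.s

# Fine: we own the function, so it may accept anyone's types.
readtext(p::MyPath) = read(p.s, String)
readtext(p::AbstractString) = read(p, String)

# Piracy (left commented out): neither `joinpath` nor `Symbol` is ours,
# so another package could define the same method differently.
# Base.joinpath(a::Symbol, b::Symbol) = joinpath(String(a), String(b))

end # module
```

The rule of thumb is that you should own either the function or at least one of the argument types.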

I think this would be nice. The fact that we have a function joinpath() rather than join(::AbstractPath...) feels like an indication that path-specific types could be useful.

No discussion of Paths is complete without pointing out FilePathsBase.jl. It doesn’t solve the particular issue you’re discussing but it is worth being aware of if you’re working and thinking about solutions in this space.

FilePaths.jl and FilePathsBase.jl used to inherit from AbstractString, but that was dropped at some point; see pull request #22, “Drop string subtyping” by rofinn, in rofinn/FilePathsBase.jl on GitHub.

I think the most helpful thing at this point would be to polish FilePathsBase, to the point where it might be ready to make it into the stdlib at some point.

I find the reasoning for making this change a bit odd. They wrote that “the only reason we’ve been subtyping it is to make interop with existing filesystem methods easier.” Compatibility with all existing code that uses file paths does not seem like a minor benefit to me! Without it, I find this approach to be of very limited utility: it’s not practical to require all existing filesystem code (in both Base and packages) to be updated, or to require every caller to perform a conversion.

Yes, implementing a full-featured AbstractString interface does require you to implement a fair number of methods, but the benefits of compatibility are huge.
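To give a feel for the size of the job, here is a minimal sketch of a string-backed path type. StrPath is a made-up name, and the five methods below are what I believe to be the core that Base’s generic string fallbacks build on; a polished implementation would add more (display, comparisons tuned for paths, and so on).

```julia
# Hypothetical path type that simply wraps a String.
struct StrPath <: AbstractString
    s::String
end

# Core string interface, forwarded to the wrapped String.
Base.ncodeunits(p::StrPath) = ncodeunits(p.s)
Base.codeunit(p::StrPath) = codeunit(p.s)
Base.codeunit(p::StrPath, i::Int) = codeunit(p.s, i)
Base.isvalid(p::StrPath, i::Int) = isvalid(p.s, i)
Base.iterate(p::StrPath, i::Int = 1) = iterate(p.s, i)

p = StrPath("data")

# Existing string-based code accepts the new type without changes.
full = joinpath(p, "file.csv")
backup = p * ".bak"   # plain string concatenation, as for any string
```

Note that both results come back as plain String values, so the path type is lost after the first operation unless you also specialize those functions.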

I can’t imagine another reason for subtyping anything other than interoperability with existing code/functionality.

I can see that I was in favor of dropping the inheritance from AbstractString back when that change was made, but I have to admit I no longer understand why… Maybe we should revisit that?

The only reasoning I can come up with now is that maybe one wants to encourage API design where an AbstractString is not treated as a path? Say parse(x::AbstractString) would parse the content of x directly, and parse(x::Path) would load a file and parse it. But of course that works even if Path inherits from AbstractString, so that is probably not a great argument.
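A quick sketch of that last point, with made-up names (DemoPath and wordcount are not from any package): dispatch picks the more specific method even though the path type is a subtype of AbstractString. The string interface methods are left out here because the sketch never prints or indexes the path itself.

```julia
# Hypothetical path type that is also an AbstractString.
struct DemoPath <: AbstractString
    s::String
end

# A plain string is treated as the content itself...
wordcount(text::AbstractString) = length(split(text))

# ...while a path means "read the file first". This method is more
# specific, so it wins for DemoPath arguments.
wordcount(p::DemoPath) = wordcount(read(p.s, String))

# Write a small temporary file to try it on.
tmpfile, io = mktemp()
write(io, "three little words")
close(io)

n_text = wordcount("just two")          # counts words in the string
n_file = wordcount(DemoPath(tmpfile))   # counts words in the file
```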

Isn’t the easier way of doing this simply to support ::Path in CSV or other packages that deal with it? This shifts the burden to package maintainers to have a function csv_read(::Path), but isn’t this a feature of Julia? I.e., whether you use a string or a Path, it should just “work”?

I don’t think that CSV should add new methods to read etc. to support ::Path. Rather the contrary, I think that the advice is keeping such functions as generic as possible, such that they will “magically” work even with new types that the authors never knew of, if those types (like the proposed Path <: AbstractString) are created with a sufficiently rich interface.

That’s the message that I got from this video, I hope I interpreted it in the right way:

So, rather than implementing a couple dozen methods to implement an AbstractString interface in one package, you think it is simpler to change every package that works with files?

Out of the 3000+ Julia packages, how many do you think accept a pathname? (Also, you’ll have to extend every Base function that accepts a pathname, of which there are quite a few.)

Hmm, I didn’t think of it this way. So if Path <: AbstractString, that would be easier, since many methods across many packages already accept a string.

I always thought that as a package developer if I expose a type, it should be up to other package maintainers to write methods to dispatch on that type. I guess it depends on how “abstract” the type is.

The whole reason multiple dispatch works so well is that if you write generic code, and someone makes a type that matches the interface, you get code re-use that no one had to plan. Stuff like this is why people on Discourse put so much emphasis on not over-typing your functions or structs.
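A small illustration of what over-typing costs, using SubString from Base as the “type the author never thought of”. The two is_csv functions are made up for this example.

```julia
# Over-typed: only accepts exactly String.
is_csv_strict(name::String) = endswith(lowercase(name), ".csv")

# Generic: accepts anything that behaves like a string.
is_csv_generic(name::AbstractString) = endswith(lowercase(name), ".csv")

# A SubString is an AbstractString, but not a String.
name = SubString("report.CSV.bak", 1, 10)   # the "report.CSV" part

generic_ok = is_csv_generic(name)

# The strict version has no matching method for SubString.
strict_failed = try
    is_csv_strict(name)
    false
catch err
    err isa MethodError
end
```

A path type that subtypes AbstractString would be in the same position as SubString here: it works with the generic signature and fails with the strict one.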

Even if it is made to be <:AbstractString, maybe a path does not need to support all of the relevant interface. E.g., it could be perfectly reasonable for * to error on results that are not valid paths, or that can’t be interpreted as such.

Wouldn’t it be possible to have methods for * which simply call joinpath if one of the inputs is a path? That would certainly add to the readability of path generation.

Note however that they are not equivalent:

julia> "a" * "B"
"aB"

julia> joinpath("a", "B")
"a/B"

so if paths behave like strings this could be confusing.

I would recommend keeping the current behavior for both * and joinpath, and suggest that the path API is used for all path manipulations.

That’s the point but you are right, it’s not a design decision without consequence.

julia> programPath = path("C:\\Programs\\My_Program");
julia> execPath = programPath * "bin" * "prog.exe"
Path: C:\Programs\My_Program\bin\prog.exe

vs

julia> programPath = "C:\\Programs\\My_Program";
julia> execPath = joinpath(programPath,"bin","prog.exe")
"C:\\Programs\\My_Program\\bin\\prog.exe"

The former is more readable in my opinion and I would expect the overlap in usage wouldn’t be frequent or unclear.

julia> programPath = path("C:\\Programs\\My_Program");
julia> println("Executable Path: " * string(programPath * "bin" * "prog.exe"))
Executable Path: C:\Programs\My_Program\bin\prog.exe

But this would violate basic assumptions of the AbstractString interface. You can either

  1. have a path type <: AbstractString, then * has to do what it does for strings,

  2. have a path type that is not a string, supports * for joinpath (and, of course, requires rewriting a ton of package code that assumes paths are strings).

You can’t have it both ways. And I don’t think that * as an alias for joinpath is worth a breaking change.

Other things you can have if a path is not a string are specialized getindex and iterate. It’s kind of cute if you can do path[end] to mean basename(path). Things like path[end-3:end] for constructing a relative path are useful sometimes.
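Roughly what I have in mind, as a toy sketch. SegPath is a made-up type, it hard-codes '/' as the separator, and it ignores the absolute/relative distinction entirely.

```julia
# Hypothetical non-string path type that stores its segments.
struct SegPath
    parts::Vector{String}
end

# Build from a '/'-separated string (assumption: '/' is the separator).
SegPath(s::AbstractString) = SegPath(String.(split(s, '/'; keepempty = false)))

# Indexing works on segments rather than on characters.
Base.firstindex(p::SegPath) = 1
Base.lastindex(p::SegPath) = lastindex(p.parts)
Base.getindex(p::SegPath, i::Integer) = p.parts[i]
Base.getindex(p::SegPath, r::AbstractUnitRange) = SegPath(p.parts[r])

Base.string(p::SegPath) = join(p.parts, '/')

p = SegPath("usr/local/share/julia")
last_part = p[end]            # plays the role of basename
tail = string(p[end-1:end])   # the last two segments as a relative path
```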

I like the current behavior of FilePaths.jl where / is joinpath and * is regular string concatenation.

julia> p"/dir" / p"subdir" / "file" * ".ext"
p"/dir/subdir/file.ext"
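I haven’t checked how FilePaths.jl implements this internally, but the idea can be sketched with a toy type of your own. ToyPath and the _str helper are made-up names; since / and * have the same precedence and associate to the left, the expression reads naturally.

```julia
# Hypothetical non-string path type wrapping a String.
struct ToyPath
    s::String
end

# Helper to get the underlying text from either kind of operand.
_str(x::ToyPath) = x.s
_str(x::AbstractString) = String(x)

# `/` joins path segments...
Base.:/(a::ToyPath, b::Union{ToyPath,AbstractString}) =
    ToyPath(joinpath(a.s, _str(b)))

# ...while `*` stays ordinary concatenation onto the last segment.
Base.:*(a::ToyPath, b::AbstractString) = ToyPath(a.s * b)

file = ToyPath("dir") / ToyPath("subdir") / "file" * ".ext"
```

Because ToyPath is defined here, adding methods to Base’s / and * for it is not type piracy.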