Better handling of pathnames

jessymilare · March 31, 2020, 10:16am

I’m new to Julia and loving it. As I’m used to Common Lisp, I kinda miss a pathname structure. Of course I understand strings are easier to deal with, yet they lead to mistakes. The package FilePaths.jl is nice, yet it is weird to have to call string(path) when using other packages like OdsIO or CSV. I don’t think it’s good practice to define methods of a function from someone else’s package for someone else’s types (e.g. CSV.read(file::Path; kwargs...) = CSV.read(string(file); kwargs...)), since a third person could define them another way.

One approach would be to create my own wrappers for functions that expect strings, e.g.

csv_read(file::AbstractPath; kwargs...) =
    CSV.read(string(file); kwargs...)
csv_write(file::AbstractPath, table; kwargs...) =
    CSV.write(string(file), table; kwargs...)
# etc

I don’t like that very much.

For that reason, I was thinking about creating a new package with pathnames as a subtype of AbstractString, specifying a unique string representation for each pathname (like TCL does, what a nice language!). That way, one can deal with pathnames as structures and still use them in methods that expect strings. Plus, it would be easy to extend this interface to, e.g., URLs, sockets, etc., and have them function appropriately.
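A minimal sketch of what such a type could look like, assuming the simplest possible design: a wrapper around a single String that forwards the core AbstractString methods. PathString and has_csv_extension are hypothetical names, not part of any existing package, and a real implementation would likely store more structure than this.

```julia
# Hypothetical sketch: `PathString` is an illustrative name, not an existing type.
struct PathString <: AbstractString
    str::String   # the unique string representation of the pathname
end

# Forward the core AbstractString methods to the wrapped String.
Base.ncodeunits(p::PathString) = ncodeunits(p.str)
Base.codeunit(p::PathString) = codeunit(p.str)
Base.codeunit(p::PathString, i::Int) = codeunit(p.str, i)
Base.isvalid(p::PathString, i::Int) = isvalid(p.str, i)
Base.iterate(p::PathString, i::Int=1) = iterate(p.str, i)

# A function written for strings, unaware of PathString.
has_csv_extension(file::AbstractString) = endswith(lowercase(file), ".csv")

p = PathString("data/Table.CSV")
has_csv_extension(p)   # true
```

With only these few methods, generic string code such as comparison, iteration and conversion to String falls back on the defaults that Base provides for any AbstractString.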

What do you guys/gals think?

Tamas_Papp · March 31, 2020, 10:58am

I feel the same way occasionally, even though I was complaining about it being a bit baroque back when I was using CL. I guess that’s karma for me.

I think this could be a neat approach; the only caveat I see is supporting all of the string interface, as the set of valid pathnames may not be closed under all string operations. E.g., what happens for your type * an arbitrary string (which may or may not be a pathname)?

But in any case, I think this is worth experimenting with.

rdeits · March 31, 2020, 12:51pm

By the way, you are 100% right that this is bad practice. The term used in Julia for defining methods of someone else’s function on someone else’s types is “type piracy”.
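To make the distinction concrete, here is a small self-contained sketch. The modules TableIO and Paths are hypothetical stand-ins for a package like CSV and a path package; nothing here is the API of a real library.

```julia
# Hypothetical stand-ins: `TableIO` plays the role of a package like CSV,
# `Paths` the role of a path package. Neither module is real.
module TableIO
load(file::AbstractString) = "loaded $file"
end

module Paths
struct Path
    str::String
end
end

# Type piracy: this code owns neither the function `TableIO.load`
# nor the type `Paths.Path`, yet it adds a method combining the two.
TableIO.load(p::Paths.Path) = TableIO.load(p.str)

# No piracy: `load_table` is a new function owned by this code.
load_table(p::Paths.Path) = TableIO.load(p.str)

load_table(Paths.Path("data.csv"))   # "loaded data.csv"
```

The pirated method changes what TableIO.load does for everyone who loads this code, which is why two packages pirating the same method can conflict.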

I think the package you propose would be nice. The fact that we have a function joinpath() rather than join(::AbstractPath...) feels like an indication that path-specific types could be useful.

Jordan_Cluts · March 31, 2020, 1:16pm

No discussion of Paths is complete without pointing out FilePathsBase.jl. It doesn’t solve the particular issue you’re discussing but it is worth being aware of if you’re working and thinking about solutions in this space.

davidanthoff · March 31, 2020, 4:56pm

FilePaths.jl and FilePathsBase.jl used to inherit from AbstractString, and then that was dropped at some point: see “Drop string subtyping” by rofinn (Pull Request #22, rofinn/FilePathsBase.jl on GitHub).

I think the most helpful thing at this point would be to polish FilePathsBase, to the point where it might be ready to make it into the stdlib at some point.

stevengj · March 31, 2020, 5:12pm

I find the reasoning for making this change a bit odd. They wrote that “the only reason we’ve been subtyping it is to make interop with existing filesystem methods easier.” Compatibility with all existing code that uses file paths does not seem like a minor benefit to me! Without it, I find this approach to be of very limited utility: it’s not practical to require all existing filesystem code (in both Base and packages) to be updated, or to require every caller to perform a conversion.

Yes, implementing a full-featured AbstractString interface does require you to implement a fair number of methods, but the benefits of compatibility are huge.

jessymilare · March 31, 2020, 5:41pm

I can’t imagine another reason for subtyping anything other than interoperability with existing code/functionality.

davidanthoff · March 31, 2020, 8:01pm

I can see that I was in favor of dropping the inheritance from AbstractString back when that change was made, but I have to admit I no longer understand why… Maybe we should revisit that?

The only reasoning I can come up with now is that maybe one wants to encourage API design where an AbstractString is not treated as a path? Say parse(x::AbstractString) would parse the content of x directly, and parse(x::Path) would load a file and parse it. But of course that even works if Path inherits from AbstractString, so that is probably not a great argument.
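A rough sketch of why dispatch still distinguishes the two cases when the path type is a string subtype. FilePathString and count_words are hypothetical names chosen for illustration; count_words stands in for the parse function mentioned above.

```julia
# Hypothetical sketch: `FilePathString` and `count_words` are illustrative names.
struct FilePathString <: AbstractString
    str::String
end

# Forward the core AbstractString methods to the wrapped String.
Base.ncodeunits(p::FilePathString) = ncodeunits(p.str)
Base.codeunit(p::FilePathString) = codeunit(p.str)
Base.codeunit(p::FilePathString, i::Int) = codeunit(p.str, i)
Base.isvalid(p::FilePathString, i::Int) = isvalid(p.str, i)
Base.iterate(p::FilePathString, i::Int=1) = iterate(p.str, i)

# A plain string is treated as content...
count_words(text::AbstractString) = length(split(text))
# ...while the more specific path method reads the file first.
count_words(file::FilePathString) = count_words(read(file.str, String))

# Write a small temporary file to demonstrate the path method.
path, io = mktemp()
write(io, "one two three")
close(io)

count_words("one two")              # 2: the string itself is the content
count_words(FilePathString(path))   # 3: the file content is counted
rm(path)
```

Julia picks the most specific applicable method, so the FilePathString method wins for paths even though the AbstractString method would also match.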

affans · March 31, 2020, 8:45pm

Isn’t the easier way of doing this simply to support ::Path in CSV or other packages that deal with paths? This shifts the burden to package maintainers to provide a method like csv_read(::Path), but isn’t this a feature of Julia, i.e. that whether you use a string or a Path, it should just “work”?

heliosdrm · March 31, 2020, 9:12pm

I don’t think that CSV should add new methods to read etc. to support ::Path. On the contrary, I think the advice is to keep such functions as generic as possible, so that they will “magically” work even with new types that the authors never knew of, provided those types (like the proposed Path <: AbstractString) are created with a sufficiently rich interface.

That’s the message that I got from this video, I hope I interpreted it in the right way:

stevengj · March 31, 2020, 10:11pm

So, rather than implementing a couple dozen methods to implement an AbstractString interface in one package, you think it is simpler to change every package that works with files?

Out of the 3000+ Julia packages, how many do you think accept a pathname? (Also, you’ll have to extend every Base function that accepts a pathname, of which there are quite a few.)

affans · April 1, 2020, 6:08am

Hmm, I didn’t think of it this way. So if Path <: AbstractString, that would be easier, since many methods across many packages already accept a string.

I always thought that as a package developer if I expose a type, it should be up to other package maintainers to write methods to dispatch on that type. I guess it depends on how “abstract” the type is.

Oscar_Smith · April 1, 2020, 6:16am

The whole reason multiple dispatch works so well is that if you write generic code, and someone makes a type that matches the interface, you get code reuse that no one had to plan. Stuff like this is why people on Discourse put so much emphasis on not over-typing your functions or structs.
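A small illustration of the over-typing point, using only Base. The function names strict_ext and generic_ext are made up for this example; SubString serves as the "type the author never planned for".

```julia
# Over-typed: only accepts the concrete type String.
strict_ext(file::String) = last(split(file, '.'))

# Generic: accepts anything that implements the AbstractString interface.
generic_ext(file::AbstractString) = last(split(file, '.'))

# `strip` returns a SubString{String}, not a String.
name = strip("  report.csv  ")

generic_ext(name)              # "csv"
applicable(strict_ext, name)   # false: no method accepts a SubString
```

The two function bodies are identical; only the signature decides whether a new string-like type can reuse the code.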

Tamas_Papp · April 1, 2020, 7:35am

Even if it is made to be <:AbstractString, maybe a path does not need to support all of the relevant interface. E.g., it could be perfectly reasonable for * to error on results that are not valid paths, or that can’t be interpreted as such.

Daneel · April 2, 2020, 6:29am

Wouldn’t it be possible to have methods for * which simply call joinpath if one of the inputs is a path? That would certainly add to the readability of path generation.

Tamas_Papp · April 2, 2020, 6:44am

Note however that they are not equivalent:

julia> "a" * "B"
"aB"

julia> joinpath("a", "B")
"a/B"

so if paths behave like strings this could be confusing.

I would recommend keeping the current behavior for both * and joinpath, and suggest that the path API is used for all path manipulations.

Daneel · April 2, 2020, 6:58am

That’s the point but you are right, it’s not a design decision without consequence.

julia> programPath = path("C:\\Programs\\My_Program");
julia> execPath = programPath * "bin" * "prog.exe"
Path: C:\Programs\My_Program\bin\prog.exe

vs

julia> programPath = "C:\\Programs\\My_Program";
julia> execPath = joinpath(programPath,"bin","prog.exe")
"C:\\Programs\\My_Program\\bin\\prog.exe"

The former is more readable in my opinion and I would expect the overlap in usage wouldn’t be frequent or unclear.

julia> programPath = path("C:\\Programs\\My_Program");
julia> println("Executable Path: " * string(programPath * "bin" * "prog.exe"))
Executable Path: C:\Programs\My_Program\bin\prog.exe

Tamas_Papp · April 2, 2020, 7:02am

But this would violate basic assumptions of the AbstractString interface. You can either

1. have a path type <: AbstractString, in which case * has to do what it does for strings, or
2. have a path type that is not a string and supports * as joinpath (which, of course, requires rewriting a ton of package code that assumes paths are strings).

You can’t have it both ways. And I don’t think that * as an alias for joinpath is worth a breaking change.

tkf · April 2, 2020, 8:08am

Other things you can have if a path is not a string are specialized getindex and iterate. It’s kind of cute if you can write path[end] to mean basename(path). Things like path[end-3:end] for constructing a relative path are useful sometimes.
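As a rough sketch of that idea, assuming a path type that stores its segments in a vector: SegPath is a hypothetical name, and splitting on '/' only is a simplification that ignores drives and absolute paths.

```julia
# Hypothetical sketch: `SegPath` is an illustrative, non-string path type.
struct SegPath
    parts::Vector{String}
end

# Simplification: split on '/' only and drop empty segments.
SegPath(s::AbstractString) = SegPath(String.(split(s, '/'; keepempty=false)))

# Indexing works on path segments rather than characters.
Base.getindex(p::SegPath, i::Integer) = p.parts[i]
Base.getindex(p::SegPath, r::AbstractUnitRange) = SegPath(p.parts[r])
Base.firstindex(p::SegPath) = 1
Base.lastindex(p::SegPath) = lastindex(p.parts)

# Iteration yields the segments one by one.
Base.iterate(p::SegPath, state...) = iterate(p.parts, state...)
Base.length(p::SegPath) = length(p.parts)
Base.eltype(::Type{SegPath}) = String

p = SegPath("usr/local/share/julia")
p[end]         # "julia", same as basename("usr/local/share/julia")
p[end-1:end]   # SegPath(["share", "julia"])
```

None of this would be possible for a subtype of AbstractString, where indexing and iteration are expected to produce characters.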

jmkuhn · April 2, 2020, 2:42pm

I like the current behavior of FilePaths.jl where / is joinpath and * is regular string concatenation.

julia> p"/dir" / p"subdir" / "file" * ".ext"
p"/dir/subdir/file.ext"
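For readers who want to see how that operator split can work, here is a minimal sketch with a hypothetical SimplePath type. It is not how FilePaths.jl is implemented; it only mimics the behavior shown above, with / delegating to joinpath and * doing plain concatenation.

```julia
# Hypothetical sketch: `SimplePath` is illustrative and is NOT the FilePaths.jl type.
struct SimplePath
    str::String
end

# Helper to get the raw string out of either kind of operand.
pathstr(p::SimplePath) = p.str
pathstr(s::AbstractString) = String(s)

# `/` joins path segments with the platform separator...
Base.:/(a::SimplePath, b::Union{SimplePath,AbstractString}) =
    SimplePath(joinpath(a.str, pathstr(b)))

# ...while `*` keeps its usual meaning of string concatenation.
Base.:*(a::SimplePath, b::AbstractString) = SimplePath(a.str * b)

# `/` and `*` have equal precedence and associate left to right,
# so ".ext" is appended after the segments have been joined.
p = SimplePath("/dir") / SimplePath("subdir") / "file" * ".ext"
p.str == joinpath("/dir", "subdir", "file.ext")   # true
```

Because SimplePath is not a subtype of AbstractString, giving / this meaning does not conflict with any string semantics.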
