Splitpath/joinpath for non-native paths?

This issue highlights a limitation in Base.Filesystem. The natural way to do this conversion would be something along these lines (the arch keyword and Base.Arch are hypothetical, not an existing API):

joinpath(
  splitpath("a\\b\\c"; arch=Base.Arch.Windows); 
  arch=Base.Arch.Linux
) == "a/b/c"

The underlying string transformations do not depend on the architecture they are running on. The situation is akin to algorithm selection in sort, where the caller can override a sensible default.

Currently, joinpath and other Base.Filesystem functions use Sys.iswindows() and Sys.isunix() to query the current system. This is a nice default behaviour, but people clearly also need access to the transformations for the other platforms.
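To make the idea concrete, here is a minimal sketch of platform-parameterized splitting and joining. All names in it (PathStyle, WindowsStyle, PosixStyle, split_relpath, join_relpath) are made up for illustration and are not part of Base. It only handles relative paths and ignores drive letters, UNC prefixes and roots.

```julia
# Hypothetical platform tags; Base has no such types today.
abstract type PathStyle end
struct WindowsStyle <: PathStyle end
struct PosixStyle <: PathStyle end

# Separators each style accepts when splitting, and the one it emits when joining.
separators(::WindowsStyle) = ('\\', '/')
separators(::PosixStyle) = ('/',)
joiner(::WindowsStyle) = "\\"
joiner(::PosixStyle) = "/"

# Default to the host platform, mirroring what Base does today.
native_style() = Sys.iswindows() ? WindowsStyle() : PosixStyle()

# Relative paths only: drive letters, UNC prefixes and roots are not handled.
split_relpath(path::AbstractString, style::PathStyle = native_style()) =
    split(path, separators(style); keepempty = false)

join_relpath(parts, style::PathStyle = native_style()) =
    join(parts, joiner(style))

# Convert a Windows-style relative path to POSIX style on any host.
converted = join_relpath(split_relpath("a\\b\\c", WindowsStyle()), PosixStyle())
```

With an explicit style argument the result no longer depends on the machine running the code, while omitting it keeps the current native behaviour.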

(This comment is not directly related to the OP, so moderators should feel free to split it into a different thread.)

Well, the “it” in my sentence referred to URIs.escapeuri — URIs.jl currently doesn’t have any functions for handling file paths, only URI paths, so this particular issue had nothing to do with Base.

Regarding your point, though, that splitpath and joinpath are currently only for the system you are running on, I agree that in principle it would be nice to be able to handle “foreign” file paths too. Though this doesn’t seem to come up in practice very often?

(However, it shouldn’t be an issue for @thestoicone in the original thread, because they are using walkdir to generate file paths, and hence their paths will always be for the native OS.)

There is another principle at work here, and it is function purity. Giving the functions a parametrized specification of the OS makes them pure. Pure functions are easier to reason about and to handle formally.

The quick default of using the current system still needs to be there, though. A good principle would be for functions to be easily turned into pure functions from the caller’s location.

This essentially is why we like the ability to specify an RNG for functions. Impurity and stochasticity should be easily turned off.
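As a small illustration of that pattern: the standard library's shuffle accepts an optional RNG as its first argument, and passing a seeded one makes the call reproducible. The sep function below is a hypothetical example of applying the same idea to the platform; it is not an existing Base function.

```julia
using Random

# Without an RNG argument, the result depends on hidden global state.
shuffle(1:5)

# With an explicit, seeded RNG the same call is reproducible.
a = shuffle(MersenneTwister(42), 1:5)
b = shuffle(MersenneTwister(42), 1:5)

# The same pattern applied to the platform (hypothetical helper):
# a keyword that defaults to the host but can be pinned by the caller.
sep(; windows::Bool = Sys.iswindows()) = windows ? '\\' : '/'

sep()                # depends on the host
sep(windows = true)  # same answer on every host
```

Here a and b are equal because both calls start from the same seed, and sep(windows = true) does not consult the host at all.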

Going further, an automated tool that fixes such parameters (OS and RNG), drives functions to be as pure as possible, and warns of possible randomness or impurity would be even better.

For fun, I’ll mention the recent outcome of a #gripe in Slack: there’s interest in doing take 2 of a Julep adding a path type to Base. The current design includes PosixPath and WindowsPath concrete types, and so you would be able to reason about non-native paths in this way.

I played around with an alternative URI and path type design a while ago and completely forgot about it. I have just uploaded it to GitHub: the types are in src/types.jl of davidanthoff/URIs2.jl, and the README has some more thoughts on the design.

One thing that is different in this design (right now only visible in the URI parts) is that I’m using types to distinguish different storage layouts, rather than semantic differences. I think for a path type design I would probably go in the same direction these days, i.e. something like AbstractPath, where Path stores things as a single string and PathParts stores them as a tuple that holds the parts, or something like that. I would store the semantic difference between Windows and POSIX paths just as a Bool flag.

Seems like more people than I thought have done experiments in this direction. Do you think I could interest you in the latest (WIP) Path type proposal, David? It currently has this type hierarchy:

AbstractPath{T}
└── PlainPath (<: AbstractPath{SubString{String}})
    └── SystemPath
        ├── PosixPath
        └── WindowsPath

And a 6-function interface (root, parent, basename, iterate, length, *).

There are also some Windows particularities that I’m vaguely aware of but not familiar with (UNC paths, shares, and other fun stuff) that might necessitate some tweaks to the design.

I think the main question I have these days is whether using the types to distinguish between windows and posix paths is the right choice. At the moment I’m more tempted to use types to distinguish between different memory layouts for storing a path, and making the windows/posix distinction just a plain field in the type. So something like:

abstract type AbstractPath end

struct Path <: AbstractPath
  _internal::String
  _windows::Bool # Or maybe an enum or Symbol
end

struct PathParts <: AbstractPath
  _parts::Tuple{Vararg{String}}
  _windows::Bool
end

I see at least two benefits of this: First, for different use-cases different memory layouts are better, so I generally think that something that locks this down to one choice is not great. Second, I think with a type hierarchy that is based on windows/posix there is a fair chance that one ends up with heterogeneous arrays, dynamic dispatch etc., and so to me right now it seems easier to just encode whether something is a Windows path as a field.
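The heterogeneous-array concern can be illustrated with a small sketch. The type names here (FlagPath, TypedPath, PosixP, WindowsP) are hypothetical stand-ins for the two designs, not the types from either proposal.

```julia
# Variant A (hypothetical): the platform is a field, so there is one concrete type.
struct FlagPath
    str::String
    windows::Bool
end

# Variant B (hypothetical): the platform is a type, so there are two concrete subtypes.
abstract type TypedPath end
struct PosixP <: TypedPath
    str::String
end
struct WindowsP <: TypedPath
    str::String
end

# Mixing platforms in one collection.
flagged = [FlagPath("a/b", false), FlagPath("a\\b", true)]
typed = [PosixP("a/b"), WindowsP("a\\b")]

# Variant A keeps a concrete element type; variant B falls back to the abstract one.
flag_concrete = isconcretetype(eltype(flagged))
typed_concrete = isconcretetype(eltype(typed))
```

In this sketch flagged is a Vector{FlagPath} while typed is a Vector{TypedPath}, so calls on elements of typed are resolved at run time. This only matters when paths for different platforms actually end up in the same collection.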

Finally, I would probably keep it simple, and just say “this is for file paths”, and not try to design something that also covers all sorts of other paths. I think the latter leads to a fairly complicated design, with not much upside. If someone wants to add support for, say, S3 paths, they can just create a new type; I don’t see a really good reason why all of that needs to be part of one type hierarchy.

Do you have a link? Happy to take a look.

I don’t see why the memory layout should be different?

I have trouble seeing many situations where you’d want to feed a path for a different platform into a function, so I’d expect most functions to just be defined for the current platform type (except for basic abstract path manipulations that have definitions for both) — and so I don’t see much of a risk of heterogeneous arrays, dynamic dispatch, etc.

Sure, but I do think it’s nice to provide an appropriate abstract type to subtype.

I do. It is currently very much WIP, though (half-finished, started last weekend), so I’ll probably just DM it to you rather than share it publicly yet.

Just wanted to mention that one use case is working with remote machines on a different platform. For example, for the sake of simplicity (read: laziness), LibSSH.jl’s SFTP implementation currently assumes that the server is always running on *nix.

(though I personally don’t have a strong opinion on whether it should be a type or field as long as joinpath() does the right thing)

Another use case is log file processing, or essentially any file that might embed a filename generated on a different platform. This seems to me a very common thing.
