Designing a Paths Julep

Thanks for all the work and research on this topic @tecosaur. I don’t want to derail this conversation too much with legacy conversations, but a few important decisions and challenges with FilePaths(Base).jl were:

(1) Paths are not strings

I think there’s consensus on this, but path types should not be a subtype of AbstractString. Yes, paths can implement some of the string interface, but it won’t make sense to implement all of it, and subtyping AbstractString makes debugging that much harder for end users.
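
To make that concrete, here is a minimal sketch of a path type that is not an AbstractString but opts in to the few string-like operations that make sense for paths. SegPath and its methods are hypothetical (not FilePathsBase.jl’s API), and the toy only models relative paths:

```julia
# Hypothetical path type; deliberately NOT a subtype of AbstractString.
# Toy model: relative paths only, stored as a tuple of segments.
struct SegPath
    segments::Tuple{Vararg{String}}
end
SegPath(s::AbstractString) = SegPath(Tuple(String.(split(s, '/'; keepempty=false))))

# Opt in only to the operations that make sense for a path.
Base.print(io::IO, p::SegPath) = join(io, p.segments, '/')   # makes string(p) work
Base.basename(p::SegPath) = isempty(p.segments) ? "" : last(p.segments)
Base.joinpath(p::SegPath, parts::AbstractString...) =
    SegPath((p.segments..., map(String, parts)...))

p = joinpath(SegPath("data"), "raw", "x.csv")
string(p)     # "data/raw/x.csv"
basename(p)   # "x.csv"
# uppercase(p) and other string-only functions throw a MethodError.
```

Anything the type does not opt in to fails loudly instead of silently treating the path as raw text.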

(2) / vs *

As has been noted, I don’t think there’s an issue with using / in filepath code, but it shouldn’t be the default, since it represents division globally. In FilePathsBase.jl, you need to explicitly import / in the context where you’re using it. I’d also argue against using *, since it’s very handy to write directory / filename * ".csv".
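
A minimal sketch of that explicit-import style, using a hypothetical module PathSlash and toy type ToyPath (an illustration, not how FilePathsBase.jl is implemented). The module owns a separate function named /, so Base./ remains plain division everywhere else, and only code that imports it gets path joining. Since * and / have the same precedence in Julia and associate left to right, directory / filename * ".csv" parses as (directory / filename) * ".csv", which still works if * appends to the last segment:

```julia
module PathSlash

export ToyPath

# Hypothetical toy path: relative segments only (no root or drive handling).
struct ToyPath
    segments::Vector{String}
end
ToyPath(s::AbstractString) = ToyPath(String.(split(s, '/'; keepempty=false)))

# A *new* function named `/`, owned by this module. It is defined before `/`
# is referenced anywhere in the module, so it shadows Base./ instead of adding
# methods to it; code that never imports it still sees plain division.
/(a::ToyPath, b::AbstractString) = ToyPath(vcat(a.segments, ToyPath(b).segments))
/(a::ToyPath, b::ToyPath) = ToyPath(vcat(a.segments, b.segments))
/(a, b) = Base.:/(a, b)   # everything else falls back to ordinary division

# `*` keeps its concatenation meaning: append text to the last segment.
Base.:*(a::ToyPath, s::AbstractString) =
    ToyPath(vcat(a.segments[1:end-1], a.segments[end] * s))

end # module

using .PathSlash: ToyPath, /   # explicit opt-in, scoped to the importing module

directory, filename = ToyPath("data/raw"), "results"
p = directory / filename * ".csv"   # parsed as (directory / filename) * ".csv"
p.segments                          # ["data", "raw", "results.csv"]
```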

(3) Interfaces vs Types

Most implementations have tradeoffs between flexibility, readability, performance, strictness, etc., for example the Tuple{Vararg{String}} vs. String/SubString implementations you mentioned. However, as an end user, I often don’t care as long as the behavior is consistent: I just want to know that I can call joinpath, get a consistent result, and pass that to another function. There have been multiple conversations about adding interfaces/protocols to Julia, so that you could specify that an argument implements some set of functions (e.g., joinpath, mkdir, write, status). The way we worked around this was to have FilePathsBase.jl mirror the Filesystem API. That way, if you simply loosened the type restrictions in your library, the code would just work without explicitly depending on strings, path types, etc.
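
A minimal sketch of the "loosen the type restrictions" point, with hypothetical names (output_file, output_file_strict, MyPath): the untyped function relies only on joinpath, so it works for plain strings and for any path type that implements joinpath, while the ::AbstractString annotation shuts the path type out:

```julia
# Library code: no type annotation on `dir`; it only assumes `joinpath` works.
output_file(dir, name::AbstractString) = joinpath(dir, name * ".csv")

# The same function with an unnecessary restriction.
output_file_strict(dir::AbstractString, name::AbstractString) = joinpath(dir, name * ".csv")

# A hypothetical third-party path type that mirrors the Base.Filesystem API.
struct MyPath
    parts::Vector{String}
end
Base.joinpath(p::MyPath, s::AbstractString) = MyPath(vcat(p.parts, String(s)))

output_file("results", "run1")             # "results/run1.csv" on POSIX
output_file(MyPath(["results"]), "run1")   # MyPath(["results", "run1.csv"])
# output_file_strict(MyPath(["results"]), "run1") throws a MethodError
```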

(4) Non-local paths

This has its own set of challenges, but there are multiple remote storage solutions that could support a subset of the filesystem interface. For example, S3Path in AWSS3.jl was really handy for our applications, which often saved data to S3 in specific contexts. In retrospect, though, some things didn’t make sense there (e.g., directory ops, status). Again, interfaces or duck typing might be a good solution in these cases.
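
The same duck-typing idea covers non-local storage. In the sketch below, ObjPath is a hypothetical type backed by an in-memory Dict standing in for an object store (it is not AWSS3.jl’s S3Path API). It implements joinpath, write, read, and isfile but no directory operations, and a generic function that needs only that subset works for both a local directory and the stand-in store:

```julia
# Hypothetical in-memory stand-in for an object store (key => bytes).
const STORE = Dict{String,Vector{UInt8}}()

struct ObjPath
    key::String
end

# The subset of the filesystem API that makes sense for objects.
Base.joinpath(p::ObjPath, s::AbstractString) = ObjPath(rstrip(p.key, '/') * "/" * s)
Base.isfile(p::ObjPath) = haskey(STORE, p.key)
function Base.write(p::ObjPath, data::AbstractString)
    STORE[p.key] = collect(codeunits(String(data)))
    return ncodeunits(data)
end
Base.read(p::ObjPath, ::Type{String}) = String(copy(STORE[p.key]))
# No mkdir/readdir/stat methods: object stores have no real directories,
# so calling them on an ObjPath is a MethodError rather than a silent no-op.

# Generic code that only relies on that subset.
function save_report(dest, text::AbstractString)
    file = joinpath(dest, "report.txt")
    write(file, text)
    return file
end

save_report(mktempdir(), "hello")                # local: returns a String path
save_report(ObjPath("bucket/reports"), "hello")  # ObjPath("bucket/reports/report.txt")
```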

Anyways, I’m glad this is something the community is still interested in, and keep up the good work 🙂

absolutely strong yes IMO

I personally don’t see why one should be so strict about keeping / as division under all circumstances. Having to import a different / or prefix it seems to defeat the convenience of the syntax. And I think in well-written code it should be clear from context that you’re looking at path operations and not division.

Took a look, it should now behave a bit more sanely:

  • You can no longer combine a prefix/suffix with interpolation to produce invalid values.
    julia> ext = "."; p"path/.$ext"
    ERROR: Invalid segment in PosixPath: ".." is reserved.
    
  • Trailing slashes are now removed
    julia> x="."; p"hi/$(x)🍕/"
    p"hi/.🍕"
    
  • Some other minor bugs have been fixed

This is a knock-on effect of the current normpath implementation. If you try p"/" * p"..", you’ll see an insufficient parents error.

I’m wary of the bikeshedding opportunity here but I don’t like the idea of using / for joinpath. Most simply: Base./ is defined to be division. I don’t want to use it for something unrelated.

* is used for concatenation of Strings because it is associative but not commutative. Taking seriously the algebraic structure of sequences, and the composability of Julia functions, / is defined to be division, so it should be the inverse of *, not identical to it. For example "ab" * "c" / "c" == "ab".
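
The algebra can be checked with Base functions. Base defines no / for strings, so the identity above is hypothetical, but chopsuffix (available since Julia 1.8) already undoes a concatenation on the right, which is the role / would play under this reading:

```julia
# `*` on strings is associative but not commutative.
@assert ("a" * "b") * "c" == "a" * ("b" * "c")
@assert "ab" * "c" != "c" * "ab"

# No `/` for strings in Base; chopsuffix removes a trailing piece,
# so (x * y) "divided by" y gives back x.
@assert chopsuffix("ab" * "c", "c") == "ab"
```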

However, using * for concatenation on vectors is going to be confusing because so many users of other languages will expect pointwise multiplication.

So Julia needs a new concatenation operator that works on general sequences.

  • it shouldn’t be * (multiplication) or Base./ (division).
  • vcat isn’t good because it’s not an infix operator and doesn’t work right on Strings or Tuples or Sets.
  • ++ is used for concatenation in Haskell and in PlusPlus.jl and I’m fine with that, but I think it needs to be in Base.

This is the concat operator that I want to use on Paths. This is discussed in JuliaLang/julia issue #53040 on GitHub (contracts in `append`/`concat` function).
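
For illustration, a minimal sketch of such an operator. Julia already parses ++ as an infix operator but gives it no definition, so a package (PlusPlus.jl, for instance) or Base could add methods along these lines; the method set below is an assumption, not the design discussed in the issue:

```julia
# Hypothetical generic concatenation operator (not defined in Base).
(++)(a::AbstractString, b::AbstractString) = string(a, b)
(++)(a::AbstractVector, b::AbstractVector) = vcat(a, b)
(++)(a::Tuple, b::Tuple) = (a..., b...)
# Longer chains may reach ++ as a single n-ary call, so fold them pairwise.
(++)(a, b, c, rest...) = foldl((x, y) -> x ++ y, (a, b, c, rest...))

"ab" ++ "c"          # "abc"
[1, 2] ++ [3]        # [1, 2, 3]
(1, 2) ++ (3,)       # (1, 2, 3)
"a" ++ "b" ++ "c"    # "abc"
```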

Alternatively, if the / symbol is desired, a separate Filesystem./ would be fine with me. I just don’t want Base./ to mean two unrelated things.

the counterpoint being I realllllly think it would be a shame if Julia chose once again to be a special flower and reject overwhelmingly consistent & near universal precedent for the sake of pseudo-mathematical ideological “purity”

/ would have the most natural look for paths, but I understand the reservations against it.
* is a common wildcard in paths, and I’d dislike it for path concatenation.
+ would have been OK for me if it were used for string concatenation - but in Julia, it is not.

However, in Julia we are not limited to ASCII, and some existing operators are already non-ASCII. So we could pick a character that is visually similar to / but distinct from it, or something else that looks like a separator, e.g. one of the following:

⌠ ⫽ ⦙ ⧘ ⎛ ⎡

My favorite is ⫽: p"C⫽user⫽eben60⫽documents" looks like a path, but is distinct enough from the representation of the same path as a String.

I’m afraid I’m unfamiliar with this near universal precedent. What is it?

See Designing a Paths Julep - #31 by MilesCranmer for some other examples: Python, C++, Ruby, Nim (plus the more obvious ones like shells, Git, browsers, cloud storage, etc.)

Thanks for the list – the C++/Ruby/Nim examples I’d missed.

Like I said, though, /a/b/ in a shell is a string interpreted as a path, not an operation on two path objects. Likewise, in Julia that string would be written "/a/b" or as a path literal p"/a/b/", which is distinct from applying the division operator to two path objects, /a / b.

In Unix, / is not the symbol for joining paths: it’s the symbol that delimits path components inside a string that represents a path.

But regardless of whether / is the symbol used, Base./ is specifically division: Filesystem./ would best be a different function imo.

Given the fact that / is the actual symbol for joining paths (even on Windows), it seems strange to claim that it is not a natural choice for an operator.

I think your objection is rather an argument for changing the path separator itself, which is a bit of a long shot.

I don’t like all these Unicode operators:

  • they are too hard to remember.
  • they are not visually appealing.
  • using functions like os.path.join is more straightforward.

Small gripe: a number of directory operations in S3 require that the key end with / (list_objects, for example).

I know object storage and file paths are different things, but removing trailing separators would make it difficult to use the Path API to deal with S3 keys.

It also makes a difference for some command-line tools one could interpolate a path into. I guess a trailing separator should be left alone.
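
One way to respect that is to record the trailing separator explicitly instead of normalizing it away. A minimal sketch with a hypothetical KeyPath type; note that Base’s string functions already keep the distinction, since Julia’s basename("logs/2024/") returns "" (unlike the Unix basename tool):

```julia
# Hypothetical key/path type that remembers a trailing '/' instead of dropping it.
struct KeyPath
    segments::Vector{String}
    trailing_slash::Bool
end
KeyPath(s::AbstractString) =
    KeyPath(String.(split(s, '/'; keepempty=false)), endswith(s, '/'))

function Base.print(io::IO, p::KeyPath)
    join(io, p.segments, '/')
    p.trailing_slash && print(io, '/')
    return nothing
end

string(KeyPath("logs/2024/"))   # "logs/2024/", usable as an S3 list prefix
string(KeyPath("logs/2024"))    # "logs/2024"
basename("logs/2024/")          # "" (Base keeps the distinction for strings)
```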

@tecosaur are you thinking this can be used for web URLs as well?

a trailing slash in URLs can mean something different:

  1. There’s a difference between linking to child from /url/path/base and from /url/path/base/: the first resolves to /url/path/child, but the second resolves to /url/path/base/child.
  2. Different static site hosting services treat the trailing slash differently (e.g. Vercel vs. Netlify vs. GitHub Pages).

Being able to control constructing a URL path with and without trailing slashes would be nice, if this is in scope for this effort.
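
The difference in item 1 follows from how relative references are resolved: RFC 3986 (section 5.2.3, "Merge Paths") appends the reference to the base path with everything after its last / removed. A simplified sketch with a hypothetical merge_path that ignores scheme, authority, query, and dot-segment removal:

```julia
# Simplified RFC 3986 "merge" step: keep the base path up to and including
# its last '/', then append the relative reference.
function merge_path(base::AbstractString, ref::AbstractString)
    i = findlast('/', base)
    return i === nothing ? String(ref) : base[1:i] * ref
end

merge_path("/url/path/base", "child")    # "/url/path/child"
merge_path("/url/path/base/", "child")   # "/url/path/base/child"
```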

I think a URL could be an AbstractPath but not a Path; there are too many differences between filesystem paths and URLs to unify them, like ?x=y query params and all the different forms the root can take.

Maybe the abstract API can still be useful for both, not sure.
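
A minimal sketch of that split, with hypothetical types AbstractPath, FSPath, and URLPath: generic behavior such as basename can be written once against the abstract type, while joining is defined per concrete type because a URL carries extra state (whether the query survives a join is an arbitrary choice here):

```julia
# Hypothetical hierarchy: one abstract API, separate concrete types.
abstract type AbstractPath end

struct FSPath <: AbstractPath
    segments::Vector{String}
end

struct URLPath <: AbstractPath
    origin::String                      # e.g. "https://example.com"
    segments::Vector{String}
    query::Vector{Pair{String,String}}  # ?x=y parameters: no filesystem analogue
end

# Generic behavior written once against the abstract type
# (assumes every subtype provides a `segments` field).
Base.basename(p::AbstractPath) = isempty(p.segments) ? "" : last(p.segments)

# Joining is defined per type, because URLs carry extra state.
Base.joinpath(p::FSPath, s::AbstractString) = FSPath(vcat(p.segments, String(s)))
Base.joinpath(p::URLPath, s::AbstractString) =
    URLPath(p.origin, vcat(p.segments, String(s)), Pair{String,String}[])  # drops the query

u = URLPath("https://example.com", ["docs"], ["x" => "y"])
basename(joinpath(u, "page"))               # "page"
basename(joinpath(FSPath(["docs"]), "a"))   # "a"
```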

I suppose all of these points have been made already somewhere in this thread, but just my 2 cents on the different issues:

  1. Path joining and string concatenation are entirely different things that should not use the same operator under any circumstances: "root" * "folder" is "rootfolder", but Path("root") / "folder" is Path("root/folder").
  2. AFAIK, there is not a single language where the same infix operator is used for string concatenation and path joining. In fact, using / is completely universal (with the possible exception of Haskell, but </> is close enough). I would consider it completely insane for Julia not to follow precedent here. If you really can’t stomach / being used both for division and path joining, make it separately importable, although that seems silly to me. Or come up with something that at least involves /, the way Haskell does (a macro @/ isn’t possible, right?). In any case, there would still be joinpath(p"folder", p"subfolder", "file.ext") for anyone who doesn’t want to use an infix operator.
  3. There is no such thing as string division (existing or planned) as an inverse to *, so there is absolutely no confusion with / as an operator on Paths. Nobody in the world would ever be confused by this.
  4. I don’t think Path("root") * "folder" (i.e., the * operator for Path objects) absolutely needs to be defined, but if it is defined, it should obviously be equivalent to Path("root" * "folder"). That is, it should implement concatenation, not joining.
  5. It’s nice to have a string macro p"folder". I don’t know that that macro absolutely needs to support things like interpolation, but if it does, p"$root$directory$file" should be exactly the same thing as Path("$root$directory$file"). Under no circumstances is interpolation an alternative to path joining. I think all the semantics of p"folder" are pretty clear if you remember it as a shorthand for Path("folder").
  6. It would definitely be possible to have p"folder" / ("filebase" * ".ext"). It might be okay to have p"folder" / "filebase" * ".ext" without parentheses if we want to allow * for Path and String objects: since * and / have the same precedence in Julia and associate left to right, it parses as (p"folder" / "filebase") * ".ext". Of course, p"folder" / "$(filebase).ext" is always okay, with the caveat:
  7. While combining path joining and string concatenation is obviously possible, file extensions in particular should be part of the Path API, with something like p"folder" / "filename" |> with_extension("ext") to change or add extensions. With a well-designed Path API, the actual need for string interpolation or concatenation should be extremely minimal.
  8. p"relative_path" / p"absolute_path" should not be an error, but simply return p"absolute_path". This is what enables cwd / file to save a file with an absolute or relative path inside cwd, without having to make manual distinctions. I would note that, e.g., Python’s pathlib is very well-designed and has thought about details like that. It behooves us to learn from that design and the design of other proven solutions. I would strongly recommend following pathlib’s design overall, with the obvious change that in Julia, we need to design around functions, not OOP class methods.
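
Item 7’s with_extension is not an existing Base function. A minimal sketch for plain string paths, assuming a hypothetical two-argument form plus a curried one-argument form so it composes with |>, built on Base’s splitext:

```julia
# Hypothetical helper (not in Base): replace or add a file extension.
function with_extension(path::AbstractString, ext::AbstractString)
    stem, _ = splitext(path)               # drop the current extension, if any
    return stem * "." * lstrip(ext, '.')   # accept "csv" or ".csv"
end
# Curried form for use in a pipeline.
with_extension(ext::AbstractString) = path -> with_extension(path, ext)

joinpath("folder", "filename") |> with_extension("ext")   # "folder/filename.ext" on POSIX
"data/report.txt" |> with_extension(".csv")                # "data/report.csv"
```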

I don’t think I agree with that. You can get your behavior by calling abspath(path) along the way, which will either prepend the cwd to a relative path or return an absolute path as is (apart from normalization); realpath(path) does the same but also resolves symlinks and requires the path to exist. This will usually happen when passing paths off to the OS to actually do stuff with them.

But when I’m joining paths, I consider it a footgun to be able to join a path with an absolute path which then destroys the original path information. As a compromise, I’d maybe be ok with joinpath(p1, p2; join_abs = true) or so to specifically enable this in certain circumstances.
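
For comparison, Base’s string joinpath already follows the rule from item 8: as documented, an absolute component causes the preceding components to be dropped. The join_abs compromise could be prototyped as a separate function; joinpath_checked and its behavior are hypothetical, only the keyword name comes from the post:

```julia
# Base behavior on POSIX: the absolute component wins.
joinpath("project", "/tmp/out.csv")   # "/tmp/out.csv"

# Hypothetical stricter join: refuse to discard the left-hand path unless asked.
function joinpath_checked(a::AbstractString, b::AbstractString; join_abs::Bool = false)
    if isabspath(b) && !join_abs
        throw(ArgumentError("joining absolute path $(repr(b)) would discard $(repr(a))"))
    end
    return joinpath(a, b)
end

joinpath_checked("project", "out.csv")                        # "project/out.csv" on POSIX
joinpath_checked("project", "/tmp/out.csv"; join_abs = true)  # "/tmp/out.csv"
# joinpath_checked("project", "/tmp/out.csv") throws an ArgumentError
```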

Another option could be adding realpath(path; cwd = ...) so you don’t actually have to cd somewhere first.

One of the signs that there’s a missing abstraction in the first place is the name joinpath, so really, the name should be join, though the old name is probably also needed for compatibility?

In theory, yes, but join already exists, and is documented as

Join any iterator into a single string, inserting the given delimiter (if any) between adjacent items.

I’m not sure if that’s compatible with Path objects, but that would be something to discuss.

In any case, we’re not looking to get rid of the existing joinpath (for strings), so that should definitely be extended to Path objects.
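
The mismatch is easy to see with plain strings: join only inserts the delimiter between items, while joinpath applies path rules such as not doubling a separator, so giving join path semantics for Path objects would be a separate decision:

```julia
# join: plain string concatenation with a delimiter.
join(["data/", "raw", "x.csv"], "/")   # "data//raw/x.csv"

# joinpath: path semantics (on POSIX the separator is not doubled).
joinpath("data/", "raw", "x.csv")      # "data/raw/x.csv"
```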