Designing a Paths Julep

My thinking is that joining path legs with a path separator is fundamentally a different operation than string concatenation, which is also why it is common to call it “join”.

And, as @kdheepak points out, what should happen with

p"my/path" * ".ext"

Should it produce p"my/path/.ext"?


Good point. I think

# equivalent to joinpath
/(::Path, ::Path) -> Path
/(::Path, ::String) -> Path

## equivalent to string concat (like file extensions)?
# [deleted] *(::Path, ::String) -> Path

# invalid
*(::Path, ::Path) -> error
*(::String, ::Path) -> error
/(::String, ::Path) -> error
/(::String, ::String) -> error

I think making * completely invalid for use on Path would also be a good option, to avoid ambiguity here [edit: yeah probably just do this].
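To make the rules above concrete, here is a minimal sketch. `SketchPath` is a hypothetical stand-in for the proposed type (it just stores path legs) and is not the actual implementation. The `String`-on-the-left cases need no code at all, since they fall through to a `MethodError`.

```julia
# Hypothetical stand-in for the proposed Path type; it only stores path legs.
struct SketchPath
    parts::Vector{String}
end

# `/` joins path legs; the right operand may be a path or a string.
Base.:/(a::SketchPath, b::SketchPath) = SketchPath(vcat(a.parts, b.parts))
Base.:/(a::SketchPath, b::AbstractString) = SketchPath(vcat(a.parts, String(b)))

# `*` is rejected outright so it can never be mistaken for a join.
Base.:*(::SketchPath, ::Union{SketchPath,AbstractString}) =
    throw(ArgumentError("* is not defined for paths; use / to join"))
Base.:*(::AbstractString, ::SketchPath) =
    throw(ArgumentError("* is not defined for paths; use / to join"))

# Usage: joining works, while `"dir" / p` has no method and `p * ".ext"` throws.
p = SketchPath(["my", "path"])
joined = p / "file.txt"
```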

Also, if relative paths are made a different type from absolute paths (which semantically seems good), you could even make it so that

# ok
/(::RelativePath, ::RelativePath) -> RelativePath
/(::RelativePath, ::String) -> RelativePath
/(::AbsolutePath, ::RelativePath) -> AbsolutePath
/(::AbsolutePath, ::String) -> AbsolutePath

# invalid
/(::String, ::RelativePath) -> error
/(::String, ::AbsolutePath) -> error
/(::RelativePath, ::AbsolutePath) -> error
/(::AbsolutePath, ::AbsolutePath) -> error

Perhaps those could just be fields of Path rather than actually distinct types.
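A rough sketch of the distinct-types variant, assuming hypothetical names `AnyPath`, `RelPath` and `AbsPath` (none of these come from the proposal). The left operand decides the result type, and an absolute path on the right is always rejected.

```julia
# Hypothetical types illustrating the relative/absolute split.
abstract type AnyPath end
struct RelPath <: AnyPath
    parts::Vector{String}
end
struct AbsPath <: AnyPath
    parts::Vector{String}
end

# Appending a relative path (or a plain string leg) keeps the left operand's kind.
Base.:/(a::T, b::RelPath) where {T<:AnyPath} = T(vcat(a.parts, b.parts))
Base.:/(a::T, b::AbstractString) where {T<:AnyPath} = T(vcat(a.parts, String(b)))

# Appending an absolute path is always an error.
Base.:/(::AnyPath, ::AbsPath) =
    throw(ArgumentError("cannot append an absolute path"))

home = AbsPath(["home"]) / RelPath(["user"]) / "notes.txt"
```

With the field-based alternative the same check would happen at run time inside a single `/` method instead of through dispatch.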


Edit: nvm, removed the *(Path, String) - see below.


This is also the behavior I asked for above and it seems like @tecosaur implemented that now (so you can’t ever append an absolute path).
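For comparison, Base's `joinpath` on plain strings silently discards everything before an absolute component (on Unix-like systems, `joinpath("a", "/b")` gives `"/b"`). A small sketch of the stricter behavior on strings, where `strictjoin` is a made-up helper name and not part of any package:

```julia
# Hypothetical helper: like joinpath, but refuses absolute components on the right.
function strictjoin(base::AbstractString, legs::AbstractString...)
    for leg in legs
        isabspath(leg) && throw(ArgumentError("cannot append absolute path: $leg"))
    end
    return joinpath(base, legs...)
end

ok = strictjoin("a", "b", "c.txt")   # same result as joinpath("a", "b", "c.txt")
```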

I’m not sure I like *(::Path, ::String) -> Path, just because I don’t think it’s likely that you have a path dir/subdir/filename to which you then append .csv. At least, it has never happened to me that I built the path and filename in one place and added the extension in another. It’s more likely that you’d do p"dir/subdir" / "somefile$ext".

Good point. Probably just better to throw an error.

I can imagine wanting to do

p"my/path/file" .* [".pdf", ".tex"]
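If `*` on a path were kept for this, the type would also have to opt in to being treated as a scalar under broadcasting. A sketch under that assumption, with `ExtPath` as a hypothetical string-backed stand-in for the path type:

```julia
# Hypothetical string-backed path type, only for this broadcasting sketch.
struct ExtPath
    s::String
end

# Assumption for this sketch: `*` appends raw text to the end of the path.
Base.:*(p::ExtPath, suffix::AbstractString) = ExtPath(p.s * suffix)

# Treat a path as a scalar, so `.*` maps over the extensions only.
Base.broadcastable(p::ExtPath) = Ref(p)

paths = ExtPath("my/path/file") .* [".pdf", ".tex"]
```

Without the `broadcastable` method, broadcasting would try to iterate the path itself and fail.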

I’m tempted to take a leaf from other ecosystems with more path functions. The rather nice CL library I reference in the design doc, fosskers/filepaths (“Modern and consistent filepath manipulation for Common Lisp”, hosted on Codeberg), has:

> (filepaths:with-base #p"/foo/bar/baz.txt" "jack")
#p"/foo/bar/jack.txt"

> (filepaths:with-name #p"/foo/bar/baz.txt" "jack.json")
#p"/foo/bar/jack.json"

> (filepaths:with-parent #p"/foo/bar/baz.json" #p"/zing")
#p"/zing/baz.json"

> (filepaths:with-extension #p"/foo/bar/baz.txt" "json")
#p"/foo/bar/baz.json"

> (filepaths:add-extension #p"/foo/bar/baz.txt" "zip")
#p"/foo/bar/baz.txt.zip"

> (filepaths:drop-extension #p"/foo/bar/baz.json")
#p"/foo/bar/baz"

Rust has the subset: with_file_name, with_extension, with_added_extension.
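As a rough idea of what that subset could look like in Julia, here is a sketch on plain strings using only Base functions (`splitext`, `dirname`, `joinpath`). The helper names are borrowed from the Lisp/Rust APIs above and are assumptions, not an existing Julia API.

```julia
# Hypothetical helpers on plain strings, named after the APIs listed above.
with_name(path, name) = joinpath(dirname(path), name)     # swap the final component
drop_extension(path) = first(splitext(path))              # strip the last extension
add_extension(path, ext) = string(path, ".", ext)         # stack another extension
with_extension(path, ext) = add_extension(drop_extension(path), ext)

replaced = with_extension("/foo/bar/baz.txt", "json")
stacked = add_extension("/foo/bar/baz.txt", "zip")
```

Keeping "replace" and "add" as separate functions avoids having to guess whether a string passed to an operator is meant as a new leg or as a suffix.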

Right. I haven’t implemented @/... yet, but something like what RelocatableFolders.jl does and the linked PRs to base is very much within scope. I guess we’ll see what’s required for that when I (or you/other people! :grinning:) take a stab at implementing it.


FWIW, I have this exact use case: writing out a data file with .parq and a metadata file of the same name with .json. There’s also the case of adding the .gz to a file that already has an extension.


Oh, just on this example. I think you could just do:

p"/home/$user_tstr/$dir_tpath/$(name_tcustom).csv"

I wouldn’t mind it if your example worked though, to my sensibilities at least that looks reasonable to me.


I had to write code like this yesterday which is why this example comes to mind. Let’s say I saw a PR that looked like this:

# basename = p"/home/$user_tstr/$dir_tpath/$(name_tcustom)"
# exts = [".csv", ".json"]

function map_over_extensions(basename, exts)
    for ext in exts
        path = basename * ext
        ...
    end
end

If * were used for both path concats and string concats, and if I saw this in a code review, I think I’d ask for it to be rewritten as:

function map_over_extensions(basename, exts)
    for ext in exts
        path = p"$basename$ext"
        ...
    end
end

But if / were used for paths, I’d let the original code slide. Just my 2 cents :slight_smile:


My preference:

  1. / as a means to join paths
  2. not having any infix operator
  3. any other operator (++, other unicode symbols) to join paths
  4. * as a means to join paths

Here are some examples of infix operator APIs from a few select languages (C++, Python, Haskell, Nim, Ruby).


C++ (std::filesystem)

  • Join paths: /
  • Change file extension: .replace_extension(...)
  • Get parent directory: .parent_path()
  • Get file name: .filename()
  • Get current working directory: current_path()
  • Example:
    #include <filesystem>
    namespace fs = std::filesystem;
    fs::path base = fs::path("dir");
    fs::path path = base / "file.txt";
    auto parent = path.parent_path();
    fs::path newPath = path;            // copy first: replace_extension mutates in place
    newPath.replace_extension(".bak");
    

Python (pathlib)

  • Join paths: / or .joinpath(...)
  • Change file extension: .with_suffix(...)
  • Get parent directory: .parent
  • Get file name: .name
  • Get current working directory: Path.cwd()
  • Example:
    from pathlib import Path
    base = Path('dir')
    path = base / 'file.txt'
    parent = path.parent
    new_path = path.with_suffix('.bak')
    

Haskell (filepath package)

  • Join paths: </>
  • Change file extension: -<.>
  • String joining for extensions: <.>
  • Get parent directory: takeDirectory
  • Get file name: takeFileName
  • Get current working directory: getCurrentDirectory
  • Example:
    import System.FilePath
    base = "dir"
    path = base </> "file.txt"
    parent = takeDirectory path
    newPath = path -<.> "bak"
    

Nim (built-in)

  • Join paths: /
  • Change file extension: changeFileExt(...)
  • Get parent directory: parentDir(...)
  • Get file name: extractFileName(...)
  • Get current working directory: getCurrentDir()
  • Example:
    import os
    let base = "dir"
    let path = base / "file.txt"
    let parent = parentDir(path)
    let newPath = changeFileExt(path, ".bak")
    

Ruby (Pathname library)

  • Join paths: /
  • Change file extension: .sub_ext(...)
  • Get parent directory: .parent
  • Get file name: .basename
  • Get current working directory: Pathname.getwd
  • Example:
    require 'pathname'
    base = Pathname.new('dir')
    path = base / 'file.txt'
    parent = path.parent
    new_path = path.sub_ext('.bak')
    

I do kind of have an admiration for Haskell’s syntax but it’s probably unrealistic for Julia at this stage. It would also make more sense if <> were the usual string concat (in Haskell that is ++, although <> does work on strings too).

Also, I continue to be surprised by how clean C++17 looks!

One theme from poking around is that nearly all languages that do have an infix operator use / apart from Haskell which uses </> instead.


Thanks for that overview Miles! I remain open to / myself, I’m just not sure that would be accepted by the core/triage team overall.

As usual, Ruby’s API looks nice, and I’m somewhat surprised to see that C++17 has a path type!


In looking at the examples using / as the path concatenation operator, I’ve decided that I don’t like it. Even though I know that the variables are paths, I can’t stop thinking that something is getting divided. Both + and * look much more natural to me for joining paths.

For whatever reason, when I use Python’s pathlib I’ve gotten into the (possibly bad) habit of joining paths this way:

base = Path("/")
home = Path(base, "home")

just so that it’s completely unambiguous that I’m working with paths.

I also like the interpolation syntax p"$base$file". Just my 1 cent.


One potential alternative is to just have Paths.jl define its own / function that is not the Base.:(/).

i.e. a user could then do

new = let / = Paths.:(/)
    base/file
end

or if they want it globally you do

using Paths: /

new = base/file

you just keep this as a totally separate function from Base.:(/) and then there is no punning.
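A self-contained sketch of that idea, assuming a toy module `SketchPaths` standing in for Paths.jl (the module and type names are made up). Because the module never imports `Base.:(/)`, the definition creates a new function that only the module owns.

```julia
module SketchPaths            # hypothetical stand-in for Paths.jl
struct SPath
    s::String
end
# No `import Base: /` here, so this is a brand-new function, not a Base method.
/(a::SPath, b::AbstractString) = SPath(joinpath(a.s, b))
end

module UserCode               # a user opting in to the path-only `/`
using ..SketchPaths: SPath, /
joined = SPath("dir") / "file.txt"
end

# Outside those modules `/` is still ordinary division.
quotient = 6 / 3
```

One trade-off to be aware of: inside a module that opts in this way, `/` no longer divides numbers.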


Re: path concatenation.

The operator * is parsed as a single n-ary call in Julia rather than as nested binary calls, which means a chain of * does not have to create multiple intermediate strings.

julia> dump(:("a" * "b" * "c"))
Expr
  head: Symbol call
  args: Array{Any}((4,))
    1: Symbol *
    2: String "a"
    3: String "b"
    4: String "c"

julia> @which "a" * "b" * "c"
*(s1::Union{AbstractChar, AbstractString}, ss::Union{AbstractChar, AbstractString}...)
     @ Base strings/basic.jl:261

On the other hand, / is left associative, so a chain of / is a series of nested binary calls. There is no guarantee that it avoids creating intermediate values.

julia> dump(:(a / b / c))
Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol /
    2: Expr
      head: Symbol call
      args: Array{Any}((3,))
        1: Symbol /
        2: Symbol a
        3: Symbol b
    3: Symbol c

This isn’t to say * is better. I just wanted to point out a more subtle trade-off.
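To make the difference visible, here is a small sketch that counts how many path objects get constructed in each case. `CountedPath` and the `CREATED` counter are hypothetical, and using `*` as a join here is only for the comparison, not a recommendation.

```julia
# Counts constructions so that intermediate values become visible (sketch only).
const CREATED = Ref(0)
struct CountedPath
    s::String
    CountedPath(s) = (CREATED[] += 1; new(s))
end

# Binary method: every `/` in a chain builds a new path.
Base.:/(a::CountedPath, b::AbstractString) = CountedPath(joinpath(a.s, b))
# Varargs method: receives every operand of `a * b * c` in a single call.
Base.:*(a::CountedPath, bs::AbstractString...) = CountedPath(joinpath(a.s, bs...))

a = CountedPath("a")
CREATED[] = 0
a / "b" / "c"
slash_count = CREATED[]    # one construction per `/`
CREATED[] = 0
a * "b" * "c"
star_count = CREATED[]     # a single construction for the whole chain
```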

Path concatenation doesn’t really need a binary operator. Using a function (or a constructor method as @nsajko and @mihalybaci suggest) would be easier to learn, and harder to misuse.


But since a path is a kind of string (or closely related), it’s very unfortunate to use the string concatenation operator for something that’s very different from concatenation. Especially since concatenation is something you probably want to do.

IMO, * is the worst possible operator for path join, better to use something completely different.


Re: associativity, I think the compiler would just optimize this anyway, especially since there are likely to be only ~2-3 chained / at a time.

I’m not sure the (pedantic) mathematical justification that underpins the choice of * for strings would carry over to a path join operation. It’s also worth noting that Base’s choice of * for strings suggests that Base’s / might be used for the inverse of concatenation. Yes, julia#13411 was decided against, but / means divide, and therefore a duck-typed ::Any method might do some wild and unexpected things with paths.

An available operator with n-ary parsing like * and precedence like + is ++. It’s largely unused in the ecosystem.
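A quick sketch of what that could look like, with `PlusPath` as a hypothetical string-backed path type. Since Base defines no `++` function, this definition does not pun on any existing meaning.

```julia
# Hypothetical string-backed path type for the `++` sketch.
struct PlusPath
    s::String
end

# One varargs method handles a whole chain, because `x ++ y ++ z` parses as one call.
++(a::PlusPath, legs::AbstractString...) = PlusPath(joinpath(a.s, legs...))

chained = PlusPath("root") ++ "dir" ++ "file.txt"
nargs = length(:(x ++ y ++ z).args)   # operator plus three operands in one call
```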


We should probably use / as the path operator - it’s the universal convention across languages and formats (even C++, surprisingly) and visually intuitive. The worry about confusion with division seems unrealistic - I can’t conceive of any scenario where you’d accidentally divide a Path by something, much less have it conflict with path separation. While semantic purity is often valuable, I think / as a path separator is too universal, and path manipulation too practical, to sacrifice clarity for theoretical purity.


I think @savq might be onto something with this. I rarely use operators for string/path concatenation and generally prefer joinpath() type functions. From a human-readable perspective is

path = root / directory / file

really that much easier to read or type than

path = joinpath(root, directory, file) # or
path = p"$root$directory$file"

?

I do like the idea of a dedicated, batteries-included Path type, but I’m not sure if it needs to have a / type operator just because other languages do. I mean, if Python jumped off a bridge, should Julia do it too? :sweat_smile:

One other thing, would we need to add a \ for Windows paths? / is naturally for unix/linux users, but maybe less so for Windows? I (for one) always found it weird typing / in Python for Windows paths.
