Designing a Paths Julep

Prompted by a gripe I made on Slack, thoughts I’ve had on paths have coalesced into the start of a Julep and a prototype.

At this stage, I’d love to get wider input on the idea and help shoring it up into a solid proposal for Julia (a Julep). That HackMD document is commentable and editable by anyone logged in, and there’s a basic (and likely lightly buggy) prototype that can be played with.

It would be great to have a little working group of sorts to take this to a finished proposal. @jakobnissen, @jules, and @simonschoelly have been great in the Slack thread to name a few, but this seems worth opening up to a wider audience.

In particular, it would be brilliant to hear from @oxinabox, @Rory-Finnegan, @ExpandingMan, and @c42f to name a few people I’ve seen previously thinking about this topic.

39 Likes

Thanks for the work you put in so far!

Rereading this document I had a few ideas. I think the focus of these path objects shouldn’t be maximum performance but robustness and convenience: looking things up in a filesystem is slow anyway, so in relative terms you probably don’t gain much by making the path object super optimized.

So I thought what are the main things we’d want to avoid with paths that the current string representation doesn’t give us?

  • rejecting invalid path segments at creation
  • disallowing a root segment within a path’s segments
  • disallowing joining a path with an absolute one
  • clear handling of special . and .. segments

I like your proposal of adding special cases to the path macro, mostly @ and ~.

We could handle these requirements pretty well by making the path type store its root explicitly and making the path segments special types that validate at construction. Closed unions would be a little less performant than boiling everything down to just SubStrings but easier to validate.

struct CurrentDir end # .
struct ParentDir end  # ..

struct UnixPathRoot end # /
struct WindowsPathRoot
    str::SubString
end

const PathRoot = @static Sys.iswindows() ? WindowsPathRoot : UnixPathRoot

struct PathSegment # could also be split into Unix and Windows variant
    str::SubString
    PathSegment(str) = new(validated_segment(str))
end

struct Path
    root::Union{Nothing,PathRoot}
    segments::Vector{Union{PathSegment,CurrentDir,ParentDir}}
end

This way one could for example ensure that no roots are in the segments and one can error easily when joining a path with another that has path.root !== nothing.
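A minimal sketch of how validation at construction and the "no joining onto a rooted path" rule could fit together, assuming POSIX-style rules only. Every name here (`validated_segment`, `Segment`, `DemoPath`, `joinpaths`) is hypothetical and chosen for illustration; none of it is taken from the prototype or from Base.

```julia
# Hypothetical sketch; none of these names come from the prototype or from Base.
struct CurrentDir end   # "."
struct ParentDir end    # ".."
struct UnixRoot end     # "/"

# Reject anything that cannot be a single ordinary POSIX segment.
function validated_segment(str::AbstractString)
    isempty(str) && throw(ArgumentError("empty path segment"))
    str in (".", "..") && throw(ArgumentError("$(repr(str)) is not an ordinary segment"))
    for c in ('/', '\0')
        c in str && throw(ArgumentError("segment $(repr(str)) contains $(repr(c))"))
    end
    return String(str)
end

struct Segment
    str::String
    # The inner constructor is the only way in, so every Segment is validated.
    Segment(str::AbstractString) = new(validated_segment(str))
end

const Part = Union{Segment,CurrentDir,ParentDir}

struct DemoPath
    root::Union{Nothing,UnixRoot}
    parts::Vector{Part}
end

# Joining is only allowed when the right-hand path is relative.
function joinpaths(a::DemoPath, b::DemoPath)
    b.root === nothing || throw(ArgumentError("cannot join onto an absolute path"))
    return DemoPath(a.root, vcat(a.parts, b.parts))
end
```

With this shape, `joinpaths(DemoPath(UnixRoot(), Part[Segment("home")]), DemoPath(nothing, Part[Segment("user")]))` succeeds, while swapping the arguments or calling `Segment("a/b")` throws an `ArgumentError`.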

You cannot be sure that a Path object you didn’t construct yourself hasn’t been messed with (Julia just has no guards against reinterpretation circumventing validation at instantiation) but if you called the Path constructor or p"" macro yourself, you should be sure that there are no invalid segments or roots.

The ~ functionality could be handled by adding another type maybe called UserRoot and widening the union in Path.root. The @ functionality could be CodeRelativeRoot etc. The union must be closed for performance but others shouldn’t be allowed to override functionality that touches path string macro parsing anyway, I think.

Then for path interpolation, one could add the rule that interpolation between path separators like some/path/$interpolant/ is equivalent to joining with Path(interpolant) (no roots allowed etc.). And interpolating into a “word” would require the result to validate as a PathSegment like some/path/prefix$interpolant, which would error if interpolant contains / or .. etc.
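The two interpolation rules can be sketched on plain vectors of segment strings. This is only an illustration under assumptions: the helper names are hypothetical, only POSIX separators are considered, and only a whole-word `.` or `..` is rejected (a stricter rule that rejects any `..` substring would be a different design choice).

```julia
# Hypothetical helpers illustrating the two interpolation rules; not the prototype's code.

# True when `s` can stand as exactly one ordinary segment.
is_plain_segment(s::AbstractString) =
    !isempty(s) && s != "." && s != ".." && !occursin('/', s) && !occursin('\0', s)

# Rule 1: `some/path/$x/` behaves like joining with a relative path.
function interpolate_between(segments::Vector{String}, x::AbstractString)
    startswith(x, '/') && throw(ArgumentError("cannot interpolate absolute path $(repr(x))"))
    return vcat(segments, String.(split(x, '/'; keepempty=false)))
end

# Rule 2: `some/path/prefix$x` must still form exactly one valid segment.
function interpolate_word(segments::Vector{String}, prefix::AbstractString, x::AbstractString)
    word = prefix * x
    is_plain_segment(word) ||
        throw(ArgumentError("$(repr(word)) is not a valid single segment"))
    return vcat(segments, [String(word)])
end
```

Under these rules `interpolate_word(["some", "path"], ".", ".")` throws, because the combined word would be `..`.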

For the . and .. normalization problem, I think we just need to be clear that realpath interacts with the file system (so you cannot normalize paths that don’t exist) while some other alternative function (named something better than simplify_syntactically but equivalently clear) is allowed to remove .. and . with the understanding that this might not reflect the actual filesystem given symlinks.
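For string paths, Base already draws this line: `normpath` is purely textual, while `realpath` resolves against the filesystem. A small sketch of how the two can disagree once a symlink is involved (the function name `lexical_vs_physical` is made up for this example, and creating symlinks may need extra privileges on Windows):

```julia
# Compare purely lexical cleanup with filesystem resolution when a symlink is present.
function lexical_vs_physical()
    mktempdir() do dir
        base = realpath(dir)                      # resolve symlinks in the temp dir itself
        some_dir = joinpath(base, "some_dir")
        mkdir(some_dir)
        symlink("..", joinpath(some_dir, "up"))   # `up` points at `..`, i.e. at `base`
        p = joinpath(some_dir, "up", "..")
        return (some_dir = some_dir,
                parent = dirname(base),
                lexical = normpath(p),            # drops "up/.." textually
                physical = realpath(p))           # follows `up`, then goes up from `base`
    end
end

result = lexical_vs_physical()
```

Here `result.lexical` is `some_dir`, whereas `result.physical` is the parent of the temporary directory, so a syntactic simplifier and `realpath` give different answers for the same input.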

6 Likes

Why should . or .. get special treatment? Seems better to just treat those like any other file.

They don’t necessarily need to be separate types but to me it seems like for example for programming certain walking patterns over paths it would be clearer to dispatch on those, or to easily check if a path can be syntactically simplified. Especially the .. is very different from the normal segments I would say because it’s the only one that can go up. And I wouldn’t want p"some/path/.$(".")" to work which you can also add as validation logic but which I find nice to solve via dispatch.

Going up is possible for other names apart from .., too, e.g., for up:

$ stat -c %i .  # the inode of `.`, uniquely identifying it
1
$ mkdir some_dir
$ cd some_dir/
$ stat -c %i .
6198
$ ln -s .. up  # make `up` point to `..`, see: https://en.wikipedia.org/wiki/Symbolic_link
$ cd up
$ stat -c %i .
1
$ cd some_dir/
$ stat -c %i .
6198
$ cd up
$ stat -c %i .
1
$ pwd
/tmp/some_dir/up/some_dir/up

So, above, in the path /tmp/some_dir/up/some_dir/up, each up component is interchangeable with "..".

The idea of giving the . or .. path elements special treatment seems bad to me, because it seems like it would require a (perhaps impossibly) careful implementation specific to many different (file) systems.

Also consider some tricky special cases, such as:

  • . or .. not existing in a directory (not sure if POSIX/SUS allows this, but even if it doesn’t, there’s such a thing as a buggy or noncompliant file system)
  • Where does .. in / point to?

Thanks for the kind words and your detailed comments Julius, on Slack and here!

Robustness and convenience are two aspects where I think we can clearly improve on the status quo, I just think we can get all three (robustness, convenience, and performance) without needing to compromise :slightly_smiling_face:.

I think you’ll be happy to see that the prototype handles path interpolation. Invalid characters and potentially deceptive segments (string-ly segments that are pseudopath elements or contain separators) raise errors:

julia> invalid_path = p"null_char\0"
ERROR: LoadError: Invalid segment in PosixPath: "null_char\0" contains the reserved character '\0'.
[...]


julia> deceptive_segment = "/etc/passwd"
"/etc/passwd"

julia> p"relative/path/$deceptive_segment"
ERROR: Invalid segment in PosixPath: "/etc/passwd" contains the separator character '/'.
[...]

Of your points, the only one I think is currently unaddressed is:

  • disallowing joining absolute paths as anything but the first argument

The rules for ./.. pseudopath elements are clear I think (described under “Avoid representational ambiguities”): all Path values are normalised, . components are always omitted and .. is only used at the start of a relative path. The headache with realpath and potentially cyclic symlinks etc. making .. behave unusually is approached by requiring the user to explicitly call realpath before engaging in filesystem-specific operations that rely on these details, and similarly handling externally-provided path strings.
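As a rough illustration of that invariant (not the prototype's actual implementation), here is a purely lexical normaliser over already-split segments. The function name and the `absolute` keyword are hypothetical; dropping `..` at the root is an assumption that matches the p"/.." result shown elsewhere in this thread.

```julia
# Hypothetical sketch of the stated invariant, working on already-split segments.
function normalise_segments(segments::Vector{String}; absolute::Bool)
    out = String[]
    for seg in segments
        if seg == "."
            continue                      # `.` is always omitted
        elseif seg == ".."
            if !isempty(out) && last(out) != ".."
                pop!(out)                 # cancel the previous ordinary segment
            elseif !absolute
                push!(out, "..")          # leading `..` survives in relative paths
            end                           # at the root, `..` is dropped
        else
            push!(out, seg)
        end
    end
    return out
end
```

For example, `normalise_segments(["a", ".", "b", "..", "c"]; absolute=false)` gives `["a", "c"]`, and any `..` that remains afterwards can only sit at the front of a relative path.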

Is this a complete/sufficient approach? I’m not 100% sure, but I think it might be and if it works the mental model is wonderfully simple.

Glad to hear, the @ idea was a more recent shower-thought that seems worthy of further consideration. I appreciate hearing second opinions on it :).

Currently this is handled by making the p"~/..." macro call generate code that at runtime produces the appropriate path, I think this works without needing the structure you propose, but perhaps I’m missing something?

julia> @macroexpand p"~/hey"
:(parse(Path, homedir()) * p"hey")

At this stage, I’m fairly happy with the interpolation behaviour of the prototype. It seems nicely “sensible” to me. There are two forms of interpolation supported:

  • String-ly interpolation
  • Path interpolation

An interpolated string may be a new value, or part of a single segment:

julia> p"demo/some-$a"
p"demo/some-word"

julia> p"more/$a/parts"
p"more/word/parts"

Meanwhile, an interpolated Path can contain multiple segments, and expands to path concatenation. Interpolated Paths are not string-concatenated with adjacent content, and must be separated from surrounding segments.

julia> p"start/$sub"
p"start/multi/segment"

julia> p"start$sub"
ERROR: ArgumentError: Cannot concatenate path with a string prefix
Stacktrace:

All these errors are raised during macro-expansion, and are not runtime errors BTW.

:100: You saw me wrestling with this on Slack, and I think ultimately we need to make a clear separation between purely conceptual path handling and messy live-filesystem details, and clearly communicate an easy to understand split.

3 Likes

I wasn’t trying to say that your proposal wasn’t hitting those points I listed, more that I was trying to come up with a mostly type-based system to handle these :slight_smile: One can of course validate each segment as a string and add the same logic that would be possible with types.

This is one example that I wouldn’t want to allow, I think. I can’t come up with a scenario in which I’d want to interpolate a path B into another path A only to end up completely disregarding path A:

julia> p"""hi/$(p"/hey")"""
p"/hey"

Some other things I found:

julia> x = "."; p"hi/.$x"
ERROR: InexactError: trunc(UInt16, -2)
Stacktrace:
 [1] throw_inexacterror(::Symbol, ::Vararg{Any})
   @ Core ./boot.jl:750
 [2] checked_trunc_uint
   @ ./boot.jl:772 [inlined]
 [3] toUInt16
   @ ./boot.jl:845 [inlined]
 [4] UInt16
   @ ./boot.jl:895 [inlined]
 [5] convert
   @ ./number.jl:7 [inlined]
 [6] GenericPlainPath
   @ ~/dev/julia/julia-basic-paths/path.jl:4 [inlined]
 [7] *(a::Main.Paths.GenericPlainPath{PosixPath}, b::Main.Paths.GenericPlainPath{PosixPath})

julia> x = "."; p"hi/$x./"
p"hi/./."

julia> x = p".."; p"/x/$x"
ERROR: InexactError: trunc(UInt16, -2)
Stacktrace:
 [1] throw_inexacterror(::Symbol, ::Vararg{Any})
   @ Core ./boot.jl:750
 [2] checked_trunc_uint
   @ ./boot.jl:772 [inlined]
 [3] toUInt16
   @ ./boot.jl:845 [inlined]
 [4] UInt16
   @ ./boot.jl:895 [inlined]
 [5] convert
   @ ./number.jl:7 [inlined]
 [6] GenericPlainPath
   @ ~/dev/julia/julia-basic-paths/path.jl:4 [inlined]
 [7] *(a::Main.Paths.GenericPlainPath{PosixPath}, b::Main.Paths.GenericPlainPath{PosixPath})

julia> p"/.."
p"/"

Ah right! I didn’t read your comments in that way, but that makes sense.

The parse(Path, homedir()) bit of the currently macro-generated code I shared could very well be replaced with Path(UserRoot()) with a structure like you describe.

struct UserRoot end

Path(::UserRoot) = ...

I think these two approaches should be equivalent in terms of user-observable behaviour?

Absolutely. This is a consequence of abspath1 * abspath2 currently producing abspath2. I wasn’t sure if raising an error here would be a bit much, but with your comments here and Inkydragon’s on the HackMD it seems like this could well be reasonable.

After this new commit your example now produces:

julia> p"""hi/$(p"/hey")"""
ERROR: AbsolutePathError: Cannot join one path (hi) with an absolute path (/hey)
Stacktrace: [...]

Thanks for raising this, raising an error seems much more sane here :slight_smile: (though ideally I would like it if it could be known at compile-time whether an error may be raised during path concatenation).

Ooh yep, looks like my “likely lightly buggy” comment was worth making :sweat_smile:. Thanks for these, I’ll take a look when I can find a bit more time.

Thanks for all your work on this.

Just curious, looking at this part of the code, why use * instead of /? I feel like the / operator is a pretty universal indicator for subdirectories. Linux, macOS, URLs, Git (across platforms), Cloud storage usually uses /. Even Windows, the one outlier, has accepted / for paths since MS-DOS (and PowerShell uses this as a default actually). Even ignoring specific operators, string concatenation is usually never the same operator as a “join path”, across most languages, so I’m not sure I understand the intuition behind using * for joinpath in Julia.

6 Likes

One other reason I was thinking about more complex “root” objects was the relocatability issues you also mention in your hackmd. For example, something like RelocatableFolders.jl provides.

I guess both of these are what we usually call “punning” an operator? Joining paths is neither concatenation nor division so I guess we’re free to choose. I kind of like / better just because it looks more path-y

2 Likes

Yeah out of all the infix operators, my general intuition points to / being the only one that makes sense.

Aside: even for string concatenation, * feels pretty cursed; I think Julia is the only top-100 language using it?

Looks like it, from some research:

With some searches and LLM formatting, here are other top languages that have infix string concatenation operators:

  • Python (+)
  • JavaScript (+)
  • Java (+)
  • C# (+)
  • PHP (.)
  • TypeScript (+)
  • Ruby (+)
  • Swift (+)
  • Kotlin (+)
  • Julia (*)
  • Lua (..)
  • Perl (.)
  • Visual Basic (&)
  • Haskell (++)
  • OCaml (^)
  • F# (^)
  • Erlang (++)
  • Groovy (+)
  • Ada (&)
  • ABAP (&&)
  • ActionScript (+)
  • Crystal (+)
  • Elixir (<>)
  • Elm (++)
  • Pascal (+)
  • PowerShell (+)
  • Delphi (+)
  • D (~)
  • VBScript (&)
  • Nim (&)
  • GDScript (+)
  • Hack (.)
  • Haxe (+)
  • LiveScript (+)
  • VHDL (&)
  • SQL (||)
  • C++ (+)
  • Rust (+)
  • Scala (+)
  • Fortran (//)
  • Smalltalk (,)

Julia looks to be the only one using *. + seems to be the most common.

1 Like

I strongly advocate for letting the operator choice for String concatenation rest in peace: there is an approachable justification in the manual and it’s clearly in line with many other implementation decisions in Julia, valuing consistency very highly.
(And due to SemVer * is going to stay either way for the time being)

As for the proposed Path type I feel like there are two directions with merit in their own right:

  • keeping * for familiarity (for Julia users at least)
  • punning / due to the resemblance to how paths are commonly written.

Also, since introducing a dedicated type separates the underlying concepts, deciding on another operator imo doesn’t seem unreasonable.

Why not just use the Path constructor for joining paths?

5 Likes

It’s been a hot minute since I last read them, but among some of the linked prior discussions exactly this comes up, and / is objected to more strongly than *.

All this aside, concatenation could be entirely done through interpolation: p"$a/$b".

Paths are not strings; I see no need for the concatenation operator to match.

pathlib in Python uses / and it looks quite nice because it matches how paths are written in the OS

16 Likes

Thanks for putting together the HackMD document! Just reading through the history was informative.

This thread from your history section seems to have a lot of discussion on / versus other options and I didn’t get the sense that / was objected to more than * in this thread (a counterpoint is this comment by Jeff in FilePathsBase.jl).

One thing these conversations are touching on is that users will commonly mix “path types” and “things that can be converted to strings”. To that end, I would like it if something like this would work:

p"/home/$user_tstr" / dir_tpath / p"$(string(name_tcustom))" * ".csv"

This example uses / for a separator, allows for interpolation of strings (and objects that can be converted to strings as well?) into p"..." string macros to form paths, and allows using * to append for example .csv to a filename to return a Path object.

One additional reason I think not reusing * would be good is if a user accidentally forgets the p in a p"...", with / they’ll get an error but with * they’ll construct a completely incorrect path. Throwing an error here is better imho.
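To make the operator semantics concrete, here is a toy sketch. `ToyPath` is a hypothetical stand-in, not the proposed Path type, and it deliberately defines `/` only between two paths so that a forgotten p"..." fails loudly.

```julia
# Hypothetical toy type; `ToyPath` is not the proposed `Path`.
struct ToyPath
    segments::Vector{String}
end

# `/` joins two paths. There is intentionally no method taking a plain string.
Base.:/(a::ToyPath, b::ToyPath) = ToyPath(vcat(a.segments, b.segments))

# `*` appends a suffix to the final segment, e.g. a file extension.
function Base.:*(a::ToyPath, suffix::AbstractString)
    isempty(a.segments) && throw(ArgumentError("no segment to extend"))
    return ToyPath(vcat(a.segments[1:end-1], [a.segments[end] * suffix]))
end

# `/` and `*` share precedence and associate left, so this is ((a / b) / c) * ".csv".
report = ToyPath(["home", "user"]) / ToyPath(["data"]) / ToyPath(["report"]) * ".csv"

# Forgetting to build paths at all: `*` silently concatenates, `/` has no method.
silent = "data" * "report.csv"
threw = try
    "data" / "report.csv"
    false
catch err
    err isa MethodError
end
```

In this sketch `report.segments` ends in `"report.csv"`, `silent` is the wrong-but-valid string `"datareport.csv"`, and `threw` is `true` because Base defines no `/` for two strings.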

Thanks for working on this! This is going to be awesome to have in Julia!

6 Likes

Are we sure we want this to work?

If a is a path, then depending on whether b is a string or a path, a user would get different paths constructed, which seems like it will make code harder to understand when reading it or reviewing it as part of a PR.

I like p"$a/$b" working regardless of the type of b but I’m not sure that we want p"$a$b".

3 Likes

I just made my own FilePaths2.jl public.
I have no time to pursue it. But there is one idea that is relevant to the question of concatenation function.

I noticed something lacking from FilePaths.jl. If you pass a path object to a function, you’ll very quickly run into a method not found error.

I went through the code in Base for working with paths (as Strings, of course) and added methods for a path object. This allows you to go much further before hitting a method error.

I note in my method for splitpath! that the code is almost identical to that in Base for String because I added methods for the functions called in splitpath!.

For the same reason, a big advantage of using * for path concatenation is that it allows some existing code (that is duck typed) to work immediately with a path type (modulo futzing with separators). And even if it’s not duck typed, the editing job to make it compatible with a Path type is made easier.

My point is that it makes sense to try to maximize compatibility with existing code that assumes paths are strings.

1 Like

Even a random 12 year old has intuition for / indicating subdirectories from browsing the web. Readability is important!

If you find github.com*JuliaLang*julia intuitive it might mean you’ve spent too much time in a Julia REPL :sweat_smile:

+1 for defensive coding!

10 Likes