Designing a Paths Julep

At this point it’s probably worth pointing out that FilePaths2.jl was implemented as an AbstractTree.

Is iterating a path by segments a good idea? I don’t really see how that would be useful, and it seems to open up the same kind of bugs that iterable scalars do.

Even in your example, if the user’s homedir has foo in it, I don’t want to replace that. I don’t really see when I’d want to do something per segment like that.

I don’t expect that example to be all that useful in practice; it’s just to demonstrate that if you want to change things other than the basename as strings, there’s a mechanism to do so.

With iterating paths, I can’t see much of an issue myself. You want a way to get all the segments of a path, and to me collect(::Path) seems at least as good as splitpath(::String). Given that Strings-as-paths are already iterable, and I haven’t seen any issues as a result, it’s hard for me to see the problem, even hypothetically. Perhaps you could elaborate?

If you iterate a string, you get a character, which will be more likely to throw an error, e.g. if you feed it to something typed AbstractString it will throw. If you iterate a path and get a string or something string-like, you might be able to continue to use it as a relative path or similar and not realize the error (because a lot of code will be supporting both strings-as-paths and the new paths).
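
A minimal sketch of that failure mode. SegPath and describe are hypothetical names made up for this illustration and are not part of the proposal: iterating a String yields Chars that an AbstractString-typed function rejects, while a path that iterates as string segments passes through without complaint.

```julia
# Hypothetical stand-in for a path type whose iteration yields its segments.
struct SegPath
    segments::Vector{String}
end
Base.iterate(p::SegPath, state...) = iterate(p.segments, state...)
Base.length(p::SegPath) = length(p.segments)
Base.eltype(::Type{SegPath}) = String

# Hypothetical function that accepts strings-as-paths.
describe(p::AbstractString) = "path of length $(length(p))"

# Accidentally iterating a String: each element is a Char, so the call throws.
string_failed = try
    map(describe, collect("/home/user"))
    false
catch err
    err isa MethodError
end

# Accidentally iterating a SegPath: each element is a String, so every call
# succeeds and the mistake goes unnoticed.
path_results = map(describe, collect(SegPath(["home", "user"])))
```

Here the String case fails immediately with a MethodError, whereas the SegPath case returns a result for each segment as if each were a path in its own right.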

Ah yep, thanks for spelling that out. I can’t quite say I’m sold, since it does still appear to me that “the segments of the path” is the clear sensible answer to “what is a path composed of?”, but it’s something to mull on as the proposal is tweaked.

I think Julia’s path object should be as unoriginal as possible and use all of the most standard conventions found in the most popular libraries.

That being said, it seems like C++, Java, and Rust all iterate their paths via segments, so copying that would be fine for Julia imo. Python’s pathlib does not, but it just errors, so at least it’s not really a competing convention.

Update here: I’ve had a first pass at implementing this, and I’m really encouraged by it.

With these changes, you can now:

  • count all entries under a path with mapreducepath(_ -> 1, +, path)
  • count directories with mapreducepath(_ -> 1, _ -> 0, +, path)
  • find the largest file with mapreducepath(filesize, max, path)
  • collect all paths with mapreducepath(identity, identity, vcat, collect, path)
  • check if any non-empty files exist under a path with mapreducepath(_ -> false, !iszero ∘ filesize, |, any, path)
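
For comparison, here is a rough sketch of how the first and third items might be written with today’s walkdir. The helper names count_entries and largest_file_size are made up for this illustration, and whether mapreducepath counts the root itself is an assumption (this sketch does not).

```julia
# Count every file and directory below `root` (the root itself is not counted).
function count_entries(root::AbstractString)
    n = 0
    for (_, dirs, files) in walkdir(root)
        n += length(dirs) + length(files)
    end
    return n
end

# Size in bytes of the largest file below `root` (0 if there are no files).
function largest_file_size(root::AbstractString)
    largest = 0
    for (dir, _, files) in walkdir(root)
        for f in files
            largest = max(largest, filesize(joinpath(dir, f)))
        end
    end
    return largest
end

# Try both helpers on a throwaway directory tree.
demo_counts = mktempdir() do root
    mkdir(joinpath(root, "sub"))
    write(joinpath(root, "a.txt"), "abc")           # 3 bytes
    write(joinpath(root, "sub", "b.txt"), "hello")  # 5 bytes
    (count_entries(root), largest_file_size(root))
end
```

Each walkdir version needs its own loop and accumulator, which is the boilerplate that a single map-and-reduce call would fold away.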

I still need to add a writeup to the Julep, but I’m excited by this addition. It’s like a really nice walkdir (and also ~10% faster in the current implementation).

Just in terms of process: could you turn this into a regular package so people can try it and play with it? If, after some real-world testing and use, everyone likes it, then it could move to Base.

I don’t think AbstractPath should be iterable because

  • it will make scalar-extension broadcasting very confusing: with_ext.(path, extensions) should be [with_ext(path, e) for e in extensions], not [with_ext(frag, ext) for (frag, ext) in zip(fragments(path), extensions)]. You can get around this by special-casing AbstractPath for broadcasting like AbstractString does, but this already causes problems and imo is very much not worth it.
  • having a function components(path) for that purpose is simple, easy, and clear.

Yeah, the broadcasting argument convinces me. And a function such as components or segments can still return a lazy iterator of SubStrings, or something else that doesn’t need to allocate.
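
A minimal sketch of both points, under stated assumptions: DemoPath is a hypothetical string-backed type, with_ext and segments are illustrative implementations rather than the proposal’s, only '/' separators are handled, and eachsplit requires Julia 1.8 or later.

```julia
# Hypothetical string-backed path type; it deliberately defines no iterate method.
struct DemoPath
    str::String
end

# Make the path behave as a scalar under broadcasting.
Base.broadcastable(p::DemoPath) = Ref(p)

# Illustrative helper: swap the extension for `ext`.
with_ext(p::DemoPath, ext::AbstractString) =
    DemoPath(first(splitext(p.str)) * "." * ext)

# Explicit, lazy access to the segments: eachsplit yields SubStrings on demand.
segments(p::DemoPath) = eachsplit(p.str, '/'; keepempty=false)

p = DemoPath("data/run/report.txt")

# Equivalent to [with_ext(p, e) for e in ["csv", "json"]].
variants = with_ext.(p, ["csv", "json"])

# Materialise the segments only when a vector is actually wanted.
segs = collect(segments(p))
```

Because the type is not iterable, there is no ambiguity about what with_ext.(p, exts) means, and anyone who wants the pieces asks for them by name.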

Yea, I find the broadcasting argument rather convincing.

One other thought: is it possible for a path to be iterable but broadcast as a scalar?

Yes, that’s what I meant by my allusion to AbstractString: Strings are iterable but broadcast as scalars. However, imo this behavior makes the type harder to learn, increases complexity of usage, and creates opportunity for error.

For example, broadcasting over a vector of characters versus a string produces quite different results:

julia> ['a', 'b', 'c'] .^ 2 |> join
"aabbcc"

julia> "abc" .^ 2 |> join
"abcabc"


I’d rather use a little helper function like segments than have to worry about complexities every time.

To throw my 2 cents in, I think that iterating the components of a path is a rather niche operation. You rarely need to operate on sequential levels of a path; it is much more common to act at a single level, e.g. rename all files in a folder, rename all folders to a new pattern, swap the root of the file system, etc.

So, at least to me, a “horizontal” iteration is much more intuitive than a “vertical” one. By “horizontal” I mean something like iterating over a glob pattern:

for file in p"/path/to/some/*.txt"
    #= ... do stuff =#
end

Not really proposing this kind of iteration, it is just as an example.

But that could also be

for file in eachfile(p"/path/to/some/*.txt") 
    #= ... do stuff =#
end

or something like that. Etc. etc.
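
A rough sketch of what such a helper could look like with today’s string paths. This eachfile is hypothetical and takes a directory plus a suffix instead of a glob pattern, since Base has no glob matching.

```julia
# Hypothetical helper: yield the files directly inside `dir` whose names end
# with `suffix`. readdir reads the directory eagerly; only the filtering is lazy.
eachfile(dir::AbstractString, suffix::AbstractString) =
    (f for f in readdir(dir; join=true) if isfile(f) && endswith(f, suffix))

# Try it on a throwaway directory.
found = mktempdir() do dir
    write(joinpath(dir, "a.txt"), "")
    write(joinpath(dir, "b.txt"), "")
    write(joinpath(dir, "c.md"), "")
    [basename(f) for f in eachfile(dir, ".txt")]
end
```

The iteration stays at one level of the tree, which is the “horizontal” case described above.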

Yes, absolutely; I’ve edited the post to specify that I am not actually pushing that syntax.
A specialized function would absolutely be the best approach there.

I was just pointing out why I think that iterating a path’s components as an implicit default is not a great idea.

I’ve been developing this as a package just so I could use Revise.jl, and I’m happy to make it easier for others to load it as a package too. The commit “Make loadable as a package” (2a15632626, tec/julia-basic-paths, Code by TEC) might do the trick.

Do we want string operations because they are familiar to people? Or to support duck typing in calls to existing code? (People tend to annotate function parameters that take strings as String. But requesting that the annotation be removed is not too much of an imposition.)

I wonder how adoption will play out. Most people will choose the most compatible option, the one that will generate the fewest issues and complaints. Even if the ecosystem is 90% compatible with Paths, one incompatibility in my dependency tree will cause headaches. Maybe there’s some way to deal with the problem. But I’ve got a ton of issues to deal with. So I always send strings when I make a call that wants a path. Only a small fraction of users use Paths, but I want to see Paths succeed. So I accept Paths, if it’s a high enough priority on my list of things to do.

The story is the same at 89% adoption, except a little worse. You want people to add Path support that no one will use until some indeterminate future date.

A counterexample is if Paths is powerful and convenient, and I do a lot of work with paths in a package. I accept Strings and Paths, work internally with Paths, and send Strings.
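
That pattern might look roughly like the following sketch. SketchPath and backup_name are hypothetical names for this illustration, not the proposal’s API.

```julia
# Hypothetical string-backed path type.
struct SketchPath
    str::String
end
SketchPath(p::SketchPath) = p         # converting a path is a no-op
Base.String(p::SketchPath) = p.str    # explicit conversion back to a String

# Accept Strings and paths, work internally with the path type, send a String.
function backup_name(p::Union{AbstractString,SketchPath})
    path = SketchPath(p)                      # normalise at the boundary
    renamed = SketchPath(path.str * ".bak")   # internal work uses the path type
    return String(renamed)                    # string-only callees get a String
end

from_string = backup_name("a/b.txt")
from_path = backup_name(SketchPath("a/b.txt"))
```

Callers can pass either kind of value, and nothing downstream needs to know that the path type exists.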