At this point it’s probably worth pointing out that FilePaths2.jl was implemented as an AbstractTree.
Is having paths iterate over their segments a good idea? I don’t really see how that would be useful, and it seems to open up bugs the same way scalars being iterable does.
Even in your example, if the user’s homedir has foo in it, I don’t want to replace that. I don’t really see when I’d want to do something per segment like that.
I don’t expect that example to be that practically useful, it’s just to demonstrate that if you want to change things other than the basename as strings, there’s a mechanism to do so.
With iterating paths, I can’t see much of an issue myself. You want a way to get all the segments of a path, and to me collect(::Path) seems at least as good as splitpath(::String). Given that Strings-as-paths are already iterable, and I haven’t seen any issues as a result, it’s hard for me to see the issue, even hypothetically. Perhaps you could elaborate?
If you iterate a string, you get a character, which will be more likely to throw an error, e.g. if you feed it to something typed AbstractString it will throw. If you iterate a path and get a string or something string-like, you might be able to continue to use it as a relative path or similar and not realize the error (because a lot of code will be supporting both strings-as-paths and the new paths).
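To make that concrete, here is a minimal sketch using only Base. The function describe is a hypothetical stand-in for any code annotated with AbstractString; it is not from any package discussed here.

```julia
# Hypothetical stand-in for path-handling code that is annotated ::AbstractString.
describe(p::AbstractString) = "path: " * p

# Iterating a String yields Chars, so the mistake surfaces right away as a MethodError.
char_rejected = try
    describe(first("/home/user"))   # first(...) is the Char '/'
    false
catch err
    err isa MethodError
end

# A path segment is itself string-like, so the same call goes through silently
# and the segment can be mistaken for a relative path.
segment = last(splitpath("/home/user"))
accepted = describe(segment)
```

The first call fails loudly, while the second one succeeds even though only a fragment of the path was passed along.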
Ah yep, thanks for spelling that out. I can’t quite say I’m sold, since it does still appear to me that “the segments of the path” is the clear sensible answer to “what is a path composed of?”, but it’s something to mull on as the proposal is tweaked.
I think Julia’s path object should be as unoriginal as possible and use all of the most standard conventions found in the most popular libraries.
That being said, it seems like C++, Java, and Rust all iterate their paths via segments, so copying that would be fine for Julia imo. Python’s pathlib does not, but it just errors, so at least it’s not really a competing convention.
Update here: I’ve had a first pass at implementing this, and I’m really encouraged by it.
With these changes, you can now:
- count all entries under a path with mapreducepath(_ -> 1, +, path)
- count directories with mapreducepath(_ -> 1, _ -> 0, +, path)
- find the largest file with mapreducepath(filesize, max, path)
- collect all paths with mapreducepath(identity, identity, vcat, collect, path)
- check if any non-empty files exist under a path with mapreducepath(_ -> false, !iszero ∘ filesize, |, any, path)
I still need to add a writeup to the Julep, but I’m excited by this addition. It’s like a really nice walkdir (and also ~10% faster in the current implementation).
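For comparison, here is roughly what a few of those queries look like with plain walkdir from Base. This is only a baseline sketch on a throwaway directory tree (the file names are made up); it is not how mapreducepath is implemented.

```julia
# Build a small throwaway tree so the sketch is self-contained.
root = mktempdir()
mkpath(joinpath(root, "sub"))
write(joinpath(root, "a.txt"), "hello")         # 5 bytes
write(joinpath(root, "sub", "b.txt"), "hi")     # 2 bytes
write(joinpath(root, "sub", "empty.txt"), "")   # 0 bytes

# Collect every file path under root.
files = [joinpath(dir, f) for (dir, ds, fs) in walkdir(root) for f in fs]

# Count directories below root.
n_dirs = sum(length(ds) for (dir, ds, fs) in walkdir(root))

# Size of the largest file.
largest = maximum(filesize, files)

# Check whether any non-empty file exists.
any_nonempty = any(f -> filesize(f) > 0, files)
```

Each query needs its own pass or an intermediate vector here, which is the boilerplate a single map-reduce style traversal would presumably fold away.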
Just in terms of process: could you turn this into a regular package, people can try and play with it, and if after some real-world testing and use everyone likes it, then it could move to base?
I don’t think AbstractPath should be iterable because
- it will make scalar-extension broadcasting very confusing: with_ext.(path, extensions) should be [with_ext(path, e) for e in extensions], not [with_ext(frag, ext) for (frag, ext) in zip(fragments(path), extensions)]. You can get around this by special-casing AbstractPath for broadcasting like AbstractString does, but this already causes problems and imo is very much not worth it.
- having a function components(path) for that purpose is simple, easy, and clear.
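A small sketch of the first point, under loud assumptions: IterPath is a hypothetical toy type (not the Julep's path type) and with_ext here is a hypothetical helper acting on a single segment. The only Base behaviour relied on is that broadcasting collects an iterable that defines no broadcast rules of its own.

```julia
# Hypothetical toy path type that iterates over its segments.
struct IterPath
    segments::Vector{String}
end
Base.iterate(p::IterPath, state...) = iterate(p.segments, state...)
Base.length(p::IterPath) = length(p.segments)
Base.eltype(::Type{IterPath}) = String

# Hypothetical helper: swap the extension of one name.
with_ext(name::AbstractString, ext::AbstractString) = first(splitext(name)) * "." * ext

path = IterPath(["docs", "report.txt"])
exts = ["md", "pdf"]

# With no broadcast special-casing, the path is collected into its segments,
# so each segment is paired with one extension (the zip-like reading).
zipped = with_ext.(path, exts)
```

The result pairs "docs" with "md" and "report.txt" with "pdf", which is the surprising reading rather than "the whole path with each extension".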
Yeah, the broadcasting argument convinces me. And the function components or segments or so can still return a lazy iterator of SubStrings or something else that doesn’t need to allocate.
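As a rough illustration of that idea, here is a sketch of a lazy helper. The name segments is hypothetical, and it assumes '/'-separated paths and Julia 1.8 or later (for eachsplit).

```julia
# Hypothetical helper: lazily yield the non-empty '/'-separated pieces of a path string.
segments(path::AbstractString) = eachsplit(path, '/'; keepempty=false)

segs = segments("/home/user/notes.txt")

# Each piece is a SubString view into the original string, not a fresh copy.
first_seg = first(segs)

# Materialise the pieces only when a vector is actually needed.
all_segs = collect(segs)
```

Nothing is split until the iterator is consumed, so callers that only need the first or last few pieces do not pay for the rest.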
Yea, I find the broadcasting argument rather convincing.
One other thought: is it possible for a path to be iterable but broadcast as a scalar?
Yes that’s what I meant by my allusion to AbstractString: Strings are iterable but broadcast as scalar. However imo this behavior makes it harder to learn, increases complexity of usage, and creates opportunity for error.
For example, broadcasting over a vector of characters versus a string produces quite different results:
julia> ['a', 'b', 'c'] .^ 2 |> join
"aabbcc"
julia> "abc" .^ 2 |> join
"abcabc"
I’d rather use a little helper function like segments than have to worry about complexities every time.
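For reference, the opt-in that gives the String-like behaviour is a one-line Base.broadcastable method. The sketch below uses a hypothetical toy type ScalarPath and a hypothetical with_ext acting on a whole path; neither is taken from the Julep.

```julia
# Hypothetical toy path type: iterable over segments, but a scalar under broadcasting.
struct ScalarPath
    segments::Vector{String}
end
Base.iterate(p::ScalarPath, state...) = iterate(p.segments, state...)
Base.length(p::ScalarPath) = length(p.segments)
Base.eltype(::Type{ScalarPath}) = String
Base.broadcastable(p::ScalarPath) = Ref(p)   # treat the whole path as one element

# Hypothetical helper: swap the extension of the last segment of a whole path.
function with_ext(p::ScalarPath, ext::AbstractString)
    stem = first(splitext(p.segments[end]))
    return ScalarPath([p.segments[1:end-1]; stem * "." * ext])
end

path = ScalarPath(["docs", "report.txt"])

# Broadcasting now extends the path across the extensions...
results = with_ext.(path, ["md", "pdf"])
names = [join(r.segments, "/") for r in results]

# ...while plain iteration still walks the segments.
iterated = collect(path)
```

So both behaviours can coexist, at the cost of iteration and broadcasting disagreeing about what the object is.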
To throw my 2 cents in, I think that iterating the components of a path is a rather niche operation. You rarely need to do operations on sequential levels of a path; instead, it is much more common to act at the same level, e.g. rename all files in a folder, rename all folders with a new pattern, swap the root of the file system, etc.
So, at least to me, a “horizontal” iteration is much more intuitive than a “vertical” one. And by “horizontal” I mean something like a glob pattern broadcasting:
for file in p"/path/to/some/*.txt"
    #= ... do stuff =#
end
Not really proposing this kind of iteration, it is just as an example.
But that could also be
for file in eachfile(p"/path/to/some/*.txt") 
    #= ... do stuff =#
end
or something like that. Etc. etc.
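Purely as an illustration of that kind of “horizontal” helper, here is a sketch built on readdir. The name eachfile comes from the snippet above, but this signature (a directory plus a suffix, instead of a glob pattern) is an assumption made to keep the example within Base.

```julia
# Hypothetical helper: lazily yield the files directly inside `dir` whose names end in `suffix`.
eachfile(dir::AbstractString, suffix::AbstractString) =
    (joinpath(dir, name) for name in readdir(dir)
     if endswith(name, suffix) && isfile(joinpath(dir, name)))

# Throwaway directory with made-up file names.
dir = mktempdir()
write(joinpath(dir, "a.txt"), "a")
write(joinpath(dir, "b.txt"), "b")
write(joinpath(dir, "c.md"), "c")

found = [basename(f) for f in eachfile(dir, ".txt")]
```

Because readdir returns sorted names by default, the matches come back in a stable order.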
yes absolutely, I’ve edited the post specifying I am not actually pushing that syntax.
A specialized function would absolutely be the best approach there.
I was just pointing out why I think that iterating a path’s components as an implicit default is not a great idea.
I’ve been developing this as a package just so I could use Revise.jl, and I’m happy to make it easier for others to load it as a package too. The commit “Make loadable as a package” (2a15632626, tec/julia-basic-paths on Code by TEC) might do the trick.
Do we want string operations because they are familiar to people? Or in order to support calls to code that relies on duck typing? (People tend to annotate function parameters that take strings as String. But requesting that the annotation be removed is not too much of an imposition.)
I wonder how adoption will play out. Most people will choose the most compatible option, the one that will generate the fewest issues and complaints. Even if the ecosystem is 90% compatible with Paths, one incompatibility in my dependency tree will cause headaches. Maybe there’s some way to deal with the problem. But I’ve got a ton of issues to deal with. So I always send strings when I make a call that wants a path. Only a small fraction of users use Paths, but I want to see Paths succeed. So I accept Paths, if it’s a high enough priority on my list of things to do.
The story is the same at 89% adoption, except a little worse. You want people to add Path support that no one will use until some indeterminate future date.
A counterexample is if Paths is powerful and convenient, and I do a lot of work with paths in a package. I accept Strings and Paths, work internally with Paths, and send Strings.
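A sketch of what that boundary pattern could look like. DemoPath, aspath, and backup_name are all hypothetical names invented for illustration; none of them is the proposed Path API.

```julia
# Hypothetical minimal path wrapper, only for illustration.
struct DemoPath
    raw::String
end
Base.string(p::DemoPath) = p.raw

# Normalise either kind of input to the path type at the package boundary.
aspath(p::DemoPath) = p
aspath(s::AbstractString) = DemoPath(String(s))

# Accept Strings and paths, work with the path type internally, send a String back out.
function backup_name(p::Union{AbstractString, DemoPath})
    path = aspath(p)
    stem, ext = splitext(path.raw)
    return string(DemoPath(stem * ".bak" * ext))
end

from_string = backup_name("data/input.csv")
from_path = backup_name(DemoPath("data/input.csv"))
```

Callers that only know about strings never see the path type, so the package can adopt it internally without forcing anything on its dependents.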
I’ve (mentally) put this on the backburner for the past few weeks, but this isn’t forgotten about.
- I’m still trying to be bullish about path normalisation, but also practical: I think it’s worth doing to the maximal reasonable extent, but working out exactly what that extent is requires more thinking.
- I’ve realised that there are even more headaches relating to symlinks. They truly are the gift that keeps on giving.
- @Sukera kindly made me aware of an IBM research paper on the Unix path syscalls encouraging TOCTTOU bugs/vulnerabilities. It seems like as of POSIX 2008 we can (on non-Windows at least) implement a file-descriptor-first filesystem API that encourages non-TOCTTOU code. Perhaps the introduction of a dedicated Path is a good opportunity to smuggle this in too?
I’ve just pushed “Experiment with making Path FD-oriented”, and I’ll be mulling over this and updating the design doc with some motivating thoughts and extra considerations over the next week or two.
I have yet to investigate how cross-platform this can be.
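To illustrate the difference in shape (not the experimental API itself), here is a sketch contrasting check-then-use on a path string with asking questions of an already-open handle. It uses only Base functions, and the temporary file and its contents are made up.

```julia
# Throwaway file so the sketch is self-contained.
path, io = mktemp()
write(io, "payload")
close(io)

# Check-then-use: the file at `path` could be swapped between isfile and read.
racy = isfile(path) ? read(path, String) : nothing

# Handle-first: open once, then query and read through the same open handle,
# so both operations refer to the same underlying file.
result = open(path) do handle
    (filesize(handle), read(handle, String))
end
size_seen, contents = result
```

In the second form there is no window in which the name can be re-pointed at a different file between the check and the use.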
I’ve started writing about this here: Charting a Path for Julia: Recognising the difference between locations and resources.
It’s incomplete, and among other things the naming has to be worked out (do we really want to call a handle to a resource a Path? I’m not sure but FilesystemEntityHandle is way too awkward and I haven’t had any naming brainwaves).
At this stage, I feel good about how useful having such a construct is, as it systematically makes it much harder for subtle bugs/vulnerabilities to occur.