Recently there was an LWN news item pointing to an article on the Go blog about Traversal-resistant file APIs.
That’s a nice API. It’d also be nicer in Julia since we could use a do block instead of defer.
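To make the do-block point concrete, here is a minimal sketch. `Root`, `openroot` and `closeroot` are hypothetical stand-ins I made up for illustration, not part of any existing package or of the Go API.

```julia
# Hypothetical stand-in for a traversal-resistant root handle (not a real API).
mutable struct Root
    dir::String
    isopen::Bool
end

openroot(dir::AbstractString) = Root(String(dir), true)
closeroot(r::Root) = (r.isopen = false; nothing)

# do-block form: the cleanup in `finally` runs even if `f` throws,
# which is the role `defer root.Close()` plays in Go.
function openroot(f::Function, dir::AbstractString)
    r = openroot(dir)
    try
        return f(r)
    finally
        closeroot(r)
    end
end

seen = Ref{Any}(nothing)
result = openroot(".") do root
    seen[] = root      # keep a reference so we can inspect it afterwards
    root.isopen        # the handle is open inside the block
end
```

After the block returns, `seen[]` still refers to the handle, but it has been closed.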
My two cents here: if the Paths API has some kind of abstract interface, then maybe it can be used consistently across packages like Tar.jl, ZipArchives.jl, ZipFile.jl (maybe with some constraints)?
Thanks for the link! Incidentally this touches on one of the tensions in the current design: my desire to break with select conventions in the name of consistency/safety.
For instance, currently p"/some/absolute/path" * p"../../../../../../../../etc/passwd" will raise an error once we try to get the parent directory of p"/", however the “standard” behavior is for the parent of / to be /.
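For comparison, this is what the existing string-based functions in Base do with the same input. The sketch assumes POSIX-style paths (on Windows the separators differ).

```julia
# Base's string-based path functions follow the "standard" convention.
joined = joinpath("/some/absolute/path", "../../../../../../../../etc/passwd")

# Lexical normalization: excess ".." segments at the root are silently dropped.
normalized = normpath(joined)

# The parent of the root is the root itself.
root_parent = dirname("/")
```

Here `normalized` comes out as "/etc/passwd" and `root_parent` as "/", with no error raised anywhere.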
I particularly enjoyed reading this section:
If the attacker controls part of the local filesystem, they may be able to use symbolic links to cause a program to access the wrong file:
// Attacker links /home/user/.config to /home/otheruser/.config:
err := os.WriteFile("/home/user/.config/foo", config, 0o666)
If the program defends against symlink traversal by first verifying that the intended file does not contain any symlinks, it may still be vulnerable to time-of-check/time-of-use (TOCTOU) races, where the attacker creates a symlink after the program’s check:
Since this is exactly the issue that is systematically eliminated by requiring the use of a FD-as-Path type, as I am now experimenting with (and based on encouraging results, advocating for).
I don’t think it would be too hard to extend the proposal slightly and have rooted-ness be an extra flag that the FD-as-Path type can have, which is then automatically propagated to all path joining operations.
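A rough sketch of what "rooted-ness as a propagated flag" could look like. `RootedPath` and `joinrooted` are hypothetical names for illustration only; this is purely lexical and ignores symlinks, whereas a real handle-based version would lean on the OS for that.

```julia
# Hypothetical sketch, not the proposal's actual types.
struct RootedPath
    root::Union{Nothing,String}   # `nothing` means "not rooted"
    segments::Vector{String}      # segments relative to `root`
end

function joinrooted(p::RootedPath, segs::AbstractString...)
    out = copy(p.segments)
    for s in segs
        if s == ".." && !isempty(out) && last(out) != ".."
            pop!(out)                               # step back within the path
        elseif s == ".." && p.root !== nothing
            throw(ArgumentError("path would escape root $(p.root)"))
        else
            push!(out, String(s))
        end
    end
    return RootedPath(p.root, out)   # the root flag is carried over unchanged
end

sandbox = RootedPath("/srv/user_content", String[])
inside = joinrooted(sandbox, "a", "b", "..", "c")
escaped = try
    joinrooted(inside, "..", "..", "..")
    false
catch err
    err isa ArgumentError
end
```

Every result of a join keeps the root of its left operand, so the escape check applies no matter how many joins happened in between.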
From reading that article, this proposal will give us a better story with this class of vulnerabilities than Go.
RAII would be nice here, but in the current experiment I’m just using a finaliser. I’m not sure that do is really what we want, since it’s good if the same handle is passed around more instead of being dropped and re-acquired.
Sure, why not
How does julia-basic-paths/abstractpaths.jl at main - tec/julia-basic-paths - Code by TEC look to you?
It’s slightly imprecise, but I’m thinking that maybe PathHandle is the name to go with here. I think having “handle” in the name is important, and while FSHandle etc. are most accurate, by putting Path in the name we get an explicit connection to the Path type and an indication of the relationship between the two. Appearing as a completion with Path<tab> is a nice bonus. I think together this more than makes up for the loss in accuracy.
So, with this, there are three kinds of system-native paths:
- Path as a “pure” path for the current system (an alias of PosixPath or WindowsPath)
- PathHandle for a file descriptor/filesystem handle produced from a Path
- DirEntry is a parent directory PathHandle combined with a relative Path and (optional) metadata.
All of these types can be converted between (e.g. PathHandle to a Path, DirEntry to a PathHandle, etc.) using the platform filesystem APIs. We can do this automatically where convenient, and require a particular type when we want to prod the user into behaving more safely (e.g. requiring them to acquire a handle, to encourage them to re-use it).
Since @tecosaur explicitly requested further commenting in
I already expressed most of my views during the original discussion, in Designing a Paths Julep - #58 by goerz, but in light of the point I was trying to make in the “The strangeness (or not) of * as string concatenation” thread, the worry I took away from the discussion here is pretty much summed up with
One aspect of such “purity” is @jar1’s (and others’) stance
It’s not that I don’t understand this at some level, and it’s hard to argue about it completely objectively. But the experience of every other programming language that has Path objects has shown that there is no practical problem with using / for something completely unrelated to division, in the context of paths. These functionalities do not clash, in practice.
But beyond the purity of “division”:
I’d also want to strongly re-emphasize
But enough about the joinpath operator
The other point @tecosaur was bringing up was
It’s dangerous because there’s untrusted input being used. People have to be aware of untrusted input, and sanitize it before letting it flow through the rest of their program. But that’s not your responsibility. Or rather, you’re in no position to “fix” this in a path library, and I’d be concerned that any well-meaning attempt to fix it is only going to mess up perfectly legitimate use cases. Stick to the solutions that have been found to work in other ecosystems, such as Python’s pathlib.
A Path('user_content/../../../../../../etc/passwd') is absolutely something I might want to do, and yes, it should resolve to /etc/passwd. The way to sanitize Path("user_content/" + untrusted_file) is to resolve it and check that the result is “secure”. Maybe “secure” means it’s in a subfolder of user_content, or maybe that it’s in a subfolder of the current user’s home directory. The point is, you can’t know, so how could you possibly “fix” this? The CVEs are for libraries improperly using untrusted input, not for the Path library. The Path library isn’t the problem here, and it’s not the place to fix anything.
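As an illustration of "resolve it and check", here is a sketch with Base's string functions. `is_within` is a hypothetical helper, and "must stay inside the base directory" is just one possible policy. It is purely lexical (via `abspath`); following symlinks would need `realpath`, which requires the path to exist.

```julia
# Hypothetical helper: resolve `untrusted` against `base`, then compare segments.
function is_within(base::AbstractString, untrusted::AbstractString)
    root = splitpath(abspath(base))
    candidate = splitpath(abspath(joinpath(base, untrusted)))
    return length(candidate) >= length(root) &&
           candidate[1:length(root)] == root
end

ok        = is_within("/srv/user_content", "alice/avatar.png")
traversal = is_within("/srv/user_content", "../../../../../../etc/passwd")
absolute  = is_within("/srv/user_content", "/etc/passwd")
```

The caller decides what to do with the answer; the path functions themselves never refuse to build the path.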
I would strongly advise against changing the behavior of existing solutions for things like relative_path / absolute_path -> absolute_path. Even the existing string-based joinpath in Julia implements that behavior. Other pathlib implementations adopt this for good reason. Yes, it’s potentially dangerous with unsanitized input, but it’s important for common practical use cases, such as join_path(cwd(), location) not having to distinguish whether location is relative or absolute.
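Concretely, with today's string-based joinpath (POSIX-style paths assumed; `resolve_location` is just a name I picked for the example):

```julia
# The caller does not need to know whether `location` is relative or absolute.
resolve_location(location::AbstractString) = joinpath(pwd(), location)

from_relative = resolve_location("data/input.csv")  # lands under the current directory
from_absolute = resolve_location("/etc/hosts")      # an absolute argument replaces the prefix
```

An absolute second argument discards everything before it, so `from_absolute` is simply "/etc/hosts".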
Basically,
I think I would try to keep this very simple
As for some of the “ideas” for design goals floating around,
- rejecting invalid path segments at creation
- disallowing a root segment within a path’s segments
- disallowing joining a path with an absolute one
- clear handling of special . and .. segments
No! Don’t do any of these things! Don’t fix it if it ain’t broken. All we need is a Path object that understands the “segments” that make up a file path, and functions like parent, name, suffix, stem, relative_to, etc. to manipulate them.
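For reference, Base's string functions already cover most of these segment operations. A sketch assuming POSIX-style paths; the variable names on the left are borrowed from pathlib, the calls on the right are what exists in Julia today.

```julia
p = "/home/user/archive.tar.gz"

segments     = splitpath(p)        # the individual segments, root first
parent_dir   = dirname(p)          # pathlib: parent
name         = basename(p)         # pathlib: name
stem, suffix = splitext(name)      # pathlib: stem and suffix (last extension only)
relative     = relpath(p, "/home") # pathlib: relative_to
```

None of these touch the filesystem; they only manipulate the string.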
Don’t get in the way of constructing “weird” paths, like “rejecting invalid path segments”, or applying any kind of normalization prematurely.
Then on top of that, you can normalize paths to deal with “handling of special . and .. segments”. At that point, you still don’t want to access the actual file system: it should be possible to generate normalized Unix paths on a Windows system and vice versa.
Then lastly, you resolve paths on the actual file system, and that’s where things like symlinks come into play. (I’m not sure whether the name resolve is appropriate at the normalization stage or at the resolve-on-filesystem stage; again: look at the terminology used in existing implementations). At that level, maybe you also want to add some functionality to verify that certain paths are “secure”, but I’d be pretty careful with that.
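The difference between the normalization stage and the resolve-on-filesystem stage shows up as soon as a symlink is involved. A sketch using Base's normpath and realpath, assuming a POSIX system where creating symlinks is permitted; `normalize_vs_resolve` is a made-up name.

```julia
function normalize_vs_resolve()
    mktempdir() do tmp
        dir = realpath(tmp)                       # the temp dir itself may be a symlink
        mkpath(joinpath(dir, "real", "sub"))
        write(joinpath(dir, "real", "target.txt"), "data")
        symlink(joinpath(dir, "real", "sub"), joinpath(dir, "link"))

        p = joinpath(dir, "link", "..", "target.txt")
        lexical  = normpath(p)   # cancels "link/.." without looking at the disk
        resolved = realpath(p)   # follows the symlink first, then applies ".."
        return (lexical  = relpath(lexical, dir),
                resolved = relpath(resolved, dir),
                lexical_exists = isfile(lexical))
    end
end

result = normalize_vs_resolve()
```

The lexical result points at a file directly under the temp dir, which does not exist, while the resolved one points at real/target.txt.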
Since @tecosaur explicitly requested further commenting in
Thanks Michael. It’s been a while, but I appreciate you following through on this.
What exactly a good Path type looks like is a much trickier question than I first thought, and hearing other people’s hopes/priorities/concerns is hugely helpful.
But enough about the joinpath operator
I’m sorry, but I’m not going to touch the * & / discussion at this time. I have opinions, but I’m concerned that the bikeshedding around that has crowded out some of the more critical capability/behavior discussion. I still plan on leaving infix operators out of the proposal.
I’d be concerned that any well-meaning attempt to fix it is only going to mess up perfectly legitimate use cases.
I am too, but I want to do as much as is reasonable to encourage safety. To this end, concrete examples of how some of the ideas I’m throwing around could backfire would be very helpful.
Note: Sometimes the difference in backgrounds/interpretation makes it hard for me to build, from an abstract description of a problem, the same mental model that I imagine existed in your or other commenters’ heads. It also makes it harder to identify when there’s a misunderstanding/miscommunication. I’m rather convinced that this is happening in the thread because I’ve read things that seem “silly” to me from people who I’m quite sure aren’t having silly thoughts.
Your joinpath(cwd(), location) example is exactly the sort of example that’s helpful in this way, though elaboration on where you tend to see/think of location coming from wouldn’t hurt.
Hmm, building a checklist of “things that should still ‘just work’” could be a good idea…
…anyway, back to your reply:
The point is, you can’t know, so how could you possibly “fix” this?
Magically knowing what is/isn’t “secure” / “good” is indeed beyond what a path library can do. However, what we can do is require you to be explicit about what behavior you expect/permit.
For instance, say you have
pathfragment::String = some_source()
if you expect it is going to be a file name with no directory component, you can do:
p"some/parent/$pathfragment.txt"
and get an error when pathfragment contains directories or invalid characters (like a null byte). The same applies when you want a directory and you do:
p"some/parent/$pathfragment"
This logic applies because you are interpolating a String.
Say you really do want to allow pathfragment to have multiple segments. Interpret the pathfragment as a Path and the single-segment expectation no longer applies:
pathfragment = parse(Path, some_source())
p"some/parent/$pathfragment"
You aren’t prevented from using untrusted input, you just need to be explicit about what you expect pathfragment to be.
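To show the shape of that rule without the p"" macro, here is a rough sketch. `as_segment` and `child` are hypothetical, a plain Vector{String} stands in for an already-parsed Path, and only the separator and null-byte checks are covered (POSIX assumed). How . and .. are treated is left open here.

```julia
# A String must be exactly one segment: no separators, no null bytes.
function as_segment(s::AbstractString)
    for c in s
        c == '/'  && throw(ArgumentError("segment contains a directory separator: $(repr(s))"))
        c == '\0' && throw(ArgumentError("segment contains a null byte: $(repr(s))"))
    end
    return String(s)
end

# Dispatch on the fragment type: a String is validated as a single segment,
# while an already-parsed path (here a Vector{String}) may add several.
child(base::Vector{String}, fragment::AbstractString) = [base; as_segment(fragment)]
child(base::Vector{String}, fragment::Vector{String}) = [base; fragment]

base_segments = ["some", "parent"]
single = child(base_segments, "report.txt")
multi  = child(base_segments, ["2024", "report.txt"])
rejected = try
    child(base_segments, "2024/report.txt")
    false
catch err
    err isa ArgumentError
end
```

The type of the thing being joined carries the expectation, so the caller states their intent by choosing String or Path.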
Don’t get in the way of constructing “weird” paths, like “rejecting invalid path segments”
Examples of why it’s a bad idea to, for example, prevent paths from being constructed that contain null bytes would be helpful.
At that point, you still don’t want to access the actual file system: it should be possible to generate normalized Unix paths on a Windows system and vice versa.
I’m currently thinking we want three levels of path-related objects:
- Strings (parse to a path, and as path segments)
- Paths (as a series of segments, disconnected from the filesystem)
- Filesystem resources (i.e. file handles)
Drawing the right lines between what each kind of object is responsible for takes quite a bit of thought (for me at least). For example, I’m currently rather inclined to think it should only be possible to do filesystem operations (read/write/move/etc.) on filesystem resources, not paths directly — just make it easy to get a resource handle from a path. This is specifically motivated by TOCTTOU-type issues.
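A small demonstration of why the handle/path distinction matters, using plain Base IO rather than any proposed type. It assumes POSIX semantics, where an open file stays readable after its name is moved; `handle_vs_path` is a made-up name.

```julia
function handle_vs_path()
    mktempdir() do dir
        path = joinpath(dir, "config")
        write(path, "original")

        io = open(path, "r")                        # acquire the handle once
        try
            mv(path, joinpath(dir, "config.old"))   # the name now gets reused...
            write(path, "replaced")                 # ...for a different file
            return (from_handle = read(io, String), # still the file we opened
                    from_path   = read(path, String)) # whatever the name points at now
        finally
            close(io)
        end
    end
end

result = handle_vs_path()
```

The handle keeps referring to the file that was checked, while the path string is re-resolved on every use, which is the window a TOCTTOU attack relies on.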