Designing a Paths Julep

Since @tecosaur explicitly requested further commenting in

I already expressed most of my views during the original discussion, in Designing a Paths Julep - #58 by goerz. But in connection with the point I was trying to make in the “The strangeness (or not) of * as string concatenation” thread, the worry I took away from the discussion here is pretty much summed up with

One aspect of such “purity” is @jar1’s (and others’) stance

It’s not that I don’t understand this at some level, and it’s hard to argue about it completely objectively. But the experience of every other programming language that has Path objects has shown that there is no practical problem with using / for something completely unrelated to division, in the context of paths. These functionalities do not clash, in practice.
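To make that concrete, here is a small sketch using Python's pathlib, one of the existing implementations that overloads `/` for joining. The example paths are made up for illustration:

```python
from pathlib import PurePosixPath

# "/" between a path and a string joins segments.
base = PurePosixPath("/home/user")
joined = base / "projects" / "notes.txt"
print(joined)  # /home/user/projects/notes.txt

# "/" between numbers is still ordinary division; dispatch is on the
# operand types, so the two meanings never meet.
ratio = 10 / 4
print(ratio)  # 2.5
```

The operator is resolved by the types of its operands, which is why the two uses do not interfere.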

But beyond the purity of “division”:

I’d also want to strongly re-emphasize

But enough about the joinpath operator :wink:

The other point @tecosaur was bringing up was

It’s dangerous because there’s untrusted input being used. People have to be aware of untrusted input, and sanitize it before letting it flow through the rest of their program. But that’s not your responsibility. Or rather, you’re in no position to “fix” this in a path library, and I’d be concerned that any well-meaning attempt to fix it is only going to mess up perfectly legitimate use cases. Stick to the solutions that have been found to work in other ecosystems, such as Python’s pathlib.

A Path('user_content/../../../../../../etc/passwd') is absolutely something I might want to do, and yes, it should resolve to /etc/passwd. The way to sanitize Path("user_content/" + untrusted_file) is to resolve it and check that the result is “secure”. Maybe “secure” means it’s in a subfolder of user_content, or maybe that it’s in a subfolder of the current user’s home directory. The point is, you can’t know, so how could you possibly “fix” this? The CVEs are for libraries improperly using untrusted input, not for the Path library. The Path library isn’t the problem here, and it’s not the place to fix anything.
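As a rough sketch of what "resolve it and check" could look like on the caller's side, here is a Python version. The helper name `resolve_within` is hypothetical, and "inside the base directory" is only one possible definition of "secure"; it needs Python 3.9+ for `Path.is_relative_to`:

```python
import tempfile
from pathlib import Path


def resolve_within(base: Path, untrusted: str) -> Path:
    """Hypothetical helper: join, resolve, then check containment.

    The policy (result must stay inside `base`) is the caller's choice,
    not something the path type decides.
    """
    base = base.resolve()
    candidate = (base / untrusted).resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"{untrusted!r} escapes {base}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "user_content"
    root.mkdir()

    # A well-behaved name stays inside the base directory.
    ok = resolve_within(root, "reports/2024.txt")

    # A traversal attempt resolves fine as a path, but fails the policy check.
    try:
        resolve_within(root, "../../../../etc/passwd")
        escaped = True
    except ValueError:
        escaped = False
```

Note that the path type happily builds and resolves both inputs. The rejection happens in application code, where the policy is actually known.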

I would strongly advise against changing the behavior of existing solutions for things like relative_path / absolute_path -> absolute_path. Even the existing string-based joinpath in Julia implements that behavior. Other pathlib implementations adopt this for good reason. Yes, it’s potentially dangerous with unsanitized input, but it’s important for common practical use cases, such as join_path(cwd(), location) not having to distinguish whether location is relative or absolute.
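For reference, this is how that rule plays out in Python, both in pathlib and in the string-based `posixpath.join`. The directory names are invented for the example:

```python
import posixpath
from pathlib import PurePosixPath

cwd = PurePosixPath("/home/user/project")

# The caller does not need to know whether `location` is relative or absolute.
relative_result = cwd / "data/input.csv"
absolute_result = cwd / "/srv/shared/input.csv"

print(relative_result)  # /home/user/project/data/input.csv
print(absolute_result)  # /srv/shared/input.csv

# The string-based join follows the same rule.
string_result = posixpath.join("/home/user/project", "/srv/shared/input.csv")
print(string_result)    # /srv/shared/input.csv
```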

Basically,

As for some of the “ideas” for design goals floating around,

No! Don’t do any of these things! Don’t fix it if it ain’t broken. All we need is a Path object that understands the “segments” that make up a file path, and functions like parent, name, suffix, stem, relative_to, etc. to manipulate them.
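To show how little is needed, here is what that segment-based interface looks like in Python's pathlib, where the accessors happen to carry the same names. The file path is just an example:

```python
from pathlib import PurePosixPath

p = PurePosixPath("/home/user/project/archive.tar.gz")

print(p.parts)   # ('/', 'home', 'user', 'project', 'archive.tar.gz')
print(p.parent)  # /home/user/project
print(p.name)    # archive.tar.gz
print(p.suffix)  # .gz
print(p.stem)    # archive.tar

# relative_to is purely lexical: it strips a leading run of segments.
rel = p.relative_to("/home/user")
print(rel)       # project/archive.tar.gz
```

None of these calls touch the file system; they only operate on the segments.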

Don’t get in the way of constructing “weird” paths, for example by “rejecting invalid path segments” or by applying any kind of normalization prematurely.

Then on top of that, you can normalize paths to deal with “handling of special . and .. segments”. At that point, you still don’t want to access the actual file system: it should be possible to generate normalized Unix paths on a Windows system and vice versa.
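Python's standard library illustrates this separation: `posixpath.normpath` and `ntpath.normpath` are purely lexical and both work on any host OS. A short sketch, with example paths of my own choosing:

```python
import ntpath
import posixpath
from pathlib import PurePosixPath

# Construction alone leaves ".." segments untouched.
raw = PurePosixPath("user_content/../../etc/passwd")
print(raw)  # user_content/../../etc/passwd

# Lexical normalization of a Unix-style path; no file system access.
unix_norm = posixpath.normpath("user_content/../../etc/./passwd")
print(unix_norm)  # ../etc/passwd

# Lexical normalization of a Windows-style path, on whatever OS this runs.
win_norm = ntpath.normpath(r"C:\Users\me\..\shared\.\file.txt")
print(win_norm)  # C:\Users\shared\file.txt
```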

Then lastly, you resolve paths on the actual file system, and that’s where things like symlinks come into play. (I’m not sure whether the name resolve is appropriate at the normalization stage or at the resolve-on-filesystem stage; again: look at the terminology used in existing implementations). At that level, maybe you also want to add some functionality to verify that certain paths are “secure”, but I’d be pretty careful with that.
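On the terminology question, Python at least uses `normpath` for the lexical stage and `resolve` for the file system stage. The sketch below shows why the two stages must stay separate once symlinks are involved. It assumes a POSIX system where the process may create symlinks, and the directory names are invented:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "real" / "sub").mkdir(parents=True)
    # link -> real/sub
    (root / "link").symlink_to(root / "real" / "sub", target_is_directory=True)

    tricky = root / "link" / ".."

    # Lexical normalization just cancels "link" against "..".
    lexical = Path(os.path.normpath(tricky))

    # Resolving on the file system follows the symlink first, then goes up.
    on_disk = tricky.resolve()

    print(lexical == root)           # True
    print(on_disk == root / "real")  # True
```

The two answers differ for the same input, so any "is this path secure" check has to run on the resolved path, not on the lexically normalized one.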
