Designing a Paths Julep

Ooof, that’s low!

I know the default soft limit for me is 1024, but the hard limit is ~half a million.

I think this depends on how much work is “held onto”. I don’t think it’s sensible to suggest that every time you think about a file you get a file descriptor and hold onto it. But while actively performing some unit of work it probably makes sense to hold onto a few handles. I don’t think this should be a problem even on systems like those you mentioned?

Quick one-off accesses could still be done with read(handle(path), String) etc. While we could of course make read(path, String) work too, I do think there’s a level of desirable friction to push people to write code that reuses handles whenever it’s sensible to do so.
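To make the one-off vs. reuse distinction concrete, here is a minimal sketch. `LocalPath`, `LocalHandle`, and this `handle` function are hypothetical stand-ins invented for illustration (they are not an existing API), and the handle is simply backed by an `IOStream` from `open`.

```julia
# Hypothetical types for illustration; none of these names exist in Base.
struct LocalPath
    path::String
end

# A handle wraps an already-opened IO stream for the resource.
struct LocalHandle
    io::IOStream
end

# Resolve a path to a handle once; the caller is responsible for closing it.
handle(p::LocalPath) = LocalHandle(open(p.path, "r"))

# Reading from a handle rewinds first, so the same handle can be read again.
function Base.read(h::LocalHandle, ::Type{String})
    seekstart(h.io)
    return read(h.io, String)
end

Base.close(h::LocalHandle) = close(h.io)

# Convenience method: resolve, read once, and close immediately.
function Base.read(p::LocalPath, ::Type{String})
    h = handle(p)
    try
        return read(h, String)
    finally
        close(h)
    end
end

# Write a scratch file to read back.
tmp = tempname()
write(tmp, "hello")
p = LocalPath(tmp)

one_off = read(p, String)      # opens and closes a descriptor internally

h = handle(p)                  # one descriptor, reused for several reads
first_read = read(h, String)
second_read = read(h, String)
close(h)
rm(tmp)
```

With this shape, `read(handle(path), String)` and `read(path, String)` return the same thing; the difference is only in how many descriptors get opened over the lifetime of a unit of work.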

With work like https://github.com/JuliaLang/julia/pull/45272 (and follow-ups) I wonder if handle could be implemented such that the compiler can perform eager finalisation for calls like this? (i.e. not rely on GC for closing the file descriptor).

I’m not sure if this is a great idea, because fundamentally a resource is a different beast to the path to a resource. We could have something like handle(h::AbstractHandle) -> h though, so that you can accept either and know that by calling handle(h) you end up with a handle. I’m half tempted to consider AbstractResource = Union{AbstractPath, AbstractHandle} but I think at this point we’re overcomplicating things. There’s definitely still room for improvement on the design though…
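As a rough sketch of the `handle(h::AbstractHandle) -> h` idea: the type names below (`SketchPath`, `SketchHandle`, `MemPath`, `MemHandle`) are made up for illustration and only the shape of the two `handle` methods matters.

```julia
# Hypothetical stand-ins; these are not the proposed API.
abstract type SketchPath end
abstract type SketchHandle end

struct MemPath <: SketchPath
    name::String
end

struct MemHandle <: SketchHandle
    name::String
end

# Resolving a path yields a handle...
handle(p::MemPath) = MemHandle(p.name)
# ...and resolving a handle is the identity.
handle(h::SketchHandle) = h

# A function that accepts either and normalises at the boundary.
function resource_name(r::Union{SketchPath, SketchHandle})
    h = handle(r)
    return h.name
end
```

Calling `resource_name(MemPath("data.txt"))` and `resource_name(MemHandle("data.txt"))` both go through `handle` first, so the body of the function only ever deals with a handle.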

Yes, I’ve checked and I’m pretty confident this can be done (for a file descriptor) on all three major OSs.

That sounds reasonable, as long as we don’t expect handles to be a user-level thing. My main concern there was that you’d have to do a lot more input validation :sweat_smile: but if it’s more a matter of what happens after open then yeah, makes sense to me!

So I would expect user-level things like read(any_path) to work, exactly because it’s a one-off: you will presumably never use the handle after that. And the more path types we implement, the weirder it becomes to add complexity if you as a user just want to read an S3 file.

I see the argument for there being some amount of friction to that though. Maybe we could add a preference that users can set during development if they want, which would warn when a path rather than a handle is used in these operations?

Well, I hope that package and user-written functions that “do something” with a path might take a handle instead of (or as well as) a path, but I also want to avoid making it feel difficult/complicated to deal with. This is one of the aspects of API design that I think needs more thought/attention.

I’m currently thinking of handles as something you have before open. While open lets you operate on the contents of a resource, a filesystem handle lets you operate on the resource itself: renaming it, listing the contents (for a directory), deleting it, etc.

I get this, it’s just (in my mind, right now) part of the grey area of the API: what I want to determine is where friction is useful vs. just annoying.

This touches on part of my hope for this work: a well-designed abstraction can decrease complexity. What I want is for nothing to have to be written specifically for an S3 resource. Instead, code that takes a resource and does stuff should be writable completely independently of any package that provides an S3 path type, Zipfile path type, etc., and still compose seamlessly with them.

This might be the best way forwards, on balance. Perhaps using something like depwarn so we can push packages to write code that accepts + reuses handles, while allowing users to “just use paths directly” without impediment?
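A minimal sketch of what the depwarn-style nudge could look like. `NoisyPath`, `NoisyHandle`, `handle`, and `file_size` are hypothetical names for illustration; `Base.depwarn` is the only real function relied on here, and as far as I know its messages are hidden by default and shown when running with `--depwarn=yes` (as package test runs do).

```julia
# Hypothetical names throughout; only Base.depwarn and filesize are real.
struct NoisyPath
    path::String
end

struct NoisyHandle
    path::String
end

handle(p::NoisyPath) = NoisyHandle(p.path)

# Handle-based method: the one packages would be nudged towards.
file_size(h::NoisyHandle) = filesize(h.path)

# Path-based convenience method: works, but emits a deprecation-style notice.
function file_size(p::NoisyPath)
    Base.depwarn(
        "`file_size(path)` resolves a new handle on every call; " *
        "consider `file_size(handle(path))` and reusing the handle.",
        :file_size,
    )
    return file_size(handle(p))
end
```

Interactive users calling `file_size(NoisyPath("some/file"))` would normally see nothing, while package authors running their tests would get the hint to accept and reuse handles.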

Thinking more on this, I’ve come to two conclusions on the matter of separate AbstractPath + AbstractHandle types.

  1. Relying on the right type always being used in the right place is fragile. Users will be annoyed if they can’t provide a path. We can use handle(::AbstractPath or ::AbstractHandle) -> AbstractHandle within a filesystem interface to make it so that packages can write path/handle agnostic functions, but without a common supertype packages will inevitably end up specifying arguments as one or the other. Something like AbstractResource is needed.
  2. The problem with something like AbstractResource as a union is that it doesn’t encode what kind of handle a particular path can be resolved to.

We could have AbstractResolvable{H} (name WIP) with the semantics that handle(::AbstractResolvable{H}) -> H, but implementing that type isn’t easy.

I think F-bounded polymorphism would neatly allow for types H <: AbstractHandle that are themselves an AbstractResolvable{H}, but that’s not something that fits with Julia’s type lattice.

Currently, this is the best idea I have:

```julia-repl
julia> abstract type _AbstractResolvable{H} end # Private/internal

julia> abstract type AbstractHandle{H} <: _AbstractResolvable{AbstractHandle{H}} end

julia> const AbstractResolvable{H} = _AbstractResolvable{AbstractHandle{H}}
AbstractResolvable (alias for _AbstractResolvable{AbstractHandle{H}} where H)

julia> abstract type AbstractPath{H} <: AbstractResolvable{H} end

julia> struct PretendFileDescriptor <: AbstractHandle{PretendFileDescriptor} end

julia> struct PretendFilePath <: AbstractPath{PretendFileDescriptor} end

julia> supertypes(PretendFileDescriptor)
(PretendFileDescriptor, AbstractHandle{PretendFileDescriptor}, AbstractResolvable{PretendFileDescriptor}, Any)

julia> supertypes(PretendFilePath)
(PretendFilePath, AbstractPath{PretendFileDescriptor}, AbstractResolvable{PretendFileDescriptor}, Any)
```

In this way both PretendFilePath and PretendFileDescriptor are subtypes of AbstractResolvable{PretendFileDescriptor}.

Because of Julia’s lack of abstract self types/F-bounded polymorphism, the one bit of awkwardness is the need to write struct Foo <: AbstractHandle{Foo} rather than simply struct Foo <: AbstractHandle.
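To show what the shared supertype buys, here is a sketch of a function written once against AbstractResolvable{H}. The type definitions repeat the REPL session above; the two `handle` methods and the `resolve` function are my own assumptions for illustration and not part of that session.

```julia
# Type hierarchy as defined in the REPL session above.
abstract type _AbstractResolvable{H} end
abstract type AbstractHandle{H} <: _AbstractResolvable{AbstractHandle{H}} end
const AbstractResolvable{H} = _AbstractResolvable{AbstractHandle{H}}
abstract type AbstractPath{H} <: AbstractResolvable{H} end

struct PretendFileDescriptor <: AbstractHandle{PretendFileDescriptor} end
struct PretendFilePath <: AbstractPath{PretendFileDescriptor} end

# Assumed resolution methods (hypothetical, not from the original session).
handle(h::AbstractHandle) = h
handle(::PretendFilePath) = PretendFileDescriptor()

# `H` is recovered from either a path or a handle, so the result of
# `handle` can be type-asserted without knowing which one was passed.
function resolve(r::AbstractResolvable{H}) where {H}
    return handle(r)::H
end
```

Both `resolve(PretendFilePath())` and `resolve(PretendFileDescriptor())` dispatch to the same method, with `H` bound to `PretendFileDescriptor` in each case.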

This would be awesome!! :glowing_star: Thanks for your efforts!

I’ve put my thinking cap on and have a v3 type system/design around paths and filesystems. It’s a little complex (more than I would like), but it’s the first design that fulfills all of the criteria I’ve come up with on this journey.

For fun, I tossed the design doc + implementation at ChatGPT, and then directed it to do a comparative evaluation. It might just be flattering me (take this with a pinch of salt), but I think the table it came up with is at least of some interest, and a bit encouraging :slight_smile:

| Dimension | Your design | Python (pathlib) | Java (NIO.2) | Rust std | Rust cap-std | Go (io/fs) | POSIX | Plan 9 | Racket | .NET | WASI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Path is a structured value | :locked: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :cross_mark: | :cross_mark: | :cross_mark: | :cross_mark: | :locked: | :cross_mark: | :cross_mark: |
| Platform-specific path models (POSIX / Windows) | :locked: | :locked: | :cross_mark: | :wrench: | :cross_mark: | :cross_mark: | :cross_mark: | :cross_mark: | :wrench: | :cross_mark: | :cross_mark: |
| Invalid paths unrepresentable / validated | :locked: | :wrench: | :wrench: | :wrench: | :cross_mark: | :cross_mark: | :cross_mark: | :cross_mark: | :locked: | :locked: | :cross_mark: |
| Pure (FS-free) path manipulation | :locked: | :wrench: | :cross_mark: | :wrench: | :cross_mark: | :cross_mark: | :cross_mark: | :cross_mark: | :locked: | :wrench: | :cross_mark: |
| Generic filesystem polymorphism (VFS) | :locked: | :cross_mark: | :wrench: | :cross_mark: | :locked: | :locked: | :cross_mark: | :wrench: | :cross_mark: | :cross_mark: | :locked: |
| Paths are filesystem-relative (non-ambient) | :locked: | :cross_mark: | :wrench: | :cross_mark: | :locked: | :locked: | :cross_mark: | :locked: | :cross_mark: | :cross_mark: | :locked: |
| Handles are authoritative resources | :locked: | :cross_mark: | :wrench: | :wrench: | :locked: | :wrench: | :locked: | :locked: | :cross_mark: | :locked: | :locked: |
| Explicit resolution boundary | :white_check_mark: | :cross_mark: | :cross_mark: | :cross_mark: | :locked: | :cross_mark: | :wrench: | :wrench: | :cross_mark: | :cross_mark: | :locked: |
| TOCTTOU resistance | :white_check_mark: | :cross_mark: | :cross_mark: | :wrench: | :locked: | :cross_mark: | :wrench: | :locked: | :cross_mark: | :wrench: | :locked: |
| Capability-oriented APIs (opt-in) | :wrench: | :cross_mark: | :wrench: | :wrench: | :locked: | :cross_mark: | :wrench: | :locked: | :cross_mark: | :wrench: | :locked: |

Legend: :locked: guaranteed/ensured, :white_check_mark: good support, :wrench: partial/weak support, :cross_mark: no support
