Designing a Paths Julep

This is a fascinating discussion! Thanks everyone for their input. I have some opinions, but won’t weigh in on most of the controversies because all of the suggestions seem better than the status quo, and other people have more informed opinions than me.

Two things I do want to mention. First, file extensions have come up a couple of times, and I wanted to flag that I remember having issues in Python a decade or so ago when dealing with multiple extensions (e.g. my_file.csv.gz) or, less frequently but very annoyingly, when dots were used in file names (e.g. my.file.name.txt). I don’t know if there are elegant solutions here; it’s just an observation.
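
For reference, here is a minimal sketch of how Base.splitext treats those two names today: it only splits off the final extension. The splitallext helper is hypothetical (it is not in Base or FilePaths.jl), and it shows the ambiguity rather than solving it, since it cannot tell a compound extension from dots that belong to the name.

```julia
# Base.splitext only splits off the last extension.
@assert splitext("my_file.csv.gz") == ("my_file.csv", ".gz")
@assert splitext("my.file.name.txt") == ("my.file.name", ".txt")

# Hypothetical helper (not in Base): peel extensions off one at a time.
function splitallext(name::AbstractString)
    exts = String[]
    stem, ext = splitext(name)
    while !isempty(ext)
        pushfirst!(exts, ext)       # keep extensions in their original order
        stem, ext = splitext(stem)
    end
    return stem, exts
end

splitallext("my_file.csv.gz")    # ("my_file", [".csv", ".gz"])
splitallext("my.file.name.txt")  # ("my", [".file", ".name", ".txt"]), probably not what was meant
```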

But I do have somewhat strong opinions regarding the relationship with strings. I really love the idea behind FilePaths.jl, and have tried to use it many times, but have run into way too many situations where I have to convert back and forth between paths and strings to make modifications, and I always abandon it.

As an example, a situation that comes up routinely in practice is that I need to rename a file from one form of annotation to another. For example:

conversions = Dict(
    "sample1" => "db967",
    "sample2" => "db888",
    # etc.
)

files = [
    "some/dir/sample1_1.fastq",
    "some/dir/sample1_2.fastq",
    "other/dir/sample1_processed.csv",
    "some/dir/sample2_1.fastq",
    "some/dir/sample2_2.fastq",
    "other/dir/sample2_processed.csv",
    # etc.
]

for oldfile in files
    dir = dirname(oldfile)
    filename = basename(oldfile)
    sample = match(r"^sample\d", filename).match
    newname = replace(filename, sample => conversions[sample])
    newfile = joinpath(dir, newname)
    mv(oldfile, newfile)
end

Because I need things like regex matching and string replacements, in order to use FilePaths.jl I was constantly needing to convert to strings and back to paths, so whatever benefit I was getting using explicit path types was swamped by boilerplate.

I think the idea of explicit path types is compelling, but whatever the merits of “a path is not a string”, enforcing that super rigidly is, in practice, annoying.

To be honest, that seems more like a missing feature of Base / FilePaths.jl. It really shouldn’t be an issue to provide a flexible rename function. I suppose in filesystem terms this is just a move, but still, there shouldn’t be any issue with doing that.

Perhaps easier filtering of lists of paths would enable this?

Regarding naming: on Julia nightlies, Base.rename is the low-level primitive used to implement mv, and it is now public. See “Make `rename` public” by nhz2 (Pull Request #55652, JuliaLang/julia on GitHub).

I suppose that’s nice, but AFAICT, it just changes the last line of my loop there from mv to rename.

If there was a function that takes a function as the first argument to determine the rename, and allows one to treat the file name as a String w/r/t regex matching / replacing, and handles the conversion into and out of whatever file path types, that could definitely work.
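
A rough sketch of what such a function could look like, written against plain String paths. The name renamewith is hypothetical (it is not part of Base or FilePaths.jl); a path-typed version would presumably do the same thing while hiding the conversion.

```julia
# Hypothetical helper (not in Base or FilePaths.jl): rename a file by
# applying `f` to its name as a String, leaving the directory untouched.
function renamewith(f, path::AbstractString)
    dir, name = splitdir(path)
    newpath = joinpath(dir, f(name))
    newpath == path || mv(path, newpath)   # skip the move if nothing changed
    return newpath
end

conversions = Dict("sample1" => "db967", "sample2" => "db888")

# Taking the function first allows do-block syntax.
mktempdir() do tmp
    oldfile = joinpath(tmp, "sample1_1.fastq")
    touch(oldfile)
    newfile = renamewith(oldfile) do name
        replace(name, r"^sample\d" => s -> conversions[s])
    end
    @assert basename(newfile) == "db967_1.fastq"
end
```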

I agree that it’s not good if people need to do a bunch of path-string-path roundtripping just to do the transformations they need. Maybe it could be interesting to define replace(path, ...) in a way that it works segment by segment if you pass a string or a regex match. If you include path separators in your string logic then you’re back to manually handling forward slash / backward slash.

Maybe another option could be a multipart replace where you pass a tuple or vector of patterns that match the path segments sequentially, each one taking its own substitution string. Something like replace(p"a/b/c", (r"a", r"b") => (s"x", s"y")).
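
To make the segment-by-segment idea concrete, here is a small sketch over String paths using only Base.splitpath and Base.joinpath. The function name replace_segments is made up for illustration, and this is not how the proposal defines replace on paths.

```julia
# Hypothetical helper (not in Base): apply a replacement to each path
# segment separately, so the pattern never sees a path separator.
function replace_segments(path::AbstractString, pat_repl::Pair)
    segments = splitpath(path)
    newsegments = map(seg -> replace(seg, pat_repl), segments)
    return joinpath(newsegments...)
end

# `^` anchors at the start of each segment, not at the start of the whole path.
replace_segments("some/dir/sample1_1.fastq", r"^sample\d" => "db967")
# "some/dir/db967_1.fastq" on Unix
```

Because the pattern is applied per segment, there is no need to write separator-aware regexes.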

I think my list of “potential helper functions” from earlier may be relevant here:

With a withname function, your example would become:

for oldfile in files
    filename = basename(oldfile)
    newname = replace(filename, r"^sample\d" => s -> conversions[s])
    mv(oldfile, withname(oldfile, newname))
end

Without withname, you’d have to do newfile = p"$(parent(oldfile))/$newname" (note that this doesn’t handle one of the motivating edge cases for withname: an oldfile with no parent).

You’d still have to do String ↔ Path conversion if you want to modify the parent path with regex, but I don’t see any major usability hurdles in general from this.
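
For anyone who wants to try the shape of this today, a String-based stand-in is a one-liner. This is only an approximation of the proposed withname, which would operate on path types rather than strings.

```julia
# Hypothetical String-based stand-in for the proposed `withname`:
# keep the directory, swap the final component.
withname(path::AbstractString, name::AbstractString) = joinpath(dirname(path), name)

withname("some/dir/sample1_1.fastq", "db967_1.fastq")  # "some/dir/db967_1.fastq" on Unix
withname("old.jpg", "new.jpg")                          # "new.jpg" (no parent directory)
```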

Without jumping too far ahead here, I think there’s definitely a nice replace(::Path, ...) method along the lines you’re thinking :slight_smile:

Note that we already have a convenient interface for these operations on current String paths with Accessors.jl:

julia> using Accessors

julia> p = "/home/aplavin/file.txt"
"/home/aplavin/file.txt"

julia> @set basename(p) = "some.jpg"
"/home/aplavin/some.jpg"

julia> @set dirname(p) = "/root"
"/root/file.txt"

julia> @set splitext(p)[2] = ".jpg"
"/home/aplavin/file.jpg"

julia> @modify(uppercase, basename(p))
"/home/aplavin/FILE.TXT"

Basically, this is the general solution for all “I want a similar object but with f(x) = newval” problems. And this interface will (of course) remain possible for any new path implementation.

Seems much cleaner than creating and learning new functions for everything: you don’t really need basename + set_basename if you have set and basename separately :slight_smile:

That’s nice and all, but I don’t think the answer for “how to perform basic path modifications” should be to use a 3rd party package.

FWIW, I am familiar with this approach, namely setf and generalised variables in Lisp-land. If there was something like that in base Julia, I’d be more on board with this approach.

That’s fair, but Base itself just has to provide some interface – generic, but not necessarily the one with the least boilerplate or convenient in all situations. That’s a common situation already, and paths aren’t really special.

Like, you can do joinpath(dirname(p), "some.jpg") with just Base, but @set basename(p) = "some.jpg" requires a package.

This is exactly the problem: without a withname function (and friends), the “obvious” code you write without a 3rd party package is wrong. parent(p"old.jpg") is nothing when there is no parent, rather than the empty string. Currently this example “works” because of what is essentially a quirk in how joinpath interprets empty strings. This is a large part of the motivation for functions like withname, withsuffix, etc.
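
A quick illustration of the quirk with today’s String paths, followed by what changes once the parent can be nothing. The maybe_parent function is a hypothetical stand-in for the proposed parent, not its actual definition.

```julia
# With String paths, a bare file name has an empty dirname...
@assert dirname("old.jpg") == ""
# ...and joinpath happens to treat "" as a neutral element on the left.
@assert joinpath("", "some.jpg") == "some.jpg"
# On the right it is not neutral: a separator gets appended.
joinpath("some", "")   # "some/" on Unix

# Hypothetical stand-in for a `parent` that returns `nothing` when there is none.
maybe_parent(p::AbstractString) = isempty(dirname(p)) ? nothing : dirname(p)

# The "obvious" join now needs an explicit branch, which is what a
# helper like withname would hide.
dir = maybe_parent("old.jpg")
newpath = dir === nothing ? "some.jpg" : joinpath(dir, "some.jpg")
@assert newpath == "some.jpg"
```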

I think it’s worth noting that these approaches aren’t mutually exclusive: there can be a withname (or similar) function in Base, and @set basename(p) = "new" can use withname itself.

Not sure if that’s a good thing, having a neutral element for functions like joinpath is generally useful.

More generally, I skimmed through this thread but am still not sure I understand the scope. Is this julep expected to accommodate stuff like remote paths?

Current Julia filesystem API makes it kinda-possible to define the Base FS interface for remotes, so that users can write functions that work with either local or remote files. But it definitely requires some care.
One example is my package for working with OSF.io, OpenScienceFramework.jl (JuliaAPlavin/OpenScienceFramework.jl on GitHub).

Would be great not to lose this ability at least, and make it more principled and convenient at best.

I’d recommend looking at the HackMD doc rather than this thread for the proposal itself.

I didn’t find any mention of “remote” or its synonyms there though.

Right, because they’re not an explicit part of the current proposal.

Another piece of prior art is the popular Python package fsspec (“Filesystem interfaces for Python”).

Is it because you expect this interface to automatically be feasible for remotes without extra effort, or because you explicitly don’t want to cover those use cases?

I think it’s worth keeping in mind that this is a Path interface not a Filesystem interface. That would be a rather different (and more involved) proposal altogether. Not to say it’s a bad idea (I think in recent years Go added a nice generic FS interface IIRC), it’s just not this idea.

This proposal introduces a structured way of working with local filesystem paths, and attempts to find a sensible and minimal abstraction for other paths (URIs, S3, etc.) in AbstractPath.

To reduce confusion, you may want to clarify that in the proposal and clearly define the scope, especially given that the Filesystem page of the Julia docs describes both.

That’s what I tried to find in the proposal, but didn’t see any mention whatsoever.

Now, it’s kinda-possible to write code that works with paths, and later apply it to remote files or files within archives. Is this intended to be possible within this proposal as well?