But I (most of the time) don’t think of strings as an embedding into regular expressions with a semiring structure! Nobody does, except computer science PhDs designing a programming language. I understand the appeal, and wholeheartedly agree with
There is definitely a place for that kind of mental model; just not when designing user-facing APIs.
Yes, exactly! I just don’t agree that either debate is unimportant.
Contrary to
I probably would not be using Julia if it had zero-based indexing. Not because I can’t get used to writing out code with zero-based indexing (or use `*` for string concatenation), but because these design choices are reflective of an underlying mental model. Mental models are what really matter; that’s why we have different programming languages suitable for different tasks. When I’m doing numerics, I want a language that has a mental model of a matrix as a mathematical object with two (usually) one-based indices, not “memory offsets in a contiguous list of pointers in memory” (or however you want to describe what C does). A language using 1-based arrays is indicative of the former mental model, while 0-based is indicative of the latter. Neither necessarily so, but it’s a pretty strong heuristic for how pleasant or painful it’s going to be to write numerical code in that language. Doing linear algebra in Julia is glorious, and it’s because it has good mental models for this use case.
I feel like there are some domains where the same mindset that produces the best language for numerical computing bar none (replacing Fortran) comes up with unnecessarily complicated and unintuitive interfaces for what should be more mundane parts of the language, including string processing.
If that algebraic model were in fact required to understand working with strings in Julia, that would be a problem. Imagine wanting to use Julia to let some high-schoolers do some numerical coding (in lieu of the much less appropriate Python), and having them confronted with concepts of abstract algebra. While I love regexes personally, I know many people whose eyes immediately glaze over at their mention. So that’s really not something you’d want to add to the mental model of “string”. I’ve seen the notion of “you need a PhD to use Julia” float around more than once, and would try to steer away from any design that reinforces that impression (not that I think it’s a justified impression, at the end of the day).
I understand that was a bit of a strawman, since
and we all agree on the word “appropriately” in that sentence. But still, I would point out that it only sometimes works out to have a sophisticated mathematical mental model underpinning an implementation while also allowing users of that implementation to have a much simpler mental model. You run into problems when adhering to the sophisticated model conflicts with the more intuitive simple model, and IMO the intuitive model should win out in such cases.
Personally, when I feel the need to include a statement like “While `*` may seem like a surprising choice” in the documentation (and I’ve been there in my own projects!), I’d take that as a sign that the API missed the mark w.r.t. the principle of least surprise. In the big picture, these are pretty small annoyances for the most part, but there’s definitely a few examples in Julia where I’d say prioritizing formal purity over pragmatism violated that principle. Another pet peeve of mine: feeling the need to implement an error message `use occursin(needle, haystack) for string containment` for `"substring" in "string"` is another such indicator. I don’t think it’s just a lot of people coming in with Python experience, and this being arbitrary; I actually think Python nailed an intuitive interface there.
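For anyone who hasn't used it, the Python idiom referred to above looks roughly like this. This is just a minimal sketch; the variable names `haystack` and `needle` are borrowed from the Julia error message for illustration, not from any Python API.

```python
haystack = "substring"
needle = "string"

# Containment reads the way you would say it out loud: "needle in haystack".
found = needle in haystack

# A string that does not occur simply yields False, with no error.
missing = "xyz" in haystack

print(found, missing)
```

The operand order is the same as in the spoken sentence, which is arguably why it feels intuitive.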
Yes, if you think in terms of a semiring, but not if you think of it as simply alternation and concatenation. The `|` symbol is already used inside of regexes for alternation, so that seems like the most natural symbol to use (if there’s a need for an operator for regex alternation at all, which I doubt, but maybe I’m misunderstanding the context).
This seems like a prime example of where an algebraic mental model goes off the rails: if you’re choosing `+` and `*` because of the algebraic structure when these operators would be confusing in a more naive mental model, that’s making a mistake, IMO.
I don’t have a problem with `*` as such. I’ve used many programming languages that all use various operators for string concatenation (although, personally, I do think that `+` is the most intuitive one, matching colloquial language). If the explanation for it was “inspired by programming language X”, I wouldn’t think twice about it. What I get hung up on (probably unjustifiably, but also really not that much; “offended” was more than a little hyperbolic) is that it’s indicative of some people getting too caught up in mathematical abstractions for parts of the language that really shouldn’t involve mathematical abstractions.
So while it’s not really a “thorn in my side”, I do feel friction when doing certain things in Julia, string processing among them. I could chalk that up to
and indeed, it makes me reach for other languages when string processing is at the core of my task. But Julia does have some aspirations as a “general purpose language”. While I’ve always considered that a bit tongue-in-cheek (it’s clearly a language for scientific computing, and that’s okay), I don’t think intuitive APIs for string processing, I/O, subprocess management, etc. are at odds with that, and I would advocate for an approach that has alarm bells going off whenever a design is rooted in abstract math; instead, I would generally advocate copying successful approaches from other languages that have proven themselves intuitive and popular, as much as possible.
Python in particular is a great source of inspiration. It’s a terrible language for numerical computing (not having an abstraction for numerical objects, and numpy/jax being very limiting array-programming sub-languages, bolted on after the fact), but it gets many other things very right.
Tangentially related, but I was definitely thinking about that thread. I’m not quite sure if those are plans for inclusion in Julia proper, or just a package, but that discussion definitely left me alarmed at over-complicating design decisions. I would consider any implementation not using `/` to join `Path` objects to be a strong heuristic for “this doesn’t have the right mental model for me”.
To be clear, I agree that
in that we shouldn’t use `/` to join two strings as paths. What I would advocate for: just mimic the design of Python’s `pathlib`. It’s great, intuitive, nobody ever complains about it. It doesn’t have to be complicated. Just `Path(string)` gives you a path object, and `path / path` and `path / string` join paths, and then you have `root`, `parent`, `suffix`, etc. on top of that. I’m not sure `path"string"` is strictly necessary, but if it exists, it should just be equivalent to `Path(string)`, and then I feel like it should be perfectly obvious how string interpolation works: `path"user_content/$untrusted_file"` is nothing more than `path_str = "user_content/" * untrusted_file; Path(path_str)`, and then `Path` can decide how it wants to normalize that, whether to throw errors, etc. But again, keep it simple. In Python’s `pathlib`, `Path("user_content/" + untrusted_file)` is simply `PosixPath('user_content/../../../../../../etc/passwd')`, which seems perfectly alright. (Just my two cents; it was probably my mistake to open the can of worms of going into the details of possible path implementations in Julia.)
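To make the `pathlib` behavior described above concrete, here is a small sketch. It uses `PurePosixPath` instead of `Path` so that it behaves the same on any operating system (an assumption made for portability; on a POSIX system `Path` gives you a `PosixPath` with the same semantics). The value of `untrusted_file` and the example file name are made up for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical untrusted input, chosen to match the example above.
untrusted_file = "../../../../../../etc/passwd"

# Building a path from a plain string: ".." segments are kept, not collapsed.
from_string = PurePosixPath("user_content/" + untrusted_file)

# The "/" operator joins a path with a string (or with another path).
joined = PurePosixPath("user_content") / untrusted_file

print(from_string)
print(from_string == joined)

# A few of the accessors mentioned above, on a made-up absolute path.
p = PurePosixPath("/home/user/data/results.tar.gz")
print(p.root, p.parent, p.suffix)
```

Constructing the path never touches the filesystem and never raises; deciding whether a path escapes some base directory is left to the caller, which is the "keep it simple" design being argued for.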