Programming languages are full of notation that was initially pretty arbitrary; we just got used to it over decades.
It’s nice if operators have some internal consistency and/or resemblance to a familiar notation, but that is just one aspect. Brevity, clarity, etc. are also important.
Since Julia is primarily not a string-processing language, I am not sure we should introduce operators for a lot of string-related API. Sure, * is fine, arguably ^ is useful too, but after that, diminishing returns kick in. Packages can do /, \, ++, //, in their own namespace if necessary for DSLs, while Base and the standard libraries can just expose functions that have no operator equivalent.
I imagine you may well be thinking of the (dormant, but not forgotten/abandoned) thread Designing a Paths Julep, but in case you or any interested readers have forgotten, that’s exactly what I’m proposing there
Another upside of this sort of approach is that based on whether a $string or $path is being interpolated, you can catch and handle unsafe input like:
Eh, IEEE floats may muddy the waters, but + is the Abelian group operator. Just because it’s not perfectly adhered to is no reason to abandon convention.
That said, I do see the confusion around *, and I think there’s a solid argument that instead of picking between punning addition/multiplication it would be sensible to dedicate an operator to the concatenation of containery-types (vectors, strings, etc.) like ++. Until Julia 2.0 though, we’ll just have to live with the status quo, and this niggle is fairly low on my list of breaking changes I wish we could make.
I’m not sure we want that. Containers aren’t all sequences, so concatenation can work really differently. Multidimensional arrays have special array literal syntax backed by vect/hcat/vcat/hvcat, and that doesn’t concatenate input strings (and non-AbstractArrays) because we really want to write strings as separate elements. We often want to write AbstractArrays as separate elements, too, which requires seemingly superfluous brackets. As for sequences, concatenation may not even be useful enough to warrant type-specific implementation: [(1:2);(4:5)] cannot be another inlinable UnitRange, so [(1:2);(3:4)] should not be either for type stability.
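As a small sketch of the array-literal behavior described above (plain Base Julia; the variable names are only illustrative):

```julia
# Strings in array literals stay separate elements; they are never joined.
row = ["ab" "cd"]          # hcat: 1×2 Matrix{String}
col = ["ab"; "cd"]         # vcat: 2-element Vector{String}

# AbstractArrays are concatenated by `;` but kept as elements by `,`.
flat   = [[1, 2]; [3, 4]]  # Vector{Int} with 4 elements
nested = [[1, 2], [3, 4]]  # Vector{Vector{Int}} with 2 elements

# Concatenating ranges yields a Vector even when the result happens to be
# contiguous, so the return type does not depend on the values.
gap        = [(1:2); (4:5)]
contiguous = [(1:2); (3:4)]
```

Both `gap` and `contiguous` come back as `Vector{Int}`, which is the type-stability point: the second one could in principle be `1:4`, but is not returned as a range.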
We also wouldn’t need to invent a new syntax for strings, we already have string interpolation. "$(a)b$c d" lowers right to Base.string(a, "b", c, " d") and is a more direct and flexible alternative to a * "b" * c * " d" dispatching *.
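A quick sketch of that equivalence, assuming `a` and `c` are ordinary strings (the values here are made up for illustration):

```julia
a, c = "x", "y"

interpolated = "$(a)b$c d"              # string interpolation
via_string   = string(a, "b", c, " d")  # the call the literal corresponds to
via_star     = a * "b" * c * " d"       # dispatches on * for strings

# Interpolation accepts non-string values; * does not.
n = 42
with_number = "$(n)b$c d"
star_fails = try
    n * "b"
    false
catch err
    err isa MethodError
end
```

All three spellings give the same string for string inputs; only the interpolated and `string` forms keep working once a number is involved.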
Since this is the Nth discussion of this, I am curious about the following: I understand that some programmers are bothered by using * for string concatenation when they encounter it for the first time.
But, perhaps, the relevant question is: is *(::String, ...) a thorn in anyone’s side who has used Julia for at least a year? Is this continuously nagging users who found it initially surprising, or does everyone get over it after a while?
In general, when you learn a new language, you should try to learn the idiomatic “spelling” of things in that language.
It is possible to imitate a lot of things in Julia, including member functions, and make things look superficially like python or fortran or bash or haskell, lisp, R and whatnot (though not perl, I think). I will e.g. always be offended by 1-based indexing (it is the wrong thing), but I don’t see that as a reason to use OffsetArrays.jl all over the place. One simply gets used to it.
As fun as it was to make arguments, * for strings didn’t bother me on day 1. I noticed it’s unorthodox, but it was just a different name for a familiar operation to me, and Julia is not the only language that does string concatenation with unusual symbols. I’m not unbothered generally, I still don’t like that Julia has multiple method syntaxes that don’t do annotations equally well, and I’m grateful not to deal with exclusive endpoints when I don’t have to.
But I (most of the time) don’t think of strings as an embedding into regular expressions with a semiring structure! Nobody does, except computer science PhDs designing a programming language. I understand the appeal, and wholeheartedly agree with
There is definitely a place for that kind of mental model; just not when designing user-facing APIs.
Yes, exactly! I just don’t agree that either debate is unimportant
Contrary to
I probably would not be using Julia if it had zero-based indexing. Not because I can’t get used to writing out code with zero-based indexing (or use * for string concatenation), but because these design choices are reflective of an underlying mental model. Mental models are what really matters, that’s why we have different programming languages suitable for different tasks. When I’m doing numerics, I want a language that has a mental model of a matrix as a mathematical object with two (usually) one-based indices, not “memory offsets in a contiguous list of pointers in memory” (or however you want to describe what C does). A language using 1-based arrays is indicative of the former mental model, while 0-based is indicative of the latter. Neither necessarily so, but it’s a pretty strong heuristic for how pleasant or painful it’s going to be to write numerical code in that language. Doing linear algebra in Julia is glorious, and it’s because it has good mental models for this use case.
I feel like there are some domains where the same mindset that produces the best language for numerical computing bar none (replacing Fortran) comes up with unnecessarily complicated and unintuitive interfaces for what should be more mundane parts of the language, including string processing.
If that algebraic model was in fact required to understand working with strings in Julia, that would be a problem. Imagine wanting to use Julia to let some high-schoolers do some numerical coding (in lieu of the much less appropriate Python), and having them confronted with concepts of abstract algebra. While I love regexes personally, I know many people whose eyes immediately glaze over at their mention. So that’s really not something you’d want to add to the mental model of “string”. I’ve seen the notion of “you need a PhD to use Julia” float around more than once, and would try to steer away from any design that reinforces that impression (not that I think it’s a justified impression, at the end of the day).
I understand that was a bit of a strawman, since
and we all agree on the word “appropriately” in that sentence. But still, I would point out that it only sometimes works out to have a sophisticated mathematical mental model underpinning an implementation while also allowing users of that implementation to have a much simpler mental model. You run into problems when adhering to sophisticated models conflicts with the more intuitive simple model, and IMO the intuitive model should win out in such cases.
Personally, when I feel the need to include a statement like “While * may seem like a surprising choice” in the documentation (and I’ve been there in my own projects!), I’d take that as a sign that the API missed the mark w.r.t. the principle of least surprise. In the big picture, these are pretty small annoyances for the most part, but there are definitely a few examples in Julia where I’d say prioritizing formal purity over pragmatism violated that principle. Another such indicator (and another pet peeve of mine) is feeling the need to implement the error message “use occursin(needle, haystack) for string containment” for "substring" in "string". I don’t think it’s just a lot of people coming in with Python experience, and this being arbitrary; I actually think Python nailed an intuitive interface there.
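For readers who have not run into it, this is roughly what that behavior looks like in practice (a sketch using only Base functions; the exact error text is whatever the installed Julia version prints):

```julia
# Char-in-String is element membership and works as expected.
has_char = 'r' in "string"

# String-in-String is deliberately an error in Base, pointing to occursin.
message = try
    "str" in "string"
    ""
catch err
    sprint(showerror, err)
end

# The supported spelling for substring containment.
has_substring = occursin("str", "string")
```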
Yes, if you think in terms of a semi-ring, but not if you think of it as simply alternation and concatenation. The | symbol is already used inside of regexes for alternation, so that seems like the most natural symbol to use (if there’s a need for an operator for regex alternation at all, which I doubt, but maybe I’m misunderstanding the context).
This seems like a prime example of where an algebraic mental model goes off the rails: If you’re choosing + and * because of the algebraic structure when these operators would be confusing in a more naive mental model, that’s making a mistake, IMO.
I don’t have a problem with * as such. I’ve used many programming languages that all use various operators for string concatenation (although, personally, I do think that + is the most intuitive one, matching colloquial language). If the explanation for it was “inspired by programming language X”, I wouldn’t think twice about it. What I get hung up on (probably unjustifiably, but also really not that much, “offended” was more than a little hyperbolic), is that it’s indicative of some people getting too caught up in mathematical abstractions for parts of the language that really shouldn’t involve mathematical abstractions.
So while it’s not really a “thorn in my side”, I do feel friction when doing certain things in Julia, string processing among them. I could chalk that up to
and indeed, it makes me reach for other languages when string processing is at the core of my task. But Julia does have some aspirations as a “general purpose language”. While I’ve always considered that a bit tongue-in-cheek (it’s clearly a language for scientific computing, and that’s okay), I don’t think intuitive APIs for string processing, I/O, subprocess management, etc. are at odds with that, and I would advocate for an approach that has alarm bells going off whenever a design is rooted in abstract math; instead, I would generally advocate copying successful approaches from other languages that have proven themselves intuitive and popular, as much as possible.
Python in particular is a great source of inspiration. It’s a terrible language for numerical computing (not having an abstraction for numerical objects, and numpy/jax being very limiting array-programming sub-languages, propped on after the fact), but it gets many other things very right.
Tangentially related, but I was definitely thinking about that thread. I’m not quite sure if those are plans for inclusion in Julia proper, or just a package, but that discussion definitely left me alarmed at over-complicating design decisions. I would consider any implementation not using / to join Path objects to be a strong heuristic for “this doesn’t have the right mental model for me”.
To be clear, I agree that
in that we shouldn’t use / to join two strings as paths. What I would advocate for: just mimic the design of Python’s pathlib. It’s great, intuitive, nobody ever complains about it. It doesn’t have to be complicated. Just Path(string) gives you a path object, and path / path and path / string joins paths, and then you have root, parent, suffix, etc. on top of that. I’m not sure path"string" is strictly necessary, but if it exists, it should just be equivalent to Path(string), and then I feel like it should be perfectly obvious how string interpolation works: path"user_content/$untrusted_file" is nothing more than path_str = "user_content/" * untrusted_file; Path(path_str), and then Path can decide how it wants to normalize that, whether to throw errors, etc. But again, keep it simple. In Python’s pathlib, Path("user_content/" + untrusted_file) is simply PosixPath('user_content/../../../../../../etc/passwd'), which seems perfectly alright. (Just my two cents, probably my mistake to open the can of worms of going into the details for possible path-implementations in Julia)
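To illustrate the pathlib-style design being advocated, here is a minimal sketch. `SimplePath`, `parentdir`, and `extension` are hypothetical names made up for this example; they are not an existing Base or package API, and a real design would need to settle normalization and error handling.

```julia
# Hypothetical pathlib-style wrapper around a plain string.
struct SimplePath
    path::String
end

# Joining with `/` is defined only when the left operand is a SimplePath,
# so plain strings are unaffected ("a" / "b" remains a MethodError).
Base.:/(p::SimplePath, s::AbstractString) = SimplePath(joinpath(p.path, s))
Base.:/(p::SimplePath, q::SimplePath) = p / q.path

# A couple of accessors in the spirit of pathlib's `parent` and `suffix`.
parentdir(p::SimplePath) = SimplePath(dirname(p.path))
extension(p::SimplePath) = splitext(p.path)[2]

docs = SimplePath("project") / "docs" / "index.md"
```

Here `docs.path` equals `joinpath("project", "docs", "index.md")`, and `extension(docs)` is `".md"`.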
When I first came upon * for string concatenation, I found it a little weird but ok. Then I read the explanation and it really makes much more sense. And this is something that is less impactful than ** or ^ or pow (in the case of C/C++) for powers and 0/1 indexing in arrays. And if we are comparing to other languages, why is no one comparing to C? Perhaps using strcat is much worse…
Are we also forgetting how string concatenation is done in R? paste seems to be the worst option. It works like the string function but try to find that out for the first time. And it puts a space between the strings so you should use paste0.
Lisp doesn’t let you add strings for concatenation, there you have the function string-concat.
I don’t think + is such a universal thing for concatenating strings. It was one of the nice things in BASIC when compared to C. Pascal uses + but Matlab uses strcat. And Fortran uses //. String concatenation is string juxtaposition and in math juxtaposition is multiplication.
I think this is exactly the issue and if you’ll forgive my directness, I think you’re not addressing it.
Functions have meanings that can be specified in code. They’re not just symbols with vague vibes-based associations. Base.in doesn’t just mean “some kind of containment”. It’s a specific kind of containment, namely any(==(needle), haystack).
A generic function in where needle in haystack means any(==(needle), haystack) cannot implement the occursin function. That’s a different function.
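Spelled out as code, the distinction looks like this. `contract_in` is a hypothetical stand-in for the contract as stated above, not the actual definition of `Base.in` (which, for instance, also deals with `missing`):

```julia
# The element-wise contract described above, written as a function.
contract_in(needle, haystack) = any(==(needle), haystack)

in_vector  = contract_in(2, [1, 2, 3])     # 2 equals one of the elements
in_chars   = contract_in('e', "hello")     # strings iterate over Chars
in_substr  = contract_in("ell", "hello")   # no single Char equals "ell"
via_occurs = occursin("ell", "hello")      # substring search
```

Under the element-wise reading, `in_substr` is `false` while `via_occurs` is `true`, which is why the two cannot be the same generic function.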
Exactly the same is true for Base.+. It’s not just “some kind of combination”. Base.+ has a + b == b + a. If a desired function doesn’t have that property, it’s not Base.+.
Likewise Base./ is division. joinpath isn’t division! That doesn’t mean joinpath has to be *, it just means joinpath can’t be Base./.
I really wish Base specified these contracts in code rather than in the heads of users, but
the contracts are there.
Of course, some people prefer to not take interfaces seriously and have special-case semantics for whatever function-type combinations rather than saying specializations must semantically subtype their generic contracts. I think that’s how you get a mess — it adds convenience at a cost. That cost is turning normal function calls into special cases, so it’s impossible to understand what any code you’re reading does unless you know both what type is being passed into the function (typically unknowable, even in an IDE, since Julia doesn’t have static types) AND the special-case semantics for that particular function-type combination.
Yeah, I’ve noticed that you and I are somewhat diametrically opposed on this issue, and that @tecosaur and @StefanKarpinski are pretty much in your camp
I disagree with what you describe as the contracts for in, +, and /, and I would say they mean exactly what you say they don’t mean. It’s not exactly that you’re wrong; in some sense, you have math on your side, so it’s hard to argue with that. I think this somewhat comes down to mentality, and I think a language in which in means “some kind of containment”, and + is “some kind of combination” has much less friction. I mean, I’ve never heard anyone complain that substring in string works in Python.
I have limited if any control over the direction of Julia or any part of the ecosystem, so this is pretty much an “agree to disagree situation”, where I hope to nudge people to not go too far in the direction of your mentality. Julia is the perfect language for my number-crunching needs, so, if I get to choose, I’d prefer it to be a joy to use (“pragmatic”) in other related tasks, as opposed to constantly rubbing up against what seem like pedantic overcomplications.
Maybe surprisingly, I do agree with
as a main concern about Julia, and the flipside of the amazing power of multiple dispatch in the absence of some kind of traits system. That’s already the world we live in now, and I’m not sure if it’s even possible to get away from that, given Julia’s design. But for whatever reason, I do not consider + for strings or / for Path objects or in for strings to be examples of this problem cropping up (even though, I guess, technically you are correct). I’m much more worried about nobody being really clear about what the contract for an “array” or “dict” is. Maybe (I haven’t really thought this through all the way) I want to place contracts on types, and less so functions, whereas you want to place contracts on functions.
In any case, that’s definitely my response to
We really need some kind of solution to the “traits” problem in Julia 2.0, whatever form that might take, so that we can have some kind of guarantee like “+ is commutative for my argument types”. I’m skeptical that just writing it into the docstring does much to enforce such a contract. There’s the principle of least surprise again: not that demanding that + is always commutative is totally outrageous, but if people aren’t naturally or intuitively on board with the “contract”, you’ll always end up with broken contracts resulting in correctness bugs throughout the ecosystem.
Just to go a bit further down that rabbit hole: My pet peeve isn’t even so much with in and occursin being separate functions (although, at the point where you implement a method for in that prints an error, you might as well implement the occursin functionality in that method), but that the infix-syntax substring occursin string isn’t available.
is the direction I was most optimistic about, though it seems stalled now.
I’d be happy to have a symbol for that. I looked for one but couldn’t find anything even in the regex-theory literature.
I think you’re touching on something that is important for people-like-me to keep in mind: if I want a generic-first system to prevail, it needs to be convenient enough that people want to use it and don’t go searching for less-generic but easier workarounds.
Maybe let me put it like this: You’re implementing a function f(a, b). In your implementation, you’re making certain assumptions about the properties of a and b. Being perfectly explicit about your assumptions is actually very difficult, but let’s say you’re diligent enough to do it. One of those (perfectly reasonable) assumptions is “a + b is commutative”.
Right now, Julia basically has no way to assert such properties. That’s the “problem” we agree on. The best you can do right now is f(a::SomeAbstractType, b::SomeOtherAbstractType). If you’re doing f(a::Float64, b::Float64), you’re okay with the commutativity, but with f(a::Number, b::Number), probably not: the ordinal numbers @Benny mentioned seem like they would be numbers, but with a non-commutative +. In any case: it’s pretty clear that Julia’s type system is too limited to go very far in making any such kinds of guarantees.
Now, your solution is: demand that + is always commutative. That certainly solves the problem for f. But it seems pretty unreasonable to me to demand that ordinal numbers can’t implement their non-commutative genuine +. More importantly: Julia has no mechanism for enforcing that at a technical level. Someone in the ecosystem is going to write a package with a non-commutative +, no matter what the docstring of + demands. And most functions are far more ambiguous about what the correct “contract” should be. You really want to assert an interface contract within the body of f. But having a global interface contract throughout the entire Julia ecosystem seems even more problematic (impossible) than figuring out a workable traits system that would give you the guarantee for a + b being commutative in the context of f.
I’m facing this problem pretty acutely within my QuantumControl packages, where I want people to be able to use pretty arbitrary data structures (from QuantumOptics, ITensor, whatever) to encode their quantum system, but I make quite a lot of assumptions about various properties of these objects in my code. If these properties were violated, it could very easily lead to silently wrong results (the nightmare of any computational scientist). So my only solution, for the time being, has been to define a set of “Interface” test functions like QuantumPropagators.Interfaces.check_generator that checks all these assumptions. This is called automatically in certain high-level functions like QuantumPropagators.propagate or QuantumControl.optimize unless passed check=false – which is fine in this case, because these high-level functions tend to have runtimes of seconds to minutes to hours, so the overhead of running a test suite is acceptable.
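The general pattern can be sketched in a few lines. Everything below is hypothetical and only meant to illustrate the idea of checking an assumption inside the function that relies on it; `Word`, `check_commutative_add`, and `combine` are invented for this example and are not the QuantumPropagators or QuantumControl API.

```julia
# Hypothetical type whose + is concatenation, hence not commutative.
struct Word
    s::String
end
Base.:+(a::Word, b::Word) = Word(a.s * b.s)
Base.:(==)(a::Word, b::Word) = a.s == b.s

# Hypothetical interface check: verify the assumption on the given values.
function check_commutative_add(a, b)
    a + b == b + a || error("a + b != b + a for $(typeof(a)), $(typeof(b))")
    return true
end

# High-level entry point that validates its assumptions unless told not to.
function combine(a, b; check=true)
    check && check_commutative_add(a, b)
    return a + b
end

ok = combine([1 2; 3 4], [0 1; 1 0])        # matrices: + commutes
rejected = try
    combine(Word("ab"), Word("cd"))         # check fails, error is thrown
    false
catch err
    err isa ErrorException
end
unchecked = combine(Word("ab"), Word("cd"); check=false)
```

Note that such a check only tests the values it is handed; it does not prove the property for the type in general, which is part of why a traits-like mechanism would still be desirable.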
If I could direct you over to that thread, I’d love to hear more. At this stage, I think I can say that thread will eventually become a something, and I’d like that something to be of high enough quality to be worth serious consideration for inclusion in Julia proper (which is somewhat needed for the value of a Path type to be realised, given open, read, etc.).
There is a small pile of design compromises that need to be made. More shared thoughts on what the right priorities, trade-offs, and other design elements are can only help.*
* Well, up to a point, but I don’t think we’re anywhere near there yet. Please don’t just comment on slash / joining though.
I’d describe this as perfectly dangerous, not alright. There’s a pile of CVEs across all sorts of software and libraries from exactly this behavior.
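To show what the concern is, here is a minimal sketch of a lexical containment check built from Base path functions. `stays_inside` is a hypothetical helper, not part of any proposal in this thread, and it deliberately ignores symlinks, so it should not be read as a complete defense.

```julia
# Hypothetical helper: does `relative` stay inside `base` once normalized?
# Lexical check only; symlinks are not resolved.
function stays_inside(base::AbstractString, relative::AbstractString)
    isabspath(relative) && return false   # joinpath would discard `base`
    root = abspath(base)
    target = normpath(joinpath(root, relative))
    # If the route from root to target begins with "..", it leaves root.
    return first(splitpath(relpath(target, root))) != ".."
end

untrusted_file = "../../../../../../etc/passwd"
inside  = stays_inside("user_content", "notes/todo.txt")
escapes = !stays_inside("user_content", untrusted_file)
```

Plain string joining performs no such check, so the untrusted input above resolves to a location outside `user_content`.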
I think you may be reading too much into minor things. Even though some people can vehemently argue that $i$-based indexing is the Only Consistent Solution™, with $i \in \{0, 1\}$, experience shows that there are languages with either choice that people love, for one reason or another.
No practical language I know of has a single, monolithic mental model behind every choice that was made in the core language and the basic functions. Useful languages are just too large and complex for that to happen.
I find the semiring argument neat, but for practical purposes, I don’t think it very likely that I would write generic code a * b where a, b could be numbers, matrices, or strings. It’s just one of those things that is not going to happen.
So my bottom line is that all I care about is that a choice was made, and I don’t have to think about it and can focus on writing code. I could live with *, +, ++, and no dedicated operator for string concatenation. (If push comes to shove and I really needed it, I could define one in a single line.)
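For instance, a one-line definition along these lines would do. This is only a sketch: `++` is an operator the parser accepts but Base gives no methods, and the varargs form is an assumption made here so that chained uses work however the parser groups them.

```julia
# Hypothetical user-defined concatenation operator for strings.
++(a::AbstractString, bs::AbstractString...) = string(a, bs...)

greeting = "Hello, " ++ "world" ++ "!"
```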
However, I see two contradictory impulses in the examples here.
* means any kind of juxtaposition, so it can be generic over matrices, strings, integers, paths, and more. I’m hard pressed to think of a context where I could write a function that is generic over all of these disparate types. If I write a function for multiplying things, it would be nice if I got a MethodError when I passed in a String. If I write a function for concatenating things, I would expect a MethodError if I passed Int. To me, the mathematical similarities between multiplication and concatenation are a well-grounded justification for adding this pun to the language, but it doesn’t fundamentally change the fact that, for the sake of convenience, * was punned for strings. (Maybe originally punned by Regex Theory, not Julia)
in must be precisely “equality to one or more elements”, so checking containment cannot be generic over ::String, ::String and ::T, ::Vector{T}. I can think of lots of cases where I’m writing a generic function that just wants to check if y contains x, but I have to special case String. The meaning of in could be widened to be as generic as * which would allow for punning, but we chose not to.
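The special-casing mentioned in the second point might look like the following sketch. `has_part` is a hypothetical helper name; the last lines show the first point, namely that mixing a string and a number under * still fails loudly.

```julia
# Hypothetical helper that widens containment the way the text describes.
has_part(haystack, needle) = needle in haystack        # element equality
has_part(haystack::AbstractString, needle::AbstractString) =
    occursin(needle, haystack)                         # substring search

in_vector = has_part([1, 2, 3], 2)
in_string = has_part("haystack", "hay")
in_chars  = has_part("haystack", 'y')    # falls back to `in` on Chars

# * is shared by strings and numbers, but mixing them raises a MethodError
# rather than silently doing something.
mixed_fails = try
    "ab" * 2
    false
catch err
    err isa MethodError
end
```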
As I’ve said, I don’t have a particular concern with these design choices. However, I chafe against the appeal to the authority of mathematical justifications because I think it elides that there is human judgement and aesthetics involved in assembling the axioms from which we build the rules.