The strangeness (or not) of * as string concatenation

I think “an operator” has an ambiguity that’s important: a symbol vs a function.

There are some examples of Julia functions having different properties on different types and I am often confused and bothered by it. I want a generic function to have a generic contract, not a contract that changes all its invariants depending on the argument types.

The whole point of namespaces is so that the same name can be used for different purposes without changing the meaning of a shared object. If an author wants to use the name + for functionality that isn’t commutative, or the name map for functionality that doesn’t preserve keys(_), I much much prefer that they use their own MyModule.+, and MyModule.map rather than overloading Base.+ and Base.map.

So in the case of this thread, if Base.+ were used for String concatenation, I wouldn’t like that, because I want Base.+ to always be commutative. If MyModule.+ (a different function, with a different contract, but the same symbol +) were used for String concatenation, that would be fine by me; go nuts.

Imo, if a function has a generic contract, then no specialization of it to specific types should violate that contract. The specialization should semantically subtype the generic function. Otherwise – if specializations can lawfully violate the generic interface – there is no generic interface.
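To make the namespace point concrete, here is a minimal sketch. MyModule and its + are hypothetical names used only for illustration; the assumption is that + is defined in the module before it is used there, so Julia creates a new function rather than extending Base.+.

```julia
# Hypothetical module, for illustration only.
module MyModule
# `+` has not been used in this module yet, so this defines a brand-new
# function MyModule.+ instead of adding a method to Base.+.
+(a::AbstractString, b::AbstractString) = string(a, b)
end

strplus = MyModule.:+   # same symbol as Base.+, but a different function object

@assert strplus !== Base.:+
@assert strplus("ab", "cd") == "abcd"
@assert strplus("ab", "cd") != strplus("cd", "ab")   # not commutative, by its own contract

# Base.+ gained no String method, so its generic contract is untouched.
@assert !hasmethod(Base.:+, Tuple{String, String})
```

Code that wants the concatenating + has to opt in by naming MyModule explicitly; everything else keeps seeing Base.+.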

5 Likes

I think this is an important point. When reading it now I find myself nodding and thinking “fair”, but I vividly remember being new to Julia, coming from Python, and finding this explanation not only rather grating (like, if all the other language designers are too pea-brained to realize this, how come + is working just fine for them?), but also suggesting a lack of pragmatism in the design that I assumed would make the language unpleasant to use in general. This was wrong, of course, but it wasn’t a good first impression. Simply saying “in Julia, we use *” would be plenty, no explanation or defense needed.

6 Likes

I can remember seeing the choice of * and reading the explanation, and thinking “oh, well, I guess that makes sense.” It’s not like + is incredibly natural either.

There’s a somewhat arbitrary choice. There’s a somewhat reasonable explanation. This looks like the 1- vs 0-based debate again, only less important.

Getting annoyed, not just with the choice, but with the presence of a short explanation, and still thinking about it years later? Isn’t that … excessive?

12 Likes

When I first started learning French (I am still not anywhere near fluent) I used to think “Why is this word conjugated in such a strange fashion?” Funny, I never thought that when learning English, nor when learning my first language.

After a while I realized I would be far better off just accepting French being the way it was, that I would have no influence in changing anything and be done with it. Reading this thread made me realize that this mindset should apply to computer languages (after version 1.0) as well. At best it may refine the documents.

This is kind of the long way of saying what @stevengj said in post 4.

P.S. The post did make me look up again the long forgotten definitions of commutative, and associative. Things I use when I do algebra but no longer had a name for.

6 Likes

The free monoid bit is kind of silly—many operations could be used for a monoid and it just feels like pointless jargon. Taking offense at it feels a little bit extra, but ok.

The really interesting question is: If concatenation is * then what would + be? If you consider strings to be patterns that match themselves, then + would be alternation of patterns in the regex sense. In other words, if a and b are patterns, then a * b is a pattern that matches a followed by b and a + b is a pattern that matches a or b. And this makes regular expressions, with strings embedded in them as constant patterns, into a semiring: the additive identity is the impossible pattern (matches no strings) and the multiplicative identity is the empty string. You can easily check that the semiring properties hold; and yes, + is commutative while * is not. I suspect that the observation that * makes strings a free monoid is an echo of this, which I explained at least once when someone asked about this. The docs appropriately don’t want to get into this whole semiring of regular expressions business, but the part of that definition that’s just about strings is that they form a free monoid under *.
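A rough way to check those semiring properties is to model a pattern as the finite set of strings it matches. This is only a sketch under that assumption: Pattern, alt, concat, NEVER and EMPTY are hypothetical names, and real regular expressions can match infinite sets, which this does not cover.

```julia
# Hypothetical model: a pattern is the finite set of strings it matches.
const Pattern = Set{String}

# Alternation plays the role of +, concatenation the role of *.
alt(a::Pattern, b::Pattern) = union(a, b)
concat(a::Pattern, b::Pattern) = Pattern(x * y for x in a for y in b)

const NEVER = Pattern()       # additive identity: matches no strings
const EMPTY = Pattern([""])   # multiplicative identity: matches only ""

a, b, c = Pattern(["x"]), Pattern(["y"]), Pattern(["z", "zz"])

@assert alt(a, b) == alt(b, a)                      # + commutes
@assert concat(a, b) != concat(b, a)                # * does not
@assert alt(a, NEVER) == a                          # zero is neutral for +
@assert concat(a, EMPTY) == a == concat(EMPTY, a)   # one is neutral for *
@assert concat(a, NEVER) == NEVER                   # zero annihilates under *
@assert concat(a, alt(b, c)) == alt(concat(a, b), concat(a, c))   # left distributivity
@assert concat(alt(a, b), c) == alt(concat(a, c), concat(b, c))   # right distributivity
```

A single string s embeds as Pattern([s]), and concat on such singletons is just string concatenation, which is the free monoid part.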

So we could have used a different operator than *, like ++, but using + still feels strictly wrong. There’s the whole non-commutativity issue, but also the fact that if you’ve made + concatenation, then if you wanted to have an operator for alternation, what would you use? I guess |? Then you’re really punning your operators kind of badly. Having | and + be the operators for a semiring feels messy. In languages that have + for concatenation, it’s sometimes allowed to repeat a string using multiplication, but which side do you multiply on? 5*"xo" or "xo"*5? It’s unclear, because when + is commutative it doesn’t matter. Maybe that’s fine and you could allow both. But when you use * for concatenation it’s obvious that you want "xo"^5 for repeating xo five times, not 5^"xo". Then there’s the whole bit about using str/suffix to remove a suffix and prefix\str to remove a prefix. People didn’t want that exported from Base Julia (too funky) but at least you can define it yourself and it is coherent with *. With + and - this doesn’t work nearly as nicely because you have str - suffix for removing a suffix, but there’s no “left subtraction” for removing a prefix, again because addition is presumed to be commutative.
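For reference, this is how the * and ^ pairing behaves on strings in current Julia; a few quick checks rather than anything definitive:

```julia
s = "xo"

@assert s^5 == "xoxoxoxoxo"          # repetition is exponentiation
@assert s^5 == repeat(s, 5)          # same thing via the named function
@assert s^2 * s^3 == s^5             # exponents add, as with numbers
@assert s^0 == one(String)           # empty product is the identity ""
@assert "ab" * "cd" != "cd" * "ab"   # concatenation is not commutative
```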

12 Likes

only semi-serious: should strings support left and right division…?

2 Likes

I proposed that on GitHub long ago and it was shot down as too weird, but I still quite like it. My main question is what str/suffix should do if str doesn’t actually end with suffix? Error? Return str unmodified?
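Both answers are easy to try out without touching Base. A minimal sketch, assuming Julia 1.8 or later for chopsuffix/chopprefix; StringQuotients, strict_rdiv, rquot and lquot are hypothetical names, and the / and \ defined here are the module's own functions, not methods of Base./ or Base.\.

```julia
# Hypothetical module; needs Julia 1.8+ for chopsuffix/chopprefix.
module StringQuotients
# Lenient variant: a mismatch returns str unmodified (this is what chopsuffix does).
/(str::AbstractString, suffix::AbstractString) = chopsuffix(str, suffix)
\(prefix::AbstractString, str::AbstractString) = chopprefix(str, prefix)

# Strict variant (hypothetical name): a mismatch is an error.
function strict_rdiv(str::AbstractString, suffix::AbstractString)
    endswith(str, suffix) ||
        throw(ArgumentError("$(repr(str)) does not end with $(repr(suffix))"))
    return chopsuffix(str, suffix)
end
end

rquot = StringQuotients.:/
lquot = StringQuotients.:\

@assert rquot("report.txt", ".txt") == "report"
@assert lquot("src/", "src/main.jl") == "main.jl"
@assert rquot("report.txt", ".csv") == "report.txt"   # lenient: unchanged on mismatch
@assert ("report" * ".txt" |> x -> rquot(x, ".txt")) == "report"   # undoes *
```

The lenient version is total but silently hides mistakes; the strict one makes str/suffix a true inverse of * wherever it is defined.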

5 Likes

make a Path 👀

6 Likes

Yep, that’s a reason that str1/str2 doing path concatenation isn’t great. Yes, it looks nice, but there’s a different sensible meaning for that operation. If there were a path type, having path1/path2 do path joining would be better, but then it’s very different from strings. Seems better to avoid and use a different syntax. There’s also always the option of having a non-standard string type for paths, e.g. path"foo/$part/bar.txt" which could normalize to the right kind of path and do checks like if part is something like / or /root then raise an error.
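The kind of check meant here can be sketched with an ordinary function rather than a path"..." literal (a string macro would have to parse the $part interpolation itself, which is left out). safejoin is a hypothetical helper, not something in Base; it relies on the documented joinpath behavior that an absolute component drops everything before it.

```julia
# Hypothetical helper: join path components, but refuse absolute ones,
# since joinpath would let them discard all preceding components.
function safejoin(first::AbstractString, rest::AbstractString...)
    for p in rest
        isabspath(p) && throw(ArgumentError("absolute path component: $(repr(p))"))
    end
    return normpath(joinpath(first, rest...))
end

part = "docs"
@assert safejoin("foo", part, "bar.txt") == joinpath("foo", "docs", "bar.txt")

# On Unix-like systems "/root" is absolute, so this is rejected.
rejected = try
    safejoin("foo", "/root", "bar.txt")
    false
catch err
    err isa ArgumentError
end
```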

1 Like

The string operations wikipedia page is linked from that thread and oh, hey, would you look at that—the mathematical syntax used for all the string operations is exactly the same as Julia’s syntax for string operations. What a coincidence!

9 Likes

I had almost forgotten that: actual, literal concatenation/juxtaposition looks like implicit multiplication. I think that’s actually what made me think * was OK.

5 Likes

If this was directed to me, I just want to note that I don’t harbor the slightest annoyance about the choice, I haven’t thought about the explanation in years, and when reading it now, my reaction is “neat, I learned something”. @StefanKarpinski’s comment above extending this idea to a semiring is downright fascinating.

This thread just reminded me that when I was first learning Julia 6-7 years ago and encountered this explanation, I walked over and showed it to my colleague so we could roll our eyes in disbelief together. First impressions matter, so I thought that was worth sharing here. (With my sincere apologies to whoever wrote that paragraph, which, again, I now find educational.)

5 Likes

Partly, I guess, but not mainly.

Same experience here – heavy eye roll, general indifference, acceptance, fascination. I’ll let others decide what it means about me that program syntax can make me go through something like the stages of grief.

Julia makes enough Big Design Choices that I would still back + if we were deciding today because I don’t think another layer of discomfort to newcomers is worth the benefits, but it’s definitely neat.

10 Likes

Another worthwhile Wikipedia page on the subject is

1 Like

Or we could have a + b = min(a, b). We then compactify the space of strings with a -Inf and +Inf. We have that +Inf is a neutral element for addition, whereas -Inf is an annihilator. I’m sure the ideal structure can be figured out, but I’d need a cup of coffee.

Languages that don’t worry about algebra or generic operations can pun like crazy. 🙂

$ pike
Pike v8.0 release 1738 running Hilfe v3.5 (Incremental Pike Frontend)
> "hello world" / " ";
(1) Result: ({ /* 2 elements */
                "hello",
                "world"
            })
> "hello world" / 2.8;
(2) Result: ({ /* 4 elements */
                "hel",
                "lo ",
                "wo",
                "rld"
            })
> "hello world" * 2.8;
(3) Result: "hello worldhello worldhello wor"

(Reference documentation)

4 Likes

The Pike language uses division of a string for splitting, i.e. a/b is (Julia-equivalent) split(a,b). a*b is then join(a,b).
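On the Julia side, the correspondence with named functions looks like this (just the string-separator case from the Pike session above, not the numeric chunking):

```julia
words = split("hello world", " ")          # Pike: "hello world" / " "
@assert words == ["hello", "world"]
@assert join(words, " ") == "hello world"  # Pike: array * " "
```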

4 Likes

I can see an aesthetic and possibly a logical advantage of * over + there, but it’s not strong enough to me to warrant much more of an official explanation than “we know it’s unorthodox for programming, but it’s rooted in string function notation from formal language theory”.

  1. In practice, people didn’t have a problem with a symbol + used for a commutative operation (addition) also being used for a non-commutative operation (concatenation) when the input types can differentiate the call. We already do that for multiplication of integers and floating points versus matrix multiplication. In that context, arguments for preserving commutativity thus seem like a minor aesthetic preference that isn’t broadly applied.
  2. Framing addition as the commutative operation and multiplication as the non-commutative one doesn’t pass muster in general, since both addition and multiplication are usually commutative. The element of truth is that when just one of them is non-commutative, it’s multiplication; e.g. addition and multiplication of 1x1 matrices generalize scalar addition and multiplication, and for larger matrices only multiplication stops commuting. Strings don’t have both an addition and a multiplication (though a language might define both), so the stated choice of * for being non-commutative is out of the usual context.
  3. String operations don’t need to generalize or be associated with addition or multiplication, putting aside notation. The identity element could be a zero for addition or a one for multiplication (Julia does do one(String) == ""), but a string x doesn’t really have an additive inverse x + (-x) == "" or a multiplicative inverse x * inv(x) == "". We do have chopsuffix(x, x) == chopprefix(x, x) == "", but there’s no direct -x or inv(x) input, and it does nothing if the strings don’t match enough (endswith/startswith), which is much more unusual for addition or multiplication than non-commutative addition.
  4. Pairing suffix/prefix stripping (/ and \) with concatenation, the way right/left division is paired with multiplication, is aesthetically pleasing, but that doesn’t match the right/left quotient that returns an empty string upon mismatches, and what-ifs aren’t nearly as strong an argument as the existing language. If anything, chopsuffix/chopprefix suggests string operations should all be named functions; repeat is overloaded to do what ^ does, and string can do string concatenation. Languages with + for concatenation mostly use named functions for string operations, and even Pike has some on top of its fuller “arithmetic.”
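A few of the concrete claims in the list above can be checked directly. A small sketch, assuming Julia 1.8 or later for chopsuffix/chopprefix; the matrices are arbitrary examples:

```julia
using LinearAlgebra   # matrix * lives here; harmless if already loaded

A = [1 2; 3 4]
B = [0 1; 1 0]
@assert A + B == B + A   # matrix addition still commutes
@assert A * B != B * A   # matrix multiplication does not

# The string identity is spelled as a multiplicative one.
@assert one(String) == ""
@assert "abc" * one(String) == "abc" == one(String) * "abc"

# Chopping x off x gives the identity, but a mismatch is a silent no-op.
@assert chopsuffix("abc", "abc") == chopprefix("abc", "abc") == ""
@assert chopsuffix("data.csv", ".txt") == "data.csv"

# Named-function spelling of concatenation.
@assert string("ab", "cd") == "ab" * "cd"
```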

5 Likes

I’m sure some variant of the Grothendieck construction can be applied to create string inverses, both additive and multiplicative. Sooner or later we’ll end up with a free module of strings over the integers, hopefully with a countable set of generators.

2 Likes