Introduce ++ as the concatenation operator

Everyone I have ever spoken to who has not yet written a line of Julia (but has written code in other languages) thinks it bizarre/hilarious that string concatenation is *.

FWIW, using + as the non-commutative concatenation operator has caused real pain points in optimizing both Python (there’s a Python list thread about this) and C++ (there’s a C++ YouTube talk about this). I don’t have the links off-hand.


OTOH most people who do start writing those lines of Julia code just consider this a tiny, insignificant syntactic detail that is probably the least relevant thing about Julia. Initially, some like it, some hate it, most just don’t care. But eventually, everyone just moves on to more interesting things.

If one had to mention, say, 10 things that make Julia different from other languages, I don’t think this would make the list. Probably not for 20 or 100 either.

I consider it both hilarious and bizarre that someone who is interested in but does not use Julia would even ask about the operator for string concatenation, instead of parametric types, multiple dispatch, macros, the compiler model, the package manager, the maturity of the ecosystem, etc, etc.


You are right, of course, that no one asks about it unless they are told first. But when told, the reaction is always incredulity. It’s like the many oddities of JavaScript that people laugh about. When you answer the question “But, why?” by saying that * is not commutative but addition is… well, you can imagine.

Of course this won’t stop many people from using Julia who might otherwise want to, in the same way that lots of people code in JavaScript despite its oddities. But it is a shame nonetheless.

I don’t think that particular point matters with respect to Julia though, since + and * are just regular generic functions. Optimizations that assume commutativity cannot be done on the untyped AST representation anyway, since these functions can always be overloaded or even shadowed by the user.

It does make a difference in how the stdlib uses those operators, though. We’re very careful to preserve the order of arguments to * throughout, meaning that our matrix multiplies, for example, work with some pretty esoteric types. You’re right, though, it’s not so much about optimizing (as I initially wrote) as it is about developing and forming good mental models of your code.
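To make the ordering point concrete, here is a minimal sketch in Julia. The matrices and strings are illustrative values chosen for this example, not taken from the stdlib; the only claim is that * is order-sensitive for both types while + on numbers is not.

```julia
# Illustrative values only: any two matrices that do not commute would do.
A = [1 2; 3 4]
B = [0 1; 1 0]

AB = A * B   # columns of A swapped
BA = B * A   # rows of A swapped

s = "ab" * "cd"   # concatenation keeps the argument order
t = "cd" * "ab"

# Integer addition, by contrast, does not care about order.
sum_commutes = (2 + 3) == (3 + 2)

println(AB != BA, " ", s != t, " ", sum_commutes)
```

Reading * as "possibly non-commutative" and + as "commutative" is the mental model being described in the post above.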


Not really. It is just a tiny detail of syntax that nevertheless has a consistent motivation. Alternatives have been considered and debated (you can look up these threads and issues if you are interested), the issue has been decided, and people just moved on.

Bringing it up at this stage is probably just a waste of everyone’s time, distracting an otherwise relevant discussion. Which, incidentally, is a shame.


I suppose it depends on the audience. My reaction to that reasoning when I first read it was being immediately convinced, and wishing other languages had had the sense to choose * earlier. (But on second thought, it probably doesn’t matter much, since it’s hard to imagine a generic use of * or of general concatenation that is usefully applied to strings.)


That’s very interesting. Was this * for string concatenation specifically?

I have read some of those threads. I have also seen, over the years, a number of similar-feeling topics in different open source projects. I fear that you may get comments like mine regularly for years to come. I hope you can see that they come entirely out of support for Julia as a great language and wanting it to be as good as possible.

My apologies for starting this conversation again. This reminds me of a blog post by Joe Armstrong, one of the creators of Erlang. In it, he describes the Three Laws of Programming Language Design:

  • What you get right, nobody mentions it.
  • What you get wrong, people bitch about.
  • What is difficult to understand you have to explain to people over and over again.

The choice of * for string concatenation doesn’t fall under the second law. It’s not wrong, it’s just a syntax choice. But the third law sort of applies here. Newcomers tend to be surprised by the choice, and questions and debates about the choice will resurface for years to come. :sweat_smile:


Even further, the newcomers will never notice the pitfalls that are avoided by us choosing an operator that doesn’t need to be commutative.


To me this sort of thing just falls into the same category as 1 vs 0 based indexing, using [i] or (i) to access array elements, etc. So it is no “shock” that Julia uses * when some other languages use ++. You can also use string("a","b"), which is my preferred method.
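For readers who have not seen the alternatives side by side, a small sketch follows. The variable names are made up for illustration; `*`, `string`, and `$` interpolation are all part of Base Julia.

```julia
# Hypothetical values, just for the comparison.
greeting, person = "Hello", "world"

a = greeting * ", " * person        # the * operator
b = string(greeting, ", ", person)  # the string function
c = "$greeting, $person"            # interpolation

# string() also converts non-string arguments, which * does not:
# "n = " * 42 throws a MethodError, while this works.
d = string("n = ", 42)

println(a == b == c)
```

One practical reason to prefer string or interpolation is exactly that last case: mixed string and non-string arguments are handled without an explicit conversion.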

A favorite way in Java is StringBuilder:

String stringBuilderConcat = new StringBuilder()
    .append(greeting)
    .append(" ")
    .append(person)
    .append("! Welcome to the ")
    .append(location)
    .append("!")
    .toString();  // StringBuilder has no build(); toString() returns the assembled String

Sure, but it’s not really that often, at all. Very few complain about it, but those who do tend to sound quite frustrated.

The main argument against * seems to be that it’s unfamiliar, not that it doesn’t make sense. But if you just want a language that’s familiar and mainstream, why not stick with what you already know? Lots of syntax is new and different in Julia. At least, make an argument why + is intrinsically better.

Is string concatenation really used that much by most users, though? For printing you shouldn’t use it anyway: println(s1, s2, s3, ...) writes each argument directly, so it avoids building an intermediate string and is generally faster.
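A short sketch of the difference, using made-up strings. Both calls print the same text; the multi-argument form simply writes each piece to the stream in turn, while the concatenating form builds a temporary string first. The IOBuffer at the end is only there so the output can be inspected.

```julia
# Illustrative values.
s1, s2, s3 = "Hello", ", ", "world"

println(s1, s2, s3)     # each argument is written to stdout in order
println(s1 * s2 * s3)   # a new string is allocated, then written

# print/println accept any IO as the first argument, e.g. an in-memory buffer.
buf = IOBuffer()
print(buf, s1, s2, s3)
out = String(take!(buf))
```

For a handful of short strings the cost is negligible either way; the difference matters mostly in loops or with large pieces.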


If this is referring to my post, then yes.

I happen to work with strings a lot so it is surprising to me. In the literature * is often used to mean repetition when it comes to strings. So ab * 3 would mean ababab. ab * ab * ab just seems very surprising as a result and doesn’t make sense if you are used to the string literature. Basically * means multiplication as in multiple copies of a string to those who work with strings.

From ordinary algebra I’m used to seeing

a * b = ab

so it makes a lot of sense to me.

Are you referring to scientific literature on string analysis, or is it more in the direction of algorithm pseudocode? I’m looking for a ‘deeper’ reason to prefer + beyond mere habit.

Right, I think you have hit the nail on the head. The reason for (what I regard as) this mistake is that the devs have taken notation from ordinary algebra and applied it to a different field where * is already defined with a different meaning. In the academic literature on string algorithms, * means repetition of a string. The string-algorithm literature is also consistent with the main programming languages on this point.

How very queer. I would think that (ab)^3 would be a more common way of denoting the string ababab.
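For reference, this power notation is what Julia itself uses for repetition. A minimal sketch with an illustrative string:

```julia
r1 = "ab"^3            # repetition written as a power
r2 = repeat("ab", 3)   # the same thing as a function call
r3 = "ab" * "ab" * "ab"

println(r1, " ", r1 == r2 == r3)

# Note: "ab" * 3 is not defined in Julia and throws a MethodError.
```

So in Julia, repetition is to concatenation what exponentiation is to multiplication.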

It looks like the Pythonic influence is strong in your community, hence the choice of (s) * n to denote (s)^n.


* for ‘concatenation’ is used in the theory of formal languages; e.g. languages over an alphabet A are subsets of the free monoid generated by A under *, with the empty word as the identity element. At least for me it’s a perfectly consistent explanation. The only explanation I know for + is “Python does it this way”. Do you know why Python chose + for string concatenation?
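The monoid view can be checked directly in Julia. This is only a sketch on three sample words, not a proof; `one(String)` is Base Julia's multiplicative identity for strings, and the variable names are invented for the example.

```julia
# Sample words over the alphabet {a, b, c, d, e}.
x, y, z = "ab", "c", "de"

# Associativity: grouping does not matter.
assoc_ok = (x * y) * z == x * (y * z)

# Identity element: the empty word.
e = one(String)
identity_ok = (e * x == x) && (x * e == x)

# Powers are repeated products, and the zeroth power is the identity.
power_ok = (x^3 == x * x * x) && (x^0 == e)

# The monoid is not commutative.
noncomm = x * y != y * x

println(assoc_ok, " ", identity_ok, " ", power_ok, " ", noncomm)
```

These are the same laws that matrices satisfy under *, which is the consistency argument being made here.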


My immediate guess was “maybe because ABC did it?” But then it turned out string concatenation and repeat are more “strange” in ABC:

Operation   Example
join        "now"^"here" = "nowhere"
repeat      "-"^^5 = "-----"

See: The ABC Programmers' Handbook: CHAPTER 1, A Quick Look at ABC
