Introduce ++ as the concatenation operator

++ is standard for concatenation in Haskell.

thanks, I understand the argument

[operator] is standard for concatenation in [language].

But what I was really asking is why this operator was chosen for the task; I believe the language is mostly irrelevant to that question.

I believe you’re referring to the Kleene star operator, which, unlike the multiplication operator, is raised, unary and postfix. It’s distinct both visually and semantically from multiplication. In fact, the Kleene star syntax is indirectly derived from (and consistent with) multiplication being used for string concatenation. Consider the definition given in the Wikipedia article linked above:
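
For reference, the standard definition (which I assume is what the article states) is the following, with ε denoting the empty string and wv denoting w followed by v:

```latex
V^0 = \{\varepsilon\}, \qquad
V^{i+1} = \{\, wv : w \in V^i,\ v \in V \,\} \quad (i \ge 0), \qquad
V^* = \bigcup_{i \ge 0} V^i
```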

As you see from this definition, the Kleene star is the union of V⁰, V¹, … for all powers. In this definition, exponentiation means repeating something a number of times, which only makes sense if concatenation is multiplication. The * in the exponent of the Kleene star operator refers not to multiplication, but rather to the fact that the operator is a union over all powers of the argument.
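
To make the "powers" reading concrete, here is a rough Julia sketch. The helper names concat, power and kleene_upto are made up for illustration, and since the real Kleene star is an infinite union, this only builds it up to a chosen bound:

```julia
# Concatenation lifted to sets of strings: every u in A followed by every v in B.
concat(A, B) = Set(u * v for u in A for v in B)

# V^n as repeated concatenation; V^0 contains only the empty string.
power(V, n) = n == 0 ? Set([""]) : concat(power(V, n - 1), V)

# Finite approximation of V*: the union of V^0 through V^maxn.
# (maxn is a hypothetical cutoff; the true Kleene star has no bound.)
kleene_upto(V, maxn) = union((power(V, n) for n in 0:maxn)...)

V = Set(["a", "b"])
power(V, 2)         # the four strings "aa", "ab", "ba", "bb"
kleene_upto(V, 2)   # "", "a", "b" plus the four strings above
```

Note that power is defined purely in terms of *, which is the sense in which exponentiation only makes sense if concatenation is multiplication.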

Possibly. But this is not a strong argument for changing anything. People can get hung up about all kinds of minor things, and debate them endlessly. 0-vs-1 based indexing is another favorite. At some point things are decided and the language moves on.

Interestingly, many people miss that the preferred approach in Julia is to pick a sensible default, but at the same time make the language extensible enough to accommodate other solutions at (near-)zero cost. Thus we have sidestepped the issue of 0-vs-1-based indexing, and have custom indexing, with packages like OffsetArrays.jl.

Seemingly, the string concatenation operator is not so easy to change since redefining + and * for AbstractString would be type piracy. But there are several possible approaches:

1. One can make a package which uses ++ and ∗ (\ast, not *) as operators, or any of the other unused operators, as an alias for * and ^. This is rather simple.

2. Define a string-like type that is not <: AbstractString, and just go with + and *. This is more involved, but could make sense for a DSL for someone working with strings a lot who really likes these operators.
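
Both approaches can be sketched in a few lines. This is only an illustration under my own assumptions: the wrapper name Str is hypothetical, and neither definition is taken from an existing package.

```julia
# Approach 1: give concatenation a new name. `++` parses as an operator
# but has no methods in Base, so defining it yourself is not type piracy.
# Varargs are used because `a ++ b ++ c` is parsed as a single call.
++(s::AbstractString, rest::AbstractString...) = *(s, rest...)

# Approach 2: a wrapper type that is deliberately NOT <: AbstractString.
# Extending Base.+ and Base.* for a type you own is not piracy either.
struct Str               # hypothetical name
    s::String
end
Base.:+(a::Str, b::Str) = Str(a.s * b.s)       # concatenation
Base.:*(n::Integer, a::Str) = Str(a.s ^ n)     # repetition

"foo" ++ "bar"             # "foobar"
Str("hel") + Str("lo")     # Str("hello")
2 * Str("ab")              # Str("abab")
```

A real DSL along the lines of approach 2 would also need show, iteration, indexing and so on, which is the "more involved" part.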

There may be other solutions. The bottom line is that one cannot argue that something is important, but not be willing to bear a minor inconvenience to do it. The fact that solutions like this are not prevalent in the wild (as far as I can tell) suggests that users don’t really care about the whole thing.

At this point * as a string concatenation operator in Base is pretty much take it or leave it. Theoretically it could change with 2.0, but it is my impression that breaking changes will only be entertained for a really, really good reason. So while I am not convinced that there are many people out there who really consider this a major thing about Julia, if they do, they will either have to work around it or find another language.

“In the academic literature on string algorithms * means repetition of a string.” is not correct. I would like to withdraw that comment. Apologies.

Hi,

You make very interesting points. I do of course completely agree that the issue is not significant in the scale of things and that few people will choose a programming language because of its syntax for string concatenation. It would also be great if the plusplus.jl package were registered so people could just use it if they prefer that syntax. I think Julia is wonderful and am really enjoying learning it, despite this minor syntax oddity.

I thought it might be helpful to write in one place the main reasons in favour of using + or ++ as the syntax. I am not trying to persuade anyone and in general, unless you are writing the code yourself for an open source project I realise that your/my opinion is not worth much. So hopefully this will be mostly fact based rather than too opinionated. I will write it as a series of questions and answers. The questions are deliberately exaggerated for simplicity and partly to make it more interesting to read.

Q: Why should Julia simply copy syntax from Python for concatenation?

A: Let’s take a look at the top ten most popular programming languages (https://www.tiobe.com/tiobe-index/). These are Java, C, Python, C++, C#, Visual Basic, JavaScript, PHP, SQL and R. Of these, Java, Python, C++, C#, Visual Basic, JavaScript and SQL (some variants) use the symbol “+” for string concatenation. PHP uses . (full stop), R uses the function paste, and C uses the function strcat. In other words, among the 8 of these languages that use a single-character symbol, 7 use “+”. The Wikipedia article on concatenation (https://en.wikipedia.org/wiki/Concatenation) also mentions something similar.

Q: OK but Julia still doesn’t need to follow the crowd. Its aim is to be more formally designed and “+” looks wrong mathematically.

A: For this reason Haskell, perhaps the prototypical formal language, uses ++. This can be written as a single Unicode character ⧺. https://tex.stackexchange.com/questions/4194/how-to-typeset-haskell-operator-and-friends shows how to do the same thing in LaTeX.

Q: What syntax do they use in academia?

A: In programming language papers I believe ⧺ or ++ are popular. In the world of string algorithms, string concatenation is typically denoted by + or simply by adjacency. That is, if u = “hel” and v = “lo”, then uv = “hello”. This is not very helpful for a programming language, however, where our variables often have more than one character. In lecture notes where a particular programming language is the basis, “+” is the common answer. See https://web.stanford.edu/class/archive/cs/cs106a/cs106a.1178/lectures/Lecture9/Lecture9.pdf and https://algs4.cs.princeton.edu/50strings/.

The earliest written reference I could find for when + started to be used for concatenation in academia is from the paper “String Theory” (1974) (https://www.cambridge.org/core/journals/journal-of-symbolic-logic/article/string-theory/58F6F7E5C62464BAFCC705E05DAF89F7) , where we have the following quote:

[…] the binary function symbol + which is intended to indicate the operation (“concatenation”) of patching one string directly onto the front of another. Thus a_1 + a_2 is simply a_1a_2 and, of course, + is associative, satisfies both cancellation laws, has 0 as a null element, etc.

The symbol + as used for string concatenation is a well defined mathematical function. It just isn’t addition from a ring. In general there is no reason to imagine that strings form a ring of course.

The following is more subjective:

Q: I don’t like ++ or ⧺. The reason why we can’t use “+” is that it is commutative but string concatenation is not.

A: It is true that in ring theory addition is commutative. But programming languages are not a direct application of abstract algebra. Their context is computer science and Julia is an imperative programming language. The concept of string concatenation is not new and syntax for it does not need to be invented anew. It already exists. If you write FULLNAME = first name + family name no one involved in CS is going to think you would be happy with family name + first name instead. In natural language addition is also not always commutative. Q: “How shall I build a house?” A: “Build the foundations, add the first floor, add the second, add the roof” is clearly different from A: “add the roof, add the second floor, add the first floor”. This is a series of instructions that have to be carried out in order, much as in an imperative programming language.

Q: Why do you object to “*” as the symbol for string concatenation?

A: Reasons:

1. It is different from the almost universally agreed standard for programming languages. As far as I know, no other popular programming language does this. It is also different to what is done in academia. It is hard to find a positive reason why Julia should choose a uniquely different syntax for an operation which is common and well known.

2. Before it is explained to you, it doesn’t carry with it any concept of addition or concatenation (as in adding to a chain or a building or a pile), so it is unintuitive. Good notation should naturally suggest its use before it is explained. Much as it would be awkward to call your matrix “!” for no good reason, using “*” for concatenation feels awkward, particularly given the context of both other PLs and how you would naturally write the operation in a talk, for example (e.g. FULLNAME = first name + family name). I would argue that no one would write FULLNAME = first name * family name in a talk.

3. If we really want to use * by analogy with rings (which I believe is not appropriate, as partly explained above in the reference “String Theory”), what is the corresponding commutative addition operation? And finally, even in ring theory the multiplication operation is commonly written ·, not *.

Q: Isn’t this just like the 0 vs 1-indexing debate?

A: Not at all. There is a clear split in the PL community between those who prefer one form of indexing over the other. For the notation for string concatenation there is Julia on one side and essentially all other programming languages (weighted by number of users) and all of academia on the other. It’s more like the debate over whether climate change is real.

Thanks for the write-up. I’ll add another data point: Matlab uses array concatenation syntax for their old-fashioned strings, which are really character arrays:

>> ['abc', 'def']

ans =

'abcdef'


They have recently added a new, proper string class (it’s actually a string matrix, since everything in Matlab is a matrix). This string type uses + for concatenation:

>> "abc" + "def"

ans =

"abcdef"


I mention Matlab even though it is not a top-10 language, because I suspect it is a top-10 language among Julia users.

Even though Matlab now uses +, the string type is poorly integrated into the language, and no one I know is aware of this new string type, so I don’t think this really reflects the experience of Matlab users, who will almost universally be familiar only with array concatenation syntax.

Despite the prevalence of + concatenation, I think * is a better choice, and Julia should stick with it.

Are you sure this shouldn’t just define ++(args...) = string(args...)? Both to allow more than two arguments, and because falling back to vcat seems surprising:

"a" ++ 'b' == "ab"       # the point of this
"a" ++ 1 == Any["a", 1]  # surprise?
"a" ++ 'b' ++ "c"       # MethodError


For vectors, the correct symbol for vcat is of course \oplus. Although unlike ++ it doesn’t get parsed as having more than two arguments:

:(1 ++ 2 ++ 3) |> dump
:(1 ⧺ 2 ⧺ 3)   # :((1 ⧺ 2) ⧺ 3)
:(1 ⊕ 2 ⊕ 3)   # :((1 ⊕ 2) ⊕ 3)
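
Both points can be checked with a short sketch. The varargs definition below is the one suggested above, not what the package currently does, and the variable names are just for illustration:

```julia
# Hypothetical definition, as suggested above: stringify and join everything.
++(args...) = string(args...)

"a" ++ 'b' ++ "c"   # parsed as one call ++("a", 'b', "c"), so this now works
"a" ++ 1            # "a1" rather than a vector

# Inspecting the parse: ++ chains into a single n-ary call, ⊕ nests pairwise.
ex_pp = Meta.parse("1 ++ 2 ++ 3")
ex_op = Meta.parse("1 ⊕ 2 ⊕ 3")
length(ex_pp.args)  # 4: the operator plus three operands
length(ex_op.args)  # 3: the operator, a nested (1 ⊕ 2) call, and 3
```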

This BTW is an argument for *. Writing symbols next to each other is the completely standard way to write multiplication in mathematics (or occasionally ⋅, but I can’t think of anywhere that * gets used for this). When translating to programming languages you almost always have to insert a *, to allow for multi-letter variable names.

That package was made as a joke because everyone kept reviving the discussion

A worthy successor to the venerable TwoBasedIndexing package.

Awesome! Thank you.

I believe this analogy is weak. We don’t generally substitute adjacency with the symbol * . Adjacency has a myriad of meanings in mathematical notation, which is one of the problems that systems designed to read math automatically encounter. I can’t remember the exact number they found but it was in the hundreds. For multiplication the reason why 3x is the same as 3*x is explained here https://math.stackexchange.com/a/817298/72724 :

[…] “I have three boxes”, I don’t need any other words between ‘three’ and ‘boxes’. Similarly, “I have three 𝑥”, in shorthand, becomes simply 3𝑥.

But when we write “hel” * “lo” it is not that we have “hel” “lo”s.

The reason adjacency makes sense for single character concatenation is simply that “a” + “b” literally is the string “ab”.

I’m gonna have to disagree with you there, \oplus should be a Kronecker sum!

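using LinearAlgebra  # provides kron and the identity constructor I(n)
⊗(A, B) = kron(A, B)
⊕(A, B) = A ⊗ I(size(B, 1)) + I(size(A, 2)) ⊗ B  # Kronecker sum, for square A and B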

I think you’re mixing two structures: a monoid structure and an $\mathbb{N}$-module structure. When you write hel*lo, that’s the monoid structure (the operation happens between two strings), so it is a map $A^* \times A^* \to A^*$. When you want to write 3hel, hel is a string but 3 is a number, so that’s a different map: $\mathbb{N} \times A^* \to A^*$. You’re trying to impose the structure of a semiring module on the monoid $A^*$, if you’ll pardon my French :)

But our monoid is non-commutative, so these analogies lead to all kinds of problems: hellohellohello = 3hello = 3(hel + lo) = 3hel + 3lo = helhelhellololo
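
The same failure of distributivity shows up in Julia's own notation, where * is concatenation and ^ is repetition. A quick check (u and v are just illustrative names):

```julia
u, v = "hel", "lo"

repeated_whole = (u * v)^3    # "hellohellohello"
repeated_parts = u^3 * v^3    # "helhelhellololo"

repeated_whole == repeated_parts   # false: repetition does not distribute
```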

sorry, I can’t follow this argument

Different fields reuse operators for different purposes (this makes sense, not that many operators look nice or can be distinguished in handwriting or visually; everyone wants to use the best ones). So saying that something is the “correct” symbol for some operation is pretty shaky for all things learnt after the second grade of elementary school.

Also, if chaining is needed (for optimization, or semantics), in Julia it may be best to use a function.
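
For example, with functions that already exist in Base (the variable names here are only for illustration):

```julia
parts = ["quick ", "brown ", "fox"]

joined_splat = string(parts...)   # "quick brown fox"
joined_join  = join(parts)        # same result, without splatting

# The vector analogue: concatenate many vectors in one call.
stacked = reduce(vcat, [[1, 2], [3, 4], [5]])   # [1, 2, 3, 4, 5]
```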

Yes I exaggerate slightly. However I was careful to say “for vectors”, and ⊕ acting on vector spaces does have one meaning. (Or at least, I have never ever seen another.) Likewise ⊗, which https://github.com/JuliaLang/julia/pull/35150 may finally build in.

@mcabbott The idea is that ++ should be the generic concatenation operator, not just the string concatenation operator. So it should work for anything that can be vcat-ed.

Regarding defining ++(args...) = string(args...): The * string concatenation operator does not convert non-strings to strings, so ++ shouldn’t either. If you explicitly need conversion, then you should use string().

Also, note that concatenation of scalars (via vcat) currently returns a vector, so it’s not unexpected that "hello" ++ 1 == ["hello", 1]. In other words, ++ will only perform string concatenation if every argument to ++ is a subtype of Union{AbstractString,AbstractChar}.

I’ve opened an issue on the new PlusPlus.jl to enable n-ary usage of ++:

If we have the following definitions for ++,

++(args::Union{AbstractString,AbstractChar}...) = *(args...)

++(args...) = vcat(args...)


then we get all kinds of goodies:

julia> "quick " ++ "brown " ++ "fox"
"quick brown fox"

julia> [1, 2] ++ [3, 4] ++ 5
5-element Array{Int64,1}:
1
2
3
4
5

julia> m1 = [1 2; 3 4];

julia> m2 = [5 6; 7 8];

julia> m1 ++ m2
4×2 Array{Int64,2}:
1  2
3  4
5  6
7  8

julia> using DataFrames

julia> df1 = DataFrame(a = 1:2, b = 11:12);

julia> df2 = DataFrame(a = 3, b = 13);

julia> df1 ++ df2
3×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 11    │
│ 2   │ 2     │ 12    │
│ 3   │ 3     │ 13    │

There you go. Julia for the win
