make a Path
Yep, that’s a reason that str1/str2 doing path concatenation isn’t great. Yes, it looks nice, but there’s a different sensible meaning for that operation. If there were a path type, having path1/path2 do path joining would be better, but then it’s very different from strings. Seems better to avoid and use a different syntax. There’s also always the option of having a non-standard string type for paths, e.g. path"foo/$part/bar.txt", which could normalize to the right kind of path and do checks, such as raising an error if part is something like / or /root.
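To make the "path type with `/` joining plus checks" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `SketchPath`, `segments`, and `render` are hypothetical names, not part of Base or any package, and the only check implemented is rejecting a component that starts with `/`.

```julia
# Hypothetical path type for illustration; not part of Base or any package.
struct SketchPath
    parts::Vector{String}
end

# Split a relative path string into its non-empty segments.
segments(s::AbstractString) = String[p for p in split(s, "/"; keepempty=false)]

SketchPath(s::AbstractString) = SketchPath(segments(s))

# `/` joins a path with a string component, rejecting absolute components.
function Base.:/(base::SketchPath, part::AbstractString)
    startswith(part, "/") && throw(ArgumentError("absolute component not allowed: $part"))
    return SketchPath(vcat(base.parts, segments(part)))
end

# `/` between two paths appends the segments of the second to the first.
Base.:/(a::SketchPath, b::SketchPath) = SketchPath(vcat(a.parts, b.parts))

# Turn the path back into a string with `/` separators.
render(p::SketchPath) = join(p.parts, "/")

p = SketchPath("foo") / "sub" / "bar.txt"
println(render(p))   # foo/sub/bar.txt
```

Because the methods are defined only for `SketchPath`, plain strings keep whatever meaning `/` has (or lacks) for them, which is the separation the post argues for.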
The string operations wikipedia page is linked from that thread and oh, hey, would you look at that—the mathematical syntax used for all the string operations is exactly the same as Julia’s syntax for string operations. What a coincidence!
I had almost forgotten that: actual, literal concatenation/juxtaposition looks like implicit multiplication. I think that’s actually what made me think * was OK.
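For readers who have not used the operators under discussion, a short illustration of Julia's existing string syntax (the variable names are arbitrary):

```julia
a, b = "foo", "bar"

println(a * b)        # concatenation: foobar
println(b * a)        # barfoo, so the operation is not commutative
println(a^3)          # repetition, the same as a * a * a: foofoofoo
println(a^0 == "")    # true: the zeroth power is the empty string
```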
If this was directed to me, I just want to note that I don’t harbor the slightest annoyance about the choice, I haven’t thought about the explanation in years, and when reading it now, my reaction is “neat, I learned something”. @StefanKarpinski’s comment above extending this idea to a semiring is downright fascinating.
This thread just reminded me that when I was first learning Julia 6-7 years ago and encountered this explanation, I walked over and showed it to my colleague so we could roll our eyes in disbelief together. First impressions matter, so I thought that was worth sharing here. (With my sincere apologies to whoever wrote that paragraph, which, again, I now find educational.)
Partly, I guess, but not mainly.
Same experience here – heavy eye roll, general indifference, acceptance, fascination. I’ll let others decide what it means about me that program syntax can make me go through something like the stages of grief.
Julia makes enough Big Design Choices that I would still back + if we were deciding today because I don’t think another layer of discomfort for newcomers is worth the benefits, but it’s definitely neat.
Another worthwhile Wikipedia page on the subject is
Or we could have a + b = min(a, b). We then compactify the space of strings with a -Inf and +Inf. We have that +Inf is a neutral element for addition, whereas -Inf is an annihilator. I’m sure the ideal structure can be figured out, but I’d need a cup of coffee.
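A rough sketch of that construction, under several assumptions the post does not spell out: the order on strings is taken to be Julia's default lexicographic `isless`, "+Inf" and "-Inf" are modelled as a top and a bottom element, and "annihilator" is read as "absorbing under `min`". The names `ExtString`, `ext`, `oplus`, `NEG_INF`, and `POS_INF` are hypothetical.

```julia
# Hypothetical wrapper: a string extended with a bottom (-Inf) and a top (+Inf) element.
struct ExtString
    rank::Int      # -1 for -Inf, 0 for an ordinary string, +1 for +Inf
    value::String  # only meaningful when rank == 0
end

const NEG_INF = ExtString(-1, "")
const POS_INF = ExtString(1, "")
ext(s::AbstractString) = ExtString(0, String(s))

# Order by rank first, then lexicographically by value.
Base.isless(a::ExtString, b::ExtString) = isless((a.rank, a.value), (b.rank, b.value))

# "Addition" in this sketch is the minimum under that order.
oplus(a::ExtString, b::ExtString) = min(a, b)

x = ext("hello")
println(oplus(x, POS_INF) == x)              # true: +Inf is neutral
println(oplus(x, NEG_INF) == NEG_INF)        # true: -Inf absorbs
println(oplus(ext("b"), ext("a")).value)     # a
```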
Languages that don’t worry about algebra or generic operations can pun like crazy.
$ pike
Pike v8.0 release 1738 running Hilfe v3.5 (Incremental Pike Frontend)
> "hello world" / " ";
(1) Result: ({ /* 2 elements */
"hello",
"world"
})
> "hello world" / 2.8;
(2) Result: ({ /* 4 elements */
"hel",
"lo ",
"wo",
"rld"
})
> "hello world" * 2.8;
(3) Result: "hello worldhello worldhello wor"
The Pike language uses division of a string for splitting, i.e. a/b is (Julia-equivalent) split(a,b). a*b is then join(a,b).
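The Julia side of that correspondence, using only the named functions from Base (the comments restate the pairing described above rather than adding to it):

```julia
parts = split("hello world", " ")   # the role played by "hello world" / " " in Pike
println(parts == ["hello", "world"])   # true

joined = join(parts, " ")           # the role the post assigns to a*b
println(joined)                     # hello world
```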
I can see an aesthetic and possibly a logical advantage of * over + there, but it’s not strong enough to me to warrant much more of an official explanation than “we know it’s unorthodox for programming, but it’s rooted in string function notation from formal language theory”.
- In practice, people didn’t have a problem with a symbol + used for a commutative operation (addition) also being used for a non-commutative operation (concatenation) when the input types can differentiate the call. We already do that for multiplication of integers and floating points versus matrix multiplication. In that context, arguments for preserving commutativity thus seem like a minor aesthetic preference that isn’t broadly applied.
- Framing addition as the commutative one and multiplication as the non-commutative one in general doesn’t pass muster when both addition and multiplication are usually commutative. The element of truth is that when just one of them is non-commutative, it’s multiplication, e.g. addition and multiplication of 1x1 matrices show the generalization of scalar addition and multiplication. There isn’t both an addition and multiplication for strings (though languages might), so the stated choice of * for being non-commutative is out of the usual context.
- String operations don’t need to generalize or be associated with addition or multiplication, putting aside notation. The identity element could be a zero for addition or a one for multiplication (Julia does do one(String) == ""), but a string x doesn’t really have an additive inverse x + (-x) == "" or a multiplicative inverse x * inv(x) == "". We do have chopsuffix(x, x) == chopprefix(x, x) == "", but there’s no direct -x or inv(x) input, and it does nothing if the strings don’t match enough (endswith/startswith), which is much more unusual for addition or multiplication than non-commutative addition. Pairing / and \ as suffix/prefix stripping with concatenation, the way right/left division is paired with multiplication, is aesthetically pleasing, but that doesn’t match the right/left quotient that returns an empty string upon mismatches, and what-ifs aren’t nearly as strong an argument as the existing language. If anything, chopsuffix/chopprefix suggests string operations should all be named functions; repeat is overloaded to do what ^ does, and string can do string concatenation. Languages with + for concatenation mostly use named functions for string operations, and even Pike has some on top of its fuller “arithmetic.”
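The identity and "almost inverse" behaviour mentioned in the last bullet can be checked directly. This assumes Julia 1.8 or later, where `chopprefix` and `chopsuffix` are available; the sample strings are arbitrary.

```julia
x = "foobar"

println(one(String) == "")            # true: the multiplicative identity is the empty string
println(one(String) * x == x)         # true
println(chopsuffix(x, x) == "")       # true: stripping x from itself leaves nothing
println(chopprefix(x, "foo"))         # bar
println(chopsuffix(x, "xyz"))         # foobar: a mismatch leaves the string unchanged
println(repeat(x, 2) == x^2)          # true: named-function form of ^
println(string(x, "!") == x * "!")    # true: named-function form of *
```

The mismatch case is the one that breaks the analogy with division: instead of failing or producing a special value, the call simply returns its first argument.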
I’m sure some variant of the Grothendieck construction can be applied to create string inverses, both additive and multiplicative. Sooner or later we’ll end up with a free module of strings over the integers, hopefully with a countable set of generators.
Programming languages are full of notation that was initially pretty arbitrary, it’s just that we got used to it over decades.
It’s nice if operators have some internal consistency and/or resemblance to a familiar notation, but that is just one aspect. Brevity, clarity, etc are also important.
Since Julia is primarily not a string-processing language, I am not sure we should introduce operators for a lot of string-related API. Sure, * is fine, arguably ^ is useful too, but after that, diminishing returns kick in. Packages can do /, \, ++, //, in their own namespace if necessary for DSLs, while Base and the standard libraries can just expose functions that have no operator equivalent.
I imagine you may well be thinking of the (dormant, but not forgotten/abandoned) thread Designing a Paths Julep, but in case you or any interested readers have forgotten, that’s exactly what I’m proposing there
Another upside of this sort of approach is that based on whether a $string or $path is being interpolated, you can catch and handle unsafe input like:
# from reading config/user input/etc.
untrusted_file = "../../../../../../etc/passwd"
read_path = path"user_content/$untrusted_file"
# ^ interpolation error thrown
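The `path"..."` macro above is a proposal, so the following is only a sketch of the kind of check it could perform, written as an ordinary function. `checked_join` is a hypothetical name; the check is purely lexical (it uses `normpath` and does not resolve symlinks), and it assumes `base` is a named directory rather than `.` or an empty string.

```julia
# Hypothetical helper; `checked_join` is not an existing Base function.
function checked_join(base::AbstractString, part::AbstractString)
    isabspath(part) && throw(ArgumentError("absolute path not allowed: $part"))
    root = normpath(base)
    full = normpath(joinpath(root, part))
    # Accept only results that are the root itself or lie below it.
    inside = full == root || startswith(full, joinpath(root, ""))
    inside || throw(ArgumentError("path escapes $root: $part"))
    return full
end

println(checked_join("user_content", "notes/a.txt"))

untrusted_file = "../../../../../../etc/passwd"
try
    checked_join("user_content", untrusted_file)
catch err
    println("rejected: ", err isa ArgumentError)   # rejected: true
end
```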
Eh, IEEE floats may muddy the waters, but + is the Abelian group operator. Just because it’s not perfectly adhered to is no reason to abandon convention.
That said, I do see the confusion around *, and I think there’s a solid argument that instead of picking between punning addition/multiplication it would be sensible to dedicate an operator to the concatenation of containery-types (vectors, strings, etc.) like ++. Until Julia 2.0 though, we’ll just have to live with the status quo, and this niggle is fairly low on my list of breaking changes I wish we could make.
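As a sketch of what a dedicated operator could look like in user code today: Julia parses `++` as an infix operator, but Base attaches no methods to it, so the definitions below are hypothetical and cover only strings and vectors.

```julia
# Hypothetical definitions; Base does not define `++`.
++(a::AbstractString, b::AbstractString) = string(a, b)
++(a::AbstractVector, b::AbstractVector) = vcat(a, b)

println("foo" ++ "bar")      # foobar
println([1, 2] ++ [3, 4])    # [1, 2, 3, 4]
```

Since `++` is a new function here rather than an extension of a Base one, these definitions do not change the behaviour of any existing code.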
I’m not sure we want that. Containers aren’t all sequences, so concatenation can work really differently. Multidimensional arrays have special array literal syntax backed by vect/hcat/vcat/hvcat, and that doesn’t concatenate input strings (and non-AbstractArrays) because we really want to write strings as separate elements. We often want to write AbstractArrays as separate elements, too, which requires seemingly superfluous brackets. As for sequences, concatenation may not even be useful enough to warrant implementation: [(1:2);(4:5)] cannot be another inlinable UnitRange, so [(1:2);(3,4)] should not be either for type stability.
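A few array literals illustrating these points. The contiguous pair `1:2` and `3:4` is chosen here for illustration and is not taken from the post.

```julia
v = [(1:2); (4:5)]            # vcat of two ranges with a gap
println(v == [1, 2, 4, 5])    # true
println(v isa UnitRange)      # false: the result is a plain Vector

w = [(1:2); (3:4)]            # contiguous ranges still give a Vector
println(w isa UnitRange)      # false

s = ["ab"; "cd"]              # strings stay separate elements
println(length(s))            # 2

nested = [[1, 2], [3, 4]]     # two elements, each a Vector
flat = [[1, 2]; [3, 4]]       # one Vector with four elements
println(length(nested), " ", length(flat))   # 2 4
```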
We also wouldn’t need to invent a new syntax for strings, we already have string interpolation. "$(a)b$(c)d" is just a more flexible alternative to a * "b" * c * "d", lowering right to string instead of *.
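A quick check of that equivalence, with arbitrary sample values. The last lines inspect the parsed expression: an interpolated literal parses to a `:string` node, which is understood to become a call to `string` during lowering.

```julia
a, c = "x", "y"
interpolated = "$(a)b$(c)d"

println(interpolated == a * "b" * c * "d")        # true
println(interpolated == string(a, "b", c, "d"))   # true

# Interpolation also accepts non-string values, which * between strings does not.
n = 3
println("$(n)b")                                  # 3b

# The literal parses to a :string expression rather than a chain of * calls.
ex = Meta.parse(raw"\"$(a)b$(c)d\"")
println(ex.head)                                  # string
```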