Standardizing syntax?

I fully understand your idea: if all languages were unified under one standard, things would be much easier, since one would only have to learn one programming language. From a logical viewpoint, this is great.

The same would hold for natural languages. We still have different languages, and that is great, since it emphasizes everyone’s diversity and culture (or at least, hopefully, as many cultures as possible).
So in practice, I would also prefer to embrace diversity.

I’d suggest delaying the discussion of a transition to a unified syntax for all programming languages until the physical world settles on either the imperial or the metric system 😉. Do you know the fun of having 6 mm and 1/4" fittings in the same box!?

No, we already have several ISO standards for curly-brace languages, plus, e.g., for Ada, APL, Fortran, and COBOL, so many very different ones.

A.
Even the keyword function is problematic for that, e.g. in Julia (less so in pure functional languages), because it’s also used for non-mathematical functions, i.e. those with side effects (in Julia these are marked by convention with !, so not always).

As I understand it, in math a function is very general but still has to be deterministic: the same input always gives the same output. That holds less in programming languages, where “functions” are often really procedures, and e.g. rand() isn’t deterministic (you could argue it should be named rand!()). If you count the hidden/implicit global state (the seed) as an input, then it can be argued to be a function, but it can no longer be argued to be a “function” when asking for, e.g., a truly (non-pseudo) random number.
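
To make the distinction concrete, here is a small Python sketch (Python only because it is widely known; square, roll, and roll_from are hypothetical names). The first is a function in the mathematical sense, the second depends on hidden global generator state, and the third makes that state an explicit argument:

```python
import random


def square(x: int) -> int:
    # Deterministic: the result depends only on the argument.
    return x * x


def roll() -> int:
    # Reads and updates the hidden global generator state (the seed),
    # so two calls with identical (empty) arguments can differ.
    return random.randint(1, 6)


def roll_from(rng: random.Random) -> int:
    # The state is now an explicit argument: two generators in the same
    # state give the same result, although the call still mutates rng.
    return rng.randint(1, 6)


print(square(4))                                                     # 16
print(roll_from(random.Random(42)) == roll_from(random.Random(42)))  # True
```

Once the seed is an explicit input, the procedure behaves like a function of (state, arguments), which is the “if you allow for the hidden state” argument; a hardware random source has no such state to pass in.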

Also, function and most other keywords are biased toward English. A standard syntax would also be biased against (nearly) syntax-free languages like Lisp/Clojure, and a standard infix syntax against concatenative languages like Forth, or against Smalltalk.

B.
I think it’s more productive to find the best semantics for a language (hopefully the single best one, such as Julia’s); then you need only that one de facto standard.

E.g., the exact syntax for destructors and RAII is less important than what kind of RAII we have, or whether we have such semantics at all:

There are, e.g., 11+ ways to do memory management, and we could standardize on the best one or few. Most people are probably familiar only with the main 3-4 ways, and the best one may be none of the common ways, not even borrow checking (TL;DR: seemingly the best one is “generational references with regions”):

https://verdagon.dev/blog/higher-raii-uses-linear-types

Why haven’t we seen this before?

Existing languages can’t quite do this, and I’ll show why below.
[…]
Rust is unfortunately even less capable than C++ here.

One of the intriguing approaches:
https://verdagon.dev/blog/when-to-use-memory-safe-part-1

Memory tagging will generate a random 4-bit number for every chunk of memory. Whenever we create a pointer to that memory, it will put that 4-bit number into the top unused bits of the pointer. […]

This is particularly good for debugging and testing. If this is enabled for your integration tests, then any invalid access bug has a 94% chance of being caught.
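
The 94% figure is consistent with 1 - 1/16 = 93.75%: with 16 possible 4-bit tags, a stale pointer goes unnoticed only when the chunk’s new tag happens to equal the old one. Below is a toy Python model under that assumption. The tag placement (bits 60-63), retagging on free, and all names are simplifications for illustration, not how any particular CPU or allocator implements it:

```python
import random

TAG_BITS = 4
TAG_SHIFT = 60                      # assumption: tag lives in bits 60-63
ADDR_MASK = (1 << TAG_SHIFT) - 1


class TaggedMemory:
    """Toy model: one random 4-bit tag per chunk, copied into each pointer."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.tags: dict[int, int] = {}

    def _new_tag(self) -> int:
        return self.rng.randrange(1 << TAG_BITS)   # 16 possible tags

    def alloc(self, addr: int) -> int:
        tag = self._new_tag()
        self.tags[addr] = tag
        return (tag << TAG_SHIFT) | addr           # tagged pointer

    def free(self, addr: int) -> None:
        # Retag on free; the new tag may collide with the old one (1 in 16).
        self.tags[addr] = self._new_tag()

    def access_ok(self, ptr: int) -> bool:
        # The check: the pointer's tag must match the chunk's current tag.
        return self.tags.get(ptr & ADDR_MASK) == ptr >> TAG_SHIFT


def use_after_free_detection_rate(trials: int, seed: int = 0) -> float:
    mem = TaggedMemory(seed)
    caught = 0
    for _ in range(trials):
        ptr = mem.alloc(0x1000)
        mem.free(0x1000)
        if not mem.access_ok(ptr):                 # stale pointer rejected
            caught += 1
    return caught / trials


print(use_after_free_detection_rate(100_000))      # close to 15/16 = 0.9375
```

This is also why the quote frames it as a debugging and testing aid: detection is probabilistic, not a guarantee.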

https://verdagon.dev/blog/when-to-use-memory-safe-part-2

Garbage collection is probably the best approach for developer velocity. It completely decouples your goals from the constraints of memory management.
[…]
In garbage collection, we don’t have to satisfy the move-to-move constraint, or borrow-to-borrow constraint. We dont have to worry about matching pointers versus values. There’s just one kind of reference, rather than Rust’s 5 or C++'s 7.
[…]
For example, the borrow checker can turn memory safety problems into privacy problems if one’s not careful.
[…]
Cone and Verona will allow us to explicitly separate GC regions from each other, such that we can create and destroy a temporary short-lived region before its first collection even needs to happen. By using regions wisely, one can probably avoid the vast majority of collections.
[…]
With these advances, we might be able to get GC’s development velocity advantages without the usual performance drawbacks.

https://verdagon.dev/blog/zero-cost-borrowing-regions-overview

Adding the pure keyword eliminates every single generation check in this function, making its memory safety zero-cost.
[…]
For this reason, Vale’s regions are opt-in. In fact, one can write an entire Vale program without knowing about regions.
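
For readers unfamiliar with the term, here is a toy Python model of what a “generation check” is: each heap slot carries a counter that is bumped on free, and each reference remembers the counter value it saw when it was created. It only illustrates the general idea the quotes refer to; the class and method names are hypothetical, and it is not Vale’s implementation. Checks like the one in check() are what the quoted pure keyword is said to eliminate:

```python
from dataclasses import dataclass
from typing import Any, List


class Heap:
    """Toy heap where every slot carries a generation counter."""

    def __init__(self) -> None:
        self.values: List[Any] = []
        self.generations: List[int] = []
        self.free_slots: List[int] = []

    def alloc(self, value: Any) -> "GenRef":
        if self.free_slots:
            index = self.free_slots.pop()          # reuse a freed slot
            self.values[index] = value
        else:
            index = len(self.values)
            self.values.append(value)
            self.generations.append(0)
        return GenRef(self, index, self.generations[index])

    def free(self, ref: "GenRef") -> None:
        ref.check()
        self.generations[ref.index] += 1           # invalidates old references
        self.values[ref.index] = None
        self.free_slots.append(ref.index)


@dataclass
class GenRef:
    heap: Heap
    index: int
    generation: int   # generation the slot had when this reference was made

    def check(self) -> None:
        # The per-dereference generation check.
        if self.heap.generations[self.index] != self.generation:
            raise RuntimeError("use after free detected")

    def get(self) -> Any:
        self.check()
        return self.heap.values[self.index]


heap = Heap()
a = heap.alloc("hello")
heap.free(a)
b = heap.alloc("world")   # reuses a's slot with a bumped generation
print(b.get())            # world
try:
    a.get()
except RuntimeError as err:
    print(err)            # use after free detected
```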

https://verdagon.dev/blog/on-removing-let-let-mut

We’re going to commit a cardinal sin today and talk about syntax design!

Aren’t these standards specific to those languages? They were promulgated as such, e.g. an ISO C standard as opposed to an ISO standard for high-level programming languages in general.
I was talking about a common syntax.

In any case, it seems Python’s syntax is becoming one. Mojo and Bend support it. It is quite possible that promotion and adoption strategies will push some new languages to go the same way.

I am curious: why do language designers make up their own? Is it because they feel the need to be unique?

Different languages are different. They have different behaviors, or else they wouldn’t be different languages. Syntax is the most superficial part of any language. Most follow established math notations for arithmetic and orders of operation, but even math itself doesn’t have a standard or even fully-consistent conventions. So you’ve gotta pick what’s going to work best for your audience.

For example, is ^ exponentiation or is it bitwise xor? Is ! logical negation or is it a factorial or is it both? Is = an equality test or is it an assignment or is it defining a symbolic equation? Do you use && or & or and for logical ands?
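
For comparison, here are Python’s answers to those questions, as a small runnable snippet (just one language’s choices, with arbitrary variable names):

```python
import math

xor = 2 ^ 3                  # ^ is bitwise XOR: 0b10 ^ 0b11 == 0b01
power = 2 ** 3               # exponentiation is spelled **
negated = not True           # logical negation is the keyword `not`
fact = math.factorial(5)     # no postfix !, factorial is a library call
same = (xor == 1)            # == tests equality; a bare = is assignment
both = (True and False, 6 & 3)   # `and` is logical, & is bitwise

print(xor, power, negated, fact, same, both)   # 1 8 False 120 True (False, 2)
```

Someone arriving from a language where ^ means exponentiation would read the first line very differently.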

Surely you have preferences and biases there based on your own experience. And that’s just basic maths! Now add lots of other language-design matters to the mix: types and classes and interfaces and methods and functions and on and on.

I’ll note that in the context of Julia, many proposed syntaxes have been rejected simply because they’d be too confusing for folks coming from other languages. As just one concrete example, it’s not obvious how a ** operator should behave with respect to order of operations, and people coming from different backgrounds would rightly have different expectations! It’s for exactly this reason that the pull request suggesting its inclusion in Julia has stalled: julia#39089.
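
Python’s own ** illustrates the kind of choices involved: both how it interacts with unary minus and which way it associates are design decisions that someone from another background might expect to go the other way:

```python
neg_square = -2 ** 2          # parsed as -(2 ** 2): ** binds tighter than unary minus
tower = 2 ** 3 ** 2           # ** is right-associative: 2 ** (3 ** 2)
left_grouped = (2 ** 3) ** 2  # explicit left grouping gives a different answer

print(neg_square, tower, left_grouped)   # -4 512 64
```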

Much inspiration of Python’s syntax is found in how folks typically wrote out “pseudocode” in CS textbooks. Much inspiration for Julia’s syntax is found in how folks typically wrote out algorithms in maths textbooks.

Thanks. That’s helpful.

You can only have semantic meanings for things that can be expressed in the syntax. If your language has new semantic ideas (let me just imagine some kind of self-iterating sequence) then you’re going to need a new kind of syntax to express that new idea.

Even for things where the semantics are the same in two languages, having a unified syntax isn’t necessarily ideal, because there could be different meanings for related ideas in the two languages…

In Julia, a:b means a sequence starting at a and going to b; in Pascal, a : b means a variable a which has type b. In Julia, that’s what a::b means.
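
Even within one language the same token can carry several meanings. In Python, for instance, : is used Pascal-style for annotations and also for slices, and Python’s nearest equivalent of Julia’s 1:3 has a different (exclusive) endpoint. A short sketch with arbitrary names:

```python
from typing import List

count: int = 3                  # `name: type` is an annotation, Pascal-style order
xs: List[int] = [10, 20, 30, 40]
middle = xs[1:3]                # inside [], 1:3 is a slice: indices 1 and 2
julia_like = list(range(1, 4))  # Julia's 1:3 includes 3; Python's range excludes the end

print(middle, julia_like)       # [20, 30] [1, 2, 3]
```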

It’s hopeless to ask for a “unified syntax” across different languages because the languages themselves have different semantic possibilities.

Humans aren’t alone in having not converged to a single language/dialect (from wildorca.org):

Orcas communicate through pulsed calls, and whistles and these form a unique dialect for a family. They express their identity through their cultural habits, and their prey choices are central to this, and so it shapes their language.

Orca language is learned and inherited, and just like human babies, orcas can hear their mother in the womb, and so they’re learning their family’s language before they’re born!

The Southern Resident killer whales’ language is so sophisticated that it contains three distinct dialects, one for each of the pods—J, K, and L—with vocalizations that are unique to each pod. However, some calls are common across all three pods, facilitating communication across the community, which allows them to socialize, bond, and mate with other pod members, and most likely for many other cultural and social traditions that we are not even aware of!

In the Salish Sea, there are two different types of killer whales, each with its own culture. The Southern Residents eat salmon, and this shapes their culture and language. Bigg’s killer whales, aka transients, eat marine mammals and this requires different hunting techniques and so a different language. There’s no evidence that these groups can communicate between each other.

I think programmers are a bit like the Southern Resident orcas; there is common syntax across programming languages which allow us to interact with members of other programming pods, but each pod has its own distinct dialect.

For a new language you must change the syntax (e.g. from && to and; trivia: C++ allows both, which makes this famously complex language less consistent) and/or the semantics (e.g. add multiple dispatch). If neither, then we keep the status quo with no new languages, which many may be OK with.

Note that Julia doesn’t add much new syntax; it mostly stays with MATLAB syntax. But almost every language makes some changes, e.g. the very math-like polynomial 2x^2 + 1 is valid Julia (no other language I know has this capability, which Unitful.jl also exploits in interesting ways). Julia doesn’t force you to write 2 * x^2 like most languages, or even 2 * x**2 like Python. That just shows Python isn’t the best, though that is always in the eye of the beholder:

https://wiki.c2.com/?IdealProgrammingLanguage

  • It is easy to read, unlike perl. Python CAN be a good contender.
  • It is top down, rather than bottom up; Python is NOT a contender.
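
The juxtaposition point is easy to check from the Python side: Python rejects 2x**2 + 1 at parse time, so the coefficient needs an explicit * (a small sketch; the variable names are arbitrary):

```python
x = 3
value = 2 * x**2 + 1            # Python needs the explicit *

try:
    compile("2x**2 + 1", "<poly>", "eval")   # Julia-style juxtaposition
    juxtaposition_ok = True
except SyntaxError:
    juxtaposition_ok = False

print(value, juxtaposition_ok)  # 19 False
```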

FYI: you can (or could, pre-1.0) have Python syntax in Julia:

While outdated, it could be resurrected, though I’m not sure it should [be used]. There is still a current one for the Lisp/Clojure (non)syntax that some swear by (I don’t like it, though I do like Clojure semantics):

I only know of one Bend, and it is very unlike Python, let alone adhering to a Pythonic standard. It doesn’t even have loop structures. Mojo, as a Python superset, adds a parallel syntax for performant code, which on its own seems to flout the idea of a standard.

Even ignoring the various reasons that a uniform syntax is a non-starter, if one had to be invented, it shouldn’t be based on Python.

Python has significant indentation. Indentation bugs are real: they happen when you think you’re inside one if statement (for example) and it turns out you’re in a different one. In a language that doesn’t use semantic indentation, running the formatter will put everything at its appropriate level, and now you can use the indentation to spot the bug.

Except in Python. In Python, the indentation is the bug.

I do like writing a page of Python to do something simple. But I hate writing a bunch of Python to do something complex. This problem is one of the reasons for that.
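
A minimal Python illustration of that failure mode (hypothetical functions): the two versions differ only in the indentation of return, both are valid Python, and no formatter can tell which one was intended:

```python
def total_positive(values):
    total = 0
    for v in values:
        if v > 0:
            total += v
    return total              # after the loop: sums every positive value


def total_positive_misindented(values):
    total = 0
    for v in values:
        if v > 0:
            total += v
        return total          # one level too deep: returns during the first iteration


print(total_positive([3, -1, 4]))              # 7
print(total_positive_misindented([3, -1, 4]))  # 3
```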

After thinking about this for a while, I realized that I actually do not want to see standardized syntax. I would actually like to see the opposite: customizable, individualized syntax. I would like to be able to switch between def, fn, func, and function as easily as I switch from light mode to dark mode. Additionally, I think this will happen.

While I’m pretty pessimistic about the role of LLMs in software development, I do think they would be pretty good at helping to transform one coding “style” to another as well as dealing with the variable renaming that would need to ensue.

At the end of the day, this may actually achieve the desired goal. You would be able to switch between different language backends while using a common frontend syntax for all of them. With standardization, the resulting syntax is chosen by some committee; with customization, it is chosen according to your own preferences.
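
The keyword-renaming part of that idea can already be sketched deterministically at the token level. Below is a minimal Python sketch assuming a hypothetical per-user alias table that maps a personal spelling onto Python’s def. It is only a sketch: it would also rewrite an ordinary variable that happens to be named fn, so a real tool would need the parser, not just the tokenizer.

```python
import io
import tokenize

# Hypothetical per-user preferences: personal spelling -> canonical keyword.
ALIASES = {"fn": "def", "func": "def", "function": "def"}


def to_canonical(source: str, aliases: dict = ALIASES) -> str:
    """Rewrite aliased keywords token by token; strings and comments are untouched."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        if tok.type == tokenize.NAME and text in aliases:
            text = aliases[text]
        tokens.append((tok.type, text))
    # untokenize() with (type, string) pairs regenerates valid, if oddly spaced, code.
    return tokenize.untokenize(tokens)


personal = "fn add(a, b):\n    return a + b\n"
canonical = to_canonical(personal)
namespace = {}
exec(canonical, namespace)
print(namespace["add"](2, 3))   # 5
```

The reverse mapping (canonical to personal) would work the same way, which is what makes the light/dark-mode analogy plausible for keywords at least.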

In some sense this has happened already. If you strip code of its meaning, all that remains are tokens and some structuring elements. There are some standards for this, e.g. S-expressions, known from Lisp, or M-expressions.

I think if one removes the semantics from the syntax (which needs to happen if we want to standardize the syntax on its own), then naturally you end up with some notation for a nested structure of symbols. Some nesting is necessary imo, because otherwise I don’t see how to get control-flow constructs such as if or function, which “wrap” some code. So the question is how we delineate blocks, i.e. group some symbols together. Most languages use round parentheses and commas for what will be interpreted as function calls/applications, and some different way to group statements: Python chose (poorly) indentation and colons, many C-like languages chose curly braces and semicolons, Julia has end and multiple ways to start a block, and S-expressions just reuse round parentheses for everything (which many people complain about, but it’s actually a very nice feature given good editor support).

To me, the best universal syntax I know is actually S-expressions, because they focus only on the important stuff: giving structure to symbols. What these symbols mean in the end is a priori irrelevant: it could be code, it could be data of some sort, but then what really is the difference if we only consider standardizing syntax?
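
To illustrate how little such a syntax needs to commit to, here is a minimal S-expression reader in Python (a sketch with no strings, comments, or quoting; the function names are hypothetical). It turns text into nested lists of uninterpreted atoms, and the same reader handles something that looks like code and something that looks like data:

```python
from typing import List, Union

SExpr = Union[str, List["SExpr"]]


def split_tokens(text: str) -> List[str]:
    # Parentheses are the only structure; everything else is an atom.
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens: List[str]) -> SExpr:
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items: List[SExpr] = []
        while tokens and tokens[0] != ")":
            items.append(read(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)                    # consume the closing ')'
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return token                         # atoms stay uninterpreted strings


def parse(text: str) -> SExpr:
    tokens = split_tokens(text)
    expr = read(tokens)
    if tokens:
        raise SyntaxError("unexpected trailing tokens")
    return expr


code = parse("(if (> x 0) x (- x))")      # reads like a program
data = parse("(point (x 1) (y 2))")       # reads like a record
print(code)   # ['if', ['>', 'x', '0'], 'x', ['-', 'x']]
print(data)   # ['point', ['x', '1'], ['y', '2']]
```

Deciding that if is a conditional or that point is a record is left entirely to whatever consumes the tree, which is the separation of syntax from semantics described above.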
