Syntax Surprises

I think this is very clear and desirable. It’s in fact the main thing I like about this syntax.


For someone expecting the rules of operator precedence to hold, it is the opposite of desirable.


It’s the same as the difference between (2x + 4)/3a and (2x + 4)/3·a, which is what makes this work so well. The behaviour of juxtaposition below the division sign is the main attraction of this syntax for me.
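To make the precedence difference concrete, here is a minimal sketch. The values of x and a are arbitrary and chosen only for illustration.

```julia
x, a = 1, 2

# Juxtaposition binds tighter than /, so this reads as (2x + 4) / (3 * a).
tight = (2x + 4)/3a

# An explicit * has the same precedence as / and associates left,
# so this reads as ((2x + 4) / 3) * a.
loose = (2x + 4)/3*a

@assert tight == 1.0   # 6 / 6
@assert loose == 4.0   # (6 / 3) * 2
```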


But the point of disallowing 2e9 is to eliminate syntax ambiguity with numeric literal juxtaposition. Giving it a different meaning doesn’t help at all; it’s just breakage with no benefit. The point would be that the integer must be written as 2×10^9 while the float must be written as 2.0e9, leaving 2e9 to mean only 2*e9, thereby eliminating the exceptions to the literal juxtaposition syntax.


If we were starting from a clean slate, I’d actually love having 2×10^9 as the literal syntax for integers and 2.0×10^9 as the literal syntax for floats. It’s much cleaner and matches textual notation. People are very accustomed and attached to the existing e syntax, so that’s not going to go away, but it’s a bad syntax imo.


Yes, I think I already addressed that. 2e9 is too valuable in its current meaning, and giving up the possibility to write 2.5x is also bad.

I find the ambiguity acceptable, since 2e9 is so established. The case insensitivity is unfortunate, though.

What about a new infix letter for integers? For example 1s3 == 1000?

Candidates for the infix could be s (as in signed integer) or i (as in integer, although it’s getting close enough to im that I’m slightly uncomfortable). Signs on the exponent would be unnecessary (and probably disallowed) since only positive exponents are acceptable. We could also offer 0x1cs2 == 0x1c00 and 0b101s2 == 0b10100, but I think those might be going too far and there’s slight potential for confusion with the juxtaposition of hex values a-f with the exponent delimiter s.

EDIT: this would be breaking as 1s8 is currently evaluated as 1*s8. So I think this isn’t on the table.
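For reference, a small sketch of how the parser reads these spellings today, using Meta.parse so that nothing has to be defined. The identifiers s8 and x are placeholders and are never evaluated.

```julia
# Exponent letters e, E and f turn the token into a float literal.
float_lower = Meta.parse("2e9")    # Float64 literal
float_upper = Meta.parse("2E9")    # same value; the exponent marker is case-insensitive
float32_lit = Meta.parse("2f1")    # Float32 literal, 2 * 10^1

# Any other trailing identifier is juxtaposition, i.e. multiplication.
juxta_s = Meta.parse("1s8")        # the expression 1 * s8
juxta_x = Meta.parse("2.5x")       # the expression 2.5 * x

@assert float_lower === 2.0e9
@assert float_upper === 2.0e9
@assert float32_lit === 20.0f0
@assert juxta_s == :(1 * s8)
@assert juxta_x == :(2.5 * x)
```

This is why a new 1s3 integer literal would be breaking: 1s8 already has a meaning as a product.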

Regarding @StefanKarpinski’s suggestion of a new literal syntax, isn’t it already rendered obsolete by constant folding?

That is, you can write 2.0*10.0^9 and this will be turned into the constant 2e9 by the compiler already.

A downside is that it won’t produce the desired number in the face of integer overflow, if you accidentally use an integer exponent. For example 2.0*10^100 is 0.0 where you probably intended 2.0*10.0^100.
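A minimal sketch of that pitfall, assuming the default machine Int (the variable names are only for illustration):

```julia
# Int arithmetic wraps silently: 10^100 is a multiple of 2^64, so it wraps to 0.
wrapped = 10^100
zeroed = 2.0 * 10^100        # 2.0 * 0

# With a float base the power is computed in floating point.
intended = 2.0 * 10.0^100

# For exponents that fit, the same spelling gives an exact integer.
two_billion = 2 * 10^9

@assert wrapped == 0
@assert zeroed == 0.0
@assert intended ≈ 2e100
@assert two_billion == 2_000_000_000
```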

An upside is you get to pick the exponent. Another upside is it works equally well for integers.

As you point out, this makes the clash with juxtaposition worse. What you want to avoid is numeric syntaxes that are just some letters after a number. The 0x prefix style avoids this (and zero is a largely unuseful literal coefficient).


Complex-number literals with zero imaginary parts, e.g. 3 + 0im, are pretty useful. (Complex literals are also a good general example of the utility of juxtaposition-multiplication, whereas other languages had to build complex-number syntax into their parser, or be stuck with an awkward syntax like 3 + 0*I.)
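A short sketch of what that looks like in practice. Nothing here is special syntax: im is an ordinary constant and 0im is just 0 * im.

```julia
z = 3 + 0im

@assert im isa Complex{Bool}      # im is a plain constant, not a keyword
@assert z isa Complex{Int}        # integer components are preserved
@assert 2im == 2 * im             # juxtaposition is multiplication
@assert Meta.parse("0im") == :(0 * im)

# Ordinary arithmetic then follows from the generic methods.
product = (1 + 2im) * (3 - 1im)
@assert product == 5 + 5im
```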


I think we all agree that 2e9 is far more frequently useful for expressing 2 billion than for expressing twice e9. And at least to me, it’s a shame to lose 2.5x, which again I write far more often than 2.5*f1.

But consider also the risk of a mistake and the prospect of not catching it before deployment. It’s reasonably likely I’ll someday absent-mindedly make a variable named e or f1 and it’ll be replaced with a numeric literal; it’ll look right during a code review; depending on relative magnitudes and the nature of the test data it might look right; and no automated checker will tell me I’m wrong. And it’s conceivable the folks writing the tests could make the same mistake—no automated check will tell them they’re wrong either, they might have adopted the same coding style as me, and their syntax highlighting colors might be the same as mine. Such errors tend to be correlated.
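A contrived sketch of that failure mode. The function name scaled is made up for illustration; the point is only that the code runs without any warning.

```julia
# Intended: multiply the argument by 2.5.
function scaled(f1)
    # Actual: 2.5f1 is the Float32 literal 25.0f0, so the argument f1 is never used.
    return 2.5f1
end

# What the author meant to write.
scaled_explicit(f1) = 2.5 * f1

@assert scaled(3) === 25.0f0
@assert scaled(1000) === 25.0f0       # same answer for any input
@assert scaled_explicit(3) == 7.5
```

With an input of 10 the two versions even agree numerically (25.0f0 == 25.0), which is the sort of test data that would let the mistake slip through.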

For the stars to misalign like that on any given project is a fairly low-probability event, but it can also be high-impact, so the risk is meaningful. One could set a policy at the organizational level that tests should always use different variable names expressly to avoid this eventuality, but imagine the trouble you’d run into trying to set such a policy.

Giving up on 2e9 for two billion is paying a price, but is that cost really so bad as to expose yourself to such risk when you could just type 2.0e9? Already 2e9 doesn’t make an integer literal in other languages, so insisting it be repurposed to another meaning that people already don’t anticipate without resolving the ambiguity seems especially not worthwhile.

The situation is not as bad for these:

julia> let b1=1, o2=2, x3=3, x4p0=4;
           0*b1 => 0b1, 0*o2 => 0o2, 0*x3 => 0x3, 0*x4p0 => 0x4p0
       end
(0 => 0x01, 0 => 0x02, 0 => 0x03, 0 => 4.0)

because although there is also ambiguity, at least common sense will suppress it.

I find it hard to get worked up over a hypothetical typo that doesn’t seem to have been a serious problem for anyone in 10 years of Julia coding. Doesn’t seem worth it to (a) use a different floating-point syntax than every other popular language by disallowing 2e9 or (b) remove the juxtaposition syntax that we’ve had since the beginning and lots of Julia coders have found elegant and useful (and which makes it possible to define something as basic as complex numbers in pure Julia with no special compiler/parser support).

There are lots of ways to accidentally introduce bugs into your code. There are lots of ways to write obfuscated code. This is true in any language.

I thought the whole point of @StefanKarpinski’s “Julia is not at that stage” PSA was that we aren’t going to have pointless churn from spelling changes any more (something we got plenty sick of prior to Julia 1.0), and that the only major breakage that would be contemplated for Julia 2.0 would be changes that enable major new functionality. And, more to the point (since Julia 2.0 is purely hypothetical at this point), we no longer have to have endless, pointless debates over surface syntax — in which everyone has an opinion and there are no right or wrong answers, so nothing can ever be resolved.


It’s also possible that for the past ten years, most Julia users have been satisfied with lab experiments and desktop calculations, but for the next ten years we should hope to make further inroads into production software with higher quality standards—which would require the sign-off of project managers and other risk-minded folk.

However:

By leaning on a linter, the language proper could remain exactly as-is yet achieve greater correctness in practice. Maybe the right thing to do is to focus this concern toward a Julia standard linter, possibly integrated into the package manager, especially for anybody with correctness concerns.

For dynamic languages especially, one can consider the linter to be practically part of the language. For example, JavaScript has linter standards such as ESLint with StandardJS (This is a good intro video).

The use of identifiers such as e, e1, e2, etc., E, E1, E2, etc., and f, f1, f2, etc. could simply be added to a Julia-standard linter ruleset as hazardous identifiers to throw warnings about.

As a bonus, exerting influence over the Julia ecosystem through linter defaults is also a pretty clean way to nudge people into using return for longform function syntax, among other stylistic preferences. :wink:

Also found this interesting:

The term linter comes from a tool originally called “lint” that analyzed C source code. The computer scientist Stephen C. Johnson developed this utility in 1978 when he worked at Bell Labs.


Well I don’t think (2.5)y is an improvement over 2.5*y as it’s both longer and less explicit. So if the parens were disallowed in that case I’d be cool with it.

On the whole juxtaposition syntax is a little icky because it does introduce some syntax surprises. But I’d also forgotten about complex numeric “literals” and unitful quantities. These are cases where juxtaposition really is great.

I’m not arguing against this form (as ugly as it is). People should always be able to use parentheses to group subexpressions, and this is just a degenerate example of that.

But in contrast, a keyword argument is about how the function is called, not just what is passed to the function: In some sense, the = keyword syntax is “part of” the function call brackets and should not be separated from them in some arbitrary deeply nested parens.

I agree with this assessment. Case insensitivity is unfortunate and could possibly be addressed if we had a way to migrate syntax variants.

Yes we should have good linting and these things should be easy to detect because they’re just syntax. JuliaSyntax.jl is very much being developed with tooling in mind.


I wouldn’t have a linter flag variables such as f or f1 outright, but it would be good to flag instances where the symbol exists and a numeric juxtaposition would otherwise be shadowed by a float literal (4f+1 or 4f1, for example).
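A rough sketch of such a check, working on raw source text with regular expressions. The function name shadowed_juxtapositions is hypothetical and not part of any existing linter; a real implementation would presumably walk a proper syntax tree (for example one produced by JuliaSyntax.jl) rather than pattern-match text, and this version ignores strings, comments and scoping.

```julia
# Hypothetical helper: report float literals such as 4f1 or 2e9 whose
# letter-plus-digits tail is also the name of a variable assigned in `src`.
function shadowed_juxtapositions(src::AbstractString)
    # Names that appear on the left of a plain `=` (not `==` or `=>`).
    assigned = Set(m.captures[1] for m in eachmatch(r"\b([A-Za-z_]\w*)\s*=(?![=>])", src))

    hits = String[]
    # Decimal literal followed by e/E/f and unsigned digits, not preceded
    # by a word character or a dot (so 0x4f1 and x4f1 are skipped).
    for m in eachmatch(r"(?<![\w.])\d+(?:\.\d+)?([eEf]\d+)\b", src)
        m.captures[1] in assigned && push!(hits, m.match)
    end
    return hits
end

example = "f1 = 3\ny = 4f1 + 2.5e3\n"
flagged = shadowed_juxtapositions(example)
@assert flagged == ["4f1"]    # f1 is assigned above; e3 is not, so 2.5e3 passes
```

Flagging only literals that collide with an existing name keeps the check quiet for everyday uses of 2e9.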


Ah, now this is the best of all possible worlds!

It’d be awesome to have a linter built into the package manager, so e.g. I could generate a lint report prior to an upcoming package release by writing:

]
lint MyPackage
test MyPackage

and maybe get a linter in the REPL. Then whenever someone is new to Julia and starts writing things like myVar = 5, they can immediately receive feedback that it’s unidiomatic to use camelCase and be chastised into snake_case, or when they write (2x + 4)/3*a they can receive feedback that * after / should have whitespace.

I like the idea of a linter giving hints and generating reports, but not [on its own] forcing adherence. This causes convergence on standard practice by nagging (the same way my car pretty quickly gets me to buckle my seatbelt by beeping), but you’re still free to break guidelines if it proves useful—you just have to be willing to put up with the nagging and the extra sign-offs, and other people (such as peers and managers) get better visibility into which guidelines you’re violating and can compel you into compliance if there’s no good reason to violate.

It could also be interesting to allow for linter error severity to be reduced, if the line before an error is used for a comment on why breaking the linter rule is justified there.

I see what you’re saying now…

Function Abduction

julia> f(args...; kwargs...) = (; args, kwargs)
f (generic function with 1 method)

julia> f(((((a=1)))), ((((b=2)))), ((((c=3))));) # 🤔🦆
(args = (), kwargs = Base.Pairs(:a => 1, :b => 2, :c => 3))

julia> f(((((a=1)))), ((((b=2))),), ((((c=3))));) # 🦆🦆
(args = ((b = 2,),), kwargs = Base.Pairs(:a => 1, :c => 3))

It does feel like this ought to be a syntax error.

(also, Base.Pairs are kind of a WAT of their own, although it’s not a syntax surprise:)

julia> f(a=1)
(args = (), kwargs = Base.Pairs(:a => 1))

julia> Base.Pairs(:a => 1)
ERROR: MethodError: no method matching Base.Pairs(::Pair{Symbol, Int64})

if you get rid of all the superfluous parens it makes a lot more sense

julia> f(a=1, ((b=2),), c=3)
(args = ((b = 2,),), kwargs = Base.Pairs(:a => 1, :c => 3))

Here it becomes clear that the source of confusion is the fact that kwargs are allowed in non-final position (unlike Python :snake:). I do agree with you that any linter should be able to catch ‘obviously’ redundant parens, i.e. those which look like (( .. ))
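The non-final-position rule can be seen in isolation with a simpler call, reusing the f defined earlier in the thread and no extra parentheses at all:

```julia
f(args...; kwargs...) = (; args, kwargs)

# Keyword arguments may appear before, between or after positional ones.
r = f(a=1, 2, c=3)

kw = r.kwargs
@assert r.args == (2,)                   # only the bare 2 is positional
@assert (; kw...) == (a = 1, c = 3)      # a and c were collected as keywords
```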

Also I do not think the example of Base.Pairs(:a => 1) erroring is particularly problematic. Just consider

julia> typeof(:a => 1)
Pair{Symbol, Int64}

julia> Pair(:a => 1)
ERROR: MethodError: no method matching Pair(::Pair{Symbol, Int64})

It is certainly not always the case that a struct X will have a constructor X(::X).


One more to go:

f(a=1, (b=2,), c=3)

but there are actually four syntax surprises contained in that example:

  1. It’s possible to have superfluous parens around keyword args.
  2. It’s possible to have superfluous parens around NamedTuple fieldname-value pairs.
  3. Keyword args are allowed in non-final position,
  4. even if they are succeeded by a semicolon!

Would you expect this?

julia> f((a=1, b=2), (c=3); d=4)
(args = ((a = 1, b = 2),), kwargs = Base.Pairs(:c => 3, :d => 4))

That’s not at issue here. At issue is a break from Julia’s convention that, where reasonable, an object’s show method should print working code which illustrates how to construct the object.

Obviously Julia doesn’t print working code for vectors, matrices, and functions, but it’s often unreasonable to try, and at least for those it’s pretty obvious that what has been printed isn’t executable code.

The show method for Base.Pairs here has two WATs:

  1. it presents text that very much resembles a call to its constructor, but it’s not, and
  2. in this context it promotes the idea that Pairs of key => value is a preferred way to package and send keyword arguments around, when in fact that causes type-instability (see for example, here).

When feasible, Julia tends to display objects in an input-compatible way, like in

julia> (rand(1,1),)
([0.5278893349360236;;],)

Note the trailing ;; to indicate that this is a matrix and to allow it to be pasted as valid code.

So when I see something print as Base.Pairs(:a => 1), I suspect I’ll be able to type that into the REPL and get a valid object. It’s certainly not required, but as I said Julia tends towards this where possible. I’d have stronger feelings about this particular instance if Base.Pairs wasn’t a construct used exclusively by the internals of kwarg functions.
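One way to probe that expectation is a small round-trip check. The helper name roundtrips is made up for this sketch, and it evaluates the printed text in Main, so it is only suitable for trusted values.

```julia
# Hypothetical helper: does the printed form parse and evaluate back to an equal value?
roundtrips(x) = isequal(Core.eval(Main, Meta.parse(repr(x))), x)

@assert roundtrips((a = 1, b = "two"))   # NamedTuple
@assert roundtrips(:a => 1)              # Pair
@assert roundtrips([1 2; 3 4])           # Matrix

# The kwargs container from the examples above, captured via an anonymous function.
kw = ((; kwargs...) -> kwargs)(a = 1)

# Not asserted: the thread reports a MethodError when the printed form is
# evaluated, and the printed form may differ between Julia versions.
kw_roundtrips = try
    roundtrips(kw)
catch
    false
end
```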
