It may be a radical proposal, but I believe it is high time to move closer to mathematical notation and, once and for all (and for the sake of later generations), distinguish between equality/definition and assignment, and between mutable and immutable states. Implicit mutation is the “dangling pointer” of modern programming languages.
Computational code has virtually become a ubiquitous part of our applied mathematics literature; therefore, it is of utmost importance to have a clear, explicit, and less error-prone syntax and semantics which, apart from providing a ground for compiler optimisations, makes the code easier to read and reason about for the human audience – not just the machine.
It is likely that some parts of my proposal are still ill-conceived (or rest on my own misunderstandings), or that the proposed syntax is problematic, but I hope to convey my idea as clearly as possible.
Programs must be written for people to read, and only incidentally for machines to execute.
-- Abelson and Sussman, Structure and Interpretation of Computer Programs
The Proposal
1) Constant vs Variable
Equality, =, must have the same meaning as in mathematics (or pure functional languages). It produces a constant binding of the left-hand side to the value of the right-hand side.
For example,
a = 1
or
a::Integer = 1 # explicit type info
or
const a::Integer = 1
means that a is a constant binding in its own scope, and cannot be re-assigned to any other value; so that, after the declaration above,
a = 2
a = 3.4
a += 1
will produce an error (or a deprecation warning) message.
In other words, Constantness is the default (as is Immutability [see below]). Note that Constants are merely Immutable bindings.
Variables must then be declared explicitly, with a syntax similar to R (or as in Haskell monads), or with the keyword var, as
v <- 1 # no type info => type as well as value can be modified later
or
var w::Integer <- 2 # declares an Integer Variable and assigns value 2 to it
w::Integer <- 2 # shorthand version of the previous statement
so that w denotes an Integer Variable which can be deliberately re-assigned to other Integers.
Note that when the type is given explicitly, as for w, modifying the type later is not allowed (type-stability).
This implies that the following code will run without errors/warnings:
v <- 2 # (re-)assignment to Integer (value modified)
v <- 2.0 # (re-)assignment to Float64 (type modified)
v += 1 # addition assignment
v *= 1 # multiplication assignment
w <- 0 # (re-)assignment to Integer (value modified)
while this will produce an error (or warning):
w <- 1.0 # (re)-assignment to Float64 (type cannot be modified)
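For comparison, current Julia already has something close to the declared-type Variable w: a type annotation on a local variable fixes its type for the rest of the scope. The difference from the proposal is that today's Julia silently converts an assigned value when it can, and only throws when it cannot. The sketch below uses today's syntax, not the proposed one, and the function names typed_local and typed_local_fails are made up for illustration.

```julia
# Current Julia (not the proposed syntax): a local declared as `w::Int`
# keeps that type for the rest of its scope.
function typed_local()
    w::Int = 2
    w = 0      # same type: plain re-assignment
    w = 1.0    # Float64 with an exact Int value: converted to Int
    return w
end

# 1.5 has no exact Int representation, so the conversion throws.
function typed_local_fails()
    w::Int = 2
    w = 1.5
    return w
end
```

Under the proposal, the assignment of 1.0 would already be rejected (or warned about) instead of being converted.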
Furthermore, for loops must be written as
for i <- 1:10
    println(i)
    k::Float64 <- i/2
end
for j in 1:10
    println(j)
end
[2i for i <- 1:10]
so that for i = 1:10 will be forbidden, because i should be a Variable counter for the loop, not a Constant.
2) Mutable vs immutable
Currently struct is equivalent to immutable struct, and the mutable keyword explicitly declares a mutable state.
I suggest mutability be denoted explicitly everywhere (perhaps via a simpler keyword like mut or ! [see below]); for instance, in the following,
p = [2i for i <- 1:5] # immutable Array (can only be initialized)
or
p::Array{Integer} = collect(1:5) # immutable Array of Integers (can only be initialized)
or
# extended initialization
p::Array{Int32}(5) = begin
    x = 2; y = 3;
    pTMP = Mutable Array{Int32}(5) # temp. mutable Array
    for i <- 1:endof(pTMP)
        pTMP[i] = i * x + y
    end
    pTMP
end
p will be a constant binding to an Immutable Array that is initialized once in its scope, after which no modification to its contents is allowed.
Note that in the “extended initialization” case, a local temporary Mutable Array is made and its final contents are used for initializing p.
Furthermore,
v = Mutable Vector{Integer} = [1, -2] # mutable Vector
v = Mut Vector{Integer} = [1, -2] # shorthand notation for the previous statement
v = !Vector{Integer} = [1, -2] # shorter notation for the previous statement
q = Mut Array{Integer} = collect(1:5) # mutable Array
where q (v) is a constant binding to a Mutable Array (Vector) with modifiable/variable contents. Note that q and v are Constants (see above), hence, = should be used for their initialization. Yet, they contain elements which are Variables, so that the following are allowed:
q[1:2] <- [-1, -7]
q[3] += 2
q[:] <- [-1:-1:-5] # re-assigning the contents of q
q .<- [-1:-1:-5] # shorthand notation for the previous statement
v[1] *= 4
v[:] *= 3 # multiply all elements by 3 and re-assign them
v .*= 3 # shorthand notation for the previous statement
After the declarations for p, q, and v above, any of the following statements will produce an error/warning:
p = [1:3] # p is a Constant; no re-declaration
p[2] = 3 # p is Immutable; no re-declaration of contents
p[1] += 1 # p is Immutable; no re-assignment of contents
v <- [-1, 2] # v is a Constant; no re-assignment
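Part of this distinction can already be seen in current Julia, where const fixes a global binding but not the contents of the object it is bound to. The following is a rough sketch in today's syntax (not the proposed one), meant to be run as a top-level script; the names q_demo, v_demo, q_id, and same_object are only illustrative.

```julia
# Current Julia: `const` fixes the binding, not the contents.
const q_demo = collect(1:5)
q_id = objectid(q_demo)      # identity of the array object

q_demo[1:2] = [-1, -7]       # mutate two elements in place
q_demo[3] += 2               # update a single element
q_demo .= -1:-1:-5           # overwrite all contents in place

v_demo = [1, -2]
v_demo .*= 3                 # in-place broadcasted multiplication

# The binding still refers to the very same array object.
same_object = objectid(q_demo) == q_id
```

What today's Julia lacks, and what the proposal asks for, is a visual difference between the initializing = and the element mutations that follow it.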
Moreover, Mutable user-defined types, or structs, can be declared explicitly as
mutable struct T1
    c0
    d0
end
# shorter keyword
mut struct T1
    c0
    d0
end
# much shorter notation
!struct T2
    c0
    d0
end
all of which are equivalent, while
struct T3
c0
d0
end
declares an Immutable struct.
Thus, Immutability is the default (as is Constantness [see above]).
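For reference, this is how the struct / mutable struct split behaves in current Julia. The sketch uses today's syntax only; the type names FrozenPair and OpenPair and the variable names are hypothetical, and the fields reuse c0 and d0 from the declarations above.

```julia
struct FrozenPair            # immutable by default
    c0::Int
    d0::Int
end

mutable struct OpenPair      # mutability must be requested explicitly
    c0::Int
    d0::Int
end

open_pair = OpenPair(1, 2)
open_pair.c0 = 10            # allowed: the struct is mutable

frozen_pair = FrozenPair(1, 2)
# Field assignment on an immutable struct throws at run time.
frozen_rejected = try
    frozen_pair.c0 = 10
    false
catch
    true
end
```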
3) Function arguments
Variable arguments of a function can be declared explicitly as follows:
function foo!(a::Int32, var x::Float64)
# ...
end
where a will be a Constant and cannot be re-assigned, so that a = 1 or a <- 1 in the function body will produce errors/warnings. However, x is a Variable and can be re-assigned with x <- 1.0 in the body (but with type-stability).
Mutable arguments of a function can be declared as follows:
function foo!(a::T1, x::Mut Array{Int32}, y::Mut T1)
# ...
end
# shorter notation
function bar!(a::T1, x::!Array{Int32}, y::!T2)
# ...
end
where a will be a constant binding to an Immutable struct of user-defined type T1 above; i.e., contents of T1 cannot be modified in the function body, and a cannot be re-assigned to any other value.
However, x and y will be constant bindings to Mutable objects. Hence, in the function body, x[1] <- 3 and y.c0 <- 2.3 are allowed, while a = 2, a <- 2, x[1] = 3, y.d0 = 0.1, etc., produce errors/warnings.
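The two kinds of change that the proposal wants to mark separately already behave differently in current Julia, even though both are written with =. A small sketch in today's syntax, with hypothetical function names rebind and zero_first!:

```julia
function rebind(v)
    v = [0, 0, 0]    # re-binds the local name only; the caller's array is untouched
    return v
end

function zero_first!(v)
    v[1] = 0         # mutates the contents of the caller's array
    return v
end

caller_array = [1, 2, 3]
rebind(caller_array)
after_rebind = copy(caller_array)   # snapshot: still [1, 2, 3]
zero_first!(caller_array)           # caller_array is now [0, 2, 3]
```

In the proposed notation, the first assignment would require a var argument and the second a Mut argument, so the signature alone would tell the reader which of the two can happen.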
4) Type hierarchy
In the type hierarchy, Constant T/Variable T and Immutable T/Mutable T are subtypes of type T.
Variables (Mutables) can be “cast” implicitly into Constants (Immutables) in function arguments, as for y in foo(y) below:
function foo(x::Int)
# x is a Constant Int
#...
end
y <- 1 # Variable y
println( foo(y) ) # Variable y is cast to a Constant Int
Conversion from Constant to Variable, or from Immutable to Mutable is not allowed.
5) Edge cases
One could imagine a case where vm is Variable and Mutable:
var vm::Mut Array{Int32} <- collect(1:3)
or, in shorter notation,
vm::!Array{Int32} <- [1:3]
but this does not happen in 99% of practical situations, and should be deemed bad style.
================
Codicil
This is an updatable addendum to the proposal based on the comments.
- Notation
@DNF suggested a better notation: use := instead of <- for assignment; example:
var a::Integer := 2
v = Mut Vector{Int64}(3)
v[1] := 3
v[2] += 1
We can indeed build a better notation, if we agree on the concepts; one should only be consistent.
- It is important to emphasise that no radical change is meant to the fundamental structure, type system, or the logic of Julia.
- Most succinctly, I propose explicit syntactical safeguards to denote the programmer’s intention to mutate values or bindings (mutations of the state) in any scope. I firmly believe that this leads to a much better coding style in the long run, one that is also closer to mathematical syntax and much safer (since the compiler knows the intent of the programmer and can warn her if needed). The stylistic convention of adding ! to the name of functions which mutate one of their arguments is clearly in this direction. Let’s generalize this nice decision systematically.
For example, according to the proposal, by
a = 2
b = Array{Int32}(10)
c = ImmutableArray{Int32}(5, 5)
the programmer and the code reader (plus the compiler) will be sure that a is an immutable binding to the value 2, b is an immutable binding to a mutable Array, and c is an immutable binding to an immutable Array. If, somewhere else in the code, such decisions/pledges are violated, the compiler will throw an error or a warning.
Mutability will then be explicitly marked up, as in
v::Integer := 2
w := -1.5
z.c0 := f
z[i] := 32a + h(2)
where everybody (along with the compiler) is informed that v is intended to be a mutable binding (but with immutable type) to a value 2, w is a mutable binding (with mutable type) to a value, -1.5, and z.c0 and z[i] values are intentionally mutated.
Using := for mutation and = for constant binding (in the corresponding global or local scopes) does not hurt any fundamental principle of Julia, or a quick-scripting user of the interactive mode.
- For further safety inside function bodies and easier reasoning about the code, I see a strong need for an explicit syntactical mechanism like the intent keyword of FORTRAN (see above); for instance, by a construction like
# a pure function
function fp(a::Int64, s::Set{Int32})
#... do some computation ...
end
the programmer explicitly promises not to mutate the bindings of the arguments or their contents in the function body, so that
a = 2
a := -1
push!(s, 9)
s = Set([1, 2])
s := s2
will produce compiler errors.
In contrast, a definition like
# an impure function
function fip(var a::Int64, mut s::Set{Int32})
#... do some computation ...
end
allows
a := 2
a += 1
push!(s, 9)
a = 2 # `a` becomes constantly bound thereafter
where var and mut are “modifiers” or “flags” to show explicitly that the programmer intends to mutate the binding/value of a and mutate the contents of s with this function.
This is also not against any fundamental principle of Julia, as far as I understand, but leads instead to much safer code that is very easy for a human to reason about.
- I think there is a need for a built-in mechanism to map mutable types to immutable types automatically; e.g., suppose that somebody has built a library which includes a mutable type List with a push method to append elements to the List. Then something like
typealias ImmutableList Immutable(List)
will generate an immutable-List type which can only be initialized; moreover, any mutation method built for List will produce an error when applied to ImmutableList; e.g., after the generation of the immutable type above, and defining
lm = List([1, 2, 3])
lim_1 = ImmutableList([5, 6, 7])
lim_2 = ImmutableList(lm) # an immutable copy of `lm`
either of
push!(lim_1, 8)
push!(lim_2, 4)
will produce a compile error, while
push!(lm, 4) # does not mutate `lim_2`
works fine.
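Nothing like Immutable(List) exists in Julia today, as far as I know, but the intended behaviour can be approximated by hand with a wrapper type that copies its input and simply defines no mutating methods. The following is only a sketch under that assumption; ReadOnlyList and the variable names are hypothetical, and a plain Vector stands in for the List type.

```julia
struct ReadOnlyList{T}
    items::Vector{T}
    # Copy on construction so later changes to the source do not leak in.
    ReadOnlyList(src::AbstractVector{T}) where {T} = new{T}(collect(src))
end

# Only read access is provided; no push! or setindex! methods are defined.
Base.length(l::ReadOnlyList) = length(l.items)
Base.getindex(l::ReadOnlyList, i::Integer) = l.items[i]

source = [1, 2, 3]
frozen = ReadOnlyList(source)   # an immutable-by-convention copy of `source`
push!(source, 4)                # does not affect `frozen`

# No push! method is defined for the wrapper, so this call throws.
push_rejected = try
    push!(frozen, 8)
    false
catch
    true
end
```

This is weaker than what the proposal asks for: the failure happens at run time rather than compile time, and the items field can still be reached and mutated directly, so the guarantee is by convention only.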
- Note that I do not insist on strict enforcement. It would be enough to have warnings. In this way, no previous code would break, while we promote a better style of computational coding for the future, where the intention to mutate bindings or values is always explicitly denoted.