Explicit denotation of variability and immutability

proposal

#1

It may be a radical proposal, but I believe it is proper time for a decision to get closer to mathematical notation, and once forever (and for the sake of later generations), distinguish between equality/definition and assignment, and between mutable and immutable states. Implicit mutation is the “dangling pointer” of modern programming languages.
Computational code has virtually become a ubiquitous part of our applied mathematics literature; therefore, it is of utmost importance to have a clear, explicit, and less error-prone syntax and semantics which, apart from providing a ground for compiler optimisations, makes the code easier to read and reason about for the human audience – not just the machine.

It is likely that some sections of my proposal are still ill-conceived (or misunderstood by me) or the proposed syntax could be problematic, but I hope to convey my idea as clear as possible.

Programs must be written for people to read, and only incidentally for machines to execute.

— Abelman and Sussman, Structure and Interpretation of Computer Programs

The Proposal

1) Constant vs Variable
Equality, =, must have the same meaning as in mathematics (or pure functional languages). It produces a constant binding of the left-hand side to the value of the right-hand side.
For example,

a = 1

or

a::Integer = 1  # explicit type info

or

const a::Integer = 1

means that a is a constant binding in its own scope, and cannot be re-assigned to any other value; so that, after the declaration above,

a = 2
a = 3.4
a += 1

will produce an error (or a deprecation warning) message.
In other words, Constantness is the default (as for Immutatbility [see below]). Note that Constants are merely Immutable bindings.

Variables must be then declared explicitly with a syntax similar to R (or as in Haskell monads), or with keyword var, as

v <- 1  # no type info => type as well as value can be modified later

or

var w::Integer <- 2  # declares an Integer Variable and assigns value 2 to it
w::Integer <- 2      # shorthand version of the previous statement

so that w denotes an Integer Variable which can be re-assigned to Integers deliberately.
Note that when type is given explicity, as for w, then modifying the type is not allowed later (type-stability).
This implies that the following code will run without errors/warnings:

v <- 2   # (re-)assignment to Integer (value modified)
v <- 2.0 # (re-)assignment to Float64 (type modified)
v += 1   # addition assignment
v *= 1   # multiplication assignment
w <- 0   # (re-)assignment to Integer (value modified)

while this will produce an error (or warning):

w <- 1.0 # (re)-assignment to Float64 (type cannot be modified)

Furthermore, for loops must be written as

for i <- 1:10
    println(i)
    k::Float64 <- i/2
end

for j in 1:10
    println(j)
end

[2i for i <- 1:10]

so that for i = 1:10 will be forbidden, because i should be a Variable counter for the loop, not a Constant.

2) Mutable vs immutable
Currently struct is equivalent to immutable struct, and mutable keyword explicitly declares a mutable state.
I suggest mutability be denoted explicitly everywhere (perhaps, via a simpler keyword like mut or ! [see below]); for instance, in the following,

p = [2i for i <- 1:5]  # immutable Array (can only be initialized)

or

p::Array{Integer} = collect(1:5)  # immutable Array of Integers (can only be initialized)

or

# extended initialization

p::Array{Int32}(5) = begin
    x = 2; y = 3;
    pTMP = Mutable Array{Int32}(5)  # temp. mutable Array
    for i <- 1:endof(pTMP)
        pTMP[i] = i * x + y
    end
    
    pTMP
end

p will be a constant binding to an Immutable Array that is initialized once in its scope, after which, no modification to its contents is allowed.
Note that in the “extended intialization” case, a local temporary Mutable Array is made and its final contents are used for initializing p.

Furthermore,

v = Mutable Vector{Integer} = [1, -2]  # mutable Vector
v = Mut Vector{Integer} = [1, -2]  # shorthand notation for the previous statement
v = !Vector{Integer} = [1, -2]     # shorter notation for the previous statement
q = Mut Array{Integer} = collect(1:5)  # mutable Array

where q (v) is a constant binding to a Mutable Array (Vector) with modifiable/variable contents. Note that q and v are Constants (see above), hence, = should be used for their initialization. Yet, they contain elements which are Variables, so that the following are allowed:

q[1:2] <- [-1, -7]
q[3] += 2
q[:] <- [-1:-1:-5]  # re-assigning the contents of q
q .<- [-1:-1:-5]    # shorthand notation for the previous statement

v[1] *= 4
v[:] *= 3  # multiply all elements by 3 and re-assign them
v .*= 3    # shorthand notation for the previous statement

After the declarations for p, q, and v above, any of the following statements will produce an error/warning:

p = [1:3]  # p is a Constant; no re-declaration
p[2]  = 3  # p is Immutable; no re-declaration of contents
p[1] += 1  # p is Immutable; no re-assignment of contents

v <- [-1, 2]  # v is a Constant; no re-assignment

Moreover, Mutable user-defined types, or structs, can be declared explicitly as

mutable struct T1
    c0
    d0
end

# shorter keyword
mut struct T1  
    c0
    d0
end

# much shorter notation
!struct T2
    c0
    d0
end

all of which are equivalent, while

struct T3
    c0
    d0
end

declares an Immutable struct.

Thus, Immutatbility is the default (as for Constantness [see above]).

3) Function arguments

Variable arguments of a function can be declared explicitly as the following:

function foo!(a::Int32, var x::Float64)
    # ...
end

where a will be a Constant and cannot be re-assigned, so that a = 1 or a <- 1 in the function body will produce errors/warnings. However, x is a Variable and can be re-assigned with x <- 1.0 in the body (but with type-stability).

Mutable arguments of a function can be declared as the following:

function foo!(a::T1, x::Mut Array{Int32}, y::Mut T1)
    # ...
end

# shorter notation
function bar!(a::T1, x::!Array{Int32}, y::!T2)
    # ...
end

where a will be a constant binding to an Immutable struct of user-defined type T1 above; i.e., contents of T1 cannot be modified in the function body, and a cannot be re-assigned to any other value.
However, x and y will be constant bindings to Mutable objects. Hence, in the function body, x[1] <- 3 and y.c0 <- 2.3 are allowed, while a = 2, a <- 2, x[1] = 3, y.d0 = 0.1, etc., produce errors/warnings.

4) Type hierarchy
In the type hierarchy, Constant T/Variable T and Immutable T/Mutable T are subtypes of type T.

Variables (Mutables) can be “cast” implicitly into Constants (Immutables) in function arguments as for y in foo(y) below:

function foo(x::Int)
    # x is a Constant Int
    #...
end

y <- 1  # Variable y
println( foo(y) )  # Variable y is cast to a Contant Int

Conversion from Constant to Variable, or from Immutable to Mutable is not allowed.

5) Edge cases
One could imagine a case where vm is Variable and Mutable:

var vm::Mut Array{Int32} <- collect(1:3)

or, in shorter notation,

vm::!Array{Int32} <- [1:3]

but this does not happen in 99% of practical situations, and should be deemed as bad style.

================

Codicil

This is an updatable addendum to the proposal based on the comments.

  • Notation
    @DNF suggested a better notation: use := instead of <- for assignment; example:
var a::Integer := 2
v = Mut Vector{Int64}(3)
v[1] := 3
v[2] += 1

We can indeed build a better notation, if we agree on the concepts; one should only be consistent.

  • It is important to emphasise that no radical changes is meant to the fundamental structure, type system, or the logic of Julia.
  1. Most succinctly, I propose explicit syntactical safeguards to denote programmer’s intention to mutate values or bindings (mutations of the state) in any scope. I firmly believe that it leads to much better coding style in the long run, which is also closer to the mathematical syntax and much safer (since the compiler knows the intent of the programmer and can warn her if needed). The stylistic convention to add ! to the name of functions which mutate one of their arguments is clearly in this direction. Let’s generalize this nice decision systematically.

For example, according to the proposal, by

a = 2
b = Array{Int32}(10)
c = ImmutableArray{Int32)(5, 5)

the programmer and the code reader (plus the compiler) will be sure that a is an immutable binding to the value 2, b is an immutable binding to a mutable Array, and c is an immutable binding to an immutable Array. If, somewhere else in the code, such decisions/pledges are violated, the compiler will throw an error or a warning.

Mutability will be then explicitly marked up, as in

v::Integer := 2
w := -1.5
z.c0 := f
z[i] := 32a + h(2)

where everybody (along with the compiler) is informed that v is intended to be a mutable binding (but with immutable type) to a value 2, w is a mutable binding (with mutable type) to a value, -1.5, and z.c0 and z[i] values are intentionally mutated.

Using := for mutation and = for constant binding (in the corresponding global or local scopes) does not hurt any fundamental principle of Julia, or a quick-scripting user of the interactive mode.

  1. For further safety inside function bodies and easier reasoning about the code, I see a strong need for an explicit syntactical mechanism like the intent keyword of FORTRAN (see above); for instance, by a construction like
# a pure function
function fp(a::Int64, s::Set{Int32})  
    #... do some computation ...
end

the programmer explicitly promises not to mutate the bindings of the arguments or their contents in the function body, so that

a = 2
a := -1
push!(s, 9)
s = Set([1, 2])
s := s2

will produce compiler errors.
In contrast, a definition like

# an impure function
function fip(var a::Int64, mut s::Set{Int32})
    #... do some computation ...
end

allows

a := 2 
a += 1
push!(s, 9)
a = 2  # `a` becomes constantly bound thereafter

where var and mut are “modifiers” or “flags” to show explicitly that the programmer intends to mutate the binding/value of a and mutate the contents of s with this function.
This is also not against any fundamental principles of Julia, afaiu, but leads instead to much safer codes, which are very easy to reason about by a human.

  1. I think there is a need for an in-built mechanism to map mutable types to immutable types automatically; e.g., suppose that somebody has built up a library which includes a mutable type List with a push method to append elements to the List. Then something like
typealias ImmutableList Immutable(List)

will generate an immutable-List type which can only be initialized; moreover, any mutation method built for List will produce an error when applied to ImmutableList; e.g., after the generation of the immutable type above, and defining

lm = List([1, 2, 3])
lim_1 = ImmutableList([5, 6, 7])
lim_2 = ImmutableList(lm)  # an immutable copy of `lm`

either of

push!(lim_1, 8)
push!(lim_2, 4)

will produce a compile error, while

push!(lm, 4)  # does not mutate `lim_2`

works fine.

  • Note that I do not insist on an strict enforcement. It would be enough to have warnings. In this way, no previous code would break, while we promote a better style of computational coding for the future, where intention to mutate bindings or values is always explicitly denoted.

#2

I don’t have the time nor capacity to digest all of this, but for all that is holy, don’t use <- for assignment. It is soul-curdlingly ugly.

If you want a different assignment operator, please suggest :=.


#3

I think that you are coming from Haskell, which is a very nice language for certain purposes. These may not carry over to Julia, especially piecewise. Even if you are proficient in Haskell and you are missing some features from there, please consider giving Julia some time as is before proposing a fundamental redesign.


#4

That is also possible, and perhaps nicer; e.g.,

var a::Integer := 2
v = Mut Vector{Int64}(3)
v[1] := 3
v[2] += 1

One should only be consistent.
My point is to explicitly differentiate between assignment and equality, and between mutability and immutability.
We can indeed build a better notation, if we agree on the concepts.


#5

Actually, I am coming from the C++ world (or hell). But I know a bit of Haskell. In fact, my sad (but long) experience with numerical computation in C++ has led me to plead for this, before it’s too late.


#6

My impression is that Julia strives to be a simple language to pick up with a gradual learning curve. This comes at the expense of lacking some safety and strictness features. I suspect that above proposal is way too far on the strict side to have a chance.


#7

I agree that this proposal might be rather strict, @mauro3.

However, I’d say, declaring Variables and Mutables explicitly (e.g., via var or mut keywords) does not hurt the beginning learner. Everything then flows as “usual”. Remember that Julia type system or the absence of object-oriented stuff therein might be also considered as “too complicated” for a gradual learning curve.

Yet, at the same time, these more complex modern concepts allow better computational coding for the intermediate or advanced users. In general, imo, the syntax should promote better coding standards as far as possible.

I believe Julia is meant to be a language designed mainly for (optimised) computational tasks with a modern type system and in-built mathematical structures which do not compromise performance; otherwise, we have already Python, darn simple, but ineffective.


#8

To be honest, it’s far too late to consider this kind of radical proposal. Julia 0.7 is supposed to enter final freeze really soon now, and feature freeze has been in place for more than one month. It’s fine to discuss this if you like theoretical discussions, but there’s practically no chance it will be accepted at this point. (I’m not even speaking of the many arguments which could be opposed to the proposal itself.)


#9

I am not expecting that this proposal would be accepted. I just want to begin such a discussion; perhaps, it could, in a way, affect the later versions of Julia, even if it is not accepted partially or in full.
The const keyword and structs which are immutable by default are already in this direction.


#10

This seems really hard to get right the first time. I like opt-in safety because then I can ignore it when I’m scripting, and use it when I want it. It seems like this would impose on interactive usage.


#11

I am but in favour of opt-in unsafety, like in the Rust language.
Furthermore, I think there is a general trend in the design of programming languages, following the functional paradigm, to “localize” and explicitly “mark up” mutable states to prevent unwanted side effects. This is of utmost importance in parallel and concurrent programming, but in general, making side effects explicit leads to better and more readable code.

Julia has taken many steps towards this goal already, for example by making struct Immutable by default. So, I do not see why we should not systematically generalize this in the design of Julia. Furthermore, as far as I can see, Tuple is rather an Immutable Array. Why not make this explicit?

And I believe, after a little time, those explicit denotations become one’s second nature – no worry.


#12

I’d like to hear some arguments against explicit denotation of Variables (mutable binding) and Mutable states – something beyond complaining about complicatedness. I really see that as a step forward towards better code.
Even good old FORTRAN has some hints of this:

function func(i) result(j)
    integer, intent(in) :: i   ! input (read-only)
    integer             :: j   ! output (writable)
    j = i**2 + i**3
end function func

#13

But then you cannot write generic code which works on both immutables and mutables? Arguably you want to dispatch to both cases any time that happens in order to be more efficient, but that kind of stuff gets in the way of interactive usage.

Rust isn’t interactive. It has different goals.


#14

This is a good point.
But I cannot imagine how such a “generic” code would be!
For example, your function can either modify its arguments or not. If it does, then you should use Variables or Mutables – accepting Constants or Mutables is not reasonable; if it does not modify its arguments, then it can accept both Variables/Constants and Mutables/Immutables (see the example given in part 4).

The proposed scheme does not prevent interactivity. One should use simply var and mut. One can also imagine that the defaults for an interactive session be Variables and Mutables.

Indeed; but I think that safety is also a goal for any serious computational code. In that respect, I prefer opt-in unsafety – as in Rust.


#15

There are plenty of examples in Rust, for instance. It’s a common enough issue that some Rustacean brings up parameterizing over mutability at least once every three months or so.

The logic that people are dreaming of is less “this argument is mutable xor immutable” but rather “if this argument is mutable, then this argument should also be mutable, and vice versa.”


#16

Parts of this proposal are fundamentally incompatible with how Julia works – in particular all the stuff about “type casting” makes sense in C++ but does not make sense in Julia.

In C++, dynamic and static types are separate things. The dynamic type of a value is its “actual type” while the static type of an expression referring to a value is the type of the expression – i.e. what the compiler thinks the type of a value is in a particular region of code. The static type should be a supertype of the dynamic type, but in general, these are different types and can lead to different behaviors. In non-virtual dispatch, for example, a method is chosen based on the static type of something, rather the actual “dynamic” type of the object.

In Julia, a value has only one unchangeable type – it’s actual type (i.e. its dynamic type in C++ lingo). Expressions don’t have types, only values do, and the type that the compiler thinks an expression has is just an implementation detail that generally doesn’t affect the behavior of code. (There are a few corner cases where we allow type inference to leak into visible behavior, but it’s avoided as much as possible.) The “static type” of something is simply not a concept in Julia, which simplifies the mental model of the langage tremendously as compared to C++. Moreover, the type of a value cannot ever change in Julia – an object has only one permanent actual type.

In this context, the proposed notion of casting between mutable and immutable variants for the same objects just doesn’t make sense. Whether a value can be mutated or not is a property of its type and that type cannot change. If you want to make a mutable object immutable, you can wrap it in an immutable wrapper. But that doesn’t buy you much since you can always just reach in and get the mutable internal object. Moreover, local immutability doesn’t really buy you much – most of the benefits of immutability disappear if some other code can still mutate something. You still have to guard against the possibility of mutation everywhere. That means you need to heap allocate data; you can still get data races. Julia does have immutable arrays, but the real benefit comes from knowing that no one can mutate them, not that some code cannot mutate them while other code still might.

Interestingly, making a mutable wrapper around an immutable object is actually useful and efficient, even though at first blush it seems to violate immutability. The key is that semantically, the mutating API works by replacing the whole immutable object with a new modified immutable value upon mutation. Since the wrapped object is immutable, however, we can make sure that the wrapper has the only reference to the immutable object that it wraps: immutability guarantees that we can make copies whenever we want without affecting semantics, so we can copy an immutable when it is wrapped and after that we know that the wrapper has the only reference. Therefore, mutation of the wrapped object can be implemented via actual mutation without violating the semantic immutability of the wrapped object. A bunch of design and implementation work has gone into supporting this better in Julia, but that effort has been tabled until after we get Julia 1.0 out the door.


#17
julia> struct foo{T}
           x::T
       end

julia> mutable struct bar{T}
           x::T
       end

julia> g(x::foo) = foo(3x.x)
g (generic function with 2 methods)

julia> function g(x::bar)
           x.x *= 3
           x
       end
g (generic function with 2 methods)

julia> function f(x)
           g(x)
       end
f (generic function with 1 method)

julia> foo1 = foo(9.2)
foo{Float64}(9.2)

julia> bar1 = bar(9.2)
bar{Float64}(9.2)

julia> f(foo1)
foo{Float64}(27.599999999999998)

julia> f(bar1)
bar{Float64}(27.599999999999998)

julia> foo1
foo{Float64}(9.2)

julia> bar1
bar{Float64}(27.599999999999998)

julia> @code_warntype f(bar1)
Variables:
  #self# <optimized out>
  x::bar{Float64}

Body:
  begin 
      $(Expr(:inbounds, false))
      # meta: location REPL[28] g 2
      SSAValue(0) = (Base.mul_float)((Core.getfield)(x::bar{Float64}, :x)::Float64, (Base.sitofp)(Float64, 3)::Float64)::Float64
      (Core.setfield!)(x::bar{Float64}, :x, SSAValue(0))::Float64
      # meta: pop location
      $(Expr(:inbounds, :pop))
      return x::bar{Float64}
  end::bar{Float64}

julia> @code_warntype f(foo1)
Variables:
  #self# <optimized out>
  x::foo{Float64}

Body:
  begin 
      return $(Expr(:new, foo{Float64}, \:((Base.mul_float)((Base.sitofp)(Float64, 3)::Float64, (Core.getfield)(x, :x)::Float64)::Float64)))
  end::foo{Float64}

The function g was inlined into f both for the f compiled for foo and for the f compiled for bar.
In the former, the argument x is not mutated. In the latter, it is.
You can come up with similar examples for const vs nonconst bindings.


#18

May I ask… why? That is, what additional information are you proposing to provide the compiler for better codegen?

Stuff I would consider useful, if the compiler could make use of it:

@const arg as a modifier to a function argument. Effect: Compiler may assume that any pointer access through arg is not mutated. Usecase: call a non-inlined function that is not supposed to modify arg; skip reloading of unchanged values. Only makes sense if all possible dispatches (during interference) have this modifier (otherwise this information is useless).

@restrict_ptr ptr: Compiler may assume that ptr does not alias with anything not derived from ptr. Usecase is obvious; llvm support is good.

@no_escape: Compiler may assume that an object does not escape the scope (it may be passed downward the call stack, though!) and hence can try to skip the allocation. Hopefully most of these annotations would become unnecessary as the escape analysis improves. Usecase:

v = @no_escape view(A,1:5,2:3) does not need an allocation if the view is only short-lived.

v=@no_escape Ref{Cint}(0)
ccall(:somefun, Void, (Ptr{Cint},) , v )
return v[]

should also need no allocation.

This would basically override the escape analysis.

@gc_preserved: An object is already preserved up the call stack. The compiler may use this information to try to skip allocations of tuples/structs with such a field (because the compiler does not need to care about gc visibility). Idea: Suppose a non-inlined function takes some data-structure as argument, finds some mutable in there and packs it into a tuple for return (multiple return values). In this case, we know that the mutable is preserved upwards in the call stack; no gc visibility needed and hence no alloc for the tuple (if the caller unpacks the tuple; of course the caller might need to alloc for it, depending what he does).


#19

This is assumed.


#20

On a related note, it would be useful to allow local const bindings – in writing a macro that declares const variables I realized I couldn’t use it in local scope.