Version 1.0 of the Nim Programming Language Released

My understanding is that the key problem is that, in general, proving something about types is difficult or infeasible, so for a large class of valid programs even a very sophisticated compiler just can’t figure it out. E.g., for

function fermat(a::Integer, b::Integer, c::Integer)
    # let's pretend there is no overflow
    if a > 0 && b > 0 && c > 0 && (a^3 + b^3 == c^3)
        "a fish"
    else
        42
    end
end

the result type is Int (by Fermat’s Last Theorem the condition can never hold, so the String branch is unreachable), but I am not sure very many compilers would figure this out (yes, this is a contrived example — but type inference can fail very easily in practice, too).

When this happens, traditionally “static” languages just consider your program invalid (what cannot be proven correct is considered incorrect). Instead, Julia will just try to work with it.

In general, it is very difficult to prove in advance that some code in Julia is incorrect, because new methods, functions, and global variables can be defined at any point between the definition and the AOT compilation.

4 Likes

I see your point. In this case, I’m not against the compiler inferring Union{Int64,String}.

However, if the output of this function is sent to a function that only takes Int, I’d still prefer that the compiler error out, rather than allowing obviously invalid code to execute until it crashes.

It’s just a matter of where you place value. I can well believe that researchers want more leniency in running unprovable programs. However, many of us have an interest in running Julia in scenarios where we’re willing to trade a little of the flexibility for reliability. I’d be willing to have your example fail if the compiler would prevent stuff like this:

julia> print_int(x::Int) = println(x)
print_int (generic function with 1 method)

julia> foreach(print_int, [1, 2, "oops!"])
1
2
ERROR: MethodError: no method matching print_int(::String)
Closest candidates are:
  print_int(::Int64) at REPL[2]:1
Stacktrace:
 [1] foreach(::typeof(print_int), ::Array{Any,1}) at ./abstractarray.jl:1920
 [2] top-level scope at REPL[3]:1

It may be that Julia is simply not the right language for scenarios where you want this kind of code not to compile, but it would be a shame, since Julia has so many other pleasant features that would make it useful in production environments where more aggressive type checking would be beneficial. It seems like something that could potentially be switched on and off with compiler flags.

1 Like

I am kind of confused about when you would want that to happen — since Julia compiles code just ahead of its first run, is there a real difference between “runtime” and “compilation time” errors?

For a concrete example, consider (from a fresh REPL with nothing predefined):

f(x) = "a fish"
g(x) = sqrt(f(x)) # A

Do you want the compiler to error at A? The problem with that is that you are free to specialize or overwrite f at any point before you call g.

Or wait for

g(1) # B

? But then it is not really different from a runtime error.
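To make that concrete, a later definition can make A retroactively fine (a sketch):

f(x) = "a fish"
g(x) = sqrt(f(x))        # A: an eager checker would reject this
f(x::Number) = float(x)  # a specialization added later
g(1)                     # 1.0, no error anywhere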

(If I am missing something, please clarify with a concrete example if possible).

2 Likes

I’m more thinking about these future static compilers for Julia that some of us are dreaming of. I agree that this is borderline nonsense in an interactive context.

I assume it would be possible to also do some type verification on interpreted Julia programs before they begin to run, but I think that should definitely be optional.

2 Likes

Thanks for the very nice explanation and example!

I also agree with the “explicit return type by default” policy, as there are cases where it may be hard for the compiler to figure out the correct return type.

One example that turned out to be very tricky in Julia data analysis is row iteration of fully typed “large” tables. A “columnar dataset” (a collection of columns, each with its own type) can be parametrized to include the vector type of each column. Indexing (i.e. taking a row, a struct with a field per column) is type stable, but the compiler starts having difficulties working with these rows as the table gets large (many columns). I suspect this is one case where an auto return type is a dangerous thing to do, as the width of the table may cause inference to fail. It is hard to predict exactly which tables fail and which work: I imagine it is an implementation detail of the compiler.
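A minimal sketch of the kind of table I mean (hypothetical types, not any particular package’s API):

# one type parameter capturing the vector type of every column
struct ColTable{C<:NamedTuple}
    columns::C
end

# taking a row is type stable in principle, but inference may give up
# once the NamedTuple has very many fields
getrow(t::ColTable, i) = map(col -> col[i], t.columns)

t = ColTable((a = [1, 2, 3], b = ["x", "y", "z"]))
getrow(t, 2)  # (a = 2, b = "y")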

2 Likes

For the fermat example you can write a macro that checks that the inferred type is as specified. This will throw an error when you try to declare the function:

julia> @checked O -> O <: Integer function fermat(a::Integer, b::Integer, c::Integer)
           # let's pretend there is no overflow
           if a > 0 && b > 0 && c > 0 && (a^3 + b^3 == c^3)
               "a fish"
           else
               42
           end
       end
FAILURE: fermat(Integer,Integer,Integer) → Union{Int64, String}

This kind of approach doesn’t scale well, though (checking all subtype combinations quickly becomes prohibitive), and it doesn’t work for types defined in the future, but it could still be useful for simple methods.
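The core of such a check can be done with Base.return_types, which runs inference without executing the function; a minimal sketch, with the signature written out by hand:

inferred = first(Base.return_types(fermat, (Integer, Integer, Integer)))
inferred <: Integer ||
    error("FAILURE: fermat(Integer,Integer,Integer) → $inferred")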

1 Like

Isn’t this what we have

function fermat(a::Integer, b::Integer, c::Integer)::Integer
    ...

for?

I agree that it doesn’t scale well to write code like this, but generally you only need it in a few places.

That just does a type assertion on the return value at run time, so it will fail only upon actually hitting a solution, while the macro above checks the inferred return type when you declare the function (the function is never run, only type inference is). It’s more like automatic testing of the inferred type.
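Concretely (a sketch): the annotated version runs fine on any input that stays in the Int branch, and the assert would only fire if the String branch were ever taken:

function fermat(a::Integer, b::Integer, c::Integer)::Integer
    a > 0 && b > 0 && c > 0 && a^3 + b^3 == c^3 ? "a fish" : 42
end

fermat(1, 2, 3)  # 42, no error: the String branch was never hit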

1 Like

Given the clever compiler we have, it’s more than that: by ruling out the String branch for return values, it effectively allows type inference to recover.
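You can check this by querying inference on the annotated version (a sketch; the exact result may depend on the Julia version):

Base.return_types(fermat, (Int, Int, Int))  # now [Int64], not Union{Int64, String}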

Sorry, I missed where this macro is defined, so I could not read the source.

However, the type of [1, 2, "oops!"] is Array{Any,1}, so type checking this is not possible (in any language).
If we change it to Union{Int64,String}, the problem still exists, because this code is still legal in Julia: if the array is full of Int64 values, it will run fine.
And the most serious problem is that although you don’t define print_int on String, you might define it later, so this program is still legal until the moment you finally decide to run it (and encounter a String).
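For example, continuing the earlier REPL session:

julia> print_int(x::String) = println(x)  # defined later
print_int (generic function with 2 methods)

julia> foreach(print_int, [1, 2, "oops!"])  # the same call now succeeds
1
2
oops!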
Because a function is a collection of methods, even if we invoke the same function multiple times, we might use different methods and get different results. I think this is actually forbidden in Haskell (unless we use higher-rank types):

constOne :: forall a. a -> Int
constOne a = 1

-- rejected by ordinary Haskell: a is chosen by the caller, so the body
-- of f cannot assume it is numeric and apply g to both 1 and 1.0
f :: (a -> Int) -> Int
f g = g 1 + g 1.0

The code for f above will not work in ordinary Haskell. We might somehow regard every function in Julia as a function with a “higher-rank type”, which is not very friendly for a compiler doing type inference.

Moreover, Union{Int64,String} is not the same as the sum types in static languages. For example, in Haskell or C you have to wrap this in a union or a new data type (while in Julia we can’t use Union{String,Int} to materialize a sum type, we can wrap it in a struct). For example:

struct Either{A,B}
    e::Union{A,B}
end

a = Either{Int64,String}.([1, 2, "oops!"])

function printInt64OrString(a::Either{Int64,String})
    if isa(a.e, Int64)
        println(a.e)
    else
        error("encountered a String")  # pattern matching not exhaustive
    end
end

But I’m not sure this is what you want. I think wrapping things in a struct and then forcing the function to accept this sum type works just like in C and Haskell (but still, since Julia has no built-in pattern matching, I’m not sure how we can statically check whether the matching is exhaustive). Using some macros might make things simpler.
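One macro-free alternative (a sketch with hypothetical names): unwrap once and dispatch, so an unhandled case surfaces as a MethodError instead of a hand-written error branch (though this is still a runtime check, not a static one):

handle(x::Either) = handle_value(x.e)
handle_value(v::Int64) = println(v)
# no handle_value(::String) method: the String case is simply unmatched,
# and hitting it raises a MethodError at run time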

4 Likes

Yes, I get that. I’m thinking about a statically compiled context where you’re not going to have runtime eval. If you’re going to have a static compiler, you’re already in a situation where all methods must be known at compile time.

I’ve been thinking a lot about this recently for a toy language I’ve been hacking on (again, in a statically compiled context). It seems like a static compiler should be able to infer which cases a method handles (up to a certain point) and construct the wrapper implicitly. For non-exhaustive matching, an error would be raised at compile time if an unhandled type was used. Of course, I don’t really know what the limitations of Julia’s type inference are, so I don’t know how feasible that would be in Julia’s case. I understand that it’s more or less impossible with Julia’s current compilation strategy, and even that it’s undesirable for the majority of Julia’s workloads.

I just think it’s something to keep in mind as efforts begin to pick up around the static compilation work. Having a compiler that can construct union wrappers and do pattern matching under the covers would presumably also help cut down on size requirements of the bundled runtime.

I am not sure that is a feasible design goal with Julia. With commonly used idioms, chances are that moderately complex code that takes outside input will have to compile some things on demand, unless you generate an excessive number of type combinations ahead of time, of which you may end up not using 99%.

In other words, minimizing compilation may be a feasible option for some code, but totally eliminating it is orders of magnitude more difficult, if feasible.

I mean, this is part of the approach they’ve been talking about using for years.

https://juliacomputing.com/blog/2016/02/09/static-julia.html

The plan seems very much to be to place restrictions on which kinds of Julia code can and cannot be statically compiled, at least for people who are interested in creating distributable binaries.

see also:

I am under the impression that, since that blog post, emphasis has shifted from precompiling a restricted subset of Julia with no “escape hatch” (and thus not including the compiler machinery) to precompiling a lot of things (but possibly not everything) simply for startup speed, and/or making the compiler faster so that the cost is not that large. And the latter is still a very hard problem.

1 Like

From conversations on Slack, I have the impression some people are still working on reasonably sized, distributable binaries, but I admit that I’m not sure how much momentum these efforts have.

1 Like

Rich Hickey has an excellent analysis of sum vs union types in this talk:

I think the most damning point he makes about sum types is this: if you start with a function that accepts values of type A, and you then generalize it so that it can also accept values of type B, with sum types you have to change every caller, because the function no longer accepts plain values of type A or B; it accepts wrapped values of those types. With a union, this kind of generalization does not require changing any caller: if the function worked with A values before, it keeps working, so there’s no need to change anything. He makes the point much better than I do, though.
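In Julia terms (a sketch): widening a signature with a Union leaves every existing call site untouched:

describe(x::Union{Int,String}) = println(x)  # was describe(x::Int)
describe(1)     # existing callers keep working, no wrapper needed
describe("hi")  # newly accepted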

5 Likes

That’s called the expression problem in the literature.

The gist is that

  • if you use ADTs:
    • Adding new functions is easy, you just need to have a switch or pattern matching on all existing types.
    • Adding a new type is hard, you need to go and edit all existing functions.
  • if you use inheritance and methods:
    • Adding a new type is easy, create it, implement all existing interfaces.
    • Adding a new function is hard, you need to go and edit all existing types.

For both you have an escape hatch with “NotImplementedError”, but the hard part is that you have to make lots of non-local changes instead of self-contained ones in the easy case (see the sketch below).
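Both sides are easy to see in Julia (a sketch):

struct Circle; r::Float64; end
struct Square; side::Float64; end

# ADT/switch style: adding a Triangle type means editing this one function
area(s) = s isa Circle ? pi * s.r^2 :
          s isa Square ? s.side^2 :
          error("NotImplementedError")

# method style: adding a Triangle type is a local, self-contained method,
# but adding a new operation (say, perimeter) needs one method per existing type
area2(c::Circle) = pi * c.r^2
area2(s::Square) = s.side^2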

For those interested in the expression problem in a high-performance computing and domain-specific language context, I have gathered plenty of approaches (from Scala, OCaml, Clojure, C++, Rust, Python, Haskell and PL theory) in preparation for the design of my deep learning compiler: https://github.com/mratsim/Arraymancer/issues/347#issuecomment-461009747

I’m familiar:

For what it’s worth, I think that the problems with sum types are really missing the crux of the expression problem. If they were the crux, then using unions instead would solve the expression problem, which it does not.

9 Likes

In the past I’ve been gently misled by functions like findfirst(predicate, A) which return an array index, or nothing if the predicate didn’t match.

The conceptual mistake on my part was to consider only the happy path, in which a matching element is found. This mistake felt encouraged by the return type, since trying simple cases at the REPL returned a plain index type. If the return value were some form of Some, it would strongly indicate that the None case must also be handled.
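A concrete version of the trap (a sketch):

xs = [1, 3, 5]
i = findfirst(iseven, xs)  # no even element, so i === nothing
xs[i]                      # throws only here, far from the real mistake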

I wonder what you think about cases like this?

1 Like

Or, possibly, not reading the docs. :wink:

There is always a trade-off between safeguards and convenience, and both solutions have their advantages.

A wrapper like Some would be clearly preferable only when the sentinel value cannot be ruled out as a valid result — at this point it is used sparingly in Base.
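For instance, Some earns its keep when nothing itself can be a legitimate stored value (a sketch using Base’s Some and something):

d = Dict("a" => 1, "b" => nothing)
get(d, "b", nothing)  # nothing: is that the stored value or a missing key?
get(d, "c", nothing)  # also nothing, indistinguishable from the above
# a Some-wrapping lookup could return Some(nothing) vs nothing instead;
something(Some(nothing))  # nothing, with the unwrapping made explicit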