Version 1.0 of the Nim Programming Language released

I use those a lot, though. I mean, it’s fine to have style guides, or opinions, but enforcing them at the language level is something else.

4 Likes

that’s fair: it’s a language design issue, subjective i’d say

1 Like

Hello,

Nim dev for scientific computing here. There seems to be some confusion about multi-methods in Julia vs. multi-methods in Nim.

To clarify, Nim is a statically compiled language; the multi-methods that are being phased out are those doing runtime dispatch, i.e. dispatch involving two inherited types.
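
For context, a minimal sketch (type and method names invented for illustration) of the kind of runtime multi-method being phased out; dispatching on more than the first argument needs the --multimethods:on switch:

type
  Shape = ref object of RootObj
  Circle = ref object of Shape
  Rect = ref object of Shape

# `method` dispatches on the runtime types of its arguments;
# dispatch on more than the first argument is the multi-method part
method collide(a, b: Shape) {.base.} =
  echo "shape/shape"
method collide(a: Circle, b: Rect) =
  echo "circle/rect"

let a: Shape = Circle()
let b: Shape = Rect()
collide(a, b)  # resolved at runtime: "circle/rect"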

Nim can statically dispatch on multiple types with no issue and no performance impact, for example:

proc my_add(x: int64, y: int32): int64 =
  x + int64(y)
proc my_add(x: int32, y: int64): int64 =
  int64(x) + y
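
A call site then picks the overload purely from the static argument types, for example:

echo my_add(1'i64, 2'i32)  # resolves to the (int64, int32) overload at compile time
echo my_add(1'i32, 2'i64)  # resolves to the (int32, int64) overload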

Note that Nim supports generics, overloading, type classes and constraints so you can do:

proc my_add[T: int32 or int64](x, y: T): T =
  x + y

So far I’ve never had to use multi-methods for my scientific packages; I even went so far as to completely remove methods and only use static/compile-time dispatch.

@ChrisRackauckas, I’m curious about what you are missing for scientific computing. I don’t think the core language is missing anything critical; obviously the libraries are not there yet.

10 Likes

A REPL. Scientific computing is very interactive.

4 Likes

Indeed, though you can use Nim in Jupyter through Python via this file:

# In [1]
%load_ext nim_magic

# In [2]
%%nim mytest -d:release
proc greet(name: string): string {.exportpy.} =
    return "Hello, " & name & "!"

# In [3]
mytest.greet("World")

# Out[3]
'Hello, World!'

There are also some efforts at a REPL and a native Jupyter kernel, but in the past year most of the effort was focused on hot-code reloading support at the language level (a $20k grant by Status, Nim’s main sponsor), which was merged in May.

Hopefully, this will lead to an excellent REPL.

3 Likes

As a matter of PL terminology, it’s not dispatch (multiple or otherwise) if it doesn’t happen semantically at runtime; it’s overloading, which is convenient but has no additional expressive power, since you can accomplish the same thing with C/Fortran-style naming conventions (i.e. put the types in the name). Just as it’s possible to have single dispatch in a static language, it’s also possible to have multiple dispatch. Fortress, for example, was static and included multiple dispatch, for the same reasons as Julia.
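
To make that concrete, the my_add overloads from earlier could equivalently be written C/Fortran-style with the types in the names (hypothetical names):

# overloading adds convenience but no expressive power over explicit names:
proc my_add_i64_i32(x: int64, y: int32): int64 = x + int64(y)
proc my_add_i32_i64(x: int32, y: int64): int64 = int64(x) + y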

How do you write generic algorithms where code selection depends on the types of multiple different arguments (or even just not the receiver) without multiple dispatch?

12 Likes

I think the best would be to give an example; it can even be something in the form of the Julia challenge / repo, as I was the one who implemented the Nim solution.

When code selection depends on compile-time types or values, either I overload or I use Nim compile-time facilities.

Example 1: static dispatch of float32/float64 to the proper cuBLAS call with when (compile-time if):

# Vector copy
proc cublas_copy*[T: SomeFloat](
  n: int; x: ptr T; incx: int;
  y: ptr T; incy: int) {.inline.}=

  check cublasSetStream(cublasHandle0, cudaStream0)

  when T is float32:
    check cublasScopy(cublasHandle0, n.cint, x, incx.cint, y, incy.cint)
  elif T is float64:
    check cublasDcopy(cublasHandle0, n.cint, x, incx.cint, y, incy.cint)

Example 2: static dispatch/specialization of rounding to the next/previous multiple of a step, depending on whether the step is a power of 2 or not:

func round_step_down*(x: Natural, step: static Natural): int {.inline.} =
  ## Round the input to the previous multiple of "step"
  when (step and (step - 1)) == 0:
    # Step is a power of 2. (If the compiler cannot prove that x > 0, it does not apply this optimization itself.)
    result = x and not(step - 1)
  else:
    result = x - x mod step

func round_step_up*(x: Natural, step: static Natural): int {.inline.} =
  ## Round the input to the next multiple of "step"
  when (step and (step - 1)) == 0:
    # Step is a power of 2. (If the compiler cannot prove that x > 0, it does not apply this optimization itself.)
    result = (x + step - 1) and not(step - 1)
  else:
    result = ((x + step - 1) div step) * step
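
For illustration, here is which branch a couple of calls take (example values mine, not from the original post):

echo round_step_down(10, 8)  # 8:  step is a power of 2, uses the bit-masking branch
echo round_step_up(10, 6)    # 12: step is not a power of 2, uses the div/mul branch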

Example 3: generic algorithms over various SIMD vector types:

import simd # this is a SIMD file like https://github.com/numforge/laser/blob/2f619fdbb2496aa7a5e5538035a8d42d88db8c10/laser/simd.nim

proc `+`(a, b: m128): m128 =
  mm_add_ps(a, b)
proc `+`(a, b: m256): m256 =
  mm256_add_ps(a, b)
proc `+`(a, b: m512): m512 =
  mm512_add_ps(a, b)
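
With + overloaded per vector width, a generic kernel can then be written once and instantiated at compile time for each type; a minimal sketch along those lines:

proc sum3[V: m128 or m256 or m512](a, b, c: V): V =
  # one generic body, statically specialized for each SIMD type
  a + b + c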

Furthermore, besides specialization via when or overloading, you can detect patterns and use “super-instructions”. Some useful patterns would be matrix A*B + C, exp(m-1) or ln(1 + p), which have specific implementations in BLAS for the first and in math.h (expm1, log1p) for the last two.

You can do that with term-rewriting macros/templates, for example:

# Implementation of the fused operation (out-of-place)
# ------------------------------------------------------------------
proc tensor_multiplyAdd[T](
  A, B: Tensor[T],
  C: Tensor[T]): Tensor[T] =

  result = C

  if C.rank == 2:
    gemm(1.T, A, B, 1.T, result)
  elif C.rank == 1:
    gemv(1.T, A, B, 1.T, result)
  else:
    raise newException(ValueError, "Matrix-Matrix or Matrix-Vector multiplication valid only if first Tensor is a Matrix and second is a Matrix or Vector")

# Implementation of the fused operation (in-place)
# ------------------------------------------------------------------
proc tensor_multiplyAdd_inplace[T](
  A, B: Tensor[T],
  C: var Tensor[T]) =

  if C.rank == 2:
    gemm(1.T, A, B, 1.T, C)
  elif C.rank == 1:
    gemv(1.T, A, B, 1.T, C)
  else:
    raise newException(ValueError, "Matrix-Matrix or Matrix-Vector multiplication valid only if first Tensor is a Matrix and second is a Matrix or Vector")

# Pattern match for fusion operation (out-of-place)
# ------------------------------------------------------------------
template rewriteTensor_MultiplyAdd*{`*`(A,B) + C}[T](
  A, B, C: Tensor[T]): auto =
  ## Fuse ``A*B + C`` into a single operation.
  ##
  ## Operation fusion leverages the Nim compiler and should not be called explicitly.
  tensor_multiplyAdd(A, B, C)

# Pattern match for fusion operation (different arg order)
# ---------------------------------------------------------------------
template rewriteTensor_MultiplyAdd*{C + `*`(A,B)}[T](
  A, B, C: Tensor[T]): auto =
  ## Fuse ``C + A * B`` into a single operation.
  ##
  ## Operation fusion leverages the Nim compiler and should not be called explicitly.
  tensor_multiplyAdd(A, B, C)

# Pattern match for fusion operation (in-place)
# ---------------------------------------------------------------------
template rewriteTensor_MultiplyAdd_inplace*{C += `*`(A,B)}[T](
  A, B: Tensor[T], C: var Tensor[T]) =
  ## Fuse ``C+=A*B`` into a single operation.
  ##
  ## Operation fusion leverages the Nim compiler and should not be called explicitly.
  tensor_multiplyAdd_inplace(A, B, C)
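
With these templates in scope, ordinary-looking expressions are fused at compile time; illustrative usage, assuming Tensor values a, b and c:

let d = a * b + c  # matched by the A*B + C pattern, compiled to tensor_multiplyAdd(a, b, c)
var e = c
e += a * b         # matched by the in-place pattern, compiled to tensor_multiplyAdd_inplace(a, b, e)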

A very involved example is my reimplementation of a BLAS in pure Nim (+ intrinsics) that reaches OpenBLAS and MKL-DNN performance on large matrices:


The dispatch I used there depends on a mix of:

  • types (int32, int64, float32, float64, …)
  • CPU Architecture (x86, ARM)
  • CPU features (SSE2, SSE3, AVX, AVX2, AVX512, …)
  • Number of registers
  • Register size
  • whether the result matrix is unit-strided (see the dispatch logic)

After abstracting away all those architecture, register and low-level differences, I have a very lean code generator that can be expanded quickly to any CPU/SIMD combination and provides the same speed as the assembly-coded BLAS routines written by OpenBLAS and MKL experts:

SSE float32 codegen

import
  ./gemm_ukernel_generator, ./gemm_tiling,
  ../../simd

template float32x4_muladd_unfused(a, b, c: m128): m128 =
  mm_add_ps(mm_mul_ps(a, b), c)

ukernel_generator(
      x86_SSE,
      typ = float32,
      vectype = m128,
      nb_scalars = 4,
      simd_setZero = mm_setzero_ps,
      simd_broadcast_value = mm_set1_ps,
      simd_load_aligned = mm_load_ps,
      simd_load_unaligned = mm_loadu_ps,
      simd_store_unaligned = mm_storeu_ps,
      simd_mul = mm_mul_ps,
      simd_add = mm_add_ps,
      simd_fma = float32x4_muladd_unfused
    )

AVX512 + FMA codegen

import
    ./gemm_ukernel_generator, ./gemm_tiling,
    ../../simd
  
ukernel_generator(
    x86_AVX512,
    typ = float32,
    vectype = m512,
    nb_scalars = 16,
    simd_setZero = mm512_setzero_ps,
    simd_broadcast_value = mm512_set1_ps,
    simd_load_aligned = mm512_load_ps,
    simd_load_unaligned = mm512_loadu_ps,
    simd_store_unaligned = mm512_storeu_ps,
    simd_mul = mm512_mul_ps,
    simd_add = mm512_add_ps,
    simd_fma = mm512_fmadd_ps
  )

Supporting integers, even when there is no suitable SIMD, is also very easy:
int32 BLAS on SSE2-only arch (SSE4.1 brings lots of intrinsics for int32)

type Int32x2 = array[2, int32]

func setZero_int32_sse2_fallback(): Int32x2 {.inline.} =
  # `result` is zero-initialized by default, so there is nothing to do
  discard

template set1_int32_sse2_fallback(a: int32): Int32x2 =
  [a, a]

func load_int32_sse2_fallback(mem_addr: ptr int32): Int32x2 {.inline.}=
  let p = cast[ptr UncheckedArray[int32]](mem_addr)
  [p[0], p[1]]

func store_int32_sse2_fallback(mem_addr: ptr int32, a: Int32x2) {.inline.}=
  let p = cast[ptr UncheckedArray[int32]](mem_addr)
  p[0] = a[0]
  p[1] = a[1]

template add_int32_sse2_fallback(a, b: Int32x2): Int32x2 =
  [a[0] + b[0], a[1] + b[1]]

template mul_int32_sse2_fallback(a, b: Int32x2): Int32x2 =
  [a[0] * b[0], a[1] * b[1]]

template fma_int32_sse2_fallback(a, b, c: Int32x2): Int32x2 =
  [c[0] + a[0]*b[0], c[1] + a[1]*b[1]]

ukernel_generator(
      x86_SSE2,
      typ = int32,
      vectype = Int32x2,
      nb_scalars = 2,
      simd_setZero = setZero_int32_sse2_fallback,
      simd_broadcast_value = set1_int32_sse2_fallback,
      simd_load_aligned = load_int32_sse2_fallback,
      simd_load_unaligned = load_int32_sse2_fallback,
      simd_store_unaligned = store_int32_sse2_fallback,
      simd_mul = mul_int32_sse2_fallback,
      simd_add = add_int32_sse2_fallback,
      simd_fma = fma_int32_sse2_fallback
    )

On an even more generic note, I am currently writing a DSL and a corresponding deep learning compiler that will work at Nim compile-time (implemented via macros, maybe later as a compiler plugin shipped as a .dll/.so) or at Nim runtime with an LLVM backend.

A preview is visible here. This is a generalized answer to the Julia challenge as well.

  # Define the algorithm
  # ----------------------------------------
  proc foobar(a, b, c: Fn): Fn =

    # Iteration Domain
    var i, j: Iter

    var bar: Fn

    # Notice that a normal language would need multiple loops here
    bar[i, j] = a[i, j] + b[i, j] + c[i, j]

    # Update result
    result = bar

  # Materialize the algorithm for float32
  # ----------------------------------------------
  generate foobar:
    proc foobar(a: Tensor[float32], b, c: Tensor[float32]): Tensor[float32]

  # Test run
  # ----------------------------------------------
  let
    u = [[float32 1, 1, 1],
         [float32 1, 1, 1],
         [float32 1, 1, 1]].toTensor()
    v = [[float32 2, 2, 2],
         [float32 2, 2, 2],
         [float32 2, 2, 2]].toTensor()
    w = [[float32 3, 3, 3],
         [float32 3, 3, 3],
         [float32 3, 3, 3]].toTensor()

  let r = foobar(u, v, w)
  echo r

Edit (bonus): unlike many C++ compilers, I don’t use the Visitor Pattern for double dispatch in my compiler; I just use ADTs.

23 Likes

Considering how important the multiple dispatch paradigm is for Julia users, I think it would be interesting to show how some of Julia’s dispatch capabilities can be adapted to Nim’s features.

Following on from this, I would like to propose a Julia-to-Nim translation for generic multiple dispatch problems, if anyone wants to submit snippets of Julia code to be translated.

2 Likes

Is there any documentation of what the type checking rules of Nim are? I can’t find any in the manual. Or is it implementation defined? If so, what happens if a compiler implementation changes and something that was previously provable by the compiler ceases to be? Does that cause a previously valid program to stop compiling?

1 Like

Presumably that would just make it so that type inference changes are breaking. If Julia wants to be of use for small standalone, compiled binaries we’re going to have to do something similar if we’re to avoid breaking people’s code.

Also, it looks like he put return type annotations on his code, so if the compiler is able to trust those annotations, it shouldn’t be too hard to guarantee that your code infers properly.

2 Likes

If I understood correctly, you want to know whether incompatible API changes will be caught at compile-time: yes.

For example, let’s assume Complex is implemented this way:

type
  Complex[T: float32 or float64] = object
    real: T
    imaginary: T

proc `+`(a, b: Complex): Complex =
  result.real = a.real + b.real
  result.imaginary = a.imaginary + b.imaginary

Now for some reason I want to change the type to:

type
  Complex[T: float32 or float64] = object
    re: T
    im: T

All uses of the real and imaginary fields will then fail at compile-time.
However, you can provide transition procedures or templates, for example:

template `real`[T](a: Complex[T]): T {.deprecated: "a.real is deprecated, use a.re instead".} =
  a.re
template `real=`[T](a: var Complex[T], val: T) {.deprecated: "a.real= is deprecated, use a.re= instead".} =
  a.re = val
template `imaginary`[T](a: Complex[T]): T {.deprecated: "a.imaginary is deprecated, use a.im instead".} =
  a.im
template `imaginary=`[T](a: var Complex[T], val: T) {.deprecated: "a.imaginary= is deprecated, use a.im= instead".} =
  a.im = val

(Procedures are materialized in the generated C code, while templates only exist at the Nim level, so templates guarantee inlining and the same codegen as the previous code.)

1 Like

No, that’s not what I’m asking. Static languages traditionally have well-defined rules about what qualifies as a well-typed program and what doesn’t. These rules are part of the definition of the language and guarantee that the compiler can determine what the types of all values are. I cannot find any such rules explained anywhere for Nim. I’d like to know what they are.

Nim Manual should be close, but there are more type-related things in the manual, e.g. in the type conversions section, the effect system section, etc. However, i’d guess that the overloading rules that e.g. mratsim depends on should be mostly stable after 1.0?

the nim implementations should follow the nim spec, so the language is not really defined by a single implementation (at the least there is e.g. nlvm, but i admit it’s more of a backend). If something changes, there is usually a deprecation period and a flag to keep the old behavior, but i don’t really imagine a case where a nim type rule change would result in something not compiling, unless it can be proven it’s actually a bug and the previous rule was too relaxed (e.g. nilable ref)

As far as I can see, nothing on that page explains how type checking works. It addresses type equality and subtyping (two variations on “I have two types, are they compatible?”). It does not seem to address the question “I have a program, how do I tell if it type checks?” I’m guessing the answer is “feed the program to the nim compiler and see if it accepts it or not”, which means that it’s implementation-defined.

I don’t think that’s completely fair: telling if it type checks means following all the type rules described in the manual, including this section, and if the current compiler contradicts them, that is a bug, and it needs to be fixed in the compiler or in the spec.

On the other hand, it’s true there isn’t a single “table” with all the rules written in a more classical type-theory notation: improvements are certainly needed, so thanks for the feedback!

One of the main goals of Nim >= 1.0 is to focus even more on the spec as the place to define new features, so i hope the type checking docs are fixed.

While more detailed type checking docs should happen, i wouldn’t really hurry for a more formal overall spec: even Rust doesn’t seem to have something like that yet: Is there a plan to get a formal Rust language specification at some point? - #36 by Centril - The Rust Programming Language Forum

PS i am just a nim dev, sometimes contributor, not a part of nim’s core team, so take all of my opinions as just personal opinions :stuck_out_tongue:

I’m not trying to bust anyone’s chops about specs here—I think that specs for programming languages are overrated and it’s perfectly reasonable for a language at Nim’s point of development not to be formally specified. I do suspect that the way Nim’s type checking works is essentially the same as the way Julia’s type inference works: it figures out whatever it figures out. The difference is that in Julia, the program works the same regardless of whether the compiler can figure out the type of something or not, it just generates dynamic code when it can’t; whereas in Nim the compiler tells you your program is invalid if it can’t figure out the type of something. That means that in Nim what the compiler can determine about types is part of the language rather than an implementation detail, albeit a rather complex and implicit part of the language that isn’t documented and needs to be learned by feeding programs to the compiler and seeing whether it accepts them. There’s a similar situation in Julia where the programmer needs to learn what code is going to be fast or not, but that’s different because it’s only about performance, not whether the code is allowed or what it does.

The reason this relates back to multiple dispatch is this: what is the difference between static and dynamic multiple dispatch? It’s pretty much just whether the compiler can figure out all the types at compilation time or not. Since that’s not specified in Nim, it’s hard to make arguments about whether “static dispatch” will be sufficient or not. Will you always be able to express what you need to in such a way that the compiler can figure out the types? I don’t know… because the requirements for static type checking aren’t written down anywhere. The only way to get a sense of that is to feed programs to the Nim compiler and see what it does. And that will change over time as the compiler is developed.

Clearly there are a lot of cases where the static requirement is no problem. That’s essentially equivalent to the fact that there are a lot of cases in Julia where the compiler can completely determine the types of all values and devirtualize all method calls before a program runs. But there are also a lot of cases where the Julia compiler can’t figure that out completely but it can do a couple of high-level dynamic dispatches and then JIT code that is completely typed from there on down. Of course there are other cases where the compiler can’t figure that much out and dynamic dispatch happens a lot—e.g. operating on things in an untyped array. But that’s usually not acceptable in numerical/scientific computing—it’s too slow. This is a problem to the point where we’ve had people ask why Julia can’t just reject programs where it can’t figure out all the types. (Answer: because then Julia would be a very different and statically typed language.) But I believe that’s essentially what Nim does: there’s no formal specification of type checking, the compiler does what it does and rejects programs where it can’t figure out all the types. As the compiler gets smarter, it generally can figure out more types. However, there are cases where one may want a compiler to figure out fewer types—e.g. to reduce compile times. In Julia that’s always an option since programs keep working, they just might get slower (but compile faster, hopefully). In Nim it seems like that’s not an option since it will cause programs that previously worked to stop compiling. That may or may not turn out to be a problem in the long term—it will be interesting to see what happens.

In any case, it seems that at least something like F#'s type providers would be necessary to allow types from outside the code to be provided and dispatched on so long as the rest of the code from there is statically typable. Otherwise code that needs to read typed data from a file or other external data source would be pretty unpleasant to write. It’s possible that Nim already has a feature like that. After that it’s vaguely plausible to me that it might be possible for everything else to be decided statically (of course with type providers you still need to potentially JIT arbitrary code), but I’m skeptical that purely static dispatch is sufficient. As I said before, it will be interesting to see how that plays out in Nim.

20 Likes

From my practical experience of writing Nim code for 2.5+ years both personally and professionally:

  • as long as you stay within a pure Nim ecosystem, you don’t have type inference issues
  • as soon as you read something from outside Nim, say a .npy file, a .csv or a JSON config, you need to deal with dynamic types.
    For lots of use cases, boxing the dynamic types in JsonNode works (see the sketch after this list). For numerical computing, however, this would be too slow, as you said.
    So ideally you know what you are deserializing in advance. This is usually not a problem for formats like .npy or .hdf5, but CSV typically is an issue.
    However, this is something that we share with C++ for example, so the same solutions apply. And given that lots of low-level numerical code for Python or R is written in C++, we don’t have to reinvent the wheel for those use cases.
    An example of this issue in Pandas is that as soon as you have a mix of types in a CSV column, everything becomes an “object” and much slower.
  • Regarding type providers, if a schema is agreed upon, hardcoding the types, using libraries like protobuf, or even parsing the schema from JSON (as Nim’s json module can also work at compile-time) all work.
    If everything is dynamic, there is no free lunch: Nim, like Julia, pays the price of flexibility with slowness. You can use ADTs like JsonNode or something more specific to your use case. Alternatively you can hide everything behind a class with dynamic methods.
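
For instance, a minimal sketch of the boxed approach with the standard json module (values invented):

import std/json

# dynamic data is boxed in the JsonNode ADT and unboxed at runtime
let config = parseJson("""{"name": "experiment", "iterations": 1000}""")
echo config["name"].getStr()        # experiment
echo config["iterations"].getInt()  # 1000
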
2 Likes

I think nim macros can simulate type providers: you can read various sources at compile time and usually generate types based on them. However, i am not very sure what the use cases are, so i’ll need to give an example a try … very interesting question indeed :slight_smile:
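
A rough, untested sketch of that idea (the macro name, file name and all-strings field typing are mine): read a CSV header with staticRead and emit an object type:

import std/[macros, strutils]

# hypothetical sketch: build an object type from a CSV header read at compile time
macro typeFromCsvHeader(name: untyped, path: static string): untyped =
  let header = staticRead(path).splitLines()[0]
  var fields = newNimNode(nnkRecList)
  for col in header.split(','):
    # every column becomes a string field; a real provider would infer types
    fields.add newIdentDefs(ident(col.strip()), ident"string")
  result = newTree(nnkTypeSection,
    newTree(nnkTypeDef, name, newEmptyNode(),
      newTree(nnkObjectTy, newEmptyNode(), newEmptyNode(), fields)))

# usage, assuming data.csv starts with the header line "id,name,score":
#   typeFromCsvHeader(Row, "data.csv")
#   # Row is now an object type with string fields id, name and score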

You’re right that in some cases we have to go more or less with dynamic dispatch! But that’s a choice that the user usually makes while writing the code.

I don’t really have problems with type inference in Nim: i am not fully sure what you mean by “figure out all the types”. It’s true that sometimes exotic generic combinations did lead to bugs in the past (not sure if this still happens often: i hope not), however here we mostly talk about more normal overloading, and i feel this is more about figuring out which signature best matches a call site, which is a simpler problem; it still seems to me that the overloading-resolution section answers at least some of the questions.

It seems to me i can write my own nim implementation which follows those rules, and i can expect the overloading to work the same way there. Those docs should be improved, and a second implementation would be a great thing which would surely add a lot more details to them.

I am very interested in learning more about Julia’s type system: i admit i saw it as a mostly dynamic language, for which i am sorry <3

2 Likes

In Julia, everything is always dynamic, yet it is supposed to be fast when possible, and handling that appropriately is the core (research) problem of the work.

With Julia you can use function barriers to effectively mix dynamic and compiled code in a fast way, because of the clear behavior it has between inference and its automatic dynamic fallbacks. That’s precisely where its truly-JIT + auto-specialization behavior tends to break away from the static programming model, and it shows up in a lot of interactive uses. You might be able to simulate it with a bunch of branching around JsonNode and some compiler-level assertion after doing a manual check, but that’s quite different from having it as part of a truly dynamic programming experience (and it doesn’t automatically compose with new choices, so doing it that way gives only a subset of the functionality).

There’s nothing wrong with Nim, but I think what people need to really understand is that Julia has a fundamental difference from such languages because Julia is, fundamentally at its core, a dynamic programming language. This means that the common assumptions of a static programming language cannot apply to Julia, and that’s fundamental to how it’s used.

  • If someone is interactively eval defining new methods to a function, how should a multimethod function work?
  • How should specializations work if in the current context you may not know all of the possible functions that could be called?
  • How do you setup a program such that you may never know the types that will be entered into it, but you want to specialize the compilation on them when you see new ones?

Those kinds of compiler issues are not encountered in a static programming setting, and so the way they are approached and optimized is completely different (you have a lot more information about your world, and so these issues essentially disappear). Handling these issues first class so that interactive usage is exactly the same as other usage requires that one cannot have any full program static analysis that violates dynamic assumptions, which of course would be somewhat silly to do if it really is a static programming language, so all static programming languages do simplify the problem they are trying to solve by assuming these issues cannot exist. However, that makes it a fundamentally different experience when being used interactively, and, for better or worse, makes them entirely different worlds with different design spaces.

If you allowed yourself to drop all of the dynamic behavior from Julia, you might end up with something close to Nim, which is why Nim is a really cool language! But at the same time, it occupies a very different area of design space which is tailored to other types of users, and that’s perfectly fine.

11 Likes