Comparison of Julia with statically typed languages

First let me say I know nothing about Julia.

I have read the following StackExchange post:

As the author says, Julia is dynamically typed. However, as Julia’s website explains, the semantics of Julia are more restrictive than those of Python or R, which enables compilation to highly performant code.

Let us generalize beyond the author’s definition of static typing as “every expression has a type” and consider static analysis more generally. I am interested in the broader question of how much feedback we can get from Julia’s static analysis tools before program execution. A friend of mine has told me in casual conversation:

If you’re writing Julia code where your types aren’t inferable, that’s understood to be an error, and all the built in profiling tools will yell at you. So it seems kinda similar [to OCaml], I’m just allowed to write bad Julia code if I want to. Which I often do, when I’m prototyping - and then I profile it, and it can’t infer the types, and I fix it, and it’s like 1000x faster because now the compiler can work better.

It is easy to write bad Julia code. Optimal Julia code looks very different from what you write in initial prototyping. But if you use the profiling and benchmarking utilities, you can write very fast code that works well and seems pretty safe.
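To make "types aren't inferable" concrete, here is a minimal sketch using `@inferred` from the `Test` standard library. The two function names are hypothetical and chosen only for illustration.

```julia
using Test

# Hypothetical example: the return type depends on a runtime value,
# so inference can only conclude Union{Float64, Int64}.
unstable_halve(x::Int) = iseven(x) ? x ÷ 2 : x / 2

# The return type is Float64 for every Int input.
stable_halve(x::Int) = x / 2

# @inferred returns the value when the inferred type matches the actual one...
@inferred stable_halve(4)

# ...and throws when inference could not pin down a single concrete type.
@test_throws ErrorException @inferred unstable_halve(4)
```

Note that the unstable version still runs; the check only fires when you ask for it.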

As an outsider to Julia, this observation is really interesting to me. If I could write OCaml code initially in some kind of untyped language extension of OCaml (code which is wrong for edge cases, for example, but works in the primary cases of interest in my program) and then gradually modify it into type compliance, I can imagine that this would be very useful, and it illustrates why Julia is popular.

My question for the community is, if one is interested in writing safe, correct Julia code (potentially working in a more restricted fragment of the language, so giving up some of its dynamic flexibility), can we get good guarantees about the correctness of Julia programs through static analysis that are comparable to what is available in truly statically-typed languages? Is there a statically typable “core” sublanguage of Julia in which it’s possible to be as confident about our code as in OCaml?

I would imagine the answer is “no.” However, I would be interested in hearing practical and concrete examples in your code where

  • static analysis tools for Julia did help you catch errors which would not be realistic to catch in Python or untyped JavaScript
  • there was an error in your code where static analysis tools didn’t help you but the error would have been caught in a true statically-typed language such as OCaml or Haskell
4 Likes

This isn’t a direct answer to your question, but you might want to look at JET.jl, a package that does some static analysis on Julia code.

You could also look at DispatchDoctor.jl

In general, I would say that type instability hasn’t been a big part of performance for me. Usually after I write a prototype, improved performance has come from aggressively filing away allocations in tight inner loops by preallocating buffers that get overwritten, precomputing parts of computations where possible (like correlations), and carefully tested parallelization. A big part of the difference between Julia and most languages is that I can incrementally add this level of detail in hot loops, whereas many languages push you to one end of the spectrum or the other.
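As a rough sketch of the "preallocate a buffer that gets overwritten" pattern described above (the function names `scale_alloc` and `scale!` are made up for this example):

```julia
# Allocating version: builds a fresh output vector on every call.
scale_alloc(x, a) = a .* x

# In-place version: overwrites a caller-supplied buffer instead.
function scale!(out, x, a)
    @. out = a * x   # fused broadcast, writes directly into `out`
    return out
end

x = rand(1_000)
buf = similar(x)          # allocated once, outside the hot loop
for step in 1:100
    scale!(buf, x, step)  # reuses `buf`; no new vector per iteration
end
```

The prototype can start with `scale_alloc` everywhere, and only the hot loop needs to switch to `scale!`.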

8 Likes

I don’t have enough experience with these languages to say, but I think the Julia LanguageServer.jl is so primitive that I don’t see why you couldn’t do at least as well in Python or JavaScript.
I do still find it useful, and I have caught a lot of mistakes thanks to LanguageServer.jl.
However, I strongly suspect that JavaScript and Python tooling would have been similarly useful.

I believe tooling in Julia could be far better than it is today. The state today is poor.
There are limitations that come from it being dynamic, but in C++ any implicit or explicit template instantiation (for a function) or definition (for a struct or class) will let clangd type-check your generic code. We don’t have that with LanguageServer.jl.

JET is better, but you have to specifically call it for each function/argument type. It isn’t really practical, except when you desperately need to find all type instabilities in a pinch, which of course isn’t even a problem that exists in a proper static language.
Clang-tidy offers a large number of checks in addition to type errors. Clangd has far fewer false positives, and far fewer false negatives. It just works many times better.
As an example:


Clangd tells me:

  1. I could shrink the size of the enum from 4 bytes to 1 byte to save memory.
  2. it infers that the function returns an int (the -> int isn’t text, but a clangd comment, like the yellow warnings, except not a warning)
  3. warns me of some potential C++-related problem (ODR violations); it can automatically insert inline to fix it.
  4. warns me that I’m missing the possibility of f being an orange.
  5. warns me that my function could accidentally fail to return anything.

Of these, only “3.” wouldn’t be applicable to Julia. So, how well does the Julia language server do on my Julia translation?


The only warning it provides (twice!) is that apple isn’t defined, even though it is. :man_facepalming:
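The screenshots are not reproduced here. As a purely hypothetical reconstruction of the kind of code being discussed (the names `Fruit` and `price` are assumptions, not the original code), the missing `orange` case looks like this in Julia:

```julia
# Hypothetical enum; the UInt8 base type keeps it at one byte.
@enum Fruit::UInt8 apple banana orange

function price(f::Fruit)
    if f == apple
        return 1
    elseif f == banana
        return 2
    end
    # No branch for `orange`: control falls off the end and
    # the function silently returns `nothing`.
end

# Reflection shows the problem, but only if you go looking for it.
inferred = only(Base.return_types(price, (Fruit,)))
```

Here `price(orange)` returns `nothing` without any complaint, and the inferred return type is not a single concrete type. That is the sort of thing points 4 and 5 above would flag at edit time.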

28 Likes

“Error” is too strong of a word here. Type unstable code does run in Julia. If it is needed and intentional, that is fine. The problem is when this occurs unintentionally in code, especially code that is expected to be performant.

This is the basic idea for how Julia is meant to be used. Note even without explicit typing, some types can be inferred.

In the field where I work, I often work closely with domain scientists who are not going to ever use a static language. My use case involves being able to exchange code easily with them, while still being able to take advantage of some features such as typing.

The answer to this is getting more complicated with time.

The primary Julia tool stack is likely going to focus on the dynamic nature of the language. Tooling to better analyze types in code will improve over time, but these will be for the language as a whole not a static subset.

That said, alternative tool stacks are starting to emerge. Some of these involve transpiling subsets of Julia to static languages and then proceeding to compile to static binaries.

Correctness is a much larger concept than typing. There are cultural issues within the Julia community that make correctness challenging. The community tends to be very reactive rather than proactive at times. The intersection of typing and correctness seems a bit different in Julia than other languages. In Julia there is a cost to prematurely typing code: the loss of generality for code that otherwise may apply well to unknown types. A frequent critique of code from those new to Julia is that they are overusing types. From the correctness perspective, you only want your code to apply to types that may satisfy certain conditions. How to implement traits or interfaces that may help lessen this dichotomy is not completely solved, but some early solutions do exist.
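One such early solution is the trait-dispatch pattern (often called "Holy traits"). The sketch below is only an illustration; the trait types `IsOrdered` and `NotOrdered` and the function `clamp_to_unit` are invented for this example.

```julia
# Hypothetical trait: does a type support ordering comparisons?
struct IsOrdered end
struct NotOrdered end

# Default to NotOrdered; opt in for all Real subtypes.
orderedness(::Type) = NotOrdered()
orderedness(::Type{<:Real}) = IsOrdered()

# The public method stays untyped and forwards to a trait-specific method.
clamp_to_unit(x) = clamp_to_unit(orderedness(typeof(x)), x)
clamp_to_unit(::IsOrdered, x) = clamp(x, zero(x), one(x))
clamp_to_unit(::NotOrdered, x) =
    throw(ArgumentError("clamp_to_unit needs an ordered type, got $(typeof(x))"))

clamp_to_unit(1.5)   # dispatches to the IsOrdered method
```

The argument stays unannotated, so generality is kept, while the condition the code relies on is stated explicitly and a new type can opt in by adding one `orderedness` method.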

Python now has tools such as mypy. For JavaScript, there is now TypeScript. The tension I see there is that these additions require additional steps for their advantages to manifest. Mypy is only going to be useful if one uses type annotations. TypeScript code needs to be transpiled to JavaScript.

In Julia, typed and untyped code can interact well. I can perform type inference analysis and type stability analysis on untyped code. Untyped code can still perform well and even meet some definitions of type stability (e.g. one can determine the output type of a function from the type of its arguments).
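A small sketch of that last point, using the reflection helper `Base.return_types` on a function with no annotations at all (`double` is a made-up name):

```julia
# No type annotations anywhere.
double(x) = x + x

# Inference still determines the output type from the argument type.
int_result = only(Base.return_types(double, (Int,)))
float_result = only(Base.return_types(double, (Float64,)))

@assert int_result === Int
@assert float_result === Float64
```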

Most of Julia’s capability for checking types is probably available to Python or JavaScript with enough effort applied. I find it easier to do this in Julia simply because the base language is type aware and takes advantage of types to do useful things such as multiple dispatch or generate performant code.

The potential for Julia catching errors beyond other languages is that it is easier to catch type errors or issues even in untyped code. When issues do arise, gradually adding typing can help resolve the issues. The dichotomy between untyped and typed programming is reduced.

Chris makes the point that static languages are much better at static analysis and tooling. If there was no need for interacting with dynamic or otherwise untyped code, then in many situations static languages are clearly superior.

Julia is a language that exists between several different worlds of programming. History has shown us that different kinds of languages are needed to solve different problems.

7 Likes

Okay, this is off-topic and not a response to the broader thrust of your post, so excuse the nitpick, but this seems like a little bit of an oversimplification. Domain scientists tend to want to use languages that

  • they already know, or are easy to learn the basics of (time to go from zero knowledge of the language to a basic script to read data from a csv file, normalize the columns, apply some transformations and plot some graphs /compute some statistics is relatively small)
  • they are amenable to rapid prototyping and have a good REPL
  • they have pre-established libraries for everything they want to do, so they have to write as little code on their own as possible (they often think of themselves as library consumers/end-users/customers, and they often don’t think of themselves as programmers/developers)

Point 1 favors imperative languages with ALGOL- or C-like syntax (not a Lisp descendant!), and point 3 favors established languages with a large community in that particular domain. Point 2 is the only one where you could argue dynamically typed languages inherently have the upper hand, but this is not a settled point; it is controversial. Many OCaml/Haskell programmers are quite happy with the REPL experience in those languages and would argue that static typing is still valuable during prototyping as a way to catch high-level design errors or mistakes in the way you are structuring the problem domain in your code.

Again, sorry, as this is only a minor point in your response.

2 Likes

I assume the issues with Python are similar to the issues with Julia here, which is as much a social challenge as a technical one. I am a Python programmer and I eagerly experimented with Mypy for a while before giving up in frustration.

  • None of my client libraries or even the standard library use mypy, so calls to any library have no contract at all
  • Raising a github issue based on a Mypy error to suggest adherence to, for example, Liskov’s Substitution principle is likely to be met with annoyance/contempt (“Are you just telling us how you would have done it?”)
  • Idiomatic Python code is often “dependently typed” as several different functions with completely different behaviors and return types tend to be grouped together into a master function that takes a string argument selecting what the behavior should be (basically API design is biased towards people using it interactively in a REPL as opposed to people building a system using the module as a component)

To me it’s unlikely that Mypy will ever become widespread, if only because that would require Python coding style to evolve towards one that is compatible with Mypy’s typing.

1 Like

I did not mean to characterize domain scientists in general. I was describing domain scientists where I work, who are mostly biologists with little to no background in programming anything at all. MATLAB and Python are already quite a challenge for many of them to use. Any static language is simply not going to be in most of their futures. If I’m lucky, I get to work with someone who is at least willing to try editing notebooks I send them.

3 Likes

The issues in Julia are distinct from the Mypy case. We start with types in the base language, and they have utility in dispatch besides correctness.

A pretty large part of the Julia codebase is in Julia.


We do not have descendants of concrete types or object-oriented programming in base Julia. Misuse of abstract types is problematic and usually a serious bug. A classic example is the following.

function sum_of_squares(input::AbstractVector)
    sum = 0
    for i = 1:length(input)
        sum += input[i]^2
    end
    return sum
end

Above, I’ve assumed that since Julia is a “one-based” language that all concrete types implementing AbstractVector would have indices starting at 1. That would be an incorrect assumption as one could implement compliant array types that start at arbitrary indices.
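To see the failure without installing any package, here is a sketch with a hypothetical zero-based vector type, `ZeroBased`, defined only for this illustration (real code would more likely hit this through a package such as OffsetArrays.jl). The two functions are renamed copies of the loop above and of an `eachindex`-based variant.

```julia
# Hypothetical zero-based vector: valid indices are 0:length-1.
struct ZeroBased{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(v::ZeroBased) = size(v.data)
Base.axes(v::ZeroBased) = (0:length(v.data)-1,)
Base.IndexStyle(::Type{<:ZeroBased}) = IndexLinear()
Base.getindex(v::ZeroBased, i::Int) = v.data[i+1]

# Same loop as above: assumes indices run from 1 to length(input).
function sum_of_squares_naive(input::AbstractVector)
    total = 0
    for i = 1:length(input)
        total += input[i]^2
    end
    return total
end

# Asks the array for its own indices instead of assuming them.
function sum_of_squares_generic(input::AbstractVector)
    total = zero(eltype(input))
    for i in eachindex(input)
        total += input[i]^2
    end
    return total
end

v = ZeroBased([1, 2, 3])
sum_of_squares_generic(v)   # visits indices 0, 1, 2
```

Calling `sum_of_squares_naive(v)` never reads index 0 and then throws a `BoundsError` on the last iteration, even though `v` satisfies the `AbstractVector` annotation.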

function sum_of_squares(input::AbstractVector)
    # checks that indexing starts at 1
    Base.require_one_based_indexing(input)
    sum = 0
    for i = 1:length(input)
        sum += input[i]^2
    end
    return sum
end

For most standard vectors, the check would be inlined and then compiled away.

julia> @code_llvm Base.require_one_based_indexing([1,2,3])
;  @ abstractarray.jl:131 within `require_one_based_indexing`
define i8 @julia_require_one_based_indexing_194({}* %0) #0 {
top:
  ret i8 1
}

The better way would be to avoid direct indexing…

function sum_of_squares(input::AbstractVector{T}) where T
   s::T = zero(T)
   for i in eachindex(input)
       s += input[i]^2
   end
   return s
end

… or just use mapreduce

sum_of_squares(input::AbstractVector) =
    mapreduce(x->x^2, +, input)

We tend not to do this, but we can use multiple dispatch creatively to do this if needed. For example, we have a Val type that can push some primitives into the type domain as parameters. This allows us to then select the method we want in a type stable manner.

julia> foo(::Val{:square}, x) = x^2
foo (generic function with 1 method)

julia> foo(::Val{:sincos}, x) = (sin(x), cos(x))
foo (generic function with 2 methods)

julia> foo(Val(:square), 5)
25

julia> @code_warntype foo(Val(:square), 5)
MethodInstance for foo(::Val{:square}, ::Int64)
  from foo(::Val{:square}, x) @ Main REPL[21]:1
Arguments
  #self#::Core.Const(foo)
  _::Core.Const(Val{:square}())
  x::Int64
Body::Int64
1 ─ %1 = Main.:^::Core.Const(^)
│   %2 = Core.apply_type(Base.Val, 2)::Core.Const(Val{2})
│   %3 = (%2)()::Core.Const(Val{2}())
│   %4 = Base.literal_pow(%1, x, %3)::Int64
└──      return %4

julia> foo(Val(:sincos), 0)
(0.0, 1.0)

julia> @code_typed foo(Val(:sincos), 5)
CodeInfo(
1 ─ %1 = Base.sitofp(Float64, x)::Float64
│   %2 = invoke Base.Math.sin(%1::Float64)::Float64
│   %3 = Base.sitofp(Float64, x)::Float64
│   %4 = invoke Base.Math.cos(%3::Float64)::Float64
│   %5 = Core.tuple(%2, %4)::Tuple{Float64, Float64}
└──      return %5
) => Tuple{Float64, Float64}

Introducing a method that just takes a symbol directly produces type-unstable code.

julia> foo(s::Symbol, x) = foo(Val(s), x)
foo (generic function with 4 methods)

julia> @code_warntype foo(:square, 5)
MethodInstance for foo(::Symbol, ::Int64)
  from foo(s::Symbol, x) @ Main REPL[29]:1
Arguments
  #self#::Core.Const(foo)
  s::Symbol
  x::Int64
Body::Union{Int64, Tuple{Float64, Float64}}
1 ─ %1 = Main.Val(s)::Val
│   %2 = Main.foo(%1, x)::Union{Int64, Tuple{Float64, Float64}}
└──      return %2

Rewriting that slightly differently lets me take advantage of constant propagation, resulting in type-stable code.

julia> function foo(s::Symbol, x)
           if s == :sincos
               foo(Val(:sincos), x)
           elseif s == :square
               foo(Val(:square), x)
           end
       end
foo (generic function with 4 methods)

julia> bar(x) = foo(:square, x)
bar (generic function with 1 method)

julia> @code_warntype bar(3)
MethodInstance for bar(::Int64)
  from bar(x) @ Main REPL[41]:1
Arguments
  #self#::Core.Const(bar)
  x::Int64
Body::Int64
1 ─ %1 = Main.foo(:square, x)::Int64
└──      return %1
3 Likes

In Julia the common analogues tend to be

  • Val{:foo} as @mkitti mentioned, and
  • functions that change their interface depending on the exact type of the arguments.

Both of these are quite harmful imo but unfortunately they have a strong fan base.

1 Like

Note that in C++, as long as you have either a typed call, or some form of typed declaration, it will infer types from there.

Because of the foo<>(long, long), it infers and checks foo + everything foo calls/uses using those types.
You can add multiple, which will do more checking, but it’ll stop the type hints from appearing:

You can also use concepts to constrain dispatch, so you could write two methods like baz(RequiresOneBasedIndexing auto x, auto y) vs baz(auto x, auto y), and the former will be used for any x with 1-based indexing, and otherwise it falls back to the latter.

Anyway, my point here is that C++ tooling has a reasonable solution: give an entry point, and it’ll type check everything from there.
More likely, you don’t write explicit declarations like I did above, but instead just write tests with actual calls.

long x = 3, y = 4;
// the following line would trigger `foo(long, long)` being checked
EXPECT_EQ(foo(x, y), 19);

would work just as well. Or, for something simple enough, you could even use a static_assert:

I don’t see why our test/runtests.jl shouldn’t work in the same way, allowing types to be checked while we write code for the type combinations used in our tests.

Of course, test suites do often take advantage of Julia’s dynamism

for T in [Float32, Float64, ForwardDiff.Dual{Nothing,Float64,2}]
    # run tests using `T`
end

but it’d be easy enough to restructure our tests.

run_tests_for_type(Float32)
run_tests_for_type(Float64)
run_tests_for_type(ForwardDiff.Dual{Nothing,Float64,2})
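For illustration, `run_tests_for_type` could look roughly like the sketch below. This is an assumption about its shape, not existing tooling: it reuses the `mapreduce` version of `sum_of_squares` from earlier in the thread, and it checks inference with `@inferred` when the tests run rather than while editing.

```julia
using Test

sum_of_squares(input::AbstractVector) = mapreduce(x -> x^2, +, input)

# Hypothetical helper: one explicit, concretely typed entry point per call.
function run_tests_for_type(::Type{T}) where {T}
    @testset "sum_of_squares with $T" begin
        v = T[1, 2, 3]
        # @inferred throws if the return type is not fully inferred.
        @test @inferred(sum_of_squares(v)) == T(14)
        @test sum_of_squares(v) isa T
    end
end

run_tests_for_type(Float32)
run_tests_for_type(Float64)
```

Each call names a concrete type, which is the kind of entry point a tool could in principle start type checking from.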
3 Likes

You may want to ask the author of “SyslabCC: Suzhou-Tongyuan’s proprietary Julia AOT compiler is now available for free use (personal & educational license only)”. I guess the answer would be yes.

1 Like