Does Julia Create a "1.5" Language Problem?

It seems I need to explain this in a bit more detail. This is an implementation detail of how Julia's runtime evaluates dynamic dispatch.

So before Julia goes into the method table, it first uses the stack-frame address to look up another hash table. For example:

for i in (...)
    f(x...)  # <- dynamic dispatch
end

The stack frame now contains this function and jl_apply_generic. When you loop through this collection, you call jl_apply_generic multiple times from the same stack frame, so you always hit the cache if the input types are the same. This mechanism is designed to optimize around type instability (cases where the input types do not change but the type inferencer fails to prove it). It is also used to bootstrap Julia's type inferencer, where all code is inferred as Any and you still want the code to run quickly.

That's why a simple loop microbenchmark yields a good result: you hit the cache quickly. So we use quicksort's branching and recursion to create varying stack-frame addresses and defeat this optimization.
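A minimal sketch of the two call patterns being contrasted (illustrative code, not the actual benchmark; g and the helper functions are made up):

# A flat loop makes every dynamic call from the same stack frame; the
# recursive version makes each dynamic call from a fresh frame.
g(x) = x + 1                                # target of the dynamic dispatch

function flat_loop(v::Vector{Any})          # one frame, many dynamic calls
    s = 0.0
    for x in v
        s += g(x)                           # dynamic dispatch: eltype is Any
    end
    return s
end

function recursive(v::Vector{Any}, lo, hi)  # a fresh frame per call
    lo > hi && return 0.0
    mid = (lo + hi) ÷ 2
    s = g(v[mid])                           # dynamic dispatch from a new frame
    return s + recursive(v, lo, mid - 1) + recursive(v, mid + 1, hi)
end

v = Vector{Any}(rand(1000))
flat_loop(v)
recursive(v, 1, length(v))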


I'm sorry, I don't understand the low-level details well enough to know what this cache is, what is being cached, or how it relates to calling a function in the same stack frame.

I don't think throwing this out of benchmarks entirely is useful. Even in Python programs it's common that the argument types don't vary in a loop, and that only a.f varies in a call a.f(x, y). But it does warrant a benchmark where every involved instance's type changes per iteration, just to exercise the most flexible runtime dispatch. There's no reason to forbid any optimizations as long as the criteria are met.

The issue is that the wider algorithm, including the recursive calls, has its own overhead that can't be separated from the timing of the runtime dispatch. Granted, varying the types of instances per iteration would have its own overhead too, but that could possibly be timed separately and roughly subtracted.
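Roughly, a sketch of what I mean (made-up types and functions): cycle through instances of different types so the dispatch target changes every iteration, and time a constant-target variant separately to subtract the loop overhead.

# Force a runtime dispatch whose target changes on every iteration.
struct A end
struct B end
struct C end
tag(::A) = 1
tag(::B) = 2
tag(::C) = 3

function bench(items::Vector{Any}, n)
    s = 0
    for i in 1:n
        s += tag(items[mod1(i, length(items))])   # dispatch target varies
    end
    return s
end

bench(Any[A(), B(), C()], 10^6)     # varying target
bench(Any[A(), A(), A()], 10^6)     # constant target, for rough subtraction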

You don't need to convince me. You should tell the core developers directly that you don't want two dialects (or a language subset) and that 100% type stability is unimportant. And as you can see in many recent threads, there are a lot of people in the Julia community who want static typing. What can you do? What does this trend reflect?

Eventually Julia will slowly move in this direction and split into two halves, whether you like it or not. I made a similar prediction under the TTFP post in 2019, where I pointed out that:

  1. The essence of TTFP is that you don't cache enough, and rebuilding the cache is costly.
  2. To solve 1, incrementally caching binary code is the only solution (see the sketch after this list). A full system image via PackageCompiler is the wrong solution.
  3. To achieve 2, the semantics of Julia's modules will become more and more restricted and structured.
  4. If you can cache precompiled IR correctly, caching binary code is trivial: just dump the LLVM-generated code. Then TTFP is solved.
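Indeed, newer Julia versions (1.9 and later) cache native code during package precompilation, and a package can declare a workload to be compiled ahead of time. A minimal sketch using PrecompileTools.jl (my example; see its documentation for the exact usage):

# The declared workload is compiled during precompilation and the native
# code is cached with the package image (Julia >= 1.9).
module MyPkg

using PrecompileTools

my_sum(v) = sum(abs2, v)

@setup_workload begin
    data = rand(10)
    @compile_workload begin
        my_sum(data)        # compiled once, cached with the package
    end
end

end # module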

This is the starting point for all my compiler work in Julia. Everything is natural here: once I got the logic straight, I could confidently begin my work; the rest was reading Julia's compiler source code and adding the implementation here and there.

A lot of people in that thread dismissed me because I used pure logical derivation (thought experiments) to draw these conclusions, which to them implied I was "pretending to be a compiler expert". I also used a lot of logical derivation in this thread and in another small-binaries thread, and I care less about the precision of terminology and experimental data.

What I want to convey here is that normal Julia users can also form their own judgements by using this method, without being compiler experts or reading the C source. Note that I am biased toward neither a dynamic nor a static Julia. But I know that if Julia is dynamic, then it has X pros and Y cons, which may lead to problem Z. So if people vote for solving Z, then the conclusion is that Julia can't be this dynamic. I simply can't change any of these facts.


People have contexts where runtime dispatch, allocations, garbage collection, or some other Julia feature isn't feasible. Achieving a 100% type-stable program is a helpful condition for those in some respects. That is not the same as static typing: Julia isn't a partially dynamically typed and partially statically typed language; it is a dynamically typed language, full stop. It is also logical to use precise terminology and data; communication collapses when words don't mean the same thing to everyone.
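To illustrate the distinction with a sketch: type stability is a property of a method given concrete argument types (checkable with @code_warntype), while the language stays dynamically typed either way.

# Type stable: for x::Int the return type is inferred as Int.
f(x) = x + 1
# @code_warntype f(1)              # Body::Int64

# Type unstable: the return type depends on a runtime value.
unstable(flag) = flag ? 1 : "one"
# @code_warntype unstable(true)    # Body::Union{Int64, String}

# Either way, nothing stops a caller from passing any type at runtime;
# Julia dispatches dynamically whenever it must:
f(1.5)

The unstable method is perfectly legal Julia; it is just slower. That is the difference between a performance property and a type system.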


Can you tell me how you draw the conclusion that type stability != static typing? Note that the words of Julia's core developers are not gospel. If you read StefanKarpinski's answer carefully, it implies exactly the opposite once you define a static subset of Julia using the same arguments.

I don't really see how communication collapses here; other users like ToucheSir understood my statements without difficulty. And actually I have to point out that I am not the one using these terms incorrectly. For example, union splitting has a much narrower meaning than the one you use. I just didn't want to spend words correcting this messy usage of compiler terminology, because I believed we could understand each other.

I don't see such an implication; agree to disagree, then.

I shared that link earlier, and it is the exact meaning of union splitting I've been using; sorry if you misinterpreted it.
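For concreteness, a small sketch of that meaning (my own example): the compiler turns an operation on a small inferred Union into explicit per-type branches, each with a statically dispatched call.

# Here x is inferred as Union{Int64, Float64}; the loop body is
# effectively lowered to
#     if x isa Int64;  s += x::Int64
#     else;            s += x::Float64
#     end
# so no generic dynamic dispatch is needed.
function total(v::Vector{Union{Int64, Float64}})
    s = 0.0
    for x in v
        s += x
    end
    return s
end

total(Union{Int64, Float64}[1, 2.5, 3])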

I know that this is not your point in the thread, but since this thread originated in a discussion about translating code to Julia, I think we should put on record here that the Julia code you posted would obviously not be a translation of that Python code to Julia, much less one written by a newbie. The obvious translation of that Python code to Julia is almost direct:

straightforward translation
function partition(array, low, high)
 
    # choose the rightmost element as pivot
    pivot = array[high]
 
    # pointer for greater element
    i = low - 1
 
    # traverse through all elements
    # compare each element with pivot
    for j in low:high-1
        if array[j] <= pivot
 
            # If element smaller than pivot is found
            # swap it with the greater element pointed by i
            i = i + 1
 
            # Swapping element at i with element at j
            (array[i], array[j]) = (array[j], array[i])
        end
    end
 
    # Swap the pivot element with the greater element specified by i
    (array[i + 1], array[high]) = (array[high], array[i + 1])
 
    # Return the position from where partition is done
    return i + 1
end
 
# function to perform quicksort
function quickSort(array, low, high)
    if low < high
 
        # Find pivot element such that
        # element smaller than pivot are on the left
        # element greater than pivot are on the right
        pi = partition(array, low, high)
 
        # Recursive call on the left of pivot
        quickSort(array, low, pi - 1)
 
        # Recursive call on the right of pivot
        return quickSort(array, pi + 1, high)
    end
end

And it runs (for an array of 1000 elements, which is what I could run in Python here) much faster than the Python code:

In [29]: x = np.random.random(1000)

In [30]: %timeit quickSort(x, 0, len(x)-1)
187 ms ± 306 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

julia> x = rand(1000);

julia> @btime quickSort(x, 1, length(x)) setup=(x=rand(1000)) evals=1
  43.447 μs (0 allocations: 0 bytes)

At the same time, that direct translation does follow all the Julia performance tips, because the Python code was already well encapsulated into functions. A complete newbie could well make other mistakes that cause Julia to perform worse. As with the original post, I think part of the "1.5" language problem is actually a "1" language problem: if you are translating something to Julia, the results will probably be good if you know Julia well, not necessarily if you know the other language well. That's probably one of the main causes of frustration.
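One hypothetical example of such a mistake (mine, not from the posts above): relying on a non-constant global inside a function, which defeats type inference and is one of the first items in the manual's Performance Tips.

# Pitfall: `data` is an untyped global, so its type can't be inferred
# inside the function, and every operation on it is dynamically typed.
data = rand(1000)
count_big() = count(x -> x > 0.5, data)

# Fix: pass the data as an argument (or declare the global `const`).
count_big(v) = count(x -> x > 0.5, v)
count_big(data)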


Exactly. And I always wonder why this is something we always have to make explicit. My only answer is that expectations are set too high, which harms the acceptance of the language. But in general this is banal wisdom.

Some thoughts on the "small binaries <=> static subset of Julia" discussion above. I am awaiting small binaries very much, because I have some very concrete use cases for them. They look like this:

First, I came to Julia (early 2012) to solve MY two-language problem, which was needing R for large statistical analyses, which regularly dragged me into other people's C code. It was always painful. Julia solved this for me, so I fell in love with Julia; it took the pain away.

Now, around these large analyses there is a large "ecosystem" of all kinds of scripts (bash, Perl, .NET, PHP, …); admittedly it's not "eco" but "horrible".

I want some (not all) of these scripts in Julia, but without the runtime. Typically those scripts are quite small, very specific, and not very dependent on any packages. It would be a no-brainer to make these scripts statically typed, and actually I want the Julia compiler to see that everything has well-defined types and just spit out an executable without any runtime, because the runtime will never be used. That's my hope while waiting for small binaries. (Important: they don't need to be extremely small, just not huge like now; reasonably small is good.)

There are tendencies in this community to go to extremes, and the speed discussions are the archetype: if we don't beat Rust/C/C++/CPython, we keep fighting until the mission is achieved. I see "small binaries" mutating into a similar extreme, and that is what I see in the discussion above, too. Of course, if we want to port the complete SAP world to small-binary Julia, then yes, we would need a new statically typed sub/superset of Julia; there are good arguments above and it seems to be true. BUT we don't need small binaries for large systems! For large systems written in Julia, LARGE binaries are all fine: the extra bytes for the complete runtime, even for a complete system installation, are negligible. But for a small system tool, Rust gives you a ~100 KB executable, while in Julia a ~500 MB blob is NOT negligible (I don't actually have a real example at hand; someone can correct me with real numbers).

For this outlined scenario I am waiting for small binaries. I still believe this is possible, although @anon56330260 has some good points, not all of which I understand.

Julia is a PRACTICAL language: it can be easy, it can be math-like, it can be _______ (enter what you like most). AND it can be fast, too! It can even be the fastest! Now I just need "it can be small". Some people will need the smallest, of course. But nobody needs everything at once in a single example, yet arguments tend to run into this reasoning all too often, with the result that we can't have it. True, but not an issue for me, as long as my practical daily problems are solvable in a reasonable way in Julia.


If you do not need the runtime, does StaticTools.jl (github.com/brenhinkeller/StaticTools.jl: "Enabling StaticCompiler.jl-based compilation of (some) Julia code to standalone native binaries by avoiding GC allocations and llvmcall-ing all the things!") work for you? The main limitation for me is that it does not load the runtime. It seems that may not be an issue for you.
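For reference, the hello-world from the StaticTools.jl README looks roughly like this (reproduced from memory; check the README for the current API):

# Compile a standalone, runtime-free executable. The c"..." macro creates
# a statically sized string that needs no GC.
using StaticCompiler, StaticTools

function hello(argc::Int, argv::Ptr{Ptr{UInt8}})
    println(c"Hello, world!")
    return 0
end

compile_executable(hello, (Int, Ptr{Ptr{UInt8}}), "./")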

Probably yes, but I haven't tried it yet. It's not that important, because everything already works as it is without Julia; it just isn't nice. I will start on this when Julia has an official small-binaries feature, if I like that feature as it turns out. If it never comes, I don't have a problem, just a personal disappointment.

Currently, for newer small things, I use Rust. Has anybody already said that implementing in Rust is a pain in the …? Great, I know my executable is clean, secure, and thread-safe, and that there are no memory leaks… but, HELL NO, all those unwraps, borrows, and lifetimes… and I never wanted to use threads at all.

So, I'll say it now, off topic (happy this thread is closing soon 🙂): Rust is Pain, Julia is Fun.


The quicksort algorithm is not vectorized at all in the MATLAB/NumPy sense (and isn't suited to be), so it might be faster on a plain list (import random; x = [random.random() for i in range(1000)]); not sure, though. List indexing retrieves a stored instance, but NumPy array indexing instantiates NumPy scalars, because inlined array elements are not semantically instances in CPython. All Python objects live on the heap, so all those indexing results are also heap-allocated; at least the cyclic GC doesn't track designated atomic types like the built-in int and float, but I don't know whether NumPy's scalar types are or could be included. In the past, I noticed this contributing significant overhead when I forgot to decorate a loop-based function with @numba.njit.


Yes, about 3x faster:

In [7]: x = [np.random.random() for i in range(1000)];

In [8]: %timeit quickSort(x, 0, len(x)-1)
63.6 ms ± 525 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Thank you for bringing this thread back down to earth.

