Comparison of languages for parallel computing tasks

The paper compares Chapel, Python, and Julia. For reasons that are not yet clear to me, Julia has not performed well.

They don’t provide any of the code that they used for the comparison, which makes it hard to evaluate the article independently. Here is the code on Zenodo, and here is their GitHub repo.


There is a Zenodo link to the complete code.

Looks like there’s no warm-up.


Q3AP-ILS unmodified:

% julia -t 4 ./ils_q3ap_par.jl nug12 100
Time (Init instance):   1.764211577
Time (First LS/Compilation):    0.230459977
======== ITERATED LOCAL SEARCH ======== 100
        Best solution

Solution([5, 4, 7, 6, 10, 11, 12, 9, 2, 1, 3, 8], [11, 7, 6, 2, 12, 4, 1, 9, 8, 5, 10, 3], 658)

        TotalTime:      3.390096732
        ILSTime:        1.395425178
        Nhood evals:    533
        NhoodPerSec:    381.96243582470316

With a bunch of const declarations added to avoid non-constant global variables:

% julia -t 4 ./ils_q3ap_par.jl nug12 100
Time (Init instance):   1.796165105
Time (First LS/Compilation):    0.202069824
WARNING: redefinition of constant sol. This may fail, cause incorrect answers, or produce other errors.
======== ITERATED LOCAL SEARCH ======== 100
WARNING: redefinition of constant sol. This may fail, cause incorrect answers, or produce other errors.
        Best solution

Solution([6, 10, 9, 3, 7, 2, 12, 1, 8, 11, 4, 5], [6, 7, 4, 5, 2, 9, 11, 10, 8, 3, 12, 1], 778)

        TotalTime:      3.181146955
        ILSTime:        1.182912026
        Nhood evals:    528
        NhoodPerSec:    446.3561012101842
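The performance trap being fixed here is Julia’s untyped non-constant globals: since their type can change at any time, every access goes through dynamic dispatch. A minimal sketch of the difference (the names a, b, slow_sum, and fast_sum are made up for illustration):

```julia
# Non-constant global: the compiler cannot assume a type for `a`,
# so `slow_sum` boxes it and dispatches dynamically on every iteration.
a = 2.0
slow_sum(n) = sum(a * i for i in 1:n)

# `const` fixes the binding's type, so `fast_sum` specializes fully.
# (Reassigning a `const` binding later is what triggers the
# "redefinition of constant" warning seen in the output above.)
const b = 2.0
fast_sum(n) = sum(b * i for i in 1:n)

slow_sum(1000) == fast_sum(1000)  # same result, very different speed
```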

Running all the @elapsed ... blocks twice so that compilation isn’t measured:

% julia -t 4 ./ils_q3ap_par.jl nug12 100
Time (Init instance):   0.065759911
Time (First LS/Compilation):    0.001595556
WARNING: redefinition of constant sol. This may fail, cause incorrect answers, or produce other errors.
======== ITERATED LOCAL SEARCH ======== 100
WARNING: redefinition of constant sol. This may fail, cause incorrect answers, or produce other errors.
======== ITERATED LOCAL SEARCH ======== 100
WARNING: redefinition of constant sol. This may fail, cause incorrect answers, or produce other errors.
        Best solution

Solution([4, 9, 2, 8, 5, 6, 7, 3, 1, 10, 11, 12], [8, 4, 1, 12, 5, 11, 10, 3, 6, 7, 2, 9], 752)

        TotalTime:      1.141571685
        ILSTime:        1.074216218
        Nhood evals:    544
        NhoodPerSec:    506.41573910774827

NhoodPerSec increases by more than 32% with just these two trivial changes.
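The “run it twice” trick works because Julia compiles a method on its first invocation; only the second call measures steady-state performance. A minimal sketch with a made-up kernel function:

```julia
# A toy workload; any non-trivial function shows the same effect.
function kernel(n)
    s = 0.0
    for i in 1:n
        s += sin(i)
    end
    return s
end

t_cold = @elapsed kernel(10^6)  # includes JIT compilation of `kernel`
t_warm = @elapsed kernel(10^6)  # same work, compilation already cached
println("cold: $t_cold s, warm: $t_warm s")
```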


Are any of the authors active here? In particular, it would be great to get some clarification on these statements:

Consequently, much of the information one can find in online forums and documentation is no longer valid

There is a large amount of documentation available, but it sometimes feels opaque—for instance we were unable to find information on the thread layer used for the multi-threading package.

Also, using globals is mentioned as a performance trap so often that failing to account for them and then claiming poor performance seems suspect. If nothing else, it makes this section sound like a cop-out:

In our case, both programmers have strong prior experience with C and parallel computing and little to intermediate prior knowledge of Python, Julia and Chapel. As detailed in Section 4, we have followed a protocol that aims at making the comparison fair. However, we cannot completely exclude that some parts of the code could be written more efficiently or concisely.

To put this into context: the paper was submitted on 15 October 2019, almost a year ago.


I can’t help but feel that if they knew to use NumPy and even Numba for Python (at which point it’s no longer benchmarking Python anymore), they should at least have known to time things properly (with warm-up etc.) and to read the Performance Tips page of the manual once.

For example, I wouldn’t expect them to know to use @SVector, by comparison. But come on, if the paper is about comparing performance, at least get that little bit right.
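For what it’s worth, the usual way to sidestep the warm-up problem entirely is the BenchmarkTools.jl package, whose @btime/@benchmark macros run the expression many times and exclude compilation from the reported numbers. A sketch (note the $ interpolation, which benchmarks the global as if it were a local):

```julia
using BenchmarkTools

x = rand(1000)
@btime sum($x)  # $x avoids the non-constant-global penalty in the timing
```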


Concerning warm-up: if the Julia code is cold-started, then it would only be fair to also include the compile/link times for Chapel and C/OpenMP in the time comparison, wouldn’t it?

As stated in the paper:

  • Julia 1.2
  • page 36: "Numba’s and Julia’s (experimental) multi-threading support is not mature
    enough to compete with OpenMP or Chapel in terms of scalability."

True for Julia at the time. It might be interesting to re-run with 1.6.
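Re-running on a newer Julia would also let the threaded code use the composable Threads.@spawn model introduced in 1.3. A minimal race-free parallel reduction sketch (threaded_sum is a made-up name; start Julia with e.g. julia -t 4):

```julia
using Base.Threads

# Split the index range into one chunk per thread and sum each chunk
# on its own task; `fetch` collects the partial sums, which are then
# combined serially. No shared mutable state, hence no data race.
function threaded_sum(v)
    ranges = Iterators.partition(eachindex(v), cld(length(v), nthreads()))
    tasks = [Threads.@spawn sum(@view v[r]) for r in ranges]
    return sum(fetch.(tasks))
end

threaded_sum(collect(1.0:1000.0))  # == 500500.0
```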


I am going to show some prejudice here. Chapel has been around for a long time and is, of course, associated with Cray. I do not know if it will ever break out beyond that.
I could be very wrong; I am not plugged into that community.