Topological sort (performance)

You’re right. I have no idea what this statistic is supposed to mean (get it?). Is it a mean, a minimum, a median? No clue.

1 Like

This is quite off topic, but if you'd like to dig into it: newer versions of (I)Python are clearer about this now:

Jupyter QtConsole 4.4.1
Python 3.6.6 |Anaconda custom (64-bit)| (default, Oct  9 2018, 12:34:16) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.1.1 -- An enhanced Interactive Python. Type '?' for help.

%timeit 1+1
10.4 ns ± 0.213 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

End of off topic part:

  median time:      16.964 μs (0.00% GC)

In light of this, what does (0.00% GC) mean? Wasn't an important part of the code's cost (GC) simply not measured here? (I have to admit that I don't know what (I)Python measures either; it could be that it is quicker only because GC was postponed until outside the measurement window.)

Doing real, useful benchmarks is hard…

EDIT: Very sorry, I carried out this benchmark incorrectly in Python-- I’ll try to figure this out again later.

Here is one possibly extraneous data point for this discussion. I modified the
OP's Julia setup code to first convert all the strings to Vector{Float64} (using the ASCII codes). I made the same modification to the Python code. I didn't change either work routine at all; the only change was to the input. It turns out that you can't use Python lists as dictionary keys, so for Python I converted the strings to tuples rather than arrays of floats. The running times are now: mean 23.8 μs for Julia, 43.3 μs for Python. So this experiment seems to support the contention made by several posters that Python's internals are optimized for strings and are less performant for other data types.
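To illustrate the dictionary-key constraint mentioned above: Python lists are mutable and therefore unhashable, so only immutable sequences such as tuples can serve as dict keys. A minimal sketch (the names here are illustrative, not the poster's actual code):

```python
# Convert a string to its ASCII codes as floats.
codes = [float(ord(c)) for c in "abc"]

# A list cannot be a dict key (lists are mutable, hence unhashable)...
try:
    d = {codes: 1}
except TypeError:
    pass  # raises TypeError: unhashable type: 'list'

# ...but the equivalent tuple works fine as a key.
d = {tuple(codes): 1}
print(d[(97.0, 98.0, 99.0)])  # -> 1
```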

It is also my observation/hypothesis that, if we want speed, the current String implementation in Julia is not as good as Python's for this kind of work.

This looks promising: "The general philosophy of the architecture is as follows: have a single easy to use type that can replace String that conforms to the recommendations of the Unicode Organization (which internally uses 4 types and is implemented currently as a Julia Union, and has O(1) indexing to characters, not just code units)" (source: GitHub - JuliaString/Strs.jl: String support package for Julia), but I am not sure it is production-ready, nor that it would really help here.

It would be pretty easy to replace the strings with integer ids in this case, since it's really just using them as keys rather than processing them.
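A hedged sketch of that idea: since the strings are used only as opaque keys, each distinct string can be mapped to a small integer once, and the algorithm can then run on ints. `intern_id` below is a hypothetical helper, not code from the thread:

```python
# Hypothetical helper: assign each distinct string a small integer id.
_ids = {}

def intern_id(s):
    # setdefault inserts len(_ids) as the id the first time s is seen,
    # and returns the existing id on every later lookup.
    return _ids.setdefault(s, len(_ids))

edges = [("a", "b"), ("b", "c"), ("a", "c")]
int_edges = [(intern_id(u), intern_id(v)) for u, v in edges]
print(int_edges)  # -> [(0, 1), (1, 2), (0, 2)]
```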

Is this just a guess or do you actually have anything to support the hypothesis? Did you do any profiling? What is slower about Julia strings than Python?

Using Channels (I don't know why we are talking about generators in this thread) as a way to implement the iteration protocol could indeed be warned against, perhaps in the language differences section.

3 Likes

@Liso linked another thread where a performance drop when continuously creating generators was observed. This is akin to the continuous creation of new Sets in the OP - why he referenced that exactly, I do not know.

Alright, the example in that post is better on master (4x faster):

julia> @btime inversions1($p)
  179.826 ns (21 allocations: 640 bytes)
45

compared to 1.0.1:

julia> @btime inversions1($p)
  717.085 ns (111 allocations: 3.44 KiB)
45

So while there is still a slow-down compared to the hand written loop, things are improving!
It might not be possible to get every high level abstraction to exactly match the performance of a hand written loop but the goal is to make them close enough that it shouldn’t be a big problem in practice.
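For readers without the linked thread: the benchmark in question compares a high-level, generator-style inversion count against a hand-written loop. A Python analogue of the two styles (not the Julia code from the linked post) looks roughly like this:

```python
def inversions_gen(p):
    # High-level version: a generator expression over all index pairs.
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])

def inversions_loop(p):
    # Hand-written double loop computing the same quantity.
    count = 0
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                count += 1
    return count

p = [3, 1, 4, 1, 5, 9, 2, 6]
print(inversions_gen(p), inversions_loop(p))  # -> 8 8
```

Both compute the same answer; the question in the thread is only how much the abstraction costs relative to the explicit loop.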

8 Likes

Without channels, using just generators is still 20% slower than in Python. I think that is a problem too. Don't you agree?

Again, this seems to stem from the mistaken belief that any Python code directly translated to Julia will (magically) be faster, even when most of the time in the Python code is spent running optimized C code. That this is not the case is not a problem in itself. It is not a goal of Julia to be faster than Python when writing code in a Pythonic way.

What would be useful here is for some profiling and trying to identify if there is anything specific that is unreasonably slow and could be improved.

7 Likes

It was measured and no GC was performed so it took 0% of the time. GC is triggered by allocation so code that doesn’t allocate will never trigger GC. It can’t take time if it doesn’t happen.
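The same principle can be observed from Python: CPython's cyclic GC runs only when allocation counters cross a threshold, so allocation triggers collections and allocation-free code never does. A rough demonstration using collection counts rather than timings:

```python
import gc

gc.enable()
gc.collect()  # start from a clean slate
collections_before = sum(s["collections"] for s in gc.get_stats())

# Allocating many container objects pushes the generation-0 counter
# past its threshold (700 by default), triggering collections.
junk = [[] for _ in range(100_000)]

collections_after = sum(s["collections"] for s in gc.get_stats())
print(collections_after > collections_before)  # -> True
```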

1 Like

But there is:

  memory estimate:  31.05 KiB
  allocs estimate:  447

so allocation did happen, and if we want to measure the performance impact on the system, we need to measure the GC work too.

Is this

  mean time:        24.255 μs (26.77% GC)

GC work caused only by the toposort3 function? Or could there be, for example, something caused by BenchmarkTools.collect?

FYI, Python's timeit module disables GC during the timing by default: timeit — Measure execution time of small code snippets — Python 3.10.6 documentation

I'm not sure what the setting is in IPython, though. It may be fairer to enable GC in Python.
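Concretely, `timeit` temporarily disables GC for the duration of the timing, and the documented way to include GC cost is to re-enable it in the setup string:

```python
import timeit

stmt = "[[] for _ in range(1000)]"  # an allocation-heavy statement

# Default: timeit temporarily turns off GC while timing.
t_no_gc = timeit.timeit(stmt, number=1000)

# Re-enabling GC in setup includes collection cost in the measurement.
t_with_gc = timeit.timeit(stmt, setup="import gc; gc.enable()", number=1000)

print(t_no_gc > 0 and t_with_gc > 0)  # -> True
```

How large the gap is depends on the workload; for GC-heavy code the second number is the fairer one to compare against a Julia mean that includes GC time.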

4 Likes

If I am not mistaken, IPython suppresses the possibility of enabling GC during timing! So comparing Julia's median of 16.964 μs (0.00% GC) to Python's 20.2 µs per loop seems fairest here. So it seems Julia is ~15% better in this test…

EDIT:
or would it be better to subtract the GC work from the mean (mean time: 24.255 μs (26.77% GC))?

julia> 24.255*(100-26.77)/100
17.7619365

=> 17.76 μs for Julia? (still ~14% better)

What was the topological sort use-case after all?

To summarize the main outcomes of this thread (since it’s gotten a bit off-topic):

  • Python is quite good at executing the Pythonic style implementation of toposort quickly.
  • Julia 1.0 is about as fast at the Pythonic algorithm, depending on the size of the problem.
  • Julia seems to have gotten about 4x faster on master, so probably it is now about 4x faster than Python on the Pythonic version of the algorithm.
  • There are other algorithms in Julia which are orders of magnitude faster, which are available in well-established pure Julia packages (LightGraphs).

Am I missing anything?
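For context, the OP's exact code is not reproduced in this thread; a generic Pythonic toposort in the style under discussion (a dict of dependency sets, repeatedly peeling off items with no remaining dependencies, Kahn-style) looks roughly like:

```python
def toposort(deps):
    # deps maps each item to the set of items it depends on;
    # repeatedly emit the items with no remaining dependencies.
    deps = {k: set(v) for k, v in deps.items()}
    order = []
    while deps:
        ready = {k for k, v in deps.items() if not v}
        if not ready:
            raise ValueError("cyclic dependency detected")
        order.extend(sorted(ready))  # sorted for deterministic output
        deps = {k: v - ready for k, v in deps.items() if k not in ready}
    return order

print(toposort({"b": {"a"}, "c": {"a", "b"}, "a": set()}))  # -> ['a', 'b', 'c']
```

The repeated construction of new sets and dicts in the loop is exactly the allocation pattern the GC discussion above is about.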

6 Likes

That was from the timing in Performance problem of count? - #13 by Sukera which was linked from Topological sort (performance) - #18 by Liso.

1 Like

So my claim is incorrect then?

Yes, because it is for a different benchmark. I don’t think anyone has measured the benchmark in this thread vs master.