Example of the performance gain from the depth-first multithreading implementation, as motivation

Hello, I found the following discussion of Julia's multithreaded task scheduler implementation:

https://news.ycombinator.com/item?id=20507628

As far as I can tell from the comments, the main advantage and the “new thing” is the “depth-first” approach to scheduling tasks.

If this is a real advantage for numerical computing, it would be great to see an example, and possibly even to add it to the Julia start page (or at least somewhere prominent on the website).

Maybe someone has a good idea for such an example, along with a real comparison of the speed gain over the breadth-first approach.

I think this could make Julia much more attractive and set it clearly apart from the other available languages.

My understanding is that there is no speed gain compared to carefully written threaded code. It's just that doing the right thing becomes much, much easier. This is not unlike automatic memory management versus manual allocation.

While I find the new multithreaded implementation in 1.3 amazing, the discussion linked above makes me skeptical about its value for advertising Julia. The typical response is “language X had this in 1962”, without investing any effort in understanding what is going on. In this respect, HN is almost as bad as Slashdot. This is what it must feel like for an electrical engineer to talk to “audiophiles” :wink: EDIT: Scrolling past these, I realize that many people do get the idea, but perhaps because their comments don't generate discussion, they are sorted lower.

Ok, that is something I hadn't thought about :wink: Still, perhaps a comparison of the depth-first and breadth-first approaches for N cores running N tasks, each of which spawns N more tasks (as in one of Stefan's comments), would be a good example of how “well-suited” Julia is by default (without the user knowing anything about multithreading at all :wink:).

Early termination in the parallel reduce I implemented in Transducers.jl depends on the depth-first scheduler. Ref: Thread- and process-based parallelisms in Transducers.jl (+ some news) - #3 by tkf