Review of presentation

xiaodai · December 7, 2017, 10:21pm

I made this presentation focused on the data ecosystem in Julia. I just wanted to post it here for feedback so I can polish it.

Thanks

vchuravy · December 8, 2017, 5:59am

Nice! I especially enjoyed the delving into why group by is fast and implementing an appropriate algorithm. Maybe highlight in the end that Julia is now beating a highly optimized C backed R package.

xiaodai · December 8, 2017, 6:10am

Yeah it’s only beating data.table in a specific case. In the more general case where we group by more than one column, we don’t even have code for that.

Also, I am using 4 threads and only beating data.table by 30%. Additionally data.table is generally faster for smaller group by, only by fractions of a second but still it’s something.

Once Julia can comprehensively beat data.table at everything, then it’s time to celebrate.

Tamas_Papp · December 8, 2017, 6:33am

Are you sure that the history of various missing value implementations (including those now obsolete) is relevant?

Also, I tend to prefer plain vanilla PDF slides (eg with beamer) to Prezi’s dizzying zooming around, but that’s a personal preference.

One of the slides has a misspelling (“doesn't” is misspelled), but I don’t know how to refer to it.

A small example that does something one would do with eg tidyverse in R may be enlightening, if the audience is familiar with that.

yakir12 · December 8, 2017, 8:37am

Thanks! That was enlightening.

nalimilan · December 8, 2017, 9:30am

Nice, but note that “Julia doesn’t have built-in missing value” is no longer true:
https://docs.julialang.org/en/latest/manual/missing
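As a small illustration of that built-in support, here is a sketch assuming Julia 0.7 or later, where `missing`, `ismissing` and `skipmissing` are part of Base. The variable names are made up for the example.

```julia
# A vector containing a missing entry; its element type is Union{Missing, Int}.
x = [1, missing, 3]

# Missing values propagate through arithmetic and reductions.
propagated = 1 + missing        # missing
naive_total = sum(x)            # missing

# skipmissing lazily drops the missing entries before reducing.
total = sum(skipmissing(x))     # 4

# ismissing is the way to test for the value; broadcasting gives one flag per element.
flags = ismissing.(x)
```

Note that `x[2] == missing` itself evaluates to `missing` rather than `true`, which is why `ismissing` is used for the test.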

xiaodai · December 8, 2017, 9:55am

Ok so it will be part of 0.7 release then

cormullion · December 8, 2017, 10:41am

Here are a few comments on the presentation (I thought the technical content was pretty good).

I like the general style, with a focus on simplicity.

What’s the D for?
Some of the slides look like they haven’t been given the graphical treatment. Eg The Fast slide (about 6 in) could be made more consistent with the others. The next slide probably has too much information on it. Are there some templates that could provide some consistency?
The code samples should be careful with formatting — those gray comments are difficult to read.
URLs are hard to read if they’re underlined, and if you can’t click on them (Prezi?) you’ll have to type them in. An idea might be to put a clickable link icon after the URL, or a footnote at the bottom of the slide.
Should the graphs have some units/labels? Not important, but the initial impression of the first bar chart was “Julia is wicked fast, it can do 160 per second…”

Anyway, it was good!

hgeorgako · December 8, 2017, 3:28pm

Content is good but the Prezi format makes it hard to focus on the value provided by the Julia data ecosystem. I’ve seen many technical presentations (pitches) and the ones that stick in my mind are the ones that answer a specific question very early on: What can X do for me now? Notice the emphasis on the me and now. I don’t get that fuzzy feeling when I look at the first few slides of your presentation. Here are some suggestions.

If your audience is comprised of novices, they’ll want to know why to pick Julia as opposed to R or Python. For them, have a single slide that emphasizes the speed aspect of the language, the ease of learning (comparable to Python) and the long term potential. Mention that Julia is here to stay and that very soon it will be a valuable skill to have as a data scientist/developer. Sprinkle in some quotes about the 2 language problem and the ability to do parallel stuff (in contrast to Python/R). Topics like multiple dispatch, lisp style macros, meta-programming should not be mentioned at this stage. It will confuse everyone. In the next slide, show them how they can download Julia and use it asap. There are many options now, Juliabox, docker, binaries, etc.

Use the next few slides to show them how to use their brand new toy. Very concise and practical code snippets that do 1 thing very well. Code should be formatted properly and with large font. Not more than a page per concept. If you can fit a graph, do so. Audience members in the back of the room should be able to read the code on the slide.

In the next few slides, wow them with some more advanced stuff. No theory. Just show them the code and explain what it does. They might not understand all the nuances. That’s fine. That’s why they call you for further information.

For the more seasoned audience members, focus on the DataFrames ecosystem, JuliaDB and definitely mention all the new Machine/Deep learning frameworks that are coming online. Define a very specific problem and then solve it for them in 3 to 4 slides using DataFrames, pandas and data.table. Show realistic comparisons and mention that all of this is written in native Julia as opposed to C or C++ (2 language problem). Then tell them how in the near future, things will get even faster and better.

I wouldn’t bring up anything about the treatment of Nulls, missing values, etc. You can specify relevant links to all these topics on the second to last slide of your presentation. The last slide should be your contact information only. As the expert consultant, if they have any questions, you can answer them all for them.

swissr · December 8, 2017, 4:26pm

Good feedback already. I had a quick look and have the following remarks:

- don’t think that ‘inspired by Python’ is correct. I would have guessed Matlab, Lisp, Dylan (and Pascal b/c of the (begin)/end… ) and others
- don’t think that pointing out interpreted (R/Python) vs. compiled C for speed is important. I’d say
  - Julia is fast because the language has been designed for it (incl. accepting compromises in the dynamic capabilities). This allows ‘the language to talk to the compiler’ more directly
  - there have been successful compilation attempts with R/Python, but it’s difficult b/c those languages are very dynamic. But JavaScript for example is fast
- don’t know about DataFrame. Wouldn’t tell too much history other than finally there is hope that with 0.7/1.0 everything will fall into its place and be fast
[for quite many things, I think, patience is still in order and I wouldn’t ‘overhype’ Julia; Julia is not yet completely polished for ‘lazy end-user use’. I wouldn’t raise expectations with e.g. julia dataverse: too much flux still, one cannot compare with hadleyverse, tidyverse, shiny, RStudio, … atm. But the important thing is, that the foundations are super-sound afaict and this is what will matter in future. R is great but imho has nowhere to go, it won’t ever be possible to fix the shortcomings (there was a reason one of the founders, Ihaka, said: start over and build something better…). I don’t have too much experience with Python but I don’t think Python ever will be able to offer the data scientists a nice concise syntax…]

If you have some time for background reading I’d recommend the master’s thesis of Jeff Bezanson. As a non-computer scientist I found it more approachable than the PhD.

ChrisRackauckas · December 8, 2017, 5:49pm

Nice presentation. I think there’s a bit of fluff at the beginning, it could get to the meat of it faster. I think it can emphasize at the end that the Julia solution means packages written completely in the higher level language, which not only makes it easier to write/maintain, but also makes it easier to add unconventional features like support for weird number types, out-of-core, GPU acceleration of specific algorithms, etc.
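To make the point about unconventional number types concrete, here is a minimal sketch. The function `generic_mean` is hypothetical and not taken from any package. It is written once in plain Julia, with no type-specific code, and runs unchanged on ordinary floats, exact rationals and arbitrary-precision floats.

```julia
# Hypothetical helper: generic Julia code with no type annotations.
function generic_mean(xs)
    total = zero(eltype(xs))    # additive identity of whatever number type is passed in
    for x in xs
        total += x
    end
    return total / length(xs)
end

float_result    = generic_mean([1.0, 2.0, 4.0])      # ordinary Float64 arithmetic
rational_result = generic_mean([1//2, 1//3])         # exact Rational arithmetic
big_result      = generic_mean(BigFloat[1, 2, 4])    # arbitrary-precision floats
```

The same source is compiled separately for each element type, so no C-level kernel has to be rewritten when a new number type comes along.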

I think it’s good to keep an emphasis that Julia itself doesn’t make things magically faster, since all this other stuff can just be written in C/C++. However, what it allows you to do is match C/C++ inside the same language that you’re scripting with. So since most of the speed comes from implementing intelligent algorithms, Julia’s advantage isn’t really its raw speed. Rather, Julia’s advantage is that it’s much easier to develop a package with a lot of complicated algorithms in it, and the hope is that over time the sheer productivity advantage without the performance disadvantage will win.

jlperla · December 8, 2017, 7:24pm

Is that true in the long run? For example, even without using TMP, many things in C++ generate faster code than C because it can inline better (e.g. passing function pointers to sorting algorithms relies on smart compilers). Also, isn’t the main reason Fortran can be faster than C that it doesn’t have aliasing problems with arrays, which leads to better compiler optimizations? It seems to me that there is no reason Julia can’t be faster than both C++ and Fortran! I realize this is about theoretical asymptotics of performance, but (if true!) these are worth disseminating (and encouraging big players to aid in the compiler back-end development).

ChrisRackauckas · December 8, 2017, 9:00pm

Julia inlines functions as well. I actually showed the other day:

that part of the reason we can get to the speeds we do with DiffEq is because of specialization on the functions and the inlining that tends to follow. So even though C++ can inline, it cannot inline functions which don’t exist at compile time, which then interrupts some of the optimizations when used in this “Python + C++” or “R + C++” setup. So you probably get those back when using C++ directly, but not when using C++ through a scripting language. However, Julia does naturally get this boost. This is very helpful not just in optimization or DiffEq, but also for things like maps, find, search, etc.
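A minimal sketch of what specialization on functions looks like in practice. The names `sum_mapped` and `square` are hypothetical and are not part of DiffEq. Every Julia function has its own concrete type, so a higher-order routine that calls its function argument can be compiled separately for each function passed in, which gives the compiler the chance to inline the call.

```julia
# Hypothetical higher-order routine: `f` is an ordinary argument.
function sum_mapped(f, xs)
    total = zero(f(first(xs)))
    for x in xs
        total += f(x)    # the concrete type of f is known when this method is compiled
    end
    return total
end

square(x) = x * x

# Each function has a distinct type, so each call below can get its own specialization.
distinct_types = typeof(square) != typeof(abs)

squares   = sum_mapped(square, 1:10)      # 385
absolutes = sum_mapped(abs, -3:3)         # 12
cubes     = sum_mapped(x -> x^3, 1:4)     # 100, user-defined closure passed at run time
```

Whether a given call actually gets inlined is up to the compiler; one way to check is to inspect the output of `@code_llvm sum_mapped(square, 1:10)` in the REPL.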

Julia can implement higher optimizations in local scopes via macros. This has already been suggested:

github.com/JuliaLang/julia

RFC: Macro for expression noalias hints

opened 10:37PM - 19 Dec 16 UTC

Keno

We are missing significant optimization opportunity for small constant-size loops. For such loops it is not profitable to generate run-time alias checks, but we still lose significant performance due to lack of vectorization. I'm proposing introducing a macro to annotate variables that are known not to alias, similar to `restrict` in C. My initial thought is something like this.

```
@assert_noalias a::Array
# Implies that `a` does not share memory with any other variable in the current scope
```

For generic structs:

```
@assert_noalias b::Foo
# morally equivalent to
for i in nfields(typeof(b))
    @assert_noalias getfield(b, i)
end
```

I think these semantics are pretty strong and may be weakened over time, but it seems like an ok place to start. Eventually we may want to introduce more first class support for non-aliasing arrays (which array almost are, were it not for sharing by reshaping etc). Motivating benchmark is https://github.com/jeff-regier/Celeste.jl/issues/483

github.com/JuliaLang/julia

aliasing in Julia

opened 03:00PM - 22 Aug 14 UTC

StephenVavasis

This is a long-term issue that has arisen out of my recent discussion with Jameson Nash on the julia users group: I would like to suggest that the core Julia developers come up with a strategy or roadmap for dealing with aliasing.

In more detail, consider a code for matrix multiplication C=A*B with the calling format matmul(A,B,C) that is implemented in the old-fashioned form of three nested loops. It is well known by now that performance boosts of huge factors-- 10 or more -- are possible by reordering the loops, blocking them, etc. Indeed, this was the whole impetus behind the LAPACK project of the 1990s. Furthermore, many papers in the compiler community explain how a compiler can automatically transform three nested loops into high-performance code. However, if the compiler does not know that the user will never invoke the routine as matmul(A,B,A), then it cannot carry out most of these transformations because most of them would change the answer in the (unexpected) case that C and A are identical.

Currently, the Julia manual says nothing at all about aliasing among function arguments. For this and probably other examples, it could be useful for the compiler to know that the arguments to a function will not be aliased. What is Julia's strategy in this regard? I can think of a few possibilities. As I am not a compiler person myself, I don't have a strong preference for which possibility should be followed, but I think there should be a strategy.

1. Forbid aliasing between arguments (at least, in the case that one of the arguments is going to be mutated), and put the burden on the programmer not to violate the rule.
2. Have the compiler carry out global analysis of callers and functions and identify no-alias instances from its global analysis. In this scenario, there could be multiple instantiations of the same method, so Julia's mechanism of carrying function pointers around would become more complicated, and functions like code_llvm would need a third argument to indicate what assumption is made about aliasing of arguments.
3. Do nothing, and omit compiler optimizations in Julia that require a no-aliasing assumption. It is up to the programmer to write the Julia code with the correct blocking necessary for high-performance code in matmul and similar examples.
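The `matmul(A,B,A)` hazard described in the issue above can be reproduced with a naive loop. This is only a sketch: `matmul!` is a hypothetical three-loop routine that follows the issue's `matmul(A,B,C)` calling format, not a library function.

```julia
# Hypothetical naive routine: overwrite C with A*B using three nested loops.
function matmul!(A, B, C)
    n, m = size(A)
    p = size(B, 2)
    for i in 1:n, j in 1:p
        acc = zero(eltype(C))
        for k in 1:m
            acc += A[i, k] * B[k, j]
        end
        C[i, j] = acc    # if C is the same array as A, this clobbers an input entry
    end
    return C
end

A = [1.0 2.0; 3.0 4.0]
B = [0.0 1.0; 1.0 0.0]

expected = A * B                           # [2.0 1.0; 4.0 3.0]
separate = matmul!(A, B, zeros(2, 2))      # distinct output array: matches `expected`

A_alias = copy(A)
aliased = matmul!(A_alias, B, A_alias)     # output aliases the first input
```

With a separate output the result matches `A * B`. In the aliased call, entries of `A` are overwritten before later iterations read them, so the result depends on the exact loop order. That dependence is why a compiler cannot reorder or block the loops unless it knows the arguments do not alias.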

Since this would be a feature add-on, it’s not v1.0 material. But I hope to see this in a v1.x.

There are a few cases where I have found it hard to get Julia to 1x with good C++ code. Usually this difference is due to optimizations turned off due to aliasing, and this is an example we found in the DiffEq chatroom:

github.com/JuliaLang/julia

Performance issue (possible codegen bug?)

opened 04:12PM - 19 May 17 UTC

closed 06:41PM - 19 May 17 UTC

dextorious

Consider the following benchmark code, taken from the collision kernel of a lattice Boltzmann simulation:

```
function collide_kernel!(n, ρ, ux, uy, ex, ey, w, Q, NX, NY, ω)
    for y in 1 : NY
        for x in 1 : NX
            ρ[x,y], ux[x,y], uy[x,y] = 0., 0., 0.
            for q in 1 : Q
                nq = n[q,x,y]
                ρ[x,y] += nq
                ux[x,y] = muladd(ex[q], nq, ux[x,y])
                uy[x,y] = muladd(ey[q], nq, uy[x,y])
            end
            ρ_inv = 1. / ρ[x,y]
            ux[x,y] *= ρ_inv
            uy[x,y] *= ρ_inv
            usqr = ux[x,y]*ux[x,y] + uy[x,y]*uy[x,y]
            for q in 1 : Q
                eu = 3 * (ex[q]*ux[x,y] + ey[q]*uy[x,y])
                neq = ρ[x,y] * w[q] * ( 1 + eu + 0.5*eu*eu - 1.5*usqr )
                n[q,x,y] = (1 - ω)*n[q,x,y] + ω*neq
            end
        end
    end
    nothing
end

function init_data(N)
    Q = 9
    n = rand(Q,N,N)
    ρ = zeros(N,N)
    ux, uy = zeros(N,N), zeros(N,N)
    ex = Vector{Float64}([0, 1, 0, -1, 0, 1, -1, -1, 1])
    ey = Vector{Float64}([0, 0, 1, 0, -1, 1, 1, -1, -1])
    w = Vector{Float64}([4/9, 1/9, 1/9, 1/9, 1/9, 1/36, 1/36, 1/36, 1/36])
    return n, ρ, ux, uy, ex, ey, w, Q, N, N, 0.6
end
```

Benchmarking the code with `(data) = init_data(1000); @benchmark collide_kernel!(data...)` gives a very consistent 67 ms with Julia started with `julia -O3 --check-bounds=no`. The Julia installation in question has a system image built to take advantage of the target architecture (Haswell). To evaluate if this can be optimized further, I ran a C++ version of the same code through clang v4.0.0 (using `--std=c++14 -O3 -march=native`):

```
#include <chrono>
#include <iostream>
#include <memory>
#include <random>

static void collision_kernel(double* n, double* ux, double* uy, double* rho,
                             const double* ex, const double* ey, const double* w,
                             const int Q, const int NX, const int NY, const double omega)
{
    for (int i = 0; i < NX; ++i)
        for (int j = 0; j < NY; ++j) {
            const int idx = i*NY + j;
            ux[idx] = 0.0;
            uy[idx] = 0.0;
            rho[idx] = 0.0;
            for (int q = 0; q < Q; ++q) {
                const double nq = n[idx * Q + q];
                rho[idx] += nq;
                ux[idx] += ex[q] * nq;
                uy[idx] += ey[q] * nq;
            }
            const double rho_inv = 1. / rho[idx];
            ux[idx] *= rho_inv;
            uy[idx] *= rho_inv;
            const double usqr = ux[idx] * ux[idx] + uy[idx] * uy[idx];
            for (int q = 0; q < Q; ++q) {
                const double eu = 3 * (ex[q] * ux[idx] + ey[q] * uy[idx]);
                const double neq = rho[idx] * w[q] * (1.0 + eu + 0.5*eu*eu - 1.5*usqr);
                n[idx*Q + q] = (1 - omega)*n[idx*Q + q] + omega*neq;
            }
        }
}

int main()
{
    const unsigned int NX = 1000;
    const unsigned int NY = 1000;
    const unsigned int N = NX * NY;
    const unsigned int Q = 9;
    const double ex[9] = { 0., 1., 0., -1., 0., 1., -1., -1., 1. };
    const double ey[9] = { 0., 0., 1., 0., -1., 1., 1., -1., -1. };
    const double w[9] = { 4. / 9., 1. / 9., 1. / 9., 1. / 9., 1. / 9., 1. / 36., 1. / 36., 1. / 36., 1. / 36. };
    double* n = new double[N*Q];
    double* ux = new double[N];
    double* uy = new double[N];
    double* rho = new double[N];
    std::random_device rd;
    std::mt19937 mt(rd());
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    for (auto i = 0; i < N; ++i) {
        ux[i] = 0.0;
        uy[i] = 0.0;
        rho[i] = 1.0;
        for (auto q = 0; q < 9; ++q) {
            n[9 * i + q] = dist(mt);
        }
    }
    const unsigned int benchmarkCount = 1000;
    auto start = std::chrono::steady_clock::now();
    for (unsigned int t = 0; t < benchmarkCount; ++t)
        collision_kernel(n, ux, uy, rho, ex, ey, w, Q, NX, NY, 0.6);
    auto end = std::chrono::steady_clock::now();
    auto diff = end - start;
    std::cout << "n[0] = " << n[0] << std::endl;
    std::cout << "Time = " << std::chrono::duration<double, std::milli>(diff).count() / benchmarkCount << " ms" << std::endl;
    return 0;
}
```

The C++ code gives a consistent 32 ms, a 2.1x advantage over the Julia code. Comparing the assembly produced by LLVM in both cases, the main differences that stand out involve the use of vector loads/stores and loop unrolling (which is far more extensive in the case of clang). I'm not sufficiently familiar with Julia internals to speculate as to whether this is the consequence of a known limitation or a new issue, so I'm reporting it here just in case.

EDIT: versioninfo() output for completeness:

```
julia> versioninfo()
Julia Version 0.5.2
Commit f4c6c9d4bb* (2017-05-06 16:34 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  BLAS: libblas
  LAPACK: liblapack
  LIBM: libm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
```

This is one example among many that I would point to that give me the following heuristic: getting within 2x of C++ with Julia is easy, getting to 1x is possible but can take work in some cases. “Taking work” is generally avoiding things which cannot optimize due to aliasing, and avoiding the fact that views do not stack-allocate. But both of these can be worked around, and both of these are fixable.

Even if you don’t get that, 2x performance gains are comparing between the same algorithm. The kicker is that it’s much easier to write a very complex algorithm in Julia, and in many cases the gains from that are >>2x.

That’s why I’ve been disseminating it. My current view of Julia is very package-forward, essentially saying “why Julia is amazing is because it gives package developers an insane amount of productivity without sacrificing performance, yet in the end it’s also an easy scripting language with a REPL that you can give to an undergrad and have them punch in numbers”. I think Julia’s “winning strategy” is thus not by arguing about whether language internals are helpful, but by using the productivity advantage to build comprehensive and performant packages. This idea is expanded upon in this post:

This is getting somewhat away from the OP so if you want to continue we should do so in another thread.
