Preaching Julia to biologists


#1

People, I’m scheduled to give a short (15 min) presentation about Julia to our group next Tuesday! The crowd is composed of (~15) biologists who work in Animal Behavior (but not in Systems Biology) with a bit of experience in Matlab and R. The goal of the talk is not to teach them Julia, but to convince them that Julia is worth using (and eventually learning).

I’m putting something together now, but it would be fantastic if anyone who gave a similar talk could share theirs with me. :pray:

Many thanks!


#2

Here’s a link to a 20-minute introduction to Julia I gave for energy system modelers. It’s mostly in line with the approach most people use in an introduction (benchmarks, code samples, solving the two-language problem, etc.), but with a few extra comments for the modeling group of our department.
http://gofile.me/3TJXy/mEGZtH0en

I’m releasing this under strict WTFPL license terms. Enjoy. :slight_smile:


#3

Awesome, thanks!!!


#4

Maybe this part about file formats, I think it’s something that people can easily relate to, since dealing with all these file formats and tools is quite frustrating:

(maybe you can get the slides)


#5

Impressive stuff! I wish I had some bioinformaticians in the crowd, but the principles Ben mentioned are all very worth mentioning.


#6

In case anyone’s watching, here’s my presentation as it stands now. I still have till next week to change, add, and improve it. If you find anything, let me know (the original markdown file I used, with reveal-md, to generate the static website is there too)!


#7

I always find it helpful in this kind of evangelism talk to have a “when not to use” slide: it makes things more honest (which builds confidence in the audience), manages expectations and makes people aware of the weaknesses of the proposed solution. In this case, for application people I’d mention:

  1. ecosystem (packages, IDEs, debugger…) not as mature as other environments
  2. more general purpose than R or Matlab (makes some things harder to find)
  3. lower “stackoverflow effect” (how likely you are to get the answer you’re looking for as the top result on Google)
  4. non-trivial performance engineering (you usually can’t just take a Matlab code, convert it straight to julia and expect an improvement)


#8

This is nice, but doesn’t seem like it fits the audience you said you’re targeting. Channeling myself from 4 years ago (when I got my biology PhD but had not done any coding), none of this would seem relevant to me.

I’d want to see some examples of things that I can do with Julia that I can’t do with other languages, or something that’s easier, or something that could be done. One thing that’s a slightly different two-language problem than the one you describe is the fact that Python isn’t great for dealing with matrices of data and with plotting. R is good for those things, but sucks at things dealing with files and with regular expressions. Julia is great at both :slight_smile:


#9

Awesome, so apart from the comparison to R and Python, what would have impressed you 4 years ago? Something like a comparison to Excel? Not sure exactly what would…


#10

Great advice. Yes, I added your (~exact) points. Thank you!


#11

helpful in this kind of evangelism talk to have a “when not to use” slide

Maybe also mention the painful JIT wait-times.


#12

I know what you mean, but I’m struggling to find a good example. For one, any example you might bring up will likely impress total novices whether I use Julia or MATLAB/R/Python to implement it. If they are truly total non-programmers (which is not the case) then the magic of any-thing-in-programming will impress. So I have to draw on general comparisons to the other languages. Which leads to:

I’m not so confident in these statements. Or let’s say, I’m not that experienced in all these interpreted languages that I feel ok with claiming that that specific difference is both significant and large enough for switching. I think that the two-languages problem is where Julia is significantly better than the rest. But feel free to convince me so I can convince others…! :slight_smile:


#13

I’ll add that to my notes (which are included in the .md file). I suspect that this would require an understanding of how the JIT compiles everything to machine code, how subsequent runs reuse that, and so on… it might be over their heads as a slide on its own. If someone asks or if it fits in I’ll mention it.


#14

I agree with @kevbonham: unless the audience consists of very experienced programmers (who are frustrated with their current tools), this is going to be a bit abstract.

Also, on a minor note, I found that animated transitions distract the audience (and make some people dizzy, especially with large projections).


#15

First and foremost, thank you people for taking the time to look this over!!!

I’m more than ready to add different content, but I’m not really sure what. Compared to John Gibson’s talk (or this), I think I’ve dumbed it down quite a bit. I hardly have any code for the audience to follow, or @code_llvm demonstrations. Yes, the compiled versus interpreted bit as well as the “Dynamic & fast, how” slide are going to be a bit over their heads, but I feel this is important in order for them to understand why and where Julia shines. To put things in better perspective, the crowd is small and the setting is very informal, conversational even. So they’ll stop me and ask if they feel too lost.

I’ll gladly add any specific suggestions for bringing this closer to the people!


#16

No, what I meant is more “package load times”, “time to first plot”, etc. I don’t think you need to explain the JIT (or, more correctly, the just-ahead-of-time (JAOT) compilation) as such.
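
For a slide, the wait can be shown without explaining the compiler at all. The following is only a sketch: the function `f` and the random data are made up for illustration, and the actual timings depend on the machine and Julia version.

```julia
# Hypothetical example function, not from any package.
f(x) = sum(abs2, x)

data = rand(1_000)

# The first call to f with this argument type includes compilation time.
t_first = @elapsed f(data)

# The second call reuses the compiled code and is usually much faster.
t_second = @elapsed f(data)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

The same effect, on a larger scale, is what people mean by “time to first plot”.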


#17

Ah… Yes, you’re right. Will do. Done.


#18

Yeah - good question. It’s a bit hard to cast my brain back even that far :laughing:

You’re not alone - I find it difficult to articulate to other biologists that do some coding why I prefer julia. I guess the best argument for people that don’t code at all is:

  1. the syntax is highly readable.
  2. There’s a ton of flexibility - some people prefer loops, some prefer vector operations. Both are performant and readable in julia.
  3. In science we want to be at the cutting edge, we don’t want to have to wait around for someone to write the fast version in C to be able to use a new algorithm. @ChrisRackauckas has better examples on this point than I do.
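
As a minimal sketch of point 2, here is the same computation written both ways. The function names and the data are made up for illustration.

```julia
# Mean squared deviation of a vector, written in two styles.

# Loop style: explicit iteration, no temporary arrays.
function msd_loop(x)
    m = sum(x) / length(x)
    acc = 0.0
    for xi in x
        acc += (xi - m)^2
    end
    return acc / length(x)
end

# Vectorized style: the dots apply each operation elementwise.
function msd_broadcast(x)
    m = sum(x) / length(x)
    return sum((x .- m) .^ 2) / length(x)
end

speeds = [1.2, 0.8, 1.5, 1.1]   # made-up measurements

# Both styles agree up to floating-point rounding.
println(msd_loop(speeds) ≈ msd_broadcast(speeds))
```

Which one reads better is mostly a matter of taste; neither needs to be rewritten in another language to run fast.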

Here’s a biology-specific example: There was this paper that used MinHashing for DNA sequence comparison. They implemented it in C, and it’s super fast. Someone else made Python bindings. Which is all great. But the algorithm is pretty simple, and I managed to implement it and make it super fast in under 100 lines of very readable code.
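
To give a feel for how simple the core idea is, here is a simplified bottom-k MinHash sketch. It is not the paper’s implementation nor the code mentioned above: the function names are hypothetical, it uses Julia’s built-in `hash`, it assumes plain ASCII sequences, and it ignores reverse complements.

```julia
# Keep the `s` smallest hash values over all k-mers of a sequence.
# Assumes an ASCII sequence, so indexing by character position is safe.
function minhash_sketch(seq::AbstractString, k::Int, s::Int)
    hashes = Set{UInt64}()
    for i in 1:(length(seq) - k + 1)
        push!(hashes, hash(seq[i:i+k-1]))
    end
    sorted = sort!(collect(hashes))
    return sorted[1:min(s, length(sorted))]
end

# Estimate the Jaccard similarity of two k-mer sets from their sketches:
# take the `s` smallest hashes of the merged sketches and count how many
# of them occur in both.
function jaccard_estimate(a::Vector{UInt64}, b::Vector{UInt64}, s::Int)
    merged = sort!(union(a, b))
    bottom = merged[1:min(s, length(merged))]
    shared = count(h -> h in a && h in b, bottom)
    return shared / length(bottom)
end

# Made-up sequences for illustration.
x = minhash_sketch("ACGTACGTTGCAACGT", 4, 100)
y = minhash_sketch("ACGTACGTTGCATTTT", 4, 100)
println(jaccard_estimate(x, y, 100))
```

A real tool would add canonical k-mers, a faster hash function and streaming input, but the core still fits on one slide.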

Another side argument - it’s mostly fast because of suggestions from @bicycle1885 for little tweaks - the julia community is super awesome and generous. That’s at least as important as all of the technical stuff.


#19

This is a great example, alas the people I’ll be talking to are not bioinformaticians and don’t do any genome sequencing work… Nevertheless, I included this as an example… You reminded me, I wrote this ray tracer, it too is 100 LOC and very flexible and fast. I mentioned that as well.

I try to showcase that here, but maybe I should put more into that section?


#20

I’m a biologist myself, with very little programming background, and I knew R and some basic Python before starting Julia. What brought my attention to Julia in the first place was Douglas Bates’ discussions about his transition from R to Julia. I was using GLMs and GLMMs heavily and eventually found out that one of THE guys responsible for the development of a tool I was using all the time in R (the lme4 package) wasn’t involved with the project anymore. As I loved R, I wanted to know why he had left and started learning about this “two language problem”, which is relevant if you want to develop packages, but it’s irrelevant for most students that just want to analyze data or produce plots. And I think that’s the basic selling point: do you people want to analyze data in small controlled experiments? R/Python should be fine. Do you want to develop packages, run long simulations or work with big datasets? Try Julia.

I tried to teach R to biologists who are not very inclined to use command line tools, and no matter what you show them, they rarely made the transition from SAS or JMP. But I also had colleagues who were, like me, interested in statistics, mathematical simulations, etcetera, and they were willing to put more time into learning a new tool.

Good luck with your presentation, it’s hard to get people to change their ways, because it’s hard. I’m doing some data analysis in Julia with fairly big datasets and I might do a comparison with R for a presentation at a Data Scientists meetup. I’m not done yet, unfortunately.