Learning Julia for scientists who are beginning programmers

Thanks, I have read his book and emailed him several times; he is very nice and helpful for sure. I’ve only had time to get through the first few chapters, but I have already learned a lot about Julia from it. I do have a package (Pedigree.jl), but I’m sure it’s very inefficient, and I need to learn better algorithms in this space: I more or less programmed the brute-force (A matrix) approach and removed the if/else statements from the R code I have. Hopefully it will improve a lot as I learn. For instance, I use DataFrames.jl a lot because I come almost exclusively from R, while many better programmers only use structs. I’m quite confused about when to use structs vs. dictionaries vs. DataFrames, etc., when there are so many options to choose from. I think a lot of the details fly over my head.
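On the structs vs. dictionaries question, a minimal sketch in Base Julia of the same animal record stored three ways may help. The `Animal` type, its field names, and the numbers are made up for illustration and are not taken from Pedigree.jl.

```julia
# Hypothetical record type; field names are illustrative only.
struct Animal
    id::Int
    sire::Int        # 0 means unknown parent (a convention assumed here)
    dam::Int
    weight::Float64
end

# 1. Struct: fixed, typed fields. The compiler knows exactly what an Animal contains.
a = Animal(3, 1, 2, 41.5)

# 2. Dict: flexible key => value lookup, with no guarantee about which keys exist.
d = Dict("id" => 3, "sire" => 1, "dam" => 2, "weight" => 41.5)

# 3. NamedTuple: a lightweight immutable record that needs no type definition.
nt = (id = 3, sire = 1, dam = 2, weight = 41.5)

# The same value, accessed three ways.
println(a.weight, " ", d["weight"], " ", nt.weight)

# A Vector of structs works as a simple table; a Dict keyed by ID gives fast lookup.
herd = [Animal(1, 0, 0, 50.0), Animal(2, 0, 0, 45.0), a]
by_id = Dict(x.id => x for x in herd)
mean_weight = sum(x.weight for x in herd) / length(herd)
```

As a rough rule of thumb (an opinion, not a standard): a DataFrame suits interactive, table-shaped analysis, a struct suits code you will reuse, and a Dict suits lookups by key.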

There is no substitute for time and practice. There is an enormous amount you need to learn. Reading helps but there is nothing like coding up a solution and then another one and then another one…

4 Likes

A handful of packages that, as a practicing scientist, I love:

  • DrWatson.jl - for organizing reproducible data analysis projects
  • DataDeps.jl - for downloading data from third parties (helps with reproducibility)
  • DataFrames.jl - because it’s great!
  • Gadfly.jl - because it creates pretty plots (totally subjective)
  • Measurements.jl - Makes uncertainty calculations fast and easy

One of Julia’s strengths for scientific work is its package manager. Combined with DrWatson.jl, it makes creating reproducible data analysis projects that I can share with others (or reference in journal articles) much simpler than in, say, Python.

7 Likes

Hi @austin-putz, I’m also a self-taught programmer who came from an applied science background (epidemiology and biostatistics). Getting into Julia by using it in a real project was going to be too much work and uncertainty for me, so I started by implementing small things I was familiar with, side by side in R and Julia. Here’s an early example where I was trying to come to grips with dispatch.

Here I mocked dispatch on subtypes using R’s S3 dispatch for a stochastic/deterministic Ricker model: S3 inheritance mocking dispatch on inherited types (github.com)

Here I do the same thing in much cleaner Julia: the same example as s3inheritance.R in julia (github.com)
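For readers who don’t follow the links, here is a rough sketch of what dispatch on subtypes can look like for a Ricker model. It is not the linked code: the type names, the `advance` and `simulate` functions, and the log-normal noise term are all assumptions made for illustration.

```julia
using Random

abstract type RickerModel end

# Both model types carry a growth rate r and a carrying capacity K.
struct DeterministicRicker <: RickerModel
    r::Float64
    K::Float64
end

struct StochasticRicker <: RickerModel
    r::Float64
    K::Float64
    sigma::Float64   # sd of log-normal process noise (an assumption of this sketch)
end

# Shared deterministic skeleton, defined once on the abstract type.
expected(m::RickerModel, n) = n * exp(m.r * (1 - n / m.K))

# One function name, two methods: Julia picks the method from the argument types.
advance(m::DeterministicRicker, n, rng) = expected(m, n)
advance(m::StochasticRicker, n, rng) = expected(m, n) * exp(m.sigma * randn(rng))

# Generic driver: works for any RickerModel that has an `advance` method.
function simulate(m::RickerModel, n0, steps; rng = Random.default_rng())
    n = Vector{Float64}(undef, steps + 1)
    n[1] = n0
    for t in 1:steps
        n[t + 1] = advance(m, n[t], rng)
    end
    return n
end

det = simulate(DeterministicRicker(0.5, 100.0), 10.0, 20)
sto = simulate(StochasticRicker(0.5, 100.0, 0.1), 10.0, 20; rng = MersenneTwister(1))
```

The driver `simulate` never checks which kind of model it was given, which is the part that takes if/else branching or class attributes in the S3 version.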

Small fun things like this really helped me a lot.

2 Likes

Hey @austin-putz, quick question: when you say biology, are you limiting that mostly to things like DNA sequences, genomics, etc? Or are you also thinking about population health or medical tooling?

(Just asking as I am coming from JuliaHealth and was wondering if there are other tools you are thinking about – cheers!)

2 Likes

The Julia project maintains a list of books.

I think Julia is an excellent first language. There’s no reason to be familiar with Fortran or C in order to learn it, although, as you’ve discovered, too much documentation leans unnecessarily on concepts from other languages.

1 Like

Thanks, I have read this list, but I don’t have time to read them all to find out which are the best. After I looked, someone pointed out Think Julia and someone else was working on a book; besides those, Julia for Data Analysis is excellent. I will just keep learning as I go.

1 Like

@TheCedarPrince our core is very, very large mixed models, but yes, we collect DNA samples on animals to make selections. Many are exploring sequencing right now, but mostly we use a ‘SNP chip’, which is 50k markers (Ancestry uses a 700k chip for people, for instance). But over hundreds of thousands of animals, the data gets very large, very fast. So I need Julia for this. I have followed some of BioJulia, or whatever it became when it broke into separate packages, but I don’t do a lot with sequence data; mostly it’s an n animals by 50,000 columns matrix (0/1/2 coded). At times we do GWAS.
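As a small illustration of why the element type matters for an animals-by-markers matrix coded 0/1/2, here is a toy sketch. The genotypes are invented, and the 300,000-animal figure is only a stand-in for "hundreds of thousands".

```julia
# Toy dimensions; real data would be far larger.
n_animals, n_markers = 6, 4

# Genotypes coded 0/1/2 fit in one byte each, so Int8 uses 1/8 the memory of Float64.
G = Int8[0 1 2 1;
         1 1 2 0;
         2 0 1 0;
         0 2 2 1;
         1 1 0 0;
         2 1 1 0]

# Allele frequency per marker: column sum divided by 2n (two alleles per animal).
p = vec(sum(G; dims = 1)) ./ (2 * n_animals)

# Rough memory estimate for a full-size dense matrix (assumed 300,000 x 50,000).
bytes_int8    = 300_000 * 50_000 * sizeof(Int8)
bytes_float64 = 300_000 * 50_000 * sizeof(Float64)
println("Int8: ", bytes_int8 / 1e9, " GB, Float64: ", bytes_float64 / 1e9, " GB")
```

At that assumed size the dense matrix is about 15 GB as Int8 versus 120 GB as Float64, before any model fitting starts.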

I will check JuliaHealth to see what they may have that overlaps with our research. Thanks!

@slwu89 great, I will take a look. I do think an R to Julia conversion guide for many things would be ideal. I think the official docs have some of this, but it’s not super extensive, if I remember correctly. I know Bogumil has included many of these for DataFrames.jl, which is extremely helpful, as I always keep track of my own when I find solutions. And yes, multiple dispatch was quite confusing when I started.

1 Like

@austin-putz Your point about the usefulness of an R to Julia conversion guide is well-taken, especially in the context of DataFrames.jl as you mentioned. I’ve encountered similar challenges in my transition from R to Julia. To assist others like us, I’ve created a series of short YouTube videos some months ago that cover various use cases relevant to daily workloads in Julia, many of which draw parallels to R. These might be helpful to you. Feel free to check them out: https://www.youtube.com/playlist?list=PLW2H7Foa_rc1PNHipbSMlDPuLoKjirLr_

4 Likes

I find it daunting and boring to learn a programming language out of context. Once you have a basic footing, maybe look at the Advent of Code or Project Euler, where you can tackle some low-stakes problems.

To build on @cdawg’s comment, you can also use ChatGPT or (handy within VS Code) Copilot to explain existing snippets of code in plain language. Handy when you’re trying to learn from examples.

1 Like

@Alex_Tantos phenomenal! I saved and will look over. These are the easiest way to learn imo. I found silly things like googling “tables in Julia” was not in fact anything related to the table() function in R, had to learn the hard way :).

1 Like

Oh! Just a note – it probably won’t overlap much for your domain! We tend to veer more into public health, medical procedures (MRIs, ECGs, EKGs, etc.), and population health statistics. Just wanted to save you some time here!

2 Likes

This sounds very interesting. I used to do population genetics with non-model organisms, so my sample sizes were never that huge, but we wished we could have numbers like that for our experiments. We need more tools and packages in that space, for sure, and hopefully the community can help you develop that.

2 Likes

Regarding rapidly-expanding data. Check out J (www.jsoftware.com). According to its web site, it handles millions to billions of rows with little problem, and is fast. I realize this will lead you into learning yet another language. You may not have the luxury of time. I have been fascinated by J ever since my introduction to APL many, many (many!) years ago. It is a different world.

Just a suggestion, for what it is worth.

1 Like

@alejandromerchan Yes, that’s always a problem when working with anything other than lab animals. Since we control and manage these farms, it is easy for us to get very large datasets, and we are even expanding our data capture into commercial production, so our datasets are vast. Furthermore, geneticists are moving into high-throughput phenotyping (video cameras, wearables like an Apple Watch, and other high tech). We will 100% need Julia and lower-level languages for this work, no doubt. Breeders are slowly moving over, but not fast enough (imo).

1 Like

I am a scientist with very little formal background in programming. There is an overwhelming amount of resources out there for you to get started with programming. But if you work primarily with numerical data, there are things you need to know as a user of the language and its packages, regardless of what programming language you use. And then there are other things you can learn as you shift from being a user to a developer. So here are some possibly unorthodox opinions based on my own experience to help you prioritize:

As a user you should know:

types of numbers and numerical precision

  • How are integers different from floats? In Julia, 9^(10^6) is a negative number and 10^20 != 10.0^20.
  • How is a Float32 or a Float16 different from a Float64? Float32(1/5) != Float64(1/5)
  • What is an Inf? What is a NaN? (NaN stands for Not a Number but then ask Julia if NaN isa Number)
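A few lines you can paste into the REPL to see these points for yourself. This is just a sketch of the behaviours mentioned above, assuming a 64-bit machine where `Int` is `Int64`.

```julia
# Integers are fixed-width and wrap around silently on overflow.
println(typemax(Int64) + 1 == typemin(Int64))   # true: overflow wraps around
println(10^20 == 10.0^20)                       # false: the integer power overflowed
println(big(10)^20 == 10.0^20)                  # true: BigInt does not overflow

# Floats of different widths round the same real number differently.
println(Float32(1/5) == Float64(1/5))   # false
println(0.1 + 0.2 == 0.3)               # false: rounding error
println(isapprox(0.1 + 0.2, 0.3))       # true: compare floats with a tolerance

# Inf and NaN are ordinary floating-point values.
println(1 / 0)            # Inf
println(0 / 0)            # NaN
println(NaN isa Number)   # true
println(NaN == NaN)       # false: NaN is not equal to anything, itself included
```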

data types

  • Arrays are really important. Julia’s Arrays allow you to do amazing things. What does it mean when we say Arrays are mutable? (Mutability is also related to performance, though you should probably not worry about performance initially). You should know that, for an array x, x .= 1 ./ x in Julia is fundamentally different from x = 1 ./ x: the first overwrites the contents of the existing array, while the second creates a new array and rebinds the name x to it.
  • Tuples are probably the next important data type.
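Here is a small sketch of what mutability means in practice. The reciprocal is written with explicit broadcasting (`1 ./ x`) so that it works elementwise on a vector; the variable names are arbitrary.

```julia
x = [1.0, 2.0, 4.0]
y = x                 # y is another name for the same array, not a copy

x .= 1 ./ x           # in place: overwrites the contents of the existing array
println(y)            # [1.0, 0.5, 0.25], so y sees the change

x = 1 ./ x            # rebinding: builds a new array and points the name x at it
println(x)            # [1.0, 2.0, 4.0]
println(y)            # [1.0, 0.5, 0.25], y still refers to the old array

# Tuples are immutable: fixed length, and elements cannot be reassigned.
t = (1, "two", 3.0)
try
    t[1] = 10
catch err
    println("cannot modify a tuple: ", typeof(err))
end
```

If you want an independent array rather than a second name for the same one, use `copy(x)`.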

As you start shifting from a user to a developer, or start writing code to be reused by yourself or others, you want to get comfortable with the following, which I would still consider quite fundamental:

if/else conditions, functions, loops

separation of objects/structs and functions/methods

  • As scientists, we might be used to just using commands that tell our computers things like ‘linearly regress y on x’. But it helps to start thinking about what objects/structures are and why they are useful. So when you see things like length(dataset.column), you have an intuition that length is a function/method and dataset is the object/struct, which has a field/property called column.
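To make the length(dataset.column) example concrete, here is a minimal sketch. The `Dataset` type and the `column_mean` function are hypothetical names invented for this example.

```julia
# A struct groups related data under named fields.
struct Dataset
    name::String
    column::Vector{Float64}
end

dataset = Dataset("toy", [2.0, 4.0, 6.0])

# `length` is a function; `dataset.column` reads the field `column` of the object.
n = length(dataset.column)

# Functions live outside the struct and take the object as an argument.
column_mean(d::Dataset) = sum(d.column) / length(d.column)

# Extending an existing generic function to the new type adds a method.
Base.length(d::Dataset) = length(d.column)

println(n, " ", column_mean(dataset), " ", length(dataset))
```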

scope

  • There are different rules about whether/how variable names can be reused in different blocks such as module-end, let-end, begin-end, function-end. Knowing this is important if you want to write correct programs. I still mess this up more than I’d like.
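A short sketch of how a few of these blocks treat a variable named x, written as it would behave in a script file. The function names are arbitrary.

```julia
x = 1

# `begin ... end` does not introduce a new scope: this reassigns the global x.
begin
    x = 2
end

# `let ... end` introduces a new scope; variables declared in its header are local.
let x = 10
    x += 1        # changes only the local x
end

# Functions introduce a new scope: assignment creates a local unless marked `global`.
function shadow()
    x = 100       # a new local x; the global is untouched
    return x
end

function overwrite()
    global x = 200
    return x
end

after_begin_let = x      # 2
shadow()
after_shadow = x         # still 2
overwrite()
after_overwrite = x      # 200
```

Loops at the top level are the case that trips people up most: there, assigning to an existing global behaves differently in the REPL than in a file, so it is worth reading the manual section on scope before relying on either.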

multiple dispatch

And only then should you worry about things like:

  • recursion and other algorithm-related topics
  • abstraction, refactoring
  • parametric types
  • functors

And finally (things I still don’t know much about):

  • type stability
  • performance optimization
  • investigating lowered code
  • metaprogramming
  • making packages

7 Likes

I was a biology student, and later learned programming and bioinformatics at a DNA sequencing company, unfortunately without any formal education. I mainly used Perl, touched JavaScript (Node.js), Bash and R, and learned a little Python. Nowadays I mainly use Julia in my daily work, though I don’t code much or intensively, and I often read others’ Bash and R scripts.

In fact, most bioinformaticians don’t code like professional programmers; they just use scripts to get things done, such as automation. All of these languages meet that kind of requirement, and even partial knowledge of one is usually enough.

As for why I chose Julia, the main reason is that the syntax is comfortable to my eyes. The feeling is the same as when I learned Node.js and Python. An objective reason is that Julia is fast and well suited to computationally intensive tasks, like processing large-scale DNA data, though Python/R very likely already has a package for this. That is to say, Julia has the potential to be a great choice when you want to develop a novel tool (that is, be a developer rather than a user). Lastly, learning to code in Julia has helped my computational thinking and reshaped my mindset and skills in programming and data analysis. By contrast, I generally don’t think deeply when using other scripting languages.

I will keep learning and using Julia. BTW, though R looks “ugly and difficult” to me (subjective!), I’d like to keep improving my R skills as well because, as I said, I see or use R scripts and packages probably every day.

2 Likes

@sadish-d I think the “overwhelming amount of resources” is the core of the problem… If I had a programming or comp sci degree, this would be easy, as the curriculum walks you through everything you need in order and covers all the bases. I think your summary covers the majority of what I’ve learned on my own, but I had to piece it together over a long period of time because R and bash don’t really teach you any of that. One thing that is difficult, as you mention in at least one place, is knowing when to optimize. Sometimes I get lost in the weeds or cannot follow all of the optimization material I find online, because I really do not understand the compiler and why that work is needed when I figured the compiler should be able to handle those things.

Thank you for this list, I think this is what I was looking for as a rough guide. If I know what to learn, I can go to YouTube and find resources online. I’m sure I just need to enroll in an online intro programming course on Udemy or wherever. I should look for a top-ranked one and just do it. Thanks!

Hi. I’m a biologist by education and recently finished a book (desktop publishing) about statistics in Julia. So, if you are still looking for a Julia book you may try mine:
Romeo and Julia, where Romeo is Basic Statistics.

The book is also listed at: julialang.org and is available freely under CC BY-NC-SA 4.0

4 Likes