Building some Data analysis Tutorials

Three things:

  1. Check out the Julia for Data Science YouTube series that’s on the official Julia channel. A link to the first video in the series is here.

  2. Check out mybinder.org for making your notebooks fully executable in the browser, without the user having to download anything. I created a very basic intro-to-Julia notebook (specifically for colleagues of mine) that I have running on Binder, so you can see what that looks like here: https://mybinder.org/v2/gh/mthelm85/Intro-to-Julia/master

  3. I’ve been very frustrated at the lack of good data analysis/data science content on the web that relies on Julia. There are loads of great courses on a variety of online learning sites that make use of R/Python but almost nothing for Julia. I would be happy to contribute to this project and would be interested in linking up with you to share thoughts/organize an outline for topics to cover.

Yes, it’s very frustrating for someone who knows a bunch about data analysis in, say, R or Python, but wants to move to Julia and get back up to their former level of knowledge. So I’m hoping to alleviate that and also teach a bit about data analysis.

I am very happy to partner on this. I am really just getting started on the project though. How about I PM you on the forum here, and we can discuss some ideas there, and then feed the more fully formed ones back into this thread?

Sounds great!

Feel free to loop me in on this too. I’m developing a course right now (starting next week :scream:), so I may not be super available. But I’m still coming up with assignments, so there may be some mutually beneficial work to be done. My stuff will largely be biology focused, but I was planning to work with some COVID datasets, so there may be broader interest.

Nice. I have worked with biologists quite a bit over the years. What sort of topics are you working on?

I am in the process of writing the first of these tutorials. It downloads a public Census dataset, munges it, and makes a variety of plots to answer very basic questions about the data. Once that’s in a viable form, I’ll put a git repo up on GitHub and mention it here, and we can discuss how to build on that foundation in different directions.
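
For readers who want a feel for that load, munge, plot workflow, here is a minimal sketch. It is not the tutorial's actual code: it assumes CSV.jl, DataFrames.jl, and Plots.jl are installed, and it uses a tiny made-up inline table (the `region`, `county`, and `population` columns are hypothetical) in place of a downloaded Census file.

```julia
using CSV, DataFrames, Plots

# Made-up placeholder data standing in for a downloaded Census extract.
csv_text = """
region,county,population
A,a1,100
A,a2,250
B,b1,400
B,b2,50
"""

# Load: parse the CSV text into a DataFrame.
df = CSV.read(IOBuffer(csv_text), DataFrame)

# Munge: total population per region, largest first.
by_region = combine(groupby(df, :region), :population => sum => :total_population)
sort!(by_region, :total_population, rev = true)

# Plot: one bar per region.
bar(String.(by_region.region), by_region.total_population;
    legend = false, xlabel = "Region", ylabel = "Population")
```

With a real dataset, the `IOBuffer(csv_text)` argument would be replaced by the path of the downloaded file.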

I think the “learn by doing” with not too much excess explaining is powerful. I do like to explain, so I’m thinking of having a companion to each tutorial that’s a discussion of why things were done, and why other things weren’t done etc.

I’m also interested in this process. I work for the state of California and I use Julia for some basic data analysis and manipulation. I have some scripts that access a pretty comprehensive database about pesticide use in the state, and I have been meaning to learn more about the process, but also to share some of the stuff I know. @mthelm85, for example, helped me in the past to do some mapping using VegaLite, and I did a scientific presentation with that.

Please do not hesitate to ping me or message me.

I currently study the human microbiome (in kids, looking at relationships with cognitive development). The course will include sequence analysis, using web APIs for biological datasets, phylogenetics, and a bunch of other stuff.

Do you do sequence analysis in Julia btw? What tools are there for this kind of thing?

BioSequences.jl and other stuff in BioJulia, mostly. I don’t do so much of this at the moment, and for this course I plan to do very basic things with strings mostly, or have them implement stuff themselves.

@dlakelan I’m also interested to see what you come up with. I’ve been curious about Julia for many years now, and this summer I may finally have some time to get into it.

Thanks! Right now I have a kind of quick intro tutorial fairly well done in first draft form… I’m now going through and writing the Discussion that goes along with it. The tutorial is supposed to be more of a step-by-step, very readable thing with all the decisions made for you… the Discussion is all about why we chose what we did, what we could have done differently, etc. It’s a little more involved. I’m hoping to get those two both in a decent draft form, then throw them up in my git repo and open up some commentary here. Maybe another few days.

OK, for those who are interested, see the very bare GitHub repo: https://github.com/dlakelan/JuliaDataTutorials

You should be able to get notebooks by just running the build.jl script, if you have Weave.jl installed.
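
For anyone who hasn’t used Weave before, here is a rough sketch of the kind of calls a build script like that could make. This is not the repo’s actual build.jl: the `demo.jl` source file and the temporary directory are hypothetical stand-ins, and it assumes Weave.jl is installed.

```julia
using Weave

# Hypothetical stand-in for a tutorial source. In Weave's script format,
# lines starting with #' are documentation text; everything else is code.
workdir = mktempdir()
src = joinpath(workdir, "demo.jl")
write(src, """
#' # Demo tutorial
#' One line of text followed by one code chunk.
x = 1 + 1
""")

# Render an HTML page; this executes the code in the document.
weave(src; doctype = "md2html", out_path = workdir)

# Convert the same source to a Jupyter notebook without executing it.
nb_path = joinpath(workdir, "demo.ipynb")
convert_doc(src, nb_path)
```

A real build script would loop over the tutorial sources rather than writing a demo file first.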

The essential format is to split this into a series of Tutorials, each with a paired Discussion. The first one is “BasicDataAndPlots”. Imagine the target audience is a 3rd year undergrad who has at most 1 semester of a computer programming course. The idea is to get them loading some data and producing some plots even if they don’t know how or why, just to see the syntax, maybe play around with it. It should have links to documentation so they can maybe modify plots by reading the Discussion.

As things go along it should build to the point where we’re answering more meaningful questions and using more advanced ideas in data analysis, mostly from a Bayesian perspective. I’d like to tackle real world and interesting questions, the kind of thing where the answer isn’t clear, and someone who is interested in the topic could start from these tutorials, and then build a little undergrad or Masters level term-paper type project by further research. For the moment though, it’s just getting started.

Question for @kevbonham, how do I make sure Weave doesn’t try to execute a code block when building notebooks/pdf/html? I’m not clear on the syntax for that.

Having reread it, I can already see that there are some sections I should strip out and push into the Discussion. Also some things I need to add to the Discussion, like how the length units cm and inch work.

Doesn’t the code chunk option `eval = false` do the job? (http://weavejl.mpastell.com/stable/chunk_options/#Chunk-Options-1)

Exactly - eval, results, and echo are the ones I use most frequently. They affect whether the code is executed, whether the results are shown, and whether the code is shown, respectively. So

```julia; results="hidden"; echo=false
# this code won't show up in the document, but x is available
x = 2
```
```julia; echo=false
# a results block with `5` will show up in the document, but not the code
x + 3
```
```julia; eval=false
# this code block will show up, but won't be evaluated
x = 5
```
```julia
# both this block, and the results (`2`) will show up
x
```

Very cool! Looks like a good start. Let me know if you’d like help setting this up with Documenter to auto-generate pages (it can build HTML pages and automatically add links to mybinder for running/downloading the notebooks). I’m in the middle of doing something similar for my course, so hopefully it won’t be too much additional effort.

Would love to have help using Documenter and getting things into Binder. I have never used either of those. Binder in particular seems extremely useful for this kind of purpose.

In the end, I’m not teaching courses, but I would be very happy to have others who ARE teaching courses use these resources. So we should do whatever seems most useful for that target audience.

Thanks for sharing! I do teach courses in this area, and I would love to have some more resources to share with undergraduates.

+1 for rendered HTML pages of this content. Being able to browse without installing makes it more accessible.

Note that our tutorials are basically rendered scripts, downloadable as notebooks or scripts, built using Literate + Franklin, which might be relevant and more flexible than Weave (I’m biased). I intend to port the R bookdown template to Franklin over the next few months, to help people writing series of tutorials present their content.
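
As a rough illustration of the Literate side of that (only a sketch, not how those tutorials are actually built), a single commented script can be turned into a markdown page, a notebook, and a plain script. The `demo.jl` file and the temporary directories are hypothetical, and Literate.jl is assumed to be installed; the Franklin step that publishes the markdown is not shown.

```julia
using Literate

# Hypothetical stand-in for a tutorial source. In Literate's format,
# lines starting with "# " become markdown text; the rest is code.
srcdir = mktempdir()
outdir = mktempdir()
src = joinpath(srcdir, "demo.jl")
write(src, """
# # Demo tutorial
# One line of text followed by one line of code.
x = 1 + 1
""")

Literate.markdown(src, outdir)                   # writes demo.md
Literate.notebook(src, outdir; execute = false)  # writes demo.ipynb without running it
Literate.script(src, outdir)                     # writes demo.jl with the text lines stripped
```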

Franklin also allows your content to be published online very easily. That’s a big plus. :wink:

So, for those who are interested in following along: like many things these days, I’ve been derailed a bit by COVID. Specifically, I have a lot of friends and family who want to know what the latest info is on the COVID epidemic. I was making some PDFs by hand and putting them on my blog each week or so, but I figured, hey, why not give them Julia notebooks they can interact with… And I managed to get that into Binder, etc. It’s been educational, but it’s a work in progress and not very tutorial-like really. In particular, I don’t have a Discussion document for the COVID stuff because it’s still a work in progress.

I’d like to do a tutorial in which I use Turing to build a Bayesian model of something interesting. Here’s your chance to influence that. What would you like to see modeled? Requirements are:

  1. Publicly available dataset, prefer something not too enormous. Must be an easily readable format (CSV for example). It could involve integrating data from two public sources.
  2. Model shouldn’t require tons of moving parts (so, for example, the COVID epidemic, while very interesting, is a very challenging field, so it’s out). It also shouldn’t be trivially simple (something you could do fine with GLM and a linear or logistic regression with a point estimate).
  3. Should be a topic I have some familiarity with: Economics, Biology, Healthcare, Mechanics/Physics, Civil and Environmental Engineering would be good candidates.

Thoughts?
