Building some Data analysis Tutorials

Weave.jl is like R Markdown: the document is Markdown, and code is fenced, like:

# Here's a Header

Some text with **bold**

```julia
f(x) = x^2 + 4x + 2
f(4)
```

With Literate, the file is a Julia file, and the explanations are in comments:

# # Here's a header
# 
# Some text with **bold**

f(x) = x^2 + 4x + 2
f(4)

I really like that in Literate, you can specify certain lines to only show up in Notebook exports, or only show up in Markdown exports, and I like the fact that the file is a runnable Julia script (though with Weave, you can export as a script). It’s also designed by one of the main contributors to Documenter, and so has a lot of nice features that let the Markdown export of Literate play really nicely with Documenter.
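As a rough sketch of what those per-format lines look like (assuming Literate's documented `#md`, `#nb`, and `#src` filter tokens; the text itself is made up for illustration), a source file could read:

```julia
# # Here's a header
#
# This paragraph appears in every output format.

#md # This sentence should appear only in the Markdown export.
#nb # This sentence should appear only in the notebook export.

f(x) = x^2 + 4x + 2
f(4)

@assert f(4) == 34 #src
```

Because the filter tokens all start with `#`, the file still runs as a plain Julia script. The line ending in `#src` is meant to stay in the source only and be dropped from every generated output, which makes it a handy place for sanity checks.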

The major downside of Literate IMO is that there isn’t a great deal of tooling for editors like Atom or VS Code. So the markdown isn’t syntax highlighted, and when you write markdown with a lot of line breaks, as I do, it’s annoying to have to add the comment marker on every line (or the #nb # if you want a notebook-filtered line).

One of the benefits of Weave in its own right is that there are a lot of options for code blocks, like hiding the output of a cell (or hiding the code and only showing the output).

I tend to use Literate when my thing is code-heavy, when I want to run it as a script, or I want to use it with Documenter. I use Weave when there’s a lot of explanatory stuff or when I need more control over my code fences.

For what it sounds like you want to do, I’d probably recommend Weave, but only like 65/35. Hope this helps!

Your reasoning is sound, and makes sense in my use case. My documents will be probably at least 50% explanation, and commenting everything would be irritating I think. Plus I’m familiar with Rmd so I’ll probably go with Weave. Thanks!

Three things:

  1. Check out the Julia for Data Science YouTube series that’s on the official Julia channel. A link to the first video in the series is here.

  2. Check out mybinder.org for making your notebooks fully executable in the browser, without the user having to download anything. I created a very basic intro-to-Julia notebook (specifically for colleagues of mine) that I have running on Binder, so you can see what that looks like here: https://mybinder.org/v2/gh/mthelm85/Intro-to-Julia/master

  3. I’ve been very frustrated at the lack of good data analysis/data science content on the web that relies on Julia. There are loads of great courses on a variety of online learning sites that make use of R/Python but almost nothing for Julia. I would be happy to contribute to this project and would be interested in linking up with you to share thoughts/organize an outline for topics to cover.

Yes, it’s very frustrating for someone who knows a bunch about data analysis in, say, R or Python, but wants to move to Julia and get up to speed at their former level of knowledge. So I’m hoping to alleviate that and also teach a bit about data analysis.

I am very happy to partner on this. I am really just getting started on the project though. How about I PM you on the forum here, and we can discuss some ideas there, and then feed the more fully formed ones back into this thread?

Sounds great!

Feel free to loop me in on this too. I’m developing a course right now (starting next week :scream:) so may not be super available. But I’m still coming up with assignments, so there may be some mutually beneficial work to be done. My stuff will largely be biology focused, but I was planning to work with some covid datasets, so there may be broader interest.

Nice. I have worked with biologists quite a bit over the years. What sort of topics are you working on?

I am in the process of writing the first of these tutorials. It basically downloads a public Census dataset, munges it, and makes a variety of plots to answer very basic questions about the data. Once that’s in a viable form I’ll put a git repo up on GitHub and mention it here, and we can discuss how to build on that foundation in different directions.

I think “learn by doing” without too much excess explaining is powerful. I do like to explain, so I’m thinking of having a companion to each tutorial that’s a discussion of why things were done, why other things weren’t done, etc.

I’m also interested in this process. I work for the state of California and I use Julia for some basic data analysis and manipulation. I have some scripts that access a pretty comprehensive database about pesticide use in the state and have been meaning to learn more about the process, but also share some of the stuff I know. @mthelm85, for example, helped me in the past to do some mapping using VegaLite, and I did a scientific presentation with that.

Please do not hesitate to ping me or message me.

I currently study the human microbiome (in kids, looking at relationships with cognitive development). The course will include sequence analysis, using web APIs for biological datasets, phylogenetics, and a bunch of other stuff.

Do you do sequence analysis in Julia btw? What tools are there for this kind of thing?

BioSequences.jl and other stuff in BioJulia, mostly. I don’t do so much of this at the moment, and for this course I plan to do very basic things with strings mostly, or have them implement stuff themselves.
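To give a flavor of what “very basic things with strings” might look like, here is a minimal sketch in plain Julia. The helper names `reverse_complement` and `gc_content` are hypothetical choices for this example, not the BioSequences.jl API, and the sketch assumes uppercase DNA with only the bases A, C, G, and T.

```julia
# Hypothetical helpers built on plain String operations (not BioSequences.jl).

# Map each DNA base to its Watson-Crick complement.
const COMPLEMENT = Dict('A' => 'T', 'T' => 'A', 'G' => 'C', 'C' => 'G')

# Reverse complement: complement every base, then reverse the strand.
# Throws a KeyError on any character outside A, C, G, T.
reverse_complement(seq::AbstractString) = reverse(map(c -> COMPLEMENT[c], seq))

# Fraction of bases that are G or C.
gc_content(seq::AbstractString) = count(c -> c in ('G', 'C'), seq) / length(seq)

seq = "ATGCGC"
println(reverse_complement(seq))
println(gc_content(seq))
```

Having students write helpers like these first, then swap in a dedicated sequence type later, is one way to motivate why a package like BioSequences.jl exists.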

@dlakelan I’m also interested to see what you come up with. I’ve been curious about Julia for many years now, and this summer I may finally have some time to get into it.

Thanks! Right now I have a kind of quick intro tutorial fairly well done in first draft form… I’m now going through and writing the Discussion that goes along with it. The tutorial is supposed to be more of a step-by-step, very readable thing with all the decisions made for you… the discussion is all about why we chose what we did, what we could have done differently, etc. It’s a little more involved. I’m hoping to get those two both in a decent draft form, then throw them up in my git repo and open up some commentary here. Maybe another few days.

OK, for those who are interested, see the very bare GitHub repo: https://github.com/dlakelan/JuliaDataTutorials

You should be able to get notebooks by just running the build.jl script, if you have Weave.jl installed.
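For anyone unfamiliar with Weave, the sketch below shows roughly what a conversion step can look like. It is not the repo's actual build.jl: the document name and contents are hypothetical stand-ins written to a temporary directory, and it assumes Weave's `convert_doc` and `weave` functions behave as documented.

```julia
using Weave

# Hypothetical stand-in document; the real tutorials live in the repo.
dir = mktempdir()
src = joinpath(dir, "Example.jmd")
fence = "`"^3  # three backticks, built indirectly so this example nests cleanly
write(src, "# Example\n\nSome text.\n\n$(fence)julia\nx = 2 + 3\n$(fence)\n")

# Convert the Weave document to a Jupyter notebook without executing it.
nb = joinpath(dir, "Example.ipynb")
convert_doc(src, nb)

# Execute the code chunks and render the document to HTML in the same directory.
weave(src; doctype = "md2html", out_path = dir)
```

`convert_doc` only translates between formats, so the notebook cells come out unexecuted, while `weave` runs the chunks and captures their output.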

The essential format is to split this into a series of Tutorials with paired Discussion. The first one is “BasicDataAndPlots”. Imagine the target audience is a 3rd year undergrad who has at most 1 semester of a computer programming course. The idea is to get them loading some data and producing some plots even if they don’t know how or why, just to see the syntax, maybe play around with it. It should have links to documentation so they can maybe modify plots by reading the discussion.

As things go along it should build to the point where we’re answering more meaningful questions and using more advanced ideas in data analysis, mostly from a Bayesian perspective. I’d like to tackle real world and interesting questions, the kind of thing where the answer isn’t clear, and someone who is interested in the topic could start from these tutorials, and then build a little undergrad or Masters level term-paper type project by further research. For the moment though, it’s just getting started.

Question for @kevbonham, how do I make sure Weave doesn’t try to execute a code block when building notebooks/pdf/html? I’m not clear on the syntax for that.

Having reread it, I can already see that there are some sections I should strip out and push into the Discussion. Also some things I need to add to the Discussion, like how the length units cm and inch work.

Doesn’t the code chunk option eval = false (http://weavejl.mpastell.com/stable/chunk_options/#Chunk-Options-1) do the job?

Exactly - eval, results, and echo are the ones I use most frequently. They affect whether the code is executed, whether the results are shown, and whether the code is shown, respectively. So

```julia; results="hidden"; echo=false
# this code won't show up in the document, but x is available
x = 2
```
```julia; echo=false
# a results block with `5` will show up in the document, but not the code
x + 3
```
```julia; eval=false
# this code block will show up, but won't be evaluated
x = 5
```
```julia
# both this block, and the results (`2`) will show up
x
```

Very cool! Looks like a good start - let me know if you’d like help setting this up with Documenter to auto-generate pages (it can build HTML pages and automatically make links to mybinder for running/downloading the notebooks). I’m in the middle of doing something similar for my course, so hopefully it won’t be too much additional effort.

Would love to have help using Documenter and getting things into Binder. I have never used either of those. Binder in particular seems extremely useful for this kind of purpose.

In the end, I’m not teaching courses, but I would be very happy to have others who ARE teaching courses use these resources. So we should do whatever seems most useful for that target audience.

Thanks for sharing! I do teach courses in this area, and I would love to have some more resources to share with undergraduates.

+1 for rendered HTML pages of this content. Being able to browse without installing makes it more accessible.

Note that our tutorials are basically rendered scripts, downloadable as notebooks or scripts, built using Literate + Franklin, which might be relevant and more flexible than Weave (I’m biased). I intend to port the R bookdown template to Franklin over the next few months to help people writing series of tutorials present their content.
