Building some Data analysis Tutorials

clarkfitzg · June 4, 2020, 2:52am

@dlakelan I’m also interested to see what you come up with. I’ve been curious about Julia for many years now, and this summer I may finally have some time to get into it.

dlakelan · June 4, 2020, 3:01am

Thanks! Right now I have a quick intro tutorial in fairly complete first-draft form… I’m now going through and writing the Discussion that goes along with it. The tutorial is meant to be a step-by-step, very readable thing with all the decisions made for you… the Discussion is all about why we chose what we did, what we could have done differently, etc. It’s a little more involved. I’m hoping to get both into decent draft form, then put them up in my git repo and open up some commentary here. Maybe another few days.

dlakelan · June 5, 2020, 3:15am

OK, for those who are interested, see the very bare GitHub repo: dlakelan/JuliaDataTutorials (Tutorials For Data Analysis in Julia), https://github.com/dlakelan/JuliaDataTutorials

You should be able to get notebooks by just running the build.jl script, if you have Weave.jl installed.
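A minimal sketch of what such a build step might do, assuming the tutorials are Weave `.jmd` files sitting in one directory. The helper name `build_notebooks` and the `src`/`notebooks` layout are hypothetical, and the repo's actual build.jl may work differently. It relies on Weave's `convert_doc`, which infers the output format from the file extension and converts the document without executing its code.

```julia
using Weave

# Hypothetical helper (not necessarily what the repo's build.jl does):
# convert every Weave `.jmd` tutorial in `srcdir` into a Jupyter notebook
# in `outdir`, without running any of the code, and return the new paths.
function build_notebooks(srcdir::AbstractString, outdir::AbstractString)
    mkpath(outdir)                               # create the output directory if needed
    written = String[]
    for file in readdir(srcdir)
        endswith(file, ".jmd") || continue       # skip anything that isn't a Weave document
        infile  = joinpath(srcdir, file)
        outfile = joinpath(outdir, splitext(file)[1] * ".ipynb")
        convert_doc(infile, outfile)             # format inferred from the ".ipynb" extension
        push!(written, outfile)
    end
    return written
end
```

A build.jl could then just call `build_notebooks("src", "notebooks")`. For rendered pages, Weave's `weave(file; doctype = "md2html")` produces HTML, but unlike `convert_doc` it does execute the code chunks.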

The essential format is to split this into a series of Tutorials, each with a paired Discussion. The first one is “BasicDataAndPlots”. Imagine the target audience is a third-year undergrad who has had at most one semester of a computer programming course. The idea is to get them loading some data and producing some plots, even if they don’t yet know how or why, just to see the syntax and maybe play around with it. It should have links to documentation so they can modify the plots by reading the Discussion.

As things go along, it should build to the point where we’re answering more meaningful questions and using more advanced ideas in data analysis, mostly from a Bayesian perspective. I’d like to tackle real-world, interesting questions, the kind where the answer isn’t clear, so that someone interested in the topic could start from these tutorials and then build a small undergrad- or Master’s-level term-paper project through further research. For the moment, though, it’s just getting started.

Question for @kevbonham: how do I make sure Weave doesn’t try to execute a code block when building notebooks/PDF/HTML? I’m not clear on the syntax for that.

Having reread it, I can already see that there are some sections I should strip out and push into the Discussion. Also some things I need to add to the Discussion, like how the length units cm and inch work.

Rob_van_Weelderen · June 5, 2020, 8:22am

Doesn’t the code chunk option `eval = false` (see Chunk Options in the Weave.jl docs) do the job?

kevbonham · June 5, 2020, 1:36pm

Exactly: `eval`, `results`, and `echo` are the ones I use most frequently. They control whether the code is executed, whether the results are shown, and whether the code is shown, respectively. For example:

```julia; results="hidden"; echo=false
# this code won't show up in the document, but x is available
x = 2
```
```julia; echo=false
# a results block with `5` will show up in the document, but not the code
x + 3
```
```julia; eval=false
# this code block will show up, but won't be evaluated
x = 5
```
```julia
# both this block and its result (`2`) will show up
x
```

kevbonham · June 5, 2020, 2:57pm

Very cool! Looks like a good start. Let me know if you’d like help setting this up with Documenter to auto-generate pages (it can build HTML pages and automatically make links to mybinder for running/downloading the notebooks). I’m in the middle of doing something similar for my course, so hopefully it won’t be too much additional effort.

dlakelan · June 5, 2020, 4:15pm

Would love to have help using Documenter and getting things into Binder. I have never used either of those. Binder in particular seems extremely useful for this kind of purpose.

In the end, I’m not teaching courses, but I would be very happy to have others who ARE teaching courses use these resources in theirs. So we should do whatever seems most useful for that target audience.

clarkfitzg · June 7, 2020, 6:12pm

Thanks for sharing! I do teach courses in this area, and I would love to have some more resources to share with undergraduates.

+1 for rendered HTML pages of this content. Being able to browse without installing makes it more accessible.

tlienart · June 7, 2020, 7:15pm

Note that our tutorials are basically rendered scripts, downloadable as notebooks or scripts, built with Literate + Franklin, which might be relevant and more flexible than Weave (I’m biased). Over the next few months I intend to port the R bookdown template to Franklin, to help people who write series of tutorials present their content.
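For comparison, a minimal sketch of the Literate side of that kind of workflow. The file name and the tiny tutorial text are made up, and this is not a claim about how those particular tutorials are built. Literate treats lines starting with `# ` as prose and everything else as code, and generates a Markdown page, a notebook, and a plain script from one source file:

```julia
using Literate

# Made-up Literate source: lines starting with "# " become prose,
# all other lines become code.
source = """
# # Loading some data
# First we make a small vector of numbers.
x = [1.0, 2.0, 3.0]
# Then we compute its mean.
sum(x) / length(x)
"""

dir = mktempdir()
srcfile = joinpath(dir, "tutorial.jl")   # hypothetical file name
write(srcfile, source)
outdir = joinpath(dir, "build")
mkpath(outdir)

Literate.markdown(srcfile, outdir)                   # Markdown page (Literate's default flavor)
Literate.notebook(srcfile, outdir; execute = false)  # .ipynb; the code is not run here
Literate.script(srcfile, outdir)                     # plain .jl script for download
```

How the Markdown output gets wired into a Franklin site isn't shown here; that part would follow Franklin’s own documentation.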

dlakelan · June 14, 2020, 6:27pm

So, for those who are interested in following along: like many things these days, I’ve been derailed a bit by COVID. Specifically, I have a lot of friends and family who want to know the latest on the COVID epidemic. I was making some PDFs by hand and putting them on my blog each week or so, but I figured, hey, why not give them Julia notebooks they can interact with… I managed to get that into Binder, etc. It’s been educational, but it’s a work in progress and not really very tutorial-like. In particular, I don’t have a discussion document for the COVID material yet, because it’s still changing.

I’d like to do a tutorial in which I use Turing to build a Bayesian model of something interesting. Here’s your chance to influence that. What would you like to see modeled? Requirements are:

- A publicly available dataset, preferably something not too enormous, in an easily readable format (CSV, for example). It could involve integrating data from two public sources.
- The model shouldn’t require tons of moving parts (for example, the COVID epidemic, while very interesting, is a very challenging field, so it’s out). It also shouldn’t be trivially simple (something you could handle fine with GLM and a linear or logistic regression with a point estimate).
- It should be a topic I have some familiarity with: Economics, Biology, Healthcare, Mechanics/Physics, and Civil and Environmental Engineering would be good candidates.

Thoughts?

nilshg · June 15, 2020, 7:34am

Maybe @cpfiffer has some ideas from the econ world that could even end up as Turing tutorials in the docs, so we’d kill two birds with one stone (if you’d be okay with that, of course…)

dlakelan · June 15, 2020, 1:05pm

I’m fine with tutorials ending up in the docs. I should probably put up explicit licenses; will do that today.

cpfiffer · June 15, 2020, 1:39pm

Check out this issue to see some economics ideas. Happy to put your tutorial up on the site if that’s what you’d like.

dlakelan · June 15, 2020, 1:50pm

Oh, that gave me an idea: a Bayesian model decomposing a time series into a slow component and a fast (seasonal) component.

There are plenty of wiggly time series you can easily get from fred.stlouisfed.org.
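One way to write that idea down, as a sketch under assumptions (nothing in the thread fixes a particular model): treat the observed series $y_t$ as a slowly varying level $\mu_t$, modeled as a random walk, plus a periodic seasonal term $s_t$ built from $K$ Fourier harmonics of period $P$ (e.g. $P = 12$ for monthly FRED data), plus noise. The symbols and the random-walk/Fourier choices are illustrative assumptions.

$$
\begin{aligned}
y_t &= \mu_t + s_t + \varepsilon_t, & \varepsilon_t &\sim \mathcal{N}(0, \sigma_\varepsilon^2) \\
\mu_t &= \mu_{t-1} + \eta_t, & \eta_t &\sim \mathcal{N}(0, \sigma_\mu^2) \\
s_t &= \sum_{k=1}^{K} \left[ a_k \cos\!\left(\frac{2\pi k t}{P}\right) + b_k \sin\!\left(\frac{2\pi k t}{P}\right) \right]
\end{aligned}
$$

In a Bayesian fit (for instance with Turing), one would put priors on $\mu_1$, $\sigma_\varepsilon$, $\sigma_\mu$, $a_k$ and $b_k$; a prior favoring small $\sigma_\mu$ relative to $\sigma_\varepsilon$ is what keeps $\mu_t$ "slow", and the posterior over $\mu_t$ then gives the slow component together with its uncertainty.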

ymh · July 24, 2020, 5:38pm

I’d be interested in ‘beta-reading’ your tutorials: giving feedback before they are published, etc.

dlakelan · July 24, 2020, 11:46pm

The repo I’m using is here… I’ve had a bunch of projects and had to put this on hold for the last several weeks. I’ve got some half-baked tutorials that I haven’t integrated yet, including one where I fit a nonlinear function to seasonally adjust a time series.

https://github.com/dlakelan/JuliaDataTutorials
