I’m starting this thread so that people can bring example data analysis problems and Julia users can reproduce something like them here, to give a sense of how easy or difficult it is to accomplish common tasks.
I don’t want this to become “do my research project for me”, of course. But if you’ve worked on a project in another language and want to see how you’d accomplish something similar in Julia, propose a project, preferably with a link to a GitHub repo, blog post, or similar that has the code from the other language. Let’s use this as a way to build up some simple, hand-holding how-tos for data analysis in Julia.
What’s in scope:
Reading datasets in various common formats
Munging data from multiple sources into a particular data set structure / calculating statistics
Plotting various common 2D plots: scatter, line, histogram, density, small multiples of each, etc.
Fitting regression models
Running simulations of some sort based on data (random number generation, dynamic processes etc)
Optimization / decision making based on data.
Not in scope:
Open research problems
Do my homework for me
Reproducing the output of proprietary software or algorithms where the method isn’t public
Lots of fine-tuning of graphs to get very precise visual results or to exactly reproduce the output of another piece of software.
To make this really doable the problems should be the kind of thing you’d expect a grad student to be able to do in an afternoon or two. Keep it reasonably scoped.
I guess I’ll link to this repo, where Phil Price and I started comparing some tasks in R vs Julia. We never really finished this project. In particular, I wrote a fairly trivial looping simulation in Julia, and Phil tried to come up with something performant in R and eventually gave up, both because it’s not trivial and because he had lots of real-world things to do.
I think the Olympics example shows how trivial stuff in R is still trivial in Julia.
The simulation.jl example shows how trivial stuff in Julia may well be nearly impossible to do performantly in R.
The COVID example is unfinished, I think, but still of interest.
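To give a flavor of what I mean by a "trivial looping simulation", here is a minimal sketch. It is not the simulation.jl from the repo; the function name `simulate_walk` and the reflecting random walk it implements are made up purely for illustration. The point is only that each step depends on the previous state, which is natural to write as a plain loop in Julia and awkward to vectorize.

```julia
using Random

# Hypothetical example (not the repo's simulation.jl): a Gaussian random walk
# with a reflecting floor at zero. Each step depends on the previous state,
# so the natural way to write it is an explicit loop.
function simulate_walk(rng::AbstractRNG, nsteps::Integer)
    x = zeros(Float64, nsteps + 1)      # x[1] is the starting position, 0.0
    for t in 1:nsteps
        step = randn(rng)               # standard normal increment
        x[t + 1] = max(0.0, x[t] + step)  # never let the walk go below zero
    end
    return x
end

rng = MersenneTwister(42)               # seeded so repeated runs in one Julia version match
path = simulate_walk(rng, 1_000)

println("final position:   ", path[end])
println("maximum position: ", maximum(path))
```

Something like this only needs the standard library, and the loop is the whole simulation. No tricks are needed to make it fast.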