Does anyone have suggestions for a relatively advanced book on Julia focusing on data science, including statistics and numerical computations? I am thinking of books like An Introduction to Statistical Learning for R or Python Data Science Handbook for Python…
The issue is that the DS scene in Julia is not as well developed as R’s or Python’s.
Hmmm… I don’t think I agree :-) Relative to the community size, I believe DS is pretty common in Julia…
I really hate to do this because the authors are clearly very knowledgeable and capable but I can say that I personally did not find the McNicholas/Tait book Data Science with Julia to be very helpful. To be fair, I did not work through the whole book (but that’s because I found it to be so unhelpful). Your mileage may vary but, for me personally, it didn’t really advance my DS or Julia abilities.
I have a couple of free .pdf drafts that (due to the fact they’re free) I can recommend:
Statistics With Julia:
Applied Linear Algebra (with Julia companion):
I’ve also taken a couple of the machine learning courses on JuliaAcademy.com and my opinion of those is just neutral.
I think the Julia community is really in dire need of a good online data science course like Jose Portilla’s Udemy.com Python for Data Science course. I’ve not taken the course but it is highly rated and I’ve seen it referred to in glowing fashion on a variety of forums. I’ve personally taken numerous Node.js/Vue.js/web development courses on Udemy and found them to be absolutely fantastic (I would have been willing to pay much more than I actually had to pay).
I think there is a real opportunity for the right person/people in the Julia community to get together and make a good data science course and make some money in the process (I have no idea how profitable these Udemy courses are). I personally would be first in line to sign up and take the course!
In absolute terms, imo Julia has potential but is underdeveloped.
This is from personal experience. I was commissioned to write a Julia course, and I think many of the DS packages do not work with DataFrames.jl or tables and instead require matrix input. This makes it harder to do DS. Also, things like CCA aren’t as well documented. MLJ.jl is promising and improving fast but needs more documentation and simpler quick start guides.
I highly recommend “Statistical Rethinking” if you’d like a good intro to develop your intuition for the Bayesian approach to statistical analysis.
The code examples don’t use Julia, but you’d probably want to code them up yourself anyway. There is also a 3rd party repo with some of the book’s examples written in Julia:
I do think something like the new QuantEcon data science lectures ( https://datascience.quantecon.org/ ) would be great for Julia, but so far it’s only in Python. They provide their economics lectures in both Python and Julia, so perhaps this is coming in the future?
Not sure if I agree with that. I see why you would want to use DataFrames for EDA. However, Plots.jl is compatible with them already. For prediction / classification it is often less efficient to work with DataFrames, as they need to be converted into Arrays (when calculations get messy).
In my opinion, having Arrays as an input is actually a good idea. For simplicity, I could see wrappers to convert DataFrames into Arrays upfront (within packages for DS).
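A minimal sketch of what such an upfront wrapper could look like, using only Base Julia. The NamedTuple of vectors is just a stand-in for a DataFrame, and `to_matrix` is a hypothetical helper name, not a function from any package:

```julia
# A NamedTuple of equal-length vectors stands in for a DataFrame here
# (assumption: only Base Julia is used, no DataFrames.jl).
table = (height = [1.60, 1.75, 1.82], weight = [55.0, 72.0, 80.0])

# Hypothetical helper: stack the columns into a Matrix and return the
# column names alongside it, so they are not lost in the conversion.
function to_matrix(tbl::NamedTuple)
    colnames = collect(keys(tbl))   # Vector{Symbol} of column names
    X = hcat(values(tbl)...)        # one matrix column per variable
    return X, colnames
end

X, colnames = to_matrix(table)
```

Returning the names alongside the matrix is one way to keep track of which column is which after the conversion.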
We have different perspectives. I like DataFrame-like input because the variable names are important and the matrix form loses that info. Converting to a matrix is unintuitive to me and turns everything numeric. I want support for factor types and string types.
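To illustrate the point about factor and string columns: a string column has to be encoded by hand before it fits into a numeric matrix. A rough sketch in Base Julia with made-up data (the variable names are only for illustration):

```julia
# Made-up example data: one string (factor-like) column.
species = ["cat", "dog", "cat"]

# The distinct levels of the factor, in sorted order.
levels = sort(unique(species))

# One-hot encode by hand: rows are observations, columns are levels.
onehot = Float64[s == l for s in species, l in levels]
```

After this step the matrix only holds zeros and ones, so the level labels have to be carried around separately (here in `levels`).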
I’m not sure this will be what you’re looking for, but might be worth a glance.
I’m teaching a master’s level data science course at Brown University and have been developing the content for the course in Julia. I have exposition presented on a website I’m calling Data Gymnasia, Jupyter notebooks with problems that we do in class, and videos which walk through solutions of the problems. You can see a high level overview of the topics covered in this cheatsheet.
One major caveat is that the course is aimed more at developing facility with the mathematical ideas than proficiency in the Julia data science ecosystem. The students take other courses in which they learn data wrangling, etc., using Pandas and ScikitLearn. For this reason, this course is not analogous to the aforementioned Udemy course for Python.
This looks amazing, thanks for sharing!
Getting a DRAFT copy of this book isn’t that hard. It is not even protected.
@sswatson that’s quite an impressive cheatsheet there! Impressive collection of notebooks in a coherent arrangement, many thanks for sharing with us!
@StevenSiew appreciate the nudge to the latest 2nd ed., had an earlier version. McElreath has become my go-to for expanding stats knowledge and understanding - interesting context for examples, fun video series, great ref material (and up to date), plus code to go along…