Introduction to Probability for Data Science: collaborative effort to translate code to Julia

There is a collaborative effort underway to translate the Python and Matlab code that accompanies Stanley H. Chan’s textbook Introduction to Probability for Data Science into Julia. This effort is authorized by Dr. Chan. The textbook is free to download as a PDF; please see the page Introduction to Probability for Data Science for more information about the book.

In my opinion, this is a very nice probability and statistics textbook for undergraduate data science students, and it has the potential to be very widely used.

The code to be translated is, for the most part, quite basic. This is an opportunity to help make Julia visible to many people, even if you don’t have much experience with Julia programming.

The code to be translated is in the Code and Data section of the web page mentioned above, at the bottom left.

If you are interested in translating the code for a chapter, please say so here, and I will edit this top message to keep track of who is working on which chapter. Please sign up only if you can get the work done in a couple of weeks, at the most.

Please make pull requests for completed work to https://github.com/mcreel/IntProbDS.jl

Regarding style, I suggest following the style used in the Julia code for Ch. 1 and in the Python and Matlab code: self-contained blocks that can be pasted into the REPL and that run independently of other blocks.
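
For example, a block in that style (a made-up illustration, not code from the book) would look something like this:

using Distributions, Plots      # each block loads everything it needs
x = -4:0.01:4
plot(x, pdf.(Normal(0, 1), x), label = "standard normal pdf")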

Ch. 1: completed - mcreel
Ch. 2: no code for Ch. 2
Ch. 3: completed - vinicius_de_lima
Ch. 4: completed - rafaelchp
Ch. 5: completed - Elias Carvalho
Ch. 6: completed - j_verzani
Ch. 7: completed - Paul_Soderlind
Ch. 8: completed - mcreel
Ch. 9: completed - mcreel
Ch. 10: congUoM

That’s a wonderful initiative! I would like to contribute the translation of Ch. 3.

Excellent initiative!
Hi, my name is Elias Carvalho and I would like to contribute the translation of chapter 5.

Excellent! Thanks to both of you. Please feel free to open issues, etc., if you have suggestions about organization, style, etc.

Code for Chapter 8 (estimation) is done.

Before I sign up, could I ask whether you expect the code to be (mostly) bottom up or to introduce various Julia packages? (I was thinking of Ch 7 in particular.) /Paul S

I’ve done Ch1 and Ch8 trying to use a style similar to the code on the book’s web page, so that instructors who decide to use Julia will be able to follow the text without difficulty. I do take advantage of things like broadcasting, and I have made some use of specialized packages like ImageView.jl, and considerable use of Distributions.jl, but it has been pretty easy to stay close to the Matlab and Python code. However, please work the way you wish if you decide to go ahead. Getting an initial version done is the main thing. We can aim for some uniformity later, if it seems necessary.
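
To give an idea of what I mean, here is a small illustrative sketch (made up for this post, not code from the book) of how broadcasting and Distributions.jl keep things close to the Matlab version:

# the MATLAB version might read:  x = 0:10;  p = binopdf(x, 10, 0.5);
using Distributions
x = 0:10
p = pdf.(Binomial(10, 0.5), x)   # broadcast the pmf over the support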

Paul: Ch7 uses \ for OLS fit in the Python code, and that seems natural to use for Julia as well, instead of a package. However, for linear programming (almost certainly), Lasso (probably), and perhaps Legendre polynomials, which I don’t know about, a package could be the way to go.
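
For instance, a minimal sketch of the backslash approach, with made-up data rather than the book’s example:

using Random
Random.seed!(1234)
n = 100
X = [ones(n) randn(n, 2)]    # design matrix with an intercept column
β = [1.0, 0.5, -0.3]         # hypothetical true coefficients
y = X * β + 0.1 * randn(n)
β̂ = X \ y                    # least-squares fit via backslash, no package needed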

One doubt I have is whether to leave the code at global scope or to wrap it in a function. Given that the target audience is people learning probability in order to do data science, there’s a desire to communicate the theory, but also to build good data science habits, where performance of code is a consideration. Wrapping the code in functions is a departure from the Matlab and Python code, but it probably should be done to teach good Julia habits. Any opinions on that?
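
To make the trade-off concrete, here is a hypothetical example of the same computation in both styles (not taken from the book):

# script style, like the Matlab/Python code: everything in global scope
using Statistics
x = randn(10_000)
println(mean(x))

# function style: the same computation wrapped in a function, which avoids
# non-constant globals and reflects the usual Julia performance advice
function mean_of_sample(n)
    x = randn(n)
    return mean(x)
end
println(mean_of_sample(10_000))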

However, for linear programming (almost certainly), Lasso (probably), and perhaps Legendre polynomials, which I don’t know about, a package could be the way to go.

OK, I’ll give ch. 7 a try then.

Yes, some functions should perhaps be sneaked in here and there.

Great, thanks!

Chapters 4 and 6 are still available at the moment. These are reasonably small projects…

Hello, I would like to contribute chapter 4.

Great! Thanks!

I’m sorry, I didn’t notice that StatsPlots was removed from the project dependencies. I think it would be particularly useful in Chapter 4. Should I try to reproduce the plots in the same way as the MATLAB ones? For instance, by defining the support vector and evaluating the pdf at those points.

Please go ahead and use StatsPlots if it adds things that are needed and aren’t in Plots. Apparently, the most recent StatsPlots will use an older version of Plots, but I don’t think that difference will affect the code we’re working with.
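
For what it’s worth, a minimal sketch of that approach, with an arbitrary distribution chosen just for illustration:

using Distributions, Plots
d = Binomial(10, 0.5)           # example distribution, not necessarily the one in Ch. 4
x = 0:10                        # support vector
plot(x, pdf.(d, x), seriestype = :sticks, marker = :circle,
     xlabel = "x", ylabel = "P(X = x)", legend = false)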

Hi, I can contribute to chapter 10.

Excellent! Thanks!

I had been looking over Ch 7 to make sure it was doable, but I see that it was assigned. I can do Ch 6 if it’s still open.

As for Ch 7, I was using SpecialPolynomials for the Legendre polynomials, but if dependencies are an issue, this function might be helpful. It evaluates Legendre polynomials via Clenshaw reduction:

# Evaluate the Legendre series with coefficients `cs` (for P₀, P₁, …) at `x`,
# using Clenshaw's recurrence.
function legendre(cs, x)
    N = length(cs)
    N == 0 && return zero(x)              # guard before touching cs[1]
    R = typeof(one(x) * cs[1] / 1)
    p₀ = one(R)
    N == 1 && return cs[1] * p₀

    Δ0::R = cs[end - 1]
    Δ1::R = cs[end]

    @inbounds for i in (N - 1):-1:2
        # three-term recurrence: Pᵢ(x) = ((2i-1)/i)·x·Pᵢ₋₁(x) - ((i-1)/i)·Pᵢ₋₂(x)
        An, Bn, Cn = (2i - 1) / i, zero(R), (i - 1) / i
        Δ0, Δ1 = cs[i - 1] - Δ1 * Cn, Δ0 + Δ1 * muladd(x, An, Bn)
    end
    p₁ = muladd(x, one(R), zero(R))       # P₁(x) = x
    return Δ0 * p₀ + Δ1 * p₁
end
# evaluate a single basis polynomial Pᵢ(x)
legendre(i::Int, x) = legendre(ntuple(j -> (j == i + 1) ? 1 : 0, i + 1), x)
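
A couple of hand-computed values, as a quick sanity check of the function above (not from the book):

legendre(2, 0.5)                   # P₂(0.5) = (3·0.25 − 1)/2 = −0.125
legendre((1.0, 0.0, 2.0), 0.5)     # 1·P₀(0.5) + 2·P₂(0.5) = 1 − 0.25 = 0.75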

For ch 07, I have created a rough draft. You can find it at https://github.com/mcreel/IntProbDS.jl/pull/2#issuecomment-943635817

There is a lot of LegendrePolynomials.jl in that code. Please have a look and let me know what you think.

Yes, Ch6 is still open, so I will put your name there. Thanks!