Julia for Education: Package Selection

This thread is a continuation of discussions about using Julia for education in this thread and this thread. Some people, including myself, expressed the opinion that more clarity around the package ecosystem would make it easier to teach classes with a Julia programming component.

The two big questions in my mind are:

  1. What packages should we suggest that students use? My experience (from teaching in another language) is that guiding students towards a fixed set of libraries/packages makes life simpler for them, because they don’t have to search around and figure out how to choose between several options. It also makes life simpler for me, the instructor, to officially support only a few chosen options. I try to choose the ones that easiest to use.

  2. What features are missing from the “standard” packages, or the ecosystem as a whole, that are necessary for teaching certain classes? Is anyone in a position where they’d like to teaching with Julia, but are prevented from doing so because certain functionality is missing?

Regarding the first question, a good starting point is the list of packages in the SciML documentation. [Post updated June 21, so many replies below no longer make sense…]

Regarding the second question, a previous post expressed interest in a simple data loading library.

I also wonder whether the current packages are fully featured enough to teach a graduate course in econometrics or causal inference. (In particular, I would love to dethrone Stata…) However, in a pinch students can always call a relevant R library from Julia, so it’s not an insurmountable problem.

5 Likes

That list is old and came before some of our newer libraries. It now tends to look like:

Problem type Associated Julia packages
Plotting Plots
Linear system / least squares LinearSolve
Sparse matrix SparseArrays
Interpolation/approximation DataInterpolations, ApproxFun
Polynomial roots Polynomials
Rootfinding NonlinearSolve
Finite differences FiniteDifferences, FiniteDiff
Integration Integrals
Optimization Optimization
Initial-value problem DifferentialEquations
Boundary-value problem DifferentialEquations
Method of lines MethodOfLines (still under development)
Automatic Differentiation ForwardDiff, Enzyme, DiffEqSensitivity
Fast Fourier Transform FFTW
Acausal Modeling / DAEs ModelingToolkit
Symbolic CAS Symbolics

Changes:

  • If you use LinearSolve.jl, IterativeSolvers, Preconditioners, etc. are abstracted away. Besides, you don’t actually want to use IterativeSolvers, and Preconditioners is missing most preconditioners, etc. I’ll have a whole JuliaCon talk on this.
  • QuadGK is only one dimensional and doesn’t handle infs automatically. Nor is it differentiable with reverse mode. Integrals handles of those details. It still needs a bit of work, but it at least automates a lot of what’s needed.
  • NLsolve can have odd performance issues here and there (see the performance difference against NonlinearSolve Porting from Python optimize.broyden2 - #4 by ChrisRackauckas), so NonlinearSolve instead. NonlinearSolve is also differentiable by default. It still needs to get an upgrade to its linear solver interface though, in the near future it will match DiffEq in how it uses LinearSolve.jl as its linear solver.
  • I intend to add an interface for EigenvalueProblem to LinearSolve as well, so that it similarly has a unified interface for linear eigenvalue problems. I don’t think any student should ever have to learn what Arpack.jl is, that seems like a kludge to me.
  • DataInterpolations.jl for splines. Again, automatic differentiation support, more algorithms, etc.
  • Acausal and causal modeling is very important for engineering students. This is Modelica and Simulink stuff, now ModelingToolkit is what we’re incorporating into the classroom. Very early classroom experiments at this point.
  • Symbolics.jl for the CAS part just added to the list, since it’s used all over.
  • I think any modern scientific computing course needs to teach automatic differentiation and adjoint methods.

And we have the refocused The SciML Open Souce Software Ecosystem · SciML mirroring that story, designed with examples which integrate all of these libraries into full workflows.

Mainly, NonlinearSolve.jl missing LinearSolve integration to fully show off preconditioned Newton-Krylov for nonlinear systems, Integrals just needs a bit of work in general (just like 1 week though) for polish, MTK is still on the new side, DataInterpolations.jl needs multi-dimensional interpolations, and we need to put doctests on everything. Then we need to fix our tutorials generation server, put out another series of video tutorials, and are finishing up some docstrings. That’s part of the summer plan that we have going with our various developer and student programs.

21 Likes

Oh wow, I’m a student and have been using Julia for over a year now and I found out about quite a few new packages and old wrong ones I was using like QuadGK. Thank y’all!

4 Likes

Old and wrong is probably a bit harsh of a way of putting it. Instead I’d say, we’ve been renewing the Julia ecosystem over the last year to ensure that it have a common API across scientific computing, lives up to the performance and correctness standards of SciML, has unified error messages across the board, and has the features that we see are generally required in 2022 (GPU compatibility, automatic differentiation, etc.). We haven’t fully achieved it yet (Integrals and NonlinearSolve are probably the two most sore points right now), but it’s been a really big push by a lot of individuals over the last year.

9 Likes

Documentation for many of the new SciML-verse meta-packages is still pretty bare-bones, so it seems foolhardy to recommend them for instruction at this point (unless you’re willing to field a large number of student queries). I’ll happily take a well-documented minimal interface over a nominally-universal but functionally-undocumented meta-package.

Unless you’re only dealing with 1D domains, DataInterpolations should probably be replaced with Interpolations.jl or trusty ol’ Dierckx.jl. I understand that work is ongoing to bring the new packages up to standard, but let’s not count those eggs before they hatch.

4 Likes

Which one do you think needs more? I’ll add more where you think it’s thin. We more than quadrupled the documentation size last month (and added docstrings everywhere) so please describe it in terms of the the current dev versions. The only one that I know of that needs a lot more added is Integrals.jl, but of course my eyes aren’t a fresh user’s eyes so please be more specific. For reference, current WIP overall docs are here: The SciML Open Souce Software Ecosystem · SciML

No, DierckX.jl and Interpolations.jl miss some of the most important spline types like regression splines. I wouldn’t consider them sufficient for most work in 1D. Not using regression splines is an implicit assumption that there is no noise in the dataset, and teaching students there’s never noise in real data is :sweat_smile: not my style at least. Yes, DierckX.jl has 2D and Interpolations.jl has ND, but they are still missing a lot of the support to make them usable in-practice (AD issues for example). That gives them all hard edges at this point.

3 Likes

I’ll have to take some time over the weekend to go through the dev docs in detail. Regarding interpolations, I think we’re just approaching this from different domains - I don’t need to autodiff a smoothed interpolated 1D dataset (?), but scarcely a day goes by without reaching for N-D interpolation. If there’s noise in my data, I want to handle it prior to interpolation with a domain-appropriate, sensor-physics-informed filtering technique instead of steamrolling it with a smoothing kernel.

2 Likes

Thanks. There seems to be lots of very fine packages there.

Still, I am slightly worried about “super-packages” as they (I assume) have lots of dependencies, and could therefore be brittle (break one and you break them all…). Am I wrong in this worry?

2 Likes

They don’t. They don’t depend on the subpackages. Instead they are brought in on-demand by the user: using Integrals doesn’t give you Cuba.jl, using IntegralsCuba does. So if Cuba.jl’s binary is an issue for your installation, Integrals.jl is fine, you just use other integrators. If someone gives you a code that uses Integrals.jl with a Cuba solver, you swap one line of code and now the dependency is gone and you move on. It’s just like how OrdinaryDiffEq.jl has no dependence on Sundials: the interface is separate from the method instantiations.

2 Likes

Hello

I was wondering if there’s a preferred package to integrate an array? I’m just using Trapz.jl right now, is there something else I should be using? I didn’t want to start a new thread since then this information will be in this tread only.

Thanks for the help all!

Dierckx.jl does have smoothing splines. I’m unsure if the library is thread-safe though, and importantly it’s not being developed anymore, so it’ll not modernize

1 Like

Oh cool, can you demonstrate it?

julia> using Dierckx

julia> using PyPlot

julia> x = 1:10;

julia> y = x.^2 .+ 5 .*x.*rand.();

julia> s1 = Spline1D(x, y, s = sum(abs2, y)*1e-2);

julia> plot(x, y, ".");

julia> plot(x, s1(x), label="1e-2");

julia> s2 = Spline1D(x, y, s = sum(abs2, y)*1e-4);

julia> plot(x, s2(x), label="1e-4");

julia> legend(loc="best");

The level of smoothing may be provided through the parameter s, which, in general, may be estimated from the standard deviation of the data. In case no weights are available, this has to be estimated through trial and error, and I’ve found fractions of sum(abs2, y) to work well enough. Choosing s=0 returns an interpolating spline.

smoothingsplinefit

2 Likes

What do you mean by integrating an array? An array is discrete data. If you have sampled a function in a discrete set of points, then you would need to build a quadrature rule (i.e. find corresponding weights) and sum. In what kind of points are you sampling your function?

Well, my specific case is calculating the integrated time displaced autocorrelation function. So I simulate the Ising model, get magentization data as a function of time, get the autocorrelation of this magentization time series, and then integrate that autocorrelation data in the form of an array, currently using Trapz.jl.

Ok, so I assume you have a smooth function sampled at uniformly distributed points. For this problem, you can use any composite Newton-Cotes rule, which will work well up to a degree of around 8 (more than that and you get errors due to Runge phenomena). The Trapezoidal rule is a particular case of order 2 for non-periodic integrands, so in theory you could do better.

I’m not sure if there is a Newton-Cotes package in Julia (it would be just a few lines anyway)…

1 Like

This table is great, thanks!
Having this at the beginning of the landing page of the SciML documentation would be helpful. Maybe indicated which packages are part of the SciML ecosystem.

The current overview under “Domains of SciML” is already a bit too detailed to start with.

1 Like

Thanks for the suggestion! I’ll try adding it later today.

4 Likes

Shameless self-promotion: there is also our ND-smoothing interpolation tool DIVAnd allowing for noisy data and some additional physical constraints.

2 Likes

Hello. I was wondering if DataInterpolations.jl is the right package for basic curve fitting. It does have a curve fitting method but I couldn’t find a documentation page and I can’t seem to find a method to give error estimates on the fitted parameters. I’ve been using LsqFit.jl and it does give all of that and more. I’m a bit confused which one I should be using.