Looking for use cases of Julia in computational and data science research with astrophysics focus

Hi all,

I am a PhD student studying astrophysics. I am very interested in Julia and have been using it in some side projects on computational and data science. I hope to use Julia in astronomy research so I would appreciate your suggestions and inspiration. I have two main research directions. One is in numerical simulations, and the other is in the analysis of large sky surveys. Specifically, the simulation framework has already been written in C/C++ as is most of the high-performance simulation software. I am struggling with using Julia in my research for these tasks. I am a heavy-Python user, so I could technically try to replace Python with Julia. However, it could be quite difficult to switch if the Python framework has already been well-established and optimized (i.e., the Astropy package). I want to avoid reinventing the wheel.

I am aware of some available packages in JuliaAstro Hub, and some of them are very useful but I still feel like people are trying to rebuild the basic framework that is already available in Python. Undoubtedly, building the fundamentals is the first step in research. Still, I am considering whether there are other ways to utilize Julia’s strength (multiple dispatch, JIT for performance, more functional-oriented, etc.).

More broadly, I am looking for the use cases of Julia in high-performance computing and data science applications. I’d appreciate it if you could share your Julia usage in astrophysics. Specific use cases in other fields are welcome, too. Thanks a lot.

2 Likes

I think @MilesCranmer is your human!

1 Like

Have you considered using Julia on top of established Python packages with PyCall/PythonCall?

1 Like

Thanks for the suggestions. Yes, I have considered using PyCall. My impression is that PyCall always has some overheads when calling python and converting results back to Julia. Therefore, it seems that PyCall is useful when the package is in Python, and there is no Julia substitution, and it is unavoidable. If one can use Python/Julia natively, why wouldn’t one do so? I am especially interested in Julia-native applications because one huge benefit/selling point of Julia’s ecosystem is that most of Julia’s packages are written in Julia itself, which will enhance the readability and interoperability of the code.

I think that Julia will be most useful if you want to compute something slightly different from what some well stablished package does. I know for example this work, in which the researchers needed to compute some new pairwise properties of galaxy distributions. Julia served them well, I think, and the computations were quite massive (8M particles, with quite large cuttofs).

1 Like

There is a reason for rewriting AstroPy, or as you say “reinventing the wheel”, in a new language such as Julia. The first reason is that Julia is much more composable than Python. The second is that the code is more concise, while being higher performance. These features allow you to easily correct poor design decisions in order to greatly improve the performance of your code. For example, the X-ray spectral analysis package Xspec is written in C++ with a Python wrapper. It needs to be modified to handle the high resolution X-ray spectral mission Xrism. The redistribution matrix file (RMF) in Xspec is assumed to be a dense matrix, which for Xrism means GB sized files. However, the RMF matrix is actually sparse. If Xspec was written in Julia, this would be an easy fix, because dense and sparse matrices just work together in Julia. In C++, not so much. AstroPy has similar limitations. Using PyCall/PythonCall as an initial stop-gap solution is understandable, but should not be the long term solution in my opinion. Another reason for using Julia when developing a large interactive simulation package, is that there is no need to implement a sublanguage, because Julia is the sublanguage. For example, a high performance model such as “mymodel = func1 + func2 * func3”, is easily implemented in Julia. In Python and C++, it’s not. It requires writing some high performance code in a statically compiled language such as C/C++ or FORTRAN.

7 Likes

Hi @brookluo, I was in a similar place as you four years ago. I started grad school the same month that julia hit 1.0.0. I started getting my feet wet by doing some smaller, self contained projects in Julia, before moving on to larger projects and now some big collaborative projects.

I would agree that using Python tools from Julia is not ideal, but only because Python style interfaces start to feel out of place. Performance wise, there should not be any noticeable overhead vs using the library from Python directly.
So what I found useful was moving to Julia step by step–if there was a well known Python package that did the job, I used it via PyCall until the interface started to bug me.

For example, I continued to use Matplotlib via PyPlots for the first 6 months or so before learning the Julia based plotting libraries.

In terms of my Julia usecases now, some of the bigger things I’m working on are:

  1. A library for modelling planet observations (i.e. radial velocity, etc)
  2. A code for direct imaging post-processing
  3. The adaptive optics control system for a pathfinder instrument we’re building.
  4. The control suite GUI for that same instrument

I’ve been really happy with Julia for all of these usecases, and it’s shined especially well for 1 & 3.

I do still use Python libraries on occasion, typically if there is some package for querying data, interacting with a hosted service, or a niche library that would be complicated to replicate such as PICASO.

One place I’ve hit a brick wall with Julia is finding a good remote procedure call library…but I think that’s not a common need in astronomy/astrophysics.

5 Likes

It makes little sense to port the whole astropy to Julia. Lots of its functionality is already available in individual packages that compose well with each other - notable examples include Unitful.jl, Measurements.jl (and other uncertainty packages), and SkyCoords.jl. These don’t need to live together in a single package in order to play well with each other.

Writing in Julia gives you freedoms not possible with Python, like using arbitrary datastructures instead of “everything numeric is a numpy array” or “everything tabular is a dataframe”.
Want objects in your catalog to be custom structs with their specific functionality? Totally fine in Julia! It’ll work fast and compose with other packages in the ecosystem.

At the same time, there’s nothing wrong with using well-established Python packages from Julia.

I still use matplotlib (and sometimes altair) to plot from Julia, after a couple of years (: It’s seamless, I can almost copy code from matplotlib docs verbatim to a Julia program.

5 Likes

For some additional pointers/inspiration on using Julia in astronomy/astrophisics, check out: Julia in Astronomy & Astrophysics Research | JuliaCon 2022

Two of my personal favorite features of Julia are the built in package manager, Pkg, and the incredibly seamless CUDA integration provided by CUDA.jl.

Julia’s Pkg makes package management super easy and robust. This is very different from Python’s plethora of package management schemes. I think this leads to a different approach to packaging that prefers small granular packages that can be used (and updated!) “a la carte” rather than large monolithic packages that are often found in Python (I’m looking at you astropy!). This can be a bit of an adjustment coming from Python, but once you get used to it the benefits become self evident.

The CUDA.jl package feels like magic. You can write CUDA kernels in normal Julia code rather than in strings containing C code as is done in Python. But often you don’t even have to write kernels because many Julia functions written for CPU Arrays can be passed a GPU Array (a cuArray for CUDA). The function will be automatically and transparently compiled into efficient CUDA code and run on the GPU. There are a few constraints that must be followed to ensure that the code can be efficiently parallelized (e.g. avoiding scalar indexing), but that’s good practice anyway and often leads to better multi-threaded performance on the CPU as well.

3 Likes

100% Julia stack is cool, but may not be the most efficient solution to a specific problem.
If there is already a mature toolbox written in another language, I can’t think of a reason to rewrite it.
For example, OpenFoam has been developed for 20 years in C++. If we have a nice way to interoperate with it, we can quickly give up the desperate hope of reimplementing everything again.
For astropy, things are even easier since the PyCall/PythonCall interface is much more mature than CxxWrap at the current stage.
Working with partial PyObjects doesn’t prevent you from leveraging Julia’s advantages in code generation, specialization, JIT compilation, etc.

You should check out PySR for some inspiration. It is a Python frontend for SymbolicRegression.jl, using PyJulia for interconnect, and is pretty seamless. Because of this, you can hook it up to typical pure-Python Scikit-Learn pipelines effortlessly, while all the actual computation is being done on the Julia side. PySR and SymbolicRegression.jl have been used in quite a few astrophysics papers already, of which a small selection of which is shown here: Research - PySR. PySR/SymbolicRegression.jl have scaled up to 1000s of cores pretty easily using ClusterManagers.jl.

With @mkitti and @ngam we managed to package PySR into conda-forge, so that you can install with a one-liner conda install -c conda-forge pysr, and never have to worry about installing Julia manually - it all gets packaged into the conda install. This is really helpful for working with astro people who are stuck in Python!


On the simulations side, I would recommend checking out Trixi.jl as well as Oceananigans.jl. Both are capable of massively parallelized hydro simulations. In fact, Trixi.jl even has MHD built-in, if that’s the type of simulation you are looking to do.

And of course, because Julia is composable and packages are all written in a high-level language, it’s pretty easy to hook into one of the existing packages, rather than writing yet another C++ simulator from scratch.