I’d like to spend some of my free time contributing to projects that help address climate change. I have (basically) no domain knowledge, but I do have Julia and modeling/simulation experience (mostly astrodynamics and estimation).
Although it is about climate data analysis rather than climate modelling, ClimateBase.jl is a great repository for newcomer contributions. It was recently overhauled and there is a lot of low-hanging fruit. The source code is small, self-contained, and (in my judgement) easy to understand.
We at CliMA would certainly welcome contributions, though I do concede that it is probably not that easy to dive right in and start contributing to the main ClimateMachine.jl repository (though if you do have ideas, please let me know).
That said, there are lots of areas where contributions would be very welcome, and could benefit the wider Julia ecosystem:
File readers for common data formats (issue #114). A wide variety of file formats are used for climate data (e.g. NetCDF, Zarr, HDF5). Most have a Julia package, but their quality and maintenance vary considerably. Some rely on cumbersome third-party binary dependencies and could be rewritten in Julia to take advantage of features such as memory mapping and parallel I/O.
Visualization. We rely heavily on ParaView and VisIt for visualizing the model output. Being able to interface Julia directly with these tools (e.g. for in situ visualization) would be incredibly cool. Alternatively, building something similar in Makie.jl would be very neat (though perhaps a lot more work).
GPU and distributed memory tooling. We make heavy use of CUDA.jl, KernelAbstractions.jl and MPI.jl, so improvements to those directly help our project. Similarly, we are excited about AMDGPU.jl, which would let us run on alternative GPU architectures.
There are many rough edges in running Julia on HPC clusters (deployment, binary dependencies, etc.). If you are interested in this topic, please join our discussion.
I think that there’s a lot of good information here in this thread already.
Overall, I think that the whole climate science domain is a natural candidate for Julia, with respect to two main axes:
Modelling is costly. In that respect, ClimateMachine.jl is an awesome “showcase card” for the language.
Climate analysis has been dealing with Big Data since long before the term was coined.
Right now, the big advantage of Python is easy data access and scalability with xarray and dask. Each time I advocate Julia to my colleagues, they always ask: can I easily process 100 TB of data with a simple script and scale that to my cluster? I’d say that we are close, but not there yet, at least for people who are not developer-inclined (e.g. people writing 2000 lines of script code in Matlab/Python for their current paper).
There are solutions, but the documentation is sparse. For example, for an xarray-like approach, I successfully tested ESDL.jl and YAXArrays.jl (not a big fan of the name though!). Using ESDL.jl, I was able to scale computations on a Slurm cluster.
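For anyone wondering what "a simple script that scales" boils down to, here is a minimal sketch of the chunked map-reduce pattern that tools like dask automate. It uses only the Python standard library and is not the xarray, dask, ESDL.jl or YAXArrays.jl API. The names `chunks`, `partial_stats` and `chunked_mean`, as well as the synthetic temperature values, are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunks(values, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(values)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def partial_stats(block):
    """Reduce one chunk to a small summary: (sum, count)."""
    return sum(block), len(block)


def chunked_mean(values, size=1000, workers=4):
    """Compute a mean chunk by chunk, combining only the per-chunk summaries.

    Note: Executor.map may consume the chunk iterator eagerly, so this
    sketch shows the map-reduce structure, not bounded memory use.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunks(values, size)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    if count == 0:
        raise ValueError("cannot take the mean of an empty dataset")
    return total / count


# Synthetic stand-in for a large temperature record (hypothetical data).
temperatures = (280.0 + (i % 10) for i in range(10_000))
print(chunked_mean(temperatures, size=1000))
```

The point is that each worker only ever returns a tiny summary, so the same structure carries over when the workers are cluster nodes and the chunks are pieces of files on disk. What xarray plus dask adds on top is labelled dimensions, lazy loading and scheduling, which is the convenience my colleagues are asking about.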