Mention of Julia for the re-implementation of PETSc

The DOE codes (PETSc, Trilinos, Sundials) have hundreds of person-years behind them and are proven to scale on top-5 computers. Having these things easily accessible from Julia can’t be bad.

6 Likes

Agreed here. If there were a massive benefit to having PETSc in Julia, someone would have started maintaining a nice wrapper by now (as there is for thousands of other packages), but there’s a reason why SciML, CLIMA, etc. somewhat care, but only in passing. Having a monorepo come in and do all of those things half well wouldn’t really be a great idea. PETSc would have to be substantially reconstructed to be useful. It would be an organization with a very strong distributed array package that has matching distributed linear algebra, and the rest would just come from the package ecosystem. Then it would need to make sure that the distributed MPI array integrates well with DiffEq, Optim, etc.

The Julia package ecosystem has tens of thousands of person-years behind it in a language where people tend to move 100x faster. That isn’t to say that we’ve done everything better; we definitely haven’t. Their bread and butter of how to scale basic linear algebra to 10,000+ cores is fantastic, and we haven’t matched that. The issue is that their designs tend not to be very composable, so you get these monorepos like PETSc where they have some fantastic basic tooling for the underlying operations, but when you get to “the things that call the linear algebra” (optimization, differential equations, etc.), the offering tends to be a bit meh at best (you can tell it’s not where the vast majority of time is spent) or a set of wrappers of things already wrapped and widely used in Julia. None of that stuff even needs to be specific to PETSc, since you’d get the same speed from the algorithm if it’s generically written and uses the right array primitives.

So a PETSc in Julia wouldn’t be one repo but instead some very fast primitives that connect to everything else. What we need is someone to build a very nice distributed array package with extremely fast distributed linear algebra on it. The ecosystem could take it from there. Maybe the best way to do that is to just wrap PETSc or Trilinos as it is.

6 Likes

Wrapping the general routines for easily building parallel matrices (the core PETSc Vec/Mat functionality) and handling distributed meshes / automatic partitioning of unknowns are the features that I miss. I agree that if that were done and robustly maintained, we might see other Julia libraries built on top of it in a more composable manner. That said, I haven’t looked at the Krylov solver implementations in PETSc; are well-optimized parallel Krylov solvers/preconditioners really completely identical to a serial solver except in the array/matrix type, or do they require further algorithmic changes to work optimally in the distributed setting? If the latter, that would seem to require much more work in Julia to catch up to PETSc, and I can imagine that for some things at least, like domain decomposition and/or multigrid, more specialized implementations are needed. PETSc’s integration with other numerical libraries is also fantastic, each of which would seemingly require separate Julia bindings to make available…

The perspective of the PETSc developers in that thread makes sense; they want to be as widely usable from other languages as possible, in which case keeping the core in C/C++/Fortran seems reasonable. Those are all easily callable from most major scientific computing languages these days, and this avoids some of the drawbacks of Julia (I have the impression Julia still needs some work for building compiled shared libraries).

I would love to see a benchmark of all the parallel linear algebra options available in Julia, including PETSc. Then a list can be made of which packages are incompatible with which types of parallelism, which the community can address one by one. A benchmark is a good place to start to identify where we stand. This is a good GSoC/JSoC/MLH project idea with a clear extension: fixing the parallelism support in the different packages.

2 Likes

I see this comment there (also, they’re really thinking ahead with “Fortran 20008” :slight_smile: Fortran 2018 already incorporated TS 29113, Further Interoperability of Fortran with C):

Julia has a lot going for it, but I’m not aware of successful packages written in Julia to be called from C/C++/Fortran. The runtime is relatively heavy and load time is nontrivial.

The former objection, on the runtime, is correct.* I’ll just note that I see quite a few Julia packages being called from Python (e.g. wrappers for SciML packages) and R (convexjlr, and as an example of wrapping in the other direction: Alpaca.jl). There are at least a couple for each language, though I forget the names and can’t speak to their adoption. So that’s certainly possible. It all works because Julia can be embedded in C/C++, so it should be usable from those languages as well, though I wouldn’t want to use those as the main language. [There’s also a thread here now on using Julia from Java.]

* About load time, Julia startup is 0.2 sec, but if you’re thinking about the packages, then I suppose (eventually) the packages you need could be put in the system image.
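
For concreteness, a minimal sketch of that sysimage route, assuming PackageCompiler.jl is available (the package name baked in here is just an example):

```julia
# Bake the packages you need into a custom system image so that their
# load/compile cost is paid once at build time rather than at every startup.
using PackageCompiler

create_sysimage(["DifferentialEquations"];        # example package to bake in
                sysimage_path = "sys_custom.so")  # resulting system image

# Then start Julia with the custom image:
#   julia --sysimage sys_custom.so
```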

So I agree that Julia is the better wrapper language (than Python, at least once a bit more work has been put into latency/precompilation), or even the only language one needs, given the eventual Julia world domination.

What was missing to make this a student project was PETSc_jll, since the binary isn’t easy to vendor. But @jdlara-berkeley finished that so that we could ship it with Sundials_jll, so now it’s automatically built on all operating systems. At this point, a new PETSc.jl should be a simple exercise, so I plan to promote it for a GSoC next summer. I only want the array stuff though, and maybe some TS stuff, so we can benchmark against it again and just show the advantages of distributed development (back when PETSc.jl worked we did a few such comparisons, and we want that reproducible), but the array and linear solver portion is what is really useful and would make a great GSoC.
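
As a rough illustration of what a new PETSc.jl could start from once PETSc_jll is in place, here is a minimal sketch using direct `ccall`s; the `libpetsc` handle name is an assumption on my part, so check the actual exports of PETSc_jll:

```julia
# Minimal sketch: initialize and finalize PETSc through the prebuilt binary.
# PETSc_jll is assumed to export a `libpetsc` library handle; the real export
# name (and scalar/index configuration) may differ.
using PETSc_jll

err = ccall((:PetscInitializeNoArguments, PETSc_jll.libpetsc), Cint, ())
err == 0 || error("PetscInitializeNoArguments failed with code $err")

# ... build Vec/Mat objects, set up KSP solvers, etc. ...

ccall((:PetscFinalize, PETSc_jll.libpetsc), Cint, ())
```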

4 Likes

I’d focus on the following comment by Jed Brown:

Julia has a lot going for it, but I’m not aware of successful packages written in Julia to be called from C/C++/Fortran.

Does anyone have a non-trivial example of how to integrate Julia into C/C++ that works across Windows, Linux, and macOS?
Something like passing Julia an array of data and a struct of parameters and applying some non-trivial hand-written function?

The documentation is really sparse on this. Some community-made examples would be great.

1 Like

I guess something like that does exist, but I don’t see any advantage in doing it. Julia already works on all three operating systems, and C/C++ hardly simplifies any operations you’d use Julia for. The author of the quote has the wrong idea here.

1 Like

I think people miss how crucial this is.
In many places there is an existing codebase in C/C++. No one will rewrite it from scratch in Julia.
The question is what will be used for new modules.
If there is an easy way to embed Julia in new modules, Julia is a candidate.
Since there is none, it won’t be.

In practice people make pragmatic choices, not ideal ones.
That means that if there is existing code, your path in is being able to integrate with it.

Julia is missing that. It is not a coincidence that this comes up in a C/C++/Fortran project like PETSc.
Julia needs this to penetrate that market.

2 Likes

I don’t think this is the case at all. As you can see from the Python ecosystem, it’s C/C++ that gets wrapped into a high-level language. Julia is a suitable choice for that. It doesn’t require rewriting any C/C++, just wrapping it.
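
To make the “wrap, don’t rewrite” point concrete, here is a minimal sketch of calling an existing compiled C routine from Julia via `ccall` (the library and function names in the second comment are hypothetical):

```julia
# Call an existing C function directly from Julia; no C glue code is required.
# Here the C math library stands in for whatever library you already have
# (the "libm" name is platform-dependent).
y = ccall((:cos, "libm"), Cdouble, (Cdouble,), 1.0)

# For a real project you would point at your own shared library, e.g. (hypothetical names):
#   status = ccall((:my_solver_step, "libmysolver"), Cint,
#                  (Ptr{Cdouble}, Csize_t), data, length(data))
```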

2 Likes

That’s because Python’s performance is crap for anything numerical. C/C++/Fortran have the benefit of already being fast (which is why Python and R wrap routines in those languages). No one in their right mind writes heavily numerical code in an interpreted language like Python or R, or if they do, they reach for a JIT like Numba or a compiler like Cython.

As @RoyiAvital says, there is a lot of existing numerical code written in C/C++/Fortran that is still actively developed. Those projects would stand to gain a lot by writing new modules of functionality in Julia, but unfortunately we just have an absence of tooling for making that easy. PackageCompiler isn’t an answer here either, because it relies on a full-fat Julia runtime for codes that may never even call into the runtime. Julia is not lightweight, and existing projects don’t want to deal with the added bloat just to get a few benefits from writing new code in Julia.

So let’s please stop assuming that just because Julia is cross-platform and PackageCompiler exists that projects will start writing new codes in Julia; they won’t until we fix our stack to better fit into their workflow and requirements.

16 Likes

That’s because Python’s performance is crap for anything numerical.

That explains why Python projects wrap C/Fortran. But, not why Python is adopted as the user-facing language in the first place. Essentially no big project with a Python front end is going to switch to Julia. And almost no new project that wants users with a broad background will choose Julia as the user-facing language; they’ll choose Python. What a project might do is use Julia in places as a substitute for C/Fortran (or Rust, which some projects are willing to do), for example to write Python extensions. But, not until Julia can produce slim, linkable object code with no runtime, like C or Rust. All this is widely understood in the Julia community. I’m just restating this for this thread. But, I see the latter, when it is possible, as a wedge to drive Julia adoption in more contexts.

1 Like

No one would wrap an existing code base in Julia.
Let’s say I have a system which does some analysis of signals, and now I want a new module to handle a new signal model.

The whole system is written in C/C++. Yet if there is a language that is more efficient (in development time), it would be great to use it, as long as:

  1. It offers runtime performance similar to C/C++.
  2. It can be embedded easily into the existing code base.

As you can see in the PETSc thread, they consider Rust for those reasons, though I am not sure it indeed offers better productivity.
In other places (commercial businesses) people use MATLAB + MATLAB Coder for exactly this.
MATLAB Coder solves the problem by generating portable C code (instead of shipping the MATLAB runtime).

As a first step I’d even embed the whole Julia runtime in the project, but I can’t find documentation or examples of that for the non-trivial case. The documentation shows how to evaluate a simple string, and it is oriented toward Linux.

Is there an example of embedding the Julia runtime for non-trivial usage?
Like passing in a big array of data and some structs, running a few Julia functions from several modules, and getting the results back?

My recommendation is to only ever interact with the Julia code via @cfunction pointers. Then your Julia code will behave just like a C library from the embedder’s point of view. Obviously you still need to start the runtime and extract the @cfunction pointers but that requires a relatively small amount of embedding code. The documentation in https://github.com/JuliaLang/julia/pull/28886 might be helpful but notice that it does the setup with a dynamically loaded libjulia. That adds some complexity in exchange for making Julia an optional runtime dependency instead of a build dependency.
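
A small sketch of that pattern on the Julia side (the function and names here are made up for illustration); from the embedder’s point of view, all that ever crosses the boundary is a plain C function pointer:

```julia
# Julia side: a function with a C-compatible signature operating in place on a raw buffer.
function scale!(data::Ptr{Cdouble}, n::Csize_t, factor::Cdouble)::Cint
    arr = unsafe_wrap(Array, data, n)   # view the C buffer as a Julia array (no copy)
    arr .*= factor
    return 0                            # status code for the C caller
end

# Create the C-callable pointer. The embedder retrieves it once (e.g. via a tiny
# exported setup routine, or jl_eval_string + jl_unbox_voidpointer) and afterwards
# calls it like any other C function pointer:
#   typedef int (*scale_fn)(double *data, size_t n, double factor);
const scale_ptr = @cfunction(scale!, Cint, (Ptr{Cdouble}, Csize_t, Cdouble))
```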

1 Like

I am not sure it is.

In fact, I think that many people here hold the opposite view, i.e. that various features make Julia ideal for a user-facing API, with the understanding that “user” in this context means someone who is willing to put together various building blocks to solve numerical problems as part of a program, while preferring to leave the implementation of these blocks to others to the extent possible.

The prevalence of Python comes from history: keep in mind that it is more than two decades older than Julia. Not switching a large existing codebase to another language is a reasonable choice in a lot of cases, but it does not necessarily reflect on the relative merits of the languages.

3 Likes

Or see Data-exchange Julia-LabView for another use case: experiment control by LabVIEW and, e.g., signal analysis with Julia.

I and many other people here think that Julia makes a great user-facing API, far better than Python. (Maybe not ideal, because of JIT latency, etc., although this is improving greatly.) I think Python is an absolutely terrible choice for any position in a scientific computing stack. Maybe I meant “almost no new project” in the measure-theoretic sense. In my experience, essentially everyone mistakenly believes that Python is an ok, or even great, language for scientific computing.

@simonbyrne created Conjugate Gradient for C in Julia.
I haven’t tested it on Windows and haven’t checked the Julia overhead in that case, but it is an example.

Are there more like that?
Can anyone write some guidelines on how to optimize such use cases to minimize Julia’s overhead in these modules?

Would you care to elaborate on this? These are pretty bold statements, given the popularity of Python in this context. If “scientific computing” is narrowly defined as “writing code to perform scientific computations”, then I would agree with you, but many researchers use Python as the proverbial “glue language” to build on existing components (regardless of what language those components are developed in). The latter use of Python is where it really shines. I certainly would not consider Julia a better option in that respect.

3 Likes

It is running on Windows.

2 Likes