Reflections on Developer Experience After Building a Large Julia Codebase

I want to share my experience developing a Julia codebase for a project on high-performance simulation. It comes from porting a ~80k-line Python research code to Julia. Strictly speaking, it is not a “port” but a rewrite, since I know the simulator code well and have been writing idiomatic Julia for some years.

My experience is not about issues with deploying Julia, as in the Zipline + Julia talk, but that talk inspired me to share. This is purely about developer experience when building and maintaining a larger codebase.

What works well

Let me start with what I appreciate. Julia is great for researchers writing shorter scripts: magically, it just works. The performance, once you understand the model, is excellent. My current numbers: 70 seconds of precompilation, 10 seconds of using time, 20 seconds TTFX, and 1.2 seconds of runtime (400 ms of which is DiffEq). That runtime is really great for my domain. And DiffEq.jl IS SO GOOD!

The REPL-focused workflow is not an issue for me. I worked around it with DaemonMode.jl + Revise.jl behind a shell script, so I can run mytool run script.jl against a persistent daemon. I tend to think the arguments about the REPL-focused workflow are somewhat misplaced: the real issue is TTFX, and once the daemon is set up, the workflow is fine.
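For the curious, here is a minimal sketch of that setup. mytool is just my own wrapper name, and the exact serve/runargs details should be checked against the DaemonMode.jl README; the point is only that scripts get executed inside a long-lived, Revise-tracked session.

# daemon.jl -- started once per project and kept running in the background.
# (Sketch only; consult DaemonMode.jl's docs for the exact API.)
using Revise, DaemonMode
serve()   # accepts scripts on the default port and runs them in this warm session

# What `mytool run script.jl` roughly expands to on the client side:
#   julia --startup-file=no -e 'using DaemonMode; runargs()' script.jl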

Surface-level friction

The using/import design creates discoverability issues. using DifferentialEquations, for instance, is a wildcard import that brings in hundreds of symbols. Just by reading a script in VS Code, it’s unclear which symbols are available in scope. I enforced strict rules in my project on which symbols to explicitly import, but reading others’ code remains somewhat difficult. And since some symbols may have been @reexported, tracing back the definition of a struct can take time, and finding out a particular (but somewhat vague) methods on it can be harder.
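To illustrate the rule I enforce (the symbol list here is just an example; it works because DifferentialEquations reexports these names):

using DifferentialEquations                            # wildcard: hundreds of exported symbols land in scope
using DifferentialEquations: ODEProblem, QNDF, solve   # explicit: each name is greppable back to this line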

The power of multiple dispatch combined with the lack of formal interfaces makes it hard to call functions correctly without repeatedly checking documentation. For example, how should one pass KLUFactorization() to QNDF()? IntelliSense gives no hint. The LSP also produces many false positives that I have to ignore.
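For reference, the answer turns out to be a keyword argument. This sketch is my reading of the documented OrdinaryDiffEq/LinearSolve interface (the toy problem is made up just so KLU has a sparse Jacobian to factor); nothing in QNDF’s own signature or IntelliSense points you to it.

using OrdinaryDiffEq, LinearSolve, SparseArrays

# A toy problem with a sparse Jacobian prototype, just to have something KLU can factor.
f!(du, u, p, t) = (du .= -1000 .* u)
fun  = ODEFunction(f!; jac_prototype = sparse(ones(1, 1)))
prob = ODEProblem(fun, [1.0], (0.0, 1.0))

# The factorization is passed through the `linsolve` keyword of the solver.
sol = solve(prob, QNDF(linsolve = KLUFactorization()))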

Deeper friction: refactoring without static analysis

Refactoring code is difficult without good IDE support. I refactored my project from a submodule structure into a monorepo and spent an enormous amount of time fixing missing exports. (The monorepo refactor was itself a workaround for a VS Code extension issue that prevented IntelliSense from indexing the repository under development.) TestItemRunner also had an issue that caused crashes with an ImportError after a few minutes of running (I submitted a PR with a fix that has since been merged).

Changing a struct definition or function signature, without static analysis, leads to error messages like “no method matching…(a long list of type parameters).” That error requires two cognitive steps to diagnose: which of the many methods was intended, and which argument is the problem. My workaround has been to limit my use of multiple dispatch so that when code errors, there is only one function signature to consider. It is a conscious tradeoff for clarity. This works for my own code, but errors that propagate through libraries can still produce cryptic messages.
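A contrived sketch of what that convention looks like in practice (all names here are made up for illustration, not from my codebase):

# With several methods on one generic function, a MethodError names the generic
# function plus a wall of type parameters, and step one is guessing which method
# you meant to hit:
struct ThermalWorkspace{T}
    q::T
end
struct MechanicalWorkspace{T}
    f::T
end

assemble!(r, ws::ThermalWorkspace)    = (r .+= ws.q; r)
assemble!(r, ws::MechanicalWorkspace) = (r .+= ws.f; r)

# What I do instead: one uniquely named entry point per behavior, so when a
# signature changes there is exactly one method to compare the call site against.
assemble_thermal!(r, ws::ThermalWorkspace)       = (r .+= ws.q; r)
assemble_mechanical!(r, ws::MechanicalWorkspace) = (r .+= ws.f; r)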

Performance friction: allocation and compilation

There is a constant fight against allocation and compilation time. To avoid allocation while supporting an evolving model library, I used NamedTuples extensively, which may have been a mistake. I am doing:

struct ResidualWorkspace{T1, T2, ...}
   var_addr::T1
   var_val::T2
   ...
end

struct SystemResiduals{T1, ...}
   ws::T1
   ...
end

sr = SystemResiduals((model1=ResidualWorkspace(...),
                      model2=ResidualWorkspace(...), ...), ...)

But compilation of methods specializing on sr became extremely slow, on the order of two minutes. Diagnosis showed that the type parameter alone was 28 KB; I was unknowingly hammering the compiler. My current workaround is a Ref{Any} type barrier, but I am honestly not sure whether application developers are supposed to reach for this level of internals.
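Roughly, the barrier looks like this (a minimal sketch with illustrative names, not my actual types): the field no longer encodes the giant NamedTuple type, so methods taking SystemResiduals compile quickly, and a function barrier keeps the resulting dynamic dispatch out of the hot loop.

struct SystemResiduals
    ws::Ref{Any}                 # type-erased holder for the NamedTuple of workspaces
end

function residuals!(r, sr::SystemResiduals, u)
    ws = sr.ws[]                 # ::Any at this point
    return _residuals!(r, ws, u) # function barrier: specializes on the concrete type inside
end

_residuals!(r, ws, u) = (r .= u; r)   # stand-in for the real kernel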

I also used closure patterns to capture concrete types and avoid type instability, but later learned that closures won’t be cached in PrecompileTools workflows. I had to switch to functors, which has been working fine.
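The switch looks roughly like this (illustrative names; the statement about precompile caching is my understanding of why the closure version did not help):

# Closure version: the anonymous function gets a fresh, generated type, which is
# what I found hard to cover with PrecompileTools.
make_rhs(ws) = (du, u, p, t) -> rhs_kernel!(du, u, ws)

# Functor version: the callable has a stable, named type, so its specializations
# can be precompiled like any other method.
struct RHS{W}
    ws::W
end
(f::RHS)(du, u, p, t) = rhs_kernel!(du, u, f.ws)

rhs_kernel!(du, u, ws) = (du .= -u; du)   # stand-in for the real right-hand side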

Learning to live with it

The experience is challenging even for package developers who are willing to trade upfront effort for certainty (and performance). Allocation and type stability are difficult to reason about, and I often find myself in a cycle: small refactor → performance degradation → significant time isolating the issue → another refactor. I now have thousands of tests and have learned to set up regression tests for allocation in hot loops, which hopefully catch issues early.
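Those regression tests are nothing fancy; the pattern is roughly the following, using only the Test stdlib and Base’s @allocated (hot_loop! and the sizes are placeholders):

using Test

function hot_loop!(r, u)           # stand-in for a real hot kernel
    @inbounds for i in eachindex(r, u)
        r[i] = 2u[i]
    end
    return r
end

@testset "allocation regression" begin
    r = zeros(1024); u = rand(1024)
    hot_loop!(r, u)                          # warm up so compilation is not measured
    @test @allocated(hot_loop!(r, u)) == 0   # fails loudly if an allocation sneaks in
end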

Looking forward

This is just a post to share my experience, and I haven’t been able to publish my code. But I’m curious: what is the root cause of these friction points? Is there any discussion of a Julia 2.0 that would be breaking but would improve developer experience, perhaps by restricting some language features so that reasoning becomes easier for both the LSP and developers? These are on my wish list:

  • stricter interfaces or traits
  • optional static typing or signatures
  • compiler limits on specialization, requiring developers to think more upfront
  • explicit using/import by default

Anyway, since my community is Python-heavy, I will probably end up packaging with PackageCompiler.jl. But I’d love to hear your perspectives and suggestions.


Some of that may be included here, and doesn’t require a 2.0:


I hope that this issue will be resolved by JETLS.jl (GitHub - aviatesk/JETLS.jl, a new language server for Julia enabling modern, compiler-powered tooling). Can you check if it really resolves this issue?

It is still a bit slow (which will change), though.