In order to understand the challenges of using the Julia language for very large software projects (>100,000 code statements, >100 developers), I would like to know which is the largest project, or which are the largest ones, written in Julia.
I would also be happy to hear about any difficulties encountered in these projects that are linked to their size, and about issues we should consider.
Invenia has 400,000 lines of Julia code, in production for apparently over 3 years now (and it has 74 employees in total, not all developers): https://news.ycombinator.com/item?id=24083601
It was the largest Julia project I knew of at the time (and 400,000 lines might be an outdated number); it likely still is, and maybe by employee count too? Does anyone have more than 100 developers?
The SciML ecosystem seems huge, maybe it’s the largest, but it’s not just one package; it was over 100 if I recall. OrdinaryDiffEq.jl, one of its packages, is or was 92,308 lines of code.
For the whole package ecosystem it’s at least 8 million lines (11+ million with docs), but those numbers are outdated: there are now 7600 packages, or 47% more than at that count in 2021.
At the time AWSSDK.jl was the largest single package at 264,115 lines, followed by AWS.jl at 225,723, so it doesn’t seem too hard to have a half-million-line project (with your dependencies counted… if both of those are needed, are they ever?).
I am fairly sure nowhere has over 100 paid Julia developers.
Invenia, RelationalAI and JuliaComputing are all around 50, from memory.
The SciML GitHub org has 64 members, but odds are a lot of them are not active anymore.
I haven’t run numbers on Invenia’s codebase in a while. I would guess over half a million lines of code now.
We did do some pretty huge refactorings though, and removed 100K or so with that; but I suspect we added 200K net since I counted the 400K LOC.
@oxinabox, has the “time to first plot” been an issue for the Invenia project? Did it require special care, e.g. limiting dependencies, precompilation? We are concerned that for a large project the precompilation time on start-up could be an issue and could require special attention.
It’s not clear to me why the plot packages are specifically affected by the precompilation time: is it linked to the use case (support of different backends?) or to the size of the packages? In the latter case I’m concerned that the time will scale up for larger projects.
Note: the AWSSDK.jl project’s code is not really large: the lines are mainly docstrings, which were included in the count in Eric P. Hanson and Mosè Giordano’s article.
Thanks, that may be. By now AWS.jl, which “replaces AWSCore.jl and AWSSDK.jl which previously provided low-level and high-level APIs respectively.”, is 518,801 lines of code (a bit more than doubled, or about the earlier AWS.jl + AWSSDK.jl combined), likely the single largest Julia package. Still, startup can be fast:
julia> @time using AWS
0.489314 seconds (498.29 k allocations: 34.409 MiB, 42.70% compilation time)
julia> using PackageAnalyzer
julia> analyze("Pluto") # also roughly doubled in size
Package Pluto:
  * repo: https://github.com/fonsp/Pluto.jl.git
  * uuid: c3e4b0f8-55cb-11ea-2926-15256bba5781
  * is reachable: true
  * lines of Julia code in `src`: 9046
  * lines of Julia code in `test`: 5884
  * has license(s) in file: MIT
    * filename: LICENSE
    * OSI approved: true
  * has license(s) in Project.toml: MIT
    * OSI approved: true
  * has documentation: false
  * has tests: true
  * has continuous integration: true
    * GitHub Actions
“lines of Julia code in src” doesn’t take docstrings into account; it would be good to have an option for that (or to do it by default?), and another option might be to look at all dependencies of a package (recursively). For some reason, “number of contributors: 63” is gone from the output; it can still be seen in the help string for analyze, which is where I got the previous size of Pluto (not Plots, which is 19296 lines of code).
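For a rough cross-check that doesn’t depend on any package, here is a minimal sketch of a line counter using only Base. It is not how PackageAnalyzer counts, and `count_julia_lines` is a made-up helper name; it assumes you only care about non-blank, non-`#` lines and it still counts docstring lines as code.

```julia
# Hypothetical helper (not part of PackageAnalyzer): count non-blank lines
# that do not start with `#` in every .jl file under `dir`, recursively.
function count_julia_lines(dir::AbstractString)
    total = 0
    for (root, _, files) in walkdir(dir)
        for file in files
            endswith(file, ".jl") || continue
            for line in eachline(joinpath(root, file))
                stripped = strip(line)
                # Skip blank lines and lines starting with `#`. Docstrings and
                # the bodies of multi-line `#= ... =#` comments are still counted.
                (isempty(stripped) || startswith(stripped, "#")) && continue
                total += 1
            end
        end
    end
    return total
end
```

Calling `count_julia_lines("src")` from a package’s root directory gives a number that will be somewhat higher than a docstring-aware count, which is the same effect described for AWSSDK.jl above.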
It has never been a huge problem.
Time to first plot is a huge problem for things like little command-line utilities that might get launched in a loop via bash etc.
Invenia’s main system runs for an hour each time it is used.
So startup time is not significant as a portion of this.
When you think about common large applications, like MS-Word or the Matlab IDE, start-up times are much longer.
We do sometimes compile system images with PackageCompiler.jl, e.g. when we are doing hyperparameter searches.
And we do run Pkg.precompile() while building our Docker images.
But these are using the tools basically straight out of the box, following the manual: no real advanced options, and no special care required in designing our packages.
Definitely the parallel precompilation that was introduced in 1.6 was a huge boon. Prior to that, yeah, full precompile time on a fresh system could take 10 minutes or so, and now it is under 1 (numbers off the top of my head).
And precompilation only has to be done once (and normally is much faster, as some things will already be precompiled).
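As a rough illustration of the “out of the box” usage described here (a sketch only, not Invenia’s actual build script), the precompile step can be timed directly. The sysimage call is commented out because it needs PackageCompiler.jl installed; `"MyPackage"` and both file names are placeholders.

```julia
using Pkg

# Precompile every dependency of the active project, as one might do in a
# Docker build step. A second call is normally much faster because the
# cache files already exist.
first_pass = @elapsed Pkg.precompile()
second_pass = @elapsed Pkg.precompile()
println("first: ", round(first_pass; digits=2), " s, second: ",
        round(second_pass; digits=2), " s")

# Building a system image requires PackageCompiler.jl, so it is left
# commented out. "MyPackage" and the file names are placeholders.
# using PackageCompiler
# create_sysimage(["MyPackage"];
#                 sysimage_path="sys_mypackage.so",
#                 precompile_execution_file="precompile_workload.jl")
```

The resulting image would then be loaded with `julia --sysimage sys_mypackage.so`, which is what makes repeated launches (e.g. a hyperparameter search) cheap.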
It’s not clear to me why the plot packages are specifically affected by the precompilation time.
It is because they have a lot of methods that are called only once with each type of argument.
Plotting has been described as an almost optimally bad case for Julia’s type-based specialization JIT.
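A small sketch of that effect, using a made-up function `summarize` as a stand-in for plotting code: each new argument type triggers a fresh compilation on the first call, and only repeated calls with the same type are cheap. The exact timings depend on the machine and Julia version, so none are shown.

```julia
# Hypothetical stand-in for a generic, plotting-style function.
summarize(xs) = sum(x -> x * x, xs)

inputs = (
    [1, 2, 3],          # Vector{Int}
    [1.0, 2.0, 3.0],    # Vector{Float64}
    Float32[1, 2, 3],   # Vector{Float32}
    1:3,                # UnitRange{Int}
    (1, 2, 3),          # NTuple{3, Int}
)

for xs in inputs
    first_call = @elapsed summarize(xs)    # includes JIT compilation for typeof(xs)
    second_call = @elapsed summarize(xs)   # reuses the compiled specialization
    println(rpad(string(typeof(xs)), 24),
            " first: ", first_call, " s, second: ", second_call, " s")
end
```

If every call in a session uses a different type, as tends to happen with one-off plots, you pay the first-call cost almost every time.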