The average Python package is 2212.6 lines (mean) vs. 507 lines (median) for Julia, i.e. 2212.6 / 507 ≈ 4.4x larger (in lines of code, as I suspected). I think that might inform the discussion about e.g. documentation, if/since Julia packages are smaller and you tend to use more of them and compose them, rather than picking one of a few bigger ones as in Python. I’m not complaining (just an observation); all else equal, packages should be smaller in terms of code, though there’s no minimum limit on docs.
The largest Julia package is, though, way larger at 264,115 lines: 2.2x larger (corrected from 5.5x) than the Python code in root, at 118,793 lines (see comment below). Hecke.jl is 130,560 lines (not as much autogenerated code?).
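For anyone who wants to check the ratios, here is a quick sketch in Python. The numbers are the ones quoted in this thread; the variable names are just mine, and the last ratio is my guess at where the earlier 5.5x came from (the 47,453-line maximum in the PyPI statistics quoted further down).

```python
# Figures quoted in this thread (lines of code).
python_mean = 2212.6      # mean lines per PyPI package
julia_median = 507        # median lines per Julia package
largest_julia = 264_115   # largest Julia package
root_python = 118_793     # Python code in root
pypi_max = 47_453         # maximum in the (old) PyPI statistics

# Note: the first ratio compares a mean against a median,
# so it is only a rough indication.
mean_vs_median = python_mean / julia_median
largest_vs_root = largest_julia / root_python
largest_vs_pypi_max = largest_julia / pypi_max

print(f"{mean_vs_median:.1f}x, {largest_vs_root:.1f}x, {largest_vs_pypi_max:.1f}x")
```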
Both statistics are old, and Python’s may be 8 years old (still interesting to compare against the Python of the past), so this might be way too outdated. Do we have statistics for Julia similar to what that page has for Python? And do you know of good, up-to-date statistics for Python or other languages?
[The biggest doesn’t ring true: TensorFlow, PyTorch etc. are missing. Do the statistics predate them?]
Total sizes on packages in PyPI amounted to 4.2 GB. Average package size is 161 KB and standard deviation is 1MB.
Minimum and maximum lines were 2 and 47 453 respectively. Number of lines averaged to 2212.6 lines per package and standard deviation was 8729.7
No, that was actually AWSSDK.jl; the next-largest, AWS.jl, is also huge (then Hecke.jl, then the outdated TensorFlow.jl), and I think the two have merged by now. Julia itself was already excluded, as it is not a package.
You’re right, I should use the median, and the same statistic for both languages. For Python, the mean is what I found, so I pointed out the inconsistency, but I’m not sure it changes much. Yes, it would be better to exclude tests (or compare them separately, or combined), but I think (not sure) that both counts include them.
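To make the two points concrete (same statistic for both, tests in or out), here is a minimal sketch in Python. It is not the code from the Julia blog post or from the Python statistics page; `count_lines` and `summarize` are hypothetical helpers, and the assumption that tests live in a directory named `test` or `tests` is mine.

```python
import statistics
from pathlib import Path

def count_lines(package_dir, suffix=".jl", exclude_dirs=("test", "tests")):
    """Count lines in files ending in `suffix` under package_dir,
    skipping files inside any directory named in exclude_dirs."""
    root = Path(package_dir)
    total = 0
    for path in root.rglob(f"*{suffix}"):
        # Directories can also match the pattern (e.g. "Foo.jl/"), so skip them.
        if not path.is_file():
            continue
        # Only inspect directory names below the package root.
        parent_dirs = path.relative_to(root).parts[:-1]
        if any(part in exclude_dirs for part in parent_dirs):
            continue
        with path.open(encoding="utf-8", errors="replace") as f:
            total += sum(1 for _ in f)
    return total

def summarize(line_counts):
    """Return (mean, median) so both ecosystems use the same statistic."""
    return statistics.mean(line_counts), statistics.median(line_counts)
```

Passing `exclude_dirs=()` counts tests too, so the two variants can be compared. With made-up counts such as `[10, 20, 30, 4000]`, `summarize` shows how one huge package pulls the mean far above the median, which is why mixing the two statistics across languages is shaky.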
To be fair, I don’t know why AWS[SDK].jl is so huge; I suspect autogenerated code and/or comments.
I’m also not sure why root is so huge (I guess HEP is just that complex). And if Python is only 1.6% of it (only the 6th-largest language; C++ has the largest share at 78.5%), then is it 118793/0.016 ≈ 7.4 million lines in total?! I updated my top post with that (Python-part) figure.
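Spelling out that back-of-the-envelope extrapolation as a small Python sketch: it assumes the percentage shares apply to line counts, which may not hold (GitHub’s language breakdown, for instance, is computed from bytes), so treat the result as a rough order of magnitude. The variable names are mine.

```python
python_lines = 118_793   # Python lines in root, as quoted in this thread
python_share = 0.016     # Python's share of the repository (1.6%)
cpp_share = 0.785        # C++ share (78.5%)

# Assumption: the shares are proportional to lines of code.
total_lines = python_lines / python_share
cpp_lines = total_lines * cpp_share

print(f"total: about {total_lines / 1e6:.1f} million lines")
print(f"C++:   about {cpp_lines / 1e6:.1f} million lines")
```

Under that assumption the total comes out at roughly 7.4 million lines, most of it C++.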
@ChrisRackauckas Do you know how large SciML is in total (lines of code, or some other metric)? I suspect that if we count ecosystems, it is the largest. It would be interesting to rerun the stats from the Julia blog post (should be simple, the code for that is available). Any guess about the largest “umbrellas” besides SciML?