Package statistics, Python's average package is 4x larger

The average Python package, at 2212.6 lines (average), is 4.4x larger than Julia's 507 lines (median), in lines of code, as I suspected. I think that might inform discussion about e.g. documentation: if/since Julia packages are smaller, you tend to use and compose more of them, rather than one of a few bigger ones as in Python. I'm not complaining (just an observation); all else equal, packages should be smaller, for code at least, though there's no minimum limit on docs.
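For intuition on why the average and the median differ so much, here is a minimal sketch (with made-up line counts, not the actual PyPI data) of how a long-tail distribution pulls the mean far above the median:

```python
import statistics

# Hypothetical line counts for eight packages: many small ones plus
# one huge outlier, mimicking the long tail reported for PyPI
# (mean 2212.6 vs. median 507).
sizes = [120, 300, 450, 500, 600, 800, 1200, 47_453]

print(statistics.mean(sizes))    # pulled far up by the single outlier
print(statistics.median(sizes))  # robust to it
```

That is why the median is the more honest "typical package size" for distributions like this.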

The largest Julia package is though way larger, at 264,115 lines: 5.6x the largest on PyPI (47,453 lines), but only 2.2x the Python code in root (118,793 lines, see comment below). Hecke.jl is 130,560 lines (with not as much autogenerated code?).

Both statistics are old; Python's may be 8 years old (still interesting to compare to the Python of the past), so this might be way too outdated. Do we have similar statistics for Julia, like that page has for Python? And do you know of good, updated statistics for Python or other languages?

[The biggest don't ring true: TensorFlow, PyTorch etc. are missing; do the statistics predate them?]

Here are the biggest packages on PyPI:

b2gpopulate (36MB)
ajenti (35MB)
FinPy (29MB)
django-dojo (28MB)

The total size of packages on PyPI amounted to 4.2 GB. The average package size is 161 KB and the standard deviation is 1 MB.
Minimum and maximum line counts were 2 and 47,453 respectively. The number of lines averaged 2212.6 per package, and the standard deviation was 8729.7.

Because Python is bloated. And you should probably use the median, and exclude test code.
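If anyone wants to redo the count with tests excluded, a rough sketch might look like this (the directory names to skip are an assumption; real tools like tokei or cloc also strip comments and handle many more cases):

```python
from pathlib import Path

def count_loc(root: str) -> int:
    """Count non-blank lines in .py files under `root`,
    skipping anything inside test/tests directories."""
    total = 0
    for path in Path(root).rglob("*.py"):
        # Skip test code: crude heuristic based on directory names.
        if any(part in ("test", "tests") for part in path.parts):
            continue
        with open(path, encoding="utf-8", errors="ignore") as f:
            total += sum(1 for line in f if line.strip())
    return total
```

The same walk, collected per package, would also give you the median instead of just the mean.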

This is Julia itself; you need to exclude it.

And then there are packages like root-project/root ("The official repository for ROOT: analyzing, storing and visualizing big data, scientifically"),

which has python glue code:

 Language            Files        Lines         Code     Comments       Blanks
 Python                757       118793        82223        17006        19564


No, that was actually AWSSDK.jl; the next-largest, AWS.jl, is also huge (then Hecke.jl, then the outdated TensorFlow.jl). I think they have since been merged. Julia itself was already excluded, as it's not a package.

You're right that I should use the median, and the same metric for both; but that's what I found for Python, so I pointed out the inconsistency, though I'm not sure it changes much. Yes, it would be better to exclude tests (or compare them separately, or combined), but I think, though I'm not sure, that both counts include them.

Those AWS packages are auto-generated from the API, so not really relevant for a comparison like this. They only demonstrate how ridiculously complicated AWS is.


To be fair, I don't know why AWS[SDK].jl is so huge; I suspect autogenerated code and/or comments.

And I'm not sure either why root is so huge (I guess HEP is just this complex). And if Python is only 1.6% of it (only the 6th-largest language; C++ has the largest share at 78.5%), then is it 118,793/0.016 ≈ 7.4 million lines in total?! I updated my top post with that (Python-part) figure.
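A quick sanity check of that back-of-envelope arithmetic (the 1.6% share is GitHub's language bar, so only approximate):

```python
python_lines = 118_793  # Python lines in root, per the tokei count above
python_share = 0.016    # Python's approximate 1.6% share of the repo

total_lines = python_lines / python_share
print(f"{total_lines:,.0f}")  # roughly 7.4 million lines
```

So "7 million" is, if anything, a slight underestimate, given those inputs.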

Right; still, Hecke.jl is or was 130,560 lines (not sure why it's that large; about half the size of the largest), and thus larger than the Python part of root.

@ChrisRackauckas Do you know how large SciML is in total (in lines of code, or some other metric)? I suspect that if we count ecosystems, it is the largest. It would be interesting to rerun the stats from the Julia blog post (should be simple; the code for it is available). Any guess about the largest "umbrellas" besides SciML?

I don’t know.

Hecke.jl does not contain auto-generated code (unless you count the CI scripts).