What are the most used packages in the Julia ecosystem, e.g. DataStructures and StaticArrays? And the most used parts of them?

Can I easily find out, with some script, which packages are the most used in the Julia package ecosystem? Or does anyone have a good guess? Either way, I want to comment on these two very important and heavily used ones.
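One rough way to answer the script question is to count direct dependents in a local checkout of the General registry. The sketch below assumes the registry has been cloned (e.g. from https://github.com/JuliaRegistries/General) into a directory where each package has a Deps.toml whose tables are keyed by version range; the function names count_direct_dependents and most_used are made up for illustration. It counts registered packages only, over all of their versions, so it is a proxy for "most used", not a measure of actual usage.

```julia
using TOML

# Count, for every dependency name, how many registered packages list it as a
# direct dependency in at least one of their versions.
# Assumption: `registry_dir` is an unpacked clone of a registry in the layout
# used by General (one Deps.toml per package, tables keyed by version range).
function count_direct_dependents(registry_dir::AbstractString)
    counts = Dict{String,Int}()
    for (root, _, files) in walkdir(registry_dir)
        "Deps.toml" in files || continue
        deps_by_range = TOML.parsefile(joinpath(root, "Deps.toml"))
        # Union of dependency names over all version ranges of this package,
        # so each dependent package is counted at most once per dependency.
        names = Set{String}()
        for deps in values(deps_by_range)
            union!(names, keys(deps))
        end
        for name in names
            counts[name] = get(counts, name, 0) + 1
        end
    end
    return counts
end

# Return the `top` most depended-upon names as `name => count` pairs.
function most_used(registry_dir::AbstractString; top::Integer = 20)
    ranked = sort!(collect(count_direct_dependents(registry_dir)); by = last, rev = true)
    return first(ranked, top)
end
```

Calling most_used("path/to/General") would then list the top 20 names. Indirect dependents (the case that matters for StaticArrays here) would need a further pass over the resulting graph.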

I was optimizing the startup of a package, and those two seem like low-hanging fruit:

 @time_imports using PGFPlots
[..]
      3.7 ms  StaticArraysCore
    619.2 ms  StaticArrays

      9.8 ms  OrderedCollections
     75.3 ms  DataStructures

    174.5 ms  ColorVectorSpace 13.94% compilation time (100% recompilation)

      6.1 ms  Fontconfig_jll 87.95% compilation time
      4.9 ms  OpenSpecFun_jll 87.37% compilation time

Just cutting StaticArrays from the (indirect) dependency list would improve the package’s load time by 36% (measured against the optimized startup, which I have already cut by 67%; PR forthcoming), and cutting the three worst offenders would cut an additional 50%. I understand why people want static arrays (as in C and C++), since they can be faster, but for plotting it does not seem warranted, and, if I’m not mistaken, they can be had by using just the 3.7 ms StaticArraysCore. Should Images.jl use that instead?

Could it be argued that StaticArrays, or at least StaticArraysCore, should be in the sysimage? That is, something that is good enough most of the time, documented, and that keeps people from reaching for a more expensive or more general dependency.

Also, what are people mostly using DataStructures for? It includes many data structures, including everything from OrderedCollections, so please use the latter instead if it is enough, e.g. for an ordered Dict (which I would also like to see provided by Julia…).

I think, but I’m not sure, that even if people use just parts of a package, e.g. DataStructures, all of it, i.e. all of the compiled code, still needs to be loaded, the granularity being the package. Am I right? Would it help to change that in Julia, or should packages rather be split up? We’ve already had some discussion about the Julia ecosystem having too fine-grained packages, and the need for macro packages.

E.g. DataFrames can’t be split up (or have functionality taken out), since dropping stuff is a breaking change. You could have users doing either of these:

using DataFrames: A, B
using DataFrames: C, D

Users of the former will load the binary code for A, B, C, D, etc.; it is just that C and D aren’t accessible? If a package has such non-overlapping users, then it could arguably be split into two or more packages, with DataFrames made a macro package that includes them all for convenience. It would only help if people then used the sub-packages directly. I’m thinking one such subset of very widely used and mature code could be integrated into Julia Base.
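A small sketch of that behaviour, using the Dates standard library as a stand-in for DataFrames (an assumption made only so the example runs without installing anything): a selective using restricts which names become visible, while the module itself is loaded as a whole. It says nothing about load time, only about what ends up loaded versus what ends up in scope.

```julia
using Dates: Date   # bring only the name `Date` into the current namespace

# `DateTime` was not requested, so it is not visible unqualified here.
datetime_in_scope = isdefined(@__MODULE__, :DateTime)

# The module that owns `Date` is nevertheless fully loaded, and everything
# else it defines is reachable through it.
dates_module = parentmodule(Date)
datetime_loaded = isdefined(dates_module, :DateTime)

println("DateTime visible in current module: ", datetime_in_scope)
println("DateTime defined in ", nameof(dates_module), ": ", datetime_loaded)
```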

Most JLL packages have no compilation time; I’m not sure why those two are an exception. Can someone fix it (it probably means an order of magnitude slower loading)? Another possibly interesting optimization: while any one JLL package loads fast, importing packages is a cumulative process, right? Could some packages, or at least all JLLs, be loaded in parallel or asynchronously?


StaticArrays is a big offender. It is all over the place. Its effects on load time in the ecosystem need to be mitigated.
Putting it (or SACore) in the system image is not the right answer. And it will never happen.

I did not look into it myself, but I heard it asserted that significant improvements to the load time of StaticArrays itself would not be super difficult, for some definition of “super”.

Load times for SA have been improving drastically over the past few years.

Moving dependencies to extensions in some packages may be possible.

There are probably some clever ways to make these problems better. But a lot of it just requires time and sweat.

I think this is discussed in several places. Here is an issue to get started.
