PackageAnalyzer is a tool @giordano created while writing this awesome blog post. After that, I helped expand some of the functionality, and together we used the tool to survey the General registry, resulting in this blog post and 2021 JuliaCon talk. Recently, we updated the package with a bit more functionality and cut a v1.0 release.
What can PackageAnalyzer do?
PackageAnalyzer downloads the code associated with a package and runs some very basic static analysis: looking for the presence of CI scripts and documentation, counting lines of source code and tests, and checking licenses. It can also optionally gather contributor data from the GitHub API. It is multithreaded, robust and somewhat battle-hardened, as Mosè ran PackageAnalyzer v0.1 daily on the whole General registry for a long time to collect statistics over time. Note that a lot of internals have changed in v1.0, so it is possible it has regressed on its “battle-hardened” status, although we have tried to keep in mind the lessons learned from earlier versions.
The API is very simple; one calls analyze("DataFrames"), for example, to analyze the package DataFrames:
julia> analyze("DataFrames")
Package DataFrames:
* repo: https://github.com/JuliaData/DataFrames.jl.git
* uuid: a93c6f00-e57d-5684-b7b6-d8193f3e46c0
* version: 1.4.3
* is reachable: true
* tree hash: 0f44494fe4271cc966ac4fea524111bef63ba86c
* Julia code in `src`: 18778 lines
* Julia code in `test`: 28766 lines (60.5% of `test` + `src`)
* documentation in `docs`: 6761 lines (26.5% of `docs` + `src`)
* documentation in README: 21 lines
* has license(s) in file: MIT
  * filename: LICENSE.md
  * OSI approved: true
* has `docs/make.jl`: true
* has `test/runtests.jl`: true
* has continuous integration: true
  * GitHub Actions
PackageAnalyzer uses RegistryInstances.jl, which is based on code taken from Pkg.jl, in order to query all installed registries for the package name, and thus supports multiple registries. The input to analyze can also be a local path or a URL.
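As a rough sketch of those other input forms, the snippet below passes a local directory and then a repository URL to analyze. It assumes PackageAnalyzer is installed in the active environment; the Example.jl URL is only an illustrative choice, and the second call needs network access.

```julia
using PackageAnalyzer

# A local path: here, the installed source tree of PackageAnalyzer itself.
local_result = analyze(pkgdir(PackageAnalyzer))

# A repository URL (illustrative choice; this call needs network access).
url_result = analyze("https://github.com/JuliaLang/Example.jl")

# Print the type of each result for comparison with analyze("DataFrames").
println(typeof(local_result))
println(typeof(url_result))
```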
One can also analyze an entire manifest with analyze_manifest(path) (where path defaults to the manifest of the current active project). For example, analyzing a temporary environment in which I've added PackageAnalyzer:
pkg> activate --temp
pkg> add PackageAnalyzer
julia> using PackageAnalyzer
julia> @time results = analyze_manifest();
0.117077 seconds (317.67 k allocations: 43.424 MiB)
julia> summary(results)
"33-element Vector{PackageAnalyzer.Package}"
PackageAnalyzer will respect the versions of each dependency in the Manifest, meaning it will take care to analyze the associated code (and not, say, the latest development code). It also properly handles code on branches (from e.g. Pkg.add(; rev=...)) and dev'd dependencies. It will download code if required, but if the code already exists in your .julia folder, it will find and use that (and verify the git tree hash to ensure the contents are as expected according to the hash in the manifest or registry). This makes analyzing manifests which have been instantiate'd very quick.
One can easily post-process the results, since a Vector{PackageAnalyzer.Package} is a Tables.jl-compatible row table. Continuing the example above,
pkg> add DataFrames
julia> using DataFrames
julia> df = DataFrame(results)
33×22 DataFrame
 Row │ name                     uuid                               repo                               subdir  reachable  docs   runtests  github_actions  travis  appve ⋯
     │ String                   Base.UUID                          String                             String  Bool       Bool   Bool      Bool            Bool    Bool  ⋯
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ libsodium_jll            a9144af2-ca23-56d9-984f-0d03f7b5…  https://github.com/JuliaBinaryWr…          true       false  false     false           false   fa    ⋯
   2 │ HTTP                     cd3eb016-35fb-5094-929b-558a96fa…  https://github.com/JuliaWeb/HTTP…          true       true   true      true            false   fa
   3 │ licensecheck_jll         4ecb348a-8b88-51ea-b912-4c460483…  https://github.com/JuliaBinaryWr…          true       false  false     false           false   fa
   4 │ PackageAnalyzer          e713c705-17e4-4cec-abe0-95bf5bf3…  https://github.com/JuliaEcosyste…          true       true   true      true            false   fa
  ⋮  │ ⋮                        ⋮                                  ⋮                                  ⋮       ⋮          ⋮      ⋮         ⋮               ⋮       ⋮     ⋱
  31 │ RegistryInstances        2792f1a3-b283-48e8-9a74-f99dce51…  https://github.com/GunnarFarneba…          true       false  true      true            false   fa    ⋯
  32 │ LazilyInitializedFields  0e77f7df-68c5-4e49-93ce-4cd80f55…  https://github.com/KristofferC/L…          true       false  true      true            false   fa
  33 │ LicenseCheck             726dbf0d-6eb6-41af-b36c-cd770e0f…  https://github.com/ericphanson/L…          true       false  true      true            false   fa
                                                                                                                                    13 columns and 26 rows omitted
julia> code = select!(flatten(df, :lines_of_code), :name, :version, :lines_of_code => identity => AsTable);
julia> sort!(code, :code)
235×9 DataFrame
 Row │ name                     version    directory     language  sublanguage  files   code  comments  blanks
     │ String                   VersionN…  String        Symbol    Union…       Int64  Int64     Int64   Int64
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ libsodium_jll            1.0.20+0   README.md     Markdown                   1      0        27      11
   2 │ HTTP                     1.5.5      docs          Markdown                   5      0       465     191
   3 │ HTTP                     1.5.5      README.md     Markdown                   1      0        51      29
   4 │ HTTP                     1.5.5      CHANGELOG.md  Markdown                   1      0       218      24
   5 │ HTTP                     1.5.5      LICENSE.md    Markdown                   1      0        22       2
   6 │ licensecheck_jll         0.3.101+0  README.md     Markdown                   1      0        33      16
   7 │ PackageAnalyzer          1.0.0      docs          Markdown                   3      0        90      40
   8 │ PackageAnalyzer          1.0.0      README.md     Markdown                   1      0        22      13
  ⋮  │ ⋮                        ⋮          ⋮             ⋮         ⋮                ⋮      ⋮         ⋮       ⋮
 228 │ MbedTLS                  1.1.7      src           Julia                     13   2289        48     237
 229 │ JSON3                    1.12.0     src           Julia                     10   2512        68     199
 230 │ OpenSSL                  1.3.2      src           Julia                      2   2918       219     521
 231 │ Parsers                  2.5.1      src           Julia                      9   3252       136     154
 232 │ HTTP                     1.5.5      test          Julia                     26   4537       185     488
 233 │ URIs                     1.4.1      test          JSON                       1   4771         0       0
 234 │ LazilyInitializedFields  1.2.0      page          CSS                       94   5944       810    1167
 235 │ HTTP                     1.5.5      src           Julia                     36   6712       459     725
                                                                                              219 rows omitted
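Once the per-directory counts are in a table like this, ordinary DataFrames operations can summarize them. The sketch below is illustrative only: it rebuilds a handful of rows by hand (values copied from the output above) so that it runs without PackageAnalyzer, and the column name julia_lines is made up for this example. It totals the Julia lines of code per package.

```julia
using DataFrames

# Hand-copied subset of the `code` table above, so the example is self-contained.
code = DataFrame(
    name      = ["HTTP", "HTTP", "HTTP", "Parsers", "JSON3"],
    directory = ["src", "test", "docs", "src", "src"],
    language  = [:Julia, :Julia, :Markdown, :Julia, :Julia],
    code      = [6712, 4537, 0, 3252, 2512],
)

# Keep only the rows that count Julia code.
julia_rows = subset(code, :language => ByRow(==(:Julia)))

# Sum the `code` column per package; `julia_lines` is a name chosen for this sketch.
totals = combine(groupby(julia_rows, :name), :code => sum => :julia_lines)

# Largest packages first.
sort!(totals, :julia_lines; rev=true)
```

With real results, the same three calls should apply directly to the code table built in the session above.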
There are plenty more features and analyses that could be added to the package, so check out the issue tracker if you would like to get involved!
We hope others find it a useful way to get a quantitative understanding of their dependencies, as well as of the OSS ecosystem as a whole.