Looking for test cases for JuliaFormatter, i.e., please give me your big Julia repositories

Hello! Since being moved to JuliaEditorSupport a few weeks ago, v2 of JuliaFormatter has had a lot of updates and bug fixes (by my count, 47 issues have been closed by recent PRs – that is excluding old issues which no longer reproduce).

Now, there are still plenty of known bugs and almost certainly many more unknown ones. (I think it’s generally well known that JuliaFormatter v2 is quite buggy :slight_smile: ) However, it is at a stage where I’d like to start doing more serious CI testing on large, real-world codebases to catch more of these bugs.

If you have suggestions for large Julia repositories, particularly those with fairly diverse syntax, please let me know either on this GitHub issue or here. Thanks :slight_smile:

I’m also very happy to answer questions generally about JuliaFormatter, its status, and what’s in the works.

And finally, massive thanks to everyone who’s engaged in issues or PRs recently!

There’s a list of the largest packages (as of 2021, at least) in the post "Code, docs, and tests: what's in the General registry?"

If you want to look for crashes (rather than formatting bugs), you could use PackageAnalyzer to sweep over packages, something like:

using JuliaFormatter, PackageAnalyzer

pkgs = find_packages()
root = mktempdir() # place to download code to
for pkg in pkgs
    path, success, = PackageAnalyzer.obtain_code(pkg; root)
    if success
        try
            # overwrite=false: only check that formatting runs, don't modify files
            format(path; overwrite=false)
        catch e
            @info "Got a crash! $e at $path for package $(pkg.name) v$(pkg.version)"
        end
    end
end
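If you'd rather collect the crashes for triage than only log them, the loop above can be turned into a small helper. This is just a rough sketch: `sweep` and `fake_format` are made-up names, and `fake_format` is a stand-in for the formatter call so the snippet runs without any packages installed.

```julia
# Run `action` on every path and collect the ones that throw,
# instead of only logging them.
function sweep(action, paths)
    failures = Tuple{String,String}[]
    for path in paths
        try
            action(path)
        catch err
            # Let Ctrl-C through so a long sweep can still be interrupted.
            err isa InterruptException && rethrow()
            # Store the error message as a string so the list is easy to save or sort.
            push!(failures, (path, sprint(showerror, err)))
        end
    end
    return failures
end

# Hypothetical stand-in for the real formatter call.
fake_format(path) = endswith(path, "bad") ? error("parse failure") : nothing

failures = sweep(fake_format, ["pkg_ok", "pkg_bad"])
```

With the real packages loaded, the call would presumably look like `sweep(p -> format(p; overwrite=false), paths)`, where `paths` holds the directories returned by `obtain_code`.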

Ah, that is perfect, thank you! I had planned to do something similar, but with a (probably rather inefficient) hand-written script to trawl the General registry and git clone everything myself :smile: .

Expect to clone more than 30 GB of repos. (That figure is from experience 3 years ago; presumably it’ll be more nowadays.)

For interest:

❯ du -sh .
 52G	.

❯ ls -1 | wc
   11924   11924  167094

(That was out of 12055 packages that find_packages() found; I didn’t look into what happened to the other 131.)
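For anyone who does want to track down the missing ones, something like this might work. It's an untested-against-the-real-data sketch that assumes each downloaded package ends up in a top-level directory named after the package (the `ls -1` count suggests this, but I haven't checked). `missing_packages` is a made-up helper, and the demo uses a throwaway directory with invented package names.

```julia
# Names expected from the registry but absent from the download directory.
# ASSUMPTION: one top-level directory per package, named after the package.
function missing_packages(expected_names, root)
    downloaded = Set(readdir(root))
    return sort!([name for name in expected_names if !(name in downloaded)])
end

# Tiny self-contained demo using a temporary directory.
demo_root = mktempdir()
mkdir(joinpath(demo_root, "Alpha"))
mkdir(joinpath(demo_root, "Gamma"))
absent = missing_packages(["Alpha", "Beta", "Gamma", "Delta"], demo_root)
```

On the real download it would be called as `missing_packages([pkg.name for pkg in pkgs], root)`.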

It’s probably mainly repositories that were deleted or made private after they had been registered. We found a similar fraction of disappeared packages in our previous analysis.