About Julia's development policy regarding Windows

I’m not going to rehash how we already test all registered packages for all Julia releases, or how community voting on packages has little bearing on whether or not any particular work happens.

But it might be interesting to make something like this part of the annual Julia User & Developer Survey (organized by @ahclaster, I think). Along the lines of “which packages do you think are particularly central to the Julia ecosystem”. Nominations would happen during the feedback period and then everyone could vote in the actual survey. It would certainly be interesting data to have, and could inform some decisions on which packages to test with more scrutiny.

1 Like

An actionable thing here could be to think about how to standardize performance benchmarks for a package, such that detection of performance regressions can be automated.

Thinking out loud for a little bit: To start, there has to be a way to specify a benchmark workload for a package. It could look similar to either test suites or precompile workloads, or something in between.
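
For concreteness, here is a minimal sketch of what such a workload specification could look like if it reused the SUITE convention that BenchmarkTools.jl-based suites already follow. The package name SomePackage and the benchmarked functions are placeholders, not a real API:

```julia
# benchmark/benchmarks.jl: a hypothetical benchmark workload for a package
using BenchmarkTools
using SomePackage  # placeholder package name

const SUITE = BenchmarkGroup()

# Representative usage rather than micro-benchmarks: build something realistic
# and exercise the main entry points of the package.
SUITE["construct"] = @benchmarkable SomePackage.build_model(1_000)
SUITE["solve"] = @benchmarkable SomePackage.solve!(m) setup=(m = SomePackage.build_model(1_000))
```

Whether a file like that lives next to the test suite, next to the precompile workload, or somewhere else entirely is exactly the kind of detail that would need deciding.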

A complication with benchmarks is that you always need to compare two configurations—it’s not like tests where the pass/fail result can be decided uniquely for each combination of package version and Julia version. To make sure you’re only measuring changes due to Julia itself, you probably want to pin the package at a single version that’s compatible with both of the two Julia versions you’re comparing (this also ensures that the benchmark workload is identical, which is obviously important). In many cases, that version would have to be older than the current release, since new package versions often drop support for older Julia versions.
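
As a rough sketch of that pinning step, assuming the benchmark runner is itself a Julia script (package name and version below are placeholders), you would run something like this once under each of the two Julia versions being compared:

```julia
# Run once per Julia version under comparison, with the same pinned package version.
using Pkg
Pkg.activate(; temp = true)                       # throwaway benchmark environment
Pkg.add("BenchmarkTools")
Pkg.add(name = "SomePackage", version = "1.2.3")  # placeholder: a version compatible with both Julia versions
Pkg.pin("SomePackage")                            # keep the version fixed across runs

using SomePackage
# load the workload (the SUITE) shipped with that pinned package version
include(joinpath(pkgdir(SomePackage), "benchmark", "benchmarks.jl"))
results = run(SUITE)                              # BenchmarkTools results, comparable across Julia versions
```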

Next, if you want to create something like PkgBenchmarks.jl to complement PkgEval.jl, should it be opt-in (e.g., the community decides on 10 representative packages whose performance gets monitored) or opt-out (like PkgEval: every package that defines a benchmark workload is included unless blacklisted)? Should you only compare the PR/release candidate against the current release, or also against, say, the LTS version? (Watching longer-term trends might catch otherwise barely noticeable drift in the wrong direction.) And how big should the workloads typically be, i.e., what should the timeout be? Seconds, minutes, tens of minutes, hours?

Should the benchmarks focus on runtime performance only, or should they also measure precompilation time/load time/latency/TTFX? Should this be up to each package author or baked into the infrastructure somehow?
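
For the load-time/latency side, at least the individual measurement is easy to express; a minimal sketch, again with a placeholder SomePackage and a made-up "first call", could be:

```julia
# Run in a fresh Julia session so compilation state starts cold.
using InteractiveUtils  # provides @time_imports (Julia 1.8+)

@time_imports using SomePackage            # per-dependency load-time breakdown
@time SomePackage.first_call(rand(100))    # "time to first X": includes compilation
@time SomePackage.first_call(rand(100))    # second call: runtime only
```

Each such measurement needs a fresh Julia process, which already hints at why automating it across many packages is not trivial.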

I’m not involved in Julia development or releases, but it certainly sounds to me like it would be useful to have a more systematic and automated way of monitoring actual performance impacts in the package ecosystem when making changes to Julia. However, someone has to do the work of setting it up and making decisions about all the finer points of how it’s going to work.

1 Like

Y’all really like divining, out of whole cloth, what you think people should be doing and how that should work.

Throughout this thread I (and others) have been trying to describe the status quo. I’m really not trying to argue for or against any particular prescription — the world’s your oyster there — just that it’s helpful to first describe the status quo and work from there. Especially so when it’s other people doing the work.

Performance is checked for regressions automatically and regularly. There does exist something like PkgBenchmarks to complement PkgEval: it’s called BaseBenchmarks.jl, and it’s also run by nanosoldier. There’s infrastructure to compare between commits and releases. Here’s an example. It’s pretty focused on micro-benchmarks, but there’s an open issue (and two work-in-progress PRs) trying to figure out how to better cover packages and compilation time. It’s hard, though, particularly since typical CI environments are quite noisy.
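
For reference, those benchmark runs are triggered on a JuliaLang/julia pull request with a bot comment roughly of the form below (from memory; the exact tags and options vary per run):

```
@nanosoldier runbenchmarks(ALL, vs = ":master")
```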

9 Likes

Howdy pardner! That’s me :index_pointing_up: I don’t think anyone has harped on the existence of PkgEval more than me here.

As you say, BaseBenchmarks.jl is focused on microbenchmarks, and the open issues/PRs on compilation. It doesn’t address benchmarks of representative package usage, which is on @uwestoehr’s wishlist. So in my last post I allowed myself to think out loud about the possibilities and challenges of hypothetically automating that, and it looks like we arrive at the same conclusion.

No prescriptive intentions. Apologies for the missing citation of related work.

A discussion needs a basis, and I think the basis here is that there is room for improvement. There are examples where things could be done better, and I started this discussion to figure out how.

I like discussions like this: experiences and opinions are exchanged, the status quo becomes clear, people make proposals, the proposals are evaluated, people think them over, refine them (their own or others’), come up with new ones, or retract them later on. Every proposal is treated as something worth considering, even if it makes someone roll their eyes, because people should not be afraid to make proposals of their own.
It is OK, and perfectly normal, to reject a proposal as long as there is another, better one. What is not good is to declare a proposal unnecessary on the grounds that basically nothing needs to change, because that would imply there is nothing to improve.

Focusing on the status quo is understandable, but it can end in blame games. In a company, a typical argument would be: “There was an issue. Why do we have it at all, and who is to blame for it? The person who implemented the status quo, and those who did not change it already.” I have had such discussions too often, and they only lead to frustration. So: the status quo is what it is, and nobody is being accused. The focus should be on how to improve things, to get better QA and also to keep the work fun and the mood good.

I think the basis of an open-source project is to appreciate all the volunteer work that is done. Nobody should feel attacked by discussions about changing things; they are not about current or past work, but about the future.

Also, to encourage people to make proposals, the question of who will do which work should be decided late in the evaluation of a proposal; otherwise people block themselves. For example, to some it seems obvious that a release is done by coders, and coders are usually busy and fear getting even more work. But a release manager can really be just that, a manager, i.e., not a coder. Many companies work this way on purpose. The person or team who launches, say, a medical-engineering product is usually not an engineer, scientist, or doctor, but, for example, a sales manager, because their job is to connect the scientists with marketing and the regulatory-paperwork people with the state officials, to organize a press conference, to write or review the release announcement, to use their press contacts to get the product mentioned, and to make sure the social media and video channels have content ready on release day.
Software is not so different. From my experience, doing a release meant:

  • checking the list of new features on different computers (and organizing others to test as well), then collecting bugs and open issues
  • labeling issues by importance and communicating the important ones on the forum to get more people working on them
  • contacting IT journalists and speaking with YouTubers in advance
  • a lot of calls/meetings to team up different people: translators, developers
  • writing and fine-tuning the announcement texts
  • encouraging people on the forum to test the new features and also the installation process
  • preparing graphics and animations to give YouTubers and bloggers something to work with
  • updating the docs and making sure the new features are well documented and, if possible, already translated

Maybe there are people around who are good at this and enjoy such work.
For example, my proposal includes teaming up the release team with the developers of Tier 1 packages; that is a task perfectly suited to a manager.

Maybe I am thinking outside the box, but why not :wink:. And in the end there is no hurry. I hope other proposals will be made, and maybe there will also be a discussion at JuliaCon.

1 Like

A lot of that list seems appropriate for a software application like FreeCAD, not for a programming language. To engage in meta-discussion for a bit: you ran into real and acknowledged shortcomings of the language (juliac) and the ecosystem (PackageCompiler) for something fairly significant (shared libraries and executables, especially ones of a reasonable size for integration). This has only been tolerable so far because a REPL workflow is well established for many usages (by other languages; Julia can’t take credit for this), and because interop (Julia with C/C++, Julia with Python) has been usable even if less than ideal. But among the good, or at least debatable, suggestions, you made many demands that are too specifically informed by your FreeCAD development experience. Animations for YouTubers, for example, are just not nearly as useful for base Julia as they are for 3D modellers; you would need to get into far more specific libraries, e.g. Makie.jl, before text and still images become less effective explanations than even simple animation loops, let alone videos with commentary and ads. It’s unfortunate that these topics ended up mostly explaining why many of the suggestions won’t improve development compared to what is done now, rather than focusing on the few suggestions that can make a difference (which they have for PackageCompiler, though that problem needs a whole lot more work before it’s resolved to anyone’s satisfaction).

3 Likes