While it’s possible (though IMO relatively unlikely) that there’s stuff like what Emery discussed in the video going on (I didn’t click your link because I already watched the video last year), I feel it’s important to note that just because you didn’t consciously do anything on your machine doesn’t mean your machine wasn’t doing things in the background.
Ultimately though, yeah, I agree that the way benchmarking is done in Julia has the potential for large systematic errors, and it’d be great if we got better at characterizing or eliminating those errors, since they can sometimes mislead people in the optimization process. That said, I think there’s also quite good evidence that we’re not being totally misled by our benchmarks, as shown by how well many of the benchmarking results people share port across machines and even architectures (and in the best cases there’s some additional human reasoning guiding the optimization process rather than just alchemy).
Yes, if someone shows you that they managed to shave 5% off a micro-benchmark, there’s always the possibility that it’s a totally non-reproducible quirk of their benchmarking process, but once we start getting into order-of-magnitude improvements, especially ones built up by iteratively layering smaller changes, I get a lot less antsy.
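Just to make the “characterizing those errors” part concrete, here’s a minimal sketch of one cheap sanity check (sum_example is a made-up stand-in for whatever you’re actually measuring, and this assumes BenchmarkTools.jl): run the same benchmark a handful of times and look at how much the minimum times move around.

```julia
using BenchmarkTools, Statistics

# Made-up stand-in for whatever kernel is actually being measured.
sum_example(xs) = sum(abs2, xs)

xs = rand(10_000)

# Repeat the whole benchmark several times and compare the minimum times;
# a large spread here means the measurement itself is noisy, so a 5%
# "improvement" could easily be within that noise.
mins = [minimum(@benchmark sum_example($xs)).time for _ in 1:5]

println("minimum times (ns): ", mins)
println("relative spread:    ", (maximum(mins) - minimum(mins)) / median(mins))
```

If that relative spread is bigger than the effect you think you measured, you haven’t really measured anything yet.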
This is all quite relevant, though, to how we try to teach people to optimize code. There’s a ton of alchemy that starts happening out there where newbies try to learn from what experts are doing, and the main thing they pick up is that they should sprinkle magic macros like @inbounds, @simd, @inline, etc. around instead of actually thinking about what the code is doing and whether there’s a way to do it faster.
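To make that concrete, here’s a toy contrast (the function names and the column-major example are mine, just for illustration): blindly decorating a cache-unfriendly loop with @inbounds and @simd buys little, while noticing that Julia arrays are column-major and swapping the loop order is the kind of reasoning that actually produces a big win.

```julia
# "Sprinkle macros on it" version: the macros are there, but the loops still
# walk the matrix row-by-row, fighting Julia's column-major memory layout.
function matsum_sprinkled(A)
    s = zero(eltype(A))
    @inbounds for i in axes(A, 1)
        @simd for j in axes(A, 2)   # strided access: @simd can't do much here
            s += A[i, j]
        end
    end
    return s
end

# Version that thinks about what the code is doing: put the column index on
# the outer loop so the inner loop touches contiguous memory; now the same
# macros actually have something to work with.
function matsum_reordered(A)
    s = zero(eltype(A))
    @inbounds for j in axes(A, 2)
        @simd for i in axes(A, 1)
            s += A[i, j]
        end
    end
    return s
end
```

On a decently sized matrix the reordered version is usually several times faster, and that gap comes from the access pattern, not from the macros themselves.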