You are conflating two separate issues here. We are not concerned with the world at large in this post; we are talking about science and scientific methodology correctly applied to a topic in computer science. There are certainly concepts in the world that are not quantifiable. God is an example. But that is not science, because it is neither quantifiable nor reproducible.
Not all scientific experiments are feasible. But the least that can be done is to document all the circumstances surrounding a scientific report. A survey of user feelings and experience could also be quantified; that is what psychologists have done for decades. But once emotions and personal feelings are involved in a scientific assessment, one has to be extremely careful about the hidden cognitive biases that could be involved and take them properly into account.
I cannot believe there are at least 16 people (presumably scientists) in this Julia forum who believe science is not necessarily quantifiable or reproducible. This mentality is a danger to science and to the scientific community. NIH and NSF are currently spending billions of dollars to make existing pseudo-scientific reports reproducible.
On the contrary, I think the attitude that
is the real danger. By overly relying on what can be quantified and dismissing everything else, you are biasing your view of reality towards the easily quantifiable. Not everything that is true can be quantified, and questions that can be answered in an easily quantifiable way are not the only relevant or interesting questions. And just because something can't be quantified doesn't mean it cannot be reasoned about or approached in a disciplined manner.
This, also, does not automatically imply "perfect" science (whatever that may mean). Armies and tobacco companies are known to fund a lot of quantifiable and reproducible work.
We are veering off target here. I'm sure most of us (aim to) produce quantifiable and reproducible research. We probably also throw in some theorising and interpretation without saying "this bit isn't science".
Regarding benchmarks, yes things need to be quantifiable and reproducible as the OP says.
But as others have said, these easily quantifiable things may not be the most important factors for choosing a programming language. For example, would you choose a movie based on box-office success rather than the advice of a friend with similar taste, just because the former comes with a number attached?
I missed this before.
I think it's more likely that there is a general misunderstanding about the points different parties are trying to make than that members of a forum for a modern programming language don't believe in reproducible research.
I asked a simple question here: to provide more details on a report. Except for a few, most of the responses were off-topic, ranging from the future of ARM chips, to positivism and philosophy, to comments questioning the basic tenets of science. I appreciate all of your contributions, but these were not the kind of responses I was hoping to get here. So, to avoid the continuation of off-topic comments, I will stop following this post and responding to comments. Thank you all.
Let me preface this by saying that I agree with @jzr's point that the community should engage in as little sensationalism as possible when drawing comparisons between languages. This is something long-time community members are well aware of, but I'd say there's enough misguided evangelism from inexperienced or new members that we see an inherent resentment of the language in certain internet circles. I'm not sure what can be done to avoid the Rust scenario of this group poisoning the proverbial well of what is otherwise a pretty positive and supportive community, but that's a topic for another thread.
With that out of the way, let's talk about computing, reproducibility and software engineering research. To those in other fields, CS and SE seem like ideal candidates for replication and easy reproduction. After all, computers and software are more visibly deterministic than most of the biological/physical/chemical/geological processes we observe, right?
Well, kind of. As an example, think of some common control variables when trying to benchmark program performance: using the same algorithm, software/OS versions, processor architecture, cache access/prewarming, noise from other programs, etc. Now see if any of these made the list:
- The size and number of environment variables in the current execution context (can have a 3x difference! see the sketch below for one way to probe this)
- One additional stack allocation (e.g. an extra local variable)
- Swapping the order of two otherwise independent heap allocations
- Where the compiler decides to load certain code or data segments
(Courtesy of the excellent https://dl.acm.org/doi/abs/10.1145/2490301.2451141, discussed in Emery Berger's talk "Performance Matters" on YouTube.)
Note that all of the above are just layout-related factors. For a more holistic overview of what must be accounted for in benchmarking and how to do so effectively (i.e. not generate misleading results), see https://dl.acm.org/doi/abs/10.1145/2491894.2464160.
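To make the first factor in that list concrete, here is a minimal sketch of how one might probe it from Julia. Everything specific to it is an assumption for illustration only: the script name `bench.jl`, the padding sizes, and the convention that the script prints a single timing. The real effect size depends entirely on hardware, OS, and workload.

```julia
# Sketch: run the same (hypothetical) benchmark script in fresh processes whose
# only difference is the amount of environment padding, then compare medians.
using Statistics

function run_bench(padding_bytes)
    env = Dict(ENV)
    env["BENCH_PADDING"] = "x" ^ padding_bytes   # shifts the initial stack/segment layout
    # `bench.jl` is assumed to print a single number (e.g. minimum time in ns).
    cmd = setenv(`$(Base.julia_cmd()) --startup-file=no bench.jl`, env)
    parse(Float64, strip(read(cmd, String)))
end

small_env = [run_bench(16)   for _ in 1:10]
large_env = [run_bench(4096) for _ in 1:10]
println("median, 16 B padding:  ", median(small_env), " ns")
println("median, 4 KiB padding: ", median(large_env), " ns")
```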
Now one might say this is an incredibly high bar. I agree! The issue at hand is one of perception. We perceive our machines to be relatively consistent execution environments in theory, but the complexity and variety of modern hardware/software mean this is wholly untrue in practice. I won't even get into areas that incorporate inherent stochasticity (like my home domain of machine learning). The question then follows: why isn't there a replication crisis in CS and SE? Well, there is, and though it is less publicized than, say, the one in the social sciences, it has similarly deleterious effects on the trustworthiness of scholarship in both fields.
To return to the main topic of this thread, what should a team that wants to migrate a project to a different language (or a different library, different hardware, etc.) do? They could try to control for as many factors as possible, but how many people are going to put in that time for a real-world project? The only group I see doing so would be those looking to write a software engineering paper, because the end result of such a benchmarking run would be a software engineering paper (and a better-than-average one at that). They could avoid talking about anything performance-related at all, but then we get into the territory of what is acceptable speech and how much one can control what others talk about in a formal/semi-formal/informal setting. Perhaps the better compromise is to emphasize the non-rigour of the results and the anecdotal nature of the experiment, but that seems to be covered by the video linked above (which answers the question in the OP but, judging by the most recent responses, unfortunately appears to have been buried).
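One cheap step in that direction, sketched below for Julia (the file name and the exact set of details recorded are just my choices, not an established convention), is to at least capture the environment a benchmark ran in alongside the numbers:

```julia
# Sketch: dump the circumstances of a benchmark run next to the results so that
# readers can see what produced the numbers, even if the run itself is anecdotal.
using InteractiveUtils, Pkg, Dates

open("benchmark_environment.txt", "w") do io
    println(io, "Date: ", now())
    versioninfo(io; verbose = true)   # Julia version, OS, CPU, threads, relevant env vars
    Pkg.status(; io = io)             # package versions in the active project
end
```

Combined with a committed Manifest.toml, that is usually enough for someone else to at least re-run the experiment under comparable conditions.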
Luckily enough, the project in question is a FOSS project: Climate Modeling Alliance · GitHub
Thus, if one's interest is to know what in that project resulted in faster implementations of their models, it is free for research and meta-analysis.
I am not sure the developers will be much interested in analyzing in detail exactly why their implementations are faster than the previous ones, as that was not the initial goal, nor can the speedup necessarily be reduced to a few reasons. It may well be the result of a multitude of small optimizations.
What may be easier to check is whether the claim itself is true, and how the performance of their code compares to other free and open codes that are available.
I am much more frustrated by claims of performance when they are not accompanied by readily available and well documented codes.
To be fair, it is you who insists on labeling a Medium post about Julia as "science" and asking why it isn't scientific enough. It is not a scientific article published in an academic journal, but an intro post about Julia's performance, and as such it is fairly balanced and recounts various pitfalls.
As you found, asking that something that isn't science conform to "scientific objectivity" generates fairly unfocused conversations. But please don't blame others for this.
@shahmoradi I thought the experiment was done with Alan Edelman. He might be a good person to consult about the circumstances.
What is the actual complaint here? That an anecdote that was quoted in a blog post wasn't sufficiently scientifically rigorous?
I think this topic is in two parts. One part is a specific question about the actual facts of the situation reported in Edelman's story, in order to understand it. My sense is that part of the OP's professional work involves quantitatively assessing the state of the scientific computing ecosystem.
The other part is a complaint about "benchmarketing", which I tried to unpack in my comment.
Except that some of Julia's main competitors are other dynamic languages, not statically typed ("fast") languages. People are used to assuming that dynamic languages are slow, and seem to need continual reassurance that this does not apply to Julia.
It literally is.
Exactly. I'd say the main alternatives are Python, R, and MATLAB. Julia beats them hands down in basically everything speed-wise. The only hope those languages have is to call out to C code.
When it comes to comparisons with Fortran, C, C++, or similar, Julia's main advantage is that it can express the computation in a more convenient way, which may lead to better algorithms, or to benefits from libraries, autodiff, etc.
Lightly optimized simple loops are likely to be very close in speed between Julia and Fortran et al. In the end, it's all machine code.
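As a toy illustration of that point (my own example, assuming BenchmarkTools.jl is installed; the claim is only that a plain, type-stable loop compiles to tight machine code, not that any particular numbers will reproduce):

```julia
using BenchmarkTools, InteractiveUtils

# The same kind of straightforward loop one would write in Fortran or C.
function mysum(x::Vector{Float64})
    s = 0.0
    @inbounds for i in eachindex(x)
        s += x[i]
    end
    return s
end

x = rand(10^6)

@btime mysum($x)                        # time it; compare against an equivalent Fortran/C loop
@code_native debuginfo=:none mysum(x)   # inspect the machine code Julia actually generates
```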