How is the correctness of public open-source packages ensured? If a package contains latent errors that nobody is aware of, and someone relies on it for scientific research, reaches incorrect conclusions, and publishes academic papers, who is responsible?
This is a deep epistemological hole, but open source packages are mostly a distraction.
All science is based on tools: open source code, closed source code, microscopes, rulers, previous papers in your field, papers on how to correctly analyze certain types of data, and so on. Any of these can have flaws, which may or may not invalidate a particular paper. For any of them the answer is the same: if the tool is broken, retract the paper that is no longer correct, and do what you can to fix the tool (or prevent others from using the broken one) to keep expanding the forest of human knowledge.
who will be responsible for it?
You are.
We do analyses in cancer research; our lab and its IT are certified according to ISO 17025. We use open source and closed source software, and for both it's the same: for every step and every tool we have done verification and validation. With every analysis, several tests are run to ensure that nothing has changed since last time that would alter predefined outcomes.
Why do we do this for research (which is unusual)? Because real patients are involved in cancer research, and errors in analysis could lead to death.
In general, researchers do similar verification and validation, not according to some ISO standard, but according to their needs. All researchers I know are typically very doubtful about their methods and their numbers, doing a lot of cross-checking and testing. Nothing is more embarrassing for them than having to withdraw a publication.
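A minimal sketch of what such a regression check could look like in Julia; the analysis function, reference data, and tolerance below are all made up for illustration:

```julia
using Test

# Hypothetical analysis step; in a real pipeline this would call the actual tools.
run_analysis(input) = sum(input) / length(input)

# Reference input/output validated once and then frozen; values are illustrative only.
const REFERENCE_INPUT  = [1.0, 2.0, 3.0, 4.0]
const REFERENCE_OUTPUT = 2.5

@testset "pipeline regression check" begin
    # Fails loudly if an update to any tool changes a predefined outcome.
    @test run_analysis(REFERENCE_INPUT) ≈ REFERENCE_OUTPUT atol = 1e-12
end
```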
One might also ask the closely related question:
How is the correctness of closed source software ensured? If a package contains latent errors that nobody is aware of, and someone relies on it for scientific research, reaches incorrect conclusions, and publishes academic papers, who is responsible?
The answer is generally the same as for open source software: you usually don't have any legal or financial recourse against the purveyor of commercial software if it gives you a wrong answer. And of course, you also can't look at and check what the software actually does.
I am a great defender of free software, but I am under the impression that some proprietary software comes with certain guarantees when you buy it. At least in my country it would be very strange to buy something and have no guarantee that its features work correctly.
I don't think it is that straightforward, even in most European countries. Of course software has a purpose and should perform towards fulfilling that purpose, but at the end of the day, if there is some error in software such as Ansys, Matlab, etc., I believe they have disclaimers to avoid any liability at all.
At the end of the day, software is a tool, and a tool can have flaws, similar to a welding gun used to join parts together.
From Ansys: https://www.ansys.com/legal/legal-notices
THE SOFTWARE IS WARRANTED, IF AT ALL, ONLY ACCORDING TO THE TERMS OF THE LICENSE AGREEMENT. EXCEPT AS WARRANTED IN THE LICENSE AGREEMENT, ANSYS, Inc. HEREBY DISCLAIMS ALL WARRANTIES AND CONDITIONS WITH REGARD TO THE SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES AND CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL ANSYS AND/OR ITS RESPECTIVE SUPPLIERS BE LIABLE FOR ANY DIRECT, CONSEQUENTIAL, INCIDENTAL, SPECIAL, PUNITIVE OR OTHER DAMAGES WHATSOEVER (INCLUDING WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS PROFITS, BUSINESS INTERRUPTION, OR LOSS OF BUSINESS INFORMATION), EVEN IF ANSYS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
Ansys is just one example, and a widely used piece of software in engineering.
If a significant fault is found, its reputation is of course affected, which in turn can affect its sales, etc., so this is what keeps companies in line, so to say (from my understanding).
I just don't think anyone should use any piece of software assuming that it is fault-free; rather, check that it does not have flaws in what you want to do.
Kind regards
In general you should ensure that results make sense when using some tool. Any tool, whether it's software or lab equipment. We recently had an educational scandal here in Norway, where the Directorate of Education had messed up their analysis of exam grades over a decade, because they didn't check that the results made sense and they didn't fully understand the software they used. Neatly explained by some people at my former employer:
The analyses also reveal the reason why the official figures are wrong. The software the Directorate of Education has used (XCalibre) assumes that the skills each year reflect a standardized skill distribution. It is obviously impossible to detect changes in skills across cohorts if one assumes that the cohorts, in expectation, are exactly the same.
(from the English abstract in https://www.frisch.uio.no/publikasjoner/pdf/2024/Formatert/acta.pdf)
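To see why that assumption makes cohort changes undetectable, here is a small illustration in Julia; this is not the actual XCalibre model, just the general effect of standardizing each cohort separately:

```julia
using Statistics, Random

Random.seed!(1)

# Two cohorts with genuinely different mean skill levels.
cohort_2015 = randn(10_000)         # mean skill ≈ 0.0
cohort_2023 = randn(10_000) .+ 0.3  # mean skill ≈ 0.3

# If each cohort is forced onto the same standardized distribution,
# the real difference between cohorts is erased by construction.
standardize(x) = (x .- mean(x)) ./ std(x)

println(mean(cohort_2023) - mean(cohort_2015))                            # ≈ 0.3: the real shift
println(mean(standardize(cohort_2023)) - mean(standardize(cohort_2015))) # ≈ 0.0: the shift is gone
```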
Hehe, a friend of mine doing theoretical quantum chemistry told me many years ago that the commercial chemistry software "Gaussian" did a particular computation slightly wrong. It was well known among professionals, and they typically resorted to other software for those computations. But you had to know, or figure it out yourself; it wasn't stated anywhere.
For the curious, this may change in the future, at least in the EU, with the Cyber Resilience Act, which aims to solve exactly this liability question for closed source/commercialized software. The TL;DR is that if you commercialize software, you can be held liable for damages due to serious flaws that, for example, endanger patients.
For an analysis of what this means for open source, see this excellent article:
The TL;DR there is that it's good: exemptions have been added for solo devs, non-commercial open source, non-profit work, and the like.
To get reproducible results you need the same exact versions of all packages AND their dependencies, i.e. the Manifest.toml file; neither an earlier, a newer, nor the latest version will do.
But when a version is known to be wrong, a later (likely the latest) version might be OK, and reproducible garbage is NOT wanted… Should Julia somehow recommend against certain versions? Implicitly, the latest version is very likely to be the best (but may not be!). When that is known, and some specific version is known to be problematic, should Pkg warn about it? I.e. for science… not always, but perhaps as an optional feature that reports known-problematic versions?
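Pkg does not warn about known-bad versions today, but you can both reproduce an exact environment and manually hold a package at a known-good version. A sketch, where the project path is a placeholder and Example.jl is used only because it is a real registered package:

```julia
using Pkg

# Reproduce an existing environment exactly: Manifest.toml records the precise
# version of every package and of every dependency.
# ("path/to/project" is a placeholder for a directory containing both
#  Project.toml and Manifest.toml.)
Pkg.activate("path/to/project")
Pkg.instantiate()   # installs exactly the versions recorded in Manifest.toml

# Conversely, if a particular release is known to be bad, install and pin a
# known-good version so later resolves don't move off it.
Pkg.add("Example")
Pkg.pin("Example")
```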
It's still pending, and I see:
After publication of the draft proposal, multiple open source organizations criticized CRA for creating a "chilling effect on open source software development". The European Commission reached political agreement of the CRA on 1 December 2023, after a series of amendments. The revised bill received relief and applause from many open source organizations, and introduced the "open source steward", a new economic concept. […] still requires formal adoption by the Council before being enforced. Some open-source parties like the Debian project remain critical of the proposed regulation. As of 6 May 2024, the current English draft is dated 12 March 2024.
I don't recall seeing "open-source software [product]" in (proposed) law before, and reading the blog, it seems Debian is safe from prosecution, i.e. "almost all actual open source projects should be in the clear. There is however pain for those doing 'fauxpen source' projects, or those that are doing regular commercial sales of things that come with source."
It's understandable that the EU wants to do something, but I don't see that we will ever have (nor would I want) a full warranty for any software, closed or open. Possibly a law against doing very bad and known-insecure things, like shipping WiFi devices with default passwords, could make sense.
This sounds like Thompson's trusting trust. It's trust all the way down.
In general, in science you are supposed to check and verify every step of your research. If you just use a piece of software as a black box and proceed to apply it to an unknown case, there is no guarantee that the result is trustworthy. You should have test cases for which the expected correct answer is known and use them to test your tools.
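A trivial sketch of that idea in Julia: validate the tool on a case with a known analytic answer before trusting it on an unknown case. The quadrature routine here is just a stand-in for whatever tool is actually being used:

```julia
using Test

# Stand-in "tool": composite trapezoidal rule on a uniform grid.
function trapz(f, a, b; n = 10_000)
    xs = range(a, b; length = n + 1)
    h  = step(xs)
    return h * (sum(f, xs) - (f(a) + f(b)) / 2)
end

# Known correct answer: the integral of sin(x) from 0 to π is exactly 2.
@test trapz(sin, 0.0, float(π)) ≈ 2.0 atol = 1e-6
```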
In this regard, open software, for example community codes, has been tested by the community, and I personally think it is more trustworthy than scripts written solo.
Nowadays peer review has been rendered effectively worthless because nobody can check your code to see whether a simulation or analysis was performed correctly. Publishing your code, or even better using community codes, gives a bit more assurance that the results are trustworthy.
Interesting
All kinds of software and hardware have bugs. For example, the x86 fsin instruction for computing sine returns incorrect results; this has been known for a long time now. Which is why LLVM doesn't emit an fsin assembly instruction and instead emits a call to libm's sin function.
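If you want to check this on your own machine, Julia makes it easy to inspect the generated assembly; the exact output depends on your CPU and Julia version, but you should not find an fsin instruction:

```julia
using InteractiveUtils

# Print the native assembly generated for sin(::Float64).
# On x86-64 you will see SSE/AVX math and calls, but no x87 fsin instruction.
@code_native sin(1.0)
```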
A quick look at some of Juliaâs statistical modules shows that they have simple test cases that are used to validate the results. Since I did not do an extensive survey of what various test directories contain, they will remain unnamed.
Will they work properly in edge cases? Who knows.
This is a benefit. You can see whether or not a particular functionality is tested and how extensively. How well is new functionality in Matlab / Stata / SAS tested? We don't know.
Right, if there are concrete places where packages should have more tests, please just open an issue, or better yet, help contribute those tests directly. Contributing tests is the best way to (a) confirm that the functionality you care about works as you want and (b) ensure that it continues to work through continued development.
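For example, a contributed test can be as small as checking a result you care about against an independently hand-computed value (the numbers below are a toy case, not taken from any particular package):

```julia
using Test, Statistics

@testset "sample variance matches a hand calculation" begin
    x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    # Hand-checked: the mean is 5, the sum of squared deviations is 32,
    # so the corrected (n - 1) sample variance is 32/7.
    @test mean(x) ≈ 5.0
    @test var(x) ≈ 32 / 7
end
```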
There's no need for FUD here: the code itself, along with the filed issues and ongoing development work, is right out in the open.