This is being reported by the media as a failure with Excel.
It seems more likely to be a problem with CSV-format files being used for data exchange.
Should have been using Julia of course!
How does a file format (CSV) relate to the use of a programming language (Julia) in this case?
Edit: in case your point was that CSV is a horrible way to store and exchange data, then I fully agree. But given that Excel is apparently used in the story (and in lots of other places), I'm genuinely curious what role you see for Julia?
Hi Paul. There is a lot of fuss about this in the media in the UK (I don't know if you are located there). I guess I am just flagging this up - first we see epidemiological models being front page news. Now data analysis.
According to the Register article, the problem was an old version of Excel that could not cope with that many rows of data.
I guess my remark about Julia was a bit flippant. Again, in the UK there is an outcry about why more modern methods are not being used. I guess Julia is modern!
Before you say it, I am well aware that every company in the world runs on Excel and could not function without it.
I definitely noticed this particular bit of news here in The Netherlands. Technological progress is usually hampered by the least "modern" link in the chain that can't (or won't) be upgraded. But then again, folks also choose a particular solution because it works for them right now, rather than a more modern one that they would have to wait for to become widely adopted.
I find the posts on this forum about multi-threaded CSV reading and writing a nice symptom of that situation. I recently worked on a project that used text-encoded files (basically CSV) of several GBs each, which could easily be stored in a binary-encoded format (Parquet) at around 325 MB apiece. A great step forward in terms of storage, but also in I/O performance and easier querying of subsets. But I'm afraid CSV and the like will be around for a long time.
It's also partly about training people to use different (more powerful) tools, like Julia and the things built on top of it. But Excel, or spreadsheets in general, are actually a good fit for certain types of data and workflows, and they are fairly easy to learn. So there's also a tradeoff in tool complexity versus benefit versus skills.
Going a bit off topic… I lived in Eindhoven and worked with ASML.
Sadly I could not introduce Julia there.
Regarding file formats, they had lots of images stored in directories… hundreds or thousands. Again, I would like to see that in HDF5 or a similar format… but ahh well…
Please contribute to my thread in Community and tell us what interesting things you are doing with Julia
I would be willing to bet that in 99% of all scenarios like this, someone at some point questioned the use of Excel, and was told that it will not be replaced because "it works".
@Tamas_Papp yes indeed. From the comment in the Register article:
From my understanding it was reported right up to the top so that there would be absolutely no misunderstanding that Excel should not be used. The heads of development at NHSX told PHE exactly what would happen if they used a spreadsheet system and sure enough it happened.
This quote from the article is even better (or worse):
A Reg source confirmed widespread use of the spreadsheet software as "human middleware" in the sector, scathingly describing it as the "default for all tech in all of the NHS and related quangos and other bodies… to bridge all the gaps that the 'proper' tech hasn't been designed to cope with."
I think this quote from this BBC article sums it up nicely:
"Excel was always meant for people mucking around with a bunch of data for their small company to see what it looked like," commented Prof Jon Crowcroft from the University of Cambridge.
"And then when you need to do something more serious, you build something bespoke that works - there's dozens of other things you could do.
"But you wouldn't use XLS. Nobody would start with that."
Also:
To handle the problem, PHE is now breaking down the test result data into smaller batches to create a larger number of Excel templates. That should ensure none hit their cap.
But insiders acknowledge that the current clunky system needs to be replaced by something more advanced that excludes Excel, as soon as possible.
Which cap?
Was the row limit of Excel (roughly 1 million rows) exceeded?
No, the news reports say the 65,000-row limit of the older Excel file format was exceeded. I'm guessing the actual limit is 65536…
That seems strange to me.
In Excel 2007 that row limit was increased to 1,048,576.
So either they use incredibly old software (which I doubt), or they used an *.xls file (which makes me wonder how that could happen…?).
Excel actually gives you a warning when you enter more than 65536 rows in an .xls file and try to save it.
I guess somewhere down the line there was one supervisor who didn't have an updated version of MS Office and didn't want to update their setup, so everyone else was forced to use an old format.
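To put numbers on the two limits mentioned above: 65536 is 2^16 (the per-sheet row limit of the legacy .xls format) and 1,048,576 is 2^20 (the limit since Excel 2007). Below is a minimal Python sketch of a pre-export check, assuming the data arrives as CSV text. The names `count_csv_rows` and `fits` and the sample data are made up for illustration; nothing here reflects what PHE actually ran.

```python
import csv
import io

XLS_MAX_ROWS = 2**16   # 65,536 rows per sheet in the legacy .xls format
XLSX_MAX_ROWS = 2**20  # 1,048,576 rows per sheet since Excel 2007 (.xlsx)


def count_csv_rows(text: str) -> int:
    """Count CSV records in the given text, including the header row."""
    return sum(1 for _ in csv.reader(io.StringIO(text)))


def fits(n_rows: int, limit: int) -> bool:
    """Return True if n_rows can be stored in a single sheet with this limit."""
    return n_rows <= limit


# Hypothetical sample: one header row plus 70,000 data rows.
sample = "id,result\n" + "".join(f"{i},negative\n" for i in range(70_000))
n = count_csv_rows(sample)
print(n, fits(n, XLS_MAX_ROWS), fits(n, XLSX_MAX_ROWS))  # 70001 False True
```

A check like this fails loudly before the export, instead of relying on someone noticing a warning dialog.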
They have a CSV to Excel "file format" pipeline, and they might not have used Excel at all for it (speculating; they could have used a software library). The export would specify the old format (possible even with new Excel, not just with software libraries). Being restricted to the old format (because of some software) doesn't mean you can't open it in newer Excel.
They may well have simply used an end-of-life Excel version… for exporting or importing (or weren't sure everyone had upgraded, and used that as justification for sticking with the old format).
Going a bit off topic… I lived in Eindhoven and worked with ASML.
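If a pipeline really is stuck with the old format, the batching workaround quoted earlier can be made explicit rather than left to chance. This is a rough Python sketch, assuming the rows are already in memory as lists of strings. `split_for_sheets` is a hypothetical helper, not part of any library, and how the real pipeline worked is not known from the articles.

```python
from typing import Iterator, List, Sequence

XLS_MAX_ROWS = 2**16  # 65,536 rows per sheet in the legacy .xls format


def split_for_sheets(
    header: Sequence[str],
    rows: List[Sequence[str]],
    max_rows: int = XLS_MAX_ROWS,
) -> Iterator[List[Sequence[str]]]:
    """Yield batches of at most max_rows rows, each starting with the header."""
    if max_rows < 2:
        raise ValueError("max_rows must leave room for the header and one data row")
    per_batch = max_rows - 1  # reserve one row per sheet for the header
    for start in range(0, len(rows), per_batch):
        yield [header, *rows[start:start + per_batch]]


# Hypothetical data: 150,000 test results.
header = ["id", "result"]
rows = [[str(i), "negative"] for i in range(150_000)]
batches = list(split_for_sheets(header, rows))
print(len(batches), [len(b) for b in batches])  # 3 [65536, 65536, 18931]
```

Each batch could then be written to its own sheet or file. The point is only that no row is dropped silently when the input outgrows one sheet.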
Sadly I could not introduce Julia there.
Also off-topic, but regarding this remark, have you noticed this presentation at JuliaCon 2020?
https://live.juliacon.org/talk/FHEGUA
@Klaas_Pauly I did not see that presentation. I shall watch it!
If anyone knows Jorge's handle on here or his contact details, please let me know.
Staying off topic, Matlab when run in parallel has a bad habit of leaving processes running on the worker systems. You either have to regularly terminate orphaned processes, or do what I did and implement cgroups so they are killed when the job terminates.