Xkcd Scientist Tech Help


Having to “digitize Kaplan-Meier curves” was a running joke where I worked, but whenever sources (whether journal articles or websites) provide data only in graphical form, there’s not much else you can do.


More and more journals provide vector graphics, which is almost raw data (I wonder if there is a dedicated svg2csv parser out there).
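For what it's worth, here is a rough sketch of what such an "svg2csv" step might look like, using only the Python standard library. The function names `svg_path_points` and `points_to_csv` are made up for illustration. It assumes the curve is stored as `<path>` elements whose `d` attribute uses only absolute `M`/`L` commands with one coordinate pair each, and no transforms; real journal figures often break those assumptions (curves, relative commands, nested transforms). The output is still in SVG user units with y pointing down, so it would need calibrating against the axis ticks.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# One number in SVG path syntax (simplified: optional sign, decimals, exponent).
_NUM = r"-?\d*\.?\d+(?:[eE][-+]?\d+)?"
# An absolute moveto/lineto command followed by a single x,y pair.
_SEGMENT = re.compile(rf"([ML])\s*({_NUM})[\s,]+({_NUM})")


def svg_path_points(svg_text):
    """Collect (x, y) vertices from absolute M/L commands in all <path> elements."""
    root = ET.fromstring(svg_text)
    points = []
    for elem in root.iter():
        # Drop the XML namespace prefix, e.g. "{http://www.w3.org/2000/svg}path".
        if elem.tag.rsplit("}", 1)[-1] != "path":
            continue
        for _cmd, x, y in _SEGMENT.findall(elem.get("d", "")):
            points.append((float(x), float(y)))
    return points


def points_to_csv(points):
    """Serialise the vertices as CSV text with an x,y header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["x", "y"])
    writer.writerows(points)
    return buffer.getvalue()
```

Anything that is not an absolute `M` or `L` command is silently ignored here, so this is only a starting point rather than a general SVG parser.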

This is every single meeting I go to. I’ve literally been asked to take an image of a brain and put it in an Excel file.


In undergrad, I had a PowerPoint with a bunch of images of spectra saved, but forgot the raw files on a thumb drive left in the building. I literally had to write a program to turn pixel traces into numeric values. The error rate wasn’t too bad, and I got the report done on time, but it was not fun at all.
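A minimal sketch of the pixel-trace idea, not the program described above: it assumes the image has already been loaded as rows of grayscale values (0 = black, row 0 at the top), that the trace is the only dark thing in each column, and that each axis is linear and calibrated by two known pixel/value pairs. The name `trace_to_values` and its parameters are hypothetical.

```python
def trace_to_values(image, x_axis, y_axis, threshold=128):
    """Convert a dark trace on a light background into (x, y) data points.

    image:  list of rows of grayscale ints (0-255), row 0 at the top.
    x_axis: ((pixel_col_a, value_a), (pixel_col_b, value_b)) calibration pairs.
    y_axis: ((pixel_row_a, value_a), (pixel_row_b, value_b)) calibration pairs.
    """
    def make_scale(calibration):
        # Linear map from pixel coordinate to data value through two known points.
        (p0, v0), (p1, v1) = calibration
        slope = (v1 - v0) / (p1 - p0)
        return lambda p: v0 + (p - p0) * slope

    to_x = make_scale(x_axis)
    to_y = make_scale(y_axis)

    points = []
    for col in range(len(image[0])):
        # Rows in this column that are dark enough to belong to the trace.
        dark_rows = [row for row in range(len(image)) if image[row][col] < threshold]
        if not dark_rows:
            continue  # no trace in this column
        # Use the mean dark row as the trace position (handles thick lines).
        center = sum(dark_rows) / len(dark_rows)
        points.append((to_x(col), to_y(center)))
    return points
```

Gridlines, axis labels, or overlapping traces would all confuse the "mean dark row" step, which is where most of the error in this kind of approach tends to come from.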

Reminds me of this: https://www.improbable.com/airchives/paperair/volume1/v1i2/Xerox%20Enlargement%20Microscopy-AIR-vol1-no2.pdf .

Overall this XKCD is hilarious and so true. I live and work smack dab in the middle of ML and science, in a field that was blending the two long before it was “cool”. Training NNs with ODEs? Psh, that’s 1992 stuff… Anyway, so many people in ML think what scientists want is some super algorithm to solve all problems ever. Really, ML is almost always a last resort. A lot of the classical techniques for routine analytic work are highly effective and, if implemented reasonably, are not cumbersome at all and can even be completely automated… They also tend to come with strong guarantees (error bounds, reproducibility, linearity, etc.) and good descriptions. It’s not flashy or headline-worthy, but it’s honest and completely reliable, barring serious user error.

On the flip side, you’ll see people in science doing some of the most asinine things out of pure superstition. “I use a slide rule because I’ve always used a slide rule.” “Sir, there’s no need to do all that math with pencil and paper; you know a lot of people do this sort of thing on computers.” Okay, that was hyperbole, but not far from what I’m getting at.

Fun to be had on both sides. There’s even fun to be had in the name of the people like myself, who are foolish enough to be in the middle of it all…


What scientists think they need: MATLAB, Python, R, C, C++
What scientists actually need: Julia


I’ve used WebPlotDigitizer in a bind before. Works great! haha


What scientists actually need: Julia :julia: with XLSX.jl package to read and write Excel spreadsheet files.

Question: what is the best method for inserting Julia-generated images (PNG) into an Excel file?

Dang nice - don’t think this was around back when I was suffering from the issue. Solid tip I’ll pass it along to other lost souls!

As someone who once used WebPlotDigitizer to read an entire Ternary Phase Diagram, I can say that there is still room for better tooling in this space.


The OP posted a joke. Those of us who actually have to access data that people have published in journals only in graph form perceive it slightly differently.


would be a fun Stipple.jl project!

“In a bind”

Lol. I have to use it routinely for getting data to fit models we use in real papers and reports. I still want to cry every time.

Bit of a tale from me… I am an experimental high energy physicist. LEP era.
When I was a grad student I produced histogram plots using a line printer: the HBOOK package would make plots using characters such as | and *.
Many people at the time produced theses by hiring a typist and cutting and pasting histogram plots into the pages. Cutting with scissors and pasting with paste…

I had a fancy VAX workstation and TeX software - I actually used a package called PhyTEX.
I spent literally months, if not years, preparing figures in PostScript to embed in the lovely typeset pages of my thesis. The nearest laser printer was miles away in another lab, so I used to get the proofs mailed over! Also, colour plots were done on another printer and merged.

The moral here - looking back I spent far too much time making lovely text and figures.
I really should have got a typist and got the work submitted.
The science is the important part.


My first scientific work was compiling data from various experiments from the 1960s on. Most publications were only available on paper in the libraries, and my co-worker and I had to photocopy the graphs in the papers and read the numbers off by hand. Today I would probably use a more sophisticated method for this, but we actually got it published in PhysRevD :slight_smile:


WebPlotDigitizer is the fundamental force that powers modern science, and I’m pleased that webcomics are finally starting to recognise that fact :laughing: