Interesting Paper on Pain-Points in Computational Notebooks

I found an interesting paper:

What’s Wrong with Computational Notebooks? Pain Points, Needs, and Design Opportunities

Souti Chattopadhyay, Austin Z. Henley, Ishita Prasad, Anita Sarma, and Titus Barik.

CHI '20: ACM SIGCHI Conference on Human Factors in Computing Systems, Honolulu, Hawaii.

https://austinhenley.com/pubs/Chattopadhyay2020CHI_NotebookPainpoints.pdf

The paper doesn’t mention Pluto; it reports the experiences of a sample of the authors’ colleagues at Microsoft, mainly with Jupyter. But some of the pain points are things Pluto handles very well now:

Writing code. Having to write code—particularly due to lack of code assistance—is something that IP7 “hated the most” about working in notebooks. To be efficient, they had “to know all the function names and class names correctly and have another browser open to search for help and documentation” (IP7, FP2). Coding in notebooks is even more difficult using new libraries since it’s not possible “to explore the API and functions” from within the notebook (IP8). Practically, IP8 argues, “anyone who tries to use notebooks has to start off with an IDE and then graduate into a notebook.”

Managing dependencies. Having to manage packages and library dependencies within the notebook is, to put it mildly, a “dependency hell” (IP7). Notebooks provide little-to-no support for finding, removing, updating, or identifying deprecated packages (IP3, IP7, IP9, IP11, IP12, IP13, IP15). Often, discovering what packages are even installed isn’t accessible from the notebook environment, requiring data scientists to plod over to their command-line terminal and use commands like conda and pip to manage their environment (IP3).

Versioning. There’s “a lot of room for improvement when we want to check notebooks into source control, such as being able to visualize the differences between the last version and the new version” (IP3). Using traditional versioning mechanisms intended for source code are “just a complete and utter failure” […] when versioning notebooks […] “because all the outputs are saved within the notebook, there’s a lot of state that’s bundled in the file.” In traditional source control systems, all of these changes appear as spurious differences, making it difficult to identify the actual changes between the notebooks—“there’s just a lot of mess.”

Other pain points mentioned: scaling to big data, sharing collaboratively, security.

Graham


The JetBrains DataSpell FAQ (bottom of page) suggests that they’ll look into supporting other languages once things are running smoothly. And they specifically mention Julia.
I would pay so much for that.

I grew up on Mathematica notebooks – and love the format. But the tools of a proper IDE are just so valuable now that I avoid using notebooks for any code that involves thought – if I need to, I’ll write something in an IDE and copy it into a notebook.

Really hope DataSpell takes off and embraces Julia.

I think DataSpell will not be available in a community edition.

I haven’t looked at this closely, but how different is it from the ipynb support that’s now in the Julia VS Code extension?