Yes, Python is Slow and I Don't Care

performance
python

#1

https://hackernoon.com/yes-python-is-slow-and-i-dont-care-13763980b5a1

I found this very perceptive article in my Twitter feed. The author discusses ‘Time to Market’. In the engineering and scientific sphere, Panasas have the phrase “Time to Solution”, which I think is very appropriate.
The game is about how quickly an overall workflow, or an overall solution, can produce results.
I used to work in Formula One. There is a race every two weeks during the competition season, so it is no use running a simulation that takes more than three or four days to produce results. Once the results are analysed, you then probably have to correlate them in the wind tunnel. Then, if the changes are accepted, new parts have to be made in carbon fibre and shipped out to the track. They have to arrive at the track on the Friday of a race weekend if they are to be used in practice.


#2

A couple of remarks:

  1. AFAICT the context for the author is network services, and the argument is that the real bottleneck is the network. This is vastly different from most of scientific computing, where the bottleneck is the CPU/memory (there are of course networked parallel applications, but they use fast, low-latency networks, and they are networked because the CPU is still the bottleneck).

  2. Nevertheless, the trade-off between programmer and CPU time is well-known and, while very relevant, there is nothing new here. Just that for Julia, the boundary is shifted out: you can be faster with the same amount of work, or insanely faster with even more work, or the same speed with less work.

  3. All the advantages of Python mentioned in the article apply to Julia (well, the platonic ideal Julia is striving for and will hopefully approximate a few months after 1.0 is released). But one has to make fewer sacrifices in terms of speed. This is the whole point.


#3

Perhaps we will not agree with this point: “Under this logic, you could say that choosing a language for your application simply because it’s ‘fast’ is the ultimate form of premature optimization. You’re choosing something supposedly fast without measuring, without understanding where the bottleneck is going to be.”

The next section, on optimizing Python, discusses introducing Cython to optimise the critical parts of the code. This is of course one of the rationales for Julia: you don’t need to start doing this.
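The quoted advice (measure first, then move only the hot spots to something like Cython) can be sketched with Python’s standard-library profiler. This is an illustration under stated assumptions, not the article’s own code: `parse`, `pairwise_distance_sum` and `workflow` are hypothetical stand-ins for a real application, with a deliberately quadratic step so that the profile has an obvious bottleneck.

```python
import cProfile
import io
import pstats


def parse(lines):
    # Hypothetical cheap step: split each CSV-like record into integers.
    return [[int(x) for x in line.split(",")] for line in lines]


def pairwise_distance_sum(rows):
    # Hypothetical expensive step: O(n^2) sum of squared distances.
    # This is the kind of function you would consider moving to Cython.
    total = 0
    for a in rows:
        for b in rows:
            total += sum((x - y) ** 2 for x, y in zip(a, b))
    return total


def workflow(n=300):
    # Synthetic input standing in for real data.
    lines = [f"{i},{i * 2},{i * 3}" for i in range(n)]
    return pairwise_distance_sum(parse(lines))


def profile_top(func, limit=10):
    """Run func under cProfile and return the report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_top(workflow))
```

The report should put `pairwise_distance_sum` near the top and `parse` far below it, which is the measurement that would justify optimising one function rather than switching languages wholesale.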

I don’t agree with everything that is said in this article, but it is very much worth reading. If Julia is the tool that can give you a fast time to market, time to solution, or time to diagnosis, then that is what you need to deploy.


#4

I bristle at this sort of thing, as I was basically introduced to Python twice under the mantra that speed is irrelevant, and quickly found that the slowness was unacceptable. The first time was in an academic context, where I very quickly found that Python was only useful for scripting “glue” code (and I have to say that at the time I still frequently could not see how that made my life any easier than just writing all of my code in C++). More recently, when I started work as a data scientist, all I heard everywhere was “Oh, we use this thing called Python and it’s great and super slow and we don’t care, we love it.” Lies. If I can’t iterate over data with an arbitrary piece of code without worrying about performance, it’s too slow. I don’t want to dig through the documentation of the really very opaque pandas every time I want to do something trivial. There’s no reason I should have to worry about mind-numbing slowness when doing something as simple as figuring out the day of the week for 10^7 different dates; I should be able to just do it. Even worse, pretending that there is no such thing as data types in data science is completely absurd.
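The day-of-week example can be made concrete. This is a minimal sketch, assuming stdlib-only Python and synthetic random dates; the helper names `random_dates` and `weekday_counts` are hypothetical. It times exactly the kind of plain per-element loop the post wants to be able to write without a second thought. In pandas, the idiomatic route would instead be the vectorized `Series.dt.dayofweek` accessor, which is the sort of API lookup the post objects to.

```python
import random
import time
from collections import Counter
from datetime import date, timedelta


def random_dates(n, seed=0):
    """Hypothetical test data: n pseudo-random dates from 1970-01-01 to roughly 2070."""
    rng = random.Random(seed)
    start = date(1970, 1, 1)
    return [start + timedelta(days=rng.randrange(36525)) for _ in range(n)]


def weekday_counts(dates):
    """The 'just do it' version: an ordinary Python loop, one date at a time."""
    counts = Counter()
    for d in dates:
        counts[d.weekday()] += 1  # Monday == 0 ... Sunday == 6
    return counts


if __name__ == "__main__":
    n = 10**6  # the post mentions 10**7; scale up if you have the patience
    dates = random_dates(n)
    t0 = time.perf_counter()
    counts = weekday_counts(dates)
    elapsed = time.perf_counter() - t0
    print(f"{n} dates in {elapsed:.2f} s: {dict(sorted(counts.items()))}")
```

Timings depend entirely on the machine, so none are quoted here; the point is that this loop is the natural thing to write, and in CPython its cost grows with the per-element interpreter overhead.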

This article proudly proclaims that performance doesn’t matter. Indeed, there are applications where that is the case, and I would agree that Python can be very nice for those applications. I think Python is a wonderful alternative to bash scripts (which I despise), and that’s one of my favorite uses for it. That really is a good place for Python; Julia, for instance, is bad for this, because it is one of the cases where you don’t want to wait for anything to compile. I’d also enjoy using Python to, for instance, build plugins for vim. I haven’t done any work on network services, but if somebody who does that for a living told me that Python was great for that, I would be happy to take their word for it.

That said, shouldn’t we at least be bothered by the fact that, even in places where Python is appropriate, it is so largely by historical accident, and there is no real need for it to be so slow?


#5

I am glad to see that I have triggered some debate.
@Tamas_Papp - I absolutely agree. The article focuses on network services, where the bottleneck will be the network, not whether the code is optimized.

@ExpandingMan - strangely enough, my task for today is to set up disks for a testbed for IBM Spectrum Scale storage.
I have deliberately chosen to write a bash script to do the setup, rather than Python. The task involves repeatedly setting up disk partitions in the same fashion, and writing the configuration for each partition to a text file, which will then be read by the configuration utility. I find it more natural to use bash to repetitively create output lines like that!