My goal is to compare Julia’s performance against other languages like Python, Scala and Rust on some simple tasks. My first task was to sort an array of 999999 integers read from a text file.
The code below takes a similar time in Python and in Julia:
import time
st = time.time()
f = open("random_numbers.txt", "r")
lines = f.readlines()
numbers = list(map(lambda x : int(x.strip()), lines))
numbers.sort()
numbers.reverse()
et = time.time()
elapsed_time = et - st
print('Execution time:', elapsed_time, 'seconds')
But when I add printing to the code, there is a significant performance discrepancy between Julia and Python:
import time
st = time.time()
f = open("random_numbers.txt", "r")
lines = f.readlines()
numbers = list(map(lambda x : int(x.strip()), lines))
numbers.sort()
numbers.reverse()
for n in numbers:
    print(n)
f.close()
et = time.time()
elapsed_time = et - st
print('Execution time:', elapsed_time, 'seconds')
It’s all about 8 seconds for both Python and Julia, but I don’t know what you’re measuring at this point: reading from file? parsing? sorting? reversing? printing? This is all getting very confusing.
This is why mixing multiple things together isn’t helpful: the fact that printing might be slow isn’t surprising at all, and trying to optimise multiple things together when there’s a single significant bottleneck is a waste of time.
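To separate the phases, something along these lines would do. This is only a minimal sketch: the function name `time_phases` is made up for illustration, and it works on synthetic in-memory text and writes to a `StringIO` buffer rather than reading `random_numbers.txt` and printing to a terminal, so file and terminal IO costs are not captured here.

```python
import io
import random
import time


def time_phases(text, out):
    """Run split, parse, sort and print separately; return (numbers, timings)."""
    timings = {}

    t0 = time.perf_counter()
    lines = text.splitlines()
    timings["split"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    numbers = [int(line) for line in lines]
    timings["parse"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    numbers.sort()
    numbers.reverse()
    timings["sort"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    # One write for the whole output instead of one print call per number.
    out.write("\n".join(map(str, numbers)) + "\n")
    timings["print"] = time.perf_counter() - t0

    return numbers, timings


if __name__ == "__main__":
    # Assumption: synthetic data standing in for the original input file.
    random.seed(0)
    text = "\n".join(str(random.randint(0, 10**6)) for _ in range(999_999))
    _, timings = time_phases(text, io.StringIO())
    for phase, seconds in timings.items():
        print(f"{phase:>6}: {seconds:.4f} s")
```

Swapping the `StringIO` buffer for `sys.stdout` should show how much of the total is spent on output alone.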
Keep in mind that you are comparing Python’s TimSort, implemented in C, against Julia’s default QuickSort. Not only are these different algorithms, but you are also comparing C plus Python’s call overhead against a pure Julia solution. A fair comparison would be to implement the same algorithm in Python and then compare. Otherwise it’s just silly, because you are basically starting another program from Python to make a claim about Python. Julia can also call C routines.
What I am saying is that the conclusion is false, regardless of what defaults people would use. It’s the same as calling a C library from Julia in an attempt to benchmark Julia’s speed. That’s just a dishonest benchmark, regardless of what the defaults are. Julia is good at certain things, and so is Python, but calling other software, let alone another algorithm, to say something about a language is wrong. He could compare algorithms via C calls from one of the languages if he wanted to compare algorithms. He could compare implementations in these two languages if he wanted to compare languages. Julia has all sorts of sorting algorithms, and so does Python, but timing how things are dispatched carries almost no information about how fast Python is. It’s misleading information for someone who is trying to learn a new language, for example.
I disagree with this. It’s not unreasonable to expect Julia to be faster than the C code Python calls, and when benchmarking sorting against Python, the relevant comparison is Timsort vs. the default Julia sort. This isn’t a great sorting benchmark for other reasons (i.e. most of the time is IO), but it is a decent benchmark of doing basic data science in Julia vs. Python.
To me, Julia’s website already has benchmarks done properly, comparing what they claim to be comparing. There, C seems to be the baseline for most of the tests. Why would it be slower? Just have a look. You typically don’t benchmark numpy calls to show that Python is fast, because the very reason for having numpy in the first place is that Python is absolutely slow. But if you do make such a claim, be honest and say it’s numpy’s speed. Regarding data science, people mean lots of different things by that, but an application where sort() call speed matters is probably not one where you benefit from Python. I use Python a lot, but it’s absolutely a terrible tool for applications where speed matters, such as algorithm development. As soon as you want something that is not in the toolbox, you are screwed. Most of the stuff you run in Python is not even Python, because the authors who praise it so much shy away from implementing it in their very own favourite language.
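To illustrate the kind of comparison I mean, here is a rough sketch that times the C-implemented built-in `sorted` against a sort written in Python itself. The `quicksort` below is a simple, non-in-place three-way-partition variant chosen for brevity; it is not Julia’s implementation and not a tuned one, and the helper `timed` and the input size are arbitrary choices for illustration.

```python
import random
import time


def quicksort(values):
    """Pure-Python quicksort (not in place); returns a new sorted list."""
    if len(values) <= 1:
        return list(values)
    pivot = values[len(values) // 2]
    smaller = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    larger = [v for v in values if v > pivot]
    return quicksort(smaller) + equal + quicksort(larger)


def timed(func, data):
    """Call func(data) once and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    result = func(data)
    return result, time.perf_counter() - t0


if __name__ == "__main__":
    random.seed(0)
    data = [random.randint(0, 10**6) for _ in range(200_000)]

    builtin_result, builtin_s = timed(sorted, data)
    pure_result, pure_s = timed(quicksort, data)

    # Both approaches must agree before the timings mean anything.
    assert builtin_result == pure_result
    print(f"built-in sorted (C):   {builtin_s:.4f} s")
    print(f"pure-Python quicksort: {pure_s:.4f} s")
```

A single run like this is noisy; repeating each measurement several times and taking the minimum would give steadier numbers.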
In the end, everything just calls machine instructions. I think it’s very reasonable to measure the “speed” of a language based on how easy it is to write performant code. For many use cases, Python is fast because numpy is fast. I don’t think that’s some kind of “gotcha,” it’s just true.
it’s absolutely a terrible tool for applications where speed matters
well, except for nearly 100% of mainstream deep learning