Poor time performance on Dict?

Agreed, and I thought this was a fair assumption since the complexity of the code given in the example is as simple as it gets, and thus given the large difference in normalized performance, if they aren’t exactly the same cost underneath the hood in C, I would argue that the differences seem too large to attribute to differences in indexing of arrays.

I wouldn’t argue against you, but the overhead of looping 10^8 items in Python is usually very large. Agreed though, this is all too rough a comparison, however, it does not change the fact that the relative performance is as posted. That is to say, in Python, I can get as efficient looping using both Numpy arrays and Dictionaries. In Julia (as per the example given), I cannot. But I think you’re all getting the wrong idea here that this is supposed to be again a comparison that Python is performing better in any way, I only thought I would get a similar rough relative performance in both languages. If the consensus is that the Julia’s dict performance is reasonable enough, then that’s that.

What Steven and Kristoffer are trying to get at is that this isn’t quite the right mental model here given the asymmetries in the amount of work that gets done. Slightly more accurate would be:

JR = (julia_dict_implementation + julia_call_overhead) / (julia_array_implementation + julia_call_overhead)
PR = (python_dict_implementation + python_call_overhead) / (python_array_implementation + python_call_overhead)

Julia’s call overhead is approximately 0. The python call overhead is definitely nonzero. The array implementations are also really small. The dict implementations actually have to do work. That’s why your ratio is surprising you — you’re not accounting for a constant term.

2 Likes

This statement makes no sense to me. All of your examples show Python code that is significantly slower than optimized Julia code. Your Numpy and Dictionary loops are each inefficient in their own ways (with the Numpy looping code exhibiting vastly more overhead), not equally “efficient.”

(Of course, there are highly optimized libraries that are accessible from Python. But this is not a statement about the language performance, only about the maturity of its libraries, and it says nothing about what will happen when you have to write your own performance-critical code.)

Agreed, this has nothing to do with language performance. It only has to do with the fact that with Julia Arrays, it takes about 0.3 seconds for me to iterate over 10^8 in the way I did, however, using Dict in the way I did, was too slow in comparison. We can remove Python out of the equation as this seems to be a point of tension around here.

EDIT: I only hinted at the fact that this difference was very small in Python compared to the larger difference in Julia, again though, nothing to do with the performance of Python, or Julia in any way or form.

What you are observing is not that Julia dictionaries are slow (as I showed above, with an optimized hash function they are significantly faster than CPython’s dictionary functions called from C), but rather that arrays are extremely fast. This is not surprising, and will also be true in any fast language, because an array is a very simple data structure that corresponds closely to basic CPU instructions, whereas a dictionary is much more complicated (each dictionary access involves many CPU instructions).

The fact that looping over arrays and dictionaries in Python have comparable speed is a consequence of pure Python code (calling fine-grained library functions like accessing individual array elements) being slow, not of its dictionaries being fast.

5 Likes

Thanks for everyone’s input.

1 Like