Performance of using strings as keys in Dict (vs. integers)

e3c6 · July 18, 2017, 12:39pm

I have a large Dictionary, and currently I am using strings as keys. I am getting bad performance, which could be related to this (but I am not sure). I am wondering if it makes sense to rewrite the code so that the Dictionaries use Ints as keys, instead of Strings.

I would just have to keep track somewhere else of the integer index for each string, but I would only use that mapping to print some output for the user. The strings are actually irrelevant to the code logic.

Before I rewrite my code (could take some time), I would like some advice. Could this have an impact on performance?
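For reference, the rewrite being considered could look roughly like the sketch below. It assumes recent Julia syntax, and the names `StringIndex` and `id!` are hypothetical helpers made up for illustration, not anything from Base or from the actual code.

```julia
# Hypothetical helper: give every distinct string a dense integer id.
struct StringIndex
    ids::Dict{String,Int}     # string -> id, needed only while loading data
    names::Vector{String}     # id -> string, needed only for printing
end

StringIndex() = StringIndex(Dict{String,Int}(), String[])

# Return the id of `s`, assigning the next free id on first sight.
function id!(idx::StringIndex, s::AbstractString)
    key = String(s)
    return get!(idx.ids, key) do
        push!(idx.names, key)
        length(idx.names)
    end
end

idx = StringIndex()
counts = Dict{Int,Int}()          # the main logic only ever sees Int keys

for word in ["apple", "pear", "apple"]
    k = id!(idx, word)
    counts[k] = get(counts, k, 0) + 1
end

# Translate back to strings only when reporting to the user.
for (k, n) in counts
    println(idx.names[k], " => ", n)
end
```

Since the ids are dense (1, 2, 3, ...), a plain `Vector` indexed by id might replace the `Dict{Int,Int}` altogether, which would avoid hashing entirely.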

jtravs · July 18, 2017, 1:04pm

Have you tried profiling?
https://docs.julialang.org/en/latest/manual/profile/

e3c6 · July 18, 2017, 1:32pm

If I profile the code, how will that tell me if changing to Int keys will improve performance?

jtravs · July 18, 2017, 2:47pm

Well, the profiling will tell you how much time is taken up accessing the Dicts (presumably your code does something else too?). If it is a small percentage already, then switching to Ints will have no effect.
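A minimal sketch of what that could look like, assuming a recent Julia version where `Profile` is a standard library (on 0.6 it lives in Base). The `workload` function and the key names are made up and only stand in for your real code:

```julia
using Profile

# Stand-in for the real program: repeated lookups in a string-keyed Dict.
function workload(d::Dict{String,Int}, key_list::Vector{String})
    total = 0
    for _ in 1:200
        for k in key_list
            total += d[k]         # the Dict access we want to measure
        end
    end
    return total
end

key_list = [string("key_", i) for i in 1:10_000]
d = Dict{String,Int}(k => i for (i, k) in enumerate(key_list))

workload(d, key_list)             # run once so compilation is not profiled
Profile.clear()
@profile workload(d, key_list)

# Flat listing of sample counts per function. Look for how many samples
# land in hashing and Dict indexing compared to the rest of the program.
Profile.print(format=:flat, sortedby=:count)
```

If the hashing and lookup frames account for only a few percent of the samples, changing the key type is unlikely to be worth the rewrite.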

jpsamaroo · July 18, 2017, 9:28pm

In my testing for a package that I’m working on (which uses Dict{String,X} as a way to associate a String UUID with whatever X is), using strings of length 16 performed about as well as using a UInt64, which was the alternative that I was comparing against. I was mostly iterating over all the key/value pairs in the dictionary, not doing much else.

I think this only worked well, however, because the strings fit nicely into SIMD lanes, so they can be quickly compared and such. If you have a mix of different string lengths, and/or they don’t fall on a power-of-two boundary, then you may get worse results. But please take what I say with a grain of salt, I’m no expert on how strings are processed in Julia. Profiling and benchmarking a few operations that are common in your application/package would be the way to go.
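Something along these lines could serve as a starting point for such a benchmark. It is only a rough sketch: it times lookups rather than iteration, the helper `sum_lookups` and the sizes are arbitrary choices for illustration, and `@elapsed` is used just to keep it dependency-free (a dedicated benchmarking package would give more reliable numbers).

```julia
using Random

# Look up every key once and add up the values.
function sum_lookups(d::Dict, ks::Vector)
    s = 0
    for k in ks
        s += d[k]
    end
    return s
end

n = 100_000
str_keys = [randstring(16) for _ in 1:n]   # 16-character random strings
int_keys = rand(UInt64, n)                 # the integer alternative

d_str = Dict(k => 1 for k in str_keys)
d_int = Dict(k => 1 for k in int_keys)

# Warm up so that compilation time is not included in the timings.
sum_lookups(d_str, str_keys)
sum_lookups(d_int, int_keys)

t_str = @elapsed sum_lookups(d_str, str_keys)
t_int = @elapsed sum_lookups(d_int, int_keys)

println("String keys: ", t_str, " s")
println("UInt64 keys: ", t_int, " s")
```

Changing the length passed to `randstring`, or mixing several lengths, would be one way to check whether string length matters in your case.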

ScottPJones · July 19, 2017, 4:17pm

What versions of Julia have you been testing on?

alfred · July 23, 2017, 11:52pm

Hello all,

I am having the same problem reported by cossio. Basically 100% of my code is related to string handling, dictionaries, hashes, and so on. My dicts are huge and I must use string content as keys. Aside from that, I make use of a large number of fuzzy-matching and string distance functions, while constantly reading and writing a huge number of CSV or TSV files.

I read in this polemic thread: https://www.reddit.com/r/Julia/comments/629qkz/about_a_year_ago_an_article_titled_giving_up_on/
that string handling and text formatting is one of the areas where Julia is being improved, and I am very happy with that, mainly because I love the language.

Given the core business of my company, I have managed a lot of projects related to string data frames in recent years. We ran a lot of tests with new languages to find the best option for us besides Python.

Although I am in love with Julia, I must confess its performance is not as good as that of libraries like Pandas, NimData (a DataFrame library written in Nim), or Kniren (a DataFrame and data wrangling library in Go).

Anyway, I will certainly continue to support the language and spread the word among my colleagues. I am sure upcoming Julia releases will improve in the string handling area.

ChrisRackauckas · July 24, 2017, 12:07am

That’s not the languages, those are libraries built on the languages. That’s very different. Besides, Julia does have stuff like Pandas.jl if you really need the features and performance of Pandas right now.

I think the problem here isn’t strings but that dictionaries leave a lot of performance behind. I don’t know how that compares to other languages though.

cstjean · July 24, 2017, 1:34pm

If you can post an example of slow code (with comparison to Python), I’m sure that people here will be happy to help.
