Char vs. String for Dict key

RemyX · August 11, 2017, 2:17pm

I’m getting to know Julia and found some quick little exercises at exercism.io (http://exercism.io/languages/julia/exercises). In the Nucleotide Count exercise, based on the runtests.jl file, the preferred Dict structure is:
Dict('A' => 0, 'C' => 0, 'G' => 0, 'T' => 0)

Is this preferred to String-based keys, i.e. Dict("A" => 0)?

My first solution did this for String keys (trimmed slightly). count is the Dict, myStr the input string.
for i in myStr
    count["$i"] += 1
end

This failed the test because the key has the wrong data type (a String rather than a Char). So I switched to indexing into the input string, which worked but felt less elegant somehow. I realize it’s quite subjective.
for i in 1:length(myStr)
    count[myStr[i]] += 1
end
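
The type mismatch behind that failure can be reproduced in a few lines. This is only a minimal sketch: `counts` and `seq` are hypothetical names chosen for illustration, not taken from the exercise.

```julia
# Char-keyed Dict, matching the structure expected by runtests.jl
counts = Dict('A' => 0, 'C' => 0, 'G' => 0, 'T' => 0)
seq = "GATTACA"   # hypothetical input

c = first(seq)            # iterating a String yields Char values
@assert c isa Char
@assert "$c" isa String   # interpolation builds a String, a different type

@assert haskey(counts, c)       # the Char key is found
@assert !haskey(counts, "$c")   # the String "G" is not equal to the Char 'G'

# Iterating directly over the string avoids both interpolation and indexing
for ch in seq
    counts[ch] += 1
end
```

Indexing with `counts["$c"]` would therefore throw a KeyError, which is consistent with the test failure described above.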

Is there a reason to choose one over the other, aside from passing the supplied runtests.jl? Style, idiom, performance?

Thanks.

mauro3 · August 11, 2017, 2:29pm

Most idiomatic, I think, is to use symbols:

Dict(:A => 0, :C => 0, :G => 0, :T => 0)

although for this particular exercise that might not be an option.

I think chars are cheaper than strings (a Char is a small fixed-size value, while a String is a variable-length object), so they should be more performant. I would expect symbols to be performant as well. Why don’t you benchmark your variants using BenchmarkTools.jl?

RemyX · August 11, 2017, 2:49pm

Thanks for the reply. I played around with converting a String to a Symbol and realized that iterating over a string yields Char values. I’m not sure how I missed that before. The following works just fine for the purposes of the provided exercise:
for i in str1
    if !haskey(count, i)
        throw(ErrorException("Key not found"))
    end
    count[i] += 1
end
On a side note, figuring out how to get from String to Symbol to Char and back is entertaining.
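
For reference, one possible set of round-trip conversions is sketched below. It assumes a recent Julia 1.x release, and all variable names are hypothetical.

```julia
s = "A"

sym = Symbol(s)           # String -> Symbol
str = String(sym)         # Symbol -> String
c = first(str)            # String -> Char (first character of the string)

char_str = string(c)      # Char -> String
char_sym = Symbol(c)      # Char -> Symbol

println((sym, str, c, char_str, char_sym))
```

Note that `first(str)` only makes sense here because the string holds a single character; a longer string would need to be iterated instead.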

Thanks for your recommendation of BenchmarkTools, I’ll take a look at it.

mauro3 · August 11, 2017, 3:06pm

(Note you can use triple backticks to typeset whole blocks of code at once)

stevengj · August 11, 2017, 3:32pm

Note that a String is generally a more heavyweight object than a Char. A String combines a length and an array of (UTF-8-encoded) characters (though technically it no longer uses an Array), whereas a Char is just one number (that can fit in a single CPU register). Operations on strings will typically be more expensive than operations on chars, e.g.

julia> k = "A"; @btime hash($k);
  10.671 ns (0 allocations: 0 bytes)

julia> k = 'A'; @btime hash($k);
  3.790 ns (0 allocations: 0 bytes)

julia> k = "A"; @btime isequal($k,$k);
  5.041 ns (0 allocations: 0 bytes)

julia> k = 'A'; @btime isequal($k,$k);
  1.893 ns (0 allocations: 0 bytes)

mauro3 · August 11, 2017, 3:46pm

And symbols are as fast as chars:

julia> @btime isequal(:a,:a);
  1.689 ns (0 allocations: 0 bytes)

julia> @btime isequal('k', 'c');
  1.688 ns (0 allocations: 0 bytes)

julia> @btime isequal("k", "v");
  5.033 ns (0 allocations: 0 bytes)

stevengj · August 11, 2017, 4:18pm

Not for all operations. e.g. hash and isless are both slower for symbols.
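
Since the per-operation timings above differ by key type, a rough end-to-end comparison on the counting task itself may also be informative. The sketch below uses only Base (`@elapsed`, not BenchmarkTools); `count_with` and `seq` are hypothetical names. It times the whole loop, including converting each Char to the key type, so it overstates the pure lookup cost for String and Symbol keys, and the numbers will vary by machine and Julia version.

```julia
# Count nucleotides in `seq`, using keys produced from each Char by `keyof`.
function count_with(keyof, seq::AbstractString)
    counts = Dict(keyof(c) => 0 for c in "ACGT")
    for ch in seq
        counts[keyof(ch)] += 1
    end
    return counts
end

seq = repeat("GATTACA", 100_000)   # hypothetical test input

# First calls also serve as a warm-up so compilation is not timed below.
by_char   = count_with(identity, seq)   # Char keys
by_string = count_with(string, seq)     # String keys
by_symbol = count_with(Symbol, seq)     # Symbol keys

t_char   = @elapsed count_with(identity, seq)
t_string = @elapsed count_with(string, seq)
t_symbol = @elapsed count_with(Symbol, seq)

println("Char keys:   ", t_char, " s")
println("String keys: ", t_string, " s")
println("Symbol keys: ", t_symbol, " s")
```

A single `@elapsed` run is noisy; for careful measurements, `@btime` from BenchmarkTools, as used earlier in the thread, is the better tool.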
