Converting a Vector{Vector{String}} to Dict{String, String}

Cortexture · December 17, 2023, 12:06pm

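using Pipe  # provides the @pipe macro

# readlines(path) opens and closes the file itself
text_vector = readlines("../data/exploring_the_wonders_of_nature.txt")
# strip commas and periods, turn em-dashes into spaces, then split into words
words = replace(text_vector[1],
                  "," => "", "." => "",
                  "—" => " ") |> split .|> String
# skip the ";;;" comment lines in the CMU dictionary file
cmu_dataset = filter(!startswith(";;;"), readlines("../data/cmudict-0.7b"))
# each entry is "WORD  PHONEMES": split on the double space, convert SubStrings to String
entries_vector = @pipe split.(cmu_dataset, "  ") |> map.(String, _)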
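The word-cleaning step can be checked on a short sample. This is a sketch with a made-up sentence standing in for text_vector[1] (the variable name line is hypothetical); replace with several pairs at once needs Julia 1.7 or later.

```julia
# Made-up sample sentence standing in for text_vector[1]
line = "Rivers flow, birds sing—nature wakes."

# Same pipeline as above: drop "," and ".", turn "—" into a space,
# split on whitespace, and convert each SubString to a String
words = replace(line, "," => "", "." => "", "—" => " ") |> split .|> String
```

The result is a plain Vector{String}, ready to be used as dictionary keys.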

I'm trying to convert words to their ARPAbet equivalents using the CMU Pronunciation Dictionary. entries_vector is the Vector{Vector{String}}: a vector of words, in which each element is a vector holding the word and its ARPAbet equivalent. I would like to create a dictionary with the word as the key and its ARPAbet transcription as the value.
I am lost.
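To make the shape concrete, here is a small sketch built from two hand-written lines in the cmudict layout (word, two spaces, phonemes). The sample lines and the names cmu_lines and target are illustrative only, not copied from the dictionary file.

```julia
# Hand-written lines in the cmudict layout: word, two spaces, phonemes (illustrative only)
cmu_lines = ["HELLO  HH AH0 L OW1", "WORLD  W ER1 L D"]

# Same parsing idea as the code above: split on the double space, convert SubStrings to String
entries_vector = [String.(split(line, "  ")) for line in cmu_lines]
# entries_vector is a Vector{Vector{String}}: one [word, arpabet] vector per line

# The hoped-for result: word => ARPAbet transcription
target = Dict("HELLO" => "HH AH0 L OW1", "WORLD" => "W ER1 L D")
```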

Cortexture · December 17, 2023, 12:11pm

I came up with this solution, but I would like something more elegant, ideally in a more functional style.

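mykeys = Vector{String}()    # the words
myvalues = Vector{String}()  # their ARPAbet transcriptions
for entry ∈ entries_vector
    push!(mykeys, entry[1])
    push!(myvalues, entry[2])
end
# pair each word with its transcription and build the Dict
entries = Dict(mykeys .=> myvalues)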
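The loop can also be written without mutation by broadcasting getindex over the entries. This is a minimal sketch, assuming every inner vector is [word, arpabet]; first. and last. would work equally well here, and the sample data is illustrative. Since cmudict-0.7b lists its headwords in upper case, a lookup needs uppercase(word), and get with a default avoids a KeyError for words missing from the dictionary.

```julia
# Illustrative data in the [word, arpabet] layout
entries_vector = [["HELLO", "HH AH0 L OW1"], ["WORLD", "W ER1 L D"]]

mykeys = getindex.(entries_vector, 1)    # first element of every entry
myvalues = getindex.(entries_vector, 2)  # second element of every entry
entries = Dict(mykeys .=> myvalues)      # a Dict{String, String}

# Headwords are upper case, so normalise before looking up;
# get returns the default (here `missing`) for words not in the dictionary
hello_arpabet = get(entries, uppercase("hello"), missing)
unknown = get(entries, uppercase("xyzzy"), missing)
```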

DNF · December 17, 2023, 12:37pm

Then the first vector is redundant, right? You just need the second? But it’s not clear how it’s organized. Can you cut and paste a bit of the data, a few entries, so we can see the expected input and output?

TheLateKronos · December 17, 2023, 12:59pm

Does

Dict([entries_vector[1][i] => entries_vector[2][i] for i in eachindex(entries_vector[1])])

do what you want? I am on my phone, so I cannot test it. I have made a strong assumption about the two vectors in entries_vector being of equal length.
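The equal-length assumption can be checked rather than trusted by passing both vectors to eachindex. A minimal sketch, assuming the layout this answer has in mind (entries_vector[1] holds the words, entries_vector[2] the transcriptions); the sample data and the names ks and vs are hypothetical.

```julia
# Layout assumed by this answer (hypothetical sample): words first, transcriptions second
entries_vector = [["HELLO", "WORLD"], ["HH AH0 L OW1", "W ER1 L D"]]
ks, vs = entries_vector[1], entries_vector[2]

# eachindex(ks, vs) throws a DimensionMismatch when the two vectors do not
# share the same indices, so a length mismatch fails loudly
d = Dict([ks[i] => vs[i] for i in eachindex(ks, vs)])

# Demonstrate the failure mode on mismatched vectors
threw = try
    eachindex(["A", "B"], ["x"])
    false
catch err
    err isa DimensionMismatch
end
```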

DNF · December 17, 2023, 2:55pm

Remove the square brackets: they create an intermediate, redundant vector before the conversion to Dict.
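A small sketch of the difference, with hypothetical sample vectors ks and vs: both forms produce the same Dict, but the bracketed comprehension first materialises a Vector of pairs, while the generator hands each pair straight to the Dict constructor.

```julia
ks = ["HELLO", "WORLD"]
vs = ["HH AH0 L OW1", "W ER1 L D"]

# Comprehension: builds a Vector{Pair{String, String}} first, then fills the Dict from it
d1 = Dict([ks[i] => vs[i] for i in eachindex(ks)])

# Generator: the Dict consumes the pairs one at a time, with no intermediate vector
d2 = Dict(ks[i] => vs[i] for i in eachindex(ks))
```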

Seif_Shebl · December 17, 2023, 4:42pm

Or simply:

Dict(i => j for (i,j) in entries_vector)

This relies on the first and second elements of each inner vector always being a word and its ARPAbet pronunciation, respectively.
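A sketch of this destructuring approach on illustrative data. One caveat worth guarding against: destructuring (word, arpabet) silently ignores any extra elements in a longer inner vector, so the check below (an addition, not part of the answer) verifies that every entry has exactly two elements first.

```julia
# Illustrative data in the [word, arpabet] layout produced by split.(cmu_dataset, "  ")
entries_vector = [["HELLO", "HH AH0 L OW1"], ["WORLD", "W ER1 L D"]]

# Destructuring would drop extra elements silently, so check the two-element assumption
all(e -> length(e) == 2, entries_vector) || error("every entry should be [word, arpabet]")

# (word, arpabet) unpacks each inner vector; the generator feeds the pairs into Dict
entries = Dict(word => arpabet for (word, arpabet) in entries_vector)
```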

TheLateKronos · December 18, 2023, 4:32pm

It actually appears that an iterator over 2-tuples works as well, so

Dict(zip(entries_vector...))

is quite concise and readable, and should do the same thing without a comprehension.
Disclaimer: On phone, not tested.
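A short sketch, with made-up sample data, of how zip behaves under the two readings of entries_vector in this thread. Dict accepts any iterator of 2-tuples, so zipping two vectors works directly. If entries_vector is instead a vector of [word, arpabet] rows, zip(entries_vector...) would group elements by position across every row rather than pairing each word with its transcription, so taking first. and last. of each row is one alternative.

```julia
# Two-vector layout (the one this answer assumes): zip pairs the vectors element-wise
ks = ["HELLO", "WORLD"]
vs = ["HH AH0 L OW1", "W ER1 L D"]
d_cols = Dict(zip(ks, vs))   # equivalent to Dict(zip([ks, vs]...))

# Row layout ([word, arpabet] per entry): build the key and value vectors first
rows = [["HELLO", "HH AH0 L OW1"], ["WORLD", "W ER1 L D"]]
d_rows = Dict(zip(first.(rows), last.(rows)))
```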
