Converting a Vector{Vector{String}} to Dict{String, String}

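using Pipe

# readlines(path) opens and closes the file itself.
text_vector = readlines("../data/exploring_the_wonders_of_nature.txt")
# Strip commas and periods, turn em dashes into spaces, then split into words.
words = replace(text_vector[1],
                  "," => "", "." => "",
                  "—" => " ") |> split .|> String
cmu_dataset = readlines("../data/cmudict-0.7b")
# Split each dictionary line on the two-space separator between word and pronunciation.
entries_vector = @pipe split.(cmu_dataset, "  ") |> map.(String, _)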

I am trying to convert words to their ARPAbet equivalents using the CMU Pronouncing Dictionary. entries_vector is the Vector{Vector{String}}: an outer vector of entries, where each inner vector holds a word and its ARPAbet equivalent. I would like to create a dictionary with the word as the key and its ARPAbet transcription as the value.
I am lost.

I came up with this solution, but I would like something more elegant and more functional.

mykeys = Vector{String}()
myvalues = Vector{String}()
for entry ∈ entries_vector
    push!(mykeys, entry[1])
    push!(myvalues, entry[2])
end
entries = Dict(mykeys .=> myvalues)

Then the first vector is redundant, right? You just need the second? But it’s not clear how it’s organized. Can you cut and paste a bit of the data, a few entries, so we can see the expected input and output?

Does
Dict([entries_vector[1][i] => entries_vector[2][i] for i in eachindex(entries_vector[1])])
do what you want? I am on my phone, so I cannot test it. I have made a strong assumption about the two vectors in entries_vector being of equal length.

Remove the square brackets: they create an intermediate, redundant vector before the conversion to a Dict.
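
To illustrate the difference, here is a minimal sketch on made-up toy data (the names two_columns, pairs_vec, with_brackets and without_brackets are hypothetical, and the data uses the two-vector layout assumed in the post above). Both forms should build the same Dict; only the bracketed one materializes a vector of pairs first.

```julia
# Hypothetical toy data: all words first, then all pronunciations.
two_columns = [["HELLO", "WORLD"], ["HH AH0 L OW1", "W ER1 L D"]]

# With brackets: a Vector{Pair{String,String}} is built first, then copied into the Dict.
pairs_vec = [two_columns[1][i] => two_columns[2][i] for i in eachindex(two_columns[1])]
with_brackets = Dict(pairs_vec)

# Without brackets: Dict consumes the generator directly, with no intermediate vector.
without_brackets = Dict(two_columns[1][i] => two_columns[2][i] for i in eachindex(two_columns[1]))
```

For a dictionary the size of cmudict the saving is probably modest, but the generator form is also one pair of brackets shorter.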

Or simply:

Dict(i => j for (i,j) in entries_vector)

Since the first and second entries of each inner vector should always be a word and its ARPAbet pronunciation, respectively.
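
As a rough sketch of how that plays out, assuming every inner vector really does hold a word followed by its pronunciation (toy_entries, toy_dict, toy_words and lookups are hypothetical names, and the data is made up for illustration):

```julia
# Hypothetical stand-in for entries_vector: each inner vector is [word, pronunciation].
toy_entries = [["HELLO", "HH AH0 L OW1"], ["WORLD", "W ER1 L D"]]

# Destructure each inner vector into (word, arpabet) and emit a Pair for the Dict.
toy_dict = Dict(word => arpabet for (word, arpabet) in toy_entries)

# Look up words from a text; the toy keys are uppercase, so normalize before the lookup.
# get falls back to `missing` for words that are not in the dictionary.
toy_words = ["hello", "nature"]
lookups = [get(toy_dict, uppercase(w), missing) for w in toy_words]
```

Note that the destructuring throws a BoundsError on any inner vector with fewer than two elements, so lines that did not split into a word and a pronunciation would need to be filtered out first.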

It actually appears that an iterator over 2-tuples works as well. So

Dict(zip(entries_vector...))

is quite concise and readable, and should do the same thing without a comprehension, provided entries_vector holds exactly two vectors (keys, then values).
Disclaimer: On phone, not tested.
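
A small sketch of how the zip approach behaves under the two possible layouts, using made-up toy data (two_columns, per_entry, splatted_first, from_columns and from_entries are hypothetical names). Splatting suits the two-vector layout; with one vector per entry, zipping first.(...) with last.(...) is one way to get 2-tuples instead.

```julia
# Hypothetical toy data in the two-vector layout: all words, then all pronunciations.
two_columns = [["HELLO", "WORLD", "NATURE"], ["HH AH0 L OW1", "W ER1 L D", "N EY1 CH ER0"]]
from_columns = Dict(zip(two_columns...))

# The same toy data with one [word, pronunciation] vector per entry.
per_entry = [["HELLO", "HH AH0 L OW1"], ["WORLD", "W ER1 L D"], ["NATURE", "N EY1 CH ER0"]]

# Splatting here hands zip three arguments, so its elements are 3-tuples, not key/value pairs.
splatted_first = first(zip(per_entry...))

# Zipping the first and last element of every entry yields the 2-tuples Dict expects.
from_entries = Dict(zip(first.(per_entry), last.(per_entry)))
```

With the per-entry layout and the full dictionary, the splat would also pass one argument per dictionary line to zip, which is best avoided.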