Trouble pushing array line by line instead of as one giant list

Hi guys,
First of all, I apologize that this question does not come with an immediately runnable MWE. The code depends on a text file. I tried my best to reformat the data as an in-code array, so that those helping me would not have to download and save a text file, but I could not get the data formatted correctly. Also, JuliaBox does not run my code, even though it works just fine on my 0.6 machine. Sorry again for this, I really tried.

Anyway, here’s my question. I am extracting all numbers from a file line-by-line (variable a). My code outputs the correct information, but not line by line. As always, I feel as though I am really close to figuring this out (stupidly missing a dot or something.)

I’ve tried setting
push!(phoneticaccents[n],0) in the inner loop, but that doesn’t work.

What currently happens:
phoneticaccents = [1 0 2 2 0 1 0] #phonetic information for all words in one list

What I want:
phoneticaccents[1,:] = 1 0 2 #phonetic information for word 1
phoneticaccents[2,:] = 2 0 1 0 #phonetic information for word 2

a = readdlm("Sphinx.txt")
phoneticaccents = Int[]   # one flat list shared by all words
for n = 1:size(a,1)
    # column 1 holds the word itself, so start at column 2
    for m = 2:size(a,2)
        if search(a[n,m],'0') > 0
            push!(phoneticaccents,0)
        end
        if search(a[n,m],'1') > 0
            push!(phoneticaccents,1)
        end
        if search(a[n,m],'2') > 0
            push!(phoneticaccents,2)
        end
    end
end

Here’s part of my Sphinx.txt file

EXCLAMATION-POINT  EH2 K S K L AH0 M EY1 SH AH0 N P OY2 N T
CLOSE-QUOTE  K L OW1 Z K W OW1 T
DOUBLE-QUOTE  D AH1 B AH0 L K W OW1 T
END-OF-QUOTE  EH1 N D AH0 V K W OW1 T
END-QUOTE  EH1 N D K W OW1 T
IN-QUOTES  IH1 N K W OW1 T S
QUOTE  K W OW1 T
UNQUOTE  AH1 N K W OW1 T
HASH-MARK  HH AE1 M AA2 R K
POUND-SIGN  P AW1 N D S AY2 N
SHARP-SIGN  SH AA1 R P S AY2 N
PERCENT  P ER0 S EH1 N T
AMPERSAND  AE1 M P ER0 S AE2 N D
'ALLO  AA2 L OW1

Thanks,
Nakul

Are you upgrading your code to 1.0? If so, use 0.7 first: it will show deprecation warnings, and you can upgrade to 1.0 afterwards.

I will upgrade eventually, but I can’t upgrade yet because I am using an application that can only directly infer Julia 0.6 code.

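Maybe something like this (written in Julia 1.0):

using DelimitedFiles  # readdlm lives in this standard library on Julia 0.7 and later

function read_data(file)
    a = readdlm(file)
    phoneticaccents = Vector{Int}[]   # one vector of accents per word
    for n = 1:size(a,1)
        phonetic_row = Int[]
        col = 1
        # readdlm pads short rows with empty strings, so stop at the first empty cell
        while col <= size(a, 2) && !isempty(a[n, col])
            for i in 0:2
                if occursin(string(i), a[n, col])
                    push!(phonetic_row, i)
                end
            end
            col += 1
        end
        push!(phoneticaccents, phonetic_row)
    end
    return phoneticaccents
end
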
julia> read_data("Sphinx.txt")
14-element Array{Array{Int64,1},1}:
 [2, 0, 1, 0, 2]
 [1, 1]
 [1, 0, 1]
 [1, 0, 1]
 [1, 1]
 [1, 1]
 [1]
 [1, 1]
 [1, 2]
 [1, 2]
 [1, 2]
 [0, 1]
 [1, 0, 2]
 [2, 1]
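
If you would rather not depend on readdlm, the same idea can be sketched by reading the file line by line. This is only a sketch under assumptions: `parse_accents` and `sample` are names made up for illustration, the sample text is two lines copied from the Sphinx.txt excerpt above, and I have only reasoned about this on Julia 1.x. It collects the words and the accents as two parallel arrays, and it keeps the digits in the order they appear rather than testing for 0, 1, 2 in turn.

```julia
# Sketch: parse dictionary lines into parallel arrays of words and stress digits.
function parse_accents(io)
    words = String[]
    accents = Vector{Int}[]              # one vector of digits per word
    for line in eachline(io)
        tokens = split(line)             # split on any run of whitespace
        isempty(tokens) && continue      # skip blank lines
        push!(words, String(tokens[1]))  # first token is the word itself
        row = Int[]
        # scan every character of every phoneme token and keep the digits
        for t in tokens[2:end], c in t
            isdigit(c) && push!(row, c - '0')
        end
        push!(accents, row)
    end
    return words, accents
end

# Two lines taken from the Sphinx.txt excerpt in the question
sample = """
CLOSE-QUOTE  K L OW1 Z K W OW1 T
AMPERSAND  AE1 M P ER0 S AE2 N D
"""
words, accents = parse_accents(IOBuffer(sample))
```

For a real file you would pass an open file handle instead of the IOBuffer, for example inside `open("Sphinx.txt") do io ... end`.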

Are you sure that’s what you want? What about the words themselves, don’t you want to store them?

You asked the same question last month, and many other related questions since then. I think you got pretty good advice in your previous topic to use a dictionary, and it looked like you were on the right track. However, the code you present here, and in your other recent topic on splitting an array of strings, indicates that things are now moving in the wrong direction, and you seem to be asking questions that were already covered in your previous topics.

At the risk of being rude, I would suggest going back to your previously started topics and making sure you understand the suggestions you were given, and all the sample code that was given to you (every single expression!). I think that’s one of the more efficient ways of improving as a developer.

I have decided to have the words and accents in two separate arrays because I’m going to be parsing the data separately in a seq-to-seq RNN. I suppose I could have them all in the same array and index accordingly. I presume dictionaries are much faster than using a split function (something I can benchmark). I’ll reinvestigate using dictionaries.
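
For reference, keeping two parallel arrays does not rule out a dictionary, since one can be built from the other. A minimal sketch, assuming the arrays already exist (the variable names are made up, and the two entries are the QUOTE and UNQUOTE lines from the file excerpt; written for Julia 1.x):

```julia
# Parallel arrays: words[i] goes with accents[i]
words = ["QUOTE", "UNQUOTE"]
accents = [[1], [1, 1]]

# Pair them up into a word => accents lookup table
accent_of = Dict(zip(words, accents))

accent_of["UNQUOTE"]   # the accents for one word, looked up by name
```

The arrays stay available for feeding the model in order, while the dictionary gives lookup by word.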

I appreciate the feedback. I do reference the previous threads often and really didn’t mean to make the same thread twice; it was an honest mistake. I am scrambling to complete my qualification exams in computer music and I am sort of losing my mind in the process. Upon rereading the old thread with a month’s more Julia knowledge, I understand now that findall and occursin are Julia 1.0 things, and at the time I was at a total loss for how to use either based on @Tamas_Papp’s suggestions. I’ll keep striving for a more complete understanding of all the expressions.

Thank you, that works great!