I’m trying to get the character n-grams for a list of words. For example:
julia> s = "apple"
julia> s = collect(s)
['a', 'p', 'p', 'l', 'e']
julia> ngram(s,n) = join([view(s,i:i+n-1) for i=1:length(s)-n+1])
julia> ngram(s,2)
"['a','p'],['p','p'],['p','l'],['l','e']"
If I have a list, let’s say:
julia> s = ["apple"
"orange"
"pear"
"honeycrisp apple"
];
# split_char here is equivalent to the second line of the example above
julia> function split_char(s)
    s = collect.(s)
    return s
end
julia> split_char(s)
[['a', 'p', 'p', 'l', 'e'], ['o', 'r', 'a', 'n', 'g', 'e'], ['p', 'e', 'a', 'r'], ['h', 'o', 'n', 'e', 'y', 'c', 'r', 'i', 's', 'p', ' ', 'a', 'p', 'p', 'l',
Passing that result through this n-gram function is where I get stuck; it does not produce the n-grams I expect:
julia> function ngram(s,n)
s= [view(s[i],i:i+n-1) for i=1:length(s)-n+1]
end
My goal is to get this as a result:
[['a','p'],['p','p'],['p','l'],['l','e'],['o','r'],['r','a'],['a','n'],['n','g'],['g','e'], ......]
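For reference, here is a minimal sketch of the behaviour I'm after. It may not be the most idiomatic way to do it. The names `char_ngrams` and `all_ngrams` are hypothetical helpers I made up for illustration. The idea is to build the n-grams one word at a time and append them to a single flat vector. In my attempt above, `i` is used both as the index of the word and as the start of the character range, and the loop runs over the number of words rather than the number of characters in each word.

```julia
# Hypothetical helper (name is an assumption): all character n-grams of one word.
function char_ngrams(word::AbstractString, n::Integer)
    chars = collect(word)  # Vector{Char}; avoids byte-indexing issues with Unicode
    # Typed comprehension so the result is Vector{Vector{Char}} even when empty
    return Vector{Char}[chars[i:i+n-1] for i in 1:length(chars)-n+1]
end

# Hypothetical helper (name is an assumption): n-grams of every word, flattened.
function all_ngrams(words, n::Integer)
    result = Vector{Char}[]
    for w in words
        append!(result, char_ngrams(w, n))
    end
    return result
end

words = ["apple", "orange", "pear", "honeycrisp apple"]
bigrams = all_ngrams(words, 2)
```

With this sketch, `bigrams` starts with `['a', 'p']`, `['p', 'p']`, `['p', 'l']`, `['l', 'e']`, followed by the bigrams of "orange", and so on. If strings are preferred over character vectors, `join.(bigrams)` should turn each entry into a string such as "ap".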