Grouping RegexMatch

I’m trying to read a file and count how many times certain words appear in it.
The file:

What's in a name? That which we call a rose
By any other name would smell as sweet! rose :)

Words to be found: rose and sweet, so I wrote the code below:

fname = "simplefile.txt"
s = read(fname, String)
rx = r"(rose|sweet)s?"
collect(eachmatch(rx, s, overlap = true))

And got the output as:

3-element Array{RegexMatch,1}:
 RegexMatch("rose", 1="rose")  
 RegexMatch("sweet", 1="sweet")
 RegexMatch("rose", 1="rose") 

As you can see, the word “rose” appears twice in the file, and collect returns it as 2 separate elements. How can I write code that makes the result look something like:

rose => 2 times, 
sweet => 1 time

And what if I need to remove the duplicates, so that each word is listed only once and the result looks something like:

found words: rose, sweet

Check out countmap from StatsBase
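
A minimal sketch of how that could look for the file above, assuming StatsBase is installed. The file contents are inlined as a string here instead of being read from simplefile.txt, and the variable names words and counts are just for illustration:

```julia
using StatsBase  # provides countmap; assumed to be installed

# Same text as the file in the question, inlined so the snippet is self-contained
s = "What's in a name? That which we call a rose\nBy any other name would smell as sweet! rose :)"
rx = r"(rose|sweet)s?"

# Pull out the first capture group of every match as a plain String
words = [String(m[1]) for m in eachmatch(rx, s)]

# countmap returns a Dict mapping each distinct word to its number of occurrences
counts = countmap(words)

for (word, n) in counts
    println(word, " => ", n, n == 1 ? " time" : " times")
end
```

The iteration order of a Dict is not guaranteed, so the two lines may print in either order.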

unique()?
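
For example, a small sketch with the file contents inlined as a string (the names words and found are illustrative):

```julia
# Same text as the file in the question, inlined so the snippet is self-contained
s = "What's in a name? That which we call a rose\nBy any other name would smell as sweet! rose :)"

# Collect the matched words, then drop repeats while keeping first-seen order
words = [String(m[1]) for m in eachmatch(r"(rose|sweet)s?", s)]
found = unique(words)

println("found words: ", join(found, ", "))  # prints: found words: rose, sweet
```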


Thanks,
I sorted the results obtained by unique() as sort!(unique!(my_array))

How can I sort the result of countmap(my_array)? I tried sort(countmap(my_array)), but that sorts by key, while I want it sorted by value.

One more point: I noticed that unique() treats lower case and upper case as different. I tried to work around this with titlecase, as unique(titlecase(my_array)), but it failed!

You can sort the result of countmap by value using

sort(collect(countmap(my_array)), by=last)
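
A short sketch of this, assuming StatsBase is installed and using a small hand-written my_array for illustration. collect turns the Dict into a vector of key => value pairs, and by=last sorts on the value of each pair:

```julia
using StatsBase  # provides countmap; assumed to be installed

# Illustrative input; in the thread this would be the vector of matched words
my_array = ["rose", "sweet", "rose"]

# Pairs sorted by count, ascending
by_count = sort(collect(countmap(my_array)), by=last)

# Same, but with the most frequent word first
by_count_desc = sort(collect(countmap(my_array)), by=last, rev=true)

println(by_count)
println(by_count_desc)
```

The result is a vector of pairs, not a Dict, which is what allows it to hold a chosen order.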

For titlecase try

unique(titlecase.(my_array))
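
The call without the dot fails because titlecase is defined for strings, not for arrays; the dot broadcasts it over each element. A small sketch with made-up input (lowercase works the same way if the capitalised form is not needed):

```julia
# Illustrative input with mixed casing
my_array = ["Rose", "rose", "SWEET", "sweet"]

# Normalise the case of every element first, then remove duplicates
titled  = unique(titlecase.(my_array))
lowered = unique(lowercase.(my_array))

println(titled)   # ["Rose", "Sweet"]
println(lowered)  # ["rose", "sweet"]
```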
