Grouping RegexMatch

hasanOryx · September 7, 2019, 6:33pm

I’m trying to read this file and find how many times the following words appear.
File:

What's in a name? That which we call a rose
By any other name would smell as sweet! rose :)

Words to be found: rose and sweet, so I wrote the code below:

fname = "simplefile.txt"
s = read(fname, String)
rx = r"(rose|sweet)s?"
collect(eachmatch(rx, s, overlap = true))

And got the output as:

3-element Array{RegexMatch,1}:
 RegexMatch("rose", 1="rose")  
 RegexMatch("sweet", 1="sweet")
 RegexMatch("rose", 1="rose")

As you can see, the word “rose” appears twice, as two separate entries in the collected result. How can I write code that produces results like:

rose => 2 times, 
sweet => 1 time

And what if I need to remove the duplicates, so that the result is something like:

found words: rose, sweet

kevbonham · September 8, 2019, 12:37am

Check out countmap from StatsBase

unique()?
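
Both suggestions can be sketched in a few lines. The example below is only an illustration: it inlines the sample text instead of reading simplefile.txt, and it tallies the words with a plain Dict so that it runs without extra packages. The variable names words, counts, and found are made up for this sketch; countmap(words) from StatsBase should build the same word-to-count mapping.

```julia
# Sample text from the question, inlined instead of read from simplefile.txt
s = """
What's in a name? That which we call a rose
By any other name would smell as sweet! rose :)
"""

rx = r"(rose|sweet)s?"

# Take the first capture group of each RegexMatch as a plain String
words = [String(m.captures[1]) for m in eachmatch(rx, s)]

# Tally occurrences by hand; countmap(words) from StatsBase builds the same Dict
counts = Dict{String,Int}()
for w in words
    counts[w] = get(counts, w, 0) + 1
end

for (word, n) in counts
    println(word, " => ", n, n == 1 ? " time" : " times")
end

# Distinct words, in order of first appearance
found = unique(words)
println("found words: ", join(found, ", "))
```

Converting each RegexMatch to a String first matters here, because counting and unique() both compare the extracted words rather than the match objects.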

hasanOryx · September 8, 2019, 4:30am

Thanks,
I sorted the results obtained by unique() with sort!(unique!(my_array)).

How can I sort the result of countmap(my_array)? I tried sort(countmap(my_array)), but it came back sorted by key, while I want it sorted by value.

One more point: I noticed that unique() treats lower case and upper case as different. I tried using titlecase as unique(titlecase(my_array)), but it failed!

Simon_Bolland · September 8, 2019, 11:38am

You can sort the result of countmap by value using

sort(collect(countmap(my_array)), by=last)

For titlecase try

unique(titlecase.(my_array))
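
A small sketch of both tips follows. The Dict literal is a hypothetical stand-in for what countmap(my_array) would return (the "name" entry is made up for illustration), and the contents of my_array here are invented as well.

```julia
# Hypothetical stand-in for the Dict that countmap(my_array) would return
counts = Dict("rose" => 2, "sweet" => 1, "name" => 3)

# collect turns the Dict into a Vector of key => value Pairs;
# by = last sorts those Pairs by their value
by_value = sort(collect(counts), by = last)

# rev = true puts the most frequent word first
most_frequent_first = sort(collect(counts), by = last, rev = true)

# titlecase works on a single string, so broadcast it with a dot before unique
my_array = ["rose", "Rose", "SWEET", "sweet"]
distinct = unique(titlecase.(my_array))

println(by_value)
println(most_frequent_first)
println(distinct)
```

The dot in titlecase.(my_array) applies the function to each element; calling titlecase(my_array) on the whole array has no matching method, which is likely why the earlier attempt failed.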
