Find the location(s) of all occurrences of a specific string in .txt

Nash · May 1, 2020, 8:53pm

I am trying to do the following:

Open a .txt file placed at a specific directory
Find the location(s) of all occurrences of a specific string

An example of the .txt that I am working with is found at this link (link to .txt)

I have tried the following:

f = open("C:/Users/me/data/1.122430932.txt")
pattern = r"ltp"
target = f
m = match(pattern, target)

However, I get the error "MethodError: no method matching match(::Regex, ::IOStream)".

What is the correct procedure?

Henrique_Becker · May 1, 2020, 9:50pm

Use read to read the whole file into a string, provided the file is small enough to fit in memory. Also, prefer the version of open that takes a block (the do syntax), so that the file is closed automatically instead of staying open.

f = open("C:/Users/me/data/1.122430932.txt")
pattern = r"ltp"
target = read(f, String)  # read the whole stream into a String
close(f)                  # the stream is no longer needed
m = match(pattern, target)
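
For reference, a minimal sketch of the block form of open mentioned above, assuming the file fits in memory. The helper name `read_text` and the temporary demo file are hypothetical, used only so the snippet runs without the original data file.

```julia
# Hypothetical helper: read a whole file into a String.
# The do-block form of `open` closes the file when the block exits,
# even if an error is thrown inside it.
function read_text(path::AbstractString)
    open(path, "r") do io
        read(io, String)
    end
end

# Demo on a temporary file (stand-in for the real .txt file).
demo_path, demo_io = mktemp()
write(demo_io, "ltp: 1.5\nother\nltp: 2.0\n")
close(demo_io)

target = read_text(demo_path)
m = match(r"ltp", target)   # first match only, or `nothing` if there is none
```

For a plain read like this, `read(demo_path, String)` does the same thing in one call.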

Please use triple backticks to paste code or terminal output in your posts.

```
Example.
```

becomes

Example.

Nash · May 2, 2020, 8:28am

When I run the suggested code, I do not get any information about the positions of the target string. But the code does not throw an error. I changed the search criteria to “op” because, as can be seen in the image below, “op” is clearly part of the string.

I was expecting to get a list of positions.

tlienart · May 2, 2020, 9:30am

Using eachmatch is the way to go here. Consider:

s = """Turnip greens yarrow ricebean rutabaga endive cauliflower sea lettuce kohlrabi amaranth water spinach avocado daikon napa cabbage asparagus winter purslane kale. Celery potato scallion desert raisin horseradish spinach carrot soko. Lotus root water spinach fennel kombu maize bamboo shoot green bean swiss chard seakale pumpkin onion chickpea gram corn pea. Brussels sprout coriander water chestnut gourd swiss chard wakame kohlrabi beetroot carrot watercress. Corn amaranth salsify bunya nuts nori azuki bean chickweed potato bell pepper artichoke. Turnip greens yarrow ricebean rutabaga endive cauliflower sea lettuce kohlrabi amaranth water spinach avocado daikon napa cabbage asparagus winter purslane kale. Celery potato scallion desert raisin horseradish spinach carrot soko. Lotus root water spinach fennel kombu maize bamboo shoot green bean swiss chard seakale pumpkin onion chickpea gram corn pea. Brussels sprout coriander water chestnut gourd swiss chard wakame kohlrabi beetroot carrot watercress. Corn amaranth salsify bunya nuts nori azuki bean chickweed potato bell pepper artichoke. """

Which is the same text twice:

for m in eachmatch(r"bamboo", s)
  @show m
end

Right, it finds two of those. But where are they? You can retrieve that from the offset field of each RegexMatch object:

for m in eachmatch(r"bamboo", s)
  @show m.offset
end

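Does that work? Check by slicing the string at the two offsets:

len = length("bamboo")
# ranges are inclusive, so the last index is start + len - 1
@show s[277:277+len-1]
@show s[827:827+len-1]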

More generally, let’s consider a single regex match object:

m = match(r"bamboo", s)

The key fields are

captures (here empty, no capturing group in the regex)
match (the whole match)
offset (where it is)

You could do something like the following to get ranges:

for m in eachmatch(r"bamboo", s)
  a = m.offset
  b = prevind(s, m.offset + lastindex(m.match))
  @show s[a:b]
end

Julia string indices are byte indices, so prevind and lastindex help avoid invalid-index errors if there are multi-byte Unicode chars like α.
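
To illustrate the point about Unicode, here is a small sketch on a made-up string (`s2` and the variable names are only for illustration). It shows that offset is a byte index, and how one might turn it into a character position.

```julia
s2 = "αβγ bamboo"              # α, β and γ each take 2 bytes in UTF-8
m2 = match(r"bamboo", s2)

byte_start = m2.offset                                    # byte index of the match start
byte_stop  = prevind(s2, m2.offset + lastindex(m2.match)) # byte index of the last char of the match
char_start = length(s2, 1, byte_start)                    # position counted in characters

@show byte_start byte_stop char_start
@show s2[byte_start:byte_stop]
```

Here the match starts at byte 8 but is the 5th character, because the three Greek letters occupy six bytes.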

PS: maybe to actually answer your question:

[m.offset for m in eachmatch(r"bamboo", s)]
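
Putting the pieces together for the original question, a possible sketch (not the only way): read the file, then collect all positions. The function name `find_positions` and the temporary demo file are hypothetical. It uses findall with a pattern, which returns the index range of every non-overlapping match and should be available in Julia 1.3 and later.

```julia
# Hypothetical helper: index ranges of all matches of `pattern` in the file at `path`.
# Assumes the whole file fits in memory.
function find_positions(pattern, path::AbstractString)
    text = read(path, String)
    return findall(pattern, text)   # Vector of UnitRange{Int}, one per match
end

# Demo on a temporary file (stand-in for the real .txt file).
demo_path, demo_io = mktemp()
write(demo_io, "ltp: 1.5\nother\nltp: 2.0\n")
close(demo_io)

ranges = find_positions(r"ltp", demo_path)
starts = first.(ranges)             # same values as the m.offset list above

@show ranges starts
```

The ranges can be used directly for slicing, and first.(ranges) gives just the start positions if that is all you need.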
