How to group data from multiple files?

Hello everyone,

Here’s my code:

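function TypeFrame(objet)
   # indices of the lines that contain a frame name
   findFrame = findall(x->occursin("frame =",x),objet)
   sframe = []
   for i in findFrame
      lframe = objet[i]
      # keep the text between "frame =" and the next " --> "
      sframe = [sframe;split(lframe,r"frame =| --> ")[2]]
   end
   return sframe
end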

for f in readdir()
    file = readlines(f)
    findWindow = findall(x->occursin("Window",x),file)
    out = open("FilteredFile.txt","w")   # overwritten for each input file
    for line in findWindow
      if startswith(split(file[line]," --> ")[2],"Window")
         println(out,file[line])
      end
    end
    close(out)
    WindowType = open("FilteredFile.txt") do io
       TypeFrame(readlines(io))
    end
end

With this code I read every file in my directory. For each file I keep only a few lines (if startswith(split(file[line]," --> ")[2],"Window") …) and write them to a new text file (FilteredFile.txt). I then open this new file and apply a function that gives me an array.
Each time a new file from the directory is processed, “FilteredFile.txt” is overwritten with the new data.

For example, if I have two files in my directory, I get two distinct arrays:

Any[" ‘DataManager’", " ‘LauncherFrame’", " ‘DataManager’", " ‘DataManager’", " ‘DataManager’", " ‘ToolBox’", " ‘ToolBox’", " ‘ToolBox’", " ‘MonoWellInterpretationView’", " ‘MonoWellInterpretationView’", " ‘MonoWellInterpretationView’", " ‘MonoWellInterpretationView’", " ‘MonoWellInterpretationView’", " ’ ‘ToolBox’", " ‘DataManager’", " ‘DataManager’"]
Any[" ‘DataManager’", " ‘LauncherFrame’", " ‘DataManager’", " ‘DataManager’", " ‘DataManager’", " ‘ToolBox’", " ‘ToolBox’", " ‘ToolBox’", " ‘MonoWellInterpretationView’", " ‘MonoWellInterpretationView’", " ‘MonoWellInterpretationView’", " ‘MonoWellInterpretationView’", " ‘MonoWellInterpretationView’", " ‘MonoWellInterpretationView’", " ‘MonoWellInterpretationView’", " ‘MonoWellInterpretationView’", " ‘MonoWellInterpretationView’", " ‘ToolBox’", " ‘DataManager’", " ‘DataManager’"]

I would like to get a single array that contains the data of all the files, whatever the number of files in my directory.

I have already managed this by not overwriting “FilteredFile.txt” each time a new file in the directory is processed. That works well with only 2 files, but with 1000 files I would end up with a huge “FilteredFile.txt” of thousands of lines, which would take a long time to process.

So my problem is how to keep overwriting “FilteredFile.txt” while accumulating the data of each file somewhere as I go, so that I end up with only one array.

I don’t know if I’ve made myself clear. Thank you in advance for your answers and your help!

There is a bit much going on that seems incidental, so let me try to boil this down to the basics:

You have a list of text files. The goal is to construct a new file that contains a subset of the lines of these files (eliminating duplicates?).

I may not have understood this correctly, but is your current problem to collate all the WindowType objects into a single object?
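
If that is the question, collating the per-file results could look like the rough sketch below. The two small arrays only stand in for what TypeFrame returns for two files (their contents are made up for illustration), and `per_file_results`, `all_types` and `all_at_once` are hypothetical names.

```julia
# Stand-ins for the arrays returned by TypeFrame for two files (illustrative values).
per_file_results = [
    Any[" 'DataManager'", " 'LauncherFrame'"],
    Any[" 'ToolBox'", " 'DataManager'"],
]

# Option 1: grow one array as each file is processed.
all_types = String[]
for window_type in per_file_results
    append!(all_types, window_type)   # adds the elements, not the array itself
end

# Option 2: concatenate all per-file arrays in one call.
all_at_once = reduce(vcat, per_file_results)
```

Both options keep repeated entries and preserve the order in which the files were processed.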

It sounds like you don’t have such an enormous amount of data that fitting all filtered lines into memory would be an issue. It would then be much faster to avoid writing the intermediate results to a file (“FilteredFile”) at all.

It sounds like you may want to replace the out = ... close(out) block with a function that just returns the filtered lines from one file as a Vector{String}. Then you could loop over the files, append the newly read lines to the list of lines from previous files, use unique to get rid of duplicates, and continue to the next file.
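
A minimal sketch of that idea, under a few assumptions: `filtered_lines` and `collect_filtered` are hypothetical names, the filter condition is copied from the code in the question, and `readdir(dir; join=true)` needs Julia 1.4 or later.

```julia
# Return the lines of one file whose second " --> " field starts with "Window".
# (Same test as in the question, plus a guard for lines without " --> ".)
function filtered_lines(path::AbstractString)
    kept = String[]
    for line in eachline(path)
        parts = split(line, " --> ")
        if length(parts) >= 2 && startswith(parts[2], "Window")
            push!(kept, line)
        end
    end
    return kept
end

# Collect the filtered lines of every file in `dir`, dropping duplicates as we go.
function collect_filtered(dir::AbstractString)
    all_lines = String[]
    for path in readdir(dir; join=true)
        isfile(path) || continue          # skip subdirectories
        append!(all_lines, filtered_lines(path))
        unique!(all_lines)                # remove this call to keep repeated lines
    end
    return all_lines
end
```

With something like this, TypeFrame(collect_filtered(".")) would give a single array for the whole directory, with no temporary file involved. If repeated lines matter for your counts, drop the `unique!` call.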

Perhaps I misunderstand, though…

Thank you! I used a vector instead of a temporary file and it worked!