Rdir: search recursively for files with a given name pattern

I am looking for a function that lists all files in all sub-directories that match a given search pattern.
I imagine that such a function already exists, but I failed to find it.
Below is my version:

# rdir: list files in all sub-directories that match a given pattern
s_dir = raw"C:\data\julia\scripts\example"
s_pattern = "hl.jl"

# drop files in directories that cannot be reached:
# walkdir(path; onerror = identity)
# https://discourse.julialang.org/t/hello-got-a-quick-question-i-am-using-walkdir-to-scan-the-file-system-for-file/54194

function rdir(s_dir::String, s_pattern::String)
  s_files = String[]
  if !isdir(s_dir)
    error("\"$s_dir\" is not a dir!")
  else
    for (root, dirs, files) in walkdir(s_dir; follow_symlinks = false, onerror = identity)
      # why is this wrong?
      # global s_files
      println("Directories in $root")
      for i_dir in dirs
        println(joinpath(root, i_dir)) # path to directories
      end
      println("Files in $root")
      for i_file in files
        # why is this wrong?
        # global s_files
        if occursin(s_pattern, i_file)
          println(string("Pattern: \"", s_pattern, "\": ", joinpath(root, i_file))) # path to files
          push!(s_files, joinpath(root, i_file))
          println(s_files)
        end
      end
    end
  end
  return s_files
end

s_files = rdir(s_dir, s_pattern)
println("s_files:")
println(s_files)

The strange thing is the behavior of the variable s_files.
Why is it not necessary to declare it as global in the for-loop?
And even stranger, why does declaring it do the opposite of what I would
like to achieve? If I declare it as global inside the for-loop,
the content is deleted.
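
A minimal sketch of the scoping rule that seems to be behind this, using hypothetical names (`results`, `collect_local`, `collect_global`) that are not part of the code above: an assignment in a function body creates a local variable, and a for-loop inside that function can read and modify that local without any declaration. Writing `global` inside the loop redirects the name to the module-level variable instead, so the function's own vector stays empty. Assigning that empty return value back to the global would then overwrite whatever had been pushed there.

```julia
results = String[]              # module-level (global) variable

function collect_local()
    results = String[]          # assignment in a function body: a new local
    for name in ("a.jl", "b.jl")
        push!(results, name)    # the loop sees the enclosing function's local
    end
    return results
end

function collect_global()
    results = String[]          # a local again
    for name in ("a.jl", "b.jl")
        global results          # within the loop, `results` now means the global
        push!(results, name)
    end
    return results              # the local was never modified
end

local_out = collect_local()     # the global `results` is untouched by this call
global_out = collect_global()   # pushes into the global, returns the empty local
```

Here `local_out` holds both names, `global_out` is empty, and the two names end up in the global `results`.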

https://github.com/vtjnash/Glob.jl

I think you are after glob patterns?

https://github.com/vtjnash/Glob.jl

The following should do it:

using Glob

readdir("*.jl", dir)

Oh we wrote at the same time :smiley:

I do not know why, but it does not work on my computers, neither on Linux
with Julia v1.4 nor on MS Windows 10 with Julia v1.7.1.
The error message is in both cases:

no method matching readdir(::String, ::String)

I think you need

readdir(glob"*.jl", dir)

but it doesn’t look like Glob.jl acts recursively. So it doesn’t fully solve your problem (though it is easy to write a recursive function).

Confirmed! :slight_smile: with glob"string" it works.
And you are right! Unfortunately, it does not work recursively.

Interesting, I remember using it recursively in the past, maybe I implemented the recursion myself? I don’t remember exactly.

You can always provide a glob pattern that looks into subdirectories, like glob"/*/*.jl", if I am not mistaken.

In any case it is worth opening an issue in the repository with a feature request.

This throws an error on my machine:

LoadError: Glob pattern cannot be empty or start with a / character

Where is the trick? Does glob"/*/*.jl" only work as an argument to a function?

@ellocco please read the README of the package more carefully, I am just typing random snippets of code here from a mobile device. The character / is special apparently and cannot be used at the start of the pattern. Have you tried omitting the character as the error message suggests?

You could use the Glob package with walkdir, for example:

import Glob
function rdir(dir::AbstractString, pat::Glob.FilenameMatch)
    result = String[]
    for (root, dirs, files) in walkdir(dir)
        append!(result, filter!(f -> occursin(pat, f), joinpath.(root, files)))
    end
    return result
end
rdir(dir::AbstractString, pat::AbstractString) = rdir(dir, Glob.FilenameMatch(pat))

Then do e.g. rdir("somedir", "*.jl").

(It might be useful to add walkdir support directly to Glob.jl. In general, it seems much better to have an iterator for this sort of thing, since a recursive directory tree can get huge.)
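
A rough sketch of what such an iterator could look like, using only Base. The name `eachfile` is hypothetical, and `pat` is assumed to be anything that `occursin` accepts as its first argument (a Regex or a String here; a Glob.FilenameMatch should work the same way, but that is not shown). The pattern is tested against the bare file name, and paths are produced lazily while `walkdir` traverses the tree:

```julia
# `eachfile` is a hypothetical name; this is a sketch, not part of Glob.jl.
function eachfile(dir::AbstractString, pat)
    # lazily yield the full paths of the matching files in one directory
    matches(root, files) = (joinpath(root, f) for f in files if occursin(pat, f))
    # chain the per-directory generators into a single lazy iterator
    return Iterators.flatten(matches(root, files) for (root, _, files) in walkdir(dir))
end

# Usage: nothing is materialized unless you call collect().
for path in Iterators.take(eachfile(pwd(), r"\.jl$"), 3)
    println(path)
end
```

Because nothing is stored in a vector, the caller can stop early (as with `Iterators.take` above) without walking the rest of the tree.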

Sorry, I was sure Glob is recursive, and I was literally thinking of **. Sadly:
https://github.com/vtjnash/Glob.jl/issues/19

There is an enhanced Glob around:
Eglob
Unfortunately, the documentation does not make it clear to me
whether I can pass a specific top directory as an input
parameter to the function, to specify the starting point
of the recursive search.

@stevengj Thanks for your code! It is not easy to understand everything, could you be so kind and comment this line:

For me it is not easy to understand.

I defined two methods of the rdir function. One method takes a FilenameMatch pattern, which is defined by the Glob.jl package and can be constructed with fn"..."; my implementation needs it because that’s what Glob.jl implements occursin for. The other method, for convenience, takes a simple string pattern; it is implemented by simply converting your string to a FilenameMatch and calling the first method.

This way, you can pass either a string or a FilenameMatch to rdir. (The latter provides more options, e.g. there is an option to make it case-insensitive.)

This is a pretty common pattern in Julia.
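
To illustrate the pattern without the Glob dependency, here is a small sketch with a hypothetical function `count_matches`, where a Regex plays the role that FilenameMatch plays in rdir. One method does the real work on the specialized type, and a second one-line method converts a plain string and forwards to the first:

```julia
# `count_matches` is a made-up example, not part of Base or Glob.jl.

# "Real" method: works on the specialized pattern type.
count_matches(names, pat::Regex) = count(n -> occursin(pat, n), names)

# Convenience method: convert the string, then dispatch to the method above.
count_matches(names, pat::AbstractString) = count_matches(names, Regex(pat))

names = ["a.jl", "b.txt", "c.JL"]
n_regex  = count_matches(names, r"\.jl$"i)  # Regex with the case-insensitive flag
n_string = count_matches(names, "jl")       # plain string, converted for you
```

Callers who need the extra options build the specialized pattern themselves; everyone else passes a string.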

Hi Steven,
thank you for the link to the topic of different methods for one and the same function!
That link really does widen my horizon!
Stefan

The proposed function, which combines the type “FilenameMatch” with “occursin”, has the drawback that “*.jl” works but “string*.jl” does not. Another option is the type “Glob.GlobMatch” in combination with “readdir()”, which allows the wildcard character (asterisk, “*”) to be used inside the search string:

import Glob  # extends readdir with a method that accepts a GlobMatch

function MyLib_RDir(s_dir::AbstractString, s_pat::Glob.GlobMatch)
    files_filtered = String[]
    for (root, dirs, files) in walkdir(s_dir)
        # collect the entries of `root` that match the glob pattern
        append!(files_filtered, readdir(s_pat, root))
    end
    return files_filtered
end
# Next: add a 2nd method to the function "MyLib_RDir"
# https://docs.julialang.org/en/v1/manual/methods/
# purpose: convert "String"-type content into a
# "GlobMatch"-type pattern (defined by the Glob.jl package)
# and delegate to the first method of this function
MyLib_RDir(s_dir::AbstractString, s_pat::AbstractString) =
    MyLib_RDir(s_dir, Glob.GlobMatch(s_pat))

If I am not mistaken, the nice solution proposed by @stevengj will work as per your wishes with this minor adjustment, which matches the pattern against the bare file name before joining it with root:

append!(result, joinpath.(root, filter!(f -> occursin(pat, f), files)))
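
The difference can be reproduced without Glob. In the sketch below, an anchored Regex stands in for a pattern like “string*.jl” (an assumption made only to keep the example self-contained), and `rdir_by_name` is a hypothetical name. A pattern anchored at the start fails against the full path but matches the bare file name:

```julia
# `rdir_by_name` is a hypothetical helper; the Regex is a stand-in for a glob pattern.
function rdir_by_name(dir::AbstractString, pat::Regex)
    result = String[]
    for (root, _, files) in walkdir(dir)
        # test the bare file name first, then build the full path
        append!(result, joinpath.(root, filter(f -> occursin(pat, f), files)))
    end
    return result
end

pat = r"^string.*\.jl$"
full_path_matches = occursin(pat, joinpath("sub", "string1.jl"))  # path starts with "sub"
file_name_matches = occursin(pat, "string1.jl")                   # name starts with "string"
```

Matching on the file name keeps the pattern independent of where the search starts.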