Get list of pdf files

I'm trying to get an array of the names of all the PDF files in the current directory, so I wrote the following:

files = cd(readdir, pwd())
for f in files
    if match(r"*.\.pdf", f) !== nothing # Or if occursin(r"*.\.pdf", f) == true
        println(f)
    end
end

But I got the below error:

PCRE compilation error: quantifier does not follow a repeatable item at offset 0

Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compile(::String, ::UInt32) at ./pcre.jl:104
 [3] compile(::Regex) at ./regex.jl:69
 [4] Regex(::String, ::UInt32, ::UInt32) at ./regex.jl:40
 [5] Regex(::String) at ./regex.jl:65
 [6] @r_str(::LineNumberNode, ::Module, ::Any) at ./regex.jl:103

That’s an invalid regexp. Did you mean to write r".*\.pdf" instead?

Thanks, but it also returns files named like *.pdf.txt

Then use r".*\.pdf$" instead. Do read through a couple of regexp tutorials online though; this isn’t specific to Julia at all.
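
To illustrate what the trailing $ changes, here is a minimal sketch run on a hypothetical list of names (the names are made up for illustration and are not the poster's files):

```julia
# Hypothetical file names, made up for illustration.
filenames = ["report.pdf", "notes.pdf.txt", "scan.PDF"]

# Unanchored: ".pdf" may appear anywhere in the name.
unanchored = [n for n in filenames if occursin(r".*\.pdf", n)]

# Anchored with $: ".pdf" must come at the end of the name.
anchored = [n for n in filenames if occursin(r".*\.pdf$", n)]

println(unanchored)  # ["report.pdf", "notes.pdf.txt"]
println(anchored)    # ["report.pdf"]
```

Note that both patterns are case-sensitive, so "scan.PDF" is matched by neither.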

You could also use a filter, see for instance this post: addpath("C:\Users\") - #4 by StefanKarpinski

filter(x->endswith(x, ".pdf"), readdir())

What is x? It gave:

0-element Array{String,1}

Sorry, had the order wrong (updated it). x->endswith(x, ".pdf") is an anonymous function and x is its input argument.

It works now: 10 out of 12 files were displayed, and I get the same result using the regex.

  • Do you have any idea what could be preventing the other 2 files from appearing?
  • Is x the iteration variable over readdir()?

Hard to say without more information. What is the output of readdir()?

Yes. filter applies the function (first argument) to every element of the collection (second argument) and returns a vector of all elements for which the function evaluated to true.
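
As a minimal sketch of that behaviour, the vector below is a hypothetical stand-in for the output of readdir() (the names are made up for illustration):

```julia
# Hypothetical stand-in for the output of readdir().
entries = ["a.pdf", "b.txt", "c.pdf", "d.pdf.txt"]

# Anonymous function: x is bound to each element of the collection in turn.
is_pdf = x -> endswith(x, ".pdf")

# filter keeps the elements for which is_pdf returns true.
pdfs = filter(is_pdf, entries)
println(pdfs)  # ["a.pdf", "c.pdf"]

# The same call written with do-block syntax.
pdfs_do = filter(entries) do x
    endswith(x, ".pdf")
end
println(pdfs_do == pdfs)  # true
```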

I have 12 files, but it only finds 10.

I’ve put all the files here, if you can help me understand the issue. Thanks.

Try using Glob.jl:

using Glob
glob("*.pdf")

Thanks, it gave the same 10 files, not recognizing the other 2!!

@carstenbauer and @oschulz

I just noticed that the 2 files that are not caught are saved with an uppercase .PDF extension, not lowercase .pdf.

So I was able to get them using a regex with (?i) and (?-i), as below. If you have simpler code, it would be appreciated.

read_files = cd(readdir, pwd())
for rf in read_files
    if occursin(r"(?i).*\.pdf\z(?-i)", rf) == true
        println(rf)
    end
end

Hm, Glob.jl does support caseless operation for Glob.FilenameMatch, but I’m not sure how to use that with its glob() function. @jameson, can I pull you into this thread for some help?

You can put i after the regex to enable case-insensitive matching, something like this:

r".*\.pdf$"i

Doc: @r_str

Or you can just lowercase the input:

x->endswith(lowercase(x), ".pdf")
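
Putting the two case-insensitive approaches side by side, as a minimal sketch: the vector is a hypothetical stand-in for readdir() output, with names made up for illustration.

```julia
# Hypothetical stand-in for the output of readdir().
entries = ["a.pdf", "B.PDF", "c.Pdf", "d.pdf.txt"]

# Approach 1: lowercase each name before checking the suffix.
by_lowercase = filter(x -> endswith(lowercase(x), ".pdf"), entries)

# Approach 2: regex with the i flag for case-insensitive matching.
by_regex = filter(x -> occursin(r"\.pdf$"i, x), entries)

println(by_lowercase)  # ["a.pdf", "B.PDF", "c.Pdf"]
println(by_regex)      # ["a.pdf", "B.PDF", "c.Pdf"]
```

Against a real directory, replacing entries with readdir() should give the same kind of result for the files present.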

Glob.jl also supports regexes, as shown in the README as usage 3, so this, for example, should match the rather unlikely path “./name.pdf/name.pDf/name.PdF”:

glob( ["name.pdf", r".*\.pdf"i, fn"*.pdf"i] )

Thanks, what is fn?

filename—it’s the matcher engine in the Glob.jl package
