Hi,
I want to read text out of a PDF. I know there’s PDFIO, but unfortunately it seems to crash on some PDFs.
So I’d like to call Python’s PyPDF2 from Julia via PyCall instead. However, I ran into a different problem.
In Python I can read a page thus:
import PyPDF2
doc=PyPDF2.pdf.PdfFileReader("x.pdf")
p = doc.getPage(1)
type(p)
# PyPDF2.pdf.PageObject
But I don’t get a PageObject with PyCall:
using PyCall
pypdf = pyimport("PyPDF2")
doc = pypdf.pdf.PdfFileReader("x.pdf")
p = doc.getPage(1)
typeof(p)
# Dict{Any, Any}
Obviously, with a plain Dict I can’t then call a PageObject method like:
p.extractText()
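My guess, which I have not confirmed, is that PageObject is a dict subclass and that PyCall copies dict-like return values into a Julia Dict, which would drop the methods. Here is a minimal pure-Python sketch of that effect; FakePage is a hypothetical stand-in I made up, not PyPDF2's actual class:

```python
# Hypothetical stand-in for PyPDF2's PageObject; only the dict inheritance matters here.
class FakePage(dict):
    def extractText(self):
        # Toy behaviour: join the stored values into one string.
        return "".join(str(v) for v in self.values())


page = FakePage({"/Contents": "hello"})

# A by-value copy into a plain dict keeps the entries but loses the class,
# which is roughly what I suspect happens on the way into Julia.
plain = dict(page)

print(isinstance(page, dict))          # the subclass still counts as a dict
print(hasattr(page, "extractText"))    # method is available on the original
print(hasattr(plain, "extractText"))   # method is gone after the copy
```

If that is what is going on, then asking PyCall for the unconverted object, e.g. pycall(doc.getPage, PyObject, 1), might keep the methods, but I have not verified this.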
What am I missing?
Thanks,
DD