PyCall trouble

Hi,

I want to read text out of a PDF. I know there’s PDFIO, but unfortunately it seems to crash on some PDFs.

So I’d like to use Python’s PyPDF2 instead, but I ran into a different problem…

In Python I can read a page thus:

import PyPDF2
doc = PyPDF2.pdf.PdfFileReader("x.pdf")
p = doc.getPage(1)
type(p)
# PyPDF2.pdf.PageObject

But I don’t get a PageObject with PyCall:

using PyCall
pypdf = pyimport("PyPDF2")
doc = pypdf.pdf.PdfFileReader("x.pdf")
p = doc.getPage(1)
typeof(p)
# Dict{Any, Any}

Obviously, I can’t then do something like:

p.extractText()

What am I missing?

Thanks,

DD

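Yeah, PyCall automatically converts Python objects returned from a call into native Julia types whenever it knows how. A PageObject is a dict subclass (via PyPDF2’s DictionaryObject), so it gets converted to a Dict and its Python methods are lost.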

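You can call pycall(doc.getPage, PyObject, 1) to get the same page but keep it as a Python object; p.extractText() then works. Note that PyCall passes the argument through unchanged, so the page index is 0-based, as in Python.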
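Put together, the PyCall route might look like the sketch below. It assumes PyPDF2 1.x (the API used in the question; later PyPDF2 releases renamed `PdfFileReader`, `getPage` and `extractText`) and a file `x.pdf` in the working directory. Index 0 is used here to fetch the first page.

```julia
using PyCall

# Assumption: PyPDF2 1.x is installed in PyCall's Python, and "x.pdf" exists.
pypdf = pyimport("PyPDF2")
doc = pypdf.pdf.PdfFileReader("x.pdf")

# Request a raw PyObject so the PageObject (a dict subclass) is NOT
# auto-converted to a Julia Dict. The argument goes to Python as-is,
# so 0 means the first page.
p = pycall(doc.getPage, PyObject, 0)

# Python methods are reachable again; the returned Python str is
# converted to a Julia String automatically.
text = p.extractText()
println(text)
```

The only change from the failing version is the explicit `PyObject` return type; everything after that is ordinary attribute access on a Python object.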

(As an aside, there’s also PythonCall.jl, which doesn’t do this automatic conversion.)
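For comparison, a minimal sketch of the same task with PythonCall.jl, again assuming PyPDF2 1.x and a local `x.pdf`. Results come back wrapped as `Py` objects, so the page keeps its methods, and you convert to a Julia type only when you need one via `pyconvert`.

```julia
using PythonCall

# Assumption: PyPDF2 1.x is available to PythonCall, and "x.pdf" exists.
pypdf = pyimport("PyPDF2")
doc = pypdf.pdf.PdfFileReader("x.pdf")

# No automatic conversion: `p` is a `Py` wrapping the PageObject.
p = doc.getPage(0)  # 0-based, so this is the first page

# Convert explicitly at the point where a Julia value is wanted.
text = pyconvert(String, p.extractText())
println(text)
```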
