PyCall trouble

dodoplus · April 9, 2022, 11:23am

Hi,

I want to read text out of a PDF. I know there’s PDFIO, but unfortunately it seems to crash on some PDFs.

So: I’d like to use Python’s PyPDF2. However, I ran into a different problem…

In Python I can read a page thus:

import PyPDF2
doc=PyPDF2.pdf.PdfFileReader("x.pdf")
p = doc.getPage(1)
type(p)
# PyPDF2.pdf.PageObject

But I don’t get a PageObject with PyCall:

using PyCall
pypdf = pyimport("PyPDF2")
doc = pypdf.pdf.PdfFileReader("x.pdf")
p = doc.getPage(1)
typeof(p)
# Dict{Any, Any}

Obviously, I can’t then do something like:

p.extractText()

What am I missing?

Thanks,

DD

cjdoris · April 9, 2022, 11:39am

Yeah, PyCall converts returned Python objects to native Julia ones whenever it can. Presumably a PageObject is a dict-like thing, so it gets converted to a Dict and its Python methods are lost.
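To make the "dict-like" point concrete, here is a minimal Python sketch. It assumes PageObject derives from dict, which I have not checked against the PyPDF2 source; FakePage and its extract_text method are hypothetical stand-ins, not PyPDF2's real API. It shows that such an object passes a dict check, and that copying it into a plain dict (roughly what an automatic conversion amounts to) drops its methods.

```python
# Hypothetical stand-in for a dict-derived page class; NOT the real PyPDF2 PageObject.
class FakePage(dict):
    def extract_text(self):
        # Hypothetical method: return the value stored under "/Contents", or "" if absent.
        return self.get("/Contents", "")


page = FakePage({"/Contents": "hello"})

# A dict subclass passes an isinstance check against dict,
# which is why a converter may treat it as a plain mapping.
print(isinstance(page, dict))

# Copying into a plain dict keeps the keys and values but loses the methods.
plain = dict(page)
print(hasattr(page, "extract_text"), hasattr(plain, "extract_text"))
```

The same thing happens on the Julia side: the Dict you get back holds the page's entries but has no extractText method.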

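You can do pycall(doc.getPage, PyObject, 2) to keep it as a Python object. The second argument to pycall is the return type, and PyObject tells PyCall to skip the conversion, so you can then call methods such as extractText() on the result.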

(Aside, there’s also PythonCall which doesn’t do this automatic conversion.)
