Does anybody work with PDFIO.jl?
I need to extract text from a PDF. Because my knowledge is limited, I don't understand how to use the method to save the text to a string; right now it just prints to the REPL.
The first argument to pdPageExtractText is the output stream; passing stdout means the text is printed. If you want to capture the output, you could try something like this:
using PDFIO

function getPDFText(src, out)
    # Handle that can be used for subsequent operations on the document.
    doc = pdDocOpen(src)
    # Metadata extracted from the PDF document.
    # This value is retained and returned as the return from the function.
    docinfo = pdDocGetInfo(doc)
    open(out, "w") do io
        # Returns the number of pages in the document.
        npage = pdDocGetPageCount(doc)
        for i = 1:npage
            # Handle to the specific page given the number index.
            page = pdDocGetPage(doc, i)
            # Extract text from the page and write it to the output file.
            pdPageExtractText(io, page)
        end
    end
    # Close the document handle; doc should not be used after this call.
    pdDocClose(doc)
    return docinfo
end
I found it on the internet. My problem is that it always takes the first line of the next page. PDFtk does a better job.
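If the goal is a String rather than a file, one option is to pass an in-memory IOBuffer instead of a file handle, since pdPageExtractText accepts any IO. Below is a minimal sketch under that assumption. The names `pages_to_string` and `pdf_text_to_string` are made up for illustration and are not part of PDFIO. The page separator is only a guess at the "first line of the next page" problem: if text from adjacent pages is running together, an explicit break between pages might help, but I have not confirmed that this is the cause.

```julia
using PDFIO

# Hypothetical helper (not part of PDFIO): collects per-page output in memory.
# `write_page(io, i)` is any function that writes page `i` to the stream `io`.
function pages_to_string(write_page, npages; separator = "\n")
    io = IOBuffer()
    for i in 1:npages
        write_page(io, i)
        # Put a separator between pages, but not after the last one.
        i < npages && print(io, separator)
    end
    return String(take!(io))
end

# Hypothetical wrapper: returns the text of the whole PDF as one String.
function pdf_text_to_string(src)
    doc = pdDocOpen(src)
    try
        npage = pdDocGetPageCount(doc)
        return pages_to_string(npage) do io, i
            pdPageExtractText(io, pdDocGetPage(doc, i))
        end
    finally
        # Close the handle even if extraction throws.
        pdDocClose(doc)
    end
end
```

Usage would look like `text = pdf_text_to_string("sample.pdf")`, where "sample.pdf" is a placeholder path. Keeping the buffering in a separate function means it can be tried without a PDF at hand, for example with `pages_to_string((io, i) -> print(io, "page", i), 3)`.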