Using Julia to extract information from a ballot

I would like to identify tools in the Julia ecosystem to parse a model ballot. For example, from the following PDF,

obtain

  • Candidature
  • Candidate’s name
  • Party Affiliation

@sambitdash would it be possible to use the PDF structure to scrape those fields with PDFIO.jl?

A search shows that Avik has written a package called Taro.

If all rectangles have the same size you can probably find them with a 2D matched filter?

Simply extract one rectangle and correlate the image with your little rectangle pattern (kernel). The correlation image will have peaks located on all rectangles in the image.
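To make the matched-filter idea concrete, here is a minimal sketch in plain Julia (no packages). It assumes the ballot page has already been rasterized to a grayscale matrix, which is not shown here. The helper names `hollow_box` and `matched_filter` and the synthetic test image are made up for illustration. On a real scan you would crop the template from the page itself, and a package such as ImageFiltering.jl would likely be much faster than this double loop.

```julia
# Hypothetical helper: an h-by-w rectangle outline (1.0 on the border, 0.0 inside).
function hollow_box(h::Integer, w::Integer)
    box = zeros(h, w)
    box[[1, h], :] .= 1.0
    box[:, [1, w]] .= 1.0
    return box
end

# Cross-correlate `img` with a zero-mean copy of `kernel` ("valid" region only).
# score[i, j] is the response with the kernel's top-left corner placed at img[i, j].
function matched_filter(img::AbstractMatrix{<:Real}, kernel::AbstractMatrix{<:Real})
    kh, kw = size(kernel)
    ih, iw = size(img)
    k = kernel .- sum(kernel) / length(kernel)  # zero mean, so flat regions score 0
    score = zeros(Float64, ih - kh + 1, iw - kw + 1)
    for j in axes(score, 2), i in axes(score, 1)
        score[i, j] = sum(view(img, i:i+kh-1, j:j+kw-1) .* k)
    end
    return score
end

# Synthetic "page": two identical 5x7 rectangle outlines on an empty background.
template = hollow_box(5, 7)
img = zeros(20, 30)
img[3:7, 4:10]     .= template
img[12:16, 20:26]  .= template

score = matched_filter(img, template)

# Keep every position whose response is within 1% of the strongest one.
peaks = findall(>=(0.99 * maximum(score)), score)
# peaks == [CartesianIndex(3, 4), CartesianIndex(12, 20)]
```

Each peak is the top-left corner of a detected rectangle, so the candidate box could then be cropped as `img[i:i+4, j:j+6]` and passed on to OCR. The 1% threshold only works because the synthetic boxes are exact copies. A scanned ballot would need a looser threshold and probably some non-maximum suppression.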

I ended up using OCReract.jl.
Ref: https://github.com/Nosferican/CandidatosEleccionesGeneralesPR2020

In this specific case, isn’t it possible to find the data in another format?

Highly unlikely. The Puerto Rico (PR) government is known for not being great at transparency or openness. I was able to get the political contributions programmatically by contacting the Electoral Board Comptroller Office over Twitter and having them tweak an internal API so I could pull the data. Right now I have the executive and legislative ballots done. The municipal ballots are a bit trickier because the locations and dimensions are not consistent: some minority parties don’t list candidates for every local government.

Sorry for the delayed response; I do not log in to the forum very often. If you can estimate the rectangular regions, we could use the suggestion given in this issue: https://github.com/sambitdash/PDFIO.jl/issues/55