I am looking for suggestions on doing Optical Character Recognition. This does not necessarily have to be native Julia (see below). Also, I am not looking for a full-on machine learning framework which will recognise images.
I am looking for a library or an external program which, given an image of a typewritten page, returns a score: the higher the score, the more readable or more easily recognised the text is.
If this is best implemented using Kaggle (which I know next to nothing about) I will hoist that aboard.

To explain, I have had a programming exercise in my head for years. Take an A4 typewritten page. Send it through a straight-cut shredder. Take the set of strips you have and scan them as images. Of course as an exercise you can do this virtually. Then combine the strips into an image page, at random. How can we piece together the original typewritten page?
That leads to using optimisation, perhaps a genetic algorithm, to arrange the strips.
Or it may be that simple brute force is easier: just look at all possible arrangements of the strips.
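
To make the brute-force idea concrete, here is a minimal Python sketch of the virtual version of the exercise. Everything in it is an assumption made for illustration: the "page" is a list of equal-length text rows rather than a scanned image, the helper names (`shred`, `join_strips`, `brute_force`, `word_score`) are hypothetical, and `word_score` is a toy stand-in for a real OCR readability score that simply counts words from a tiny vocabulary.

```python
import itertools
import math
import random

# Toy vocabulary; a stand-in for whatever an OCR engine would recognise.
VOCAB = {"the", "cat", "sat", "on", "a", "red", "mat"}

def shred(page, n_strips):
    """Cut a page (list of equal-length strings) into vertical strips."""
    width = len(page[0])
    if width % n_strips != 0:
        raise ValueError("page width must be divisible by n_strips")
    step = width // n_strips
    return [[row[i * step:(i + 1) * step] for row in page]
            for i in range(n_strips)]

def join_strips(strips, order):
    """Glue the strips back together, left to right, in the given order."""
    n_rows = len(strips[0])
    return ["".join(strips[i][r] for i in order) for r in range(n_rows)]

def word_score(page):
    """Toy readability score: number of tokens found in VOCAB."""
    return sum(1 for row in page for token in row.split() if token in VOCAB)

def brute_force(strips, score):
    """Try every ordering of the strips and return the best-scoring one."""
    orders = itertools.permutations(range(len(strips)))
    return max(orders, key=lambda order: score(join_strips(strips, order)))

def count_arrangements(n_strips):
    """Number of orderings brute force has to examine: n factorial."""
    return math.factorial(n_strips)

if __name__ == "__main__":
    page = ["the cat sat ", "on a red mat"]
    strips = shred(page, 4)
    random.shuffle(strips)                 # simulate the shredder's output
    best = brute_force(strips, word_score)
    print("\n".join(join_strips(strips, best)))
    print(count_arrangements(4), "orderings tried")
```

The catch is the factorial growth: n strips have n! orderings. If an A4 page (210 mm wide) came out as roughly 35 strips, which assumes a strip width of about 6 mm, that would be 35! orderings, on the order of 10^40, each needing one call to the scoring function. Brute force is therefore only realistic for a handful of strips, which is where a search heuristic such as a genetic algorithm would come in.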

Perhaps it is bad form to answer my own question… Tesseract may be what I am looking for.

Other suggestions are of course welcomed. (I have searched for this - I just came across Tesseract after I asked on here)
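
As a rough idea of how Tesseract could supply the score described above, here is a hedged Python sketch. It assumes the `tesseract` command-line program is installed and on the PATH, and that it is asked for TSV output, which includes a per-word `conf` column (with -1 on rows that are not words). The function names `mean_confidence` and `score_image` are hypothetical, and averaging the word confidences is just one possible way to turn the output into a single number.

```python
import csv
import io
import subprocess

def mean_confidence(tsv_text):
    """Average the per-word confidences found in Tesseract TSV output."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    confidences = []
    for row in reader:
        text = (row.get("text") or "").strip()
        try:
            conf = float(row.get("conf") or "")
        except ValueError:
            continue                      # malformed or short row
        # Layout rows (page, block, line) carry conf -1 and no text.
        if conf >= 0 and text:
            confidences.append(conf)
    if not confidences:
        return 0.0                        # nothing recognised at all
    return sum(confidences) / len(confidences)

def score_image(path):
    """Run the tesseract CLI on an image and return the mean confidence."""
    result = subprocess.run(
        ["tesseract", path, "stdout", "tsv"],
        capture_output=True, text=True, check=True,
    )
    return mean_confidence(result.stdout)
```

Calling `score_image("candidate_page.png")` (a hypothetical file name) on each candidate arrangement would then give a number to maximise. Whether mean confidence actually tracks "correctly reassembled" well enough is something that would need to be checked on real strips.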


I wish we could mobilize the community and build something in Julia that is better and simpler than this new LSTM-based OCR you just posted. Could we build a repo to start mapping out its design? Maybe if I saw its beginnings I could contribute to completing it. In any case, I am completing A. Ng’s ML course, which teaches OCR in the eleventh week, so maybe I could start mapping out the design.


Any progress on this front?

There is a Julia wrapper for Tesseract; see https://github.com/leferrad/OCReract.jl.
However, it does not appear to be maintained.