Automatically cropping images


I’ve got a directory with a couple of hundred images, similar to this one (found on the internet).
The problem is that many of these images aren’t centered or don’t fill much of the frame (i.e. there is a lot of excess whitespace), so I’d like to crop them.
I tried cropping at the rows r and columns c where sum(img[r, :]) and sum(img[:, c]) exceed a certain threshold. This works in some cases but fails when the edge of the image has a line of black pixels (the images are scanned, so a lot of them have this).
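
A minimal sketch of that heuristic in Python with NumPy might look like the following. It assumes a 2-D grayscale uint8 array with a light background and dark content, so the image is inverted before summing. The function name crop_by_projection and the inversion step are illustrative assumptions, not the exact code used here.

```python
import numpy as np

def crop_by_projection(img, threshold):
    """Crop to the rows/columns whose summed "ink" exceeds threshold.

    Assumption: img is a 2-D uint8 array, light background, dark content.
    """
    # Invert so that dark pixels contribute large values to the sums.
    ink = 255 - img.astype(np.int64)
    rows = np.flatnonzero(ink.sum(axis=1) > threshold)
    cols = np.flatnonzero(ink.sum(axis=0) > threshold)
    if rows.size == 0 or cols.size == 0:
        return img  # nothing exceeds the threshold; leave the image untouched
    # Keep everything between the first and last row/column above threshold.
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

With a black scan line running along one edge, that line's own row or column sum exceeds the threshold, so the crop extends all the way to that edge. Thin strokes, on the other hand, contribute very little to each sum and can fall below the threshold entirely.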

When the image is a thin line drawing, this heuristic fails completely.

Examples of hard cases due to thin lines

I’m looking for something along the lines of a filter that is invariant to thin lines, black paper edges…
I’ve already tried felzenszwalb segmentation, but it also treats the long thin lines along the edge as a blob.
Any help would be appreciated!


Are you familiar with this collection of scripts?

I’ve used it to solve similar problems before


Wow, that’s definitely something I was not familiar with.
Thanks a lot!
I’ll surely have a look through this!

Could you not just define a small margin where you ignore content, and then use the simple cropping procedure on the rest?

That’s a solution for most of the images but some (perhaps too few to spend time on, but that’s another problem…) have an object that touches the edge and I wouldn’t want anything to be cut off.

But for what I have in mind, if the object is larger than the margin, nothing would be cut.
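
One possible reading of this suggestion, as a rough sketch in Python with NumPy: compute the row and column sums only inside the margin, and if the detected content reaches the inner boundary, assume it continues into the margin and keep that side of the image whole. The function name crop_ignoring_margin, the margin parameter, and the snap-to-edge rule are assumptions made for illustration. The same light-background, dark-content grayscale input is assumed.

```python
import numpy as np

def crop_ignoring_margin(img, threshold, margin):
    """Threshold-based crop that ignores a border of `margin` pixels.

    Assumption: img is a 2-D uint8 array, light background, dark content.
    """
    ink = 255 - img.astype(np.int64)
    h, w = ink.shape
    # Only the interior is used to decide where the content is.
    inner = ink[margin:h - margin, margin:w - margin]
    rows = np.flatnonzero(inner.sum(axis=1) > threshold) + margin
    cols = np.flatnonzero(inner.sum(axis=0) > threshold) + margin
    if rows.size == 0 or cols.size == 0:
        return img  # no content found in the interior; leave untouched
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1
    # Content touching the inner boundary probably extends into the
    # margin, so keep that side up to the image edge (assumed rule).
    if top == margin:
        top = 0
    if bottom == h - margin:
        bottom = h
    if left == margin:
        left = 0
    if right == w - margin:
        right = w
    return img[top:bottom, left:right]
```

A scan line that lies entirely inside the margin no longer affects the crop, while an object that crosses the margin is kept up to the image edge. Any scan line on that same side is kept too.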

Right, that’s something I tried and it did indeed work quite well; however, it didn’t get rid of the problem with drawings with thin lines.
The cropping also disregards small protrusions since these do not exceed the threshold.