Box coordinates from image segmentation

sardinecan · February 9, 2024, 3:26pm

Hi Everyone,
I’m new to the field of image segmentation, and I need your help to find a way of determining the coordinates of a box from image segmentation.

I have a batch of images like this one:

I’m trying to find a way to determine the zones/boxes corresponding to each page, so that I can “crop” these images directly with the IIIF API.

The ImageSegmentation package seems to be efficient for this task:

using Images
using ImageSegmentation
using HTTP
using FileIO
using Random

imgUrl = "https://api.nakala.fr/data/10.34847/nkl.027b840e/5c8e77a046216ab6aed848b2f781deb9495fea76"
file = download(imgUrl) |> load

function get_random_color(seed)
    Random.seed!(seed)
    rand(RGB{N0f8})
end

seg = fast_scanning(file, 0.2)
map(i->get_random_color(i), labels_map(seg))

Output:

What I need to do now is find the coordinates of the top-left and bottom-right points of both the blue and green areas. Do you have any idea how to achieve this, or another method to suggest?

Thanks for your help!
Best,
Josselin

sardinecan · February 10, 2024, 8:44pm

I found a solution.
First, we take the segmentation matrix

segMap = labels_map(seg)

Then, from this matrix, we can retrieve the coordinates of all the pixels in a given segment. Then we can store the X and Y coordinates in two separate variables to finally determine a box around a segment (the black crosses on the image below):

coordinates = findall(x -> x == 1, segMap)
x = Vector()
y = Vector()

for c in coordinates
    push!(x, c[2])
    push!(y, c[1])
end

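# findall scans the matrix column by column, so x (the column indices) is already sorted
xStart = x[1]
xEnd = last(x)

# y (the row indices) is not sorted, so sort it before taking the ends
sort!(y)
yStart = y[1]
yEnd = last(y)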

Hope this makes sense!
Best

stevengj · February 11, 2024, 4:27am

You don’t need to sort an array to find the minimum and maximum. You should be able to just do:

segMap = labels_map(seg)
coordinates = findall(==(1), segMap)
xmin, xmax = extrema(c -> c[2], coordinates)  # c[2] is the column index (horizontal axis)
ymin, ymax = extrema(c -> c[1], coordinates)  # c[1] is the row index (vertical axis)

(With a bit more cleverness, you could do it in a single pass over segMap, without ever constructing a coordinates array explicitly, but it’s late and I’m lazy.)

PS. Note that calling x = Vector() and then repeatedly calling push! is fairly inefficient. For one thing, Vector() returns an untyped array Any[], which is inefficient to work with. For another thing, repeatedly calling push!, while it is still amortized linear time, is less efficient than allocating an array of the correct length to begin with. A simpler, more efficient construction would be x = getindex.(coordinates, 2), or alternatively x = map(c -> c[2], coordinates). But it is even better to avoid allocating the x array entirely, as in my code above.
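
The single-pass idea mentioned above can be sketched roughly as follows. This is only an illustration: `bounding_boxes` is a made-up helper name (not part of ImageSegmentation), the toy matrix stands in for `labels_map(seg)`, and x is taken to be the column index and y the row index, as in the original post. It collects a box for every label at once, without building a coordinates array.

```julia
# Bounding box of every label in one pass over a label matrix.
# `bounding_boxes` is a hypothetical helper, not an ImageSegmentation function.
function bounding_boxes(segMap::AbstractMatrix{<:Integer})
    # label => (xmin, xmax, ymin, ymax), with x = column index and y = row index
    boxes = Dict{Int,NTuple{4,Int}}()
    for I in CartesianIndices(segMap)
        y, x = Tuple(I)          # Julia indexes matrices as [row, column]
        label = segMap[I]
        if haskey(boxes, label)
            xmin, xmax, ymin, ymax = boxes[label]
            boxes[label] = (min(xmin, x), max(xmax, x), min(ymin, y), max(ymax, y))
        else
            boxes[label] = (x, x, y, y)
        end
    end
    return boxes
end

# Toy label matrix standing in for labels_map(seg)
toy = [1 1 2 2
       1 1 2 2
       3 3 3 2]

boxes = bounding_boxes(toy)
boxes[2]   # (3, 4, 1, 3): columns 3 to 4, rows 1 to 3
```

On a real segmentation you would probably keep only the boxes of the few largest segments, since fast_scanning usually produces many small ones.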

sardinecan · February 11, 2024, 12:19pm

Thanks for your advice and improvements @stevengj. I still have a lot to learn!

rocco_sprmnt21 · February 11, 2024, 8:07pm

If the heuristic is deemed useful for the case at hand, the following form should be faster in carving out the “good” part

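tr = 10^3  # threshold: number of pixels that must differ from the border

# First column (from the left, then from the right) whose labels differ from the
# border column in more than tr pixels, and the same for rows. cmax and rmax
# count from the far edge. reverse on eachcol/eachrow needs Julia 1.9 or later.
cmin = findfirst(c -> sum(c .!= @view segMap[:, 1]) > tr, eachcol(segMap))
cmax = findfirst(c -> sum(c .!= @view segMap[:, end]) > tr, reverse(eachcol(segMap)))
rmin = findfirst(r -> sum(r .!= @view segMap[1, :]) > tr, eachrow(segMap))
rmax = findfirst(r -> sum(r .!= @view segMap[end, :]) > tr, reverse(eachrow(segMap)))

# Crop the image to the region between those rows and columns
file[rmin:end-rmax, cmin:end-cmax]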

mpeters2 · February 12, 2024, 3:10am

I might have missed this, so feel free to ignore me, but does that algorithm return two points (e.g. top-left and bottom-right) or 4 points? The reason I ask is that the images might not be exactly aligned at 90 degrees, and it might make more sense to return 4 points so that you have a parallelogram.

sardinecan · February 12, 2024, 10:57am

Hi @mpeters2,

You’re right. It returns only two points, the corners of the bounding box given by the extrema (xmin/ymin and xmax/ymax), so it does not “fit” the segments exactly. But the region parameter of the IIIF API only takes one point, the upper left, plus two other values: width and height.

The region of the full image to be returned is specified in terms of absolute pixel values. The value of x represents the number of pixels from the 0 position on the horizontal axis. The value of y represents the number of pixels from the 0 position on the vertical axis. Thus the x,y position 0,0 is the upper left-most pixel of the image. w represents the width of the region and h represents the height of the region in pixels.

You are also right about the alignment. If we could get the exact coordinates of the 4 points, it would be possible to rotate the image and increase precision (although my 18th c. papers are not square). But in any case, I have no idea how to obtain these 4 exact coordinates; segmentation may not be accurate enough to identify them. Any suggestion is welcome!
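
For the IIIF side, turning the extrema into a region string might look like the sketch below. `iiif_region` is a hypothetical helper name, the bounds are made-up numbers, and the base URL is a placeholder. Two details are easy to miss: Julia indices start at 1 while IIIF offsets start at 0, and both bounds are inclusive, so the width is xmax - xmin + 1.

```julia
# Convert 1-based inclusive pixel bounds (Julia indices) into an IIIF
# region string "x,y,w,h", where x,y is a 0-based offset from the upper left.
# `iiif_region` is a hypothetical helper name.
function iiif_region(xmin::Integer, xmax::Integer, ymin::Integer, ymax::Integer)
    x = xmin - 1            # Julia counts from 1, IIIF from 0
    y = ymin - 1
    w = xmax - xmin + 1     # both bounds are inclusive
    h = ymax - ymin + 1
    return string(x, ",", y, ",", w, ",", h)
end

region = iiif_region(120, 980, 45, 1300)   # made-up bounds, for illustration only

# Placeholder base URL (assumption). The size keyword is "full" in Image API 2.x
# and "max" in 3.0, so adjust it to whatever the server supports.
base = "https://example.org/iiif/IMAGE_ID"
url = string(base, "/", region, "/full/0/default.jpg")
```

This assumes the segmentation was run on the image at full resolution; if it was run on a downscaled copy, the bounds need to be scaled back first.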

stevengj · February 12, 2024, 1:33pm

You could use a corner identification algorithm. (e.g. via ImageCorners.jl)
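
Short of running a corner detector, a rough way to get four corner candidates straight from the label matrix is to take the pixels of a segment that are extreme along the two diagonals. To be clear, this is a simple heuristic and not what ImageCorners.jl does; `corner_candidates` is a made-up name, and the toy matrix (with 0 standing for everything else) replaces `labels_map(seg)`. It needs Julia 1.7 or later for `argmin(f, itr)`.

```julia
# Four corner candidates of one segment: the pixels that are extreme along
# the two diagonals. A simple heuristic, not a real corner detector.
# `corner_candidates` is a hypothetical helper name.
function corner_candidates(segMap::AbstractMatrix{<:Integer}, label::Integer)
    coords = findall(==(label), segMap)
    isempty(coords) && error("label $label not found")
    # c[1] is the row (y), c[2] is the column (x)
    topleft     = argmin(c -> c[2] + c[1], coords)   # smallest x + y
    bottomright = argmax(c -> c[2] + c[1], coords)   # largest x + y
    topright    = argmax(c -> c[2] - c[1], coords)   # largest x - y
    bottomleft  = argmin(c -> c[2] - c[1], coords)   # smallest x - y
    return (; topleft, topright, bottomright, bottomleft)
end

# Toy segment shaped like a slightly skewed page
toy = [0 1 1 1 1 1
       0 1 1 1 1 1
       1 1 1 1 1 0
       1 1 1 1 1 0]

corners = corner_candidates(toy, 1)
```

Each corner comes back as a CartesianIndex (row, column). On noisy segment borders the extremes can land on stray pixels, so this is at best a starting point before trying a proper corner detector.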

sardinecan · February 15, 2024, 9:08am

Thanks @stevengj, I’ll take a look at ImageCorners. For my use case, it might be more effective than ImageSegmentation!?