Detecting crosses in grids

Are you sure we only need 3 and not 4 points…? All the stuff I find about this mentions 4 points…

You probably need all 4 corners if you calibrate just the camera and then process images with an unknown orientation of the camera relative to the target plane.

I’m not interested in unwarping the images

I thought that you wanted to segment your image into grid squares for further analysis. This would be easiest after unwarping, otherwise you would have to deal with a nonlinear grid.

No no, I track animals in the videos. I then want to unwarp those tracks, which are just sets of xy coordinates. To unwarp them I need to find the transformation matrix. I think that what I need is something similar to Matlab’s cp2tform or fitgeotrans with a projective setting.

I track animals in the videos

So what is the purpose of the physical grid? Are you tracking animals small relative to grid size pictured above, e.g. insects? Is the camera position fixed relative to the grid position during the experiment?

I think you were totally right in saying that the grid is pointless. I need only the 4 corners of the frame. Which is what I’m doing now, I’ve removed the grid.
Do we have a function that can produce a projective transform from the 4 image- and respective real world- coordinates in Julia?

You have to calibrate the camera to get its intrinsic and extrinsic parameters (matrices). You can use Matlab, OpenCV (if you prefer playing with that API), or any other application for camera calibration with a checkerboard pattern. After you obtain the intrinsic and extrinsic matrices, you can use them to transform the trajectories (points) you track on the video to real world coordinates.

But, if I have the 4 image coordinates of a rectangular frame that is lying on a flat surface where my animals will be moving on (they are restricted to that 2D plane), then I don’t need anything else, right?

Depending on the details of your experiment, you will at least need to remove barrel distortion. Otherwise, you will see a barrel-distorted image of your rectangular frame and of the animal trajectories. Even if you orient the camera so that the images of the frame corners form a perfect rectangle with the same aspect ratio as the original, your animals will be able to travel outside of this rectangle, and their straight paths will look curved on video.

OpenCV or Matlab’s Camera calibration won’t help you here. Or at least they will be less exact than your original idea.
But it might be an easier procedure.

You can:

  1. Do your original idea, but that requires manually developing the algorithm that finds the grid … after that, you have already established the steps you need to take to move from pixels to real world cm


  2. Have your camera calibrated using the accepted way of taking several checkerboard images and feeding them into calibration software.
    This will give you K, the intrinsic camera parameters.

Then position your camera in front of the surface such that the surface is parallel to the camera's image plane, and measure the exact distance between camera and surface.

Then, since the camera is calibrated, for a given pixel position (x, y) (after distortion correction) and distance d from the camera … the world coordinates are inv(K) * (x*d, y*d, d)

Method 2 is less exact, since it approximates with a mathematical model what method 1 measures directly.
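The back-projection in method 2 can be sketched in a few lines. This is only an illustration, assuming a zero-skew intrinsic matrix K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]], a surface exactly parallel to the image plane, and pixel coordinates that have already been corrected for lens distortion. The function name and all numeric values are hypothetical, not taken from any real calibration.

```python
def pixel_to_plane(x, y, d, fx, fy, cx, cy):
    """Back-project an undistorted pixel (x, y) onto a plane at distance d.

    Equivalent to inv(K) * (x*d, y*d, d) for a zero-skew K with focal
    lengths fx, fy (in pixels) and principal point (cx, cy).
    The result is in the same length unit as d, in the camera frame.
    """
    X = (x - cx) * d / fx
    Y = (y - cy) * d / fy
    return X, Y, d


# Hypothetical intrinsics for an HD camera, and a surface 200 cm away.
FX, FY = 1000.0, 1000.0
CX, CY = 960.0, 540.0
DISTANCE_CM = 200.0

point_cm = pixel_to_plane(1060.0, 540.0, DISTANCE_CM, FX, FY, CX, CY)
print(point_cm)
```

The returned X and Y are measured from the point where the optical axis meets the surface, so they still need a shift (and possibly a rotation) to line up with whatever origin is used on the ground.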

Isn’t (2) what LensFun does, as Tamas suggested above?

Sure. If the distortion coefficients and intrinsic matrix for your camera are listed somewhere (for example as part of the LensFun project), then you can use those instead of calibrating yourself.
But your own calibration can be more accurate because of variations in the manufacturing process.

Good people, thanks for all the input!

It seems like a crucial question still remains: Are 4 pairs of points – real world and their corresponding image coordinates – enough to calculate the real world coordinates of new locations that lie in the same plane as the original 4 points, OR do I need to do the whole camera calibration shebang (like Jean-Yves Bouguet’s toolbox)?

Remember that it’s just one camera and everything is fixed. The camera has a wide angle lens, is 2 meters away from the scene, and has HD resolution. If it’s a question of accuracy, we’re fine with errors of ~2 cm.

If you can undistort the image, then 4 pairs of points (at the corners/extrema) is sufficient, assuming the area is actually planar.

Is LensFun sufficient to undistort the image for your camera? If not, I might have some code that can help you, but I’ll have to see if I’m allowed to share it.


I suggest just testing it. (And update here … it’s interesting)
If you need better accuracy, then first try doing the distortion correction.

Well, I don’t need to undistort the images. I just need to undistort the tracks of animals that move within that plane. So my process is:

  1. record a video with the (rectangular) frame on the flat ground
  2. remove that frame
  3. record animals moving on the same ground
  4. track the 4 corners of the frame and animals
  5. then calculate the correct transform to convert the image coordinates to real world coordinates using those 4 corners
  6. now I have the animal tracks in real world coordinates

I’m working on Point #5. It’s not utterly trivial…

I’ll report back :slight_smile:
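For step 5, one common way to set the problem up is the planar projective (homography) model. The following is a sketch under that assumption, with image coordinates (u, v), real world coordinates (x, y), and the normalisation h_33 = 1 chosen for convenience; none of these symbols come from the posts above.

```latex
\begin{aligned}
x &= \frac{h_{11}u + h_{12}v + h_{13}}{h_{31}u + h_{32}v + 1} \\
y &= \frac{h_{21}u + h_{22}v + h_{23}}{h_{31}u + h_{32}v + 1} \\[4pt]
\text{Multiplying out:}\quad
h_{11}u + h_{12}v + h_{13} - xu\,h_{31} - xv\,h_{32} &= x \\
h_{21}u + h_{22}v + h_{23} - yu\,h_{31} - yv\,h_{32} &= y
\end{aligned}
```

Each pair of corresponding points contributes two linear equations, so four pairs give eight equations for the eight unknowns. The system has a unique solution as long as no three of the four points lie on one line.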

No, it’s not enough. You need to use a barrel distortion transform, which is usually a radial polynomial of second or fourth order.
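To make the radial polynomial concrete, here is a minimal sketch. It assumes the common two-coefficient radial model, x_d = x_u * (1 + k1*r^2 + k2*r^4), applied in normalised image coordinates (pixel offset from the image centre divided by the focal length). The coefficients k1 and k2 below are made up for illustration, and the inverse is found by simple fixed-point iteration, which is only one of several ways to do it.

```python
def distort(xu, yu, k1, k2):
    """Apply the radial model to an undistorted normalised point."""
    r2 = xu * xu + yu * yu
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return xu * scale, yu * scale


def undistort(xd, yd, k1, k2, iterations=30):
    """Invert the radial model by fixed-point iteration.

    Starts from the distorted point and repeatedly divides by the scale
    factor evaluated at the current estimate. This converges for mild
    distortion; strongly distorted lenses may need a better solver.
    """
    xu, yu = xd, yd
    for _ in range(iterations):
        r2 = xu * xu + yu * yu
        scale = 1.0 + k1 * r2 + k2 * r2 * r2
        xu, yu = xd / scale, yd / scale
    return xu, yu


# Hypothetical coefficients; a negative k1 gives barrel distortion.
K1, K2 = -0.2, 0.05

# A straight horizontal track (y = 0.4) as the camera would record it.
straight = [(x / 10.0, 0.4) for x in range(-6, 7)]
recorded = [distort(x, y, K1, K2) for x, y in straight]
recovered = [undistort(x, y, K1, K2) for x, y in recorded]

for (xs, ys), (xr, yr) in zip(straight, recorded):
    print(f"true y = {ys:.3f}   recorded y = {yr:.4f}")
```

The recorded y values differ along the track, which is the curved path described earlier in the thread. Undistorting the tracked points first, and only then applying the 4-point transform, keeps the two corrections separate.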

TL;DR: It works, 4 pairs of points – real world and their corresponding image coordinates – are “enough” to calculate the real world coordinates of new locations that lie in the same plane as the original 4 points.

OK, I ran a quick and dirty trial, just to see if it is even close.

  1. I extracted an image from the video of a frame with known dimensions (120 x 100 cm between the inner edges of the frame).
  2. Noted the locations of all 4 corners (inner topside corners).
  3. Calculated the transform.
  4. I extracted an image from the video of an A3 page (29.7 x 42 cm).
  5. Noted the locations of all 4 corners of this page.
  6. Applied the transform from step #3 on the page points from step #5.
  7. Calculated the dimensions of the unwarped page and compared with reality.

The differences were less than a cm (0.95 cm and 0.67 cm for the short- and long-side respectively).

Here comes the shocker: Julia does not currently have a function to calculate the projective transformation and apply said transformation. So I had to do all of this in Matlab :face_vomiting:

Here’s the code:

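% Image coordinates (pixels) of the 4 inner corners of the frame
movingPoints = [503 145;
    1517 200;
    1698 945;
    288 865];

% Real world coordinates (cm) of the same corners, in the same order
fixedPoints = [0 100;
    120 100;
    120 0;
    0 0];

% Fit the projective transform from image to real world coordinates
transformationType = 'projective';
tform = cp2tform(movingPoints,fixedPoints,transformationType);

% Image coordinates (pixels) of the 4 corners of the A3 page
u = [873, 1248, 1182, 781];
v = [330, 423, 623, 513];

% Map the page corners to real world coordinates (cm)
[x, y] = tformfwd(tform, u, v);

% Average the two long sides and the two short sides of the unwarped page
long1 = sqrt((x(1) - x(2))^2 + (y(1) - y(2))^2);
long2 = sqrt((x(3) - x(4))^2 + (y(3) - y(4))^2);
long = (long1 + long2)/2;

short1 = sqrt((x(1) - x(4))^2 + (y(1) - y(4))^2);
short2 = sqrt((x(3) - x(2))^2 + (y(3) - y(2))^2);
short = (short1 + short2)/2;

% Differences from the true A3 dimensions (cm)
short - 29.7
long - 42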
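For anyone without Matlab, the same calculation can be sketched with nothing but the Python standard library. This is not a port of cp2tform's internals. It is a plain 4-point fit that fixes the bottom-right entry of the 3x3 matrix to 1 and solves the resulting 8x8 linear system. The function names are made up for this example, and the point coordinates are the ones from the Matlab snippet above.

```python
import math


def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_homography(moving, fixed):
    """Return the 3x3 matrix H (H[2][2] = 1) mapping 4 moving points to 4 fixed points."""
    if len(moving) != 4 or len(fixed) != 4:
        raise ValueError("exactly 4 point pairs are required")
    a, b = [], []
    for (u, v), (x, y) in zip(moving, fixed):
        a.append([u, v, 1, 0, 0, 0, -x * u, -x * v])
        b.append(x)
        a.append([0, 0, 0, u, v, 1, -y * u, -y * v])
        b.append(y)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, point):
    """Map one (u, v) point through H, including the projective division."""
    u, v = point
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    x = (H[0][0] * u + H[0][1] * v + H[0][2]) / w
    y = (H[1][0] * u + H[1][1] * v + H[1][2]) / w
    return x, y


# Frame corners: image pixels and real world cm (from the snippet above).
moving_points = [(503, 145), (1517, 200), (1698, 945), (288, 865)]
fixed_points = [(0, 100), (120, 100), (120, 0), (0, 0)]
H = fit_homography(moving_points, fixed_points)

# A3 page corners in image pixels, mapped to real world cm.
page_px = [(873, 330), (1248, 423), (1182, 623), (781, 513)]
page_cm = [apply_homography(H, p) for p in page_px]

long_side = (math.dist(page_cm[0], page_cm[1]) + math.dist(page_cm[2], page_cm[3])) / 2
short_side = (math.dist(page_cm[0], page_cm[3]) + math.dist(page_cm[2], page_cm[1])) / 2
print("short side error (cm):", short_side - 29.7)
print("long side error (cm):", long_side - 42)
```

An animal track is just a longer list of (u, v) points, so converting it is the same `apply_homography` call applied to every point in the track.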

The last image does not have any perceivable barrel distortion, as opposed to the first one at the beginning of this post. You are right that in such a case you can use just the projective transformation and get accurate enough results.

oooo… I’ll try to challenge it with a more barrely shot.