Contents 1 Introduction 1.1 State of the art . . . . . . . . . . . . . . . . . . . . . . . . . . 2 The 2.1 2.2 2.3

3 5

camera model 9 Intrinsic camera parameters . . . . . . . . . . . . . . . . . . . 10 Extrinsic camera parameters . . . . . . . . . . . . . . . . . . . 11 Radial distortion factors . . . . . . . . . . . . . . . . . . . . . 12

3 Camera calibration methods 14 3.1 Homography calculation . . . . . . . . . . . . . . . . . . . . . 14 3.1.1 Direct Linear Transformation . . . . . . . . . . . . . . 15 3.1.2 Normalized Direct Linear Transformation . . . . . . . . 16 3.2 Estimation of camera parameters for non-coplanar point sets . 18 3.2.1 Decomposition of the global transformation matrix . . 19 3.2.2 RQ decomposition . . . . . . . . . . . . . . . . . . . . 20 3.2.3 Direct calculation of R using Givens rotations . . . . . 21 3.2.4 Orientation-preserving transformations and homogeneous coordinates . . . . . . . . . . . . . . . . . . . . . . . . 23 3.2.5 Inversion of the camera-to-image matrix . . . . . . . . 25 3.2.6 Overall camera initialization for non-coplanar sets . . . 25 3.3 Estimation of camera parameters for coplanar point sets . . . 26 3.3.1 Calculation of the homography transformation . . . . . 26 3.3.2 Estimation of intrinsic parameters with multiple sets . 28 3.3.3 Estimation of extrinsic parameters . . . . . . . . . . . 31 3.3.4 Overall camera initialization for coplanar sets . . . . . 32 3.4 Coordinate transformations . . . . . . . . . . . . . . . . . . . 32 3.4.1 World to image coordinate transformation . . . . . . . 34 3.4.2 Image to world coordinate transformation . . . . . . . 35 3.4.3 Application and removal of radial distortion in images . 36 3.5 Non-linear optimization process . . . . . . . . . . . . . . . . . 37 3.5.1 Error metrics . . . . . . . . . . . . . . . . . . . . . . . 38 1

CONTENTS

3.6

3.7

3.5.2 Numerical estimation of the Jacobian matrix . . . . . Overall calibration process . . . . . . . . . . . . . . . . . . . 3.6.1 Transformation-oriented global calibration . . . . . . 3.6.2 Model-oriented global calibration . . . . . . . . . . . 3.6.3 Calibration of extrinsic parameters . . . . . . . . . . 3.6.4 Estimation of intrinsic parameters using multiple point sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . Undistorted camera model . . . . . . . . . . . . . . . . . . . 3.7.1 Distortion removal . . . . . . . . . . . . . . . . . . . 3.7.2 Generation of undistorted images . . . . . . . . . . . 3.7.3 Undistorted composite calibration of intrinsic and extrinsic parameters . . . . . . . . . . . . . . . . . . . .

2 . . . . .

41 42 43 44 44

. . . .

45 47 48 49

. 50

4 Evaluation of the proposed calibration methods 51 4.1 Calibration results . . . . . . . . . . . . . . . . . . . . . . . . 52 4.2 Robustness against input noise . . . . . . . . . . . . . . . . . . 55 5 Summary

61

Chapter 1 Introduction Camera calibration is the process of finding a set of parameters which define a camera model from a set of world/image point correspondences. These parameters also define the transformation processes between the world and image coordinate systems. Once this model has been computed the camera can be used in different applications like, for example, object tracking, augmented reality or immersive virtual reality. The camera calibration methods developed here are part of a multiview 3D vision project where 16 independent cameras take synchronized shots frequently and use their calibration parameters to perform extensive image/world coordinate transformations. This project is still under development in the Technological Institute of Computer Science, located at the Polytechnic University of Valencia, Spain. Not all details of the project can be discussed here since some parts are still patent-pending, but the overall 3D vision process can be described. First of all, the 16 cameras take different and synchronized shots of a 3D object. The obtained images are then segmented to identify the object and distinguish it from the background. With this information and the original images, a volumetric octree carving procedure is applied. The position of the 16 cameras has been carefully selected to provide good results when this step is performed. Once we have a volumetric representation of the object, a new modified version of the Marching Cubes algorithm, which removes replicated points faster avoiding all comparisons, is applied for isosurface generation producing a 3D model of the object. This 3D model is then aligned to a reference model using an enhanced version of the Iterative Closest Point algorithm with normal vector compat3

CHAPTER 1. INTRODUCTION

4

ibility and Principal Component Analysis. These aligned models are then evaluated using a special volumetric correctness model. The system is expected to work continuously taking synchronized shots of the objects with little time lapses between them. In this case, camera calibration is a previous offline step to this process. It can be done once cameras are in their final positions or in a two-step calibration process for intrinsic and extrinsic parameters. The results of the calibration process will affect directly the definition of the 3D model generated by octree carving and the isosurface algorithm, compromising the entire project in case of failure or too large deviations. Because of this, all methods and algorithms exposed here have been developed or adapted, with numerical stability and robustness against input noise as one if its topmost priorities. The thesis contents are structured as follows. Chapter 2 introduces the camera model and all the intrinsic and extrinsic camera parameters with their mathematical background. Chapter 3 introduces the proposed calibration methods and, without loss of their applicability to the generic case, their adaptations to the project context. Techniques to compute homography matrices, the linear transformations between world and image coordinates, are presented in section 3.1. These techniques consist in the Direct Linear Transformation algorithm and its normalized data version with some minor changes in its calculation. Camera parameter initialization is then discussed. However, the calculations are completely different for the coplanar and non-coplanar cases. That is, if the provided point sets for calibration lie all in an unique plane or not. All details of non-coplanar parameter initialization can be found in section 3.2, while the coplanar parameter initialization is explained in section 3.3. Section 3.4 describes the different coordinate transformations required by the calibration algorithms, including world-to-image and image-to-world transformations as well as radial distortion application and removal from image coordinates. Non-linear optimization is then analyzed in section 3.5. Different error metrics are proposed and discussed. Also, a simple numerical method is proposed for the approximation of the Jacobian matrix, required by the Levenberg-Marquardt non-linear minimization algorithm.

CHAPTER 1. INTRODUCTION

5

With all previous details discussed, section 3.6 defines overall algorithms for camera calibration using different approaches to the calibration problem. Both one-view global calibration and multiple-view separate calibration procedures are detailed. A modification of the original camera model is proposed in section 3.7 to remove the radial distortion and optimize calculations. Also, an alternative calibration method is proposed based on the undistorted camera model. The calibration results of all the proposed methods and a reference implementation of Tsai’s algorithm can be found in chapter 4. A robustness comparison is performed to analyze the behaviour of the different methods against a increasing Gaussian noise in the input data. Finally, result analysis, global conclusions and related working lines on development are detailed in chapter 5. Independently from the adaptations and the optimizations to the project context, this work can be a good reference guide as it gathers different algorithms and techniques together. It also provides calibration methods for the most general cases and detailed results about their robustness in real calibration situations.

1.1

State of the art

Camera calibration is a mature field with a large amount of literature about it. It comes both from the photogrammetry community and from computer vision, where this project belongs. Since photogrammetric calibration requires special and usually expensive equipment it has not been explored as much as computer vision-based options applied to the context of this project. Regarding camera calibration papers based on the computer vision point of view, probably one of the most important papers is the one presented by Tsai [8]. Its camera model has served as basis for many other methods including the one proposed here, and its implementation is still being used in many places. Tsai proposes a two-step method for computing all the camera parameters both in the coplanar and the non-coplanar cases, combining closed-form techniques with non-linear ones. All parameters are estimated from a unique camera view and consequently one set of points. However, an adaptation to multiple views also proposed by Tsai can be found in [7].

CHAPTER 1. INTRODUCTION

6

A free implementation of the Tsai’s algorithm can be found at the homepage of Reg Wilson, with its last version dating from October ’95. An interesting point from Tsai’s article is about the lens distortion. Tsai acknowledges the presence of radial and tangential distortions caused by the camera optics. Both distortions can be expressed as infinite series with nonlinear factors but Tsai suggests to completely ignore the tangential distortion and to use only one quadratic distortion factor to avoid numerical problems. These suggestions have been taken into account in our project. However, camera calibration methods using more complex distortion models have been explored. For example, Heikkil¨a [5] proposes a camera model which includes 4 distortion factors: 2 for radial distortion and 2 for tangential distortion, and cites even more complex models which include linear distortion factors for non-orthogonal axes [6]. Heikkil¨a proposes a 4-step calibration process where Direct Linear Transformation is performed on the start followed by a non-linear optimization process and extra distortion correction procedures. The first two steps of the proposed method are quite common in the literature and are also the ones adopted here. A complete implementation of Heikkil¨a’s calibration method is available as a Matlab toolbox at the homepage of the author. Heikkil¨a mentioned the use of the Direct Linear Transformation (DLT) method. It was originally proposed by [1] as a closed-form solution to the camera calibration problem, but ignoring deliberately the radial and tangential distortions introduced by the camera lens. This algorithm builds an overdetermined system where a vector which nullifies a matrix must be found. However, all [1], [5] and [6] applied additional constraints to solve the system using the pseudo-inverse while avoiding the trivial solution. Here the 3D homogeneous to 2D homogeneous Direct Linear Transformation method has been generalized to a P -D homogeneous to 2D homogeneous problem. The use of the pseudo-inverse with additional constraints has been replaced with the Singular Value Decomposition. Also, data normalization proposed by [4] was applied before building the overdetermined system to increase the numerical stability of the process. Details on the current implementation can be found in sections 3.1.1 and 3.1.2. The output of the Direct Linear Transformation algorithm as defined in [1] is a 4 × 3 matrix which encodes both intrinsic and extrinsic parameters excepting non-linear distortion factors. Melen [6] proposed a method based

CHAPTER 1. INTRODUCTION

7

on the RQ decomposition to extract a set of eleven camera parameters in a five-step algorithm. In case of coplanar points, the matrix used in DLT to estimate the camera parameters becomes singular and it needs to be constrained to a 3 × 3 matrix to get a useful output. However, with Melen’s approach only a subset of the camera parameters could be estimated. This problem was solved by Zhang [9] who found a way to calibrate all camera parameters from multiple 3 × 3 matrices corresponding to multiple views of coplanar points. All these methods have been adapted to the current implementation and are described in sections 3.2.1, 3.2.4 and 3.3.1. Zhang proposes a complete algorithm to calibrate separately both intrinsic and extrinsic parameters of the camera from multiple views of coplanar points. First, homography transformations are computed using a method based in maximum likehood producing 3 × 3 homogeneous matrices as could be done with the Direct Linear Transformation algorithm. Then 2 equations are extracted from each homogeneous homography matrix to build an overdetermined system in a similar way as previously done with DLT. The solution of this system can be used to calculate the intrinsic camera parameters, with the exception of the non-linear distortion factors, in a closed-form way. Mathematically, 3 views not parallel to the plane defined by input points are required to produce a non-singular matrix, but additional constraints can be added. Zhang proposes an equation to force a null camera skew but, with little calculation, other constraints can be generated, such as equal x-y camera scaling factors or setting the principal point, also called the optical center, to the image center. Zhang also proposes an alternative method for intrinsic parameter calibration making use of a well-known pure translation of two views. The calculation of extrinsic parameters in the coplanar case from the previously calculated intrinsic matrix and the homogeneous homographies is also detailed. Refining methods are proposed to ensure the orthogonality of the calculated rotation matrices. The adaptation of Zhang’s work to the current project context can be found in sections 3.3.2 and 3.3.3. Other techniques are available for intrinsic parameter calibration in the coplanar case. For example, Chen, Wu and Wada [2] propose a closed-form method to calibrate the focal length and the extrinsic parameters from only one perspective view of two coplanar circles with arbitrary radius. This method takes the geometrical approach of the problem based on the implicit conics defined by the circles and their perspective transformation into ellipses. The results of this method can also be used to initialize non-linear optimizations as usually done in the literature. A variant of this method

CHAPTER 1. INTRODUCTION

8

using the projective equation of the circle can also be found in [10]. After the initialization of the camera parameters, a non-linear optimization method is used to calibrate the distortion factors. In this case, the Levenberg-Marquardt algorithm is used as also suggested by Zhang [9], Heikkil¨a [5] and others. Concretely, Heikkil¨a proposes to use the lineal camera parameters calculated by the Direct Linear Transformation as initialization values for the Levenberg-Marquardt algorithm, accelerating its convergence to only a few iterations. This method is also applied in this implementation. Many other options exist in the large amount of literature available. However, the experiments presented show that the different models and methods proposed here, adapted to the context and circumstances of this project, are adequate to calibrate our cameras with sufficient precision for later 3D model reconstruction and validation.

Chapter 2 The camera model As stated before, camera calibration is the process of finding a set of parameters defining a camera model from a set of world/image point correspondences. The required calibration point correspondences should be previously found using different methods like calibration targets and should be as accurate as possible. Our camera model is the same as proposed by Tsai [8]. That model defines the camera by a set of intrinsic or internal camera parameters and a set of extrinsic or external parameters that describe its relation with the world: Intrinsic parameters: • Focal length: focal length of the camera. • Skew: shearing factor applied by the camera sensor. • Optical center: 2D image coordinates of the sensor optical center. • κ: radial distortion factors applied by the camera lens. Extrinsic parameters: • Camera rotation: 3D rotation from world to image coordinates. • Camera position: 3D world coordinates of the camera position. This proposed model has a total of 10 degrees of freedom (4 intrinsic and 6 extrinsic) plus the ones used for different degrees of radial distortion.

9

CHAPTER 2. THE CAMERA MODEL

2.1

10

Intrinsic camera parameters

The intrinsic camera parameters define a transformation that projects real world points into the image plane placed at the focal length distance. This can be achieved by using a simple projection matrix working with homogeneous coordinates. Assuming an ideal projection centered at the origin, the transformation could be expressed as Ö è

u v w

Ö

=

á

è

xw yw zw 1

f 0 0 0 0 f 0 0 0 0 1 0

ë Ç

with

å

Ç

xi u/w = yi y/w

å

(2.1)

with focal length f , image coordinates (xi , yi ) and world coordinates (xw , yw , zw ). However, the center of projection is of course not the origin of the image. In fact usually it is not the center of the image, but a specific point (cx , cy ) defined by the camera optics. Also, the focal length parameter is measured in world coordinate units instead of pixels. A 2D world units/pixels conversion factor dp is required, usually provided by the camera specifications. Introducing these factors to the equation (2.1) produces the following expression: Ö è

u v w

Ç

with

å

Ö

=

á

è

xw yw zw 1

α x 0 cx 0 0 α y cy 0 0 0 1 0

Ç

å

ë

(2.2)

xi u/w = , αx = f /dpx , αy = f /dpy yi y/w

Additionally, a pixel skew factor s can be introduced in the transformation along with a simple decomposition of the matrix expression Ö

K=

αx s αy cx 0 α y cy 0 0 1

è

Ö

, I3 = á

Ö è

u v w

î

= K I3 | O3

ó

xw yw zw 1

è

1 0 0 0 1 0 0 0 1

Ö è

, O3 =

ë

Ç

with

å

Ç

0 0 0

xi u/w = yi y/w

å

(2.3)

CHAPTER 2. THE CAMERA MODEL

11

where K defines the linear intrinsic parameters of the camera. Non-linear intrinsic parameters such as radial distortions require a special consideration and will be analyzed later in section 3.5. In the equations (2.3) the skew factor s is scaled by the factor αy . The specific mathematical reasons for this scaling are related to the concatenation of affine transforms and will be discussed in section 3.2.5.

2.2

Extrinsic camera parameters

In addition to the intrinsic specific camera parameters, the camera needs to be correctly oriented and positioned in the world. This leaves a set of 6 degrees of freedom: 3 for position and 3 for orientation. These transformations are well-known and can be implemented using a matrix multiplication with homogeneous coordinates: Ö

xc yc zc

á

è î

= R T

ó

xw yw zw 1

ë

(2.4)

where R is a 3 × 3 rotation matrix which converts from world orientation to camera orientation and T is a translation column vector. The point (xc , yc , zc ) is the world point (xw , yw , zw ) expressed in camera coordinates. The translation vector T defines, in the camera reference system, the translation required to bring the world origin into the camera origin. This can also be seen as the camera position in world coordinates rotated to match the camera system orientation but with opposite direction. Since rotation matrices are orthogonal R−1 = RT for any rotation matrix R, the translation vector T is related to camera position C by the following expressions. T = −R C C = −RT T

(2.5) (2.6)

Using these equations the transformation described in (2.4) can be expressed as á ë Ö è xw xc î ó yw yc = R I3 | −C (2.7) zw zc 1

CHAPTER 2. THE CAMERA MODEL

12

Then, using homogeneous coordinates for the camera point (xc , yc , zc ) the world to camera transformation is described by á

xc yc zc 1

ë

á ñ

R −RC = 1 O3T

ô

xw yw zw 1

ë

(2.8)

Bringing together equations (2.3) and (2.8) it is possible to build a global linear transformation P as the concatenation of the previous matrices. á

Ö è

u v w

xw yw zw 1

=P

ë

á î

= K I3 | O3 Ç

with

å

ó

Ç

ñ

R −RC O3T 1

xi u/w = yi y/w î

å

P = K R I3 | −C

ô

xw yw zw 1

ó

ë

(2.9)

(2.10)

The global transformation P encodes all intrinsic and extrinsic parameters but radial distortion factors. These factors are non-linear by nature and require a different approach.

2.3

Radial distortion factors

Radial distortion is a non-linear optical phenomena produced by the camera lens which makes straight lines to bend as they get away from the optical center. In camera coordinates, it can be expressed by the equations Ç

å

Ç

xu xd = yu yd

å

2

x 1 +

d

κ1 yd 2

4

x +

d

κ2 yd

!

+ ...

(2.11)

2

where (xd , yd ) are the distorted point coordinates, (xu , yu ) are the undistorted ones, kpk2 denotes the 2-norm of a point p and k1 , k2 , . . . are the radial distortion factors.

CHAPTER 2. THE CAMERA MODEL

13

Applying radial distortion instead of removing it implies solving the equation (2.11) expressed as the following polynomial: ru = rd (1 + rd 2 κ1 + rd 4 κ2 + . . .) Ç

å

Ç

rd xu xd = yd ru y u

å

with ru =

x

u

yu

2

and rd =

x

d

yd

(2.12)

2

In case of using only one distortion factor the equation (2.12) becomes a cubic equation with analytic solutions available. The desired rd should be the minimal positive real root of the polynomial. Complex roots should be checked also, since finite precision could move real roots slightly into the complex plane during the solving process. Both equations (2.11) and (2.12) are applied in camera coordinates. This means that the global process described in (2.10) must be decomposed in world-to-camera and camera-to-image transformations to perform radial distortion in the middle of them. These transformations have already been described by equation (2.7) and matrix K (2.3) respectively. Coordinate transformations will be analyzed in more detail in sections 3.4.1 and 3.4.2. Other lens effects such as tangential distortion can be applied to the camera coordinates but they will not be considered here since they aren’t part of the proposed camera model. In fact, to avoid unnecessary problem complexity only one radial distortion factor κ will be used from now on, as also suggested by [8]. Experiments have shown that extra distortion factors didn’t affect too much to the final calibration of our camera set, and that too many factors could introduce numerical stability problems.

Chapter 3 Camera calibration methods The base algorithm developed here is basically an initial closed-form approximation of the camera parameters ignoring distortion factors with a Levenberg-Marquardt non-linear minimization afterwards. The implementation comes in many flavors: two different approaches to global optimization and separate calibration of intrinsic and extrinsic parameters. Finally, a distortion-removal method is proposed to enhance both speed and precision in all calibration procedures. The camera calibration procedure varies significantly if the given 3D points are coplanar. This causes the loss of 2 degrees of freedom and hence mathematical singularities if not processed adequately. Because of this reason, some parts of the calibration algorithm, especially the parameter initialization, will be split in different methods for coplanar and non-coplanar cases.

3.1

Homography calculation

By definition, homographies are invertible transformations from the real projective plane to a projective plane which keeps straight lines straight. The transformation between world and image points can be hence defined as an homography and expressed by a matrix. This section will describe the Direct Linear Transformation algorithm and its enhanced normalized version as analytical approaches to the calculation of the linear homography transformation between point sets, ignoring any non-linear distortion factors at the moment. Homography matrices from this section will be the key for later camera parameter initialization. 14

CHAPTER 3. CAMERA CALIBRATION METHODS

3.1.1

15

Direct Linear Transformation

Direct Linear Transformation or DLT is a general closed-form algorithm to calculate optimal homographies between sets of similar points. Originally developed in [1] as a camera calibration algorithm, later was found to be unable to correctly calibrate radial-distorted cameras. This method calculates, as long as enough linear independent points are provided for the required degrees of freedom, an homography matrix H which minimizes the projection error from one set to the other. Let x be the original point set and y the projected one, both with N points, this can be expressed by the following equation: yk ∝ Hxk with k = 1, . . . , N

(3.1)

Since the only constraint imposed by the similarity relation is that xk and yk are always proportional, this can be expressed as yk × Hxk = 0 where ’×’ denotes the cross product. For convenience both x and y sets are converted to homogeneous coordinates. Applying this method to the general P -D to 2D case it can be computed as follows. Let H be a homogeneous 3 × (P + 1) homography matrix with M elements and H1 T , H2 T , H3 T as rows. Let yk = (uk , vk , wk ) the k-th 2D homogeneous point. Ö

Hxk = Ö

yk × Hxk = Ö

H1 T xk H2 T xk H3 T xk

vk H3 T xk − wk H2 T xk wk H1 T xk − uk H3 T xk uk H2 T xk − vk H1 T xk

04 T −wk xk T vk xk T w k xk T 04 T −uk xk T T T −vk xk u k xk 04 T

èÖ

H1 H2 H3

è

(3.2) è

(3.3) è

=Ah=0

(3.4)

with h a M × 1 column vector encoding in row order the elements of H. This way the solution of the system A h = 0 gives the optimal H homography matrix scaled by an arbitrary real factor λ 6= 0. However, the matrix A as defined in (3.4) has only rank 2 since the third row is a linear combination of −uk /wk times the first row and −vk /wk times the second one. Because of

CHAPTER 3. CAMERA CALIBRATION METHODS

16

this, at least M/2 linear-independent point matches are required to compute a non-trivial solution of the vector h. Direct Linear transformation generates an overdetermined system by adding two rows to matrix A for each point in the calibration set. This generates a 2N × M matrix: 04 T −w1 x1 T v1 x1 T T w x T 04 −u1 x1 T 1 1 0 . ... Ö è . . . . . . H1 .. T T 0 T −w x v x 4 i i i i 0 H2 = T T T w i xi 04 −ui xi . H3 . ... . ... ... 0 04 T −wN xN T vN xN T T T T wN xN 04 −uN xN

(3.5)

Singular Value Decomposition of matrix A is used to find the vector h which minimizes kA hk with khk2 = 1 as constraint. The Singular Value Decomposition of an m × n matrix decomposes it in the product of the matrices A = U D V T where U is a m×m unitary matrix, D a m × n diagonal matrix and V the conjugate transpose of a n × n unitary matrix. The values in the diagonal of matrix D are called the singular values of the matrix. In the case of the matrix A as defined in (3.5) all values should be real and the matrices U and V orthogonal. Given the Singular Value Decomposition of matrix A, the solution vector h is the unit singular vector associated to the smallest singular value. This is the i-th column vector of V where Di,i is the smallest value in the diagonal of D. This is also the eigenvector of the smallest eigenvalue of the matrix AT × A.

3.1.2

Normalized Direct Linear Transformation

To avoid numerical problems when solving the equation (3.5) one possible solution is o perform a normalization of the world and image points used in the generation of the A matrix. In [4] is proposed to translate the origin to √ the centroid of the points and scale them to have an average length of 2. This implies that the farthest points in the set once normalized will be

CHAPTER 3. CAMERA CALIBRATION METHODS

17

approximately in the boundaries of the cube defined by (±1, ±1, ±1). This normalization process grants numerical stability to the Direct Linear Transformation process in a simple way. Since the normalization is defined as the combination of a translation and a scaling it can be expressed with affine matrices using homogeneous coordinates. First translating and then scaling, the normalization matrix Nx for 3D homogeneous coordinates is á

Nx = S T =

sx 0 0 0 sx 0 0 0 sx 0 0 0

á

=

ëá

0 0 0 1

1 0 0 0

sx 0 0 −sx cx1 0 sx 0 −sx cx2 0 0 sx −sx cx3 0 0 0 1

0 1 0 0

ë

0 −cx1 0 −cx2 1 −cx3 0 1

ë

(3.6)

√ N N 2 1 X xk , s x = P N with cx = N k=1 k=1 kxk k2 ignoring the homogeneous components in the calculation of cx , sx . The equivalent normalization matrix for 2D homogeneous coordinates Ny is straightforward, as is the generalization of Nx to P -dimensional points. Once the normalization matrices Nx and Ny are calculated, the method replaces the values used in equation (3.5) with normalized ones: 04 T −w10 x0 1 T v 0 1 x0 1 T 0 0 T w 1x 1 04 T −u0 1 x0 1 T 0 Ö è . . . . . . . . . . . . H1 0 0 T 0 0 T 04 T −w x x v k k k k H2 = 0 0T 0 T 0 0 T 04 −u k x k w ix i . . H3 . ... ... ... T 0 0 T 0 0 T 0 04 −w N x N v Nx N w 0 N x0 N T 04 T −u0 N x0 N T

Ö 0 è u k

with x0 k = Nx xk ,

v0k w0 k

= Ny yk

(3.7)

CHAPTER 3. CAMERA CALIBRATION METHODS

18

As in original Direct Linear Transformation the Singular Value Decomposition is used to solve the system. Then normalization transformations are combined into the normalized homography transformation Hn . Ö

Hn = Ny −1

H1 T H2 T H3 T

è

Nx

(3.8)

The inverse of the homogeneous 2D matrix Ny can be calculated directly as the composition of inverse affine transformations. Ny −1 = (S T )−1 = T −1 S −1 Ö

Ny −1 =

3.2

1 0 cy 1 0 1 cy 2 0 0 1

èÖ

è

1/sy 0 0 0 1/sy 0 0 0 1

Ö

=

1/sy 0 cy 1 0 1/sy cy 2 0 0 1

è

(3.9)

Estimation of camera parameters for noncoplanar point sets

This section provides a method to get an initial approximation to the linear camera model parameters. Both intrinsic and extrinsic parameters are initialized, excluding the non-linear ones as the radial distortion factors. The computed values will be used later as an initial solution for a non-linear optimization process described in section 3.5. The first step is to calculate the homography transformation for the given point sets as described in section 3.1. Applying the Normalized Direct Linear Transformation method (section 3.1.2) to current 3D to 2D case results in a 2N × 12 matrix A and a 3 × 4 matrix H. This homogeneous homography matrix H is equivalent to the global transformation P defined by equation (2.10), scaled by an arbitrary real factor λ 6= 0. It is important to note that trying to obtain P as proposed in this section with coplanar point sets will lead to mathematical singularities because of the loss of degrees of freedom in the provided information. In case of handling coplanar point sets, the procedures in section 3.3 should be applied.

CHAPTER 3. CAMERA CALIBRATION METHODS

3.2.1

19

Decomposition of the global transformation matrix

Because of the properties of the closed-form solution provided by the Normalized Direct Linear Transformation, the optimal world-to-image transformation for the linear parameters of the camera model is obtained. However, if the radial distortion is included in the model the global transformation P has to be decomposed in separate world-to-camera and camera-to-image transformation matrices as shown in section 2.3. This section describes the P matrix decomposition process. As expressed by equation (2.10), P is a 3 × 4 matrix which composes the world-to-image linear transformation without any radial distortion. This equation can be also stated as î

ó

î

P = K R | −K R C = A | K T

ó

with A = K R and T = −R C as stated in equation (2.5)

(3.10)

The matrix A is the topmost-leftmost 3 × 3 submatrix of P and encodes both the camera-to-image and the rotation transformations. As seen in (2.3) matrix K is an upper triangular matrix, and R is orthogonal by definition, since it’s a 3D rotation matrix. Extracting K and R from A requires factoring the matrix A in the product of an upper triangular matrix K and an orthogonal matrix R. This factorization is called RQ decomposition and it’s very similar to the QR decomposition where a given matrix is decomposed in the product of an orthogonal matrix Q and a upper triangular matrix R. The details about this decomposition are shown in section 3.2.2. The matrix K obtained by the RQ decomposition is unnormalized and the intrinsic parameters in the matrix are scaled by a factor determined by K3,3 . This can be solved by simply scaling the matrix by an homogeneous correction factor µ = 1/K3,3 . However, other details may arise regarding the correct values for some intrinsic parameters. This subject will be analyzed in more detail in section 3.2.4. Matrix R is required to be orthogonal. All rotation matrices are orthogonal by definition but not all orthogonal matrices encode valid rotations. Some details regarding orientation-preserving transformations arise here and they will be also analyzed in section 3.2.4.

CHAPTER 3. CAMERA CALIBRATION METHODS

20

The remaining parameter, camera position C can be easily calculated using K T , the fourth column of the matrix P . K T = −K R C C

(3.11)

= −RT K −1 (K T )

The calculation of K −1 can be performed very efficiently. This method will be discussed in section 3.2.5.

3.2.2

RQ decomposition

Both the RQ and QR decompositions of a matrix A can be considered as different versions of the same problem. In fact, QR decomposition is the result of applying Gram-Schmidt orthogonalization to the columns of the matrix from the first column, while RQ decomposition is the same but applied to rows starting from the last one. This relationship between the RQ and QR decompositions can be established without referring to one of the many solving methods like GramSchmidt, but expressing the RQ decomposition as a QR one. 0 ··· . ..

Pm =

0 .. . 0 .. .

···

0 ··· .. . 0 .. .

0 ··· 1 .. . . . . . .. · · · 1 · · · 0 .. . . . . .. . · · · 0 · · · 0 .. .. . .

··· 1 . . . . .. 1 ··· 0 ···

0 ···

(3.12)

0

ˆ R ˆ RQ(A) = KR, QR(AT Pm ) = K ˆ T Pm , R = Pm R ˆT K = Pm K

(3.13)

Since the resulting factorization from the QR decomposition is known to be unique, it is also the one obtained by the RQ decomposition. This ensures only one possible upper-triangular K and only one possible orthogonal matrix R which should encode a 3D rotation. Methods used to calculate the QR decomposition vary from Gram-Schmidt orthogonalization to Givens rotations among others. Once the equivalence between decompositions is addressed, specific simplifications to the 3 × 3 RQ factorization problem can be proposed.

CHAPTER 3. CAMERA CALIBRATION METHODS

3.2.3

21

Direct calculation of R using Givens rotations

An approximation to simplify the problem of the RQ factorization is to apply Givens Rotations directly to the permuted matrices. Givens Rotations are matrices of the form 1 ··· . . .. .. 0 · · · . .. G(i, j, θ) =

0 .. .

···

0 ···

0 .. .

···

0 ··· .. .

0 .. .

· · · s · · · 0 .. . . .. . . . −s · · · c · · · 0 .. .. . . .. . . . . 0 ··· 0 ··· 1 c .. .

(3.14)

where c = cos(θ), s = sin(θ) appear at the intersections of i-th and j-th rows and columns [3]. The Givens rotation matrices are orthogonal and can be used to introduce zeros in a matrix. Consider the following simplified example: Ç

c s −s c

åÇ å

Ç å

a r = b 0

√ with r = a2 + b2 , c = a/r, s = b/r

(3.15)

Since each application of a Givens rotation introduces a zero in the matrix, three ordered Givens rotations R1 , R2 , R3 are enough to produce the 3 × 3 upper triangular matrix K. Let Aˆ = AT Pm , si = sin(θi ), ci = cos(θi ), it is possible to calculate directly the three Givens angles θ1 , θ2 , θ3 that define R. Ö

Rx (θ) = Ö

Aˆ =

1 0 0 0 c −s 0 s c

A31 A21 A11 A32 A22 A12 A33 A23 A13

è

Ö

, Rz (θ) =

è

c −s 0 s c 0 0 0 1

(3.16)

è

, θ1 = arctan(A33 /A32 ), R1 = Rx (θ1 )T

(3.17)

CHAPTER 3. CAMERA CALIBRATION METHODS

Ö

ˆ = R1 Aˆ = B

22

A31 A21 A11 c1 A32 + s1 A33 c1 A22 + s1 A23 c1 A12 + s1 A13 −s1 A32 + c1 A33 −s1 A22 + c1 A23 −s1 A12 + c1 A13

è

(3.18)

θ2 = arctan((c1 A32 + s1 A33 )/A31 ), R2 = Rz(θ2 )T ˆ = R2 R1 A, ˆ θ3 = arctan(Cˆ32 /Cˆ22 ) Cˆ = R2 B θ3 = arctan((−s2 A21 + c2 c1 A22 + c2 s1 A23 )/(−s1 A22 + c1 A23 )) (3.19) R3 = Rx (θ3 )T The Givens angles θ1 , θ2 , θ3 are now expressed as simple operations from the direct elements of A. Once calculated, matrices K and R can be easily obtained. RT = Rx (θ1 )Rz (θ2 )Rx (θ3 )Pm

(3.20)

R

(3.21)

= Pm Rx (−θ3 )Rz (−θ2 )Rx (−θ1 ) Ü

ˆ = R3 R2 R1 A, ˆ K

K

=

ˆ 33 K ˆ 23 K ˆ 13 K ˆ 32 K ˆ 22 K ˆ 12 K ˆ 31 K ˆ 21 K ˆ 11 K

ê

= ART

(3.22)

However this method has some disadvantages. One of them is related to the possible r overflows or underflows, which can be solved with extra numerical considerations at the cost of simplicity. Another problem is the anti-diagonal permutation matrix Pm in the definition of R. This permutation, required to obtain the RQ decomposition as described, transforms R in a non-orientation-preserving transformation. This means that R performs an orientation with an extra negative scaling in some axis, which is not desired. Details about this will be discussed in section 3.2.4. Because of these problems, this method, although simple, was discarded in favor of the RQ decomposition, using the QR as described in section 3.2.2. This also allows the numerical library used to decide which method is more adequate for performing the QR decomposition, given the properties of the matrix A. Other methods, like row-based Gram-Schmidt, were also considered but discarded due to numerical stability problems.

CHAPTER 3. CAMERA CALIBRATION METHODS

3.2.4

23

Orientation-preserving transformations and homogeneous coordinates

As mentioned briefly in some previous sections matrices K and R obtained from the decomposition of the global matrix P require extra handling to ensure correct camera parameter values in them. Although the equation (2.10) defines the global linear world-to-image transformation correctly, it’s important to remark that P is an homogeneous matrix obtained, if using Direct Linear Transformation, from a vector of length 1. This means that there are infinite correct P matrices, each one scaled by a different non-zero real number. This has consequences in the matrix decomposition process related in section 3.2.1. First of all, the values of the K matrix obtained by the RQ decomposition are not directly usable. Matrix K is scaled by a factor located in K33 since this element should be 1 by definition. This homogeneous scaling is quite easy to solve, just scaling the matrix K by a factor µ = 1/K33 . However the signs of the first and the second columns of K may be reversed, the ones defining the focal length and the skew. This is caused because the equation (2.10) can be also expressed as this: î

ó

µP = (µKS)(S −1 R) I3 | −C ≡ P Ö

with S =

è

sx 0 0 0 sy 0 0 0 1

(3.23) a scaling matrix.

Since R must keep its orthogonality, the scaling factors sx , sy cannot be any non-zero real number but only ±1, implying also S −1 = S. With this, the equation (3.23) suggests that it is mathematically correct to negate the first or second columns of K independently, negating also the first or second rows of R as consequence but keeping a valid P transformation matrix. Another important detail is that R is supposed to be a 3D rotation matrix and not just an orthogonal matrix. All rotation matrices are orthogonal but the reverse implication is not true. There is an extra requisite that a rotation matrix must fulfill: it should preserve the orientation. In mathematical terms this means that the determinant of R should be +1. A non-orientationpreserving matrix R will be also orthogonal but with determinant -1.

CHAPTER 3. CAMERA CALIBRATION METHODS

24

However, the scaling matrix mentioned before affects directly the determinant of SR, negating it with each negative element in the diagonal of S. It is also possible although counterintuitive that after deliberately correcting the sign of the focal lengths, the determinant of the resulting SR can be negative. This would imply that the correct transformation with correct focal lengths mirrors x or y coordinates at some point, which is obviously wrong unless there is a left-handed/right-handed coordinate system conversion. In this case all coordinate systems are assumed to be right-handed. There is a simple reason for this and it has already been noted: P is an homogeneous matrix and it can be scaled by any non-zero real value. In the case of a non-orientation-preserving SR with a correct S, all the P transformations should be scaled by -1. This will leave an orientation-preserving rotation together with a valid K since the homogeneous factor µ will be also negated, correcting the values back to positive. Introducing this sign factor σ = ±1 to correct the orientation of the SR matrix the equation (3.23) becomes î

ó

î

ó

˜R ˜ I3 | − C ≡ P P˜ = µσP = (µKS)(σSR) I3 | − C = K ˜ = µKS, R ˜ = σSR with corrected matrices K

(3.24)

This extended version of the definition can remove the numerical artifacts introduced by the Direct Linear Transformation to the signs of the K and R matrices. However, an appropriate calculation of the sx , sy , σ correction factors is required. Both sx and sy depend on µ, and σ relies on the scaling matrix S and the determinant of SR. Using this, the correct factors can be calculated as follows: µ = 1/K33 sx = sign(µK11 ) sy = sign(µK22 ) σ = −sx sy

(3.25)

It is important to remark that these sign corrections are performed under the assumption of a positive camera focal length. In case this is not true, both sx and sy factors should be negated in the previous equation. Also the sign of σ is negated in the equation because R, as defined by equation (3.21), is always a non-orientation-preserving transformation because of the permutation Pm .

CHAPTER 3. CAMERA CALIBRATION METHODS

25

Introducing the µ and σ factors in equation (3.10) affects the calculation of the camera position C and the translation vector T too: î

ó

î

˜R ˜ I3 | − C = K ˜ R ˜ | T˜ P˜ = K T˜ = µ σ T = µσK −1 P4 ˜ T T˜ C = −R

ó

(3.26) (3.27) (3.28)

Finally, the corrected the world-to-camera and camera-to-image transformation matrices can be defined respectively as î

ó

˜ | T˜ R ˜ CI = K

WC =

3.2.5

(3.29) (3.30)

Inversion of the camera-to-image matrix

˜ can be Both the camera-to-image matrices K and its corrected version K expressed as the concatenation of three affine transforms: a scaling S, a shearing H and a translation T . With K = T HS, using the parameters defined in equation (2.3) K can be defined as: Ö

K=

1 0 cx 0 1 cy 0 0 1

èÖ

èÖ

1 s 0 0 1 0 0 0 1

è

αx 0 0 0 αy 0 0 0 1

Ö

=

αx sαy cx 0 α y cy 0 0 1

è

(3.31)

Using this definition, calculating K −1 is immediate since inverses of affine matrices are well known: Ö 1

K −1 =

αx

0 0

K −1 = (T HS)−1 = S −1 H −1 T −1

0 1 αy

0

èÖ

0 0 1

èÖ

1 −s 0 0 1 0 0 0 1

1 0 −cx 0 1 −cy 0 0 1

è

=

Ü

1 αx

0 0

−s αx 1 αy

scy −cx αx −cy αy

0

1

ê

(3.32)

3.2.6

Overall camera initialization for non-coplanar sets

The concepts from the previous sections can be merged together to define a camera initialization algorithm based on Normalized Direct Linear Transformation and extraction of the camera parameters by the decomposition of the obtained transformation matrix P .

CHAPTER 3. CAMERA CALIBRATION METHODS

26

Algorithm 1 - Initialization of the camera model for non-coplanar points. −1 1: Build the matrices Nx , Ny , Ny using equations (3.6) and (3.9). 2: Solve the system described in (3.7) applied to the 3D-homogeneous to 2D-homogeneous case using Singular Value Decomposition and get the homography matrix P = Hn applying the equation (3.8). î ó 3: Let P = A | T with A a 3 × 3 matrix and T a 3 × 1 column vector. 4: Decompose A = KR using RQ decomposition as described in (3.13). 5: Calculate µ, sx , sy and σ correction factors using equations (3.25). ˜ and R ˜ as described in (3.24). 6: Calculate the corrected matrices K 7: Calculate the corrected translation vector T˜ as defined in (3.27). 8: Build the world-to-camera matrix W C, the camera-to-image matrix CI and its inverse CI −1 using equations (3.29), (3.30) and (3.32). 9: Extract intrinsic camera parameters f, s, cx , cy from CI using equation (2.3) and the dp world units/pixels conversion factors. ˜ in some way (x-y-z Euler 10: Extract extrinsic camera parameters encoding R angles for example) and calculate the camera position using (3.28). Radial distortion factor κ is set to 0 as it cannot be initialized by this algorithm due to its non-linear nature.

3.3

Estimation of camera parameters for coplanar point sets

Coplanar point sets are those where world point coordinates lie all in the same plane. Rather than an optimization case it implies a different mathematical scenario because of the linear dependencies in the data. Planar point sets have two degrees of freedom less than non-coplanar ones as they only provide rotation information around the plane normal. In this section, techniques for approximating initial camera parameters will be presented as they were for non-coplanar point sets. Most of the techniques presented here come from Zhang’s work [9] making little adjustments to the problem environment and replacing its proposed homography calculation by maximum likehood with Normalized Direct Linear Transformation.

3.3.1

Calculation of the homography transformation

As told before, coplanar points provide two degrees of freedom less than non-coplanar ones. This has immediate consequences over the overdeter-

CHAPTER 3. CAMERA CALIBRATION METHODS

27

mined matrix A from equation (3.7) in the Direct Linear Transformation algorithm. If built as described, the rank of matrix A will be lower than the expected number of parameters in the homography and consequently the algorithm will return a trivial solution vector with norm 1. Despite of that mathematical issue, the Normalized Direct Linear Transformation algorithm can still be used to calculate the homography between the world and image points. Homography matrix H requires to be reformulated to encode only the degrees of freedom that can be provided by the coplanar point sets. Without loss of generality it can be assumed that, since world points are coplanar, all of them lie on the Z = 0 plane. Any other 3D coplanar configuration is equivalent as it can be converted by means of a simple 3D rotation. With this, equations (2.7) and (2.10) can be simplified to á

Ö è

u v w

î

= K r1 r2 r3 T

Ö è

u v w

Ö î

= K r1 r2 T

ó

ó

xw yw 1

xw yw 0 1

è

ë

(3.33)

with ri the i-th column of the rotation matrix R. This redefines the previous 3 × 4 global transformation matrix P to a new 3 × 3 homogeneous homography matrix with 8 degrees of freedom: î

ó

λH = K r1 r2 T , λ 6= 0

(3.34)

This new homography matrix can be calculated by the Normalized Direct Linear Transformation algorithm using coplanar point sets. To achieve this, we must adapt the P -dimensional case of the algorithm to P = 2 producing a 2D-homogeneous to 2D-homogeneous homography transformation. This general case was described in section 3.1.1. In order to apply data normalization use 2D-homogeneous matrices for both image and world data. Details about data normalization for the Direct Linear Transformation algorithm were analyzed in section 3.1.2. The 2-dimensional case of the Direct Linear Transformation algorithm produces an overdetermined matrix A of size 2N × 9 where each point in the set provides 2 new equations. As H has 8 degrees of freedom at least 4

CHAPTER 3. CAMERA CALIBRATION METHODS

28

points are required. However, during the implementation of the algorithm, numeric problems arise in the Singular Value Decomposition process due to the underdetermined status of the matrix A with N = 4. The implementation found in some mathematical packages and tools like GNU Octave don’t seem to be affected by this. In any case, using 5 non-aligned points is enough to avoid these numerical problems. This simple adaptation allows us to obtain a closed-form solution of the transformation between world and image coordinates as done in the noncoplanar case. However this solution can’t be processed in the same way requiring additional calculations. These will be analyzed in the next section.

3.3.2

Estimation of intrinsic parameters with multiple sets

As stated by [9] but adapted to the notation used here Ö

B = K −T K −1 ≡ à

=

1 αx 2 − αsx 2 s cy −cx αx 2

B11 B12 B13 B12 B22 B23 B13 B23 B33

− αsx 2 s2 1 αx 2 + αy 2 −

s (s cy −cx ) αx 2

−

cy αy 2

è

í s cy −cx 2 αx s (s cy −cx ) − αx 2 − αcyy2 (s cy −cx )2 cy 2 + 2 αx αy 2 + 1

(3.35)

with K −T as an abbreviation for (K −1 )T or (K T )−1 . Note that γ = s αy has been used when adopting equations from original Zhang’s article [9]. The matrix B is directly related with the intrinsic camera parameters. Note that, as B is symmetric, it can be be defined by the 6-dimensional vector î óT b = B11 B12 B22 B13 B23 B33 (3.36) There’s a direct î ó relation between the homogeneous homography matrix H = h1 h2 h3 , with hi the i-th column vector of H, and the vector b. hi T Bhj = vij T b with vij = [ hi1 hj1 , hi1 hj2 + hi2 hj1 , hi2 hj2 , hi3 hj1 + hi1 hj3 , hi3 hj2 + hi2 hj3 , hi3 hj3 ]T

(3.37) (3.38)

CHAPTER 3. CAMERA CALIBRATION METHODS

29

Based on these equations, a new overdetermined system V b = 0 can be written in a similar way as previously done in the Direct Linear Transformation algorithm, but with elements defined by vij . This time V must provide information for 5 degrees of freedom: αx , αy , s, cx and cy which define the intrinsic camera parameters. One homography matrix H provides only 2 equations: ñ ô v12 T b=0 (3.39) (v11 − v22 )T which means that at least 3 homographies are required to compute all the intrinsic parameters. Those are required not to be parallel to the plane where world points lie or else they don’t provide any additional information. The more homographies are used, the more equations are added to V and better results are achieved. With M homographies V becomes a 2M × 6 matrix which can be calculated by replicating the rows in equation (3.39) with the values from each homography. In case that not enough homographies can be provided, additional constraints can be added to V . The easiest constraint, also proposed by Zhang [9], is to force the skew factor s = 0. Under this assumption more constraints can be added to the system. For example, the camera model proposed in chapter 2 assumes αx = αy , which is true in our project calibration context. Including this as a constraint also allows to force the optical center, also called principal point, to be the same as the image center. Let w, h be respectively the width and height of the camera image, these constraints can be defined in V by the following equations: î

ó

−s =0 αx 2 î ó s=0 1 1 1 0 −1 0 0 0 b = − 2 =0 2 αx αy î ó 1 2 cx s=0 0 0 1 w2 0 0 b = − =0 αx =±αy α2 w α2 î ó s=0 1 2 cy 0 0 1 0 h2 0 b = − =0 2 αy h αy 2 0 1 0 0 0 0 b=

αx 6=0

=⇒ s = 0

αx ,αy 6=0

=⇒

αx = ±αy

w 2 h αy ,h6=0 =⇒ cy = 2 α,w6=0

=⇒ cx =

(3.40) (3.41) (3.42) (3.43)

Note that if only the two first equations are applied, then αx and αy are not constrained to be the exactly the same nor to be positive. However, only positive focal lengths will be considered. This calibration context constraint is also present in the non-coplanar case. See section 3.2.4 for more details.

CHAPTER 3. CAMERA CALIBRATION METHODS

30

These constraints together provide enough information to solve the system even in the case of only one homography matrix. However, this is strongly not recommended as it introduces notable errors in the results. Detailed information about the error on b as the number of planes M increases can be found in the results section of [9]. Once enough constraints are provided, vector b can be calculated from V using the Singular Value Decomposition in the same way as done in the Direct Linear Transformation algorithm. Details of the procedure can be found in section 3.1.1. All the intrinsic camera parameters can be obtained from the elements of b: b2 b4 − b1 b5 b1 b3 − b2 2 b4 2 + cy (b2 b4 − b1 b5 ) µ = b6 − b1 » αx = + |µ/b1 | cy =

αy =

Ã µb1 + b1 b3 − b2 2

dpx αx + 2dpy αy 3 2 b2 αx s=− µ b4 αx 2 cx = scy − µ f=

(3.44) (3.45) (3.46) (3.47) (3.48) (3.49) (3.50)

In these equations, µ 6= 0 is the real scaling factor applied to the homogeneous symmetric matrix B defined implicitly by the vector b. The focal length f is calculated from the αx , αy and skew parameters as a mean value using the world unit to pixel conversion ratios. As the skew should be always very small in any real case, taking positive roots in the calculation of αx and αy ensures in practice a positive focal length without requiring to perform any additional sign adjustments like in the non-coplanar case. Another possibility to compute the intrinsic parameters from only two planes is applying a known pure translation. The direction of the translation and its magnitude can be used to build constraints to calculate K directly. This method has not been implemented here to avoid precision problems derived from the translation measurement, but the details can be found in appendix D of [9].

CHAPTER 3. CAMERA CALIBRATION METHODS

3.3.3

31

Estimation of extrinsic parameters

Once the intrinsic camera parameters have been obtained, it is possible to calculate the extrinsic parameters from an homogeneous homography matrix H and the matrix K −1 . This latter matrix can be built directly from the intrinsic parameters in the same way as described in section 3.2.5 for the non-coplanar case. Let hi be the i-th column of H, the elements defining extrinsic parameters from equations (3.33) and (3.34) can now be calculated: r1 r2 r3 T

= λK −1 h1 = λK −1 h2 = r1 × r2 = λK −1 h3

(3.51) (3.52) (3.53) (3.54)

with λ = 1/kK −1 h1 k2 = 1/kK −1 h2 k2 as the 2-norm of the column vectors from any rotation matrix is always 1. However note that this leaves ±λ as valid solutions to the constraint. For convenience the positive value is taken, with special considerations that will be applied later. Another problem to consider in the previous equation is the î noise inó the data. This noise can easily make that the matrix R = r1 r2 r3 does not satisfy the properties of a rotation matrix, like orthogonality or its determinant being equal to 1. To avoid this situation [9] proposes a ˜ which ensures its method to refine the matrix R into an approximate R T expected rotation matrix properties. Let U DV = R be the Singular Value Decomposition of R, ˜ = UV T R (3.55) ˜ the camera position can be calcuWith this corrected rotation matrix R lated as usual using the equation (2.6). However it is possible that negative z positions are obtained at this point. In our calibration environment this is simply wrong as positive z axis points out from the calibration target to the camera. This sign issue is caused by the ±λ solutions of the homogeneous scaling factor. A very simple way to solve this is, in case of negative z positions, use −λ as homogeneous scaling factor and recalculate: î

−λK −1 H = λ −r1 −r2 −T

ó

(3.56)

CHAPTER 3. CAMERA CALIBRATION METHODS Ö î

C 0 = − −r1 −r2 r3 î

ó

óT

−T =

Cx Cy −Cz

32

è

˜ r1 −˜ r2 r˜3 with r˜i the i-th column of R R˜0 = −˜

(3.57) (3.58)

As a consequence of the negation, the z component of the camera position becomes positive and the rotation angles change. Once these final corrections are performed the extrinsic parameters can be obtained. As in the non-coplanar case, any desired 3D rotation encoding ˜ or to R˜0 if λ negation was used. Camera can be applied to the matrix R 0 position is either C or C depending also on the λ sign correction.

3.3.4

Overall camera initialization for coplanar sets

In the case of coplanar point sets, the initialization of intrinsic and extrinsic camera parameters is performed separately and with different input arguments. Multiple point sets are required to calculate a good approximation of the intrinsic parameters as they are common to all world to image transformations. On the other hand extrinsic parameters depend on the point set used, consequently requiring only one of them. Given M coplanar point sets of Ni , 1 ≤ i ≤ M elements, the process to initialize the linear intrinsic camera parameters is as follows. Algorithm 2 - Initialization of intrinsic parameters for M coplanar sets. 1: Calculate the homogeneous homography matrix Hi of each i-th point set applying the 2D-homogeneous to 2D-homogeneous case of the Normalized Direct Linear Transformation algorithm described in section 3.3.1. 2: Use the M homogeneous homography matrices Hi to build an overdetermined matrix V of size 2M × 6 as described by (3.38) and (3.39). 3: If M < 3 add to V constraints for skew factor s = 0 and scaling factors αx = αy as defined by equations (3.40) and (3.41). 4: If M < 2 add to V constraints for setting the optical center or principal point to the image center, as defined by equations (3.42) and (3.43). 5: Use Singular Value Decomposition of V to calculate the vector b as done in the Direct Linear Transformation algorithm. 6: Calculate intrinsic parameters from b applying equations (3.44 - 3.50). 7: Build the camera-to-image matrix CI = K and its inverse CI −1 using equations (3.30) and (3.32).

CHAPTER 3. CAMERA CALIBRATION METHODS

33

Once the intrinsic parameters have been computed, given a coplanar point set, the initialization of the extrinsic camera parameters is as follows. Algorithm 3 - Initialization of extrinsic parameters for a coplanar point set. 1: Calculate the homogeneous homography matrix H of the coplanar point set applying the 2D-homogeneous to 2D-homogeneous case of the Normalized Direct Linear Transformation algorithm described in section 3.3.1. 2: Calculate matrix R = [r1 , r2 , r3 ] and translation vector T applying equations (3.51 - 3.54). ˜ using Singular Value Decompo3: Refine matrix R into rotation matrix R sition and equation (3.55). ˜ to equation (2.6). 4: Calculate camera position C applying R ˜ with equations (3.57 - 3.58). 5: If Cz < 0, correct sign updating C and R ˜ as a 3D rotation in some way (x-y-z Euler angles for example). 6: Encode R 7: Build world-to-camera matrix W C as expressed in equation (2.7) but ˜ using corrected rotation matrix R. Like in the non-coplanar case, radial distortion factor κ is left to 0 as it cannot be initialized by these algorithms due to its non-linear nature.

3.4

Coordinate transformations

Another important operation that should be provided by a calibration algorithm is the transformation of point coordinates from the world system to the image projected on the camera and vice versa. They are also a vital piece for calibration error metrics such as the proposed later in section 3.5.1. Other possible transformations to consider are the application or removal of radial distortion from image point sets. These will be useful for the generation of undistorted camera models in section 3.7. The aim of this section is to cover the different coordinate transformations that may arise in a camerarelated application. Note that all operations in this section are based directly on the camera model parameters and the matrices W C, CI and CI −1 generated from them. Calibrating using coplanar or non-coplanar points make no difference at this point, neither it does in subsequent sections unless explicitly stated.

CHAPTER 3. CAMERA CALIBRATION METHODS

3.4.1

34

World to image coordinate transformation

The conversion from world coordinates to image coordinates is quite simple since it’s the main transformation on which different equations until now have been based. It summarizes in the combination of three consecutive transformations: world to camera, application of radial distortion and camera to image. In a more formal and detailed way, the world-to-image transformation is defined by the following algorithm. Algorithm 4 - Transformation from world to image coordinates. 1: Let pw be the homogeneous world point to be transformed, W C the world-to-camera matrix, CI the camera-to-image matrix and κ the radial diffusion factor. 2: Calculate pˆu = W C · pw and project to x-y plane using pu = pˆu /pˆu z with pu in undistorted camera coordinates. 3: Apply radial distortion to pu and to get the distorted point pd solving the polynomial described by equation (2.12). 4: Transform pd to image coordinates applying pˆi = CI · pd and normalize with pi = pˆi /pˆi z . Then pi contains the desired image coordinates. It is important to note that any world point in the camera plane will produce a division by zero when projecting to 2D camera coordinates. This is mathematically correct and should be checked by the user to avoid infinite and Not a Number values in their results. This should be taken into account if trying to project the camera position to image coordinates expecting to get the optical center of the camera.

3.4.2

Image to world coordinate transformation

Transformation from image coordinates to world coordinates is more tricky than its counterpart described in previous section. Just taking a simple look to the geometrical problem leaves with infinite world points for a single image point. In fact for this transformation to be performed an extra parameter is required: the depth component of the resulting world point. An algebraic approach to the problem renders exactly the same problem. The W C matrix is a 3 × 4 non-square matrix with no inverse, a pseudoinverse at most. The solution proposed for this problem is based in this last approach of the problem and avoids the complicated scalar formulas proposed

CHAPTER 3. CAMERA CALIBRATION METHODS

35

by Tsai algorithm’s implementation [8] and their potential numerical stability problems. Algorithm 5 - Transformation from image to world coordinates. 1: Let pi be the homogeneous image point to be transformed, z the world depth coordinate, W C the world-to-camera matrix, CI the camera-toimage matrix and κ the radial diffusion factor. 2: Convert pi to distorted camera coordinates pd applying pˆd = CI −1 pi and normalizing with pd = pˆd /pˆd z . 3: Remove radial distortion from pd into pu using equation (2.11). 4: Build a new world-to-camera matrix Wz made of the two first columns of W C and with W C4 + z W C3 as third column (Ai as the i-th column). 5: Calculate pw world coordinates by solving the system Wz pˆw = pu and then normalizing with pw = pˆw /pˆw z . 6: Desired world coordinates are provided by the point (pw x , pw y , z). Similarly to what already happened with world-to-image transformation in section 3.4.1, trying to transform from image to world coordinates located in the camera plane will cause numeric problems. In this case the matrix W Cz will become a singular matrix causing divisions by zero or invalid results. Again, this should be taken into account by the user when projecting to world coordinates. A case of this problem would be to project the optical center of the image using the camera position world depth trying to obtain camera position coordinates.

3.4.3

Application and removal of radial distortion in images

One interesting possibility to improve both world-to-image and image-toworld transformations is to eliminate the radial distortion from the process. This can be done using precalculated distortion maps which can be also used as cache to generate undistorted camera images. Another interesting option derived from this is to remove the distortion from the camera model and recalibrate using an undistorted training set. These possibilities will be explored in sections 3.7 and 3.7.2. Regardless of its applications, the aim of this section is to describe the transformations required to remove or apply radial distortion to a given image point into another. Let pd and pu be respectively distorted and undistorted homogeneous image points, both require to be converted to camera sensor

CHAPTER 3. CAMERA CALIBRATION METHODS

36

coordinates to remove the radial distortion using the equation (2.11). After that, coordinates require to be converted back to the image system so they can be used. The following algorithms describe the distortion and undistortion processes in a more formal way. Algorithm 6 - Radial distortion removal from image coordinates. 1: Let pd be the homogeneous distorted image point to be transformed. 2: Convert pd to distorted camera coordinates cd applying cˆd = CI · pd and normalizing with cd = cˆd /cˆdz . 3: Remove radial distortion from cd into cu using equation (2.11). 4: Calculate pu undistorted image coordinates applying pˆu = CI −1 cu and then normalizing with pu = pˆu /pˆu z .

Algorithm 7 - Radial distortion application to image coordinates. 1: Let pu be the homogeneous undistorted image point to be transformed. 2: Convert pu to undistorted camera coordinates cu applying cˆu = CI · pu and normalizing with cu = cˆu /cˆuz . 3: Apply radial distortion to cu into cd using equation (2.12). 4: Calculate pd distorted image coordinates applying pˆd = CI −1 cd and then normalizing with pd = pˆd /pˆd z .

3.5

Non-linear optimization process

Algorithms 1 for non-coplanar point sets, and 2, 3 for coplanar ones provide a good initialization of camera parameters and transformations. In fact, the parameters obtained with these algorithms should be the optimal to minimize the world-to-image distance in a linear undistorted environment. However as seen in section 2.3 the real world-to-image transformation is not linear at all since camera optics introduce radial distortions. Calibration of non-linear parameters like radial distortion factors require the use of non-linear optimization techniques, such as non-linear least square methods. These methods optimize a set of free variables using Jacobianbased calculations to minimize the 2-norm of a given multivariate function.

CHAPTER 3. CAMERA CALIBRATION METHODS

37

In this implementation the Levenberg-Marquardt algorithm is used. This algorithm performs an interpolation between the Gauss-Newton algorithm and the Gradient Descent method, providing more robustness than any of the two at the cost of usually having a slower convergence. The LevenbergMarquardt algorithm is implemented by many numerical libraries such as MINPACK (in FORTRAN), the GNU Scientific Library, liboctave or some Java packages. For example, the GNU Scientific Library internally acts as an interface to the original MINPACK implementation of the algorithm. LevenbergMarquardt is available in the GNU Scientific Library as the multifit derivative solver of type gsl multifit fdfsolver lmsder for non-linear least squares minimization. Using Levenberg-Marquardt non-linear optimization method requires providing a target multivariate function to minimize and a second function to calculate the derivatives (the Jacobian matrix) of the target function. These points will be both covered in the next sections. Also, a stop criterion is required for the minimization process. This implementation uses a simple tolerance-based criteria with absolute and relative thresholds. Other methods based on gradient norms can also be employed, but were not used in the implementation of the calibration algorithm.

3.5.1

Error metrics

A target function for non-linear minimization is a function that uses as parameters the available variables and point sets to produce some kind of error measurement which should be minimized. This error comes in the form of a vector defined by the user. Consequently, an adequate error metric should be applied. One simple error metric is the world-to-image error. Calculating it is quite simple: just transform the world point set into image coordinates and create an output vector with the differences of the real image points and the transformed ones. In an undistorted environment the Normalized Direct Linear Transformation method from section 3.1.2 should provide the optimal calibration for a given set.

CHAPTER 3. CAMERA CALIBRATION METHODS

38

Algorithm 8 - World-to-image error metric. 1: Let W be the world training set and I the image training set, both in homogeneous coordinates. 2: Transform points Wi ∈ W into image coordinates Iˆi using algorithm 4. 3: Calculate the distance between image points Ei = Ii − Iˆi (error in pixels). 4: Errors Ei can be grouped together as required by the implementation. The mean of

Ei

gives a metric of the global calibration error. 2

Through simple and intuitive, this metric shows to be a very good approximation. Another obvious possibility is to use its inverse transformation: transform image coordinates to world coordinates using original depth values. The algorithm is quite straightforward. Algorithm 9 - Image-to-world error metric. 1: Let W be the world training set and I the image training set, both in homogeneous coordinates. ˆ i using algorithm 2: Transform image points Ii ∈ I into world coordinates W 5 with original Wiz world depth coordinate. ˆ i (error in world 3: Calculate the distance between world points Ei = Wi − W units). 4: Errors Ei can be grouped together as required by the implementation. The mean of

Ei

gives a metric of the global calibration error. 2

Some experimentation shows that world-to-image metric is a better calibration metric than the image-to-world one, maybe because of the different scales in error units. However both world-to-image and image-to-work metrics can be combined into a single error metric as long as error values have the same units. This technique will double the number of point error samples. Unit conversion can be easily performed with the dp ratios provided by the camera manufacturer, previously considered in section 2.1. This combined metric showed to be a good option as well. Inverse transformation was also considered as an option for error metrics. This implies transforming points from a set to the other, applying the inverse transformation and comparing the results with the original value. This metric comes in two version depending on which point set is used as the initial one. Also, the two versions can be combined in the same way as world-to-image and image-to-world were. The following algorithm describes this composed metric.

CHAPTER 3. CAMERA CALIBRATION METHODS

39

Algorithm 10 - Inverse transformation error metric. 1: Let W be the world training set and I the image training set, both in homogeneous coordinates. 2: Transform points Wi ∈ W into image coordinates Iˆi using algorithm 4. ˆ i using algorithm 3: Transform image points Ii ∈ I into world coordinates W 5 with original world Wiz depth coordinate. ˆ i (error in world 4: Calculate the distance between world points Ei = Wi − W units). 5: Calculate the distance between image points Ei+N = (Ii − Iˆi ) · dp (error in world units). 6: Errors Ei can be grouped together as required by the implementation. The mean of

Ei

gives a metric of the global calibration error. 2

However experiments with inverse error produced very poor calibration results. Inverse transformation error seemed to be always near the finiteprecision zero in any of the two versions. This led to the creation of a new error metric based on the results obtained with previous algorithms. This proposed error metric is called reciprocal error and tries to combine the worldto-image, image-to-world and inverse transformation metrics in a single one. Algorithm 11 - Reciprocal error metric. 1: Let W be the world training set and I the image training set, both in homogeneous coordinates. 2: Transform points Wi ∈ W into image coordinates Iˆi using algorithm 4. ˆ i using algorithm 3: Transform image points Ii ∈ I into world coordinates W 5 with original Wiz world depth coordinate. ˆ i into image coordinates I˜i using algorithm 4. 4: Transform back W 5: Calculate the distance between projected and inverse points Ei = I˜i − Iˆi (error in pixels). 6: Errors Ei can be grouped together as required by the implementation. The mean of

Ei

gives a metric of the global calibration error. 2

Results show that reciprocal error has very good results, quite similar to world-to-image metric or even better. In fact, since inverse transformation usually reduces to finite-precision zero values this metric tends to be equivalent to the world-to-image error. It also groups all previous ideas into a single metric which produces one error measurement per point of the training set. For these reasons this method was adopted as error metric and will be used in all posterior calibration and minimization procedures described here.

CHAPTER 3. CAMERA CALIBRATION METHODS

3.5.2

40

Numerical estimation of the Jacobian matrix

As stated in the beginning of section 3.5 the Levenberg-Marquardt non-linear minimization algorithm is used to calibrate the camera model including nonlinear distortion parameters. Since this method is based in the Gauss-Newton and Gradient Descent algorithms it requires the derivative of the function being minimized to work. In this case, as the target function is a multivariate function its derivative is defined by a m × n Jacobian matrix with m the number the size of the function output as a vector and n the number of parameters passed to it. This Jacobian matrix is composed of m × n partial derivatives for a given set of function parameters. Since the camera model is non-lineal and involves many matrix calculations it can be quite complex to compute analytical derivatives for the Jacobian matrix. If model is reduced to an undistorted camera then an analytical approach is feasible. However, it would require to recalculate the derivatives each time the target function is modified. Because of this, to allow the calibration of only a subset of parameters and independently of the target function or the metrics used a numerical-based approach was implemented. The implemented numerical calculation of the Jacobian matrix is a simple one, through it showed to be enough for this problem and the LevenbergMarquardt algorithm. The calculation of each partial derivative value is based in the original definition of the derivatives using limits, but centered around the point to reduce numerical artifacts: f (x + ) − f (x) f (x + ) − f (x − ) ∂f (x) = lim ' →0 ∂x 2

(3.59)

The simple numerical approach used is to take a very small and constant and apply the formula above as if it were not a limit. This calculation has to be repeated for each pair of input parameters and output values of the target function producing this way the desired Jacobian matrix. The precision of this method depends on the used. The closer to zero, the more precise it should be. However because of the finite precision using too small values can cause numerical instabilities and errors.

CHAPTER 3. CAMERA CALIBRATION METHODS

41

Algorithm 12 - Numerical calculation of the Jacobian matrix. 1: Define a constant value near to 0. For example, = 10−6 . 2: Let J be an m × n matrix. 3: for each i-th input parameter do 4: Define two new input parameter vectors xi− = x − i and xi+ = x + i with i a n-sized vector of zeros excepting in the i-th position. 5: Evaluate twice the target function yi − = f (xi− ), yi + = f (xi+ ) with each yi a m-sized output vector. 6: Set Ji = (yi + − yi − )/(2 ) where Ji is the i-th column of J. 7: end for 8: At this point J contains the numerical approximation to the Jacobian matrix of f for the input parameters x. As can be seen, this method performs a total of 2n evaluations of the target function, each one providing m output values. This causes the calculation of the Jacobian matrix to be computationally linear with the number of input parameters n and with the cost of the target function f . More accurate and complex methods exist for the numerical approximation the Jacobian matrix, for example using 5 points to approximate the function derivative. These are usually used in combination with NewtonRaphson-based root-finding methods. However the proposed basic method seems to be enough for the Levenberg-Marquardt algorithm applied to the camera calibration problem and its required precision.

3.6

Overall calibration process

Merging the contents from previous sections now it’s possible to define an overall camera calibration process. First of all it should be decided if global or separate camera parameter calibration is desired. Global calibration will estimate both intrinsic and extrinsic parameters at the same time. In this case two different versions were implemented for testing purposes, distinguished by the set of free variables used in non-linear optimization. The first one is a transformation-oriented calibration based on the optimization of the global transformation P . The second one is directly based on the camera model parameters described in chapter 2. On the other hand the separate calibration allows to precalculate the constant intrinsic parameters independently. This brings the possibility of using different calibration targets for intrinsic parameters which ensure that

CHAPTER 3. CAMERA CALIBRATION METHODS

42

all the image area is covered, and then a separate and more adequate target to calibrate extrinsic parameters. It also allows more complex calibration situations like using multiple point sets to calibrate the intrinsic parameters of the camera in non-coplanar point sets, as it’s the usual case for coplanar ones. All four cases, the two global calibration methods and the separate intrinsic and extrinsic parameter calibrations, will be analyzed in detail in the following sections.

3.6.1

Transformation-oriented global calibration

This global calibration method gives a matrix-based view of the calibration process. It has a total of 13 variables, 12 from the 3× 4 matrix P plus one for the radial distortion factor κ. However, not all the 12 variables from P are linearly independent which causes redundancy in the minimization process. Despite of this, testing has shown that extra degrees of freedom can help to achieve slightly lower error values than model-oriented approaches. Also this method avoids recomputing camera parameters from transformation matrices until the non-linear minimization process has finished. This method should be used when precise coordinate transformations are desired, not caring too much about the correctness of the camera model or the precision of its parameters. Algorithm 13 - Transformation-oriented camera calibration. 1: Initialize camera parameters using algorithm 1 for non-coplanar point sets or algorithms 2 and 3 for coplanar ones to obtain P0 = CI 0 × W C 0 . 2: Set the 12 elements of the 3×4 matrix P0 plus the radial distortion factor κ0 = 0 as free variables vector x0 for non-linear optimization. 3: for each iteration of the optimization target function with params xi do 4: Extract transformation matrix Pi and factor κi from xi . 5: Decompose Pi into matrices W Ci , CIi and CIi −1 using equations (3.13) and (3.32). No orientation-preserving corrections are required. 6: Apply the selected error metric to the training set (algorithms 8-11). 7: Use the point set errors Ei as the minimization iteration error. 8: end for 9: Extract transformation matrix P and factor κ from tuned variables vector x returned by the non-linear minimization. 10: Calculate W C, CI, CI −1 and camera model parameters from P performing steps 3 to 10 of algorithm 1, even if coplanar sets were used.

CHAPTER 3. CAMERA CALIBRATION METHODS

3.6.2

43

Model-oriented global calibration

As opposed to the transformation-oriented calibration, this method provides a view of the problem completely based in the camera model parameters instead of its transformations. This method has a total of 11 degrees of freedom: 4 intrinsic + 6 extrinsic parameters plus the radial distortion factor, two less than the transformation-oriented one where all elements of P are considered free variables. It’s also a bit slower and sometimes error values are slightly higher but it serves as base for the separate calibration methods described in the next sections. Algorithm 14 - Model-oriented camera calibration. 1: Initialize camera parameters using algorithm 1 for non-coplanar point sets or algorithms 2 and 3 for coplanar ones to obtain initial intrinsic parameters f0 , s0 , cx0 , cy 0 , camera position C0 and camera rotation r0 . 2: Set both intrinsic and extrinsic parameters from previous step plus the radial distortion factor κ0 = 0 as the free variables vector x0 for nonlinear optimization. 3: for each iteration of the optimization target function with params xi do 4: Extract camera parameters and the radial distortion factor κi from xi . 5: Build camera-to-image matrix CIi and its inverse CIi −1 using equations (2.3) and (3.32). 6: Encode ri into rotation matrix Ri and build world-to-camera matrix W Ci using equation (2.7). 7: Apply the selected error metric to the training set (algorithms 8-11). 8: Use the point set errors Ei as the minimization iteration error. 9: end for 10: Extract camera parameters and the radial distortion factor κ from tuned variables vector x returned by the non-linear minimization. 11: Build the matrices CI, CI −1 and W C from the final camera parameters applying equations (2.3), (3.32) and (2.7) as done in steps 5-6.

3.6.3

Calibration of extrinsic parameters

From separate calibration methods the simplest one is the calibration of extrinsic parameters. The method is almost the same as the global modeloriented calibration but using only extrinsic parameters as free variables in the optimization process. Intrinsic parameters are assumed to be precalculated and constant. As a consequence the matrices CI and CI −1 don’t need to be recomputed in each iteration, but only the matrix W C.

CHAPTER 3. CAMERA CALIBRATION METHODS

44

Algorithm 15 - Extrinsic parameters calibration. 1: Initialize camera parameters using algorithm 1 for non-coplanar point sets or algorithm 3 for coplanar ones to obtain initial values for camera position C0 and camera rotation r0 . 2: Set C0 , r0 as the free variables vector x0 for non-linear optimization. 3: for each iteration of the optimization target function with params xi do 4: Extract camera position Ci and rotation ri from xi . 5: Encode ri into rotation matrix Ri and build world-to-camera matrix W Ci using equation (2.7). 6: Apply the selected error metric to the training set (algorithms 8-11). 7: Use the point set errors Ei as the minimization iteration error. 8: end for 9: Extract camera position C and rotation r from tuned variables vector x returned by the non-linear minimization. 10: Build matrices R and W C from the final camera parameters applying equation (2.7) as done in step 5. Do not confuse rotation vector r with 3 degrees of freedom encoding a 3D rotation with its corresponding rotation matrix R. There is a direct relation between them, but matrix R is not used as part of the variables to optimize. Encoding R in r is left to the developer as there are many options. Any of them is fine as long as it has 3 degrees of freedom and preserves orientation. In our implementation, Euler x-y-z angles are used.

3.6.4

Estimation of intrinsic parameters using multiple point sets

The requirement of using multiple coplanar point sets to calibrate intrinsic parameters can be extended to the non-coplanar case, not as a constraint but as an upgrade. Since intrinsic parameters are assumed to be constant for a given camera they can be considered to stay the same on different consecutive images. This allows the use of multiple calibration points sets, even with different target types, to calculate more accurately the intrinsic parameters of the camera. The idea behind multiple point set calibration is to calibrate at the same time one set of intrinsic parameters but n sets of extrinsic ones. This leaves with a total of 4 + 6n + 1 free variables to optimize during the minimization process: 4 for intrinsic parameters, 6 for each extrinsic parameters set and one for radial distortion.

CHAPTER 3. CAMERA CALIBRATION METHODS

45

Also, the initialization method proposed by algorithm 1 for the noncoplanar case requires to be adapted to this new situation as it provides initialization values for both intrinsic and extrinsic camera parameters for each point set. This is not the case of the coplanar case where algorithm 2 initializes only the intrinsic parameters shared by given multiple point sets. Algorithm 16 - Intrinsic parameters calibration with multiple point sets. 1: if input point sets are non-coplanar then 2: Initialize camera parameters using algorithm 1 on each j-th point set to obtain fˆ0,j , sˆ0,j , cˆx0,j , cˆy 0,j and C0,j , r0,j . 3: Generate a unique set of intrinsic parameters f0 , s0 , cx0 , cy 0 from the M sets of intrinsic parameters using an adequate criteria. For example the mean values. 4: else if input point sets are coplanar then 5: Use algorithm 2 with the M input point sets to calculate f0 , s0 , cx0 , cy 0 . 6: Calculate C0,j , r0,j for each j-th point set applying algorithm 3. 7: end if 8: Set the intrinsic parameters f0 , s0 , cx0 , cy 0 , the M extrinsic parameters C0,j , r0,j with 1 ≤ j ≤ M and the radial distortion factor κ0 = 0 as free variables vector x0 for non-linear optimization. 9: for each iteration of the optimization target function with params xi do 10: Extract intrinsic parameters fi , si , cxi , cy i , all n extrinsic parameters Ci,j , ri,j and the radial distortion factor κi from xi . 11: Build matrix CIi and its inverse CIi −1 using equations (2.3), (3.32). 12: for j = 1 to M do 13: Encode ri,j into Ri,j and build W Ci,j using equation (2.7). 14: Apply the selected error metric to the training set (algorithms 8-11) using matrices CIi , CIi − 1, W Ci,j and factor κi . 15: Include the point set errors Ei in the minimization iteration error. 16: end for 17: end for 18: Extract the tuned intrinsic parameters f, s, cx , cy and the radial distortion factor κ from the returned vector x. Extrinsic parameters of each point set are ignored. −1 19: Build camera-to-image matrix CI and its inverse CI using equations (2.3) and (3.32) as done in step 11. There are many solutions to this problem. The simplest one could be to select the first set of intrinsic parameters or even a random one from the M provided. Another solution can be to use the mean value of each intrinsic

CHAPTER 3. CAMERA CALIBRATION METHODS

46

parameter in the set. Using median instead of mean can be a good idea in case of outliers in the obtained values. More complex solutions could apply the mean of a subset of values discarding outliers based on their distance to the global mean and the standard deviation. The implementation proposed here, for the sake of simplicity and the usual low number of calibration point sets, just takes the mean of each intrinsic parameter for initialization. All the details of the process can be found in algorithm 16.

3.7

Undistorted camera model

An interesting possibility to explore is the removal of the radial distortion from the calibration problem. Theoretically, once intrinsic parameters have been correctly calibrated it is possible to recalibrate them assuming a null κ factor while undistorting the input image point set using algorithm 6. The resulting model requires all input image points to be also undistorted but in exchange it transforms the calibration into a linear problem where the normalized Direct Linear Transformation method from section 3.1.2 returns the optimal solution for the world-to-image error metric described in algorithm 8. Another consequence of removing camera distortion and simplifying to a linear problem is that the world-to-image transformation can be fully expressed by the global transformation matrix P avoiding intermediate camera coordinates. This implies a speed-up in coordinate transformation as the number of operations is notably reduced. To correctly integrate the undistorted cameras into the camera model, all the intrinsic camera parameters are duplicated: one set for the original distorted camera and another set for the undistorted one. This is done to allow distortion removal in image points in any moment. Also, from the moment the camera undistortion is performed the original intrinsic parameters are not modified in any way. With the exception of specific distortion-related operations, all later calculations and calibrations are computed using the new undistorted set and ignore the radial distortion factor κ which is set to zero. Neither distortion removal nor any of the algorithms exposed in this section are affected by the use of coplanar or non-coplanar sets, as the modifications performed to the calibration algorithms are limited to the image points, not to the world ones.

CHAPTER 3. CAMERA CALIBRATION METHODS

3.7.1

47

Distortion removal

Let distorted be the already calibrated original intrinsic parameters and undistorted the new undistorted intrinsic parameters, the distortion removal algorithm proceeds as follows. Algorithm 17 - Camera distortion removal. 1: Copy the contents of distorted to undistorted, but set κ to 0 in the latter. 2: Set undistorted as the effective intrinsic parameters. From now on, when recalibrating all input image points must be previously undistorted applying algorithm 6 and the radial distortion factor κ must be ignored. 3: Recalibrate intrinsic parameters using one of the algorithms 13, 14 or 16. 4: Compute global transformation matrix P = CI × W C. Any posterior recalibration of the undistorted intrinsic or extrinsic parameters should also recompute matrix P as defined here. 5: From this point, replace algorithms 18 and 19 with 4 and 5 for coordinate transformation, supplying undistorted image points (see section 3.7.2). As stated in step 5 of algorithm 17 new optimized algorithms can be used for coordinate transformation in undistorted cameras. These algorithms avoid all intermediate camera coordinate conversions and all distortion-related operations. Algorithm 18 - Undistorted world to image transformation. 1: Let pw be the homogeneous world point to be transformed and P the global transformation matrix. 2: Calculate pˆi = P · pw and project to image plane with pi = pˆi /pˆ i uz . Point pi contains the desired image coordinates. Algorithm 19 - Undistorted image to world transformation. 1: Let pi be the homogeneous image point to be transformed, z the world depth coordinate and P the global transformation matrix. 2: Build a new global transformation matrix Pz made of the two first columns of P and with P4 +z P3 as third column (Ai as the i-th column). 3: Calculate pw world coordinates by solving the system Pz pˆw = pi and then normalizing with pw = pˆw /pˆw z . 4: Desired world coordinates are provided by the point (pw x , pw y , z). Keep in mind that, as in the original algorithms 4 and 5, numerical instabilities can arise and produce divisions by zero when trying to transform points from or to the camera plane location in the world coordinate system.

CHAPTER 3. CAMERA CALIBRATION METHODS

3.7.2

48

Generation of undistorted images

Using algorithms 18 and 19 in undistorted cameras requires also undistorted input points. A possible approach would be to remove distortion manually from each point applying algorithm 6 with the distorted original intrinsic parameters but there are better solutions. A better approach for the context where this problem applies would be to directly undistort the camera images as soon as the camera model is undistorted using algorithm 17. This can be done by applying the inverse transformation provided by algorithm 7 to each point of the new destination image, solving subpixel precision with bilinear interpolation. Algorithm 20 - Camera image undistortion. 1: Let Id be the distorted image and distorted the original distorted camera parameters. 2: Create an output undistorted image Iu with the same dimensions as Id . 3: for each pixel pu in Iu do 4: Calculate distorted pixel pd applying algorithm 7 to pu with distorted intrinsic parameters. 5: Check if pd is inside image boundaries. In case it’s not, set a background color and continue with next pixel. 6: Calculate the color of pd from Id using subpixel techniques such as bilinear interpolation. Set this color in Iu at coordinates pu . 7: end for This approach allows to solve the distortion problems from input problems independently to the number of coordinate transformations performed afterwards. However if multiple consecutive shots are to be taken with the same camera a better option can still be considered. Since intrinsic parameters should not change between shots, the coordinates obtained by algorithm 7 will remain the same, through colors will change as images do. Because of this, a distortion map can be precalculated. This distortion map basically stores the undistorted-to-distorted coordinate correspondences in the image pixel grid. Also, since it only depends on the distorted intrinsic parameters and not in the undistorted ones, it will remain constant once a camera is undistorted by algorithm 17. To adapt algorithm 20 to the distortion map is quite simple: replace algorithm 7 in step 4 with a look-up in the distortion map.

CHAPTER 3. CAMERA CALIBRATION METHODS

49

Algorithm 21 - Generation of the camera distortion map. 1: Allocate a 2-dimensional array M of 2D floating point coordinates with the same dimensions as the camera images. 2: for each position pu = (i, j) in M do 3: Calculate distorted coordinates pd applying algorithm 7 to pu with distorted intrinsic parameters. 4: Store pd into Mi,j . 5: end for 6: M is the desired distortion map. The distortion map should be updated in non-undistorted cameras after running any of the algorithms 13, 14 or 16 as they update the current set of intrinsic parameters. After distortion removal no more updates are needed.

3.7.3

Undistorted composite calibration of intrinsic and extrinsic parameters

Using the methods from previous sections, it is now possible to define an overall calibration algorithm which integrates both undistorted cameras and separate calibration of intrinsic and extrinsic parameters. Algorithm 22 - Undistorted composite camera calibration. 1: Calibrate the intrinsic camera parameters using multiple points sets with algorithm 16. Points should cover all the image area. 2: Remove camera distortion using algorithm 17. 3: Recalibrate intrinsic parameters again with algorithm 16. 4: Calibrate extrinsic parameters only applying algorithm 15. 5: Generate a camera distortion map using algorithm 21. 6: Remove distortion from camera images with algorithm 20 adapted to the use of distortion maps. The last step should be repeated on each camera shot before using image color values based on pixel position. However it should be a very fast calculation since distortion removal is replaced by a look-up in the distortion map. In our calibration context, a coplanar target was used to provide 5 coplanar point sets for intrinsic parameter calibration. Then, after removing distortion and recalibrating, the cameras were moved to their final positions and extrinsic parameters were calibrated using a non-planar target specially designed for all the 16 cameras.

Chapter 4 Evaluation of the proposed calibration methods This section analyzes the different calibration methods proposed: transformation and model-oriented global calibration, separate calibration of intrinsic and extrinsic parameters with multiple point sets and finally undistorted separate calibration. Sony XCD V60 cameras equipped with lens Myutron MV1214, with a focal length of 12 mm. and F number 1.4, were used for testing. These cameras have approximately 330000 effective pixels arranged in a 659 × 494 sensor with dimensions 3.6 × 4.8 mm (1/3 format). This focal length-sensor size combination gives rise to a horizontal field of view angle of approximately 17 degrees, a relatively narrow angle setting showing a low optical distortion. All provided images have a fixed 640×480 resolution. Two point-based targets were used for calibration: a planar target and a special non-planar target designed for the multiple camera environment. Planar target consists a 32 × 24 point grid covering uniformly all areas of the camera image. A total of 5 views of this target were used to calibrate intrinsic camera parameters in the separate calibration cases. An image of the planar target can be found in figure 4.1. The non-planar target is the one used in global calibration algorithms and separate extrinsic calibration. Figure 4.2 shows a side-view of the target. As it’s not completely symmetric the views will be different on each camera. For both targets, the distance to the camera has been set fixed to 50 cm.

50

CHAPTER 4. EVALUATION OF THE PROPOSED CALIBRATION METHODS51

Figure 4.1: Planar calibration target used during testing.

4.1

Calibration results

A total of 6 calibration methods will be compared in this section: a free implementation of Tsai’s algorithm, the transformation-oriented method from section 3.6.1, the model-oriented method from section 3.6.2, a special version of the separate calibration method using algorithm 16 without the non-linear minimization steps and then algorithm 15 as defined, the standard separate calibration with algorithms 16, 15 and finally, the undistorted composite method proposed in section 3.7.3. The first three methods perform a global calibration process making use only of the non-planar target from figure 4.2. The last three methods calibrate intrinsic and extrinsic parameters separately using both calibration targets in a two-step process. All the provided results have been obtained using real cameras in real calibration environments. Tables 4.1 and 4.2 contain the results of a real calibration applying the different proposed methods. Although not exactly known, some approximate values are expected for the calibrated intrinsic parameters. For example, the manufacturer of the

CHAPTER 4. EVALUATION OF THE PROPOSED CALIBRATION METHODS52

Figure 4.2: View of the non-planar calibration target used during testing. lens gives a nominal focal length of 12 mm, but the real focal length depends on the focusing distance and other mechanical factors. The skew value is expected to be close to zero, as it is the radial distortion factor κ, though this factor may be a bit higher. The optical center of the image in the ideal case would be the image center. This is not true in a real case, but the coordinates should also not be far away. Because of the properties of the project hardware, the theoretical camera positions are known. Exact real values are of course not known but calibrated values are expected to be quite near these theoretical values. For example, for the camera used in the calibrations of the following table, the expected theoretical camera position is, approximately, (−346.65, 98.33, 346.65). Rotation depends on the physical installation of the camera into its final position in the hardware. No specific values are expected. Another useful value to evaluate the calibration results is the error metric. In this case, the reciprocal error metric described in algorithm 11 was used. This metric is designed to minimize both the world-to-image and the inverse transformation errors. The reciprocal error values of this example calibration can be found in table 4.3.

CHAPTER 4. EVALUATION OF THE PROPOSED CALIBRATION METHODS53 Calibration method Tsai Transformation-oriented Model-oriented Separate calib. (linear) Separate calibration Undistorted composite

f (mm) skew Opt. center (pix) 15.6241 0.0143 277.76, 222.96 10.7393 -0.0060 490.00, 337.01 10.8142 -0.0089 511.42, 274.43 12.5715 0.0006 334.28, 234.89 13.0176 0.0005 342.91 225.43 13.0368 0.0004 342.98, 222.17

κ 0.0124 0.2917 -0.0180 0 0.1051 0

Table 4.1: Estimation of intrinsic parameters in a real calibration case. Calibration method Tsai Transformation-oriented Model-oriented Separate calib. (linear) Separate calibration Undistorted composite

Cam. Position (mm) -371.23, 115.58, 430.46 -271.50, 77.22, 306.55 -273.11, 87.17, 311.89 -308.07, 88.86, 354.73 -317.72, 91.61, 366.06 -318.21, 91.721, 366.52

Cam. Rotation (rad) -2.861, -0.677, 1.355 -3.050, -0.788, 1.480 -3.043, -0.738, 1.470 -2.916, -0.693, 1.391 -2.924, -0.688, 1.395 -2.924, -0.687, 1.395

Table 4.2: Estimation of extrinsic parameters in a real calibration case.

Results from this table (4.3) show that, in this calibration, the best transformation results under the specified error criterion are achieved with the transformation-oriented method, closely followed by Tsai’s algorithm. However this doesn’t mean that it is the best option, as the reciprocal error favors those methods that focus in minimizing the transformation error. Calibration method Tsai Transformation-oriented Model-oriented Separate calib. (linear) Separate calibration Undistorted composite

Reciprocal error (pixels) 0.651057 0.622307 0.786373 1.076392 1.071369 1.072648

Table 4.3: Reciprocal error results in a real calibration case.

CHAPTER 4. EVALUATION OF THE PROPOSED CALIBRATION METHODS54

4.2

Robustness against input noise

One of the main problems to face in this project is to handle noise in the input data. During the experiments, a synthetic non-planar target and its different views were created using POV-Ray, simulating an ideal pinhole camera. Calibration results for this ideal data are quite different than the ones obtained in the real case: all three global calibration methods provided excellent results. However, this situation is far from the real case. To measure the robustness and stability of the different methods Gaussian random noise has been introduced in all the input image points, leaving its standard deviation as the control parameter. This can model real situations either by deformations of the non-planar target or by precision errors when extracting the image point coordinates previously to calibration. 5

Reciprocal error (pixels)

4

3

2

1

Tsai Transform-oriented Model-oriented Separate (linear) Separate calibration Undistorted

0 0

0.5

1 1.5 2 Gaussian random noise standard deviation (pixels)

2.5

Figure 4.3: Reciprocal error with Gaussian noise in the input data. This plot shows the evolution of the reciprocal error when Gaussian random noise is introduced in the input data. The most notable detail is the

3

CHAPTER 4. EVALUATION OF THE PROPOSED CALIBRATION METHODS55 vertical line corresponding to the Tsai algorithm when applying a Gaussian noise with 2.6 pixels of standard deviation. In all experiments the Tsai’s algorithm implementation becomes completely unstable from this point or a near one. In fact, Tsai’s error becomes quickly too big to be adequately represented in the same plot as other methods. It’s important to point that during the calibration process some numerical problems arose regarding the maximum supported barrel distortion as the noise increased. Introducing even more noise, the implementation eventually fails without returning any value. It can be seen that the global calibration methods optimize the transformation offering a lower error than separate calibration methods, just as seen in table 4.3. All methods but Tsai remain quite stable in the values of the reciprocal error as noise increases, increasing in a mostly linear behaviour. 18

16

Estimated focal length (mm)

14

12

10

8

6

4

Tsai Transform-oriented Model-oriented Separate (linear) Separate calibration Undistorted Manufacturer

2

0 0

0.5

1

1.5

2

2.5

Gaussian random noise standard deviation (pixels)

Figure 4.4: Estimated focal length with Gaussian noise in the input data. This plot shows the evolution of the estimated focal length when introducing random noise in the input data. The results show again a very poor

3

CHAPTER 4. EVALUATION OF THE PROPOSED CALIBRATION METHODS56 noise tolerance and stability in the Tsai’s algorithm implementation. The other two global calibration methods, transformation-oriented and modeloriented, are very likely to underestimate the focal length, but with a bit more stability proportional to the noise levels. On the other hand, the separate calibration methods estimate focal lengths value slightly over the manufacturer value which fits nicely with the expected values. Also, these methods show a strong robustness as introducing higher noise levels do not change the estimated values in an appreciable way. Another detail to note is the small difference between the estimated values for the separated linear method and the ones that include radial distortion in the model. This difference seems to be also present in other figures. 0.4

0.2

Radial distortion factor

0

-0.2

-0.4

-0.6

-0.8 Tsai Transform-oriented Model-oriented Separate (linear) Separate calibration Undistorted

-1

-1.2 0

0.5

1 1.5 2 Gaussian random noise standard deviation (pixels)

2.5

Figure 4.5: Estimated radial distortion factor when applying Gaussian noise to the input data. This plot presents how the estimated radial distortion factors κ of each model evolve in function of the introduced Gaussian noise. Again, Tsai’s

3

CHAPTER 4. EVALUATION OF THE PROPOSED CALIBRATION METHODS57 algorithm implementation becomes unstable near the 2.6 pixel standard deviation of the noise, presenting high negative radial distortion values that leave the equation (2.12) with no real positive solutions. Transformation-oriented method presents some instabilities too. These can be easily consequence of poor calculations of the focal length and the optical center, as this method ignores the camera model details focusing only on minimizing the transformation errors. The model-oriented method compensates these problems performing a better stability. Both separate linear and composite undistorted methods have a null radial distortion factor by definition, the first because it ignores it and the second because it removes it. A full separate calibration method gives a very stable result remaining almost constant, independently of the noise applied.

Tsai Transform-oriented Model-oriented Separate (linear) Separate calibration Undistorted Image center

450

Y coordinate of the optical center (pixels)

400 350 300 250 200 150 100 50 0 0

100

200

300

400

500

600

X coordinate of the optical center (pixels)

Figure 4.6: Evolution of optical center coordinates when applying Gaussian noise to the input data.

CHAPTER 4. EVALUATION OF THE PROPOSED CALIBRATION METHODS58 Figure 4.6 shows the estimated optical center coordinates as the standard deviation of the applied Gaussian noise increases. The image center is also shown for reference. Note that the coordinate ranges and the aspect ratio of the figure are the same that in real camera images. This time the Tsai’s algorithm implementation, although not completely stable, approximates the area where the real optical center is expected to be. This is quite good since Tsai’s implementation and the other two global calibration methods only use the non-planar target points for calibration, which doesn’t cover the complete image area. Both transformation-oriented and model-oriented methods move away far from the image center presenting values not likely to be correct. In some experiments these two methods computed optical center coordinates even outside the image borders. All three separate calibration methods present a very stable behaviour with values quite near to the image center. In the case of the separate linear method the stability is even higher and the result nearer the image center. 450 Tsai Transform-oriented Model-oriented Separate (linear) Separate calibration Undistorted

Distance to theoretical camera position (mm)

400

350

300

250

200

150

100

50

0 0

0.5

1 1.5 2 Gaussian random noise standard deviation (pixels)

2.5

Figure 4.7: Distance to theoretical camera position when applying Gaussian noise to the input data.

3

CHAPTER 4. EVALUATION OF THE PROPOSED CALIBRATION METHODS59 Figure 4.7 shows the distance of the estimated camera position to the theoretical camera position in the final project hardware. These theoretical coordinates are not strict in the sense that exact position should be obtained, but calibration values are expected to be not too far from them. In the context of this project, the precision of the camera position is quite important as this parameter is explicitly used by octree carving and other procedures. A shown, Tsai’s algorithm implementation is quite unstable again, but this time the instabilities are present even for very low error levels. The other two global calibration methods present also high distances but increasing in a proportional way with the error levels. Also again, separate calibration methods present a strong stable behaviour without any notable variation independently of the noise levels. This time separate linear method provides a distance value a bit higher than the ones that consider the radial distortion. No plots have been included for the skew factor or the camera rotation parameters, as the computed values didn’t provide any remarkable information not already present in the previous plots.

Chapter 5 Summary The analysis of stability against input noise has shown important instabilities in the Tsai’s algorithm implementation. This method provides very good results when accurate data is provided, but diverges easily if errors are introduced. In fact, experiments show that the algorithm starts becoming unstable when introducing Gaussian noise of only one pixel of standard deviation. If the standard deviation of the introduced error is raised to about 2.6 pixels the algorithm becomes completely unstable and useless. These error values are quite possible in real situations where the calibration target can have small deformations or the input image can have segmentation or point extraction problems. This fact was previously found during the first project experiments and is precisely the reason that motivated the development of this work. This led to the utmost importance of developing robust and stable calibration algorithms. From the set of proposed calibration methods, the ones that perform separate calibration of intrinsic and extrinsic parameters provide excellent results in terms of stability against noise. Particularly, the separate linear method, which ignores the radial distortion of the camera, provided the most stable results in the experiments. Testing has shown a good stability in the results with small variations up to 13 pixels of standard deviation before values are completely corrupted. However, the better results of the separate linear method are probably a consequence of our testing context, as the distortion introduced by our cameras is almost null due to the relatively long focal length (12 mm. for a 1/3 sensor format). The undistorted composite calibration method is likely to provide better results in other situations where camera lens with higher radial distortion are used. For example, separate linear calibration would completely fail if fish-eye camera lens were used. 60

CHAPTER 5. SUMMARY

61

The low radial distortion of our camera lens has other consequences. As one of the intrinsic parameters, the radial distortion factor has a mutual direct influence on the estimated optical center, skew factor and focal length. If this parameter is low, its weight into the final error metric is also low, as can be seen by the fact that ignoring it provides slightly better results in this experiment. This also implies that a non-linear minimization algorithm that is only based on the numerical value of the error metric may produce poor estimations of this parameter and introduce noise in the ones closely related to it. Currently, other methods to estimate the radial distortion factors avoiding the non-linear minimization are being researched. It is important to remember that the separate calibration methods use multiple views, a total of 5 in the experiments performed, covering the entire image area for intrinsic parameter calibration. Then, one additional view is used for extrinsic parameter calibration, with points mostly distributed around the image center. On the other hand, the global calibration methods make use of only of this last view. Even if noise is introduced in the views of both calibration targets, the separate calibration methods have much more information to compensate the Gaussian noise. Another important factor in the stability of the proposed methods is that they are based on the minimization of least-square errors in overdetermined systems, which makes them less sensitive to the input noise. This design criterion was achieved with the choice of the Normalized Direct Linear Transformation for parameter initialization and the Levenberg-Marquardt algorithm for non-linear minimization. In summary, different calibration methods have been proposed and detailed, paying special attention to their stability properties. Some of the proposed methods have shown an excellent robustness against input noise that could be found in real calibration situations, outperforming the implementation of Tsai’s algorithm previously used in the project. Also, the adaptations and optimizations to the project context don’t reduce the generality of the proposed methods, being still valid for general calibration situations. Finally, this work also provides a useful compilation of many different algorithms and theoretic concepts related to the general camera calibration problem. It can be used to adapt the proposed methods to other calibration contexts, or as a reference guide to understand specific details of the camera calibration and coordinate transformation processes.

Bibliography [1] Y. I. Abdel-Aziz, Direct linear transformation from comparator coordinates into object space coordinates in close-range photogrammetry, Proc. the Symposium on Close-Range Photogrammetry, American Society of Photogrammetry, 1971, (1971). [2] Q. Chen, H. Wu, and T. Wada, Camera Calibration with Two Arbitrary Coplanar Circles, 2004. [3] G. H. Golub and C. F. Van Loan, Matrix computations (3rd ed.), Johns Hopkins University Press, Baltimore, MD, USA, 1996. [4] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, second ed., 2004. ¨ and O. Silven, A four-step camera calibration procedure [5] J. Heikkila with implicit image correction, Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, 0 (1997), p. 1106. [6] M. T., Geometrical modelling and calibration of video cameras for underwater navigation, Master’s thesis, Norges tekniske høgskole, Institutt for teknisk ybernetikk, 1994. [7] R. Y. Tsai, A versatile camera calibration technique for high-accuracy 3d machine vision metrology using off-the-shelf tv cameras and lenses, Tech. Rep. 11413, IBM Watson Research Center, october 1985. [8] R. Y. Tsai, A versatile camera calibration technique for high-accuracy 3d machine vision metrology using off-the-shelf tv cameras and lenses, IEEE Journal of Robotics and Automation, 3 (1987), pp. 323–344. [9] Z. Zhang, A flexible new technique for camera calibration, IEEE Trans. Pattern Anal. Mach. Intell., 22 (2000), pp. 1330–1334. [10] Y. Zheng and Y. Liu, Camera calibration using one perspective view of two arbitrary coplanar circles, Optical Engineering, 47 (2008). 62