Matching two clusters of points extracted from satellite images

Image matching is a stage one performs as soon as one has two images of the same scene, taken from two diﬀerent points of view. Matching these images aims at ﬁnding the mathematical transformation that enables passing from any point of the ﬁrst one to the corresponding point in the other. As this study is related to satellite images, we show that the geometrical transformation can be approximated by a homography. Furthermore we want to match two clusters of points with no information of radiometry. Therefore, we have to guess the right parameters for this homography, by minimizing an appropriate cost function we deﬁne here. Then, the topography of the cost function is our main concern for the minimisation process. If looking for the right mathematical parameters seems the most natural way, we show that in this case the cost function has ‘‘chaotic’’ variations, so we need a complex technique for the minimization. To avoid this, we suggest guessing the parameters determining the conditions of the snapshot. Thus, we give the expression of the homo-graphy from these ‘‘physical parameters’’ and show that the topography of the cost function gets smoother. Thus the minimization process gets simpler.


Introduction
As we would like to match satellite images captured under different conditions, we have to say something about these differences.They concern parameters such as the camera angle, the resolution of each image, the position of the satellites, the center of each scene, but also the nature of the sensors.
These considerations lead us to search a new approach to match these images, as the majority of existing works consists of extracting characteristic points in each image and finding automatically or not a corresponding point in the second image, as for example in (Groth, 1986;Kahl et al., 1980;Ranade and Rosenfeld, 1980), but which does not take account of the projective deformations.In others examples, Schmid and Mohr (1997), Lowe (1999), Lindeberg (1998), Tuytelaars and Gool (1999), base the correspondence on the color of the points.Here the different nature of the sensors implies that a point may have two different colors in each image.Thus the problem is to match two clusters of points with no knowledge of the correspondence between a point in one cluster and a point in the other.This makes the matching harder than those of traditional approaches, thus we will use the two following hypotheses.The first is that the scene viewed from the satellite can be approximated by a planar scene.The second hypothesis is that the sensor catch the scene with a pinhole model.So, it can be shown that there is an application linking the two views of a scene taken under two different angles (Pritchett and Zisserman, 1998;Chasles, 1852).Such an application is a homography (and depends upon nine parameters).
Let us consider a cluster of the same characteristic points in each image.We then look for the homography that transforms the first cluster of points into the second.In other words, we have to guess the right homography.This can be achieved by minimizing a cost function that represents the quality of the matching between the two clusters.
Section 2 deals with the definition of the homography and proposes a cost function adapted to our search.In Section 3, we will study the topography of the cost function as these nine parameters vary.As this topography appears to be ''chaotic'', one might choose to use a minimisation process able to find the global minimum of the cost function.However, we would like to find a solution in a low computational time.
Thus, the idea is the following: let us suppose we know the exact physical parameters of the capture of both images.We should be able to guess the right homography from these parameters.Moreover, let us imagine that we move one of the satellites slightly.The image taken from this satellite will be very close to the original.Therefore the value of the cost function should be close to those obtained with the original image.In other words, the topography of the cost function should be smooth at least around the solution.This will simplify the minimization of the cost function.Another reason advocated for the use of these physical parameters: satellite images are sold with a lot of information about the capture of the image and many of the physical parameters can be deduced from this information.Even if these pieces of information are not error-free, they constitute a good starting point for the minimization process.
Section 4 specifies the definition of these physical parameters, and shows how to calculate the homography from these parameters.In Section 5, we show that, as claimed before, the topography of the cost function is smoother than in the straight method.Finally Section 6 shows an example of a minimization process.We conclude in Section 7.

Homography and cost function
As our images are captured from satellites of high altitude, we can make the approximation that these scenes could have a planar geometry.Moreover, even if the satellite sensor catches an image line after line, while it advances, we work here under the hypothesis of a pinhole model.An interesting extension of this work would be to use a more appropriate model (and working line by line).Thus, the two images are the projection of a planar ground scene from different angles and, following Chasles (1852), we claim that the application transforming the ground scene to an image is a homography.As the inverse transformation (from an image to the scene) in a projective space is a homography, it can be shown that the transformation between the two images is also a planar homography.A planar homography is a function expressed as follows: where x, y give the position of a point P in the first image, x 0 , y 0 give the position of P 0 , the image of P through the homography h characterized by the coefficients h ij .h ij are real coefficients.As we do not want our matching process to be sensordependant, we will suppose here that we have been able to extract the same structural informations (points, lines, regions) within each image.In this article, we will focus on characteristic points.The automatic extraction of these points is currently investigated and for further information one might refer to existing works as Schmid and Mohr (1997), Harris and Stephens (1988) and Dufournaud et al. (2000).Then our problem is to find the correct h that will transform a cluster of geometrical points into another.
Let us find a cost function to define a ''correct homography'': If h is the solution we are looking for, for each P i in cluster C, there is a point P 0 j in cluster C 0 which verifies P 0 j ¼ hðP i Þ.Thus the cost function for h should be zero.For another homography, close to the solution, the image of P i should be close to P 0 j .Thus we can guess that each P i is transformed into its nearest neighbor in cluster C 0 .The ability for a homography to transform C into C 0 might be given by the following cost function: where P i and P 0 j are the points in the first and the second clusters, h is the homography and d is a distance function.Here we will consider an Euclidean distance.
Nevertheless, the main drawback of this cost function is that a homography projecting all the points of the first cluster close to a single point in the second cluster appears to have a very low cost.Hence, we will modify our cost function to obtain a more symmetrical behavior.Moreover, we normalize this cost function by the number of points in each cluster so that the cost function is defined by One can interpret the value of the cost function as the average error (in pixels) made on each point when trying to match it with its correspondent in the other cluster.Note that the cost function does not depend of the way the homography is expressed.
The two following sections deal with two expressions of the homography, that strongly influence the difficulty of the minimization process.

Straight on minimization method
One might think that the best method is to look directly for the correct parameters of the homography (i.e. the h ij ).Let us illustrate this on different examples.At first, let us define the homography that links the different images.For convenience reasons, the homography will be the same in each case and its parameters (1) are defined as follows: 779159 2:327801  À0:917194 À0:090371 6:597157   À0:000009 À0:000023 1:000021 Then, we use a couple of clusters ðC Now, we can study the topography of the cost function for each cluster around the solution h.In each curve, all the h ij except one (the searched parameter) are equal to those of the solution.The searched parameter varies linearly and for each value of this search parameter, the cost function is computed.The results are presented for the couple of cluster of Fig. 2. For convenience, we just present the evolution of the cost as a function of h 21 , h 22 , h 31 .
It appears that the minimization process will face different troubles.If the curves for h 21 and h 22 are smooth (as are those of h 11 and h 12 in non-presented tests), the curve for h 31 present many peaks and many local minima.This is explained by the fact that the last two parameters are involved in divisions in the definition of the homography (1), thus there are at least as many peaks as there are characteristic points in the clusters.
Moreover we are looking for a solution in a 9-dimensional space of merely mathematical parameters with no idea of the bounds of this space.As one might note in (4) the magnitude of the values of each parameter are very different, thus we have no idea of the accuracy required to explore the space of the solutions.Another drawback is that we have no idea of what could be a correct starting point for the minimization process.
As a conclusion, one homography compared with another, close in the space of the mathematical parameters will have a very different behavior.In other words, two neighbors in the space of the mathematical parameters will have a very different cost when they apply to transform a cluster into the other.Then, either we choose a complex minimization process, able to find the global minimum of the cost function, either we try to express the homography as a function of parameters that have an influence which can be more easily explained physically.

Physical parameters of the homography
Satellite images are provided with technical data concessing the satellite and the captured scene.We will make use of these data to express the homography differently.
In stereovision, many researchers use the projective formalism and the pinhole model is used to describe the camera behavior (Couapel, 1994;Zhang, 1993Zhang, , 1996;;Longuet-Higgins, 1981;Hartley, 1995).Even if this model is more adapted to the camera images than for images captured from satellite sensors (Hartley, 1997), we will follow them in this paper to express the homography starting from the data provided with the satellite images.
For the sake of simplicity, let us focus on the ground image and the first satellite.The data which we will use will be as follows (cf.Fig. 3): • S 1 : the position of the satellite, • P 1 : the position of the target point, • q 1 : the P 1 S 1 distance, • a 1 : the local azimut, • b 1 : the elevation, • c 1 : an angle orientation.
Our main goal is to explain how a planar scene is turned into an image to explain these parameters.At first, we will use two bases of the 3D-space.The first is the ðO; ĩ; j; kÞ basis of the ground scene.The second is the ðS 1 ; ĩ0 ; j0 ; k0 Þ basis of the satellite with: • ĩproj is the normalized vector of the line of the projection of ĩ by k on the normal plane to k0 , • j proj has such a value that the ð ĩproj ; j proj ; k0 Þ basis is a right basis, • ĩ0 is the rotation of ĩproj with c 1 angle around the k0 axis, • j0 is the rotation of j proj with c 1 angle around the k0 axis.
S 1 is located from P 1 as follows: S sol is the orthogonal projection of S 1 on the ground.a 1 is the angle between ĩ and P 1 S sol , b 1 is the angle between P 1 S sol and P 1 S 1 and q 1 is the distance P 1 S 1 .
From this data, one can thus deduce N and N À1 , the transition matrix between ðS 1 ; ĩ0 ; j0 ; k0 Þ and ðP 1 ; ĩ; j; kÞ, and respectively its inverse.As the ground scene is assimilated to a plan of equation z = 0 in the ground scene basis, the coordinates of a point of the ground scene are [x, y, 0] t .The coordinates [x 0 , y 0 , z 0 ] t of the same point in the basis ðS 1 ; ĩ0 ; j0 ; k0 Þ can be expressed as follows: where ½S 1 x ; S 1 y ; S 1 z t are the coordinates of S 1 in the basis ðP 1 ; ĩ; j; kÞ.
Then, any point of the scene is projected into the image using of a projection matrix A. This matrix is called the calibration matrix, and its form depends on the camera model.In this article, we use the well-known pinhole model (Zhang et al., 1994).It can be easily shown that the coordinates of the projection of the ground point in the image are obtained by (6).Now let us seek the form of the matrix of calibration A following Zhang et al. (1994).It depends on the resolution (the scale between the two bases), on the distance between the center of the image and the optical center and the shape of the pixels.
Let us name, res 1 the resolution; a 1 , the complementary of the angle between the two sides of a pixel.One should note that we also need to differentiate O 1 (the intersection between the optical axe and the image) and C 1 (center of the image whose coordinates are ½c 1x ; c 1y t in the basis associated with O 1 ) (Fig. 4).
Straight from these considerations, we obtain the form of the matrix of calibration: The last transform is to go back from the projective space to the image space (non-linear part of the homography) (cf.( 8)) x 000 ¼ x 00 z 00 ; At this point, we can calculate the transformation between the ground scene and a satellite image.Let us give four points on the ground considering three of these points are never aligned.Let us give the transformation, in the first image and in the second image, of these four points.Generally, in each image, three of the four points are never aligned.With these eight points, we can calculate the homography between the two images (cf.Berger, 1978).We face with two groups of parameters: • The intrinsic parameters, related to the optical system, which are the res i resolution, the h i angle formed by the sides of a pixel and the ðc ix ; c iy Þ position of the image center compared to the optical center.• The extrinsic parameters, associated with the scene, are the position of the center of the scenes (the target point) P i : ðP ix ; P iy Þ and the position of the satellite, determined by the a i local azimut; the b i elevation, the q i distance and a c i angle determining the image orientation.
At end, we have for each image four intrinsic parameters and six extrinsic parameters.If one supposes that the intrinsic parameters are known, we obtain 12 parameters for the homography instead of the nine mathematical coefficients.If the intrinsic parameters are unknown, we have 20 parameters that we call the physical parameters of the homography between the two images.

Topography of the cost function in the physical parameter space
As in Section 3, the clusters used to carry on the experiments are those given in Fig. 1.The homography that transforms these clusters into their correspondent is the one of (4).One can note that this homography can be expressed in terms of the physical parameters described in the previous section, as follows: • Conditions of capture for image 1: -P 1 = (0, 0) (m); q 1 = 800 km; a 1 = 1.57rad; b 1 = 0.55 rad; c 1 = 0 rad; -res 1 = 20 m/pixel; h 1 = 0.1 rad; C 1 = (1, 1) (pixel).
• Conditions of capture for image 2: -P 2 = (100, À100) (m); q 2 = 800 km; a 2 = 0.8 rad; b 2 = 1 rad; c 2 = 1.57rad; -res 2 = 20 m/pixel; h 2 = 0.1 rad; C 2 = (1, 1) (pixel) As for Fig. 2, all the parameters except one are fixed to the true value, and the free parameter varies linearly.For each value of the parameters (i.e. a tested homography), a value of the cost function is measured to estimate the quality of the tested homography.The results are shown in Fig. 5, respectively for the clusters presented in Fig. 1.
Many tests have been conducted on the topography of the cost function when the homography is expressed by its physical parameters.All these tests show a smooth topography, that makes us think the minimization process is no longer a tough problem.
Furthermore, the technical data supplied with satellite images gives us an estimation of the homography.In other words, even if these datas are not error-free, it provides us with a good starting point to find the homography between the two clusters.During the minimization, we also know what accuracy is relevant for each parameter as these parameters have a physical meaning.This is helpful when trying to explore a big space of solutions (at least a 12dimensional space).
Finally, to sum up these conclusions, it appears that mathematical parameters are the best way (the shortest) to characterize a homography.Nevertheless, the minimization process would have to find a path to the solution, in the space of the mathematical parameters.Along this path, each tested homography should have a behavior close to that of the other tested homographies.This cannot be done by moving linearly the mathematical parameters.The expression of the homography as a function of the physical parameters provides us with this path.The price to pay is a rise in the number of parameters.

Minimization process
As one can observe in Fig. 2(c), minimizing the cost function within the mathematical parameters space would be a computation intensive task, due to the numerous local minima.In the physical parameter space, the cost function appears to be smoother (cf.Fig. 5), enabling a simpler minimization method such as a gradient minimization.However, one can note that there are still some local minima in the cost function topography.For example, as soon as a homography performs a false matching between two points, there is obviously a local minimum around this homography.
To face these local minimas, we chose to use the following initialization method: Let us start from an initial hypothesis for h.At each iteration, we generate a random deformation of the physical parameters from which we calculate a new expression of h and evaluate its cost.If the cost decreases, the deformation is accepted and the whole process is repeated until the cost function reaches an arbitrary chosen value of 0.25.As seen in Section 2, this means that the average error when matching a point of C 1 into another in C 0 1 is 0.25 pixel.As the random deformation can be restricted to a certain neighbour (in our case, the deformation is randomly chosen in a [À1, 1] interval for the P x , P y and q parameters and in a [Àp/10, p/10] interval for the angles parameters), this method enables us to ''jump'' over the local minima that subsists in the cost function topography, while keeping a low computation time.
Let us illustrate the results of this method, applied to the pair of cluster of Fig. 1.The homography that links these clusters is still the one determined from the physical parameters used in (5).
The starting point of the minimization process is obtained from the following variation of the physical parameters P 1x ; P 1y , a 1 , b 1 and c 1 : • First target point P 1 (in m): (+200, À50).
The starting point of the minimization process performs the matching presented in Fig. 6(a).The result of the matching after minimization is presented in Fig. 6(b).We present in Fig. 7 the evolution of the cost as a function of the number of iterations.As one can observe, the algorithm converges toward a correct solution.
Indeed, 2900 iterations are sufficient to have a ''solution'', and the same method applied to the mathematical parameters does not give a solution after 10,000 iterations.One can note that at each iteration, the most intensive computation is that of the cost function which is about O(n 2 ), n is the number of extracted points.

Conclusion
In this article, we were concerned with the matching of satellite images.More precisely, we presented a method for matching two clusters of points, with no further information about these points than their position in the two images.Under the approximation of planarity for the ground scene, the classical approach is to express the homography that links the two images as a function of nine mathematical parameters.We then proposed a cost function to estimate the ability for the homography to match the two clusters.Nevertheless, the topography of the cost function appears to be highly chaotic.Thus, we suggested another expression for the homography, depending on 20 physical parameters.We have shown that the increase in the number of parameters is counterbalanced by a smoother topography of the cost function.Moreover, these parameters are partially known from the data supplied with the satellite images, and enable a simple minimization process to estimate the homography.
One can note that the method we presented can be used in many more applications than the matching of satellite images.Indeed, it might be used as soon as one has to match two clusters of geometrical points in two dimen-sions, and the natural extension of these works would be the matching of clusters in which some points are occulted.

Fig. 1 .Fig. 2 .
Fig. 1.A couple of clusters with 100 points.The left cluster is C 1 and the right cluster is C 0 1 .Visually a 90°left rotation helps matching C 1 to C 0 1 .

Fig. 5 .
Fig. 5. Evolution of the cost function for ðC 1 ; C 0 1 Þ as one of the physical parameters of h varies.(a) Cost evolution, function of a 1 and (b) cost evolution, function of c 1y .