Cloud classification using the textural features of Meteosat images

The sum and difference histogram approach is applied to the assessment of the textural features of Meteosat images and the resulting textural parameters are used to classify the clouds appearing over North Africa. The images under consideration were taken by Meteosat in the visible and infrared bands during the month of December 1994. They cover North Africa, the Mediterranean Sea, Europe and the Atlantic Ocean. The visible images are first processed following four directions, namely 0°, 45°, 90° and 135°. For each direction, the textural parameters are calculated from the sum and difference histograms of grey levels. To take into account the effect of the texture anisotropy in the classification, the Karhunen-Loeve transformation (KLT) is then used. The minimum number of components representing these parameters is obtained with the slightest loss of information. Finally, after averaging the textural parameters over the four directions, each image of the database is divided into homogeneous classes by using the K-means algorithm. The approach thus described, tested on one of the images of the Brodatz album, shows that the classification ratio is greater than 96%. The segmentation of the Meteosat images is performed using both the textural parameters of visible images and the brightness of infrared images. It is then found that the different ground and cloud types are classified with proper accuracy. The implementation of this method to satellite images advantageously reduces the classification time, which is found to be three times smaller than that required by classical techniques of image processing.


Introduction
The automatic classification of clouds from satellite images is of great interest in relation to various operational and scientific problems. Among image classification methods, the most important ones are the statistical approaches. Thanks to such approaches, the distributions characterizing some microphysical features of clouds can be analysed in space and time. These features can be useful to describe the radiative energy transfer through the atmosphere. They can also be used in models to estimate the solar energy resources in a given region, to observe the evolution of climatic parameters and the weather for ecological and agricultural purposes (e.g. Cano et al. 1986, Delorme et al. 1992, Hay 1993, Rosenfeld and Gutman 1994, Beyer et al. 1996, Anagnostou et al. 1999, Colle et al. 1999, Diak et al. 2000.
Of course, cloud classification from satellite sensor data has to be performed automatically, on account of the quantity of data to be processed, as well as to guarantee objectivity in the classification. The techniques for classifying pixels are principally of two sorts: the approaches based on multispectral signatures and those utilizing textural features. Multispectral methods are more commonly used (e.g. Minnis and Harrisson 1987, Saunders and Kriebel 1988, Pratt 1991, Porcu and Levizzani 1992, Cheng et al. 1993, Gougeon 1995. However, they are practically unable to discern objects with similar spectral signatures. This is the case of clouds observed over ice or snow-covered surfaces. According to recent works, the classification of satellite images is significantly improved when their textural features are taken into account (e.g. Chen et al. 1989, Gu et al. 1991, Havlicek 1997, Randen and Husoy 1999, Wang et al. 1996, Manian et al. 2000, Ameur et al. 2001. Texture is usually defined as the arrangement of elements of different sizes and shapes composing an image (Haralick et al. 1973). Texture can be decomposed into primitives. The latter are the sets that gather the maximum number of neighbouring pixels having the same features. The classification techniques dealing with texture are the grey level cooccurrence matrix (GLCM), the sum and difference histograms (SADH), and the grey level difference vector (GLDV). Several applications reported in the literature are based on the GLCM method (e.g. Haddon and Boyce 1993). However, the resulting computations are complex. The SADH method is simpler and it seems as efficient as the GLCM method.
In this article, images taken by Meteosat 4 during the month of December 1994 and representing North Africa, the Mediterranean Sea and Europe, are considered in order to classify the clouds. The method of sum and difference histograms is used to obtain the textural parameters of Meteosat images in various directions. To improve the image classification, a method based on the Karhunen-Loeve transformation (KLT) and the K-means algorithm is implemented.

Data
The data used are images from Meteosat 4. As is well known, it is a geostationary meteorological satellite, situated at 0 ‡ of latitude and 0 ‡ of longitude. It continuously observes the whole Earth disc. The images gathered by Meteosat in three channels-visible (0.4-1.14 mm), thermal infrared (10.5-12.5 mm) and water vapour (5.1-7.1 mm)-are transmitted every 30 min. The Meteosat images are available in three formats: high resolution, B2, and Wefax (Weather Facsimile) images. The high resolution images are made of 500065000 pixels for the visible channel and 250062500 pixels for the infrared and water vapour channels. B2 images are obtained by compression of the high resolution images. They are made of 4166416 pixels for the whole Earth disc and stored every 3 h on magnetic tapes. Wefax images result from the division of the high resolution images into several sub-formats. Such images are transmitted to the Secondary Data User Stations (SDUS) via Meteosat. In Algeria, Wefax images are received every 30 min at the National Office of Meteorology at Dar El Beida (Algiers). These images represent North Africa, the Mediterranean Sea, Europe and the Atlantic Ocean. The database used for our study consists of a series of Wefax images in the visible and infrared channels collected during the month of December 1994. Each image forms a window made of 5126512 pixels, where the possible grey-level values run from 1 to 256 and the size of each pixel is 565 km 2 at the sub-satellite point.
The features of the clouds observed in such images, blurred by fuzzy edges, make the direct identification of these clouds difficult. To overcome this problem, textural information can be used in the cloud classification. In general, texture depends on the correlation existing between the neighbouring pixels and the spatial frequency distribution of grey levels. This means that the textural features can be properly evaluated from the co-occurrence matrices describing the texture in an image.

Sum and difference histograms
Two pixels of (k, l) and (k', l') coordinates (with k, l, k' or l'~1, … , 512) are considered and relations k'~azk and l'~bzl are written. The displacement from one pixel to another is then expressed by the relative coordinates (a, b). If G~{1, 2, … , N g } is the set of all the possible grey levels (with N g~2 56), the brightness of these pixels can then be described by random variables, respectively denoted y k,l and y kza, lzb , which take one of the discrete values belonging to such a set. The joint probability of getting grey levels (i ) and ( j ) at the relative position (a, b) is The set of P(i, j) probabilities yields the co-occurrence matrix that enables us to evaluate the textural features of the image under study for a given position (a, b). In fact, the computation of textural parameters by means of grey level co-occurrence matrices and the storage of these matrices require a large memory. Fortunately, while the SADH and GLCM approaches practically yield the same results, the SADH method significantly reduces the memory requirement (Chen et al., 1989). Following Unser (1986), the sum and difference are found to describe the probability function of a second-order stationary process. Consequently, the sum and difference are expressed by two new random variables which are, respectively: Taking into account that both variables are de-correlated, the sum histogram is written In this expression, D and # denote the subset of indexes defining one of the texture regions and the number of elements in the subset, respectively. Similarly, the difference histogram is given by The total number of elements composing this subset is The normalized sum and difference histograms are, respectively, given by: where i~2, 4, … , 2 N g , and where j~2N g z1, … , N g 21.
According to the relationships (7) and (8), the relative frequencies arising from the sum and difference histograms are accurate estimators of P s (i) and P d ( j ), i.e. the probabilities of having the (i) and ( j ) grey levels, respectively. The shape of the obtained histograms depends on the type of texture. In the case of a coarse texture, the size of the texture elements is much greater than d~(a 2 zb 2 ) 1/2 , which is the distance separating the couple of pixels under consideration. Numerous pairs of neighbouring pixels are then obtained and the resulting histograms are found to be sharp near the origin of the grey-level axis. For a finer texture, the shape of histograms is much more flattened. Among the 14 textural parameters initially defined by Haralick et al. (1973), nine are most commonly used. These are energy (or second-order angular moment), contrast, entropy, correlation, local homogeneity (or inverse difference moment), mean, variance, cluster shade and cluster prominence. However, when evaluating all these parameters for images of the Brodatz album taken as reference, we have found that five of them bring enough textural information, namely: Contrast: CON~X where m~1 2 P i i H s i ð Þ is the mean value of the grey levels (i).
Local homogeneity: IDM~X Only these parameters will be therefore considered hereafter.

KLT-based method
To take into account the effect of texture anisotropy the textural features defined above have to be analysed in four directions, namely h~0 ‡, 45 ‡, 90 ‡ and 135 ‡. However, in this case, texture is found to be represented by vectors made of a too large number of components and the computations required by the image classification become difficult. Fortunately, when the KLT is applied to the images, the number of components reduces significantly and the whole textural information remains practically unchanged. In these conditions, each of the textural features given by equations (9)-(13) is represented by a feature vector (A h ). For a given direction (h), the average of this textural feature is: Taking into account the different interactions occurring between two different orientations h and h', the variance-covariance matrix is calculated for each of the textural features. This yields: Notice that in this equation, each variable A has its own 464 dimensional covariance matrix per pixel. The next step of the image processing is the reduction of variables by applying the Principal Component Analysis (PCA). For that, all the eigenvectors and eigenvalues of the variance-covariance matrix are calculated and sorted following the decreasing order. A transition matrix M is then built by means of these eigenvectors. The first row of the matrix M consists of the eigenvector components corresponding to the greater eigenvalue and the other rows are generated from the successive eigenvectors. Finally, to get the reduction of the feature vector components, the set of matrices A h (k, l) is multiplied by M, leading to the following matrices: Taking into account the four orientations (h), the KLT has been applied to each of the five textural features of Meteosat images given by equations (9-13). It is found that such images are satisfactorily described using only the two first principal components of A' h (k, l) with a resulting loss of information lesser than 20%.

Image segmentation
In the next step, the K-means algorithm is used to get the partition of the image into homogeneous classes. K-means is a clustering technique that starts from the initial partition of an image into K classes. It operates by iteration. At the beginning, the number of initial classes is a priori unknown. To obtain this number, the maximum likelihood method can be employed. Such a technique applied to the classification of textural features consists in maximizing the product: In this expression, b mm and w nn are the diagonal elements of the covariance matrices (B) and (W) obtained between and within the classes, respectively. An overlapping window made of 767 pixels centred around each of the pixels is chosen so as to obtain the principal components of the textural parameters defined in §3.1. When figure 1 is processed, the product Y becomes maximum after some iterations and 10 initial classes are identified. In general, the distribution of pixels around their centre of mass, for each class, is approximately Gaussian. A similarity function characterizing the partition is then defined. Its general form is: In this expression, x is a N-dimensional feature vector consisting of the principal textural components, C k is one of the K classes (with k~1, … , K ), T k is the covariance matrix associated to this class, |T k | is the mathematical expectation of T k , and d k (x) is the Mahalanobis distance written as: where x k is the mean vector, [x2x k ] t is the transposed vector of [x -x k ] and T {1 k is the inverse matrix of T k .
The K-means algorithm also operates as a maximum likelihood estimator. The similarity function has therefore to be maximized and a pixel will be assigned to a given class if the Mahalanobis distance separating this pixel from the centre of mass is minimized. A pixel class will be suppressed if it consists of only few elements (namely less than 4/100 of the total number of pixels). In this case, these elements will be assigned to the nearest neighbouring classes. Taking into account such conditions, the number of classes and their composition are modified, new centres of inertia are defined for these classes and the process is repeated until the different classes are sufficiently populated and the Mahalanobis distance is reduced to a stable minimum value. Finally, when the K-means algorithm stops, eight homogeneous classes are obtained for the image of figure 1.

Results and discussions
The techniques described in §3 have firstly been tested upon an image of the Brodatz album. This image consists of 2566256 pixels gathered into four squares of same size and different textures. The number of grey levels is 256 and each square represents a woven surface with uneasily detectable edges. The rate of classification obtained in this case is found to be more than 96%.
In the second step, the techniques under consideration are applied to a set of Meteosat images recorded in December 1994 ( §2). To illustrate the image processing, two Meteosat images recorded in the visible and infrared channels on 19 December 1994 at 14:00 UTC, are shown in figures 1 and 2. In both images, sea and continent are represented by dark grey and black regions, whereas clouds are shown as light grey and white surfaces. In the visible channel, the reflectance of the clouds is mainly influenced by their constitution and their thickness. In the infrared channel, the brightness essentially depends on the top temperature. We may expect that the brighter clouds in the visible image (figure 1) are thicker. In the corresponding infrared image, these clouds are also brighter, meaning that their top is cold and thus at high altitude.
The cloud cover over the regions observed by Meteosat can be better distinguished if a coloured combination of visible and infrared images is used. Figure 3 has then been constructed by mixing figures 1 and 2. In the resulting compound image, clouds look light beige at low altitude and white at high altitude. The main components of the cloud cover can be identified by analysing figures 1, 2 and 3 together. Altocumulus and stratocumulus clouds are covering part of North Africa and the Mediterranean Sea, stratocumulus clouds are lying over southern Europe and low cellular convection cumulus is growing upon the Atlantic Ocean. From the visible image of figure 1, energy, contrast, entropy, correlation and local homogeneity are computed for different pixel distances (d) and for the orientations h~0 ‡, 45 ‡, 90 ‡ and 135 ‡. The number of components representing these parameters are reduced by means of the KLT. These parameters are then averaged over the four directions. The most significant contributions of the textural features are found for small pixel distances. Figures 4, 5, 6, 7 and 8 give the respective maps of energy, contrast, correlation, entropy, and local homogeneity obtained for d~1 (i.e. 5 km).
All these pictures show that the cloud texture is much finer than the ground one. Energy mostly yields indications about the ground texture. Contrast and entropy (or correlation), and local homogeneity practically give the same information about ground and cloud textures. It should be noted that contrast is the most representative parameter since it describes the image texture very well.
To obtain the appropriate segmentation of Meteosat images, the principal textural parameters arising from the visible image, namely contrast and local homogeneity, and the brightness of infrared image are considered. Then, the partition of these images into homogeneous classes are performed by using the Kmeans algorithm. As explained in §3.3, eight different classes arise from such a segmentation. These classes are shown in figure 9.
In this image, the two main sorts of clouds previously identified are represented by the blue and yellow classes. The yellow class is assigned to the elevated clouds present over Europe, the Mediterranean Sea and North Africa. Let us recall that these clouds are high since their brightness is strong. Their texture is found to be both fine and uniform. The blue class corresponds to the lower clouds covering the North Atlantic and Europe. Most of these clouds are dark and scattered. They are  also present over the Mediterranean Sea and North Africa. Rifts in the clouds observed over the North Atlantic are described by the light green class. Some blue spots appearing over Algeria can be identified as thin clouds lying at low altitudes,   the sea is assigned to the light blue class. In North Africa, the Sahara desert and Atlas mountains are represented by orange and dark blue classes, respectively. Taking into account that the ground in North Africa is less wet than the ground in Europe, the bottle-green class is attributed to the vegetation growing in arid areas of North Africa whereas both the sea and the vegetation covering Europe belong to the dark green class. Thus, the Atlantic Ocean cannot be distinguished from the European continent, owing to the similarity of their respective textures.

Conclusion
Thanks to the SADH method, the clouds, soils and seas shown in the Meteosat images collected on 19 December 1994 have been properly represented by eight classes of texture. This method has the advantage of being easily implemented. The classification presented in the preceding sections is based on the analysis of the texture anisotropy of the visible images, the selection of pertinent textural parameters, namely contrast and local homogeneity, and their combination with the brightness of infrared images. Such a classification yields a clear segmentation of the images under study. In spite of the complexity of the details composing these images, the resulting classes are accurately identified, their texture is homogeneous and their edges are well defined, demonstrating the pertinence of the segmentation method presented in §3. The rate of classification is better than 96% and the time of computation is three times shorter than that required by GLCM or GLDV methods.
However, some limitations have to be mentioned. As for the other texture-based methods, the classes assigned to the cloud cover can be occupied by areas where clouds lying at different altitudes are superimposed. Consequently, these classes cannot yield complete information about the type of clouds thus observed. Another ambiguity arises from the confusion of the classes attributed to the Atlantic Ocean and the European continent. It is possible that the image classification could be improved after further investigations of the statistical properties of the visible and infrared images. In spite of these limitations, the results presented above can be usefully applied to the evaluation of the precipitation fields and solar energy resources in the regions under study.