Protuberance Selection descriptor for breast cancer diagnosis

In the field of breast cancer, researchers aim to automatically discriminate between benign and malignant masses in order to assist radiologists. In general, benign masses have smooth contours, whereas malignant tumors have spiculated boundaries. In this context, finding an adequate description remains a real challenge due to the complexity of mass boundaries. In this paper, we propose a novel shape descriptor named Protuberance Selection (PS), based on depression and protuberance detection. This descriptor allows a good characterization of lobulations and spiculations in mass boundaries. Furthermore, it ensures invariance to geometric transformations. Experimental results show that the proposed descriptor provides promising classification performance. The results also confirm that the new PS descriptor outperforms several shape features commonly used in the breast cancer domain.


INTRODUCTION
For several decades, breast cancer has received great attention from radiologists and researchers. This kind of cancer threatens the life of one in ten women, and the best way to decrease the mortality rate is early detection. In this context, mammography is widely recognized as the most reliable technique to perform such early detection. In mammographic images, benign masses appear with a circumscribed contour, whereas malignant tumors are distinguished by spiculated boundaries.
Considerable work shows the importance of shape features in breast cancer diagnosis [1]. In fact, the mass margin characteristics are the most important criteria for deciding whether a mass is likely to be benign or malignant [2]. Several geometrical shape features were proposed in the literature, such as circularity, compactness and rectangularity [3]. Menut et al. [4] achieved a classification accuracy of 76% using four features based on the mean and variance of the width of the parabolic segments. Rangayyan et al. [5] proposed another boundary modeling method that treats the mass boundary as a union of piecewise continuous and locally salient concave and convex parts. The authors achieved a classification accuracy of 91% for circumscribed versus spiculated breast masses. Shi et al. [6] evaluated the classification performance of 24 individual features, such as spiculation and patient information. They demonstrated that, among the evaluated features, the spiculation measure had the best area under the Receiver Operating Characteristic (ROC) curve, equal to 0.78. These studies indicate that features based on spiculation and concavity measures are very effective for mass characterization. In addition, classification should be performed independently of the lesion position in the mammogram. Therefore, the shape description must be invariant to geometric transformations such as translation, rotation and scaling.
In this context, we propose the Protuberance Selection (PS) descriptor, based on depression and protuberance detection, for breast cancer classification. The PS descriptor, first, satisfies geometric invariance and, second, distinguishes well between malignant and benign solid breast lesions. The remainder of the paper is organized as follows. The next section details the formulation of the proposed descriptor. Section 3 presents the experimental results. First, ROC curves are used to assess the ability of the descriptor to discriminate between benign and malignant masses. Second, a comparison between PS and other descriptors is performed. The last section provides conclusions.

PROTUBERANCE SELECTION (PS)
Generally, a benign mass has a regular round or oval form with a smooth boundary, whereas a malignant mass has an irregular form with a spiculated, rough and blurry boundary [7]. A good shape characterization therefore greatly improves mass classification performance. To this end, we detect the lobulations that describe the spiculation rate of the mass contour, and we propose the Protuberance Selection (PS) descriptor as detailed below.

Spiculation detection
We follow the contour fluctuation by measuring the sign variation of the derivatives with respect to the abscissa and the ordinate. Indeed, a derivative preserves the same sign as long as the contour keeps a given direction. Therefore, by detecting the sign variation of the derivatives, we can extract interest points that characterize protuberances and depressions.
We consider the contour $C$ of a lesion, defined on the interval $I$, as a set of plane curves such that $C = \{C_1 \cup \dots \cup C_k \cup \dots\}$. Each plane curve $C_k$ admits a parametric representation of class $\mathcal{C}^1$ on an interval $I_k \subset I$, so that $C_k(t) = (x(t), y(t))$, where $t \in I_k$ and $x$ and $y$ are continuously differentiable on $I_k$. We denote $C(t) = (x(t), y(t))$ and we compute the derivatives $x'(t)$ and $y'(t)$ of respectively $x(t)$ and $y(t)$ for each contour point as follows:

$$x'(t) = \frac{x(t+h) - x(t)}{h}, \qquad y'(t) = \frac{y(t+h) - y(t)}{h},$$

where $t + h \in I$ and $h > 0$. Since these derivative measures are sensitive to noise, we consider $h > 1$, which smooths the contour and yields more stable derivatives.
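The finite-difference derivatives described above can be illustrated with a minimal Python sketch. It assumes the contour is given as an ordered list of (x, y) points and that indices wrap around because the contour is closed. The function name `contour_derivatives`, the wrap-around rule and the toy square contour are illustrative assumptions, not part of the original method description.

```python
def contour_derivatives(points, h=2):
    """Forward differences of x and y along the contour with step h.

    points: list of (x, y) tuples ordered along the contour.
    Assumption: the contour is closed, so index i + h wraps around.
    """
    n = len(points)
    dx = [(points[(i + h) % n][0] - points[i][0]) / h for i in range(n)]
    dy = [(points[(i + h) % n][1] - points[i][1]) / h for i in range(n)]
    return dx, dy


# Toy contour (hypothetical data): a square traversed counter-clockwise,
# sampled with one point per unit step.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
dx, dy = contour_derivatives(square, h=2)
```

A step h larger than 1 averages the slope over several contour points, which is the smoothing effect mentioned in the text.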
We denote by $n$ the number of points in the contour and we first compute the $n$-dimensional vectors $D_x$ and $D_y$. These vectors hold respectively the derivatives of $x(t)$ and $y(t)$ with respect to $t$ at each contour point $t_i$, $i \in \{1, 2, \dots, n\}$: $D_x = (x'(t_1), \dots, x'(t_n))$ and $D_y = (y'(t_1), \dots, y'(t_n))$. The null values in $D_x$ and $D_y$ represent stationary points, obtained where the corresponding tangent is vertical or horizontal. We mention that a criterion based only on a null first derivative combined with the sign of the second derivative could miss certain lobulations in the contour. So, we simply follow the sign variation of the first derivative before and after stationary points. For this, we remove the zero values from $D_x$ and $D_y$. We define two new vectors, $D'_x$ of size $n_1 \le n$ and $D'_y$ of size $n_2 \le n$, as $D'_x = \{D_x(i) : D_x(i) \in \mathbb{R}^*\}$ and $D'_y = \{D_y(i) : D_y(i) \in \mathbb{R}^*\}$, where $\mathbb{R}^*$ is the set of non-null real numbers.
When two successive elements of the zero-free derivative vector $D'_x$ (or $D'_y$) have the same sign, the contour keeps the same direction according to $x(t)$ (or $y(t)$). However, any sign variation between two successive elements implies a direction change and therefore the presence of a lobulation. Let $\delta$ be the indicator function that follows the derivative sign variation: for two successive elements $u$ and $v$, $\delta(u, v) = 1$ if $u v < 0$ and $\delta(u, v) = 0$ otherwise. Using the indicator function $\delta$, we determine the positions of the contour lobulations. We store the coordinates of the lobulations detected from $D'_x$ and $D'_y$ in the matrices $L_x$ and $L_y$. Considering that the sizes of $L_x$ and $L_y$ are respectively $(m_1, 2)$ and $(m_2, 2)$, we note that $m_1 \le n_1$ and $m_2 \le n_2$, where $n_1$ and $n_2$ are the sizes of $D'_x$ and $D'_y$. We define the matrix $L$ gathering the two sets of detected lobulation coordinates as $L = L_x \cup L_y$, of size $(m, 2)$. Since the same lobulation could be detected twice, through the sign variation of both $D'_x$ and $D'_y$, $m$ is always less than or equal to $(m_1 + m_2)$. Figure 1.c demonstrates that $L$ covers all lobulations (protuberances and depressions) in the contour.
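The sign-variation test above can be sketched as follows. This is a simplified illustration under stated assumptions: a lobulation is reported at the contour point where the derivative takes its new sign, the comparison does not wrap around the end of the contour, and the names `sign_change_positions` and `detect_lobulations` as well as the sample values are hypothetical.

```python
def sign_change_positions(derivative):
    """Indices (in the original vector) where the derivative changes sign.

    Zero entries are dropped first, then each pair of successive
    non-zero values is compared. Assumption: the flip is reported at
    the index of the second element of the pair.
    """
    nonzero = [(i, d) for i, d in enumerate(derivative) if d != 0]
    flips = []
    for (_, previous), (index, current) in zip(nonzero, nonzero[1:]):
        if previous * current < 0:  # indicator of a sign variation
            flips.append(index)
    return flips


def detect_lobulations(points, dx, dy):
    """Union of the lobulations found from the x and y derivatives.

    A set of indices is used so that a lobulation detected through
    both derivatives is kept only once.
    """
    indices = set(sign_change_positions(dx)) | set(sign_change_positions(dy))
    return [points[i] for i in sorted(indices)]


# Hypothetical contour and derivative vectors for illustration.
points = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
dx = [1.0, 0.5, 0.0, -0.5, -1.0, -0.5, 0.0, 0.5]
dy = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
lobulations = detect_lobulations(points, dx, dy)
```

Taking the union on indices rather than concatenating the two lists reflects the remark that the merged set can be smaller than the sum of the two individual sets.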

Protuberance selection
The computation of the spiculation measure relies only on protuberances, whereas the detected lobulations include both protuberances and depressions. So, the next operation is the elimination of depressions in order to keep only the protuberances, which reflect the number of spiculations. We exploit the fact that a protuberance point has at most 4 of its 8 neighbors belonging to the lesion. For each element of $L$, we compute the intensity sum of its 8 neighbors. We consider $v_h$ as the $h$-th neighbor of an element of $L$, with $b(v_h) = 1$ when the pixel is inside the lesion and $b(v_h) = 0$ when it is outside. We define $P$ as the matrix containing the coordinates of the interest points characterizing the protuberances. $P$ consists of the elements of $L$ that have at most 4 neighbors belonging to the lesion, that is, the elements for which $\sum_{h=1}^{8} b(v_h) \le 4$.
The Protuberance Selection descriptor, based on depression and protuberance detection, is then $PS = \mathrm{length}(P)$, that is, the number of selected protuberance points. Figure 2.a details the elimination of depressions by means of neighborhood intensity and Figure 2.b summarizes the obtained protuberances.
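The neighborhood rule described above can be sketched on a binary mask. The sketch assumes that the lesion is given as a list of rows with 1 inside the lesion and 0 outside, that candidate points are (row, column) pairs, and that pixels beyond the image border count as outside. The names `inside_neighbors` and `protuberance_selection` and the toy mask are illustrative assumptions.

```python
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                    (0, -1),           (0, 1),
                    (1, -1),  (1, 0),  (1, 1)]


def inside_neighbors(mask, row, col):
    """Number of the 8 neighbors of (row, col) that lie inside the lesion."""
    count = 0
    for d_row, d_col in NEIGHBOR_OFFSETS:
        r, c = row + d_row, col + d_col
        # Assumption: pixels outside the image are treated as background.
        if 0 <= r < len(mask) and 0 <= c < len(mask[0]):
            count += mask[r][c]
    return count


def protuberance_selection(mask, lobulations, max_inside=4):
    """Keep lobulations with at most max_inside lesion neighbors.

    Returns the selected points and their number (the PS value).
    """
    selected = [(r, c) for r, c in lobulations
                if inside_neighbors(mask, r, c) <= max_inside]
    return selected, len(selected)


# Toy lesion (hypothetical): a block with one spike on top (row 0)
# and one notch at the bottom (row 4, column 3).
mask = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 0, 1, 1, 0],
]
candidates = [(0, 3), (3, 3)]  # spike tip and bottom of the notch
selected, ps_value = protuberance_selection(mask, candidates)
```

In this toy case the spike tip has 3 lesion neighbors and is kept, while the point at the bottom of the notch has 7 and is discarded as a depression.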

CLASSIFICATION RESULTS
The general methodology of a breast cancer Computer Aided Diagnosis (CAD) system contains three main steps: segmentation, description and classification. Segmentation consists of extracting the mass contour from a region of interest. Lesion description uses specified features to characterize masses. Finally, classification makes the decision and distinguishes between malignant and benign masses. In the following, we detail the CAD system we adopted.

Used database
To perform benign versus malignant mass classification, images are first selected from a publicly available database, the Digital Database for Screening Mammography (DDSM), assembled by a research group at the University of South Florida [8]. Mass boundaries in the DDSM have been subjectively characterized as (1) spiculated, (2) circumscribed, (3) ill defined, (4) microlobulated, and (5) obscured. The considered data set consists of 242 masses (128 benign/114 malignant), which are partitioned into a training set of 130 masses (70 benign/60 malignant) and a test set of 112 masses (58 benign/54 malignant). Figure 3 shows some samples of the data set. The first row shows benign masses and the second row malignant ones.

Segmentation
We define the boundary contour using the region-based active contour model proposed by Li et al. [9]. This model is able to segment images with intensity inhomogeneity. It also achieves good performance for images with weak object boundaries, such as ill-defined and obscured margins. Figure 4 presents two different masses segmented using this level set evolution without reinitialization [9].

Feature extraction
We compute the PS descriptor for the whole data set. In order to assess the relevance of the proposed feature, we compare it to other shape features proposed in the literature.

Normalized Radial Length features
Kilday et al. [10] developed a set of six shape features based on the Normalized Radial Length (NRL) from the object's centroid to the points on the boundary. The NRL features have been successful in CAD applications and provide satisfying results [11]. Chen et al. [12] proposed new features derived from the NRL properties, which showed higher performance than the basic NRL features. The normalized radial length $d(i)$ is filtered using a moving average filter and the filtered curve is denoted $d_f(i)$. From the NRL features proposed by Chen et al. [12], we select the difference of standard deviation and the entropy of the difference between $d(i)$ and $d_f(i)$, denoted $E_d$. The difference of standard deviation is defined as in [12] and involves $\sigma_f$, the standard deviation of $d_f(i)$, which is the result of filtering $d(i)$ with the moving average filter. As the tumor shape becomes more irregular, the difference of standard deviation takes higher values.
The entropy of the difference is defined as

$$E_d = -\sum_{k=1}^{N} p_k \log(p_k),$$

where $p_k$ is the probability that $|d(i) - d_f(i)|$ falls in the $k$-th bin, of width $1/N$, of the normalized histogram. $N$ is the number of bins into which the normalized histogram, ranging over the interval $[0, 1]$, has been divided ($N = 100$ in our analysis). This parameter measures the distribution of the difference between $d(i)$ and $d_f(i)$. The value of $E_d$ decreases as the NRL approaches regularity.
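A minimal sketch of the entropy feature described above is given below. Several choices are assumptions made for illustration only: the centroid is taken as the mean of the boundary points, radial lengths are normalized by their maximum, the moving average is circular with a window of 3, and the natural logarithm is used. The function names and sample values are hypothetical and do not reproduce the exact settings of [12].

```python
import math


def normalized_radial_length(points):
    """Distances from the centroid to each boundary point, scaled to [0, 1].

    Assumptions: centroid = mean of the boundary points, and
    normalization by the largest radial length.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    radial = [math.hypot(x - cx, y - cy) for x, y in points]
    longest = max(radial)
    return [r / longest for r in radial]


def moving_average(values, window=3):
    """Circular moving average with an odd window (closed contour)."""
    n = len(values)
    half = window // 2
    return [sum(values[(i + k) % n] for k in range(-half, half + 1)) / window
            for i in range(n)]


def difference_entropy(d, d_filtered, bins=100):
    """Entropy of the histogram of |d(i) - d_f(i)| over [0, 1]."""
    differences = [abs(a - b) for a, b in zip(d, d_filtered)]
    counts = [0] * bins
    for value in differences:
        counts[min(int(value * bins), bins - 1)] += 1
    total = len(differences)
    return -sum((c / total) * math.log(c / total) for c in counts if c)


# Regular contour: all radial lengths equal, so the entropy is zero.
regular = normalized_radial_length([(1, 1), (-1, 1), (-1, -1), (1, -1)])
entropy_regular = difference_entropy(regular, moving_average(regular))

# Irregular radial profile (hypothetical values): the entropy is positive.
irregular = [1.0, 0.2, 1.0, 1.0, 1.0, 1.0]
entropy_irregular = difference_entropy(irregular, moving_average(irregular))
```

The two cases illustrate the behavior stated in the text: the entropy is lowest when the radial length profile is regular and grows with irregularity.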

Compactness
Among well-known geometrical features, compactness, denoted $C$, has proven to be a good measure for classifying breast lesions by their shape [13].
It is computed from $P$ and $A$, the mass perimeter and area respectively. Compactness represents the roughness of an object's boundary relative to its area. The smallest compactness values are obtained for regular contours, whose perimeters are lower than those of complicated shapes of comparable area.
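As an illustration of a perimeter-to-area measure of this kind, the sketch below uses one common normalization, $P^2 / (4 \pi A)$, which equals 1 for a circle and grows with boundary roughness. This particular formula is an assumption chosen for the example and may differ from the exact definition used in [13]. The contour is treated as a simple polygon, and the function names and sample shapes are hypothetical.

```python
import math


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed polygon."""
    n = len(points)
    return sum(math.hypot(points[(i + 1) % n][0] - points[i][0],
                          points[(i + 1) % n][1] - points[i][1])
               for i in range(n))


def polygon_area(points):
    """Area of a simple polygon by the shoelace formula."""
    n = len(points)
    twice_area = sum(points[i][0] * points[(i + 1) % n][1]
                     - points[(i + 1) % n][0] * points[i][1]
                     for i in range(n))
    return abs(twice_area) / 2


def compactness(points):
    """Assumed normalization: P^2 / (4 * pi * A), equal to 1 for a circle."""
    perimeter = polygon_perimeter(points)
    area = polygon_area(points)
    return perimeter * perimeter / (4 * math.pi * area)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
star = [(0, 0), (2, 1), (4, 0), (3, 2), (4, 4), (2, 3), (0, 4), (1, 2)]
c_square = compactness(square)
c_star = compactness(star)
```

The star-shaped contour has a longer perimeter for its area than the square, so it receives the larger compactness value.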

Curvature
We also compare PS to the curvature, whose behavior depends on spiculations. In 2D, the curvature at a given point $p$ on a curve is defined as the inverse of the radius of the osculating circle at $p$. The osculating circle can be found as follows: for any two points $q$ and $r$ near $p$, compute the unique circle passing through $p$, $q$ and $r$. If these points are collinear, then the circle has infinite radius and null curvature [14,15].
The radius of the osculating circle is defined as

$$R = \frac{abc}{\sqrt{(a+b+c)(b+c-a)(c+a-b)(a+b-c)}},$$

where $a = |pq|$, $b = |qr|$ and $c = |rp|$.
In order to perform a fair comparison between the different features, we compute them under the same conditions (same database, segmentation method and classifier).
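The three-point construction described above can be sketched as follows. The curvature is computed as the inverse of the radius of the circle through the three points, using the equivalent identity $1/R = 4K/(abc)$, where $K$ is the area of the triangle formed by the points. The function name `three_point_curvature` and the sample points are illustrative assumptions.

```python
import math


def three_point_curvature(p, q, r):
    """Curvature estimated from the circle through p, q and r.

    Returns 0.0 when the points are collinear (infinite radius).
    """
    a = math.hypot(q[0] - p[0], q[1] - p[1])
    b = math.hypot(r[0] - q[0], r[1] - q[1])
    c = math.hypot(p[0] - r[0], p[1] - r[1])
    # Triangle area from the cross product of the two edge vectors.
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    area = abs(cross) / 2
    if area == 0:
        return 0.0
    return 4 * area / (a * b * c)


# Three points on the unit circle give a curvature close to 1.
k_circle = three_point_curvature((1, 0), (0, 1), (-1, 0))
# Three collinear points give a null curvature.
k_line = three_point_curvature((0, 0), (1, 1), (2, 2))
```

Testing the area before dividing also covers the degenerate case of repeated points, for which the product of the side lengths would be zero.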

Classification and discussion
Finally, classification of breast masses is performed using the Support Vector Machine (SVM) classifier. SVM is a machine learning technique based on statistical learning theory [16]. When used in classification, SVM maps the input space to a higher dimensional feature space and constructs a hyperplane that separates class members from non-members. Let $X$ be the input set and $F$ the projection space. The function that returns the inner product in $F$ between the images of two variables $x, x' \in X$ is known as the kernel function $K$. The SVM decision function is $f(x) = \sum_i \alpha_i K(x_i, x) + b$, where $\alpha_i$ and $b$ are training parameters, $x_i$ are the training samples and $x$ is the query image.
We evaluate the classification performance by means of Receiver Operating Characteristic (ROC) curve analysis, using the same training and test sets for all features. The PS descriptor reaches an area under the ROC curve of 0.93. This high value indicates that the proposed descriptor, which preserves the same PS value for the same form independently of its translation, rotation or scaling, ensures a good classification rate. Table 1 shows the performance of the different features in terms of the area under the ROC curve.
We remark that the curvature performs relatively poorly in classifying circumscribed/spiculated lesions, with an area under the ROC curve of 0.76. With this feature, we have to determine, for each lesion, the corresponding threshold to find the correct number of spiculations. Such a procedure prevents automation of the feature extraction and therefore the use of an automated computer aided diagnosis system.
The difference of standard deviation descriptor provides an area under the ROC curve of 0.78, while the entropy of the difference between the normalized radial length and its filtered version provides satisfying results with 0.87. This result indicates that radial length measures can be used for the characterization of spiculations. Indeed, the NRL features proposed in [12] characterize surface roughness. They are based on the centroid of the mass, which provides satisfying results for masses with a generally round boundary. However, in the case of complex shapes, the centroid may lie outside the tumor region and may not be a valid point from which to measure the distance to the boundary.
The compactness provides a satisfying result with an area under the ROC curve of 0.84. It has been shown that geometrical features can be very informative and can improve classification results, especially when they are used in conjunction with other features [17]. However, used individually, they may not be robust enough to provide all necessary information about mass complexity.
The best result is obtained using the proposed Protuberance Selection descriptor. PS outperforms all the other shape-based features and seems to be the most effective in the benign versus malignant classification of breast masses. The proposed feature satisfies the invariance criterion and is based on spiculation detection, which is the most informative contour detail about malignancy.
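The area under the ROC curve used throughout this comparison can be estimated directly from classifier scores. The sketch below relies on the standard equivalence between the area under the empirical ROC curve and the fraction of (malignant, benign) pairs ranked in the correct order, with ties counted as one half. The function name `roc_auc` and the scores and labels are hypothetical. They do not reproduce the experiments reported here.

```python
def roc_auc(scores, labels):
    """Area under the empirical ROC curve from scores and binary labels.

    labels: 1 for the positive class (malignant), 0 for the negative
    class (benign). Higher scores are assumed to indicate malignancy.
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical decision values, for example outputs of a trained SVM.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
auc = roc_auc(scores, labels)
```

Because this estimate depends only on the ranking of the scores, it can be applied to any single feature or classifier output, which makes it a convenient basis for comparing descriptors under identical conditions.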

CONCLUSIONS
In this paper, we have proposed the Protuberance Selection (PS) descriptor, based on depression and protuberance detection. This descriptor was devised to discriminate between benign and malignant masses in mammographic images in order to detect breast cancer at an early stage. Classification efficiency is assessed using the area under the Receiver Operating Characteristic (ROC) curve. Compared to known descriptors based on spiculation measures, PS provides better classification results and seems to be a relevant descriptor for breast cancer recognition.