Severity Analysis of Cervical Cancer in Pap Smear Images Using EEETCM, ERSTCM and CFE Based Texture Features and a Hybrid Kernel Based Support Vector Machine Classifier

S. Athinarayanan (1) and Dr. M. V. Srinath (2)
1. Research Scholar, Manonmaniam Sundaranar University, Tirunelveli, Tamilnadu, India.
2. Director, Department of MCA, STET Women's College, Mannargudi.

Abstract:-

Classification of medical images is a difficult and challenging task because of the complexity of the images and the lack of anatomical models that fully capture the possible distortions in each structure. Cervical cancer is one of the major causes of death among women worldwide, and proper, timely diagnosis can save lives to some extent. The aim of this paper is therefore to classify abnormal cells in Pap smear images using individual and combined feature extraction methods together with a classification technique. Three feature extraction methods are used. Two are individual methods, the Effective Extending Enriched Texton Co-Occurrence Matrix (EEETCM) and the Enriched Rough Set Texton Co-Occurrence Matrix (ERSTCM); the third, Concatenated Feature Extraction (CFE), combines the EEETCM and ERSTCM features into a single feature vector to assess their joint performance. All three feature sets are tested with a Hybrid Kernel based Support Vector Machine (HKSVM) classifier. The experiments were conducted on a set of single-cell cervical Pap smear images: the dataset contains 512 images in four classes, and the number of images per class is not uniform. Performance was evaluated for both the individual and the combined feature extraction methods using sensitivity, specificity and accuracy.
Among the individual feature extraction methods, the proposed ERSTCM+HKSVM classifier gave better results than EEETCM+HKSVM, and the proposed combined method, CFE+HKSVM, gave better results than both EEETCM+HKSVM and ERSTCM+HKSVM.

Introduction:-
Cancer is a major disease all around the world [1][2][3], and many researchers and pathologists have developed methods to detect and treat the cells affected by it. Many methods for detecting the symptoms of cancer are already available [4][5][6]. Worldwide, two types of cancer particularly affect women: breast cancer and cervical cancer [7,8]. Breast cancer is the uncontrolled growth of cells in the breast [9], and cervical cancer is cancer that forms in the tissues of the cervix [10]. Cervical cancer originates from pre-cancerous, benign lesions in which cells inside the cervix grow in an uncontrolled way. According to the World Health Organization (WHO), the development of cervical cancer begins with mild dysplasia, progresses through moderate and severe dysplasia, and finally leads to carcinoma in situ (CIS) and invasive cervical cancer [11,12].
The Pap smear is the most popular screening method for detecting cervical cancer from cervical lesions. Manual detection requires an accredited laboratory, trained cytologists and repeated follow-up visits to evaluate the results [13], which motivates detecting cervical cancer automatically from screening test images. One of the most important steps in such systems is the segmentation of cell nuclei from the stained specimens [14]. Although isolated nuclei in high-quality acquisitions can be segmented reliably, the task becomes difficult in high-resolution scans of complete microscope slides, which contain many nuclei with varying characteristics captured under different acquisition conditions [15]. Thresholding is one of the most important segmentation techniques [16], and it is also the simplest way to convert a grayscale image into a binary image using a global or local threshold value [17,18]. Bi-level thresholding classifies the pixels into two groups: pixels whose gray levels lie above a certain threshold value and pixels whose gray levels lie below it [19,20].
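The bi-level rule above can be sketched in Python with NumPy. The text does not state how the threshold value is chosen; this sketch assumes Otsu's method, a common global choice that picks the threshold maximizing the between-class variance of the gray-level histogram. The function names `otsu_threshold` and `bilevel_threshold` are illustrative, and the input is assumed to be an 8-bit grayscale image.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Global threshold t in [0, 255] maximising Otsu's between-class variance.

    Assumes `gray` is an 8-bit (uint8) grayscale image.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)            # weight of the class with gray level <= t
    mu = np.cumsum(prob * levels)      # first moment of that class
    mu_total = mu[-1]                  # mean gray level of the whole image
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 0                  # skip thresholds where one class is empty
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

def bilevel_threshold(gray: np.ndarray, t: int) -> np.ndarray:
    """Bi-level thresholding: 1 for gray levels above t, 0 for the rest."""
    return (gray > t).astype(np.uint8)
```

Because stained nuclei often appear darker than the surrounding cytoplasm, a nucleus mask in this setting might instead be taken as the pixels at or below t, i.e. `1 - bilevel_threshold(gray, t)`.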
Binary classification divides the elements of a given set into two groups according to a classification rule, and it is a typical way to determine whether a patient has been affected by cancer [21]. Classification has mainly focused on detecting cancer from Pap smear screening results; however, it is difficult to determine the critical stage of the cancer from Pap smear images.
The Pap smear reporting classification has evolved and been refined over time. In the present work, abnormal squamous cells in the image are classified according to the WHO descriptive histological classification as mild dysplasia, moderate dysplasia, severe dysplasia or carcinoma in situ (CIS) [22], using effective texture-based feature extraction and classification methods. From the results produced by the proposed system, we determine which stage of cancer a cell has reached.

Related Works:-
First-order statistical features and second-order gray level co-occurrence matrix (GLCM) features have been used with an SVM classifier to predict whether a patient treated for cervical cancer is cured or relapses [23]. Higher-order statistical approaches have also been used successfully in texture recognition: the authors of [24] applied both low- and high-order statistical features to extract cell texture, including maximum, minimum, range, median, mean, standard deviation, energy, skewness, kurtosis and entropy, for detecting cervical cancer. In [25], a combination of features is calculated for each stage of cancer, and the number of positive results is used to predict whether the cervical cancer has progressed to a particular stage.
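The first-order statistics listed for [24] can be computed from the gray levels of a segmented region. This is a hedged sketch: the exact formulas used in [24] are not given here, so it assumes histogram-based energy (the sum of squared gray-level probabilities), base-2 entropy, and population (non-excess) skewness and kurtosis. `first_order_features` is a hypothetical helper name.

```python
import numpy as np

def first_order_features(region: np.ndarray, levels: int = 256) -> dict:
    """First-order statistics of `region` (assumed integer gray levels 0..levels-1)."""
    x = region.astype(float).ravel()
    mean, std = x.mean(), x.std()                 # population standard deviation
    hist = np.bincount(region.astype(np.int64).ravel(), minlength=levels) / x.size
    nz = hist[hist > 0]                           # non-zero probabilities for entropy
    if std > 0:
        skewness = np.mean((x - mean) ** 3) / std ** 3
        kurtosis = np.mean((x - mean) ** 4) / std ** 4   # non-excess (normal = 3)
    else:                                         # constant region: set to 0 by convention
        skewness = kurtosis = 0.0
    return {
        "maximum": x.max(),
        "minimum": x.min(),
        "range": x.max() - x.min(),
        "median": float(np.median(x)),
        "mean": mean,
        "std": std,
        "energy": float(np.sum(hist ** 2)),       # histogram uniformity
        "skewness": float(skewness),
        "kurtosis": float(kurtosis),
        "entropy": float(-np.sum(nz * np.log2(nz))),  # in bits
    }
```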
Srinath et al. proposed a computer-aided detection (CAD) system that classifies a cervical cell as normal or abnormal based on a selected nucleus feature, its area. If a cell is classified as normal, the patient is concluded not to be affected by cervical cancer; if it is classified as abnormal, the patient is affected. Beyond detecting whether a patient has cervical cancer, the system also estimates the severity of the cancer from upper and lower tuned values of the same feature (nucleus area), classifying abnormal cells, according to the WHO classification of Pap smear images, as mild, moderate or severe dysplasia or CIS [26]. In another study, Srinath et al. used mean and area features to classify cells as normal or abnormal, and then used the abnormal cells to identify the stage of cervical cancer [7]. Such results can help pathologists reduce their workload and minimize human error while improving diagnostic accuracy. The rest of the paper is organized as follows: the proposed multi-class cancer classification system is presented in Section 2, the experimental results and discussion are given in Section 3, and the conclusion is summarized in Section 4.

Cervical Cancer Severity Classification System:-
Current manual screening methods are costly and sometimes produce inaccurate diagnoses due to human error. Machine-assisted screening can bring significant benefits to the community by reducing financial costs and increasing screening accuracy. In this article, we develop a cervical cancer severity classification system for Pap smear images based on individual and combined texture features and a hybrid kernel based support vector machine. The two major components of the proposed system are feature extraction and feature classification.

Feature extraction:-
The purpose of feature extraction is to reduce the original data by measuring certain properties, or features, that distinguish one input pattern from another. The extracted features should give the classifier the characteristics of each input by mapping the relevant properties of the image into a feature space. In this paper, we propose three novel feature extraction methods. Two are individual methods, the Effective Extending Enriched Texton Co-Occurrence Matrix (EEETCM) and the Enriched Rough Set Texton Co-Occurrence Matrix (ERSTCM); the third, Concatenated Feature Extraction (CFE), combines the EEETCM and ERSTCM features into a single feature vector to assess their joint performance. The individual and combined feature extraction methods are listed below.
Individual Feature Extraction Methods:
• Computation of Feature Vector F(V1) using EEETCM.
• Computation of Feature Vector F(V2) using ERSTCM.

Combining Individual Feature Extraction Features Method:
• Computation of Feature Vector F(V3) using CFE.
The detailed process of each feature extraction method is given below.
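Since CFE joins the two individual feature vectors, F(V3) can be sketched as a concatenation. Assuming EEETCM yields the five features and ERSTCM the seven features described in the following subsections, F(V3) has twelve components. The `minmax_scale` helper is an assumption, not part of the described method: the concatenated features have very different ranges, and per-feature scaling fitted on the training set is a common step before an RBF-type kernel. Both function names are hypothetical.

```python
import numpy as np

def cfe_vector(f_v1, f_v2) -> np.ndarray:
    """F(V3): EEETCM features followed by ERSTCM features, in a fixed order."""
    return np.concatenate([np.ravel(np.asarray(f_v1, dtype=float)),
                           np.ravel(np.asarray(f_v2, dtype=float))])

def minmax_scale(train: np.ndarray, test: np.ndarray):
    """Optional (assumed) step: rescale each feature column to [0, 1] using the training range only."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero for constant columns
    return (train - lo) / span, (test - lo) / span
```

Fitting the scaling on the training images only keeps the testing images unseen, which matches the paper's protocol of computing accuracy only on the testing set.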

Individual Feature Extraction Methods: Computation of Feature Vector F(V1) using EEETCM:
This method improves EETCM by adding new textons to detect information that EETCM loses, giving six textons in total. The new textons are horizontal bottom and vertical right; their purpose is to prevent the loss of information when same-valued pixels co-occur along the horizontal bottom and the vertical right of the grid.

Texton Detection:-
The texton templates defined in EEETCM differ from those in EETCM. Six special texton types, defined on a 2 x 2 grid, are used, as shown in Figure 1(a-e). Denote the four pixels of the grid as V1, V2, V3 and V4. If the two pixels highlighted in gray have the same value, the grid forms a texton. The working mechanism of texton detection is illustrated in Figure 2. In the final segmented nucleus image, a 2 x 2 block is moved from left to right and top to bottom throughout the image, with a step length of 2 pixels, to detect textons. If a texton is detected, the values of the detected pixel pair in the 2 x 2 grid are kept unchanged; otherwise the pixels are set to zero. The result is a texton image, denoted T(x,y), as shown in Figure 2(a-e). From the final texton image, the feature vector F(V1) is extracted, comprising five features: angular second moment (ASM), entropy, inverse difference moment (IDM), contrast and maximum probability. The six texton types used in EEETCM contain richer information than those in EETCM because the co-occurrence probability of two same-valued pixels in a 2 x 2 grid is higher than that of three or four same-valued pixels. For texton detection, EEETCM is slightly more expensive to compute than EETCM because of the two additional textons, but it extracts effective texture information better than EETCM.
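The texton detection and feature computation above can be sketched as follows. Several details are assumptions because Figures 1 and 2 are not reproduced here: the pixel layout (V1 V2 on the top row, V3 V4 on the bottom), the reading of the six textons as the six same-valued pixel pairs of the 2 x 2 grid (which includes the horizontal bottom and vertical right pairs named in the text), the co-occurrence offset (one pixel to the right), the number of gray levels, and base-2 logarithms for entropy. All helper names are hypothetical.

```python
import numpy as np

# Assumed 2 x 2 layout (row, col offsets):  V1 V2
#                                           V3 V4
V1, V2, V3, V4 = (0, 0), (0, 1), (1, 0), (1, 1)
# Six pair textons; this labelling is an assumption, since Figure 1 is not reproduced.
EEETCM_TEXTONS = [
    (V1, V2),  # horizontal top
    (V3, V4),  # horizontal bottom (added in EEETCM)
    (V1, V3),  # vertical left
    (V2, V4),  # vertical right (added in EEETCM)
    (V1, V4),  # diagonal
    (V2, V3),  # anti-diagonal
]

def texton_image_2x2(img: np.ndarray) -> np.ndarray:
    """Scan non-overlapping 2 x 2 blocks (step 2); keep same-valued pairs, zero the rest."""
    out = np.zeros_like(img)
    rows, cols = img.shape
    for r in range(0, rows - 1, 2):
        for c in range(0, cols - 1, 2):
            for a, b in EEETCM_TEXTONS:
                pa, pb = (r + a[0], c + a[1]), (r + b[0], c + b[1])
                if img[pa] == img[pb]:          # texton detected: keep both pixels
                    out[pa], out[pb] = img[pa], img[pb]
    return out

def glcm(img: np.ndarray, levels: int, dr: int = 0, dc: int = 1) -> np.ndarray:
    """Normalised co-occurrence matrix for a non-negative offset (dr, dc)."""
    img = np.asarray(img, dtype=np.int64)
    a = img[: img.shape[0] - dr, : img.shape[1] - dc]   # reference pixels
    b = img[dr:, dc:]                                   # neighbour pixels
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)
    return m / m.sum()

def eeetcm_features(p: np.ndarray) -> dict:
    """Five features of F(V1) from a normalised co-occurrence matrix p."""
    i, j = np.indices(p.shape)
    nz = p[p > 0]
    return {
        "asm": float(np.sum(p ** 2)),
        "entropy": float(-np.sum(nz * np.log2(nz))),
        "idm": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "contrast": float(np.sum((i - j) ** 2 * p)),
        "max_probability": float(p.max()),
    }
```

Usage would be `eeetcm_features(glcm(texton_image_2x2(nucleus), levels))`, with `nucleus` holding quantized gray levels below `levels`.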
Computation of Feature Vector F(V2) using ERSTCM:- As described by Julesz [27], a texton is a pattern shared as a common property throughout an image. Textures are formed only if adjacent elements lie within a neighbourhood. A texton image has the discriminative power of color, texture and shape features. Based on Julesz's texton theory [27], the texton co-occurrence matrix (TCM) algorithm describes the spatial correlation of textons for image retrieval, computing different features from a limited number of selected pixels. In the proposed Enriched Rough Set Texton Co-occurrence Matrix (ERSTCM) method, textons are computed from the rough texture to increase the amount of effective texture information extracted. As a texture descriptor, this approach can achieve good retrieval accuracy, especially for directional textures. Textons can be characterised by the critical distance between texture elements, which is determined by the texture element size. Texture can be resolved into small units such as orientations, color texton classes, elongated blobs of specific width and aspect ratio, and the terminators of elongated blobs. If texture elements are expanded to a large extent in one orientation, discrimination is reduced; if the elongated elements are not jittered in orientation, texture gradients increase at boundaries. A texton gradient can therefore be obtained using a 3 x 3 sub-image, and 20 textons on 3 x 3 grids are proposed, as shown in Figure 3. Although the co-occurrence probability of same-valued pixels in a 3 x 3 grid is smaller than in a 2 x 2 grid, textons built on a 2 x 2 grid may not give complete directional information. The 20 textons of the 3 x 3 grid can detect textons in all directions as well as the corners of the textures, and the computational complexity of obtaining the final texton image using the overlapped components of the 20 textons is also low.
If three highlighted pixels in a grid have the same value, the grid forms a texton, as shown in Figure 4. In the final segmented nucleus image, a 3 x 3 block is moved from left to right and top to bottom throughout the image, with a step length of 3 pixels, to detect textons. If a texton is detected, the three same-valued pixels in the 3 x 3 grid are kept unchanged and the remaining pixels are set to zero. The result is a texton image, denoted T(x,y). From the final texton image, the feature vector F(V2) is extracted, comprising seven features: contrast, inverse difference moment, correlation, variance, cluster shade, cluster prominence and homogeneity.
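A corresponding sketch for ERSTCM scans 3 x 3 blocks with a step of three pixels. The paper's 20 three-pixel templates are defined in Figure 3, which is not reproduced here, so `EXAMPLE_TEMPLATES` below is a hypothetical subset used only for illustration. The feature formulas follow common Haralick-style definitions (inverse difference moment with 1/(1+(i-j)^2), homogeneity with 1/(1+|i-j|), variance about the row mean), which are assumptions about the exact variants used; the co-occurrence helper repeats the one-pixel-right offset assumed for EEETCM.

```python
import numpy as np

# Hypothetical subset of three-pixel templates on a 3 x 3 grid, as (row, col) offsets.
# The paper's 20 templates are given in its Figure 3 and are NOT reproduced here.
EXAMPLE_TEMPLATES = [
    ((1, 0), (1, 1), (1, 2)),  # horizontal through the centre
    ((0, 1), (1, 1), (2, 1)),  # vertical through the centre
    ((0, 0), (1, 1), (2, 2)),  # main diagonal
    ((0, 2), (1, 1), (2, 0)),  # anti-diagonal
    ((0, 0), (0, 1), (1, 0)),  # top-left corner
]

def texton_image_3x3(img, templates=EXAMPLE_TEMPLATES):
    """Scan non-overlapping 3 x 3 blocks (step 3); keep same-valued triples, zero the rest."""
    img = np.asarray(img)
    out = np.zeros_like(img)
    rows, cols = img.shape
    for r in range(0, rows - 2, 3):
        for c in range(0, cols - 2, 3):
            for tpl in templates:
                pts = [(r + dr, c + dc) for dr, dc in tpl]
                vals = [img[p] for p in pts]
                if vals[0] == vals[1] == vals[2]:   # all three pixels share a value
                    for p in pts:
                        out[p] = img[p]
    return out

def glcm(img, levels, dr=0, dc=1):
    """Normalised co-occurrence matrix for a non-negative offset (dr, dc)."""
    img = np.asarray(img, dtype=np.int64)
    a, b = img[: img.shape[0] - dr, : img.shape[1] - dc], img[dr:, dc:]
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)
    return m / m.sum()

def erstcm_features(p: np.ndarray) -> dict:
    """Seven features of F(V2) from a normalised co-occurrence matrix p."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    corr = (np.sum((i - mu_i) * (j - mu_j) * p) / (sd_i * sd_j)
            if sd_i > 0 and sd_j > 0 else float("nan"))   # undefined for constant images
    s = i + j - mu_i - mu_j
    return {
        "contrast": float(np.sum((i - j) ** 2 * p)),
        "idm": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "correlation": float(corr),
        "variance": float(np.sum((i - mu_i) ** 2 * p)),
        "cluster_shade": float(np.sum(s ** 3 * p)),
        "cluster_prominence": float(np.sum(s ** 4 * p)),
        "homogeneity": float(np.sum(p / (1.0 + np.abs(i - j)))),
    }
```

Passing the full list of 20 templates from Figure 3 as `templates` would turn this sketch into the complete detector.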

Difference between ERSTCM and EEETCM: Texton Detection and Computational Complexity:-
Although the co-occurrence probability of same-valued pixels in the 3 x 3 grids of ERSTCM is smaller than in the 2 x 2 grids of EEETCM, the 20 texton types of ERSTCM capture richer and more complete information about all directions and corners of the texture than the six texton types of EEETCM, as shown in Figure 5. The computational complexity of obtaining the final texton image is also lower for ERSTCM, because its texton detection step length is three pixels, compared with two pixels for EEETCM, so fewer blocks are visited.

Feature Classification using HK-SVM:-
After feature extraction, classification is performed to detect the severity of the cancer in the Pap smear images. In the proposed method, we develop a hybrid kernel based SVM for classifying the abnormal cell of a Pap smear image into one of the severity classes: mild, moderate or severe dysplasia, or CIS. The support vector machine works in two phases: (1) a training phase and (2) a testing phase.

Training Phase:-
The features extracted by the three methods (EEETCM, ERSTCM and CFE) from the training images are given as input to the training phase. In general, these feature vectors are not linearly separable in the original input space. Formally, we are given $l$ training samples drawn from an unknown distribution,

$$\{(x_1, y_1), \ldots, (x_l, y_l)\}, \qquad x_i \in \mathbb{R}^n, \quad y_i \in \{-1, +1\},$$

and a set of decision functions, or hypothesis space, $\{f_\lambda : \lambda \in \Lambda\}$, where the index set $\Lambda$ is a set of abstract parameters, not necessarily vectors. Each $f_\lambda : \mathbb{R}^n \to \{-1, +1\}$ is called a hypothesis.
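The set of functions $f_\lambda$ could be, for example, a set of radial basis functions or a multilayer neural network. After the data are mapped into a suitable higher-dimensional feature space, the classes can be separated by a hyperplane. A kernel is any function $K : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ that corresponds to a dot product for some feature mapping $\phi$, i.e. $K(x_1, x_2) = \phi(x_1) \cdot \phi(x_2)$; the kernel therefore computes the dot product in the higher-dimensional space directly, without forming $\phi$ explicitly. To find the normal vector $w$ and bias $b$ of the optimal separating hyperplane, Lagrange multipliers $\alpha_i \ge 0$ are introduced, giving the primal Lagrangian

$$L_P = \frac{1}{2}\lVert w \rVert^2 - \sum_{i=1}^{l} \alpha_i y_i \left( w \cdot \phi(x_i) + b \right) + \sum_{i=1}^{l} \alpha_i, \qquad \alpha_i \ge 0 \;\; \forall i \qquad (1)$$

$L_P$ is minimized with respect to $w$ and $b$ and maximized with respect to $\alpha_i$. This is a convex quadratic programming problem, and setting the derivative of $L_P$ with respect to $w$ to zero shows that the normal vector is a linear combination of the mapped training vectors:

$$w = \sum_{i=1}^{l} \alpha_i y_i \phi(x_i) \qquad (2)$$

A new sample $x$ is therefore assigned to a side of the hyperplane by $\operatorname{sign}\left( \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b \right)$, so the hyperplane separates the data into two clusters. A sample representation of this process is shown in Figure 5. We analyzed the kernel functions of existing work [28] and used two of them in the proposed work, namely the radial basis function (RBF) and the polynomial kernel. For the RBF, $K(x, x_i) = \exp\left( -\lVert x - x_i \rVert^2 / (2\sigma^2) \right)$, each support vector $x_i$ is the centre of an RBF and $\sigma$ determines its area of influence in the data space.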
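The text does not give the exact form in which the RBF and polynomial kernels are combined. The sketch below assumes a convex combination, K = λ K_RBF + (1 − λ) K_poly, which remains a valid kernel because non-negative combinations of positive semi-definite kernels are positive semi-definite. It uses scikit-learn's `SVC` with a precomputed Gram matrix; scikit-learn's RBF parameter γ corresponds to 1/(2σ²). The values of λ, γ, the polynomial degree, coef0 and C are illustrative, not the paper's settings, and scikit-learn's built-in one-vs-one scheme stands in for whatever multi-class strategy the authors used for the four severity classes.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.svm import SVC

def hybrid_kernel(X, Y, lam=0.5, gamma=0.5, degree=2, coef0=1.0):
    """Assumed hybrid kernel: lam * RBF + (1 - lam) * polynomial, with 0 <= lam <= 1.

    RBF:        exp(-gamma * ||x - y||^2), where gamma = 1 / (2 * sigma^2)
    Polynomial: (x . y + coef0) ** degree
    """
    k_rbf = rbf_kernel(X, Y, gamma=gamma)
    k_poly = polynomial_kernel(X, Y, degree=degree, gamma=1.0, coef0=coef0)
    return lam * k_rbf + (1.0 - lam) * k_poly

def train_hksvm(X_train, y_train, C=10.0, **kernel_params):
    """Fit an SVM on the precomputed train-vs-train Gram matrix."""
    clf = SVC(kernel="precomputed", C=C)   # multi-class via scikit-learn's one-vs-one scheme
    clf.fit(hybrid_kernel(X_train, X_train, **kernel_params), y_train)
    return clf

def predict_hksvm(clf, X_train, X_test, **kernel_params):
    """Predict with the test-vs-train Gram matrix (rows are test samples)."""
    return clf.predict(hybrid_kernel(X_test, X_train, **kernel_params))
```

Precomputing the Gram matrix keeps the kernel definition in one place; the same `kernel_params` must be passed at training and prediction time so both matrices come from the same kernel.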

Experimental Results and Discussion:-
Experimental Image Data Set:- For any machine learning algorithm, the database used for training plays an important role: a machine can learn to reproduce a human decision only if it is trained with a suitably precise database. The database prepared in this work consists of 512 single cervical cell Pap smear images in four abnormal classes. The images were collected from Muthamil Hospital, Tirunelveli District, Tamilnadu State, India, in 2013, and were captured at 100X magnification using an Olympus CH20i microscope. Each image was examined and diagnosed by pathologists of that hospital before being used as a reference for this study. The sample image data sets are shown in Table 2, and sample Pap smear images of the various classes are shown in Figure 6. The 512 images were histologically diagnosed and graded according to World Health Organization (WHO) criteria, with the following abnormal class distribution: Mild dysplasia, 107 images. The method was implemented in MATLAB.

Experimental Results & Comparative Analysis:-
This section describes the experimental results of the proposed classification method on Pap smear images with different types of cervical cancer. The experimental image data set is divided into a training set and a testing set, as detailed in Table 2. The classifiers are trained with the training images, and the classification accuracy is calculated only on the testing images. In the testing phase, the testing dataset is given to the proposed technique to find the cancer type in each smear image, and the results are evaluated with sensitivity, specificity and accuracy [29], given by equations (6) to (8):

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (6)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (7)$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (8)$$

where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives. For a specific category, say mild dysplasia, these are defined as follows: TP (true positive) is a "Mild dysplasia" image correctly classified as "Mild dysplasia"; TN (true negative) is a "Non-Mild dysplasia" image correctly classified as "Non-Mild dysplasia"; FP (false positive) is a "Non-Mild dysplasia" image wrongly classified as "Mild dysplasia"; and FN (false negative) is a "Mild dysplasia" image wrongly classified as "Non-Mild dysplasia". "Non-Mild dysplasia" corresponds to any of the three categories other than mild dysplasia. Thus TP and TN correspond to correctly classified images, and FP and FN to misclassified images. In the confusion matrix of Table 3, the rows correspond to the four actual categories and the columns to the class assigned by the classifier. Hence the number of images correctly classified (TP) under each category is given by the diagonal elements of the matrix.
The sum of the off-diagonal elements in a category's row gives the FN of that category, and the sum of the off-diagonal elements in its column gives the FP of that category. Similarly, the TN of a category is obtained by summing all elements of the matrix outside that category's row and column. For example, among the 57 mild dysplasia testing images, 45 were correctly classified (TP) and the remaining 12 (the off-diagonal sum of the first row) were misclassified into one of the non-mild dysplasia categories (FN). Similarly, 12 images from the other three categories (the off-diagonal sum of the first column) were misclassified as mild dysplasia (FP). As shown in Table 4, the classification accuracy of EEETCM with HKSVM is 92.31% for class 1 (mild dysplasia), 92.31% for class 2 (moderate dysplasia), 92.63% for class 3 (severe dysplasia) and 92.63% for class 4 (carcinoma in situ), so the misclassification rate of classes 1 and 2 is higher than that of the other two classes. The overall sensitivity, specificity and accuracy of the classifier were determined using equations (9) to (11). Following the same approach, the confusion matrix of the ERSTCM+HKSVM method is given in Table 5 and is read in the same way as Table 3: the diagonal elements give the TP of each category, the off-diagonal row and column sums give its FN and FP, and the elements outside its row and column give its TN.
For example, among the 57 mild dysplasia testing images, 47 were correctly classified (TP) and the remaining 10 (the off-diagonal sum of the first row) were misclassified into one of the non-mild dysplasia categories (FN). Similarly, 8 images from the other three categories (the off-diagonal sum of the first column) were misclassified as mild dysplasia (FP). In the   Figure 7.
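The counting rules above can be written directly against a confusion matrix whose rows are actual classes and whose columns are predicted classes. This is a minimal sketch; the matrix in the test is illustrative, not the paper's Table 3, and `pooled_metrics` sums the counts over all four classes before applying equations (6) to (8), which is an assumption since equations (9) to (11) are not reproduced here. The function names are hypothetical.

```python
import numpy as np

def per_class_counts(cm):
    """TP, FN, FP, TN per class; rows = actual class, columns = predicted class."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp        # off-diagonal row sum
    fp = cm.sum(axis=0) - tp        # off-diagonal column sum
    tn = cm.sum() - tp - fn - fp    # everything outside the class's row and column
    return tp, fn, fp, tn

def metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and accuracy, as in equations (6)-(8)."""
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / (tp + tn + fp + fn)

def per_class_metrics(cm):
    return metrics(*per_class_counts(cm))

def pooled_metrics(cm):
    """Overall metrics from counts summed over all classes (an assumed pooling choice)."""
    return metrics(*(x.sum() for x in per_class_counts(cm)))
```

As a worked example from the counts quoted above, the mild dysplasia row of Table 3 (TP = 45, FN = 12) gives a sensitivity of 45/57, about 78.95%.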

Individual and Combining individual features & Classification combination comparison:-
Following the same approach as for Table 3, the confusion matrix of the CFE+HKSVM method is given in Table 7 and is read in the same way: the diagonal elements give the TP of each category, the off-diagonal row and column sums give its FN and FP, and the elements outside its row and column give its TN. The overall performance was again computed with equations (9) to (11). For comparative analysis, the proposed cervical cancer classification system (CFE + HKSVM) is compared with the other two methods (EEETCM + HKSVM and ERSTCM + HKSVM). The overall sensitivity, specificity and accuracy of the proposed method are 94.23%, 98.07% and 97.11%, respectively, which are better than the results of EEETCM + HKSVM and ERSTCM + HKSVM. The overall sensitivity, specificity and accuracy of all three methods are shown in Figure 8.

Conclusion:-
In this paper, a novel approach combining individual and combined feature extraction methods with a classification technique was developed to identify the severity stage of abnormal cells in Pap smear images. The two major contributions of this paper are in feature extraction and classification. For feature extraction, we combined the individual EEETCM and ERSTCM texture features into a single feature vector to assess their joint performance. For classification, multiple kernels were combined into a hybrid kernel based SVM classifier to improve the classification process. In the comparative analysis, among the individual feature extraction methods, ERSTCM+HKSVM produced better results than EEETCM+HKSVM, and the combined method CFE+HKSVM produced better results than both EEETCM+HKSVM and ERSTCM+HKSVM in terms of sensitivity, specificity and accuracy. These results show that the proposed individual and combined methods are effective at detecting the severity class of cervical cancer in Pap smear images.