A STUDY ON DATA MINING AND STATISTICAL METHODS USED IN DIABETES MELLITUS DIAGNOSIS.

Diabetes is one of the most prevalent diseases in the world today, with high mortality and morbidity rates, and is therefore one of the world's biggest health problems. Diagnosis of disease plays a vital role in the medical field. Applying data mining to medical data yields important, valuable and effective results that can enhance medical knowledge and support decision-making. The paper is organized as follows. It first reviews diabetes and its types. Second, it explains the data mining techniques and statistical methods used to predict diabetes. The paper concludes with a summary of the investigated methods.


Diabetes mellitus is a clinically and genetically heterogeneous group of disorders that have one common feature: abnormally high levels of glucose in the blood, due either to insulin deficiency or to resistance of the body's cells to the action of insulin.

Types of diabetes:-
There are three main types of diabetes. Type-I diabetes used to be called juvenile-onset diabetes or insulin-dependent diabetes. It is usually caused by an auto-immune reaction in which the body's defence system attacks the cells that produce insulin. People with Type-I diabetes produce very little or no insulin. The disease may affect people of any age, but usually develops in children or young adults. People with this form of diabetes need injections of insulin every day in order to control the levels of glucose in their blood. If people with Type-I diabetes do not have access to insulin, they will die.
Type-II diabetes used to be called non-insulin-dependent diabetes or adult-onset diabetes, and accounts for at least 90% of all cases of diabetes. It is characterized by insulin resistance and relative insulin deficiency, either or both of which may be present at the time diabetes is diagnosed. Type-II diabetes can be diagnosed at any age. It may remain undetected for many years, and the diagnosis is often made when a complication appears or a routine blood or urine glucose test is done. It is often, but not always, associated with overweight or obesity, which itself can cause insulin resistance and lead to high blood glucose levels. People with Type-II diabetes can often initially manage their condition through exercise and diet. However, over time most people will require oral drugs and/or insulin.
Gestational diabetes (GDM) is a form of diabetes consisting of high blood glucose levels during pregnancy. It develops in one in 25 pregnancies worldwide and is associated with complications for both mother and baby. GDM usually disappears after pregnancy, but women with GDM and their children are at an increased risk of developing Type-II diabetes later in life. Approximately half of women with a history of GDM go on to develop Type-II diabetes within five to ten years after delivery. Table-1 shows the Normal Glucose Level chart.
The purpose of data mining is to extract useful information from large databases or data warehouses. Data mining applications are used in both commercial and scientific settings [13]. Data mining is the process of selecting, exploring and modeling large amounts of data in order to discover unknown patterns or relationships that provide a clear and useful result to the data analyst [14]. The knowledge discovery in databases (KDD) process may consist of several steps, such as data selection, data cleaning, data transformation, pattern searching (i.e. data mining), finding presentation, finding interpretation and finding evaluation [15]. Data mining techniques are applied to analyse medical data for decision-making to guide physicians. Figure

Data Mining Techniques used in predicting Diabetes Mellitus:-
The

Statistical Techniques and Methods used in predicting Diabetes Mellitus:-
The

Metrics used in Performance Evaluation:-
A confusion matrix was obtained to calculate sensitivity, specificity and accuracy. A confusion matrix is a matrix representation of the classification results; Table-4 shows the confusion matrix. From the confusion matrix, accuracy, precision and recall were computed for all datasets to analyse the performance of the classifiers in disease detection. Accuracy is the percentage of predictions that are correct. Precision is the percentage of instances predicted as a given class that actually belong to that class. Recall is the percentage of positive-labeled instances that were predicted as positive [5]. The fitness criteria are calculated as follows:
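Assuming the standard binary-classification definitions, with "diabetic" taken as the positive class, these measures can be sketched in Python as below. The class name ConfusionMatrix, the field names tp, fn, fp and tn, and the example counts are introduced here purely for illustration; they are not taken from Table-4 or from any study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Cell counts of a 2x2 confusion matrix (hypothetical names)."""
    tp: int  # true positives: diabetic, predicted diabetic
    fn: int  # false negatives: diabetic, predicted non-diabetic
    fp: int  # false positives: non-diabetic, predicted diabetic
    tn: int  # true negatives: non-diabetic, predicted non-diabetic

    def accuracy(self) -> float:
        # Share of all predictions that are correct.
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    def precision(self) -> float:
        # Share of positive predictions that are truly positive.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Share of actual positives predicted as positive (sensitivity).
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of actual negatives predicted as negative.
        return self.tn / (self.tn + self.fp)


# Made-up counts for illustration only, not results from any dataset.
cm = ConfusionMatrix(tp=40, fn=10, fp=5, tn=45)
print(cm.accuracy(), cm.precision(), cm.recall(), cm.specificity())
```

The sketch assumes every denominator is non-zero; a classifier that never predicts the positive class, for example, would need a guard in precision().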

Conclusion:-
The main goal of medical data mining is to find the algorithms that best describe the given data from multiple aspects. This study surveys various data mining and statistical methods used in the diagnosis of diabetes mellitus. Diet plays a major role in the prevention and treatment of diabetes, and various factors are responsible for Type-I and Type-II diabetes. People need to be made aware of self-management, and a methodology is needed that provides valuable information for improving healthcare through applications such as smartphone apps. With the help of newer statistical applications, further study is needed into the causes of the increase in diabetes, particularly among young people, because diabetes has long-term complications such as retinopathy, neuropathy and nephropathy. Proper methods must also be used, because poor methods affect the reliability of the prediction model and ultimately compromise the accuracy of the results. It is recommended that diabetes also be diagnosed with other methods, such as Neuro-Fuzzy Networks, and that these be compared with the algorithms used in this study to determine the better method for diagnosing diabetes.