28 Jun 2017

PERFORMANCE EVALUATION OF WEKA CLUSTERING ALGORITHMS ON LARGE DATASETS

  • Department Of Computer Science, Himachal Pradesh University, Shimla, India.

Data mining is the process of analyzing data from different viewpoints and summarizing it into useful information. With a data mining tool, the user can analyze data along different dimensions, categorize it, and summarize the relationships identified. Clustering, one of the most widely used techniques in data mining, groups data by finding similarities between instances based on their features: similar items are placed in the same cluster and dissimilar items in different clusters. This paper presents a comparative study of nine clustering algorithms on three datasets. The main objective is to observe how dataset size affects the data mining tool and the clustering algorithms. The datasets chosen differ in their numbers of attributes and instances. All nine algorithms are compared on the size of the dataset, the number of clusters, and the time taken to form clusters. The comparison is performed with the data mining tool Weka, and Weka's performance in handling large datasets is also analyzed.
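
The kind of measurement described above (time taken to form clusters as dataset size grows) can be illustrated with a minimal sketch. It is not the authors' setup and does not use Weka: it is a plain-Python k-means (Lloyd's algorithm) run on synthetic data, and the names `kmeans`, `synthetic_dataset`, and `time_clustering`, together with the dataset sizes and the choice of k-means as the algorithm, are assumptions made only for illustration.

```python
import random
import time


def kmeans(points, k, iterations=10, seed=0):
    """Plain Lloyd's k-means on equal-length tuples (illustrative, not Weka)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct starting points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


def synthetic_dataset(n_instances, n_attributes, seed=0):
    """Hypothetical stand-in for a real dataset: uniform random attributes."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(n_attributes)) for _ in range(n_instances)]


def time_clustering(n_instances, n_attributes, k):
    """Return (seconds taken to cluster, number of non-empty clusters)."""
    data = synthetic_dataset(n_instances, n_attributes)
    start = time.perf_counter()
    _, labels = kmeans(data, k)
    elapsed = time.perf_counter() - start
    return elapsed, len(set(labels))


if __name__ == "__main__":
    # Assumed sizes; the paper's actual datasets are not reproduced here.
    for n in (500, 2000, 8000):
        seconds, clusters = time_clustering(n, n_attributes=4, k=3)
        print(f"{n:>6} instances: {clusters} clusters in {seconds:.3f} s")
```

Running the script prints one line per dataset size, which makes it easy to see how clustering time grows with the number of instances. Timings depend on the machine and say nothing about Weka's own performance.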



[Anju Parmar, Divya Chauhan and K.L. Bansal. (2017); PERFORMANCE EVALUATION OF WEKA CLUSTERING ALGORITHMS ON LARGE DATASETS Int. J. of Adv. Res. 5 (Jun). 2209-2216] (ISSN 2320-5407). www.journalijar.com


Anju Parmar
DEPARTMENT OF COMPUTER SCIENCE, HIMACHAL PRADESH UNIVERSITY, SHIMLA, INDIA

Article DOI: 10.21474/IJAR01/4661      
DOI URL: https://dx.doi.org/10.21474/IJAR01/4661