22 Apr 2017

CLASSIFICATION OF WEB DOCUMENTS USING HYBRID FEATURE SELECTION.

  • Research Scholar, Periyar E.V.R. College (Autonomous), Trichy.
  • Assistant Professor, Periyar E.V.R. College (Autonomous), Trichy.

Knowledge discovery and data mining is the process of extracting meaningful knowledge from raw data using a variety of techniques. Text mining is the sub-domain concerned with discovering knowledge from text data. Web mining is a class of data mining that taps the abundant, freely available textual information on the web, and its importance grows with the massive volumes of data generated on the web every day. Feature selection is an effective technique for dimensionality reduction and an essential step in successful data mining applications. It is a research area of great practical significance that has evolved to meet the challenges posed by data of increasingly high dimensionality. In this paper, a hybrid feature selection method is proposed: the Relative Reduct and Particle Swarm Optimization techniques are hybridized to reduce the size of the feature space.
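To make the idea of combining a rough-set criterion with Particle Swarm Optimization more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the rough-set dependency degree as the quality measure (a stand-in for the Relative Reduct step, which the abstract does not specify), a binary PSO with a sigmoid update rule, and illustrative parameter values (the 0.9/0.1 fitness weights, inertia 0.7, acceleration 1.5). The function names `dependency` and `pso_select` and the toy data are hypothetical.

```python
import math
import random
from collections import defaultdict


def dependency(rows, labels, subset):
    """Rough-set dependency degree: fraction of rows whose equivalence class
    (rows with identical values on `subset`) carries a single decision label."""
    seen_labels = defaultdict(set)
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        seen_labels[key].add(label)
        counts[key] += 1
    consistent = sum(n for key, n in counts.items() if len(seen_labels[key]) == 1)
    return consistent / len(rows)


def pso_select(rows, labels, n_particles=12, n_iter=40, seed=0):
    """Binary PSO over feature masks (all parameter values are assumptions)."""
    rng = random.Random(seed)
    n = len(rows[0])

    def fitness(bits):
        subset = [i for i, b in enumerate(bits) if b]
        # Reward high dependency and, with a smaller weight, fewer features.
        return 0.9 * dependency(rows, labels, subset) + 0.1 * (n - len(subset)) / n

    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(n):
                v = (0.7 * vel[k][d]
                     + 1.5 * rng.random() * (pbest[k][d] - pos[k][d])
                     + 1.5 * rng.random() * (gbest[d] - pos[k][d]))
                vel[k][d] = max(-4.0, min(4.0, v))  # clamp the velocity
                # The sigmoid of the velocity is the probability of keeping feature d.
                prob = 1.0 / (1.0 + math.exp(-vel[k][d]))
                pos[k][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[k])
            if f > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[k][:], f

    return [i for i, b in enumerate(gbest) if b], gbest_fit


# Toy data (hypothetical): the label copies feature 0; features 1 and 2 are noise.
rows = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 0)]
labels = [r[0] for r in rows]
selected, fit = pso_select(rows, labels)
print("selected feature indices:", selected, "fitness:", round(fit, 3))
```

On real web documents the rows would be discretized term features, and the selected subset would then be passed to a classifier. The weighting between dependency and subset size is a design choice that trades classification consistency against the size of the feature space.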



[V. David Martin and T. N. Ravi. (2017); CLASSIFICATION OF WEB DOCUMENTS USING HYBRID FEATURE SELECTION. Int. J. of Adv. Res. 5 (Apr). 174-181] (ISSN 2320-5407). www.journalijar.com


V David Martin
Research Scholar, Department of Computer Science, Periyar E.V.R. College (Autonomous)

Article DOI: 10.21474/IJAR01/3793      
DOI URL: http://dx.doi.org/10.21474/IJAR01/3793