DOCUMENT SUMMARIZATION USING SENTENCE BASED TOPIC MODELING AND CLUSTERING
- Research Scholar, Bharathiyar University.
- Professor, Bangalore University.
In recent years, automatic document summarization has become popular in practical applications, and numerous papers have been published on the topic. There are many approaches to identifying the significant portions of a document. Topic representation and modelling yields an intermediate representation of the text that captures the topics discussed in the input and aids automatic summarization. The significance of each sentence is decided based on the representation of topics in the input document. This article attempts to provide a comprehensive summarization approach that includes sentence extraction and tokenization of the extracted sentences. Sentence-based Structural Topic Modeling (STM) is used to determine the important content for each domain in the integrated document, and the sentences are grouped under each topic using k-means clustering. The sentences under each topic are then summarized using the term frequency of each sentence. Finally, the sentences are arranged in the summarized text based on their lexical ranking scores.
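The steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every function name in it is hypothetical. It assumes a naive regular-expression sentence splitter and tokenizer, leaves out the STM step entirely (clusters of raw term-frequency vectors stand in for topics), uses a small cosine-based k-means seeded with the first k sentences, and replaces the lexical ranking step with a simplified LexRank-style power iteration over plain cosine similarity.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence extraction: split after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased alphanumeric tokens; no stop-word removal or stemming."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans(vectors, k, iters=10):
    """Small cosine-based k-means; the first k vectors seed the centroids."""
    k = min(k, len(vectors))
    centroids = [Counter(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by cosine similarity.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Update step: summed counts point the same way as the mean vector.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = sum(members, Counter())
    return labels


def lexrank(vectors, damping=0.85, iters=50):
    """Simplified LexRank: PageRank-style scores on a cosine-similarity graph."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n if n else []
    for _ in range(iters):
        scores = [(1 - damping) / n + damping * sum(
            sim[j][i] / row_sums[j] * scores[j]
            for j in range(n) if row_sums[j] > 0) for i in range(n)]
    return scores


def summarize(text, k=2, per_cluster=1):
    """Cluster sentences, keep the top term-frequency ones, order by rank."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    vectors = [Counter(tokenize(s)) for s in sentences]
    k = max(1, min(k, len(sentences)))
    labels = kmeans(vectors, k)
    chosen = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        cluster_tf = sum((vectors[i] for i in idx), Counter())

        def tf_score(i):
            # Average cluster-level term frequency of the sentence's tokens.
            length = sum(vectors[i].values())
            total = sum(cluster_tf[t] * n for t, n in vectors[i].items())
            return total / length if length else 0.0

        chosen.extend(sorted(idx, key=tf_score, reverse=True)[:per_cluster])
    ranks = lexrank([vectors[i] for i in chosen])
    ordered = sorted(zip(chosen, ranks), key=lambda p: p[1], reverse=True)
    return [sentences[i] for i, _ in ordered]
```

Calling `summarize(text, k=3, per_cluster=2)` on a plain-text document would return up to six sentences, at most two from each cluster. A closer reproduction of the described method would fit a structural topic model first and cluster sentences within each topic, which this sketch does not attempt.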
[Augustine George and Dr. Hanumanthappa. (2018); DOCUMENT SUMMARIZATION USING SENTENCE BASED TOPIC MODELING AND CLUSTERING. Int. J. of Adv. Res. 6 (May). 285-291] (ISSN 2320-5407). www.journalijar.com
Research Scholar, Bharathiyar University; Vice Principal, Kristu Jayanti College