SUMMARIZATION OF CUSTOMER REVIEWS USING SENTENCE TAGGING AND ANALYSIS.

Technology has changed rapidly over the years, and the Internet is now widely available. E-commerce is gaining importance every day, and many e-commerce sites sell products, goods and services. As e-commerce grows in popularity, people post their opinions or reviews on different products. Because many products are available on these sites, a large volume of customer reviews is available for them. People read these reviews to decide whether or not to buy a product. Manufacturers also use these reviews to increase their sales and business. Customer reviews are considered more reliable than the descriptions provided by merchants or manufacturers. However, it is not practical for customers to read all the reviews of a product across different e-commerce sites. A summarization system is therefore required that takes these reviews as input and produces a summary to help customers make their purchase decisions. It will also help manufacturers see which products are popular among customers so that they can plan their demand and supply accordingly. In this way, customers and manufacturers get a summary of the product reviews, which saves their time and energy.

Introduction:-
Customer reviews are an asset for organizations that sell products, because they tell manufacturers about the strengths and weaknesses of their products. Based on this information, organizations improve the quality of their products and try to increase sales. Before buying a product online, customers go through its ratings. In short, customer reviews help increase conversions and online business. Organizations use them to make their products better so that they can compete with other organizations and gain a hold in the market.
Hundreds or even thousands of reviews are available for a single product on e-commerce sites. It is very tiresome for a customer to go through all of them before deciding whether to purchase the product. A system that provides a summarized text of all customer reviews is therefore helpful. The focus of this paper is to discuss the current state of the art and to propose an algorithm that provides a summarized view of all the reviews for a given product.

Organizations collect product reviews from their customers in several ways. For example, Amazon uses email to request product reviews. These emails serve as feedback on the product and describe the opinions of the customers. Social networking sites such as Facebook and Twitter are also used to collect customer reviews. Other channels include customer service and suggestion cards, through which customers are asked to leave opinions on various products.

Related Works:-
A lot of work has been done in the field of processing customer reviews. In this section, the studies that motivated the present work are reviewed. People now routinely express their opinions about products online, and this trend has given rise to many techniques for mining customer concerns from online product reviews.
Jade Goldstein, Vibhu Mittal, Jaime Carbonell and Mark Kantrowitz, in "Multi-Document Summarization by Sentence Extraction" (2000), discuss a text extraction approach to multi-document summarization that builds on single-document summarization methods by using additional available information about the document set and the relationships between the documents. Multi-document summarization differs from single-document summarization in that the issues of compression, speed, redundancy and passage selection are critical in the formation of useful summaries. Their approach addresses these issues by using domain-independent techniques based mainly on fast, statistical processing, a metric for reducing redundancy and maximizing diversity in the selected passages, and a modular framework that allows easy parameterization for different genres, corpora characteristics and user requirements. Since the system was not based on sophisticated natural language understanding or information extraction techniques, its summaries lack co-reference resolution, passages may be disjoint from one another, and in some cases the summaries may have false implicature.
Kathleen R. McKeown et al. (2001), in "Columbia multi-document summarization: Approach and evaluation", presented MultiGen and DEMS as parts of the Columbia multi-document summarization system, which is built on the observation that the appropriate summarization strategy depends on the intended purpose of the summary and on the type of documents being summarized. An enhanced version of MultiGen was used to summarize sets of documents that all describe the same event or news story. The alternative system DEMS (Dissimilarity Engine for Multi-Document Summarization) was used for biographical documents. During the processing stage, the input articles are transformed into a uniform XML format. The router component of the system then determines the type of each input document set and directs the input texts to the appropriate summarizer.
Dave, K., Lawrence, S., and Pennock, D. (2003) classified sentences obtained from web search results using a sentiment classifier. Performance was limited because a sentence contains much less information than a review. Their work does not mine the product features on which reviewers have expressed their opinions.
Feature-based opinion summarization, proposed by Minqing Hu and Bing Liu (2004), is performed in two steps: identify the features of the product on which customers have expressed opinions (called opinion features), and rank the features according to the frequency with which they appear in the reviews. The feature words were extracted based on word histograms and ranked by frequency. This system was very promising but required further improvement and refinement. Their work did not determine the strength of opinions, which is equally important, as some opinions are much stronger than others. Highlighting such strong opinions can be very useful for buyers.
Soo-Min Kim and Eduard Hovy (2006) developed an approach for automatically identifying the pros and cons expressed in the sentences of a review. Subjectivity detection is the task of identifying subjective words, expressions and sentences. Semantic orientation classification is the task of determining the positive or negative sentiment of words. Reason identification and reason classification were the two key areas they worked on. They achieved a 71% F-score in reason identification and a 61% F-score in reason classification, which leaves scope for improvement.
In "Fuzzy Logic Based Method for Improving Text Summarization", Ladda Suanmali, Naomie Salim and Mohammed Salem Binwahlan (2009) proposed a system that consists of the following main steps: 1) read the source document into the system; 2) in the pre-processing step, extract the individual sentences of the original documents. Reviews from different online stores were collected, and sentence tagging was done to obtain the tagged words for sentence analysis. Important features are selected and a score is calculated for each sentence. The set of top-scoring sentences is extracted as the summary of the document. The system uses a small dictionary and was not suitable for categorization.
In "A Frequent Term and Semantic Similarity based Single Document Text Summarization Algorithm", Naresh Kumar Nagwani and Dr. Shrish Verma (2011) proposed a single-document text summarization algorithm based on frequent terms. Semantic similarity was also used in the algorithm. The proposed algorithm was implemented using open source technologies and was verified on a standard text mining corpus. The results were interesting, and the meaning of the summarized document was preserved. However, this study was limited to a single document.
In "Gather customer concerns from online product reviews - A text summarization approach", Jiaming Zhan, Han Tong Loh and Ying Liu (2016) proposed a technique of summarization based on topical structure. Pre-processing includes stop word removal and word stemming to reduce noisy information. The main idea is to determine the important topics in the reviews and build a topical structure from them. Topics are identified from frequently recurring words and correlated classes. Previous approaches were based on ranking sentences, whereas here topics are ranked, and the summary is created from the ranked topics. This algorithm was able to surface the most important topics. Moreover, a case study summarizing multiple online customer reviews was presented. However, the scope of this work was limited to EIM (Engineering Information Management) applications.
Akkamahadevi R Hanni, Mayur M Patil and Priyadarshini M Patil (2016) used feature extraction and opinion extraction analysis to form an efficient summary. They presented the design of a unified opinion mining and sentiment analysis framework based on a natural language processing approach. In their paper, the reviews were extracted from Amazon by web crawling, which is not available on all online shopping sites. Further, the work is weak in pronoun resolution, which is an important part of natural language processing; by ignoring it, a crucial opinion could be missed from the summary.

Methodology:-
Many different techniques are used to condense the reviews that different customers have given for a product, so that a buyer can decide whether to purchase it. These techniques include feature-based summarization, summarization by fuzzy logic, summarization through lexical chains, frequent pattern mining and many more. By combining several of these techniques, a new method can be devised that summarizes the reviews efficiently and effectively.

Data Collection:-
Data for the proposed work is gathered from e-commerce sites (Amazon, Flipkart, Snapdeal etc.) that contain product reviews, and is saved in text documents.
Preprocessing:-
Pre-processing steps such as stop word removal and word stemming are performed. Stop words are words that appear frequently in the reviews but carry little meaning, for example articles and prepositions such as a, an, the, of and for. Word stemming is a process that removes the prefixes and suffixes of each word to reduce it to its stem. After preprocessing, the data is passed to the summarization system, which produces a summary of the available product reviews.
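The two preprocessing steps can be illustrated with a minimal Python sketch. The stop word list and the suffix-stripping rule below are assumptions made only for illustration; the paper does not specify its stop word list or stemmer, and a real system would more likely use an established stemmer such as Porter's.

```python
import re
from typing import List

# Illustrative stop word list (an assumption; the paper does not give its list).
STOP_WORDS = {"a", "an", "the", "of", "for", "is", "and", "to", "in", "it", "this"}

# Suffixes removed by this crude stemmer (an assumption, not the Porter algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(review: str) -> List[str]:
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("The battery of this phone is lasting for days"))
# ['battery', 'phone', 'last', 'day']
```

Note that this sketch only strips suffixes; prefix removal, which the text also mentions, is omitted for brevity.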
Algorithm:-
The algorithm for review extraction and summarization is explained below.
Pseudo Code:-
1) Review sentences are collected from all review documents with their respective review tags.
2) Sentence tagging is done on these sentences using the Stanford Tagger, which tags each word in the sentence as a noun, adjective, verb etc. Sentence tagging is required to obtain the nouns as features and the adjectives as ratings.
3) A list of features for the product domain under review is made manually, and the review feature words are tagged as nouns in the sentence. Similarly, the adjectives in the sentence are tagged as ratings of the feature.
4) Nouns and adjectives are extracted from each individual sentence.
5) These sentences are then mapped to the feature and rating word sets, and the review sentence set is formed.
6) Repeated sentences are filtered out of the review sentence set.
7) Finally, the review summary is formed, mapped exactly to the standard feature and rating word sets.
8) The sentiment rating percentage is then computed to obtain the positive or negative review.
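The mapping and filtering steps above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: instead of calling the Stanford Tagger, it looks words up directly in a hypothetical manual feature list (standing in for the nouns) and a hypothetical rating word list (standing in for the adjectives), and it assumes at most one rating per sentence.

```python
import re
from typing import List, Tuple

# Hypothetical manual feature list for a mobile phone domain (assumption).
FEATURES = {"battery", "camera", "screen", "price"}
# Hypothetical rating words; multi-word ratings come first so that
# "very good" is not read as "good".
RATING_WORDS = ["very good", "excellent", "average", "good", "poor"]


def extract_pairs(sentence: str) -> List[Tuple[str, str]]:
    """Return (feature, rating) pairs found in one review sentence."""
    text = sentence.lower()
    features = [w for w in re.findall(r"[a-z]+", text) if w in FEATURES]
    rating = next((r for r in RATING_WORDS if re.search(rf"\b{r}\b", text)), None)
    return [(f, rating) for f in features] if rating else []


def summarize(sentences: List[str]) -> List[Tuple[str, str]]:
    """Map sentences to (feature, rating) pairs, filtering repeated sentences."""
    seen = set()
    summary = []
    for sentence in sentences:
        key = " ".join(sentence.lower().split())  # normalize case and spacing
        if key in seen:
            continue  # repeated sentence: skip it
        seen.add(key)
        summary.extend(extract_pairs(sentence))
    return summary


reviews = [
    "The battery is excellent.",
    "The camera is very good",
    "The battery is excellent.",  # duplicate, filtered out
    "Delivery was late.",         # no known feature or rating word
]
print(summarize(reviews))
# [('battery', 'excellent'), ('camera', 'very good')]
```

A lookup of this kind cannot tell whether a word is actually used as a noun or an adjective in context, which is the information the part-of-speech tagger supplies in the described algorithm.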

Results and Discussion:-
The following methodology is adopted to evaluate the performance of the algorithm. Review documents with predefined positive and negative review inputs are generated for performance analysis. These review documents are processed by the developed algorithm, and review ratings are computed. Manual analysis is then done on the same review documents and a summary is generated. The computed positive/negative ratings are compared with the manually derived ratings for performance evaluation. Finally, real-time review analysis is conducted on data gathered from different sources. The algorithm has been tested on electronic products such as TVs, laptops and mobile phones. The results for the synthetic reviews for Mobile are given in Table-1. The modular flow of the algorithm is represented in Figure-1.
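The comparison between computed and manual ratings can be sketched as a simple agreement percentage. The paper does not state which metric it uses, so the measure below, like the function name `agreement` and the sample labels, is an assumption for illustration only.

```python
from typing import Sequence


def agreement(computed: Sequence[str], manual: Sequence[str]) -> float:
    """Percentage of review documents where both ratings agree."""
    if len(computed) != len(manual):
        raise ValueError("both rating lists must cover the same documents")
    if not computed:
        return 0.0
    matches = sum(c == m for c, m in zip(computed, manual))
    return 100.0 * matches / len(computed)


# Made-up labels for four review documents (not data from the paper).
computed = ["positive", "negative", "positive", "positive"]
manual = ["positive", "negative", "negative", "positive"]
print(agreement(computed, manual))  # 75.0
```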

Consideration of Negative Rating Sentences:-
Negative ratings in the reviews are taken care of at the end of the summarization. Negative rating words are given the lowest weight in the computation of the rating percentage. For example, the following weights are assigned to the rating words in this work:
Poor - 1
Average - 2
Good - 3
Very Good - 4
Excellent - 5
Thus, a low rating value pulls down the performance percentage of the product under review.
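The effect of these weights can be shown with a short Python sketch. The weights are taken from the text, but the exact percentage formula is not given in the paper; the normalization below (sum of weights divided by the maximum possible sum) is an assumption.

```python
from typing import Iterable

# Weights as listed in the text.
WEIGHTS = {"poor": 1, "average": 2, "good": 3, "very good": 4, "excellent": 5}


def rating_percentage(ratings: Iterable[str]) -> float:
    """Weighted rating as a percentage of the best possible score.

    Assumed formula: 100 * sum(weights) / (max_weight * count).
    Rating words not in WEIGHTS are ignored.
    """
    weights = [WEIGHTS[r.lower()] for r in ratings if r.lower() in WEIGHTS]
    if not weights:
        return 0.0
    return 100.0 * sum(weights) / (max(WEIGHTS.values()) * len(weights))


print(rating_percentage(["Excellent", "Good", "Poor"]))  # 60.0
```

Under this assumed formula, a single "Poor" rating among otherwise high ratings lowers the percentage noticeably, which matches the behaviour described in the text.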

Future Work:-
The proposed algorithm gives promising results for electronic products such as mobile phones, TVs, LEDs and laptops. The scope of the application can be extended to other products by adding the related features to the feature word file as required. We will also work towards making the algorithm more efficient by automating the process of fetching the reviews and the feature list directly from the source.

Conclusion:-
The presented algorithm has been tested on real-time text reviews as well as on synthetically generated reviews. The results for the synthetically generated reviews show a high level of accuracy in review summary generation. The real-time review summaries also appear faithful, although manual summaries may vary depending on the human reviewer. Compared with the synthetically generated reviews, the real-time review summaries appear to reach a good level of accuracy.