DEVANAGARI HANDWRITTEN WORD RECOGNITION USING EFFICIENT AND FAST FEED FORWARD NEURAL NETWORK CLASSIFIER

1. Research Scholar, Dept. of ECE, Karpagam Academy of Higher Education, Karpagam University, Coimbatore, Tamilnadu, India.
2. Professor & HoD, Dept. of ECE, Karpagam Academy of Higher Education, Karpagam University, Coimbatore, Tamilnadu, India.
3. Professor & HoD, Dept. of ETC, AISSMSCOE, Pune, Maharashtra, India.

Abstract

Handwritten character recognition is gaining popularity because of its potential applications, which would reduce the task of data entry and save time. Designing a Devnagari handwritten word recognition system is a challenge for researchers because of the variable size of characters, the variety of writing styles and acquisition devices, and many other factors. The large character set of 34 consonants and 18 vowels with attached modifiers makes Devnagari character recognition very challenging. This paper proposes an effective method for recognition of isolated Marathi handwritten words in Devnagari script. The handwritten word recognition method is composed of three main phases: segmentation, feature extraction and classification. In the first phase, the input image is preprocessed using a Gaussian filter for smoothing and noise removal. The preprocessed image is then segmented using thresholding, with additional morphological operations such as dilation, filling and erosion, to obtain the final segmented image. In the second phase, a faster and optimized hybrid feature vector of length 91 is presented, using a combination of geometrical features, regional features, distance transform and gradient features. In the third phase, an efficient and accurate classifier called Feed Forward Neural Network [FFNN] is presented for online Devnagari handwritten word recognition. This classifier is trained with the 91 features of the training samples. Here 200 commonly used handwritten words are collected from 50 users with different handwriting styles to create a database of 10,000 words. For experimentation, 7500 word samples are used to create 15 datasets, out of which 70% of the samples are used for training, 20% for testing and 10% for validation. The overall recognition accuracy obtained using the FFNN classifier is 94.57%.

ISSN: 2320-5407
Int. J. Adv. Res. 4 (10), 2034-2043

Introduction:-
Devnagari script has a scientific origin and is used by the Sanskrit, Hindi, Marathi and Nepali languages. Marathi is the widely spoken language in Maharashtra, and since its script is Devnagari, Devnagari is the most popular script there. Marathi is one of the 23 official languages of India and is used as a co-official language in the Maharashtra and Goa states of Western India. Thus research on Devnagari script, mainly for the Marathi language, attracts a lot of attention and interest. Any Marathi word can be divided into three zones, i.e. the Upper, Middle and Lower Zone. The shirorekha, i.e. the header line, can be used to separate the Upper Zone from the Middle Zone. The modifiers can be upper modifiers or lower modifiers depending on their position. The script consists of a basic set of symbols that includes 34 consonants ('vyanjan') and 18 vowels ('svar'); Devnagari also has a built-in set of symbols for numerals. A syllable ("akshar") is formed by a vowel alone or by any grouping of consonants with a vowel. Some characters have upper and lower modifiers. These modifiers make word recognition in Devnagari script very challenging [1].

Nowadays it is easier to input data with a stylus than with a keyboard, and filling in a form is easier with a stylus because one can go directly to the appropriate field and make the entry. In desktop systems, the stylus could be a very important complement to the keyboard for editing, marking, drawing, etc. As handwriting recognition technology becomes more established, applications such as longhand note taking in the classroom are going to be more of a reality. Various researchers have suggested different methods and algorithms to recognize characters and have also developed related software for optical Marathi character recognition.
For character recognition, various processes have to be performed to achieve good recognition accuracy. Due to the increasing demand for and use of hand-held devices, Marathi character and word recognition is becoming a more and more important and interesting area. It has been observed that structural, topological and statistical data about the characters do not always help the recognition process, because of different writing styles and the moods of persons at the time of writing.
This paper is particularly focused on the domain of handwritten word recognition for Devanagari script. Different techniques have been presented for offline handwritten recognition, but very few for online handwritten recognition. Online handwritten character recognition is dynamic and needs immediate and accurate recognition. Therefore online handwritten character recognition has become a dominant research domain in recent years. In real-time applications, such techniques are required to work accurately, quickly and efficiently to provide useful information to end users, who depend heavily on such automated handwriting recognition tools. Online handwritten word recognition is composed of three main phases: 1) Segmentation, 2) Feature Extraction and 3) Classification. For each step different methods and algorithms are used. The first phase, i.e. segmentation, consists of three stages: image acquisition, image processing and image segmentation. This paper shows the use of an efficient Feed Forward Neural Network classifier for recognition of isolated Marathi handwritten words. The performance metrics used for comparative analysis are false positive rate, false negative rate, true positive rate, true negative rate, recall, precision and accuracy. The rest of the paper is organized as follows. Section 2 surveys related work on Devnagari handwritten character recognition. Section 3 describes the experimental setup in terms of data collection and data set creation, preprocessing, hybrid feature set generation and use of the FFNN classifier for Devnagari word recognition. Experimental results are discussed in Section 4, and lastly the conclusion is presented in Section 5.

Related work:-
In this section, different methods reported previously in literature by different authors are summarized based on main categories such as segmentation, feature extraction and classification method used.
Anoop Namboodiri presented a method to classify words and lines in an online handwritten document into six major scripts: Arabic, Cyrillic, Devanagari, Han, Hebrew or Roman. Spatial and temporal features are extracted from the strokes of the words, and the proposed system achieved an overall classification accuracy of 87.1% on a data set of 13,379 words [2]. M. Hanmandlu et al. presented a fuzzy logic based method for recognition of handwritten Hindi numerals and characters, with overall accuracies of 92.67% and 90.65% for handwritten Devanagari numerals and characters respectively [3]. Satish Kumar et al. presented a Zernike moment feature based method for recognition of handwritten Devnagari characters using an artificial neural network for classification [4]. N. Sharma et al. presented handwritten Devanagari character recognition using five preprocessing stages, i.e. size normalization and centering, interpolating missing points, smoothing, slant correction and resampling of points. Directional chain code feature extraction with a quadratic classifier yields 80.36% overall recognition accuracy [5]. Prachi Mukherji et al. proposed a method using basic structural features like endpoints, cross points and junction points. The segments of characters are coded using the Average Compressed Direction Code algorithm. An accuracy of 71.68% with top modifier and 88.33% without top modifier is achieved [6]. Sandhya Arora et al. presented a scheme for online handwritten Devnagari character recognition which uses different feature extraction and recognition algorithms. After preprocessing, chain code histogram, four side views and shadow based features are extracted and fed to an MLP. The proposed system was tested on 1500 samples, yielding 98.16% and 89.58% recognition rates for top 5 and top 1 results respectively [7]-[8].
Sushama Shelke et al. elaborated a novel approach for recognition of handwritten Marathi compound characters using a multi-stage multi-feature classifier. Various features like pixel density features, Euclidean distance features and modified wavelet approximation features are extracted and then applied to three different neural networks, which yielded a recognition accuracy of 97.95% [9]. Mitrakshi B. Patil et al. developed a method for recognition of offline handwritten Devanagari characters using artificial neural networks. The input characters are represented as an N-vector and given as input to the neural network. The sizes selected for the first and second hidden layers are 8 and 16. The output values are represented using binary vectors of size 4, with learning rates ranging from 0.1 to 0.4 [10].
Ved Agnihotri et al. proposed a new method of classification by extracting diagonal features from zones of an image using a neural network for a handwritten Devanagari script recognition system. Here the features of each character image are converted into a chromosome bit string of length 378. A genetic algorithm is used for recognition and classification, which yields 85.78% match and 13.35% mismatch [11]. Anilkumar N. Holambe presented a hybrid feature vector combining statistical, structural, global transformation and moment features. The combination of the SVM and KNN algorithms gives the highest accuracy of 96% [12]. Vijaya Pawar proposed an artificial neural network based classifier with feature extraction based on statistical and structural methods. Features are extracted in terms of various structural and statistical features like end points, middle bar, loop, end bar, aspect ratio, etc. The feature vector is applied to a self-organizing map (SOM), which attains 95% accuracy [13].

Experimental setup:-

Data Collection & Dataset Formation:-
Data collection is an important phase of online Devnagari handwritten word recognition. Since no standard database is available, a database of handwritten Marathi words covering various handwriting styles is created using an Android based i-ball Digital Tablet-4030. The Gesture class is used to capture the gestures which the user draws on the screen, and the gestures are then transferred to the Matlab tool for further processing. Here 200 commonly used handwritten words are collected from 50 users with different handwriting styles to create a database of 10,000 words. Out of the 10,000 handwritten word samples, 7500 samples are used for creation of 15 datasets. Each dataset consists of 10 words with 50 samples each. The 15 datasets are trained using a neural network with 91 inputs and one hidden layer of 10 neurons. For experimentation, of the 7500 input samples, 70% are used for training, 20% for testing and 10% for validation.
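The dataset arithmetic and the 70/20/10 split described above can be sketched as follows. This is a minimal illustration, not the authors' Matlab procedure: the function name `split_indices`, the fixed random seed and the choice to split all 7500 samples at once (rather than per dataset) are assumptions made for the example.

```python
import random

# Figures taken from the text: 15 datasets, 10 words each, 50 samples per word.
N_DATASETS = 15
WORDS_PER_DATASET = 10
SAMPLES_PER_WORD = 50


def split_indices(n_samples, train_pct=70, test_pct=20, seed=0):
    """Shuffle sample indices and cut them into train/test/validation lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed is an assumption, for repeatability
    n_train = n_samples * train_pct // 100
    n_test = n_samples * test_pct // 100
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # whatever remains (10% here)
    return train, test, validation


samples_per_dataset = WORDS_PER_DATASET * SAMPLES_PER_WORD
total_samples = N_DATASETS * samples_per_dataset
train, test, validation = split_indices(total_samples)

print("samples per dataset:", samples_per_dataset)
print("total samples:", total_samples)
print("train/test/validation:", len(train), len(test), len(validation))
```

Integer percentages are used so that the three parts always add up to the full sample count without floating-point rounding.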

Preprocessing & Segmentation:-
Pre-processing is the first vital step of any image processing task. The use of effective methods in pre-processing and segmentation determines the efficiency and accuracy of handwritten character recognition. While input data is captured with the digital tablet, noise and distortions, such as irregular size or missing points, may be present in the input text due to device limitations. Pre-processing is used to remove this noise and these distortions. During the pre-processing stage the input image goes through various stages: gray scale conversion, image resizing, smoothing, binarization, edge detection and image denoising. The RGB image is converted into a gray scale image using the rgb2gray function. The imresize function is used to resize the input image to 512 x 512 pixels. A Gaussian filter with sigma equal to 1 and filter size [2 2] is then used for smoothing and noise removal, and thresholding is applied for binarization. Both approaches give good output with low processing time. Performance results such as PSNR, MSE and mutual information indicate improved quality of the preprocessed image compared to existing solutions.
Segmentation of a handwritten word is difficult because various modifiers may be attached above or below the characters. The image is segmented using the morphological operations dilation, erosion and perimeter detection. Characters are separated from the segmented image, i.e. the word, using vertical segmentation followed by horizontal segmentation. For vertical character segmentation, the white pixels of the segmented image are counted column by column. If a column has 10 or fewer white pixels, the value 0 is assigned to the entire column so that it is represented as black; otherwise the column is kept as it is in the output image. For horizontal character segmentation, the white pixels of the segmented image are counted row by row. Once a row contains white pixels, the rows having white pixels are counted using a count variable. For upper body segmentation, when the count equals 10, two black rows are inserted after the 10th row to mark off the upper part of the Devnagari word. Lower body segmentation can be done by applying the reverse process to the character segmented image.
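The gray scale conversion, Gaussian smoothing and thresholding steps can be sketched in plain Python as below. This is only an illustration under stated assumptions: the paper works in Matlab, the helper names are hypothetical, the luminance weights are those documented for rgb2gray, and the threshold value of 128, the bright-ink-on-dark-background polarity and the top-left anchoring of the even-sized kernel are choices made for the example (resizing is omitted).

```python
import math


def rgb_to_gray(rgb):
    """Weighted sum of R, G, B per pixel (same weights as Matlab's rgb2gray)."""
    return [[0.2989 * r + 0.5870 * g + 0.1140 * b for (r, g, b) in row] for row in rgb]


def gaussian_kernel(size, sigma):
    """Normalised size x size Gaussian kernel centred on the kernel midpoint."""
    half = (size - 1) / 2.0
    k = [[math.exp(-((y - half) ** 2 + (x - half) ** 2) / (2 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def smooth(img, kernel):
    """Filter the image; the kernel is anchored top-left and edges are clamped."""
    h, w, size = len(img), len(img[0]), len(kernel)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(size):
                for kx in range(size):
                    yy = min(y + ky, h - 1)
                    xx = min(x + kx, w - 1)
                    acc += kernel[ky][kx] * img[yy][xx]
            out[y][x] = acc
    return out


def binarize(img, threshold):
    """Global thresholding: 1 (white) where the pixel is at least the threshold."""
    return [[1 if v >= threshold else 0 for v in row] for row in img]


# Tiny synthetic image: a two-pixel-wide white stroke on a black background.
rgb = [[(255, 255, 255) if x in (1, 2) else (0, 0, 0) for x in range(4)] for _ in range(4)]
kernel = gaussian_kernel(2, 1.0)  # filter size [2 2], sigma 1, as in the text
binary = binarize(smooth(rgb_to_gray(rgb), kernel), 128)  # threshold is an assumption
for row in binary:
    print(row)
```

With a 2 x 2 kernel and sigma 1 all four weights are equal, so the filter behaves like a small averaging window.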
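The column-wise rule for vertical character segmentation can be sketched as follows. The clearing rule (10 or fewer white pixels) comes from the text; the helper names and the extra step of grouping the remaining columns into runs are assumptions added for illustration.

```python
def clear_sparse_columns(binary, max_white=10):
    """Set a column to black (0) when it has max_white or fewer white pixels."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for x in range(w):
        white = sum(binary[y][x] for y in range(h))
        if white <= max_white:
            for y in range(h):
                out[y][x] = 0
    return out


def column_runs(binary):
    """Return (first, last) column index of each run of columns containing white pixels."""
    w = len(binary[0])
    runs, start = [], None
    for x in range(w):
        has_ink = any(row[x] for row in binary)
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            runs.append((start, x - 1))
            start = None
    if start is not None:
        runs.append((start, w - 1))
    return runs


# Synthetic 12-row word image: two solid strokes joined by a thin header line,
# followed by a blank column and a column with exactly 10 white pixels.
image = [[1, 1, 1 if y < 3 else 0, 1, 1, 0, 1 if y < 10 else 0] for y in range(12)]
cleared = clear_sparse_columns(image)
print(column_runs(cleared))
```

In this toy image the thin joining column and the 10-pixel column are cleared, leaving two separate character candidates.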
Feature Extraction:-
Feature extraction is a very important step, as the success of a recognition system depends on the feature extraction method. The feature extractor determines which properties of the preprocessed data are most significant and useful in further phases. The accuracy of the recognition system depends largely on the feature extraction phase, the types of features and the size of the feature vector. In the proposed research work, an efficient, faster and optimized hybrid feature vector is used, which is a combination of geometrical features, regional features, distance transform and gradient features. In total 91 features are extracted, which is the highest compared to existing methods for handwritten character recognition. The feature extraction process includes Statistical/Geometric Features, Regional/Structural Features, Gradient Features Extraction and Distance Transform. In addition, in existing cases the time required for extracting the geometrical features is very high; here the universe of discourse is used to speed up the retrieval.

Feed Forward Neural Network Classifier:-
An artificial neural network is an information processing model inspired by the way biological nervous systems, such as the brain, process information. Neural networks are composed of simple elements called neurons which operate in parallel. Similar to neurons in the human brain, they transport incoming information on their outgoing connections to other neurons. A neural network can be trained to perform a particular function by adjusting the values of the connections, or weights, between elements. Commonly neural networks are adjusted, or trained, so that a particular input leads to a specific target output. In brief, there is a variety of designs and learning techniques that enrich the choices a user can make.
The FFNN is a biologically motivated approach to classification, composed of a large number of simple processing units organized in layers. Each unit in a layer is connected to all units in the previous layer. Every connection may have a different weight or strength, so the connections are generally not identical. Network knowledge is encoded in the weights on these connections. Neural network units are commonly known as nodes. In an FFNN, data is fed in at the inputs and then passed through the network layer by layer until it arrives at the outputs. When the FFNN acts as a classifier, there is no feedback mechanism among layers. Therefore such a classifier is known as a feed forward neural network classifier.
An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves variations to the synaptic connections that exist between the neurons. There are two modes of learning, i.e. supervised and unsupervised. In supervised learning, which is a commonly used training method, the system tries to predict results for known examples. It compares its predictions with the target answer and "learns" from its mistakes. The data is given to the input layer neurons, which pass it to the next node, where the weight of the connection is applied. When the inputs reach the next node, the weighted inputs are summed and the result is either intensified or weakened. This continues until the data reaches the output layer. If the predicted output is higher or lower than the actual result in the data, the error is propagated back through the system and the weights are adjusted accordingly. The system learns new knowledge by adjusting these connection weights. Unsupervised learning is most effective for describing data rather than predicting it. Unsupervised networks can be used to identify groups of data, and they do not require initial assumptions about what constitutes a group or how many groups there are.
The neural network can be assumed to work in two phases, i.e. a learning phase and a classification phase. The FFNN uses a supervised learning algorithm. In the learning process a pattern is presented at the inputs and is transformed in its passage through the layers of the network until it reaches the output layer. Learning involves three objects together: an FFNN, which is the classifier; a pattern, i.e. the inputs; and categories, i.e. the correct outputs. During the learning phase the weights in the FFNN are modified in such a way that the output unit of the correct category has the largest output value. In the classification phase, the weights of the network are fixed. A pattern presented at the inputs is transformed from layer to layer until it reaches the output layer, and the category associated with the output unit with the largest output value is selected. After training is complete, a test pattern is given to the neural network and the results are compared with the desired result. Table 1 shows the neural network parameter settings used for the proposed method. These parameters can be changed to obtain the optimum combination which gives the best results. Results showed that the network with a single hidden layer of 10 neurons, hyperbolic tangent sigmoid transfer function and Levenberg-Marquardt training algorithm proves to be the best. The Levenberg-Marquardt training algorithm requires the least number of epochs for training the network. Figure 1 shows the architecture of the feed forward neural network used in the proposed approach.
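The layer-by-layer data flow of an FFNN classifier can be sketched as below for a network with 91 inputs and one hidden layer of 10 hyperbolic tangent units, the sizes reported in this paper. Everything else is an assumption made for illustration: the 10 output units (one per word in a dataset), the linear output layer, the random untrained weights and the helper names.

```python
import math
import random

N_INPUTS = 91    # length of the hybrid feature vector (from the text)
N_HIDDEN = 10    # neurons in the single hidden layer (from the text)
N_OUTPUTS = 10   # assumption: one output unit per word class in a dataset


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for a fully connected layer (untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Each unit sums its weighted inputs from all units of the previous layer."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def forward(x, hidden_layer, output_layer):
    """Pass the input forward through the hidden layer to the outputs; no feedback."""
    hidden = [math.tanh(z) for z in dense(x, *hidden_layer)]
    outputs = dense(hidden, *output_layer)  # linear output layer: an assumption
    return hidden, outputs


rng = random.Random(0)
hidden_layer = make_layer(N_INPUTS, N_HIDDEN, rng)
output_layer = make_layer(N_HIDDEN, N_OUTPUTS, rng)

features = [rng.random() for _ in range(N_INPUTS)]  # stand-in for a real feature vector
hidden, outputs = forward(features, hidden_layer, output_layer)

# The predicted class is the output unit with the largest value.
predicted = max(range(N_OUTPUTS), key=lambda i: outputs[i])
print("predicted class index:", predicted)
```

Because the weights here are random, the predicted index is meaningless; the sketch only shows how a feature vector moves through the layers and how the winning output unit is chosen.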
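The idea of supervised learning, in which the prediction is compared with the target and the weights are adjusted from the error, can be illustrated with a single linear neuron trained by plain gradient descent. This is a deliberately simplified stand-in and not the Levenberg-Marquardt algorithm used in this work; the toy data, learning rate and function name are assumptions.

```python
def train_neuron(samples, lr=0.1, epochs=200):
    """Train one linear neuron with the delta rule; return weights, bias, error history."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    history = []
    for _ in range(epochs):
        squared_error = 0.0
        for x, target in samples:
            y = sum(wi * xi for wi, xi in zip(w, x)) + b   # prediction
            err = y - target                               # compare with the target
            squared_error += err * err
            # Adjust each weight against the error it contributed to.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
        history.append(squared_error / len(samples))
    return w, b, history


# Toy examples whose targets follow target = x2 - x1 (hypothetical data).
samples = [([0.0, 1.0], 1.0), ([1.0, 0.0], -1.0), ([1.0, 1.0], 0.0)]
weights, bias, history = train_neuron(samples)
print("mean squared error, first epoch:", history[0])
print("mean squared error, last epoch:", history[-1])
```

The mean squared error falls over the epochs as the weights move toward values that reproduce the known targets, which is the behaviour the paragraph above describes.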

Results & Discussion:-
For experimentation, a database is formed because no standard database is available in Devanagari script. From the collected database, 20 datasets are formed to extract the feature set. CHF is the optimized hybrid feature set. In order to investigate the effectiveness of the proposed method, experiments were carried out on the Marathi handwritten data set obtained as described in Section 3. The results are found to be best for the 91-element feature vector compared to the other dimensions under study, namely 16, 32, 75 and 84. With the 91-element hybrid feature vector, an overall recognition rate of 94.57% is achieved using FFNN. In result computation, out of 7500 word images, 20% are used for testing and 10% for validation. The results are encouraging. Percentage accuracy is found as follows:

Precision = (TP / (TP + FP)) * 100
Recall = (TP / (TP + FN)) * 100
Accuracy = ((TP + TN) / (TP + FN + FP + TN)) * 100

where true positive (TP) is the number of correctly identified words, false positive (FP) is the number of incorrectly identified words, true negative (TN) is the number of correctly rejected words and false negative (FN) is the number of incorrectly rejected words. Any word sample that is to be recognized is preprocessed, and the features extracted from it are sent to the classifier. Three classifiers, namely FFNN, k-nearest neighbor (KNN) and SVM, are used to study the recognition accuracy. Of these, the FFNN results are summarized. The proposed method performs well and appears promising compared to other methods in the literature. Table 2 shows the output results and recognition accuracy achieved using the FFNN classifier, Table 3 shows a summary of the number of epochs required and classification details, and Figure 2 shows a plot of correctly classified and incorrectly classified datasets using FFNN.

Ratio i.e. the number of outputs less than the threshold, divided by the number of zero targets.
Here ROC plots the receiver operating characteristic for each output class. The more each curve hugs the left and top edges of the plot, the better the classification.
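The precision, recall and accuracy formulas given in this section can be sketched as small functions. The counts in the example are hypothetical and are not results from this paper.

```python
def precision(tp, fp):
    """Percentage of identified words that were identified correctly."""
    return 100 * tp / (tp + fp)


def recall(tp, fn):
    """Percentage of words that should have been identified and were."""
    return 100 * tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    """Percentage of all decisions (identifications and rejections) that were correct."""
    return 100 * (tp + tn) / (tp + fn + fp + tn)


# Hypothetical confusion counts, chosen only to exercise the formulas.
tp, fp, tn, fn = 90, 10, 85, 15
print("precision:", precision(tp, fp))
print("recall:", round(recall(tp, fn), 2))
print("accuracy:", accuracy(tp, tn, fp, fn))
```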

Conclusion:-
Fully automated on-line handwritten Devnagari word recognition is a difficult task to achieve in the near future. This research, as well as other work in the field of on-line handwritten word recognition, is a significant step towards a completely automatic on-line handwritten Devnagari recognition system. In this paper a method for recognition of isolated Marathi handwritten words is presented. Gradient, distance transform, regional and geometric features were computed and used as features of the images representing handwritten words. Classification was done using the FFNN classifier. For computation of recognition accuracy, FN, FP, TN, TP, precision and recall are used as important performance parameters. An overall recognition rate of 94.57% was achieved with the FFNN classifier. The main recognition errors were due to abnormal writing and ambiguity among similarly shaped words. Future work can include improving the recognition accuracy of individual words by combining multiple classifiers. The approach can be extended to the recognition of words, sentences and documents, and can be used in multilingual character recognition as well. Writer adaptation can be incorporated. The feature extraction time can be reduced so that the overall process becomes faster while maintaining maximum accuracy.