DETECTION AND RECOGNITION OF BANGLADESHI FISHES USING SURF AND CONVOLUTIONAL NEURAL NETWORK

M. I. Pavel, A. Akther, I. Chowdhury, S. A. Shuhin and J. Tajrin. Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh.

Manuscript History: Received: 15 April 2019; Final Accepted: 17 May 2019; Published: June 2019

This paper presents a model for detecting and recognizing local fishes of Bangladesh using image processing and neural network approaches. The aim of the research is to apply computer vision and AI techniques so that people of the next generation can recognize Bangladeshi fishes, since many young people in cities have little ability to distinguish traditional and deshi fishes. We built a custom dataset of 400 sample images to evaluate the method. In the proposed model, the sequential grass-fire algorithm is used along with pre-processing techniques such as noise cancellation, gray scaling, the flood-fill method and binarization to detect and analyze the shape of a fish. For classification and recognition of the detected fishes, a convolutional neural network (CNN) and the Speeded-Up Robust Features (SURF) method were applied in order to compare the two techniques. The CNN architecture achieved the better accuracy, with a 90.9% score for recognizing and classifying fishes, while the SURF algorithm gives a better visualization of recognition by matching putative points after extracting features.

Introduction:-
It is surprising to learn how many types of fishes exist in this world and how different they are from each other in lifestyle, food value and appearance. To get even a rough picture of the known fish species, one should first know that more than half of all vertebrate species are fishes. There are almost 28,000 known extant species, of which about 27,000 are bony fishes; of the remaining 1,000, about 970 are sharks, rays and chimaeras, and around 108 are hagfish and lampreys. One third of these species fall within the nine largest families, which include Cyprinidae, Gobiidae, Cichlidae, Loricariidae, Balitoridae, Serranidae, Labridae and Scorpaenidae. Around 64 of the families are monotypic, meaning they contain only one species. The number of known extant species is still growing and may exceed 32,500 [1]. Fish is a very important part of our ecosystem, as about 70 percent of the Earth's surface is covered by water. Fish detection and recognition is useful in applications such as fish farming, meteorological monitoring and fish quotas, and it can be of real help in understanding the marine ecosystem; understanding the ecosystem in turn lets us assess climate change, environmental pollution and the marine environment. Marine biologists would greatly benefit from an automatic fish detection and recognition system, since manual annotation tends to be very expensive and time-consuming. Marine observers could thus benefit from such a system without needing significant programming skills, which would greatly help their investigations, research and censuses. At the same time, fishes are an indispensable part of human resources, especially as food.
A proper nutritious diet should include fish, as its nutritional value is unbeatable, but taste and nutrition differ from species to species, and some species are not edible at all; they are downright poisonous. Aside from food value, fishes can be great pets, and their quality, price and the way of caring for them also differ from species to species. So, we must know the species of a fish, but it is hard for many people to differentiate between them; in this paper we therefore develop an algorithm through which fish can be detected and recognized faster and more accurately. In this thesis we focused on identifying the species from still images, with the possibility of recognizing fish species in real time, more specifically in underwater videos, as future work. The rest of the paper is organized as follows: Section II presents related work in the field of fish recognition using image processing and deep learning, Section III describes the key features and the detailed proposed model with its implementation, Section IV presents the experimental results and analysis, and finally in Sections V and VI the conclusion and future works are drawn.
Literature Review:-
Not much work has been done in this area yet. Some of the ways fish have been detected are by shape, color, texture and by feature extraction; another approach that has been explored is recognizing the species of a fish using a CNN. Al-Omari et al. used a CNN (a deep neural network model containing convolutional layers, subsampling layers and fully-connected layers) to recognize the features of fish; its weight-sharing network structure for the convolutional layers makes it more similar to a biological network, reduces the number of weights and the complexity of the network, and improves training efficiency [2]. Bai et al. used image classification based on shape and texture through a feature extraction process; the purpose of feature extraction is to determine the most relevant and most compact representation of the image characteristics, in order to minimize within-class pattern variability while enhancing between-class pattern variability [3]. Alsmandi et al. performed fish recognition based on a combination of robust features selected from size and shape measurements, using a neural network [4]. F. Storbeck et al. used fish images with a white or uniform background, from which the fish region is easy to extract for automatic processing; their research adopted an approach in which the user supplies several feature points by manual operation, and the proposed approach can accept fish images against complicated backgrounds, such as rocky places, using computer vision and a neural network to recognize the fish species [5]. Spampinato et al. proposed a vision system for detecting, tracking and counting fish in real-time videos, consisting of a series of video texture analysis, object detection and tracking procedures [5]. Hwang et al. reported an automatic segmentation algorithm for fish images acquired by a trawl-based underwater camera system [6]. To overcome the limitations of existing models, this paper presents efficient features for fish recognition: we apply image pre-processing, noise filtering and the Gaussian blurring method, use the sequential grass-fire algorithm to detect the fish, and apply the Speeded-Up Robust Features (SURF) algorithm to recognize it. Furthermore, we used a CNN classifier to recognize the fish and check whether its accuracy is better than that of the previous approach. It should be mentioned that the whole research was carried out on various species of fishes available in Bangladesh, which has not been done before.

Research Methodology:-
The proposed research methodology is divided into three major parts. First, fish detection using image processing techniques such as binary masking and outer contour marking with the sequential grass-fire algorithm. Second, a feature-based fish recognition system using the SURF algorithm. Third, a CNN architecture used for recognition and for comparing accuracy with the other mentioned processes.

Image Acquisition and Pre-Processing:
First, we acquire our selected data to begin the fish detection process; at this stage we take around 500 sample images, which will be used for fish detection in the later stages. Each source image is converted to grayscale to reduce the complexity of the color information; this reduces accumulated noise and signal distortion during processing. The grayscale formula R×0.118 + B×0.385 + G×0.51 is used to reduce the complexity of the color images [7][8][9]. Figure 1 shows the source and grayscale images. Before further processing, we must ensure that the images are free of noise, especially salt-and-pepper noise, as it would hamper the segmentation and feature extraction stages and degrade the quality of the image processing and analysis results [10]. Noise reduction is done using mean filtering.
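As a rough sketch of this pre-processing stage (written in Python with NumPy; the paper's own implementation is not shown, and the function names here are purely illustrative), the stated grayscale formula and a mean filter could look like:

```python
import numpy as np

def to_grayscale(rgb):
    """Grayscale conversion using the weights stated in the text,
    R*0.118 + B*0.385 + G*0.51 (note these differ from the common
    ITU-R 601 luma weights)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return r * 0.118 + b * 0.385 + g * 0.51

def mean_filter(img, k=3):
    """k x k mean filter used here for noise suppression, per the text
    (a median filter is often preferred for salt-and-pepper noise)."""
    h, w = img.shape
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out
```

A constant-colored patch stays constant under the mean filter, which is a quick sanity check of the edge padding.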
The Gaussian blur method, also known as Gaussian smoothing, is an effective approach to blurring an image and is widely used to reduce image noise and detail. We apply Gaussian blurring to the source image to reduce its noise. Gaussian blurring depends on the standard deviation; our preferred standard deviation (σ) for this project is σ = 0.5 [11]. Figure 3 shows the RGB to grayscale conversion.
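A minimal separable Gaussian smoothing sketch with the stated σ = 0.5 (an illustrative NumPy version under our own kernel-radius choice, not the paper's code):

```python
import numpy as np

def gaussian_kernel_1d(sigma=0.5, radius=2):
    """Sampled 1-D Gaussian, normalized so the weights sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_blur(img, sigma=0.5):
    """Separable Gaussian smoothing: filter all rows, then all columns."""
    k = gaussian_kernel_1d(sigma)
    tmp = np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 1, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 0, tmp)
```

Because the kernel is normalized, interior pixels of a constant image are unchanged by the blur.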

Binary Masking for Segmentation:
Binary masking is the process of converting an image to a black-and-white image, which can sometimes be misleading [17]. Both grayscale and binary images can appear black and white: a grayscale image has an intensity range of [0, 255], whereas a binary image contains only two values, 0 and 1 (often displayed as 0 and 255) [17, 18]. So, for the next step we must be careful not to leave our image in grayscale but to convert it to binary. Figure 4 shows images after binary masking.
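Binarization reduces to a single threshold comparison; a sketch (illustrative, with the default taken from the 0.8–0.9 threshold range reported later in the Results, and assuming the fish is brighter than the background):

```python
import numpy as np

def binarize(gray, threshold=0.8):
    """Threshold a grayscale image scaled to [0, 1]: pixels at or above
    the threshold become foreground (1), the rest background (0)."""
    return (gray >= threshold).astype(np.uint8)
```

Whether foreground is the bright or the dark side depends on the image; if the fish is darker than the background the comparison is simply inverted.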

Flood-Fill method for Image Reconstruction:
Flood fill, also known as seed fill, is a process that fills a connected area of an image with a similar color. The flood-fill algorithm starts from a seed pixel and examines its neighbors; in our case the pixels are checked for a specific color value, which is replaced up to the boundary [19]. In this process we fill the holes with a uniform color for detection purposes. So, in our approach, for the binary image we change background pixels (0s) to foreground pixels (1s), stopping when the boundary is reached; every region enclosed by the boundary is treated as a hole to be filled. This gives a clearer view of the object shape represented in the array.
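The hole-filling step above can be sketched as a flood fill of the background from the image border: anything not reachable from the border must be a hole inside the object. This is an illustrative NumPy/BFS version, not the paper's implementation:

```python
from collections import deque
import numpy as np

def fill_holes(binary):
    """Fill interior holes of a binary mask: flood-fill the background
    starting from all border background pixels, then turn every pixel
    NOT reached into foreground (1)."""
    h, w = binary.shape
    reached = np.zeros((h, w), dtype=bool)
    q = deque()
    for i in range(h):
        for j in range(w):
            if (i in (0, h - 1) or j in (0, w - 1)) and binary[i, j] == 0:
                reached[i, j] = True
                q.append((i, j))
    while q:
        i, j = q.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w and not reached[ni, nj] and binary[ni, nj] == 0:
                reached[ni, nj] = True
                q.append((ni, nj))
    # holes = background pixels unreachable from the border
    return np.where(reached, binary, 1).astype(binary.dtype)
```

A ring of foreground pixels with a hole in the middle comes out solid, while the outer background is untouched.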

Boundary Extraction Using Sequential Grass-Fire Algorithm:
The grass-fire algorithm can be implemented in several ways; among them, the sequential grass-fire algorithm has the fewest shortcomings, which is why we use it for our purpose. The sequential grass-fire algorithm scans the image from the top left to the bottom right. On reaching an object pixel it does two things: first, it creates a label for that pixel in the output image; second, it removes the pixel from the input image. It then checks the pixel's neighboring object pixels, which are likewise labeled, deleted from the input image, and placed in a list. Next, the first pixel is taken from the list and its neighbors are examined in the same way: any object pixel found is labeled in the output, set to zero in the input, and added to the list. This procedure repeats until all object pixels of the region have been investigated, after which scanning continues until the next object pixel is found and the next grass-fire is started [21]. In our case, the scan of the filled binary image runs from top left to bottom right, and after running the algorithm we again obtain a black-and-white image in which the white region is the detected fish and the black region is the background. In the output image we count the white pixels; the white pixel count is an efficient measure of the size of the fish, so we can tell whether a fish is small, medium or large. From the size, we can approximately estimate its weight by learning from the current dataset, shortlisting the fish from our 400 fish images.
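The scan-label-burn procedure described above can be sketched as follows (an illustrative NumPy version with 4-connectivity; function names are our own):

```python
from collections import deque
import numpy as np

def grassfire_label(binary):
    """Sequential grass-fire labelling: scan top-left to bottom-right;
    on each unburnt object pixel, start a new label and 'burn' the whole
    connected region out of the input while writing the label into the
    output image."""
    img = binary.copy()
    labels = np.zeros_like(img, dtype=int)
    next_label = 0
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            if img[i, j] == 1:
                next_label += 1
                img[i, j] = 0          # burn the seed out of the input
                q = deque([(i, j)])
                while q:
                    y, x = q.popleft()
                    labels[y, x] = next_label
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny, nx] == 1:
                            img[ny, nx] = 0
                            q.append((ny, nx))
    return labels, next_label

def blob_size(labels, label):
    """White-pixel count of one labelled blob -- the rough size measure
    used in the text to class a fish as small, medium or large."""
    return int((labels == label).sum())
```

On an image with two separate blobs, the function returns two labels whose pixel counts match the blob areas.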

Feature Based Fish Recognition Implementing SURF Algorithm:
After successfully detecting the fish, the object recognition method Speeded-Up Robust Features is used to fulfill the main purpose of this paper: recognizing the fish. The Speeded-Up Robust Features (SURF) algorithm is a scale- and rotation-invariant robust feature detector and descriptor, first presented by Herbert Bay et al. at the 2006 European Conference on Computer Vision and published in 2008 [28]. It is widely used for object recognition, image registration, traffic monitoring in image sequences, 3D reconstruction, augmented reality applications, etc.

Feature Extraction:
Feature extraction is an essential step in almost every object detection algorithm. What exactly is feature extraction? It is the process of extracting useful information, i.e. identifiable attributes referred to as features, from an input image.
Step i: Given an input image, the output is an array of extracted feature points.
Step ii: Convert the image to an 8-bit grayscale image. In our case the input image was already converted during the detection part, so this step is not needed here.
Step iii: Calculate the integral image, which allows the sum of the pixels enclosed within any rectangular region of the image to be found; the function MyIntegralImage.m is used for this.
Step iv: A Hessian-based blob detector is used in Speeded-Up Robust Features to find interest points; the function is called FastHessian.m. The determinant of the Hessian matrix expresses the strength of the response and is a measure of the local change around the point [29]. For a point x and scale σ, the Hessian matrix is H(x, σ) = [Lxx(x, σ), Lxy(x, σ); Lxy(x, σ), Lyy(x, σ)], where Lxx(x, σ), Lxy(x, σ) and Lyy(x, σ) are the convolutions of the image with the second-order derivatives of the Gaussian. The heart of Speeded-Up Robust Features detection is non-maximal suppression of the determinants of the Hessian matrices.
Step v: Filter out only the useful interest points from the output of the FastHessian.m function, based on certain factors; this can be done by the NonMaxSuppression_gpu.m function. To find their exact locations, the interest points are interpolated by fitting a 3D quadratic in scale space. An interest point is located in scale space by (x, y, s), where x and y are relative coordinates (x, y ∈ [0, 1]) and s is the scale.
Step vi: Calculate and assign an orientation to each interest point located in Step v, using the OrientationCalc.m function. The final result of feature extraction is an array of interest points, each a structure with the following fields.
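The integral image of Step iii is what makes the later box-filter sums cheap: any rectangular sum costs four lookups regardless of its size. A sketch of the idea (our own NumPy version, not the MyIntegralImage.m function above):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero first row/column, so that
    ii[y, x] = sum of img[:y, :x] and rectangle sums need no
    bounds checks."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in four lookups, independent of size."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
```

This constant-time rectangle sum is exactly what the Hessian box filters of Step iv rely on.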

Feature Description:-
Each of the interest points found should have a unique description that does not depend on the feature's scale and rotation. The Speeded-Up Robust Features descriptor is based on Haar wavelet responses and can be calculated efficiently with integral images. The descriptors are robust to rotations, and an upright variant also exists. The interest area is weighted with a Gaussian centered at the interest point to give some robustness against deformations and translations [24]. For each subarea a vector v is calculated, based on 5×5 samples; the descriptor for an interest point is the concatenation of the 16 vectors of the subareas.

Feature Matching:
Matching consists of recognizing the object, and possibly its transformation, in the given input image based on predetermined interest points. After calculating the descriptors, we can find matches between them by testing whether the distance between two vectors is sufficiently small; this is called putative point matching. The SURF algorithm adds one more detail to speed up matching: the sign of the Laplacian,

∇²L = trace(H) = Lxx(x, σ) + Lyy(x, σ)   (4)

The Laplacian is the trace of the Hessian matrix, and its value is already available when the determinant of the Hessian matrix is calculated, so it is only a matter of storing the sign. The sign of the Laplacian should be stored because it distinguishes bright blobs on dark backgrounds from the reverse case. Only when two points have the same sign do the full descriptor vectors need to be compared, which significantly lowers the computational cost of matching.
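The distance test for putative matches can be sketched in a few lines. This illustrative NumPy version uses a nearest-neighbour ratio test (a common acceptance criterion, and our own choice here; the text only says the distance must be "sufficiently small"):

```python
import numpy as np

def putative_matches(desc1, desc2, ratio=0.7):
    """Nearest-neighbour matching of descriptor vectors: accept a match
    only when the best distance is clearly smaller than the second best.
    The sign-of-Laplacian pre-filter from the text would simply skip
    candidate pairs whose signs differ before this comparison."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        if len(order) > 1 and dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, int(order[0])))
    return matches
```

With two small descriptor sets that contain near-duplicates, each descriptor is matched to its obvious counterpart and distant ones are rejected.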
To understand how this works with images, consider the following flowchart. Figure 8 shows how a sample image is input and processed; after feature extraction, the information is sent to the database, and with the help of the trained pattern the system predicts and returns a result.

CNN Architecture for fish recognition:
The Convolutional Neural Network (CNN) has developed rapidly in recent years owing to its high accuracy in image classification and recognition. It is widely used for its weight-sharing network structure and its ability to reduce the number of weights and the complexity of the network. A CNN is a multi-layer, fully trainable model which can capture highly nonlinear mappings between inputs and outputs [25]. Convolutional networks were inspired by biological processes, in that the connectivity pattern between neurons resembles the organization of the animal visual cortex. A CNN consists of an input layer, an output layer and multiple hidden layers; the hidden layers consist of convolutional layers, pooling layers and fully-connected layers [26].

Input layer:
The input layer receives the original input data, i.e. the input image; the input data are the pixel values of the image.

Convolutional layer:
The convolutional layer is the core layer of a CNN, also called the feature extraction layer. Its aim is to extract features from the input data. The features extracted from the input images differ from one convolutional kernel to another: the more convolution kernels a convolutional layer has, the more features of the input data can be extracted.

Pooling layer:
The pooling layer is another important concept in CNNs, also known as the subsampling layer. Its main purpose is to reduce the amount of data while retaining useful information, which speeds up training. The more convolution and pooling layers there are, the more abstract the features that can be extracted. In general, convolutional layers and pooling layers appear alternately.
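The convolution/pooling pair described above can be made concrete with a tiny forward-pass sketch (plain NumPy, single channel; this illustrates the mechanism only and is not the network trained in the paper):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2-D convolution (cross-correlation, as in CNNs):
    slide the kernel over the image and take the weighted sum."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

def max_pool(fmap, size=2):
    """Max pooling: keep the strongest response per size x size window,
    shrinking the feature map and discarding redundant detail."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))
```

A horizontal-gradient kernel fires exactly on a vertical edge, and pooling then keeps only the strongest response per window, which is the "reduce data, keep useful information" behavior described above.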
Fully-connected layer:
Finally, after several convolutional and max pooling layers, the high-level reasoning in the neural network is done via fully-connected layers. In effect, this is the hidden layer of a multi-layer perceptron.
The implementation uses Keras, which is designed to enable fast experimentation with deep neural networks; it focuses on being user-friendly, modular and extensible. It supports convolutional networks and recurrent networks, as well as combinations of the two, and runs seamlessly on both CPU and GPU [27].

Result and Analysis:-
The proposed system is designed to detect and recognize regional fishes available in Bangladesh; it combines image processing and a convolutional neural network. An image processing algorithm, the grass-fire algorithm, is used for outlining the contour of the fish's body: the input image is preprocessed and then passed through binary masking, and the accuracy of the contour marking depends on the threshold value of the binary masking. Recognition is first done by the Speeded-Up Robust Features (SURF) algorithm, where a reference set of fishes is built by placing the images row by row by species. The SURF algorithm creates key feature points, and by connecting the feature points of a testing image to the dataset images it indicates the recognized species of fish. The accuracy is obtained by trial and error, counting how many feature points connect to the wrong row of fish species. In the convolutional neural network part, the recognition accuracy is obtained from cross-validation on the training and testing datasets.
Source images are converted to grayscale to reduce the complexity of the color information; this reduces accumulated noise and signal distortion during processing. Binary masking is then applied so that only white and black pixels remain, which is needed for boundary tracing. Before the boundary detection part, the images are reconstructed with the morphological flood-fill algorithm: if black pixels exist inside a region bounded by white pixels, they are converted to white pixels. Finally, a median filter is used to reduce the remaining noise. The sequential grass-fire algorithm, which has the fewest shortcomings, is used for tracing the boundary of the binary image. Finally, the contour array is stored and its value is converted into the (255, 0,) color space. Table 1 shows each step of fish detection and the accuracy rate per threshold value during binarization.
Table 1:-Fish detection accuracy scoring
For recognition, 1000 feature points are selected from the testing and dataset images. When a test is processed, the erroneous connections are those in which one of the testing image's 1000 feature points is connected to the wrong species. Figure 10 below shows the result of feature point marking.
Figure 10:-Putative matching points
The threshold value is very important for image binarization in the fish detection part; it works like an edge detection process. The whole figure is detected when the threshold value lies between 0.814 and 0.9, which was found by trial and error, i.e. repeated manual testing. When the whole fish is detected, its area is obtained and compared with the areas produced by the other threshold values; since those cannot capture the whole fish, their areas are smaller than the actual one, and the difference is expressed as a percentage. When the threshold is 0.5, the accuracy of tracing the fish region falls below 60%; at a threshold of 0.6, the accuracy is between 60 and 63 percent; at 0.7, the accuracy reaches 80%. But when the threshold value is between 0.8 and 0.9 and Gaussian blurring is applied as well, the accuracy of tracing the region and contour of the fish increases to 90 to 98.55%. For the convolutional neural network, the dataset must be split into training, test and validation sets. During each epoch the training data are passed through the network repeatedly to learn the features of the data, so that later the network can accurately predict data it has not seen before; its decisions are based on what it learns from the training data. The validation data are used to validate the model during training; they are kept separate from the training data and are needed to ensure that the model is not overfitting, i.e. becoming good only at classifying the training set. The test set is used to predict the fish after training and validation; the difference between the test set and the other two sets is that it must be unlabeled during prediction. Keras expects the training dataset as a NumPy array or a list of NumPy arrays; to create a validation set, one passes a split parameter giving the fraction of the training set that Keras should use for validation.
Figure 12 shows the image features during the training epochs.
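The train/validation/test split described above can be sketched as follows (an illustrative NumPy version; the fractions here are our own, not the ones used in the paper):

```python
import numpy as np

def split_dataset(images, labels, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then carve the data into train / validation / test.
    The validation part monitors overfitting during training; the test
    part is held out entirely until the final evaluation."""
    n = len(images)
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return ((images[train_idx], labels[train_idx]),
            (images[val_idx], labels[val_idx]),
            (images[test_idx], labels[test_idx]))
```

With Keras itself, the same effect for the validation part can be had by passing a fraction such as `validation_split=0.2` to the model's `fit` call, which carves the validation set off the end of the training arrays.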

Conclusion:-
We have done our best in this thesis to detect and recognize native Bangladeshi fishes using SURF and a CNN (Convolutional Neural Network); to do so, we classified our custom dataset of 400 images. Methods such as Gaussian blurring and noise filtering were used for image preprocessing, and then binary masking, flood filling of the binary image and the grass-fire algorithm were used for the detection of fishes. For recognition we first used the SURF algorithm and observed its accuracy; later we used a CNN and noted its accuracy too. The accuracy of SURF ranges from 87.08% to 98.67%, where the grass carp species has the highest accuracy and shrimp the lowest. The CNN image classifier achieved 90.9% training accuracy with 23.7% loss. When a fish is known, i.e. present in the dataset, the SURF algorithm works better; but when a test image is acquired from an unknown source, the CNN image classifier works better, as it compares the image layer by layer.
Future Works:-
In our current system, we detect the object in a still image. We want to extend it further and make it work on real-time images; moreover, with real-time video analysis we could detect and recognize fishes while they are moving, making the system a more viable option for marine scientists and researchers and resolving many frustrating issues in this area. There are a great many fishes around the world, with many genera and species; a group of fishes is called a shoal or school. In our current project we have detected 7 of the fishes natively available in our country. This can be extended by first including all native fishes and then moving on to fishes from around the world. We are using around 500 sample images to detect fish; if we increase the sample data, we may be able to detect fishes more accurately, and if we replace the current algorithm with a faster one, we may get better results. The current detection process is fast enough for our dataset of 500 images, but when we obtain more data and the data table holds thousands of values, the process may become slower, which is why a lot of work remains to make the detection process faster. The current approach is a fundamental approach toward our target, but there are many opportunities to deliver this solution to the people who need it by building a mobile application or computer software.