FLOOD FORECASTING USING TRANSBOUNDARY DATA WITH THE FUZZY INFERENCE

In the present study, four different models (M1-M4) were developed with the fuzzy inference system (FIS), using different numbers of membership functions (MFs) (i.e. 13, 25, and 49 MFs), to predict the current flow of the Kirişhane station (Turkey) from the transboundary data of the Plovdiv and Svilengrad stations (Bulgaria). In addition, multiple linear regression (MLR) was selected as a simpler data-driven forecasting method to show how much FIS improves on simpler forecasting models. Flow data from the Plovdiv, Svilengrad and Kirişhane stations were gauged at two-hour intervals covering the period from 9 February 2010 00:00:00 to 21 February 2010 22:00:00. In addition, flow data at two-hour intervals covering the flood period from 6 February 2012 14:00:00 to 13 February 2012 10:00:00 were obtained to test the developed FIS and MLR models. In the first model, estimation was made using the current flows of the Plovdiv and Svilengrad stations. In the second model, estimation was based on a two-hour-ahead prediction of the Svilengrad station and a four-hour-ahead prediction of the Plovdiv station. In the third model, calculations were based on predictions of four hours ahead of the Svilengrad station and eight hours ahead of the Plovdiv station. In the last model, estimation was based on predictions of six hours ahead of the Svilengrad station and twelve hours ahead of the Plovdiv station. The performance of the developed FIS models and MLR was evaluated by using the mean absolute error (MAE), the Nash-Sutcliffe model efficiency coefficient (NSMEC), and the normalized root mean square error (NRMSE). According to the performance criteria of the models, the FIS model with 49 MFs provided the highest accuracy. When FIS models with 25

Introduction:
In recent years, soft computing techniques have been increasingly used for forecasting studies in hydrology (Mukerji et al., 2009). Within flow regulation and water resources management studies, streamflow forecasting is a very demanding task which seeks to mitigate the effects of floods on human and dam safety as well as on ecosystem sustainability (Campolo et al., 1999; Lekkas et al., 2005). However, forecasting streamflow is very complex owing to the tremendous spatial and temporal variability in the characteristics of terrain and rainfall patterns in conjunction with other variables associated with modeling (Tokar and Markus, 2000; Nayak et al., 2005). Reliable water level forecasts enable the use of early warning systems to alert the population as well as real-time control of hydraulic structures in order to mitigate the adverse effects when floods occur (Alvisi et al., 2006). Recording and analyzing the streamflow are indispensable procedures because they can generate significant indications of both past and future flow characteristics (Küçük and Ağıralioğlu, 2006). Furthermore, flood management studies require knowledge of the magnitude and frequency of high flows (Amisigo et al., 2008). Accurate and timely prediction of high and low flow events can provide the information required to make strategic decisions at any watershed location (Besaw et al., 2010). Hence, the forecasting of streamflows in real time has received noticeable attention from hydrologists and resource engineers for many decades (Chang and Chen, 2001).
To date, a wide variety of models for streamflow forecasting have been developed and applied, ranging from completely black-box models to very detailed conceptual models (Nayak et al., 2005). The methods used for forecasting gauged and/or ungauged streamflow are categorized as conceptual, metric, physics-based and data-driven (Besaw et al., 2010). Data-driven methods have been extensively adopted for forecasting streamflow. Therefore, the main aim of the present study was to develop a fuzzy inference system from flow data available only for flood periods (a very short period of several days) at three stations (Plovdiv and Svilengrad in Bulgaria and Kirişhane in Turkey) located on the Maritza River, and to show the capability of the developed models to use transboundary data in the prediction of flood hydrographs.

Material and method:
Study area and dataset: The Maritza River, which is 490 km long, originates at 2400 m a.s.l. in the Rila Mountains (Bulgaria) and flows southeast for 320 km (Fig. 1). The catchment of the Maritza River has an area of 52,600 km², 28% of which is located in Turkey. After flowing along a short portion of the Greek-Bulgarian border, the river arrives in Turkey, where it flows for 13 km, after which it forms the border between Turkey and Greece until it discharges into the Aegean Sea (Yıldız et al., 2014). Two main tributaries (the Arda and Tundja rivers) join the river upstream of the Kirişhane gauging station and can affect the flow there. The Tundja is approximately 390 km long and the Arda around 240 km. The Tundja has around 50 tributaries and the Arda around 25 (Tuncok et al., 2014).
Floodwaters rising from the Maritza River, which is the biggest river on the Balkan Peninsula, affect Turkey, Greece, and Bulgaria. In particular, the lower regions of the Maritza River are vulnerable to floods because the physical and hydrological characteristics of the river create a high flooding potential. In Turkey, the city of Edirne and the surrounding region are prone to flooding. The Maritza basin is under the influence of both the continental and the Mediterranean climates. The high flow period for the regions under the continental climatic influence is observed in the northern part of the Maritza basin and comes in late spring. On the other hand, the Mediterranean climatic influence is more visible to the south, affecting the lower parts of the Maritza and causing high flow conditions during winter. Flooding of the Maritza River typically occurs in the autumn, winter, and spring seasons and is caused mainly by heavy rainfall and snowmelt (Tuncok, 2015). Moreover, the flood occurring on 16 February 2010 is considered to be the second biggest flood in the last 26 years. During this flood, a maximum discharge of 1713 m³/s was measured in the city of Edirne, while 2800 m³/s was measured in Ipsala (Batur and Maktav, 2012). Turkey is unable to provide adequate warning time to alert the population against floods because the section of the river inside the country is short (Sezen et al., 2007). Thus, obtaining current flow information from Bulgarian sites and using it in the current prediction for Turkey's section of the river is essential.

Figure 1: Location map of Maritza River
In the present study, flow data from the Plovdiv and Svilengrad stations (Bulgaria) and the Kirişhane station (Turkey), all located on the Maritza River, were gauged at two-hour intervals covering the flood period from 9 February 2010 00:00:00 to 21 February 2010 22:00:00. In addition, flow data at two-hour intervals covering the flood period from 6 February 2012 14:00:00 to 13 February 2012 10:00:00 were obtained to test the developed models. Streamflow is normally gauged at eight-hour intervals throughout the year; flow data are gauged at two-hour intervals only during flood periods. For this reason, the flow data used in the present study are limited to several days. In addition, the two-hour-interval flow data for 2012 are available for a shorter period than the 2010 data, and the available period belongs to the recession stage of the flood, as seen in the hydrograph (Fig. 2).

Fuzzy Inference System:
A fuzzy inference system (FIS), also known in the literature as a fuzzy rule-based system, fuzzy expert system, or fuzzy system, maps a given input set to an output using fuzzy logic, which was suggested by Zadeh in 1965 (Elsayed, 2009). Linguistic terms in a rule-based system are used to provide an inference structure for modeling sophisticated and complex structures (Jamshidi et al., 2013). There are two widely used types of fuzzy inference systems, Takagi-Sugeno FIS and Mamdani FIS (Jang et al., 1997). The basic difference between them is the definition of the consequent parameters (Takagi and Sugeno, 1985). In Takagi-Sugeno FIS, the consequent can be either a linear equation or a constant, whereas in Mamdani FIS the rule base is constructed from input-output pairs, which can be fuzzy sets or crisp values, but the outputs are always fuzzy sets. In this study, Mamdani-type inference was adopted. The general structure of FIS is depicted in Fig. 3. As shown in Fig. 3, FIS includes four main parts: (1) fuzzification, (2) rule base, (3) decision-making unit, and (4) defuzzification. Fuzzification is the process of transforming crisp values into grades of membership for the linguistic terms of fuzzy sets, so that they can be used in fuzzy If-Then rules. This process is carried out with the help of membership functions (MFs). In the present study, flow data were fuzzified by defining the classes S_N (Small N), …, S_1 (Small 1), CE (Center), B_1 (Big 1), …, B_N (Big N) with triangular MFs. The input-output relationships are defined by rules of the form "If S12 AND S22 then S17". The decision-making unit uses these fuzzy If-Then rules to map fuzzy inputs to fuzzy outputs based on fuzzy composition rules. Finally, the defuzzification process (centroid defuzzification in the present study) converts the fuzzy sets into a crisp value.
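The four parts described above can be illustrated with a minimal Python sketch of Mamdani-type inference with triangular MFs, min for AND, max aggregation, and a discretized centroid. This is not the MATLAB toolbox implementation used in the study; the function names (`tri`, `make_mfs`, `mamdani`), the evenly spaced MF layout, and the sampling resolution are assumptions made for illustration only.

```python
def tri(x, a, b, c):
    """Triangular membership grade of x for feet a, c and peak b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def make_mfs(lo, hi, n):
    """Assumed layout: n evenly spaced triangular MFs with peaks from lo to hi."""
    step = (hi - lo) / (n - 1)
    return [(lo + (k - 1) * step, lo + k * step, lo + (k + 1) * step)
            for k in range(n)]


def mamdani(x1, x2, mfs1, mfs2, mfs_out, rules, out_lo, out_hi, samples=501):
    """Mamdani inference for two inputs and one output.

    rules is a list of (i, j, k): IF input1 is MF i AND input2 is MF j
    THEN output is MF k. Returns None when no rule fires.
    """
    # Fuzzification and rule evaluation: AND is taken as the minimum.
    fired = []
    for i, j, k in rules:
        w = min(tri(x1, *mfs1[i]), tri(x2, *mfs2[j]))
        if w > 0.0:
            fired.append((w, k))
    if not fired:
        return None
    # Clip each consequent at its firing strength, aggregate with max,
    # then take the centroid over a sampled output universe.
    num = den = 0.0
    for s in range(samples):
        y = out_lo + (out_hi - out_lo) * s / (samples - 1)
        mu = max(min(w, tri(y, *mfs_out[k])) for w, k in fired)
        num += y * mu
        den += mu
    return num / den if den > 0.0 else None
```

With this sketch, a rule such as "If S12 AND S22 then S17" would be encoded as one `(i, j, k)` index triple, and an input pair that matches no rule yields `None` rather than a flow value.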

Multiple linear regression (MLR):
Regression analysis is used to predict the value of one or more responses from a set of predictors, and also to estimate the linear association between the predictors and responses. Linear regression is a statistical method that fits a linear function of the independent variables to the dependent variables by minimizing the squared difference between the predicted and observed values of the data (Draper and Smith, 1966). MLR can be written in matrix form as

$$Y = XB + E \quad (1)$$

where $X$ is the matrix of predictors, $B$ is the regression coefficient matrix, $E$ is the fitting error matrix, and $Y$ is the response matrix. $B$ is obtained by solving Equation (1) as follows:

$$B = (X^{T}X)^{-1}X^{T}Y \quad (2)$$

where $X^{T}$ is the transpose of $X$. When obtaining the inverse of $X^{T}X$, high correlation between the independent variables should be avoided; otherwise the matrix cannot be inverted reliably, which increases the error. The solution is to avoid multicollinearity between the independent variables. The variance inflation factor (VIF) criterion is usually applied to check for multicollinearity. The ideal value of VIF is 1. Higher VIF values indicate stronger multicollinearity between the independent variables (Noori et al., 2010).
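As an illustration of the least-squares solution and the VIF check described above, the following Python sketch solves the normal equations directly and computes the VIF for the two-predictor case, where it reduces to 1/(1 - r²) with r the correlation between the two predictors. The function names and the use of Gaussian elimination are assumptions for this sketch, not the procedure used in the study.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("X^T X is singular (perfect multicollinearity)")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_mlr(X, y):
    """Coefficients from the normal equations (X^T X) B = X^T y.

    Each row of X must start with 1.0 so that B[0] is the intercept.
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r2 = s12 * s12 / (s11 * s22)
    return float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)
```

In the setting of this study the two predictors would be the Plovdiv and Svilengrad flows; because both stations lie on the same river, their flows are likely to be correlated, which is why a VIF check is relevant.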

Models performance evaluation and validation:
The performance of all developed FIS models and MLR was evaluated by using the mean absolute error (MAE), the Nash-Sutcliffe model efficiency coefficient (NSMEC), and the normalized root mean square error (NRMSE). The output of these analyses helps to select the best model among the developed models. MAE is a common measure of forecast error in time series that shows how well the outputs of a developed model fit the observed data. The formula of MAE is shown in Eq. (3):

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|O_i - P_i\right| \quad (3)$$

where $O_i$ represents the observed flow values, $P_i$ represents the predicted flow values, and $n$ is the number of flow values.
The NSMEC is commonly used to assess the predictive power of hydrological discharge models. It is defined as:

$$\mathrm{NSMEC} = 1 - \frac{\sum_{i=1}^{n}\left(O_i - P_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2} \quad (4)$$

where $O_i$ and $P_i$ are the observed and predicted flow values, $n$ is the number of flow values, and $\bar{O}$ is the mean of the observed flow values. The NSMEC can range from $-\infty$ to 1. An NSMEC equal to 1 means there is a perfect match between the model and the observations. The NRMSE statistic indicates a model's ability to predict a value away from the mean. The NRMSE is calculated by Eq. (5).
Splitting the available data (or record) into two segments, one used for calibration (or training) and the other for validation, is the usual and straightforward method applied for model validation. However, this is possible only when the available data are sufficient and can be split meaningfully. The split can be one half for calibration and the other half for validation if the record is long enough, or 70% of the data for calibration and 30% for validation (Klemeš, 1986). In the present study, since the aim is to predict flood hydrographs, the 2010 flow data covering the flood period were used for calibration (about 73% of all data) and the 2012 flow data covering the flood period (about 27% of all data) were used for validation.
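The three performance criteria can be sketched in a few lines of Python. MAE and NSMEC follow their standard definitions. For NRMSE, the normalization used in this study is not recoverable from the text, so the sketch assumes normalization by the range of the observed values; this is an assumption, and other conventions (mean or standard deviation of the observations) are also common.

```python
import math


def mae(obs, pred):
    """Mean absolute error between observed and predicted flows."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nsmec(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def nrmse(obs, pred):
    """RMSE divided by the observed range.

    ASSUMPTION: range normalization is chosen for illustration only; the
    source does not state which normalization was used.
    """
    mse = sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)
    return math.sqrt(mse) / (max(obs) - min(obs))
```

A model that always predicts the observed mean gets an NSMEC of 0, which is why negative values indicate a model that performs worse than the mean of the observations.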

Structure of forecasting models and application:
In the study, four different models were developed for the prediction of the flood hydrograph of the Kirişhane station (Q_K,i) using the transboundary data of the Plovdiv and Svilengrad stations. In the first model, Q_K,i was estimated using the current flows of the Plovdiv (Q_P,i) and the Svilengrad (Q_S,i) stations. In the second model, Q_K,i was estimated based on a two-hour-ahead predicted flow of the Svilengrad station (Q_S,i-2) and a four-hour-ahead predicted flow of the Plovdiv station (Q_P,i-4). In the third model, Q_K,i was estimated based on a four-hour-ahead predicted flow of the Svilengrad station (Q_S,i-4) and an eight-hour-ahead predicted flow of the Plovdiv station (Q_P,i-8). In the last model, Q_K,i was estimated based on a six-hour-ahead predicted flow of the Svilengrad station (Q_S,i-6) and a twelve-hour-ahead predicted flow of the Plovdiv station (Q_P,i-12). All developed models are shown in Table 1.
FIS-based forecasting was carried out using the Mamdani model, in which both input and output variables are fuzzified (Özger, 2009). To this end, the FIS editor graphical user interface (GUI) toolbox in the MATLAB/Simulink program was used. The triangular MF was selected for the two inputs (flows of the Plovdiv and Svilengrad stations) and one output (flow of the Kirişhane station) of the models (Fig. 4). Models were built with different numbers of MFs to test the sensitivity of the prediction results to the number of MFs and, accordingly, the number of rules. Because the range of the data differs between gauging stations, the class ranges also differ, and the number of classes was increased (from 13 MFs to 49 MFs) until the best results were obtained. Equal class ranges were defined for each membership depending on the number of classes (for example, the class range is 3.2 for Plovdiv because the range of the data is 160 and the number of classes is 50, a result of the narrow classification of flow data depending on the ranges defined). The fuzzy rules of the models were generated by using the AND logical conjunction. This process was carried out separately for each model by filtering the classified flow data in MS Excel until only unique rules of the form "If S12 AND S22 then S17" remained; thus a different number of rules was obtained for each model. The centroid defuzzification procedure was employed to obtain the predicted flow value of the output based on the fuzzy rule base. Model validation was then carried out, and the performances of the developed models were compared.
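The construction of the model inputs and the rule-filtering step can be sketched in Python as follows. The sketch pairs each Kirişhane flow with earlier Plovdiv and Svilengrad flows (lags counted in two-hour steps, following the subscripts i-2, i-4, and so on), assigns each flow to an equal-width class, and keeps the unique class triples as rules. The function names, the integer class labels, and the handling of the upper boundary are assumptions for illustration; the study performed this filtering in MS Excel.

```python
def classify(value, lo, hi, n_classes):
    """Index of the equal-width class of value on [lo, hi] (0 .. n_classes-1)."""
    idx = int((value - lo) * n_classes / (hi - lo))
    return min(max(idx, 0), n_classes - 1)  # value == hi goes to the last class


def lagged_triples(q_p, q_s, q_k, lag_p, lag_s):
    """Pair Q_K at step i with Q_P at step i-lag_p and Q_S at step i-lag_s.

    Lags are in time steps; with two-hour data, lag_p=2 means four hours.
    """
    start = max(lag_p, lag_s)
    return [(q_p[i - lag_p], q_s[i - lag_s], q_k[i])
            for i in range(start, len(q_k))]


def unique_rules(triples, range_p, range_s, range_k, n_classes):
    """Classify every (Q_P, Q_S, Q_K) triple and keep each class triple once.

    Each resulting (p, s, k) reads: IF Plovdiv is class p AND Svilengrad
    is class s THEN Kirishane is class k.
    """
    rules = set()
    for qp, qs, qk in triples:
        rules.add((classify(qp, *range_p, n_classes),
                   classify(qs, *range_s, n_classes),
                   classify(qk, *range_k, n_classes)))
    return sorted(rules)
```

For the Plovdiv example in the text, `classify(80, 0, 160, 50)` uses a class width of 160/50 = 3.2. Note that this filtering keeps every distinct triple, so two rules can share the same antecedent classes but point to different output classes; how such cases were treated is not stated in the text.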

Results and discussion:
In the present study, FIS models with different numbers of MFs (13 MFs, 25 MFs, and 49 MFs) and MLR were developed to predict the flood hydrograph of the Kirişhane station from the transboundary flow data (belonging to the 2010 and 2012 floods) of the Plovdiv and Svilengrad stations on the Maritza River, and the results were compared with respect to prediction accuracy. Streamflow is normally gauged at eight-hour intervals throughout the year; flow data are gauged at two-hour intervals only for flood periods. For this reason, the flow data used in the present study are limited to several days. The 2010 flow data were used to construct the models, and the 2012 flow data were used to validate them. The test data set is smaller than the training data set because the test data could be gauged only during the recession period of the flood. The Maritza River has two main tributaries (Arda and Tundja) located between the Svilengrad and Kirişhane gauging stations. Detailed information on the Maritza, Arda, and Tundja rivers and their tributaries can be found in Tuncok et al. (2014). In addition, the distance between Svilengrad and Kirişhane (about 45 km) is quite short relative to the distance between Plovdiv and Svilengrad (about 180 km). Consequently, the large differences in flow between the gauging stations are due to contributions from the Arda and Tundja rivers.
Three different criteria (MAE, NSMEC, and NRMSE) were used to test the performance of each model, and the results of the performance criteria were summarized for all models. All FIS models with 13 MFs provided better prediction of the 2012 data than MLR. This is most probably due to the limited data available for model construction. Even so, all FIS models applied in the study can predict the flood hydrograph better than MLR. It is therefore possible to say that FIS, as a soft computing technique, is more powerful in the prediction of flood hydrographs or streamflow than simpler statistical methods (such as MLR). However, this does not mean that statistical methods can be dismissed (Kar et al., 2010).
In the present study, fuzzy rules were generated by filtering the classified flow data until only unique fuzzy rules of the form "If S12 AND S22 then S17" remained. This rule generation procedure is quite easy to apply and provides all possible fuzzy rules for the available flow data of the three gauging stations used in the construction of the models. Because of the high number of MFs (such as 25 and 49), which depends on the class range of the flow data, and the variation in the data distribution within each class, a different (and high) number of fuzzy rules (varying from 44 to 167) was obtained for each FIS model. Although the high number of fuzzy rules can be seen as an artifact, the best prediction accuracies could only be obtained by defining a high number of MFs (i.e. 49 MFs) and accordingly a high number of fuzzy rules (more than 150 fuzzy rules). Examples of a high number of fuzzy rules are also available in the literature: for instance, Turan (2007) generated fuzzy models with 19 MFs resulting in 75 fuzzy rules.
Obtaining the best results from FIS models with a high number of fuzzy rules is probably due to the nonlinearity of the flood hydrograph, the differences in flow between the gauging stations, and the range between the minimum and maximum flow at each gauging station. Compared with the number of flow records, the number of rules is about equal to (or even greater than) the number of records, because flow data are available for only two years (2010 and 2012), which limits the construction of a general (global) model for predicting the flood hydrograph of the Maritza River. To overcome this limitation, all models should be rebuilt by adding flow data gauged during future flood periods. The results of the FIS with 49 MFs and the MLR models developed using the 2010 and 2012 data are given graphically in Fig. 5. In addition, charts of observed versus predicted values for all models developed with FIS with 49 MFs using the 2010 and 2012 data are given in Fig. 6.

Conclusions:
The Maritza River, which is the biggest river on the Balkan Peninsula, affects Turkey, Greece, and Bulgaria. The Maritza River has two main tributaries (the Arda and Tundja rivers) located between the Svilengrad and Kirişhane gauging stations, which cause an increase in flow. These three rivers are internationally shared water bodies in the Balkan Peninsula. The lower regions of the Maritza River (especially the city of Edirne and the surrounding regions) have suffered from floods. Streamflow or flood forecasting has become a very demanding task in mitigating the effects of floods on human and dam safety as well as on ecosystem sustainability. Soft computing techniques have become common in the prediction of streamflow or floods. Therefore, the main objective of the present study was to employ FIS for the prediction of the flood hydrograph of the Kirişhane station from the transboundary data of the Plovdiv and Svilengrad stations on the Maritza River. To this end, FIS models with different numbers of MFs (13 MFs, 25 MFs, and 49 MFs) and MLR were used to predict the flood hydrograph. The prediction performance of all models was evaluated by using three criteria: MAE, NSMEC, and NRMSE. Because the flow data used in the study are available at two-hour intervals only for flood periods (in this study, only flow data from 2010 and 2012 are available), prediction was made using limited data. With respect to model construction this is a limitation, since it restricts how well the models represent the flood behavior of the Maritza River. The developed models should therefore be rebuilt using additional flow data obtained from future flood events, which is feasible because fuzzy models can easily be modified as additional data become available. Even though flow data from only two years were available, the FIS models provided predictions with high values of NSMEC.
All FIS models provided higher prediction accuracies than MLR, and the MLR method failed to predict the test data (2012 flow data). It must be kept in mind that the availability of more flow data in the model construction phase will provide more reliable results. This study demonstrated that FIS models can be successfully applied to the prediction of flood hydrographs. Another important point of the study is the use of transboundary flow data from three different gauging stations (Plovdiv, Svilengrad, and Kirişhane), two of which are located in Bulgaria.