APPLICATION OF DESIGN EXPERT IN THE ANALYSIS OF RESPONSE TRANSFORMATION OF PROCESSES – A CASE STUDY OF BIOETHANOL PRODUCTION PROCESS FROM CORN-STOVER

*Ohimor Evuensiri Onoghwarite¹, Ndirika Victor Ifeanyichukwu Obiora² and Eke Akachukwu Ben².
1. Department of Chemical Engineering, Federal University of Petroleum Resources, Effurun, Delta State, Nigeria.
2. Department of Agricultural and Bioresources Engineering, Michael Okpara University of Agriculture, Umudike, Abia State, Nigeria.


(3), 2006-2017
from the fermentation of the resulting glucose and other sugars (from the pretreatment of hemicellulose) to ethanol. The separate saccharification step enables saccharification to be operated at an elevated temperature. For fermentation, Saccharomyces cerevisiae is used as the biocatalyst, which ferments glucose and other sugars to ethanol. Some important factors that affect the performance of separate hydrolysis and fermentation (SHF), and thus the bioethanol yield, are: concentration of the hydrolysis solution, hydrolysis time, pH of the hydrolysate, concentration of yeast, fermentation temperature and fermentation time. The feasibility of these two operations is determined by the effect of each factor as well as the interactions between factors at varying values of all the factors.
Thus, to design an experiment involving multiple factors, a statistical software package such as Design-Expert, which is specifically dedicated to performing comparative tests, screening, characterization, optimization, robust parameter design, mixture designs and combined designs, is highly recommended (Tanco et al., 2008). Design-Expert offers test matrices for screening up to 50 factors. To keep pace with this, a power calculator, which helps to establish the number of test runs needed, is therefore required. Statistical significance of these factors is established with analysis of variance (ANOVA). Based on the validated predictive models, a numerical optimizer helps the user to determine the ideal values for each of the factors in the experiment. Graphical tools help identify the impact of each factor on the desired outcomes and reveal abnormalities in the data (Cornley, 2009). Design-Expert provides 11 graphs, in addition to text output, for analyzing the residuals (Plant, 2013). The software determines the main effects of each factor as well as the interactions between factors by varying the values of all factors in parallel (Black, 2013). Design-Expert provides the user with a broad range of possible response transformations. The appropriate choice depends on subject matter and/or statistical considerations. The software provides extensive diagnostic capabilities to validate statistical assumptions. Among these is a helpful plot, the Box-Cox plot, which recommends the appropriate power transformation (including the option of no transformation). Design-Expert also has an option of plotting the responses in terms of the original response data by calculating the surface matrix of data points and then applying the inverse transform before making the plot.
In the present study, the application of the statistical tools of the Design-Expert software to determine the statistical significance of the various process factors in bioethanol production from corn-stover is presented as a case study. The impact of these factors on the response (i.e. bioethanol yield), the diagnostic capabilities for validating statistical assumptions, the possible response transformations and the criteria for choosing a transformation are clarified.

Design of Experiment:-
Box-Behnken design (BBD), a statistical tool, was used in the design of the experiment in this study. In the BBD, the six independent variables, namely sulphuric acid concentration X₁ (A), hydrolysis time X₂ (B), fermentation time X₃ (C), concentration of yeast X₄ (D), fermentation temperature X₅ (E) and pH of hydrolysate X₆ (F), were all set at three levels: minimum (-1), centre (0) and maximum (+1). The BBD was used to determine the number of runs, or sets of experiments, that needed to be carried out. Each experiment was carried out in triplicate and the mean value was taken as the response, Y, which is the bioethanol yield (mg/l).
For statistical calculations, the variables X₁, X₂, X₃, …, Xₙ were coded as x₁, x₂, x₃, …, xₙ respectively, according to Equation 1 (Nuran, 2007):

$$x_i = \frac{X_i - \bar{X}_i}{\Delta X_i} \qquad (1)$$

where $x_i$ is the dimensionless (coded) value of the independent variable, $X_i$ is its real value, $\bar{X}_i$ is the value of the independent variable at the centre point and $\Delta X_i$ is the step change.
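As an illustration of this coding step, the short sketch below maps the three real levels of each factor reported in this study onto the coded levels -1, 0 and +1. The function and variable names are illustrative assumptions and are not part of Design-Expert.

```python
# Real levels (low, centre, high) of the six factors as reported in this study.
LEVELS = {
    "A: H2SO4 concentration (%)": (1.0, 2.5, 4.0),
    "B: hydrolysis time (h)": (2.0, 4.0, 6.0),
    "C: fermentation time (h)": (12.0, 30.0, 48.0),
    "D: yeast concentration (g/l)": (3.0, 6.0, 9.0),
    "E: fermentation temperature (deg C)": (30.0, 35.0, 40.0),
    "F: pH of hydrolysate": (5.0, 6.5, 8.0),
}


def code_value(actual, low, centre, high):
    """Coded value = (real value - centre value) / step change."""
    step = (high - low) / 2.0
    return (actual - centre) / step


# Code every real level of every factor.
coded = {
    name: [code_value(v, *levels) for v in levels]
    for name, levels in LEVELS.items()
}

for name, values in coded.items():
    print(name, values)
```

Each factor should come out as -1, 0 and +1, which is the scale on which the design matrix and the model coefficients are expressed.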
The design matrix used for the six independent variables, showing the real values of the variables in terms of the three levels, is presented in Table 1. The BBD experimental design of the process (Separate Hydrolysis and Fermentation, SHF), generated using the Design Expert software, resulted in a total of 54 experimental runs as sets of coded variables.
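The run count of 54 can be reproduced with a minimal sketch of a six-factor Box-Behnken construction. The factor triples used below follow the commonly tabulated six-factor arrangement, and the number of centre points (6) is an assumption inferred from the total of 54 runs. The run order produced by Design-Expert is randomized and will differ from this listing.

```python
import itertools

# Assumed factor triples (0-based indices for A..F) of the six-factor BBD.
# Within each triple the three factors take all 2^3 combinations of -1/+1,
# while the remaining three factors stay at the centre level 0.
BLOCKS = [(0, 1, 3), (1, 2, 4), (2, 3, 5), (0, 3, 4), (1, 4, 5), (0, 2, 5)]


def box_behnken_6(n_centre=6):
    """Return a six-factor Box-Behnken design as a list of coded runs."""
    runs = []
    for block in BLOCKS:
        for signs in itertools.product((-1, 1), repeat=3):
            row = [0] * 6
            for idx, sign in zip(block, signs):
                row[idx] = sign
            runs.append(tuple(row))
    # Centre-point replicates (count assumed, not taken from the paper).
    runs.extend([(0,) * 6] * n_centre)
    return runs


design = box_behnken_6()
print(len(design))  # 6 triples x 8 runs + 6 centre points
```

Every factor appears at a non-zero level in 24 of the 48 non-centre runs, so the design stays balanced across the six factors.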

Pretreatment, Hydrolysis and Fermentation of corn stover:-
The pretreatment, hydrolysis and fermentation of the corn stover were carried out in line with standard procedure (Ohimor et al., 2016). The milled corn stover was alkaline pretreated with dilute sodium hydroxide (2% w/w NaOH) at a solid to liquid weight ratio of 1:8. The alkaline pretreated corn stover was hydrolysed with dilute acid (H₂SO₄) of varying concentrations (1, 2.5 and 4%), such that a solid to liquid ratio of 1:10 was maintained in a 250 ml round bottom flask, and then refluxed. Hydrolysate samples were retained at intervals of 2, 4 and 6 hours for subsequent fermentation. The hydrolysate sample for fermentation was adjusted to a pH of 5, 6.5 or 8 by adding concentrated sulphuric acid or 2N sodium hydroxide as appropriate.
Varying quantities of yeast (Saccharomyces cerevisiae), equivalent to 3, 6 and 9 g/l respectively, were added to each hydrolysate sample contained in a 250 ml Erlenmeyer flask, which was then incubated at various temperatures (30, 35 and 40 °C) for fermentation. The bioethanol content was determined by gas chromatography at fermentation times of 12, 30 and 48 hours.
Statistical Analyses:-
Upon completion of the laboratory analyses, the results (bioethanol yield) of all 54 experimental runs were entered into a table in the file already created in the Design Expert software for subsequent analysis. The results were statistically analyzed in order to understand the interactions between the variables and their effect on the bioethanol yield. These results were also used for the optimization of the process.

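$$Y = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + b_{11}X_1^2 + b_{22}X_2^2 + b_{33}X_3^2 + b_{12}X_1X_2 + b_{13}X_1X_3 + b_{23}X_2X_3 \qquad (2.2)$$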
Where Y is the predicted response; X₁, X₂, X₃ are the independent variables; b₀ is the offset term; b₁, b₂, b₃ are the linear effects; b₁₁, b₂₂, b₃₃ are the square effects; and b₁₂, b₂₃, b₁₃ are the cross effects of the interaction terms. Using regression analysis, the significance of the coefficients of the terms in the model equation was determined by computing their standard errors, t-values and P-values. Analysis of variance (ANOVA) was also used to determine the level of significance (α) as well as the linear, interaction and quadratic effects. The quality of fit of the BBD model was estimated by comparing the coefficient of determination (R²) of the adjusted model with that of the predicted model.
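The fitting of a second-order model of the form (2.2) can be sketched with ordinary least squares. The example below is not the Design-Expert procedure: it uses a small three-factor Box-Behnken design and synthetic, noise-free responses generated from hypothetical coefficients, simply to show how the offset, linear, square and cross terms are assembled and estimated.

```python
import itertools

import numpy as np


def bbd3():
    """Three-factor Box-Behnken design: 12 edge runs plus one centre run."""
    runs = []
    for i, j in itertools.combinations(range(3), 2):
        for a, b in itertools.product((-1, 1), repeat=2):
            row = [0, 0, 0]
            row[i], row[j] = a, b
            runs.append(row)
    runs.append([0, 0, 0])
    return np.array(runs, dtype=float)


def quadratic_terms(X):
    """Columns: 1, X1..Xk, X1^2..Xk^2, then cross products XiXj (i < j)."""
    k = X.shape[1]
    cols = [np.ones(len(X))]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] ** 2 for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in itertools.combinations(range(k), 2)]
    return np.column_stack(cols)


X = bbd3()
M = quadratic_terms(X)

# Hypothetical coefficients: b0, b1..b3, b11..b33, b12, b13, b23.
true_b = np.array([80.0, -20.0, 1.0, 3.0, 10.0, -2.0, 4.0, 5.0, 0.0, -6.0])
y = M @ true_b  # synthetic response, no noise

# Least-squares estimate of the model coefficients.
b_hat, *_ = np.linalg.lstsq(M, y, rcond=None)

# Coefficient of determination R^2.
ss_res = float(np.sum((y - M @ b_hat) ** 2))
ss_tot = float(np.sum((y - y.mean()) ** 2))
r2 = 1.0 - ss_res / ss_tot
print(np.round(b_hat, 3), round(r2, 4))
```

With real, noisy data the estimated coefficients would carry standard errors, and the t-values and P-values mentioned above would be derived from them.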
Finally, the experiment was repeated in triplicate using the variables that gave the optimum bioethanol yield, and the average of the results was compared to the optimum value. The validity of the model was adjudged to be good since the result was comparable to the optimum bioethanol yield. Point Prediction was also carried out; this allows each factor or component to be entered into the current model at different levels. The software then calculates the expected responses and associated confidence intervals based on the model equation shown in the ANOVA output. The predicted values are updated as the levels are changed. The 95% confidence interval (CI) is the range in which the process average is expected to fall 95% of the time. The 95% prediction interval (PI) is the range in which any individual value is expected to fall 95% of the time. The prediction interval will normally have a wider spread than the confidence interval because individual values vary more than averages.
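The reason the prediction interval is wider than the confidence interval can be seen from the standard errors on which the two intervals are built. The sketch below uses a one-factor linear model and synthetic data (all numbers are assumptions for illustration). Each interval is the prediction plus or minus a t-quantile times the corresponding standard error, and the t-quantile is omitted here to keep the example dependent on NumPy only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: one coded factor, linear trend plus random noise.
x = np.linspace(-1.0, 1.0, 15)
X = np.column_stack([np.ones_like(x), x])
y = 90.0 - 20.0 * x + rng.normal(0.0, 5.0, size=x.size)

# Least-squares fit and residual variance.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = len(y) - X.shape[1]
s2 = float(resid @ resid) / dof

# Point at which the prediction is made (intercept term, x = 0.5).
x0 = np.array([1.0, 0.5])
leverage = float(x0 @ np.linalg.inv(X.T @ X) @ x0)

se_mean = np.sqrt(s2 * leverage)          # used for the confidence interval
se_pred = np.sqrt(s2 * (1.0 + leverage))  # used for the prediction interval

print(float(x0 @ beta), se_mean, se_pred)
```

The extra "1 +" term in the prediction standard error accounts for the scatter of an individual observation around the process average.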

Results and Discussion:-
Experimental Design of the Bioethanol Production Process and Bioethanol Yield:-
The results of the design of experiments and the experimental bioethanol yields (mg/l) for each run are presented in Table 2; it shows the experimental design of the six (6) independent variables for the bioethanol production process in terms of their coded levels (i.e. -1, 0, +1). The analytical experiments were then carried out according to the assignment of variables in Table 2. The experimental runs were performed in random order to avoid systematic error.

Transformation Equation:-
The response (i.e. bioethanol yield) ranged from 28.62 mg/l to 147.64 mg/l. The ratio of the maximum to the minimum response indicates whether or not a transformation is needed for the statistical analysis. A ratio greater than 10 usually indicates that a transformation is required. For ratios less than 3, a power transform will have little effect. For ratios ranging from 3 to 10, a transform is not required.
Transformation types can either be Square Root, Natural Logarithm, Base 10 Logarithm, Inverse Square Root, Inverse, Power, Arcsin Square Root or None.
From the experimental results, the ratio of the maximum to the minimum bioethanol response is 147.64/28.62 = 5.15863, which lies between 3 and 10; hence a transform was not required before we proceeded with the statistical analysis.
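The rule of thumb stated above can be written as a short check. The function name and the wording of the hints are illustrative assumptions; the thresholds of 3 and 10 and the response range are those given in the text.

```python
def transform_hint(y_min, y_max):
    """Apply the max/min ratio rule of thumb described in the text."""
    ratio = y_max / y_min
    if ratio > 10:
        hint = "transformation usually required"
    elif ratio < 3:
        hint = "power transformation has little effect"
    else:
        hint = "transformation not required"
    return ratio, hint


# Minimum and maximum bioethanol yields (mg/l) reported in this study.
ratio, hint = transform_hint(28.62, 147.64)
print(round(ratio, 5), hint)
```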

Model type selection:-
From the statistical analysis using Design Expert, the information on the Sequential Model Sum of Squares [Type I] and the Lack of Fit Tests, shown in Tables 3 and 4 respectively, was obtained. For the "Lack of Fit Tests", the selected model should have an insignificant lack of fit, that is, a P-value greater than 0.05. In this case the software suggested the Linear model; however, the Quadratic model is preferable inasmuch as it was not aliased and had a P-value of 0.0932. The quadratic model was chosen over the linear model in order to account for interactions between two or more variables.
The "Lack of Fit F-value" of 3.32 implies there is a 9.32% chance (P-value of 0.0932) that a "Lack of Fit F-value" this large could occur due to noise. Lack of fit is bad; we want the model to fit. This relatively low probability (<10%) is worrisome. It suggests that there are many insignificant model terms and, as such, there is a need to consider model reduction in order to improve the model.

Regression analysis:-
The result of the regression analysis is shown in Table 5 below. If the T-value of a given term is above the threshold and its P-value is lower than 0.05, then that term is said to make a meaningful contribution to bioethanol production. Terms that made no contribution were removed from the predicted model, and a new model, referred to as the adjusted model, was obtained.

Model summary statistics:-
The summary of the statistical analyses is shown in Tables 6 and 7 below. In Table 6, the choice of model was based on the Predicted R-Squared value. A positive value of Predicted R-Squared is preferable to a negative one; hence the Cubic model was aliased (i.e. ruled out) while the Linear model was suggested. However, the Two-Factor Interaction (2FI) and Quadratic models were not aliased because their Predicted R-Squared values were not too negative; hence the Quadratic model was selected. The adequate precision and the various R-Squared values for the quadratic model are given in Table 8. The "Adeq Precision" is a measure of the signal to noise ratio, and a ratio greater than 4 is desirable. For the quadratic model, the "Adeq Precision" is 8.309. Since it is greater than 4, the model can be used to navigate the design space, but there is a need for improvement through the removal of insignificant model terms.

Confidence interval:-
The confidence interval (CI) given in Table 8 is another statistic that can be used to determine whether or not a model term has an effect on the model (i.e. is significant). The terms in the columns "95% CI Low" and "95% CI High" represent the range in which the true coefficient should be found 95% of the time. If the range spans zero (i.e. one limit is positive and the other negative), then the true coefficient could be zero, indicating that the corresponding term or factor has no significant effect. However, if the range does not span zero (i.e. both limits have the same sign, either positive or negative), then the true coefficient is unlikely to be zero, meaning that the corresponding term or factor has an effect. For example, the term B has a confidence interval which spans 0, since it lies between -6.80 and 7.72; hence term B has no effect. The term A, on the other hand, has a confidence interval which does not span 0, since it lies between -30.48 and -15.96; as such, it has a significant effect on the model equation. Overall, the confidence intervals of the terms A, F, AF, CD and A², as well as that of the intercept, do not span 0, and these terms therefore have significant effects on the model equation.
The Variance Inflation Factor (VIF), also given in Table 8, measures how much the variance of the model is inflated by the lack of orthogonality in the design. If a factor is orthogonal to all other factors in the model, its VIF is one. Values greater than 10 indicate that the factors are too strongly correlated (i.e. they are not independent). Since the values of the Variance Inflation Factors range from 1.00 to 1.30, the factors or terms can be regarded as independent.
The Standard Error in Table 8 is associated with the calculation of the mean. It is obtained by dividing the standard deviation of the data by the square root of the number of repetitions in a sample. The larger the Standard Error of a term, the less significant the term.
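The confidence-interval criterion described above amounts to a simple sign check on the two limits. The sketch below applies it to the two intervals quoted in the text for terms A and B; the function name is an illustrative assumption.

```python
def spans_zero(ci_low, ci_high):
    """True when the 95% confidence interval contains zero."""
    return ci_low <= 0.0 <= ci_high


# 95% CI limits quoted in the text for terms A and B.
intervals = {"A": (-30.48, -15.96), "B": (-6.80, 7.72)}

# A term is treated as significant when its interval does not span zero.
significant = {term: not spans_zero(*ci) for term, ci in intervals.items()}
print(significant)
```

Under this criterion A is retained and B becomes a candidate for removal during model reduction.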

Diagnostics Case Statistics:-
The actual values, the predicted values of bioethanol yield and their residuals are given in Table 9, which informed the diagnostic plots in Figures 1 to 5.
The normal probability plot of the studentized residuals, Figure 1, follows a linear pattern, which indicates a normal distribution of the residuals; hence a transformation is not required. This is in line with Hothorn and Everitt (2014), who reported that a non-linear pattern indicates that the residuals do not follow a normal distribution, in which case a transformation of the response may provide a better analysis.
The studentized residuals versus predicted values plot, Figure 2, is a plot of the residuals against the ascending predicted response values. It tests the assumption of constant variance. A random scatter (a constant range of residuals across the graph) is acceptable, while an expanding variance ("megaphone pattern <") indicates the need for a transformation (Engle and Sheppard, 2001). Since the plot showed a random scatter, the assumption of constant variance was not violated; thus, a transformation is not required for the data analyzed.
The residuals versus run plot, Figure 3, shows that there were no lurking variables that may have influenced the bioethanol responses, since it followed a random scatter. According to Engle and Sheppard (2001), an internally studentized residuals versus run plot that shows a trend will normally indicate the presence of lurking variables which may have influenced the responses, in which case a transformation would be required.
The predicted versus actual bioethanol yield plot, Figure 4, shows points clustering along a straight line, which indicates that there are no values that are not easily predicted by the model. A predicted response versus actual response plot helps detect a value, or group of values, that is not easily predicted by a model. Thus, the clustering of the points around a line indicates parity between the predicted and actual data being analyzed; hence the model can readily be used to predict the response (Cao et al., 2009).
The Box-Cox plot, Figure 5, gave a best lambda value of 0.39, corresponding to the minimum point of the curve generated by the natural log of the sum of squares of the residuals. It was also observed that the 95% confidence interval around this best lambda value includes 1; therefore no specific transformation is recommended for the data analyzed, based on the recommended transformation list in Box and Cox (1964).
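The curve behind a Box-Cox plot can be sketched as follows. For each candidate lambda, the response is power-transformed (scaled by its geometric mean so that the residual sums of squares are comparable), the model is refitted, and the natural log of the residual sum of squares is recorded. The data are synthetic and the model is a simple straight line, both assumed purely for illustration. The confidence interval around the best lambda, which Design-Expert also reports, is not computed here.

```python
import numpy as np


def boxcox_normalised(y, lam):
    """Box-Cox power transform scaled by the geometric mean of y."""
    gm = np.exp(np.mean(np.log(y)))
    if abs(lam) < 1e-12:
        return gm * np.log(y)
    return (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))


def ln_residual_ss(X, y, lam):
    """ln(residual sum of squares) after fitting the transformed response."""
    z = boxcox_normalised(y, lam)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    return float(np.log(resid @ resid))


rng = np.random.default_rng(1)

# Synthetic response whose square root is linear in x, so the profile
# should reach its minimum near lambda = 0.5.
x = np.linspace(0.0, 1.0, 30)
X = np.column_stack([np.ones_like(x), x])
y = (2.0 + 4.0 * x + rng.normal(0.0, 0.01, x.size)) ** 2

lambdas = np.round(np.arange(-2.0, 2.01, 0.1), 1)
profile = [ln_residual_ss(X, y, lam) for lam in lambdas]
best_lambda = float(lambdas[int(np.argmin(profile))])
print(best_lambda)
```

When lambda = 1 lies inside the confidence interval around the minimum, as reported for the bioethanol data, the untransformed response is retained.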

Conclusion:-
The Design Expert software and its several statistical tools are dedicated to designing experiments involving multivariable factors and analyzing the main effects of each factor as well as the interactions between factors. Thus in this paper it has been employed to determine statistical significance of the various process factors in bioethanol production from corn-stover, their impact on the response variable (bioethanol yield), diagnostic capacities to validate statistical assumptions, models and possible response transformations.