Tecnología y Ciencias del Agua, vol. VIII, no. 2, March-April 2017, pp. 51-60
Kan et al., Daily streamflow simulation based on improved machine learning method
ISSN 2007-2422
confirm that an order of 10 is large enough to describe the mapping
relationship between the rainfall and runoff data of the Chengcun
catchment.
K-means clustering
We use the Silhouette value to select the number of clusters. The
Silhouette is a method for interpreting and validating the consistency
within clusters of data. The technique provides a succinct graphical
representation of how well each object lies within its cluster
(Rousseeuw, 1987). The classification results show that the Silhouette
value reaches its maximum when the input samples are grouped into 3
clusters. Therefore, we divide the input samples into 3 clusters.
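The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy two-dimensional samples, the helper names (`kmeans`, `mean_silhouette`), the deterministic farthest-point initialisation, and the candidate range of 2 to 4 clusters are all assumptions made for the example.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; farthest-point initialisation is an assumption."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from its nearest existing centre.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(max_iter):
        # Assign every point to its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centers[j])  # keep the old centre of an empty cluster
        if updated == centers:
            break
        centers = updated
    return labels

def mean_silhouette(points, labels):
    """Mean Silhouette value (Rousseeuw, 1987); needs at least two non-empty clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # by convention s(i) = 0 for a singleton cluster
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, points[j]) for j in idx) / len(idx)
            for lab, idx in clusters.items() if lab != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical toy samples forming three well-separated groups.
samples = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 10), (5, 11), (6, 10), (6, 11),
]
scores = {k: mean_silhouette(samples, kmeans(samples, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)  # number of clusters with the largest mean Silhouette
print(best_k, scores)
```

The number of clusters is taken as the candidate with the largest mean Silhouette value, which mirrors the rule stated in the text; real rainfall-runoff input vectors would replace `samples`.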
The PEK module
The PEK module is calibrated by the NSGA-II, LM, and cross-validation
methods. The parameters of the NSGA-II algorithm are set as follows:
population size = 90, total number of generations = 1000, crossover
probability = 0.85, and mutation probability = 0.15. The parameters of
the LM method are set as suggested by the MATLAB software. For the
early stopping strategy, approximately 3/4 of the calibration data are
used as the training set, and the remaining data are used as the
testing set. The maximum number of testing failures is set to 5. The
lower and upper bounds of K for the KNN algorithm are set to 1 and 300,
and K is optimized by the leave-one-out cross-validation method.
Because we divide the input samples into 3 clusters, we construct a PEK
module for each cluster of the PKEK model. The objective functions used
for the NSGA-II algorithm are the mean squared error (MSE), the mean
squared value of the network parameters (MS), and the number of hidden
layer neurons. The calibration results indicate that all the Pareto
fronts are evenly distributed, which means that the optimization
results are reasonable.
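The leave-one-out search for K can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: it treats KNN as a plain nearest-neighbour regression that averages the targets of the K closest samples, scores each K by its leave-one-out mean squared error, and uses hypothetical names (`knn_predict`, `loo_mse`, `select_k`) and a toy data set. Only the search bounds 1 and 300 come from the text.

```python
import math

def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training samples closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

def loo_mse(xs, ys, k):
    """Leave-one-out mean squared error of KNN regression for a given k."""
    total = 0.0
    for i in range(len(xs)):
        # Hold out sample i and predict it from all remaining samples.
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        total += (knn_predict(rest_x, rest_y, xs[i], k) - ys[i]) ** 2
    return total / len(xs)

def select_k(xs, ys, k_min=1, k_max=300):
    """Return the k in [k_min, k_max] with the smallest leave-one-out error."""
    k_max = min(k_max, len(xs) - 1)  # k cannot exceed the number of remaining samples
    errors = {k: loo_mse(xs, ys, k) for k in range(k_min, k_max + 1)}
    return min(errors, key=errors.get), errors

# Hypothetical toy data: 20 one-dimensional inputs with a linear target.
xs = [(float(i),) for i in range(20)]
ys = [2.0 * i for i in range(20)]
best_k, errors = select_k(xs, ys)
print(best_k, errors[best_k])
```

The upper bound is clipped to the number of remaining samples, so the bound of 300 only takes effect for data sets with more than 300 samples.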
Model performance comparison
Scatter plots comparison
We use scatter plots of the observed and simulated discharges and the
regression R values to inspect the overall performance of the models.
The scatter plots and the regression R values are shown in figure 2.
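As a reference for how such an R value can be computed, the sketch below assumes that the regression R is the Pearson correlation coefficient between the observed and simulated discharges; the text does not define it explicitly, so this is an assumption. The function name `regression_r` and the sample discharges are hypothetical.

```python
import math

def regression_r(observed, simulated):
    """Pearson correlation coefficient between observed and simulated series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov / math.sqrt(var_o * var_s)

# Hypothetical daily discharges (m^3/s), for illustration only.
observed = [12.0, 30.5, 110.0, 45.2, 18.7]
simulated = [14.1, 28.0, 98.5, 50.3, 17.9]
print(round(regression_r(observed, simulated), 4))
```

A value of R close to 1 means that the points in the scatter plot fall close to a straight line; it does not by itself show that this line is the 45-degree line, which is why the plots are inspected as well.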
As shown in figure 2, for the calibration period the PKEK model obtains
the better result (R = 0.9634), and the NU-PEK model obtains the worse
result (R = 0.9539). The data points of the PKEK model lie close to the
45-degree line, and their distribution is relatively uniform across
small, medium, and large discharge values. These results indicate that
the PKEK model generates the best result and the most stable
performance in the calibration period.
For the validation period, the PKEK model obtains the best result
(R = 0.9296), and the NU-PEK model obtains the worse result
(R = 0.8912). As shown in figure 2, the data points of the PKEK model
are distributed symmetrically around the 45-degree line. This indicates
that the simulated discharges of the PKEK model are not biased towards
larger or smaller values and that the model is very stable. However,
the simulated discharges of the NU-PEK model are larger than the
observed values, and this overestimation becomes more pronounced for
large discharge values.
After analyzing the accuracy in the calibration and validation periods,
we also compare the accuracy declination in this section, i.e. the
ratio of the validation R to the calibration R. The PKEK model obtains
the better accuracy declination value (0.9296/0.9634 = 0.9649), and the
NU-PEK model obtains the worse value (0.8912/0.9539 = 0.9343). The
comparison of the accuracy declination values shows that the PKEK model
outperforms the original NU-PEK model and generates better forecasts.
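The two ratios quoted above can be checked with a short calculation. The function name `accuracy_declination` is hypothetical; the R values are those reported in the text.

```python
def accuracy_declination(r_calibration, r_validation):
    """Ratio of validation R to calibration R; values nearer 1 mean less degradation."""
    return r_validation / r_calibration

pkek = accuracy_declination(0.9634, 0.9296)
nu_pek = accuracy_declination(0.9539, 0.8912)
print(round(pkek, 4), round(nu_pek, 4))
```

A ratio closer to 1 indicates that the accuracy drops less when moving from the calibration period to the validation period.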
The analysis of the three models shows that
the PKEK model can generate better overall