Tecnología y Ciencias del Agua, vol. VIII, no. 2, March-April 2017, pp. 51-60 • Kan et al., Daily streamflow simulation based on improved machine learning method • ISSN 2007-2422
Introduction
With the development of automatic hydrological measurement technology, hydrological data have become increasingly abundant. Machine learning methods are well suited to making full use of these large hydrological datasets. The machine learning methods most widely used in hydrological simulation are the artificial neural network (ANN) and the K-nearest neighbor (KNN) method (Li et al., 2014; Chen et al., 2017; Dong et al., 2015; Kan et al., 2016a, 2016b, 2016c, 2016d, 2016e; Lei et al., 2016; Li et al., 2016; Zuo et al., 2016). In previous work, we proposed an effective and efficient machine learning based streamflow simulation model, the NU-PEK model, which couples the ANN and KNN methods. It has been successfully applied to event-based hourly streamflow simulation. However, when applied to daily streamflow simulation, its performance degrades significantly.
To overcome the poor performance of the NU-PEK model in daily streamflow simulation, we propose an improved machine learning based streamflow simulation model, named PKEK. The PKEK model is composed of a partial mutual information (PMI) based input variable selection (IVS) module, a K-means clustering based input vector clustering module, an ensemble artificial neural network (ENN) based output estimation module, and a KNN based output error estimation module. The PKEK model and the previously proposed NU-PEK model were applied to the Chengcun catchment in China to compare model performance and stability. Simulation results indicated that the improved model has better accuracy and stability and is promising for daily streamflow simulation.
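The PMI-based input variable selection module is not detailed in this section. As a rough illustration of the underlying idea only, the sketch below ranks candidate rainfall lags by their plain (not partial) mutual information with discharge, estimated from equal-width histograms. The function names, the lag range, and the bin count are assumptions made for illustration; true PMI additionally conditions on the inputs that have already been selected, which this simplified sketch omits.

```python
import math
from collections import Counter


def discretize(values, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # a constant series falls into bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=8):
    """Histogram (plug-in) estimate of I(X;Y) in nats; plain MI, not PMI."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # Sum over occupied cells of p(x,y) * log(p(x,y) / (p(x) * p(y))).
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def rank_lags(rain, flow, max_lag=3, n_bins=8):
    """Rank rainfall lags 0..max_lag by MI with same-day flow, best first.

    Hypothetical helper: assumes rain and flow are equal-length daily series.
    """
    n = len(flow)
    target = flow[max_lag:]  # flow[t] for t = max_lag .. n-1
    scores = {}
    for lag in range(max_lag + 1):
        candidate = rain[max_lag - lag:n - lag]  # rain[t - lag], aligned with target
        scores[lag] = mutual_information(candidate, target, n_bins)
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `rank_lags(rain, flow)` on two daily series returns the candidate lags ordered by estimated MI. A PMI-based selector would instead pick lags one at a time, scoring each remaining candidate only on the information it adds beyond the lags already chosen.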
Watershed, hydrological and meteorological data utilized in this research
The daily streamflow simulation is carried out in the Chengcun catchment, which lies in the Qiantang River basin, Anhui province, China. The catchment is located in the subtropical monsoon region and is a typical humid catchment. Rainfall falls mainly from April to June. There are ten rainfall gauges in this area. Observed daily rainfall, evaporation, and average discharge data from 1986 to 1994 were used as the calibration data, while data from 1995 to 1999 were used as the validation data. The watershed map and the hydrological and meteorological characteristics of the Chengcun catchment are shown in figure 1 and table 1.
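A minimal sketch of the calibration/validation split by year described above, assuming the daily record is held as a list of `(date, rainfall, evaporation, discharge)` tuples. The record layout and the function name `split_by_year` are hypothetical; only the year ranges come from the text.

```python
from datetime import date, timedelta


def split_by_year(records, calib=(1986, 1994), valid=(1995, 1999)):
    """Split daily records into calibration and validation sets by year.

    records: iterable of (date, rainfall, evaporation, discharge) tuples
    (a hypothetical layout). Year bounds are inclusive.
    """
    records = list(records)  # allow generators to be traversed twice
    calibration = [r for r in records if calib[0] <= r[0].year <= calib[1]]
    validation = [r for r in records if valid[0] <= r[0].year <= valid[1]]
    return calibration, validation


if __name__ == "__main__":
    # Dummy record covering every day from 1986-01-01 to 1999-12-31.
    start = date(1986, 1, 1)
    n_days = (date(1999, 12, 31) - start).days + 1
    demo = [(start + timedelta(days=i), 0.0, 0.0, 0.0) for i in range(n_days)]
    cal, val = split_by_year(demo)
    print(len(cal), len(val))  # 3287 1826
```

Splitting by contiguous years, rather than by random sampling, keeps the validation period strictly after the calibration period, which matches the setup described in the text.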
Methodology
K-means clustering algorithm
K-means clustering is a well-known and widely used partitioning and clustering method (MATLAB, 2012; Grigorios & Aristidis, 2014; Kapageridis, 2015). It is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. K-means clustering aims to partition n observations into K clusters in which each observation belongs to the cluster with the nearest mean, which serves as the prototype of that cluster. The K-means method is usually calibrated (trained) by an iterative procedure that minimizes, over all clusters, the sum of distances from each object to its cluster centroid; in standard K-means these are squared Euclidean distances.
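The iterative procedure described above is commonly implemented as Lloyd's algorithm, which alternates an assignment step and a centroid update step and never increases the objective $J=\sum_{k=1}^{K}\sum_{x_i\in C_k}\lVert x_i-\mu_k\rVert^2$, where $C_k$ is the $k$-th cluster and $\mu_k$ its centroid. The following is a minimal pure-Python sketch, not the MATLAB implementation cited above. It assumes squared Euclidean distance and, for reproducibility, a deterministic farthest-point seeding; that seeding is a choice made here for illustration, and random or k-means++ seeding are common alternatives. Like any Lloyd-type method, it converges only to a local minimum of $J$.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Seed with the first point, then repeatedly add the point farthest
    from all centroids chosen so far (illustrative choice)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # no reassignment, so the centroids are stable
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def objective(points, centroids, labels):
    """Within-cluster sum of squared distances J."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
```

For example, on the eight points `(0,0), (0,1), (1,0), (1,1), (10,10), (10,11), (11,10), (11,11)` with `k=2`, the sketch separates the two groups of four and places the centroids at their means.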
PEK model
The PEK model is a hybrid data-driven model (Kan et al., 2015a, 2015b) composed of an ensemble artificial neural network (ENN) and the K-nearest neighbor (KNN) algorithm. The PEK approximator is a general purpose function approximator that can be applied to simulate the input-output mapping of multi-input single-output (MISO) systems. It was first proposed by Kan et al. (2015a, 2015b), where its detailed principle can be found.
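The detailed PEK formulation is given in Kan et al. (2015a, 2015b) and is not reproduced here. Purely as an illustration of how an ensemble output can be combined with a KNN-based error estimate, the sketch below averages the outputs of several ensemble members and then adds the mean calibration residual of the k nearest calibration inputs. The member models are stand-in callables rather than trained ANNs, and the names `ensemble_predict`, `knn_error`, `pek_predict` and the default `k=3` are hypothetical; the actual PEK model may weight members or neighbors differently.

```python
import math
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def ensemble_predict(members: List[Model], x: Sequence[float]) -> float:
    """ENN-style stage (assumed): simple average of the members' outputs."""
    return sum(m(x) for m in members) / len(members)


def knn_error(x, train_inputs, train_errors, k):
    """KNN stage: mean residual of the k calibration inputs nearest to x
    (Euclidean distance)."""
    ranked = sorted(zip((math.dist(x, xi) for xi in train_inputs), train_errors))
    nearest = ranked[:k]
    return sum(err for _, err in nearest) / len(nearest)


def pek_predict(members, train_inputs, train_targets, x, k=3):
    """Ensemble output corrected by a KNN estimate of its error (hypothetical)."""
    # Residuals of the ensemble on the calibration set: observed minus simulated.
    residuals = [y - ensemble_predict(members, xi)
                 for xi, y in zip(train_inputs, train_targets)]
    return ensemble_predict(members, x) + knn_error(x, train_inputs, residuals, k)
```

The design point the sketch tries to convey is that the KNN stage corrects errors locally: an input that resembles calibration samples where the ensemble overestimated receives a negative correction, while one resembling samples where it underestimated receives a positive one.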