## Related terms:

- Exchange Rate
- Hidden Markov Models
- Energy Demand
- Time Series
- Foreign Exchange
- Artificial Intelligence
- Neural Network
- Traffic Sign

## Metro traffic flow monitoring and passenger guidance

Hui Liu, ... Ye Li, in Smart Metro Station Systems, 2022

### 2.2.2 Principles of time series prediction

By studying the nonlinear correlation between the data at a certain moment and its historical data, the time series prediction method can summarize the characteristics of the fluctuation law and realize the prediction of some future time points [6]. The key technology used in this study is to realize the prediction of traffic flow time series. The original traffic flow fluctuation data is one-dimensional data, and there must be corresponding observed values at a certain time point [7].

In order to meet the modeling requirements of the supervised learning algorithm, the original one-dimensional traffic flow time series data should be transformed into the format of multidimensional input feature vectors and model output sample labels. At present, scholars usually adopt the strategy of “sliding window” to transform one-dimensional data into two-dimensional data. This method can effectively transform a one-dimensional traffic flow time series into a two-dimensional machine learning data form [8].

The main steps of this method are as follows [9]:

Step 1: Select time *T*, collect the *N* historical values from time *T* − *N* + 1 to time *T* in sequence, and set them as the feature vector. *N* is the length of the feature vector, that is, the dimension of the input.

Step 2: Set the observed values from *T* + 1 to *T* + *M* as the label to construct the output vector. *M* is the number of output variables and represents the predicted step size.

The basic flow chart of data format transformation based on the “sliding window” method is shown in Fig. 2.4.
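
The two steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `sliding_window`, its parameters `n_inputs` (*N*) and `n_outputs` (*M*), and the passenger counts are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def sliding_window(
    series: Sequence[float], n_inputs: int, n_outputs: int
) -> Tuple[List[List[float]], List[List[float]]]:
    """Turn a 1-D series into (feature vector, label vector) pairs.

    Each feature vector holds N consecutive values ending at time T;
    each label vector holds the next M values (T+1 .. T+M).
    """
    if n_inputs < 1 or n_outputs < 1:
        raise ValueError("n_inputs and n_outputs must be positive")
    features, labels = [], []
    # t is the index of the last value in the input window (time T).
    for t in range(n_inputs - 1, len(series) - n_outputs):
        features.append(list(series[t - n_inputs + 1 : t + 1]))
        labels.append(list(series[t + 1 : t + 1 + n_outputs]))
    return features, labels


flow = [10, 12, 15, 14, 18, 21, 19]  # made-up passenger counts
X, y = sliding_window(flow, n_inputs=3, n_outputs=2)
# X -> [[10, 12, 15], [12, 15, 14], [15, 14, 18]]
# y -> [[14, 18], [18, 21], [21, 19]]
```

Each row of `X` paired with the matching row of `y` forms one supervised training sample, so a series of length 7 yields three samples for *N* = 3 and *M* = 2.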


URL:

https://www.sciencedirect.com/science/article/pii/B9780323905886000020

## Forecasting

Elena Mocanu, ... Madeleine Gibescu, in Local Electricity Markets, 2021

### 14.1 Introduction

As prediction developed, different subfields were created. The electrical forecasting problem can be regarded as a nonlinear time series prediction problem depending on many complex factors since it is required at various aggregation levels and high resolution [1]. Furthermore, the electrical forecasting accuracy and the resulting errors will be reflected in the performance of the local energy market. In this context, a variety of forecasts are necessary at national level, regional level, or specific to the type of consumers (residential, industrial). Worldwide, residential buildings have one of the highest energy consumption rates: on average, they consume around 40% of global primary energy and contribute over 30% of CO₂ emissions. Within Europe, residential energy usage grows at an annual rate of 1.5%, which is higher than the growth rate of energy consumption in the industrial and transportation sectors [2].

Consequently, the current growth of urbanization and electricity demand introduces new requirements for future power grids and keeps the electricity market under pressure. To satisfy these demands, future power grids will need to predict, learn, schedule, make decisions, and monitor local energy production and consumption. Improving the flow of energy, in turn, requires energy predictions over various time horizons [3,4].

As both the aggregation level and the prediction horizon keep decreasing, the fluctuations in the electrical patterns increase. To solve these challenging problems, various time series and machine learning (ML) approaches have been proposed in the literature. These range from heuristic-based approaches to mathematically grounded ones such as those residing in the realm of ML.

When analyzing the local energy market impact, it is imperative not only to predict the electrical pattern but also to consider a deeper range of factors. Decomposing demand and price forecasting in this way not only helps identify consumption and generation trends, detect faults, or predict savings, but also allows for better decision-making strategies to control and schedule loads to off-peak times [3,4]. The choice of a high-performance method depends on the special characteristics that electrical patterns have.


URL:

https://www.sciencedirect.com/science/article/pii/B9780128200742000071

## Personalized mobility services and AI

George Dimitrakopoulos, ... Iraklis Varlamis, in The Future of Intelligent Transport Systems, 2020

### 20.1 Artificial intelligence in transportation

*Artificial intelligence* (AI) is a simulation of human intelligence processes by machines and, depending on the problem complexity, aims either to support or to replace human intelligence. The increasing interest of researchers and companies in artificial intelligence solutions brought AI-driven development among the top-10 strategic technology trends for 2019 (Gartner’s top-10 strategic technology trends for 2019: https://www.gartner.com/smarterwithgartner/gartner-top-10-strategic-technology-trends-for-2019/). Although AI is not new, and its history begins back in the 1950s, there have been dramatic changes in the last 10 years, mainly due to the rise in *computing power* and *available data*. These two components are the main enablers that allow AI algorithms to control smart services and automate complex operations.

Almost all the major cloud vendors invested in AI in order to create a new market of AI solutions as a service, with applications in many domains.

IBM launched Watson (IBM Watson AI platform: https://www.ibm.com/watson) as a service in an attempt to attract new customers and allow third-party companies to develop AI solutions. IBM Watson started as a question-answering system that was able to understand natural language and respond with facts from a huge knowledge base, and it evolved into a suite of enterprise-ready AI services, applications, and tools. According to IBM, Watson allows companies to accelerate research and discovery, enrich their interactions, anticipate and preempt disruptions, recommend with confidence, scale expertise and learning, and detect liabilities and mitigate risk; consequently, it frees employees from repetitive tasks and empowers them to focus on the high-value work that is critical for an enterprise. Question-answering systems that extract facts from company documents, virtual assistants that respond to online customers, chatbots, and smart readers of complex documents (e.g., contracts) are among the tools offered by the Watson suite. In the domain of mobility, IBM Watson provides the AI technology behind Olli, the autonomous vehicle of Local Motors, a fully electric vehicle that can be 3D printed and can hold up to 12 people (English, 2016).

Microsoft’s Azure AI (Microsoft Azure AI platform, https://gallery.azure.ai/) offers a suite of algorithms, solution templates, reference architectures, and design patterns that allow companies to develop custom AI solutions to their problems. Several experiments are showcased on its website, including image classification and recognition, outlier detection, and time-series prediction, as well as an example that uses regression algorithms to score parking availability in the city of Birmingham, UK, using open data. The domains of application include, among others, the retail market (sales forecasting, predicting customer churn, and pricing models), manufacturing (predicting equipment maintenance, forecasting energy prices), banking (predicting credit risk and monitoring for online fraud), and healthcare (detecting disease and predicting hospital readmissions).

Salesforce’s solution to smart CRM is called Einstein AI (Salesforce Einstein platform https://www.salesforce.com/products/einstein/overview/). The platform uses machine learning and AI to support faster decision making for managers, increase the productivity of employees, and deliver personalized recommendations that increase customers’ satisfaction. With a combination of the AI tools that are available as a service, companies can build solutions that discover significant patterns and trends in sales data, understand their customers by learning which channels, messages, and content they prefer, and give every employee instant access to smart insights and AI-powered business analytics.

C3 AI develops solutions for the operators of transportation, logistics, and travel companies, which comprise IoT analytics and predictive machine-learning models. For example, the C3 predictive maintenance solution, which is a part of the C3 AI suite, provides estimations related to the risk of failure of various vehicle equipment and recommends maintenance actions that can prolong their safe operation. It is, in essence, a supportive tool for fleet owners with monitoring and preventing capabilities that guarantee optimized fleet performance. In addition, C3 AI provides data analytic services for sales and demand data and allows enhanced demand forecasting, and increased customer service. The full list of applications (C3 AI solutions, https://c3.ai/products/c3-applications/) comprises sensor health algorithms, inventory optimization, facility energy management, supply network, and CRM solutions.

Google AI (Google AI platform https://ai.google) brings together hardware, software, and AI, and makes devices faster, smarter, and more useful. It offers Cloud AutoML, an online solution that allows developers, researchers, and businesses with limited AI expertise to build their own custom models. Google’s parent company Alphabet runs several transport-relevant projects and encompasses a host of other subsidiaries called “Other Bets,” such as (1) Waymo, the self-driving car initiative, (2) Sidewalk Labs, the urban innovation organization, and (3) Project Wing, which is developing an autonomous delivery drone service. The combination of cloud computing and IoT-edge computing (Google’s IOT edge computing https://cloud.google.com/iot-edge/) allows the training of complex models on the cloud and the deployment of the trained models at the edge for faster real-time prediction. Thus intelligent technologies that prevent and avoid collisions, detect driver’s distraction and provide alerts, collect and analyze traffic information, and give route alternatives can be easily integrated with existing infrastructure and embedded on existing and future vehicles using IoT devices.

Since 2015, Nvidia has invested in hardware acceleration of deep learning architectures to provide autonomous-car and driver-assistance functionality (Oh & Yoon, 2019). The Nvidia Drive AGX open autonomous-vehicle computing platform collects data from various sources such as cameras, lidars, ultrasonic sensors, and radars. It then processes the data in order to obtain a 360-degree understanding of the surrounding environment in real time, locate the vehicle on the map and within the surrounding area, and plan the next movement safely. This high-performance computing platform is energy-efficient and supports the development of safe and highly responsive self-driving models. TuSimple (TuSimple website https://www.tusimple.com/) is a Chinese startup that uses Nvidia GPUs and the cuDNN CUDA deep neural-network library to power its driverless trucks, which promise to autonomously transport products on a depot-to-depot basis.


URL:

https://www.sciencedirect.com/science/article/pii/B9780128182819000206

## Traffic congestion detection: data-based techniques

Fouzi Harrou, ... Ying Sun, in Road Traffic Modeling and Management, 2022

### 5.4.1 Features extraction with PCA

Detecting anomalies based on PCA has gained significant attention in the last decade. This is mainly due to the flexibility and capability of PCA in modeling multivariate data without the need for deep physical knowledge on the process, and the only information needed is historical measurements characterizing the normal process operation [44].

PCA is possibly one of the most frequently applied methods for dimensionality reduction. The introduction of PCA can be traced back to Pearson (1901) and Hotelling (1933). It has become a popular feature extraction technique that works by relating process variables. In this framework, PCA can represent a process efficiently in a reduced subspace via PCA scores, which are linear combinations of the original variables. As a matter of fact, PCA has been found useful in various contexts including detection of outliers, denoising [49], data smoothing, time series prediction [50], and process monitoring.

PCA is performed based on one data set containing all the process variables concerned with the problem (i.e., input and output variables). We will denote by *Y* the whole data set for consistency with the following chapters, whereas the notation *X* is often used in the literature. Let ${\mathbf{Y}}^{b}={[{\mathbf{y}}_{1}^{T},\dots ,{\mathbf{y}}_{n}^{T}]}^{T}\in {R}^{n\times m}$ denote traffic measurements collected from a detector station containing *n* observations and *m* variables. For instance, traffic variables may include flow, speed, occupancy, truck flow, vehicle mile travel, and vehicle hour travel. It is worth pointing out that the traffic data are recorded by different detectors during the absence of traffic congestion. This allows us to build a reference PCA model reflecting the congestion-free behavior.

Before applying the PCA-based approach, an important step is to preprocess the data by removing the data mean and scaling the data variance to unity. In other words, the data should be autoscaled. This is mainly because the process variables collected from different sensors may have different scales. Variables with large variance can mask variables with small variance and thus make their comparison challenging. Autoscaling facilitates comparison and data analysis and bypasses the problem of variables that have distinct means and standard deviations and are expressed in diverse units. Hence each autoscaled variable ${\mathbf{y}}_{j}$ ($j=1,\dots ,m$) is expressed as

(5.16) $\mathbf{y}_{j}=\frac{\mathbf{y}_{j}^{b}-\mu_{j}}{\sigma_{j}},$

where $\mathbf{y}_{j}^{b}$ is the *j*th column of the input matrix $\mathbf{Y}^{b}$, and $\mu_{j}$ denotes the sample average of all observations of the variable $\mathbf{y}_{j}^{b}$, which can be written as

(5.17) $\mu_{j}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{y}_{j}^{b}(i),$

and $\sigma_{j}$ refers to the sample standard deviation of the variable $\mathbf{y}_{j}^{b}$:

(5.18) $\sigma_{j}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left(\mathbf{y}_{j}^{b}(i)-\mu_{j}\right)}^{2}}.$

The autoscaled matrix **Y** is written as

$\mathbf{Y}={\begin{pmatrix} y_{1,1} & y_{1,2} & \dots & y_{1,m}\\ y_{2,1} & y_{2,2} & \dots & y_{2,m}\\ \vdots & \vdots & \ddots & \vdots\\ y_{n,1} & y_{n,2} & \dots & y_{n,m}\end{pmatrix}}_{n\times m}.$
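
As a minimal sketch of the autoscaling step in Eqs. (5.16)–(5.18), the following NumPy snippet scales each column to zero mean and unit variance. The function name `autoscale` and the measurement values are illustrative assumptions; `ddof=0` is chosen so that the standard deviation uses the 1/*n* normalisation of Eq. (5.18).

```python
import numpy as np


def autoscale(Y_raw: np.ndarray):
    """Column-wise autoscaling in the sense of Eqs. (5.16)-(5.18)."""
    mu = Y_raw.mean(axis=0)            # Eq. (5.17): column means
    sigma = Y_raw.std(axis=0, ddof=0)  # Eq. (5.18): 1/n normalisation
    if np.any(sigma == 0):
        raise ValueError("a constant column cannot be scaled to unit variance")
    return (Y_raw - mu) / sigma, mu, sigma


# Made-up measurements: columns are flow (veh/h), speed (km/h), occupancy (%).
Y_raw = np.array([
    [1200.0, 95.0, 8.0],
    [1500.0, 90.0, 11.0],
    [1800.0, 84.0, 15.0],
    [1350.0, 93.0, 9.0],
])
Y, mu, sigma = autoscale(Y_raw)
```

Returning `mu` and `sigma` alongside the scaled matrix lets new measurements be scaled later with the statistics of the congestion-free reference data rather than with their own.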

The aim of the modeling stage is to model nominal traffic evolution based on historical congestion-free (reference) data. These data describe traffic that evolves consistently and acceptably, without accidents or congestion, and for which only good-quality measurements have been obtained. Missing data should be excluded or imputed using dedicated procedures.

Essentially, PCA is used to reduce the dimensionality of the multivariate correlated data by compressing the original data **Y** into a lower-dimensional subspace of dimension $l<\operatorname{rank}(\mathbf{Y})$ such that most of the data variability is preserved by a smaller number of principal components (PCs). The latter are linear combinations of the original variables and are mutually orthogonal. PCA can be performed using different procedures, such as singular value decomposition (SVD) and the nonlinear iterative partial least squares (NIPALS) algorithm [51]. With either procedure, the data matrix **Y** is split into two complementary parts, the approximated matrix $\hat{\mathbf{Y}}$ and a residual matrix **E** (Fig. 5.2):

(5.19) $\mathbf{Y}=\mathbf{T}\mathbf{W}^{T}=\sum_{i=1}^{l}\mathbf{t}_{i}\mathbf{w}_{i}^{T}+\sum_{i=l+1}^{m}\mathbf{t}_{i}\mathbf{w}_{i}^{T}=\hat{\mathbf{Y}}+\mathbf{E},$

where $\mathbf{T}\in \mathbb{R}^{n\times m}$ refers to the score matrix containing the PCs. As discussed above, the PCs are uncorrelated variables defined as linear combinations of the original variables. They capture the variability in the original data in descending order, so the first PC explains the most variance. Consequently, in the presence of highly cross-correlated data **Y**, only a few components, *l*, are needed to preserve the relevant variability in the original data. $\mathbf{W}\in \mathbb{R}^{m\times m}$ denotes the loading matrix, whose columns are the eigenvectors that provide the coefficients of the linear transformation. The first *l* eigenvectors represent the directions of largest variability and span the new *l*-dimensional PC subspace.

Singular value decomposition of the sample covariance matrix **C** of the data **Y** is often used to calculate loadings of the data matrix **Y**:

(5.20) $\mathbf{C}=\frac{1}{n-1}\mathbf{Y}^{T}\mathbf{Y}=\mathbf{W}\mathrm{\Lambda}\mathbf{W}^{T}=\begin{bmatrix}\mathbf{w}_{1} & \cdots & \mathbf{w}_{m}\end{bmatrix}\begin{bmatrix}\lambda_{1} & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_{m}\end{bmatrix}\begin{bmatrix}\mathbf{w}_{1}^{T}\\ \vdots\\ \mathbf{w}_{m}^{T}\end{bmatrix}$

with $\mathbf{W}\mathbf{W}^{T}=\mathbf{W}^{T}\mathbf{W}=\mathbf{I}_{m}$, where Λ denotes a diagonal matrix whose diagonal terms are the eigenvalues of **C** arranged in descending order ($\lambda_{1}\ge \lambda_{2}\ge \dots \ge \lambda_{m}$). It is worth pointing out that the eigenvalues $\lambda_{i}$ are equal to the variances of the PCs $\mathbf{t}_{i}$ (i.e., $\sigma_{i}^{2}=\lambda_{i}$) [52]. Mathematically, the variance of $\mathbf{t}_{i}$ is obtained as $Var(\mathbf{w}_{i}^{T}\mathbf{Y})=\mathbf{w}_{i}^{T}\mathbf{C}\mathbf{w}_{i}=\lambda_{i}$. Indeed, because $\mathbf{w}_{i}$ is the unit-norm eigenvector associated with $\lambda_{i}$, we have $\mathbf{C}\mathbf{w}_{i}=\lambda_{i}\mathbf{w}_{i}$, and hence $\mathbf{w}_{i}^{T}\mathbf{C}\mathbf{w}_{i}=\lambda_{i}\mathbf{w}_{i}^{T}\mathbf{w}_{i}=\lambda_{i}$. Another important property is that distinct PCs are uncorrelated: for $i\ne j$, $Cov(\mathbf{w}_{i}^{T}\mathbf{Y},\mathbf{w}_{j}^{T}\mathbf{Y})=\mathbf{w}_{i}^{T}\mathbf{C}\mathbf{w}_{j}=\lambda_{j}\mathbf{w}_{i}^{T}\mathbf{w}_{j}=0$.
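
The properties stated around Eq. (5.20) can be checked numerically. The sketch below is illustrative only: it builds synthetic cross-correlated data (the mixing matrix, sample size, and noise level are arbitrary assumptions), eigen-decomposes the sample covariance with `numpy.linalg.eigh`, and forms the score matrix so that the covariance of the scores can be compared with Λ.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic cross-correlated data: n = 500 observations, m = 3 variables.
latent = rng.normal(size=(500, 2))
mixing = np.array([[1.0, 0.5, 0.2], [0.0, 1.0, 0.7]])
Y_raw = latent @ mixing + 0.05 * rng.normal(size=(500, 3))
Y = (Y_raw - Y_raw.mean(axis=0)) / Y_raw.std(axis=0)  # autoscale

n = Y.shape[0]
C = (Y.T @ Y) / (n - 1)                 # sample covariance, Eq. (5.20)

# eigh returns eigenvalues in ascending order, so reorder them.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]                    # lambda_1 >= ... >= lambda_m
W = eigvecs[:, order]                   # loading matrix, one eigenvector per column

T = Y @ W                               # score matrix (the PCs)
score_cov = (T.T @ T) / (n - 1)         # expected to equal diag(lam)
```

The diagonal of `score_cov` reproduces the eigenvalues (the variance of each PC), and its off-diagonal entries vanish up to rounding, which is the uncorrelatedness property.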

The matrices **W** and Λ can be partitioned as follows:

(5.21) $\mathrm{\Lambda}=\begin{pmatrix}\hat{\mathrm{\Lambda}}_{l\times l} & 0\\ 0 & \tilde{\mathrm{\Lambda}}_{(m-l)\times (m-l)}\end{pmatrix},$

(5.22) $\mathbf{W}=\begin{pmatrix}\hat{\mathbf{W}}_{m\times l} & \tilde{\mathbf{W}}_{m\times (m-l)}\end{pmatrix}.$

Because the columns of **W** are orthonormal, the two blocks satisfy

$\begin{bmatrix}\hat{\mathbf{W}}_{m\times l}^{T}\\ \tilde{\mathbf{W}}_{m\times (m-l)}^{T}\end{bmatrix}\begin{bmatrix}\hat{\mathbf{W}}_{m\times l} & \tilde{\mathbf{W}}_{m\times (m-l)}\end{bmatrix}=\begin{bmatrix}\mathbf{I}_{l\times l} & 0\\ 0 & \mathbf{I}_{(m-l)\times (m-l)}\end{bmatrix}.$

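With this partition, Eq. (5.19) can be written as

(5.23) $\mathbf{Y}=\hat{\mathbf{T}}_{l}\hat{\mathbf{W}}_{l}^{T}+\tilde{\mathbf{T}}_{m-l}\tilde{\mathbf{W}}_{m-l}^{T}$

with $\hat{\mathbf{Y}}=\mathbf{Y}\hat{\mathbf{C}}_{l}$ and $\tilde{\mathbf{Y}}=\mathbf{Y}\tilde{\mathbf{C}}_{m-l}$. Since $\mathbf{T}=\mathbf{Y}\mathbf{W}$, the projection matrices are $\hat{\mathbf{C}}_{l}=\hat{\mathbf{W}}_{l}\hat{\mathbf{W}}_{l}^{T}$ and $\tilde{\mathbf{C}}_{m-l}=\tilde{\mathbf{W}}_{m-l}\tilde{\mathbf{W}}_{m-l}^{T}=\mathbf{I}_{m}-\hat{\mathbf{C}}_{l}$.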

A simple geometrical illustration of the basic PCA framework is presented in Fig. 5.2. When the process variables are cross-correlated, PCA decomposes the measurement space ${\mathbb{R}}^{m}$ into a PC subspace ${S}_{\mathrm{PCs}}$ of dimension $l$, where the important variations take place, and a residual subspace ${S}_{\mathrm{res}}$ of dimension $m-l$, where outliers and errors can appear. The approximated data $\hat{\mathbf{Y}}=\mathbf{Y}{\hat{\mathbf{C}}}_{l}$ are obtained by projecting the original data onto ${S}_{\mathrm{PCs}}$, and the residuals **E** are computed by projecting **Y** onto ${S}_{\mathrm{res}}$ (i.e., $\mathbf{E}=\mathbf{Y}{\tilde{\mathbf{C}}}_{m-l}$).
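
The split into approximated data and residuals can be sketched as follows. The example assumes that the projection matrices are built from the loadings in the usual way, that is, $\hat{\mathbf{C}}_{l}=\hat{\mathbf{W}}\hat{\mathbf{W}}^{T}$ and $\tilde{\mathbf{C}}_{m-l}=\tilde{\mathbf{W}}\tilde{\mathbf{W}}^{T}$; the synthetic data and the choice *l* = 2 are likewise assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))
Y_raw = latent @ np.array([[1.0, 0.8, 0.3], [0.2, 0.5, 1.0]])
Y_raw += 0.05 * rng.normal(size=Y_raw.shape)
Y = (Y_raw - Y_raw.mean(axis=0)) / Y_raw.std(axis=0)   # autoscale

eigvals, eigvecs = np.linalg.eigh(Y.T @ Y / (len(Y) - 1))
W = eigvecs[:, np.argsort(eigvals)[::-1]]   # columns sorted by decreasing eigenvalue

l = 2                                        # number of retained PCs (assumed)
W_hat, W_tilde = W[:, :l], W[:, l:]
C_hat = W_hat @ W_hat.T                      # projector onto the PC subspace
C_tilde = W_tilde @ W_tilde.T                # projector onto the residual subspace

Y_hat = Y @ C_hat                            # approximated data
E = Y @ C_tilde                              # residuals
```

Because the two projectors are complementary, `Y_hat + E` recovers `Y` exactly (up to rounding), and every row of `E` is orthogonal to every row of `Y_hat`.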

To show the principle of principal components geometrically, assume for simplicity that we have three cross-correlated variables (Fig. 5.3). Fig. 5.3 indicates that two principal components are sufficient to explain the covariance structure in these three-dimensional data. As shown in Fig. 5.3, the first principal component, which captures the maximum variance in the original data, is represented by a line through the direction of largest variance in the data; the second component describes the largest variance omitted by the first component and is orthogonal to the first PC. Thus the data can be summarized in a reduced-dimensional space.

It is worth pointing out that the residual matrix generated by PCA is an important indicator for sensing traffic congestions. Generally, anomalous traffic detection can be done by evaluating the residuals using multivariate monitoring schemes.
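
One common way to evaluate the residuals is the squared prediction error (SPE, also known as the Q statistic). The passage above does not name a specific monitoring scheme, so the sketch below is an assumption on that point, as are the synthetic data, the choice *l* = 2, and the empirical 99% threshold taken from the reference data.

```python
import numpy as np

rng = np.random.default_rng(2)
mixing = np.array([[1.0, 0.8, 0.3], [0.2, 0.5, 1.0]])


def simulate(n_obs: int) -> np.ndarray:
    """Synthetic 'congestion-free' data driven by two latent factors."""
    latent = rng.normal(size=(n_obs, 2))
    return latent @ mixing + 0.05 * rng.normal(size=(n_obs, 3))


train = simulate(1000)
mu, sigma = train.mean(axis=0), train.std(axis=0)
Y = (train - mu) / sigma

eigvals, eigvecs = np.linalg.eigh(Y.T @ Y / (len(Y) - 1))
W_tilde = eigvecs[:, np.argsort(eigvals)[::-1]][:, 2:]   # residual loadings, l = 2


def spe(samples: np.ndarray) -> np.ndarray:
    """Squared norm of each sample's residual, scaled with training statistics."""
    residual = ((samples - mu) / sigma) @ W_tilde @ W_tilde.T
    return np.sum(residual**2, axis=1)


threshold = np.quantile(spe(train), 0.99)    # empirical limit (assumption)

# A sample that breaks the correlation structure learned from the reference data.
anomaly = train[:1] + np.array([[0.0, 3.0, -3.0]])
flagged = spe(anomaly) > threshold
```

A sample is flagged when its SPE exceeds the limit learned from congestion-free data; in practice the limit would be derived from a theoretical or nonparametric distribution of the residual statistic rather than a raw quantile.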


URL:

https://www.sciencedirect.com/science/article/pii/B9780128234327000100

## Soft computing hybrids for FOREX rate prediction: A comprehensive review

Dadabada Pradeepkumar, Vadlamani Ravi, in , 2018

### 1 Introduction

The Foreign Exchange (FOREX) rate is the price of one currency paid in terms of another. It is the most important price in any country’s economic system and is a measure of the economic health of that country. Exchange rates have a tremendous influence on the trade relationship of one country with another, which, in turn, affects the common man’s standard of living. A country with a lower currency rate has to make its exports very cheap and its imports very expensive in the foreign currency market, thereby affecting its economy. It can revisit its economic policies and change them suitably based on an accurate prediction of the FOREX rate. These changes help in maintaining trade relationships properly, which, in turn, makes its economy stronger. Thus, the prediction of the FOREX rate is paramount, and it should never be underestimated (Hoag and Hoag, 2002).

A financial time series is a collection of chronologically recorded observations of one or more financial variables. For example, the daily FOREX rate of a currency pair is a univariate financial time series. Compared to other time series, financial time series are intrinsically non-stationary and chaotic (Yao and Tan, 2000). A time series is said to be chaotic if and only if it is nonlinear, deterministic and sensitive to initial conditions (Dhanya and Nagesh Kumar, 2010). The prediction of a chaotic time series consists of predicting the future behavior of the chaotic system from the current and past states of that system.

In addition to these, financial time series prediction is a highly complicated task as a financial time series exhibits the following characteristics:

1. Financial time series often behave nearly like a random-walk process, making the prediction almost impossible (from a theoretical point of view) (Hellstrom and Holmstrom, 1998).
2. Financial time series are usually very noisy, i.e., there is a large amount of random (unpredictable) day-to-day variation (Magdon-Ismail et al., 1998).
3. Statistical properties of the financial time series are different at different points in time as the process is time-varying (Hellstrom and Holmstrom, 1998).

Time series forecasting involves collecting historical observations of a variable, analyzing them to develop a model capturing the underlying process of data generation, and utilizing that model to predict the future. Whenever a single model fails to capture all characteristics of a time series, and a set of models, each in stand-alone mode, cannot find the true data-generating process, it is better to build hybrid models (Terui and Van Dijk, 2002). A hybrid is either homogeneous or heterogeneous depending on whether it comprises only nonlinear models or a combination of linear and nonlinear models (Taskaya-Temizel and Casey, 2005).

Several researchers demonstrated that hybrid or ensemble models do yield better results compared to the constituent stand-alone models. Reid (1968) and Bates and Granger (1969) laid the foundation for proposing various hybrid time series models. Bates and Granger (1969) concluded that suitably combining different forecasting models can yield better predictions than the stand-alone models. Similarly, Makridakis et al. (1982) reported that a hybrid or an ensemble of several models is commonly needed to improve forecasting accuracy. Pelikan and De Groot (1992), and Ginzburg and Horn (1993) reported that the combination of several artificial neural networks (ANNs) improved time series forecasting accuracy. An excellent comprehensive review of various hybrid prediction models and an annotated bibliography can be found in Clemen (1989). Usually, a good hybrid prediction model can:

1. Improve the forecasting performance.
2. Overcome deficiencies of the constituent stand-alone models.
3. Reduce the model uncertainty (Chatfield, 1996).
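
To make the idea of combining forecasts concrete, here is a minimal sketch of one simple combination rule: weighting each model inversely to its past mean squared error. It illustrates the general principle only and is not the procedure of any paper cited above; the synthetic series, the two hypothetical forecasters, and their error levels are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
truth = np.cumsum(rng.normal(size=n))            # synthetic target series

# Two hypothetical stand-alone forecasters with independent errors.
forecast_a = truth + rng.normal(scale=1.0, size=n)
forecast_b = truth + rng.normal(scale=2.0, size=n)

half = n // 2
mse_a = np.mean((forecast_a[:half] - truth[:half]) ** 2)
mse_b = np.mean((forecast_b[:half] - truth[:half]) ** 2)

# Weights inversely proportional to each model's past mean squared error.
w_a = (1 / mse_a) / (1 / mse_a + 1 / mse_b)
w_b = 1.0 - w_a
combined = w_a * forecast_a + w_b * forecast_b


def holdout_mse(forecast: np.ndarray) -> float:
    """Mean squared error on the held-out second half of the series."""
    return float(np.mean((forecast[half:] - truth[half:]) ** 2))
```

The weights are estimated on the first half of the data and evaluated on the second half, so the comparison of `holdout_mse(combined)` with the two stand-alone errors is not biased by fitting and testing on the same observations.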

Soft Computing (SC) is a collection of various computational techniques from computer science and some engineering disciplines. It is aimed at exploiting the tolerance for imprecision, uncertainty, partial truth, and approximation to achieve tractability, robustness and low solution cost. The idea of hybridizing two or more machine learning techniques emanated from the fact that each of them, in stand-alone mode, has merits and demerits. Once hybridized, their demerits would be nullified, while their merits would be amplified (Zadeh, 1994). SC is founded on the fact that the human mind can store and process information that is commonly unclear, ambivalent and lacking in categorization. It can model and analyze complex systems arising in bioscience, medicine, the humanities, management sciences, and all fields of science and engineering (Galindo, 2008).

Table 1 presents the constituents of SC, such as Artificial Neural Network (ANN), Evolutionary Computation (EC), Fuzzy Logic (FL), Support Vector Machine (SVM) and Chaos Theory, that are used in predicting FOREX rate time series, along with their respective merits and demerits. The constituents of SC, in general, include fuzzy computing, neural computing, evolutionary computing, SVM, decision trees, chaos, probabilistic reasoning and rough sets. Hybrid systems (or hybrids) include neuro-fuzzy, neuro-genetic, neuro-fuzzy-genetic, fuzzy-neural, and fuzzy-genetic systems, to mention a few.

Table 1. Constituents of soft computing used in hybrids.

Constituent of soft computing | Basic idea | Merits | Demerits |
---|---|---|---|
Artificial Neural Network (ANN) | Capable of learning patterns from examples using various algorithms mimicking human learning | Suitable for diverse tasks of classification, clustering, forecasting, optimization and function approximation | Optimal parameter combination of a training algorithm is found by fine tuning. A lot of training data and time are needed. |
Evolutionary Computation (EC) | Imitates Darwin’s principles of evolution in order to solve nonlinear, non-convex global optimization problems | Capable of finding the near global optimal solution of a nonlinear, non-convex function without getting entrapped in local minima | Convergence process is slow, while convergence to the global optimal solution is not guaranteed unless improved by a suitable direct search method. |
Fuzzy Logic (FL) | Fuzzy sets can model the imprecision and the ambiguity in the data. FL brings the human experiential knowledge into the model via suitable fuzzy mathematics. | Capable of deriving human understandable fuzzy ‘if-then’ rules; it has low computational complexity | Often, the selection of a membership function is not scientific and unique. |
Support Vector Machine (SVM) | Finds a hyperplane which separates the d-dimensional data perfectly into two classes. | Training is relatively easy and scales relatively well to high-dimensional data and yields the global optimal solution. | Need to choose a good kernel function |
Chaos Theory | Characterizes a dynamical system by transforming it into its equivalent phase-space. | It models underlying deterministic complex behavior in a system. | It is not clear how much data are required to construct the phase space set, and it is susceptible to initial conditions. |

It is interesting to note that even though some past reviews (Bahrammirzaee, 2010; Cavalcante et al., 2016; Huang et al., 2004; Li and Ma, 2010; Mochon et al., 2008; Yu et al., 2007a) covered the use of intelligent techniques for financial time series prediction, to the best of our knowledge, no study has performed a comprehensive review of the use of hybrid intelligent techniques, also known as soft computing techniques, exclusively for the problem of FOREX rate prediction. Therefore, this paper attempts to fill that gap in the literature by presenting a comprehensive review of various Soft Computing hybrid forecasting models for FOREX rate prediction that appeared during 1998–2017. This is necessitated because in other fields such as bankruptcy prediction (Ravi Kumar and Ravi, 2007; Verikas et al., 2010) and software engineering (Mohanty et al., 2010) such reviews caught the attention of researchers and proved useful to them, while the FOREX rate prediction area does not have a single such paper.

The objectives of the current review paper are as follows:

1. To systematically analyze the current state-of-the-art of soft computing models for FOREX rate prediction.
2. To identify gaps in the current research efforts towards involving hybrid intelligent forecasting models in FOREX rate prediction, which will hopefully stimulate fruitful research in new exciting and hitherto unexplored areas.

The remainder of this paper is organized as follows: various earlier reviews are presented in Section 2. Section 3 presents the overview of the review methodology. Sections 4–8 present ANN-based hybrids, EC-based hybrids, FL-based hybrids, SVM-based hybrids and Chaos-based hybrids, respectively. Each of these sections describes the corresponding hybrids. Section 9 discusses overall observations and gaps found in the literature. Finally, Section 10 presents various conclusions and future directions. Table 2 presents various acronyms, Table 3 presents the currency codes and Table 4 presents the performance metrics used in the paper.

Table 2. Acronyms used in the paper.

Acronym | Interpretation | Acronym | Interpretation |
---|---|---|---|

AAE | Average Absolute Error | ICA | Independent Component Analysis |

ACO | Ant Colony Optimization | k-NN | k-Nearest Neighbor |

ANN | Artificial Neural Network | LSSVR | Least-Squares SVR |

APE | Absolute Percentage Error | Max AE | Maximum Absolute Error |

AR | Annualized Return | MACD | Moving Average Convergence/Divergence |

ARMA | Autoregressive Moving Average | MAFE | Mean Absolute Forecast Error |

ARIMA | Autoregressive Integrated Moving Average | MAPE | Mean Absolute Percentage Error |

ARMSE | Absolute Root Mean Square Error | MARS | Multivariate Adaptive Regression Splines |

BFO | Bacterial Foraging Optimization | MD/MDD | Maximum Drawdown |

BPNN/BPN | Back propagation Neural Network | MF | Membership Function |

BVAR | Bayesian Vector Autoregression | MKL | Multiple Kernel Learning |

CART | Classification and Regression Tree | ML | Machine Learning |

CC | Correlation Coefficient | MLP | Multi-layer Perceptron |

CD | Correct Downtrend | MRE | Mean Relative Error |

CDC | Conservative Dual-Criteria | MSE | Mean Squared Error |

CGP | Cartesian Genetic Programming | MTF | Multivariate Transfer Function |

CP | Correct Uptrend/ Correct Prediction | NMSE | Normalized Mean Square Error |

CSO | Cat Swarm Optimization | NRMSE | Normalized Root Mean Squared Error |

CTR | Correct Trend Rate | PCA | Principal Component Analysis |

DBN | Deep Belief Network | PNN | Probabilistic Neural Network |

DE | Differential Evolution | PSN | Psi Sigma Neural Network |

DENFIS | Dynamic Evolving Neuro-Fuzzy Inference System | RBF | Radial Basis Function |

Dstat | Directional change statistic | RE | Relative Error |

DWT | Discrete Wavelet Transform | RLSE | Recursive Least Squares Estimator |

EC | Evolutionary Computation | RMSE | Root Mean Squared Error |

ELM | Extreme Learning Machine | RNN | Recurrent Neural Network |

ES | Exponential Smoothing | RPNN | Ridge Polynomial Neural Network |

FBLMS | Forward Backward Least Mean Square | RSQ | R-Square |

EMD | Empirical Mode Decomposition | SC | Soft Computing |

FL | Fuzzy Logic | SMAPE | Symmetric MAPE |

GA | Genetic Algorithm | SNR | Signal to Noise Ratio |

GARCH | Generalized Autoregressive Conditional Heteroskedasticity | SVM | Support Vector Machine |

GLAR | Generalized linear auto-regression | SVR | Support Vector Regression |

GMDH | Group Method of Data Handling | TAFE | Total Absolute Forecast Error |

GMM | Generalized Method of Moments | TE | Total Error |

GP | Genetic Programming | VAR | Vector Autoregression |

GRNN | General Regression Neural Network | RW | Random Walk |

HMM | Hidden Markov Model | PSO | Particle Swarm Optimization |

IC | Independent Component | NSGA-II | Non-dominated Sorting GA-II |

QR | Quantile Regression | RF | Random Forest |

QRRF | Quantile Regression RF | LASSO | Least Absolute Shrinkage and Selection Operator |

Table 3. Currency codes used in the paper.

Code | Currency |
---|---|

AUD | Australian Dollar |

CAD | Canadian Dollar |

CHF | Swiss Franc |

CNY | Chinese Yuan |

DEM | Deutsche Mark |

EUR | Euro |

FF | French Franc |

GBP | British Pound |

HKD | Hong Kong Dollar |

INR | Indian Rupee |

IRR | Iranian Rial |

JPY | Japanese Yen |

KRW | Korean Won |

MOP | Macanese Pataca |

MXN | Mexican Peso |

MYR | Malaysian Ringgit |

NTD | New Taiwan Dollar |

PHP | Philippine Peso |

RMB | Yuan Renminbi |

ROL | Romanian Lei |

RUB | Russian Ruble |

SGD | Singapore Dollar |

USD | United States Dollar |

Table 4. Performance measures used.

Performance measure | Description |
---|---|
$SSE=\sum_{t=1}^{N}e_t^{2}$ | SSE measures the sum of squared errors. Lower values indicate more accurate predictions. |
$MSE=\frac{SSE}{N}$ | MSE measures the mean of squared errors. Lower values indicate more accurate predictions. |
$RMSE=\sqrt{MSE}$ | RMSE measures the square root of the mean of squared errors. Lower values indicate more accurate predictions. |
$NMSE=\frac{1}{N}\sum_{t=1}^{N}\frac{e_t^{2}}{(y_t-\overline{Y})^{2}}$ | NMSE measures the mean of normalized squared errors. Lower values indicate more accurate predictions. |
$NRMSE=\sqrt{NMSE}$ | NRMSE measures the square root of the mean of normalized squared errors. Lower values indicate more accurate predictions. |
$MAD=\frac{\sum_{t=1}^{N}\lvert y_t-\overline{Y}\rvert}{N}$ | MAD measures the average distance between each data value and the mean. Lower values indicate more accurate predictions. |
$MAE=\frac{\sum_{t=1}^{N}\lvert e_t\rvert}{N}$ | MAE measures the mean of absolute errors. Lower values indicate more accurate predictions. |
$MAPE=\frac{100}{N}\sum_{t=1}^{N}\left\lvert\frac{e_t}{y_t}\right\rvert$ | MAPE measures the mean of absolute errors in percentages. Lower values indicate more accurate predictions. |
$U=\frac{\sqrt{\frac{1}{N}SSE}}{\sqrt{\frac{1}{N}\sum_{t=1}^{N}y_t^{2}}+\sqrt{\frac{1}{N}\sum_{t=1}^{N}\widehat{y}_t^{2}}}$ | Theil’s inequality coefficient (U) measures the closeness of predictions to actual values. Values of U closer to zero indicate more accurate predictions. |
$Dstat=\frac{1}{N}\sum_{t=1}^{N}a_t\times 100\%$, where $a_t=\begin{cases}1, & \text{if }(y_{t+1}-y_t)(\widehat{y}_{t+1}-\widehat{y}_t)\ge 0\\ 0, & \text{otherwise}\end{cases}$ | Dstat measures the direction of movement of the financial variable. Higher values indicate more accurate predictions. |
$RE=\sum_{t=1}^{N}\left\lvert\frac{e_t}{y_t}\right\rvert$ | RE measures the ratio between the absolute error and the actual data. Lower values indicate more accurate predictions. |
$MRE=\frac{1}{N}\left(RE\right)$ | MRE measures the percentage accuracy of predictions, expressing it in a stricter way. Lower values indicate more accurate predictions. |
$CTR=\frac{1}{N}\sum_{t=1}^{N}\frac{b_t}{N}$, where $b_t=\begin{cases}1, & \text{if }(y_{t+1}-y_t)(\widehat{y}_{t+1}-y_t)\ge 0\\ 0, & \text{otherwise}\end{cases}$ | CTR measures the prediction effect of the algorithm. Higher values indicate more accurate predictions. |
$TE=\sum_{t=1}^{N}\lvert e_t\rvert$ | TE measures the total error. Lower values indicate more accurate predictions. |
$RSQ=1-\frac{\sum_{t=1}^{N}e_t^{2}}{\sum_{t=1}^{N}(y_t-\overline{Y})^{2}}$ | RSQ measures how close the data are to the fitted regression line. |
$SNR=10\cdot\log\left(\frac{\max\left(y_t^{2}\right)\cdot N}{SSE}\right)$ | SNR measures how much noise is in the data. Because SSE appears in the denominator, higher values indicate more accurate predictions. |
$CC=\frac{N\sum_{t=1}^{N}y_t\widehat{y}_t-\sum_{t=1}^{N}y_t\sum_{t=1}^{N}\widehat{y}_t}{\sqrt{N\sum_{t=1}^{N}y_t^{2}-\left(\sum_{t=1}^{N}y_t\right)^{2}}\sqrt{N\sum_{t=1}^{N}\widehat{y}_t^{2}-\left(\sum_{t=1}^{N}\widehat{y}_t\right)^{2}}}$ | CC measures whether the predicted series follows the same upward or downward jumps as the actual series. A CC value near 1 shows that both have the same jumps. However, a negative CC sign points out that the predicted series follows the ups and downs of the actual series with a negative mirroring. |

$y_t$ = actual observation at time $t$; $\widehat{y}_t$ = predicted value at time $t$; $e_t=y_t-\widehat{y}_t$; $\overline{Y}$ = mean of actual observations.
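As an illustration of how several of the Table 4 measures relate to one another, the following minimal sketch computes SSE, MSE, RMSE, MAE, MAPE, RSQ and Dstat in plain Python. The function names `forecast_metrics` and `dstat` are hypothetical and not taken from the paper. Two assumptions are made: MAPE requires all actual values to be nonzero, and Dstat is averaged over the N−1 steps for which a next value exists.

```python
import math


def forecast_metrics(actual, predicted):
    """Compute a few Table 4 measures for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    # e_t = y_t - y_hat_t, as defined under Table 4
    errors = [y - y_hat for y, y_hat in zip(actual, predicted)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    mean_y = sum(actual) / n
    total_ss = sum((y - mean_y) ** 2 for y in actual)
    return {
        "SSE": sse,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        # Assumption: no actual value is zero, otherwise MAPE is undefined.
        "MAPE": 100.0 / n * sum(abs(e / y) for e, y in zip(errors, actual)),
        "RSQ": 1.0 - sse / total_ss,
    }


def dstat(actual, predicted):
    """Percentage of steps whose predicted direction matches the actual one.

    Assumption: the average runs over the N-1 available consecutive pairs.
    """
    hits = [
        1 if (actual[t + 1] - actual[t]) * (predicted[t + 1] - predicted[t]) >= 0 else 0
        for t in range(len(actual) - 1)
    ]
    return 100.0 * sum(hits) / len(hits)


if __name__ == "__main__":
    y = [2.0, 4.0, 4.0, 8.0]
    y_hat = [1.0, 5.0, 3.0, 9.0]
    print(forecast_metrics(y, y_hat))
    print(dstat(y, y_hat))
```

Error-magnitude measures (RMSE, MAE, MAPE) and the directional measure (Dstat) can disagree: a forecast may be close in value yet miss the direction of change, which is why reviews of FOREX models typically report both kinds.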

URL:

https://www.sciencedirect.com/science/article/pii/S0305054818301436

## Enhancing transportation systems via deep learning: A survey

Yuan Wang, ... Loo Hay Lee, in Transportation Research Part C: Emerging Technologies, 2019

### 6.5 Tips for DL model design

We have reviewed how the technology has evolved for time series prediction tasks and observed that these tasks follow similar patterns. Initially, simple DNN models are applied. Afterwards, CNN or LSTM models are used for improvement. Finally, hybrid models are proposed and achieve state-of-the-art performance. It is also well recognized that CNN models are particularly effective at processing image data, whereas LSTM and GRU are more effective at extracting useful features from sequential data. In certain scenarios, these two types of models can be integrated in an end-to-end network to improve accuracy. Finally, the attention mechanism has been shown to be very effective in many applications (Vaswani et al., 2017). It can be conveniently integrated with existing deep learning models. Unfortunately, we rarely observe the use of the attention mechanism in ITS, and this is a direction worth exploring.
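To make the attention mechanism mentioned above more concrete, here is a minimal sketch of scaled dot-product attention (the form described by Vaswani et al., 2017) for a single query over a short sequence, written in plain Python. The function names and the toy vectors are illustrative assumptions; a real ITS model would use a deep learning framework and learned projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    keys and values hold one vector per time step; the result is a
    weighted average of the value vectors plus the weights themselves.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    context = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(dim)
    ]
    return context, weights


if __name__ == "__main__":
    # Hypothetical hidden states of three time steps (e.g. from an LSTM).
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values = [[10.0], [20.0], [30.0]]
    context, weights = attention([1.0, 0.0], keys, values)
    print(weights, context)
```

Because the output is just a weighted sum over existing hidden states, such a layer can be placed on top of a recurrent or convolutional encoder without changing the rest of the network.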

When there is not enough training data, overfitting is rather common in DL models because they have too many parameters that require training. To resolve the issue, a useful strategy is to apply dropout (Srivastava et al., 2014), which randomly ignores certain neurons, and hence their parameter updates, during the training phase. Another solution is to apply regularization with the L1 or L2 norm on the weight parameters of the network.
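The two remedies can be sketched in a few lines of plain Python. This is an illustrative sketch, not the survey's implementation: `dropout` uses the common "inverted" variant (surviving activations are rescaled by 1/(1 − rate) so that no rescaling is needed at inference), and `l1_penalty` and `l2_penalty` are hypothetical helper names for the terms that would be added to the training loss.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each activation with probability `rate`.

    Surviving activations are scaled by 1 / (1 - rate) during training;
    at inference time the input is returned unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def l1_penalty(weights, lam):
    """L1 regularization term: lam * sum(|w|)."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 regularization term: lam * sum(w^2)."""
    return lam * sum(w * w for w in weights)


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [0.5, 1.0, 1.5, 2.0]
    print(dropout(hidden, rate=0.5, rng=rng))
    # The penalty is added to the data loss before back-propagation.
    data_loss = 0.25
    print(data_loss + l2_penalty([1.0, -2.0], lam=0.5))
```

L1 tends to drive individual weights to exactly zero, whereas L2 shrinks all weights smoothly; which one suits a given forecasting model is an empirical choice.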

URL:

https://www.sciencedirect.com/science/article/pii/S0968090X18304108

## Electrical load forecasting models: A critical systematic review

Corentin Kuster, ... Monjur Mourshed, in Sustainable Cities and Society, 2017

### 2.3.4 Support vector machine

Support Vector Machines were first introduced by Vladimir Vapnik in a paper at the COLT 1992 conference (Boser, Guyon, & Vapnik, 1992). Then, in 1995, the soft margin classifier was introduced by Cortes and Vapnik in the paper Support Vector Networks (Cortes & Vapnik, 1995). Originally, SVMs were created to deal with pattern classification problems such as character recognition, face identification and text classification. In 1995, Vladimir Vapnik extended SVM to a regression algorithm in his book, The Nature of Statistical Learning Theory (Cherkassky, 1997). Over the years, various applications have appeared in the literature, e.g. the time series prediction problem. The purpose of an SVM is to create an optimal separating hyperplane in a higher-dimensional feature space such that subsequent observations can be classified into separate subsets. In practice, real data are not perfectly separable. To obtain a hyperplane, one has to relax the requirement that a separating hyperplane perfectly separate every training observation. For that purpose, a soft margin classifier (SVC) has been constructed. In the case of non-linear boundaries, the use of SVM is convenient (Auria & Moro, 2008). Indeed, the SVM allows non-linear decision boundaries by using an appropriate transformation that makes them linear in a higher-dimensional feature space. Unfortunately, computation in a high-dimensional feature space can be very costly, and SVMs depend heavily on the proper selection of the hyper-parameters (Adhikari & Agrawal, 2013). To improve computational efficiency, a solution called the “kernel trick” is used (Adhikari & Agrawal, 2013). Kernels are functions that represent inner products between observations rather than the observations themselves. The kernel thus changes how the “similarity” between two observations is calculated in a more flexible way, allowing a non-linear problem to be solved as a linear problem in a higher-dimensional space.
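A small worked example may clarify why the kernel trick saves computation. For two-dimensional inputs, the homogeneous degree-2 polynomial kernel k(x, z) = (x · z)² equals the inner product of the explicit feature vectors φ(x) = (x₁², √2·x₁x₂, x₂²), so the higher-dimensional inner product is obtained without ever constructing φ. The sketch below checks this numerically; the function names are illustrative and the polynomial kernel is chosen only as an example, since the source does not name a specific kernel.

```python
import math


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def feature_map(x):
    """Explicit 3-D feature map for 2-D inputs matching poly_kernel."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


if __name__ == "__main__":
    x, z = [1.0, 2.0], [3.0, 4.0]
    via_kernel = poly_kernel(x, z)                     # works in 2-D only
    via_features = dot(feature_map(x), feature_map(z))  # works in 3-D
    print(via_kernel, via_features)
```

For this degree-2 case the saving is small, but for kernels such as the Gaussian RBF the implicit feature space is infinite-dimensional, and evaluating the kernel directly is the only practical option.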

URL:

https://www.sciencedirect.com/science/article/pii/S2210670717305899

## A review on renewable energy and electricity requirement forecasting models for smart grid and buildings

Tanveer Ahmad, ... Biao Yan, in Sustainable Cities and Society, 2020

### 2.1.4 Electricity requirement forecasting models

Future electricity requirement prediction, also known as energy requirement prediction or load prediction, is not a novel idea. The earliest study on electricity requirement prediction dates back to 1965 (Heinemann, Plant, & Nordman, 1966). Reliable electricity requirement prediction is of critical economic importance (Bunn, 2000). According to (Hobbs, 1999), a 1 % decrease in the mean absolute percentage error (MAPE) of the prediction can conserve 10,000 MWh of energy, which implies that a precise forecasting method may save up to USD 1.6 million in a given fiscal year. Electricity requirement prediction plays a significant part in system operations, energy production, transmission and storage, and accurate prediction is attracting considerable attention in energy planning and management (Khuntia, Rueda, & van der Meijden, 2016). Accordingly, many studies have concentrated on how to increase the accuracy and robustness of energy demand prediction.

#### 2.1.4.1 Machine learning models

Electricity requirement prediction technologies based on ML models are extensively applied in the energy field, for example in wind energy and wind speed prediction (Meng, Ge, Yin, & Chen, 2016), load demand and peak demand prediction (Hernandez et al., 2014), building load requirement forecasting (Ahmad, Chen, & Shair, 2018) and cooling load prediction (Ahmad & Chen, 2018b). The heating, ventilation and air conditioning (HVAC) system in the commercial sector accounts for about 40 % of real-time energy expenditure, specifically in subtropical regions, and forecasting tools are therefore fundamental to increasing building energy performance, reliability and accuracy (Fan, Xiao, & Wang, 2014). Rapid advances in ML have made it an efficient approach to univariate time-series prediction, and the basic challenge lies in the choice and selection of prediction techniques (Bontempi, Ben Taieb, & Le Borgne, 2013).

Current studies of load prediction give an overview of recent forecasting classifications and their distribution across different sectors. One study (Deb, Zhang, Yang, Lee, & Shah, 2017) analyzed and compared the forecasting accuracy results of prior research such as (Deb et al., 2017; Zhao & Magoulès, 2012). It investigated nine general prediction approaches based on ML. Zhao and Magoulès (2012) reviewed the proposed classifications for forecasting energy load demand, including elaborate statistical techniques, AI approaches and engineering-based techniques. These reviews indicate, however, that past studies report forecasting results obtained on different kinds of datasets. The accuracy of these models is further presented in Table 9.

Table 9. Short-term electricity forecasting with machine learning models.

Performance evaluation statistics: MAE, CV, MAPE, RMSE and R.

Sr. No. | Models | Advantages | Year | Region | Ref. | MAE | CV | MAPE | RMSE | R |
---|---|---|---|---|---|---|---|---|---|---|

1 | Grey wolf optimization, Least-squares SVM | A practical technique that can increase the prediction performance remarkably | 2019 | Australia | (Yang, Li, & Yang, 2019) | 32.20 | – | 0.555 | – | 0.99 |

2 | Autoregressive integrated moving average | The model can gain the various characteristics connected with power load | 2018 | Australia | (Zhang, Wei, Li, Tan, & Zhou, 2018) | 113.6 | – | – | 1.42 % | – |

3 | CDT, FitcKnn, LRM, and Stepwise-LRM | Facilities for investment management by power companies, commercial and industrial consumers | 2018 | Beijing, China | (Ahmad & Chen, 2018c) | 1.67 | – | 0.03 % | 15.10 % | – |

4 | Machine learning models | Support consumer in energy planning and management | 2018 | New Taipei City, Taiwan | (Chou & Tran, 2018) | 0.02 | – | 15.65 % | 0.09 % | 0.79 |

5 | An artificial neural network with nonlinear autoregressive exogenous multivariable Inputs, Multiple linear regression model, AdaBoost | Appearances or accuracy are shown to be promising | 2018 | ISO New England | (Ahmad & Chen, 2018a) | – | 4.99 % | 0.01 % | – | – |

6 | Chaos–support vector regression | The developed algorithm could also be applied in different forecasting areas | 2019 | – | (Xuan et al., 2019) | – | 2.44 % | 3.89 % | – | 0.74 |

7 | Extreme learning machine model | Useful mapping capability and can adequately handle with a considerable variety of designing and mapping difficulties | 2018 | – | (Chen, Kloft, Yang, Li, & Li, 2018) | 71.00 | – | 0.92 % | 83.93 % | – |

8 | Machine learning, Support vector regression, Regression trees | Capability to find global minima rather than local minima in the forecasting solution space, higher speed and accuracy | 2017 | University of New South Wales | (Yildiz, Bilbao, & Sproul, 2017) | – | – | 1.04 % | – | 0.99 |

9 | Recurrent extreme learning machine | Higher potential to be used in modeling dynamic systems efficiently | 2016 | Portugal | (Ertugrul, 2016) | – | – | – | 0.02 % | – |

10 | Machine Learning algorithms | Better and higher performance | 2018 | Canada | (Saloux & Candanedo, 2018) | – | – | – | 2.9 % – 3.9 % | – |

11 | Nadaraya–Watson Kernel density estimator approach | Captures one of the higher reliability indicator numbers | 2018 | Spain and Portugal | (Monteiro, Ramirez-Rosado, Fernandez-Jimenez, & Ribeiro, 2018) | 5.55 | – | – | – | – |

12 | Machine learning model | Big-data can be consolidated in the forecasting approach to increase the performance and interpret complicated data analytic issues | 2016 | United States | (Naimur Rahman, Esmailpour, & Zhao, 2016) | – | – | 4.13 % | – | – |

13 | Supervised based machine learning models | Higher speed, higher accuracy, the stability of the network | 2018 | ISO New England | (Ahmad, Chen Huang et al., 2018) | – | 1.60 % | 0.98 % | – | – |

14 | Stepwise regression, nonlinear autoregressive model | Algorithms are guaranteed the precise design and network operation of the different distributed energy system operations | 2019 | ISO New England | (T. Ahmad & Chen, 2019) | – | 4.53 % | 3.19 % | 402 % | – |

15 | Deep learning, Multi-modal | Calculate the energy tendency more perfectly with lower errors | 2018 | New York City | (Tong et al., 2018) | – | – | 1.72 % | – | – |

16 | Multiple linear regression | Precise cooling prediction in the building sector | 2018 | Beijing, China | (Ahmad, Chen Shair, 2018) | 0.73 | – | 1.70 % | 0.78 % | – |

17 | Multivariate adaptive regression spline | Helpful scientific tools for the different investigation of real-time power requirement data prediction | 2018 | Queensland, Australia | (Al-Musaylh, Deo, Adamowski, & Li, 2018) | 0.76 | – | – | 0.995 | – |

18 | Chaos-SVR, WD-SVR, SVR and BP | Several features which are suitable for various kinds of cooling load in time series | 2017 | – | (Xuan, Zhubing, Liequan, Junwei, & Dongmei, 2017) | – | – | – | – | 0.85 |

19 | Artificial fish swarm and gene expression programming | The benefits in mean time-consumption, the mean number of convergences, higher predicted efficiency and most top parallel performance in scale-up and speedup | 2018 | Multiple locations | (Deng, Yuan, Yang, & Zhang, 2018) | – | – | 3.69 % | – | – |

20 | Gaussian process regression | Models are useful in forecasting the abnormal behavior in the datasets as well as cooling energy requirement prediction | 2019 | Beijing, China | (Ahmad, Chen, Shair, & Xu, 2019) | 0.19 | 2.05 % | 2.59 % | – | 0.99 |

#### 2.1.4.2 Ensemble-based approaches

Ensemble-based approaches consist of a number of ensemble methods (ELMs) used for model training. It is observed that the strength of ensemble approaches lies in combining individual methods for increased robustness and accuracy. Many ensemble techniques have been proposed for short-term energy prediction (Abdel-Aal, 2005; Brown, Wyatt, & Tino, 2005; Taylor & Buizza, 2002). For instance, in (De Felice & Yao, 2011), the regularized negative correlation learning approach was applied to improve the forecasting ability of the ensemble network.

Ensemble learning techniques, which achieve higher prediction accuracy and efficiency by strategically combining multiple learning models, have been extensively used in different research areas, including time-series forecasting, regression and pattern classification. Dietterich attributed the success of ensemble techniques to three primary causes: representational, computational and statistical (Dietterich, 2007). Furthermore, the bias-variance decomposition (Geman, Bienenstock, & Doursat, 1992) and strong relationship also demonstrate why ensemble models have higher accuracy and efficiency than their non-ensemble counterparts. Among the many ensemble techniques (EMD based AdaBoost-BPNN method for wind speed forecasting, 2014; Hu, Bao, & Xiong, 2014; Qiu, Zhang, Ren, Suganthan, & Amaratunga, 2014; Ren, Suganthan et al., 2016; Wei, 2016), divide and conquer (Radhakrishnan, Kolippakkam, & Mathura, 2007) is a principle behind models that are frequently used in time series prediction. The wavelet transform is generally applied as the time-series decomposition model. It decomposes the original time series into several orthonormal subseries by viewing it in the time-frequency domain. Benaouda, Murtagh, Starck, and Renaud (2006) use a wavelet-based nonlinear multiscale decomposition approach for energy demand prediction. Adaptive wavelet ANNs have been applied to short-term prediction with a feed-forward network and several hidden layers of neurons. Table 10 shows short-term electricity forecasting with ensemble-based approaches. It can be observed that ensemble-based empirical mode decomposition and deep learning-based ensemble models are widely used in different kinds of research, including classification and time series analysis.
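The decomposition step of such a divide-and-conquer scheme can be illustrated with the simplest orthonormal wavelet, the Haar wavelet. The sketch below performs one decomposition level in plain Python, splitting a series into a smooth approximation and a detail subseries, and then reconstructs the original. The choice of the Haar wavelet, the function names and the toy load values are assumptions made for illustration; the cited studies use more elaborate multiscale transforms.

```python
import math


def haar_decompose(series):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail); each has half the input length.
    """
    if len(series) % 2 != 0:
        raise ValueError("series length must be even")
    s = math.sqrt(2.0)
    approx = [(series[i] + series[i + 1]) / s for i in range(0, len(series), 2)]
    detail = [(series[i] - series[i + 1]) / s for i in range(0, len(series), 2)]
    return approx, detail


def haar_reconstruct(approx, detail):
    """Invert haar_decompose and return the original series."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


if __name__ == "__main__":
    load = [4.0, 2.0, 5.0, 7.0]  # hypothetical load readings
    approx, detail = haar_decompose(load)
    print(approx, detail)
    print(haar_reconstruct(approx, detail))
```

In a divide-and-conquer forecaster, a separate model would be fitted to each subseries and the individual forecasts recombined through the inverse transform; because the transform is orthonormal, the energy of the series is preserved across the subseries.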

Table 10. Short-term electricity forecasting with ensemble-based approaches.

Performance evaluation statistics: MAE, CV, MAPE, RMSE and R.

Sr. No. | Models | Advantages | Year | Region | Ref. | MAE | CV | MAPE | RMSE | R |
---|---|---|---|---|---|---|---|---|---|---|

1 | Partial least squares regression approach, Extreme learning machine | The different numerical results determine that the developed approaches can substantially increase prediction accuracy | 2016 | ISO New England | (Li, Goel, & Wang, 2016) | – | – | 1.14 % | – | – |

2 | Ensemble method | Achieve better prediction results in contrast with different state-of-art standard approaches | 2016 | ISO New England | (Li, Wang, & Goel, 2016) | – | – | 0.91 % | – | – |

3 | Deep learning, Ensemble method | Fault reliability and prediction is higher in real time applications | 2014 | National Aeronautics and Space Administration | (Qiu et al., 2014) | – | 0.11 % | 27.33 % | 0.16 % | – |

4 | Empirical mode decomposition, Deep learning ensemble method | Widely used in different research areas including pattern classification, time-series, and regression prediction | 2017 | Australia | (Qiu et al., 2017) | 266.58 | – | 3.00 % | – | – |

5 | Autoregressive integrated moving average | Benefits of some predictive algorithms to obtain reliable results | 2016 | Iran | (Barak & Sadegh, 2016) | 12.59 | – | – | 15.74 % | – |

6 | Random forests, Gradient boosting regression trees | Gradient boosting and random forests trees can be suitable for energy prediction applications and yield actual results | 2015 | Burlington, Concord, Portland, Boston, Bridgeport | (Papadopoulos & Karakatsanis, 2015) | – | – | 1.97 % | 270.6 % | – |

7 | Generalizable approach | Ensemble approach is proficient of incorporating complicated forecasters | 2015 | California | (Burger & Moura, 2015) | – | – | 7.5 % | – | – |

8 | Ensemble empirical mode decomposition | Great generalization capability, Higher training accuracy and speed and a better balance of error | 2018 | Jiangsu, China | (Li, Tao, Ao, Yang, & Bai, 2018) | 26,765 | – | 5.31 % | 22,3.0 % | – |

9 | Ensemble learning | Ensemble learning approach an accurate and convenient method to forecast household energy usage requirement | 2018 | United States | (Chen, Jiang, Zheng, & Chen, 2018) | – | – | – | 1562 % | 0.16 |

10 | Ensemble Kalman filter | The efficiency of developed algorithms is substantially higher than the present state-of-art approaches | 2016 | Japan | (Takeda, Tamura, & Sato, 2016) | – | – | 1.86 % | – | – |

11 | Evolutionary algorithms, Multi-objective optimization | Results show reliability and higher accuracy of used models | 2018 | New Zealand | (Peimankar, Weddell, Jalal, & Lapthorn, 2018) | – | – | 0.02 % | 0.09 % | – |

12 | AdaBoost ensemble model | Don't require a pre-assumed form of the method; higher nonlinear mapping capability; can solve the complex nonlinear problems | 2018 | China | (Xiao, Li, Xie, Liu, & Huang, 2018) | – | – | 1.20 % | 0.46 % | – |

13 | Ensemble learning, Robust regression | Conceptual advantages of ensemble learning, relying on the need for diversity within different kinds of network datasets | 2018 | France | (Alobaidi, Chebana, & Meguid, 2018) | – | – | 11.39 % | 296.34 % | – |

14 | Ensemble forecasting, Echo state network | Boosting models more appropriate for unstable time-series prediction | – | China | (Wang, Lv, & Zeng, 2018) | 4.18 | – | 4.69 % | 6.48 % | – |

15 | Ensemble approach | Can be used immediately to HVACs to tackle the time-lag issue | 2018 | Hong Kong | (Wang, Lee, & Yuen, 2018) | – | – | – | – | 0.86 |

16 | Ensemble learning, Bagging trees | Can be used for the real-time energy networks such as system fault detection and diagnosis | 2018 | Gainesville, Florida | (Wang, Wang, Srinivasan, 2018) | – | – | 3.68 % | 1.91 % | 0.89 |

#### 2.1.4.3 Artificial neural networks

A considerable amount of research has been carried out on load demand forecasting, and various methods, such as the autoregressive integrated moving average (ARIMA), SVM and ANNs, have been introduced to solve such complex problems. Where no single accurate technique has been identified, the combination of energy predictions has been one of the most effective, essential and successful lines of study since it was introduced by Bates and Granger (Bates & Granger, 1969) in the 1960s. However, defining these determinants is a vital and challenging issue for ANN applications. Because no systematic approach is available, various heuristic methods have been developed in the literature, such as those presented in (Aladag, 2011; Aladag, Egrioglu, Gunay, & Basaran, 2010; Anders & Korn, 1999; Aras & Kocakoç, 2016; Egrioglu, Yolcu, Aladag, & Bas, 2015; Heravi, Osborn, & Birchenhall, 2004; Lachtermacher & Fuller, 1995).
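The idea of combining forecasts can be sketched with one simple weighting scheme: each model receives a weight inversely proportional to the mean squared error of its past forecasts, so that historically more accurate models count for more. This inverse-MSE rule is an illustrative choice that ignores correlation between the models' errors; it is not claimed to be the exact scheme of any cited study, and the function names and numbers are hypothetical.

```python
def inverse_mse_weights(errors_by_model):
    """Weights proportional to 1 / MSE of each model's past errors.

    errors_by_model: one list of past forecast errors per model.
    Assumption: every model has a nonzero MSE. The weights sum to 1.
    """
    mses = [sum(e * e for e in errs) / len(errs) for errs in errors_by_model]
    inverse = [1.0 / m for m in mses]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine(forecasts, weights):
    """Weighted average of the individual model forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))


if __name__ == "__main__":
    # Hypothetical past errors of two load forecasting models.
    past_errors = [[1.0, -1.0], [2.0, -2.0]]
    weights = inverse_mse_weights(past_errors)
    print(weights)
    print(combine([100.0, 110.0], weights))
```

A plain average (equal weights) is the usual baseline against which such weighted combinations are judged.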

A substantial amount of research has studied building energy forecasting using various computational intelligence techniques. In the field of building load, ANNs are recognized as the most common choice for forecasting load demand in the buildings sector (Ahmad, Mourshed, & Rezgui, 2017). ANNs were applied to estimate the load demand of a passive solar house in (Kalogirou & Bojic, 2000). A single back-propagation ANN for short-term building energy prediction was applied by González and Zamarreño (González & Zamarreño, 2005). A conventional regression-based ANN was applied to predict the cooling energy demand associated with energy usage for three main buildings (Ben-Nakhi & Mahmoud, 2004). Four forecasting models, consisting of the conventional back-propagation ANN, the general regression ANN, the radial basis function network and SVM, were applied to forecast the one-hour cooling energy demand of a building located in China (Li, Meng, Cai, Yoshino, & Mochida, 2009). Several other studies have concentrated on ANNs for short-term energy prediction (Hippert, Pedreira, & Souza, 2001; Rodrigues, Cardeira, & Calado, 2017). The forecasting results from these studies indicate that ANNs are a comparatively efficient method for predicting the short-term energy demand of commercial buildings and homes. Table 11 summarizes the advantages of ANNs. It also shows that ANNs improve the stability and accuracy of forecasts with simplicity and higher performance.

Table 11. Short-term electricity forecasting with artificial neural networks.

Performance evaluation statistics: MAE, CV, MAPE, RMSE and R.

Sr. No. | Models | Advantages | Year | Region | Ref. | MAE | CV | MAPE | RMSE | R |
---|---|---|---|---|---|---|---|---|---|---|

1 | Least absolute shrinkage and selection operator, quantile regression Neural network, probability density forecasting | Not only handles the high dimensionality of the data in energy demand prediction, but also gives more accurate results | 2019 | Guangdong province, China | (Yaoyao He, Qin, Wang, Wang, & Wang, 2019) | – | – | – | 0.16 % | – |

2 | Probabilistic load forecasting, Neural networks | Very precise predictions among the top machine learning models | 2019 | ISO New England | (Dimoulkas, Mazidi, & Herre, 2018) | – | – | 2.54 % | 26.2 % | – |

3 | Artificial neural network | An effective method to calculate the short-term load for commercial and homes buildings | 2018 | Japan | (Yuan, Farnham, Azuma, & Emura, 2018) | – | – | – | – | 0.99 |

4 | Neural networks | Specify models parsimoniously at a lower computational cost | 2018 | Egypt | (Tealab, 2018) | – | – | – | – | |

5 | Neural networks-based linear ensemble framework | Attempting to determine the familiar overfitting issue of the networks | 2018 | Northern Canada | (Wang, Wang, Qu, & Liu, 2018) | −0.49 | – | – | – | – |

6 | Back-propagation (BP) neural network | Improves the stability and accuracy of forecasts, and it’s appropriate for the short-term forecasting | 2018 | China | (Ye & Kim, 2018) | – | – | – | 659.4 % | – |

7 | Wavelet transform with best basis selection | It decreases the dimensionality of data without losing relevant information | 2016 | Australia & Spanish | (Rana & Koprinska, 2016) | 23.58 | – | 0.26 % | – | – |

8 | Deep neural network | Models are quite adjustable and can be used to other time-series forecast tasks | 2017 | China | (W. He, 2017) | 99.41 | – | 1.34 % | – | – |

9 | Deep belief networks, Restricted Boltzmann machines | Concentrate on the parameters that are very important for the network output and neglect the learning rate that has a small influence on the output | 2016 | Macedonia | (Dedinec, Filiposka, Dedinec, & Kocarev, 2016) | 8.6 % | ||||

10 | Time series forecasting | Simplicity, higher accuracy | 2018 | University of Granada, Granada, Andalucía, Spain | (Ruiz, Rueda, Cuéllar, & Pegalajar, 2018) | – | – | – | – | – |

11 | Artificial Neural Network, Bayesian regularisation | Algorithm with adaptive training classifications is intelligent of predicting the power consumption | 2016 | International Business Machines Building | (Chae, Horesh, Hwang, & Lee, 2016) | – | 9.35 % | – | – | – |

12 | The artificial neural network, COCO framework | Give the advantage of shorter training time | 2018 | New Pool England | (Singh & Dwivedi, 2018) | -- | 3.28 % | – | – | |

13 | Forecast neural network, CID-STNN forecasting model | The models have a substantial power to prepare unconstrained issue | 2018 | West Texas | (Cen & Wang, 2018) | 0.89 | – | 1.21 % | – | – |

14 | Artificial neural networks | Higher performance and forecasting accuracy | 2018 | Multiple locations | (Ahmad, Chen, Guo, & Wang, 2018) | 21.45 | – | – | – | – |

15 | Artificial intelligence approaches | Ability to generalization and construct-in cross-validation and low sensitivity to variable costs | 2017 | France | (Mordjaoui, Haddad, Medoued, & Laouafi, 2017) | 3.26 % | 2604.4 % | |||

16 | Artificial neural networks | Calculate the maximum peak load and minimum off-peak order with greater efficiency | 2018 | Taiwan | (Hsu, Tung, Yeh, & Lu, 2018) | – | – | 1.90 % | – | – |

17 | Combined forecasting method, BP, ANFIS, diff-SARIMA | The methods are effective in decreasing the errors between the forecasted and real-time load consumption and in increasing performance | 2016 | – | (Yang, Chen, Wang, Li, & Li, 2016) | 139.65 | – | 1.59 % | – | – |

18 | Artificial neural networks, Ensemble neural networks | Higher prediction accuracy and performance as they can accurately model the highly non-linear correlation | 2017 | New England | (Khwaja, Zhang, Anpalagan, & Venkatesh, 2017) | – | – | 1.99 % | – | – |

19 | Fruit fly optimization model, General regression ANNs | This kind of methods present the full play to the benefits from every single approach | 2019 | Langfang, China | (Liang, Niu, & Hong, 2019) | 7.35 | – | 0.80 % | 9.58 % | – |

URL:

https://www.sciencedirect.com/science/article/pii/S2210670720300391