Current students

SANGIORGIO MATTEO | Cycle: XXXIII

Section:

**Systems and Control**

Tutor:

**DERCOLE FABIO**

Advisor:

**GUARISO GIORGIO**

__Major Research topic__:

**Multi-step forecasting of chaotic dynamics with deep neural networks**

*Abstract:*

In the last few decades, many attempts have been made to forecast the evolution of chaotic systems, and to discover how far into the future they can be predicted, using a wide range of models. Some early attempts date back to the 1980s and 1990s, but the topic has become increasingly debated in recent years, owing to the development of numerous machine learning techniques for time series analysis and prediction.

Forecasting chaotic dynamics one or a few steps ahead is usually an easy task, as demonstrated by the high performance obtained on many systems, both time-continuous and discrete. The situation changes dramatically over a long-term horizon, because infinitesimal errors lead to a completely different evolution of the system, even when the actual model of the chaotic system is known. The most widely used models are artificial neural networks, which can be divided into those with a feed-forward, fully connected structure and those that include recurrent neurons. The former are static approximators capable of reproducing the relation between input and output, in principle with arbitrary accuracy. When adopting these models, the forecasting of a chaotic time series over a multi-step horizon is commonly done by recursively performing one-step-ahead predictions (recursive predictor). A possible alternative consists of training the model to directly compute multiple outputs (multi-output predictor), each representing the prediction at a certain time step in the future. Other forecasting strategies have been proposed in the literature, such as identifying a specific model for each future step (multi-model predictor). Each of these forecasting methods has its own weaknesses. The recursive one is optimized only to predict one step into the future, and thus its performance is not guaranteed over mid-to-long terms, in particular when considering chaotic dynamics. The multi-output predictor takes into account the whole forecasting horizon: each neuron in the output layer focuses on the forecast of the considered variable at a different time step. The main issue with this architecture is that we are not able to specify that the outputs are sequential (i.e., the same variable at different time steps). In fact, the model acts as if the outputs were independent variables, rather than the same variable sampled at subsequent steps.
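The recursive strategy can be sketched in a few lines. As an illustrative, hypothetical stand-in for a trained one-step network, the code below uses the true logistic map plus a tiny constant bias; iterating it also shows how a minuscule one-step error is amplified along a chaotic multi-step horizon:

```python
import numpy as np

def logistic(x, r=4.0):
    """True one-step dynamics of the logistic map."""
    return r * x * (1.0 - x)

def recursive_forecast(one_step_model, x0, horizon):
    """Recursive predictor: feed each prediction back as the next input,
    turning a one-step model into a multi-step forecaster."""
    preds = []
    x = x0
    for _ in range(horizon):
        x = one_step_model(x)
        preds.append(x)
    return np.array(preds)

# Imperfect surrogate: the true map plus a small systematic bias,
# mimicking the residual one-step error of a trained network.
surrogate = lambda x: logistic(x) + 1e-6

truth = recursive_forecast(logistic, 0.3, 20)
forecast = recursive_forecast(surrogate, 0.3, 20)

# The one-step error is ~1e-6, but the chaotic dynamics amplify it at
# every iteration, so the long-horizon error grows by orders of magnitude.
err_first = abs(truth[0] - forecast[0])
err_max = np.max(np.abs(truth - forecast))
```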
In addition, the mapping between input and output quickly becomes complex when a high number of steps ahead is taken into account. This last issue also affects the multi-model predictor, which additionally requires a huge computational effort (i.e., training a specific model for each time step to forecast). To overcome these critical aspects, it is necessary to adopt a neural model able to deal with the temporal dynamics of one or more variables: the recurrent neural network (RNN). Recurrent neurons (LSTM cells) have been demonstrated to be efficient when used as basic blocks to build sequence-to-sequence architectures. This kind of structure outperforms state-of-the-art approaches on many sequence tasks (e.g., natural language processing). RNNs are almost always trained with a technique known as “teacher (or professor) forcing”, which consists of using the ground truth as the input at each time step, rather than the output predicted by the network at the previous step. This technique has been demonstrated to be necessary for natural language processing tasks, and it is currently adopted almost universally even in numerical time series prediction. Training with teacher forcing does not allow the network to correct small errors because, during the training phase, the prediction at a certain time step does not affect future predictions. In principle, this can lead to a situation similar to that of the feed-forward recursive predictor. We thus proposed to adopt a recurrent architecture and to train it without teacher forcing. Coupling these two elements simultaneously addresses the drawbacks of the recursive predictor, the multi-output predictor, and the LSTM trained with teacher forcing. First, this structure is trained to reproduce the entire set of output variables. Second, it explicitly takes into account that these outputs represent the same variable computed at consecutive time steps.
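The difference between the two training regimes can be made concrete with a toy recurrent cell (a stand-in for an LSTM; the weights and the target sequence below are arbitrary placeholders). Under teacher forcing the ground-truth value enters at every step, whereas in the free-running mode the cell's own previous prediction is fed back:

```python
import numpy as np

def cell(h, x, w=0.5, u=0.5):
    """Toy recurrent cell: next hidden state from previous state and input.
    A stand-in for an LSTM cell; the prediction is the new state itself."""
    h_new = np.tanh(w * h + u * x)
    return h_new, h_new  # (state, prediction)

def rollout(targets, teacher_forcing):
    """Unroll the cell over the horizon. With teacher forcing the ground
    truth is the input at each step; without it, the previous prediction
    is fed back, so errors can propagate (and be corrected in training)."""
    h, x = 0.0, targets[0]
    preds = []
    for t in range(1, len(targets)):
        h, y = cell(h, x)
        preds.append(y)
        x = targets[t] if teacher_forcing else y
    return np.array(preds)

seq = np.array([0.3, 0.84, 0.5376, 0.9943, 0.0225])
tf_preds = rollout(seq, teacher_forcing=True)
fr_preds = rollout(seq, teacher_forcing=False)
# The first prediction is identical; subsequent ones differ because the
# free-running rollout sees its own outputs instead of the ground truth.
```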
Third, small prediction errors propagate along the predicted sequence during training, and thus the training process should be able to correct them. We tested the capabilities of the neural predictors on three well-known chaotic systems: the logistic and Hénon maps, the prototypes of chaos in non-invertible and invertible systems, and the generalized Hénon map, as a case of hyperchaos. All the predictors were trained exclusively on noise-free data generated by the chaotic systems, without taking advantage of any physical knowledge of them.
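For reference, training data for two of the three benchmarks can be simulated in a few lines (the classical chaotic parameter values are used; the generalized Hénon map follows the same pattern with a third state variable):

```python
import numpy as np

def logistic_series(n, x0=0.4, r=4.0):
    """Logistic map x_{k+1} = r x_k (1 - x_k), chaotic at r = 4."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return np.array(xs)

def henon_series(n, x0=0.0, y0=0.0, a=1.4, b=0.3):
    """Hénon map: x_{k+1} = 1 - a x_k^2 + y_k, y_{k+1} = b x_k,
    chaotic at the classical parameters a = 1.4, b = 0.3."""
    xs, ys = [x0], [y0]
    for _ in range(n - 1):
        x, y = xs[-1], ys[-1]
        xs.append(1.0 - a * x * x + y)
        ys.append(b * x)
    return np.array(xs), np.array(ys)

x = logistic_series(1000)   # stays in [0, 1]
hx, hy = henon_series(1000) # stays on the bounded Hénon attractor
```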

The results obtained show that LSTM nets trained without teacher forcing efficiently combine the strengths of all the benchmark competitors, and provide the best predictive performance on all the chaotic attractors considered. The recursive predictor can mimic almost perfectly the behavior of the real system for 5 Lyapunov times. The multi-output feed-forward predictor performs poorly after just 2 Lyapunov times. LSTMs trained with teacher forcing predict the logistic map behavior almost perfectly for 7 Lyapunov times, but their performance degrades as the dimension of the system (i.e., the number of state variables) increases. For instance, on a third-order model (the generalized Hénon map) they reach an accuracy similar to that of the recursive predictor. Training the LSTM without teacher forcing strongly mitigates the effect of the system dimension, and allows all three chaotic dynamics considered to be predicted with high precision for more than 7 Lyapunov times. The results seem to be robust, since they accord with those obtained by other recent works on the prediction of the Kuramoto-Sivashinsky system using reservoir computing. We also showed that LSTM architectures are more robust than feed-forward nets even when a redundant number of time lags is included in the input, a feature that makes our approach suitable for real data.
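The "Lyapunov time" used here as the horizon unit is the inverse of the largest Lyapunov exponent. For the logistic map it can be estimated numerically as the average log-derivative along a trajectory; at r = 4 the exact value is ln 2 ≈ 0.693, i.e., one Lyapunov time is about 1.44 iterations:

```python
import numpy as np

def lyapunov_logistic(n=100000, x0=0.4, r=4.0):
    """Largest Lyapunov exponent of the logistic map, estimated as the
    trajectory average of log|f'(x)| with f'(x) = r (1 - 2x)."""
    x, acc = x0, 0.0
    for _ in range(n):
        acc += np.log(abs(r * (1.0 - 2.0 * x)))
        x = r * x * (1.0 - x)
    return acc / n

lam = lyapunov_logistic()   # close to ln 2 for r = 4
T_lyap = 1.0 / lam          # one Lyapunov time, in map iterations
```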

The availability of accurate multi-step-ahead predictions is also extremely important for enhancing the performance of control systems. This is particularly true for control schemes adopting a receding horizon, such as model predictive control, and for complex systems requiring a model-free or hybrid knowledge/learning-based approach. Traditionally adopted in process control in several fields (for instance, chemical plants, oil refineries, and power system balancing), model predictive control is jointly used with machine learning in many applications, such as the control of smart grids and autonomous vehicles.
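The receding-horizon idea can be sketched in a minimal form (a toy scalar plant and an exhaustive search over candidate inputs, both illustrative placeholders rather than a realistic MPC implementation): at every step the multi-step forecast scores each candidate input, only the first input is applied, and the plan is recomputed.

```python
import numpy as np

def plant(x, u):
    """Toy scalar plant (placeholder dynamics for illustration)."""
    return 0.9 * x + 0.5 * u

def receding_horizon_control(x0, setpoint, model, horizon=5, steps=20):
    """Minimal receding-horizon loop: score a grid of candidate constant
    inputs by the model's multi-step forecast, apply only the first
    input, then re-plan from the newly measured state."""
    candidates = np.linspace(-1.0, 1.0, 21)
    x = x0
    for _ in range(steps):
        best_u, best_cost = 0.0, np.inf
        for u in candidates:
            xf, cost = x, 0.0
            for _ in range(horizon):        # multi-step forecast
                xf = model(xf, u)
                cost += (xf - setpoint) ** 2
            if cost < best_cost:
                best_u, best_cost = u, cost
        x = plant(x, best_u)                # apply first input only
    return x

# Drive the toy plant from 0 toward the setpoint 1 using the true
# dynamics as the predictive model.
x_final = receding_horizon_control(0.0, setpoint=1.0, model=plant)
```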
