SANGIORGIO MATTEO
Cycle: XXXIII
Section: Systems and Control
Tutor: DERCOLE FABIO
Advisor: GUARISO GIORGIO
Major Research topic: Deep learning in forecasting chaotic dynamics
Abstract:
In the last few decades, many attempts have been made, using a wide range of models, to forecast the evolution of chaotic systems and to discover how far into the future they can be predicted. Some early attempts date back to the 1980s and 1990s, but the topic has become increasingly debated in recent years, due to the development of many machine learning techniques for time series analysis and prediction. Forecasting chaotic dynamics one or a few steps ahead is usually an easy task, as demonstrated by the high performance obtained on many systems, both continuous-time and discrete-time.

The situation changes dramatically when considering a long-term horizon, because infinitesimal errors lead to a completely different evolution of the system even when the actual model of the chaotic system is known. The most widely used prediction tools are artificial neural networks, which can be divided into those with a feed-forward, fully connected structure and those that include recurrent neurons.

The former are static approximators capable of reproducing the relation between input and output, in principle with arbitrary accuracy. With these models, a chaotic time series is commonly forecast over a multi-step horizon by recursively performing one-step-ahead predictions (recursive predictor). A possible alternative consists of training the model to directly compute multiple outputs (multi-output predictor), each representing the prediction at a specific time step in the future. Both forecasting methods have weaknesses. The recursive predictor is optimized only to predict one step into the future; thus, its performance is not guaranteed over mid- to long-term horizons, in particular when considering chaotic dynamics.

The multi-output predictor takes into account the whole forecasting horizon: each neuron in the output layer focuses on the forecast of the considered variable at a different time step.
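The two multi-step strategies described above can be sketched on the logistic map. This is only an illustrative sketch under stated assumptions: a least-squares quadratic fit stands in for the feed-forward network, the multi-output predictor is mimicked by one independent fit per future step, and the names `fit_quadratic`, `recursive_forecast`, `multi_output_forecast`, the horizon of 3 steps, and the parameter r = 3.9 are all hypothetical choices, not taken from the thesis.

```python
def logistic_series(x0, n, r=3.9):
    """Return n samples of the logistic map x[t+1] = r * x[t] * (1 - x[t])."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def solve_linear(a, b):
    """Solve a c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= f * m[col][k]
    c = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (m[i][n] - s) / m[i][i]
    return c


def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ c0 + c1*x + c2*x^2 (stand-in for a trained net)."""
    feats = [[1.0, x, x * x] for x in xs]
    ata = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    aty = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(3)]
    return solve_linear(ata, aty)


def evaluate(c, x):
    return c[0] + c[1] * x + c[2] * x * x


HORIZON = 3  # hypothetical forecasting horizon
series = logistic_series(0.2, 500)
train, test = series[:400], series[400:]

# Recursive predictor: a single model trained one step ahead only.
one_step = fit_quadratic(train[:-1], train[1:])


def recursive_forecast(x, horizon=HORIZON):
    """Iterate the one-step model, feeding each prediction back as input."""
    preds = []
    for _ in range(horizon):
        x = evaluate(one_step, x)
        preds.append(x)
    return preds


# Multi-output predictor: one output per future step, each fitted directly
# from the current value and treated as an independent target.
inputs = train[:-HORIZON]
multi = [fit_quadratic(inputs, train[k:len(train) - HORIZON + k])
         for k in range(1, HORIZON + 1)]


def multi_output_forecast(x):
    """Return all HORIZON future values in a single shot."""
    return [evaluate(c, x) for c in multi]


def mean_abs_errors(forecast, data, horizon=HORIZON):
    """Mean absolute error at each step ahead, over all start points in data."""
    errs = [0.0] * horizon
    n = len(data) - horizon
    for i in range(n):
        preds = forecast(data[i])
        for k in range(horizon):
            errs[k] += abs(preds[k] - data[i + 1 + k]) / n
    return errs


mae_rec = mean_abs_errors(recursive_forecast, test)
mae_multi = mean_abs_errors(multi_output_forecast, test)
```

Printing `mae_rec` and `mae_multi` gives the per-step errors of the two strategies. In this toy setting the one-step map is exactly quadratic, so the recursive predictor is strongly favoured; that is an artifact of the stand-in model and not a result of the thesis. What the sketch does show is how quickly the direct input-to-output mapping outgrows a fixed model class as the number of steps ahead increases.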
The main issue with this architecture is that we are not able to specify that the outputs are sequential (i.e., the same variable at different time steps). The model acts as if the outputs were independent variables, rather than the same variable sampled at subsequent steps. In addition, the mapping between input and output becomes complex when a large number of steps ahead is taken into account.

To overcome these critical aspects, it is necessary to adopt a neural model that is able to deal with the temporal dynamics of a certain variable (or of many variables): the recurrent neural network (RNN).

Recurrent neurons (LSTM cells) have been demonstrated to be efficient when used as basic blocks to build sequence-to-sequence architectures. This kind of structure represents the state-of-the-art approach in many sequence tasks (e.g., natural language processing).

RNNs are almost always trained using a technique known as “teacher forcing”. It consists of using the ground truth as the input for each time step, rather than the output predicted by the network at the previous step. It has been demonstrated that this technique is necessary for natural language processing tasks, and it is currently always adopted even in numerical time series prediction.

Training with teacher forcing does not allow the network to correct small errors because, during the training phase, the prediction at a particular time step does not affect future predictions. In principle, this can lead to a situation that is somewhat similar to that of the feed-forward recursive predictor.

We thus proposed to adopt a recurrent architecture and to train it without teacher forcing. Coupling these two elements solves, at the same time, the drawbacks of the recursive predictor, the multi-output predictor, and the LSTM with teacher forcing. First, this structure is trained to reproduce the entire set of output variables.
Second, it explicitly takes into account that these outputs represent the same variable computed at consecutive time steps. Third, small prediction errors propagate along the predicted sequence during training, and thus the training process should be able to correct them.

We tested the capability of the neural predictors on four well-known chaotic systems: the logistic and the Hénon maps, the prototypes of chaos in non-reversible and reversible systems, respectively, and two generalized Hénon maps, a low- and a high-dimensional case of hyperchaos.

First, the predictors have been trained on noise-free data generated by chaotic oscillators, without taking advantage of any physical knowledge of the systems.

The results show that LSTM nets trained without teacher forcing efficiently combine the strengths of all the benchmark competitors and provide the best predictive power on all the considered chaotic attractors. The results are robust because the predictors rank in the same way on all the chaotic systems. We also showed that LSTM architectures are more robust than feed-forward nets even when a redundant number of time lags is included in the input.

The absence of noise is an ideal condition that is rarely met in practical applications. We thus introduced additive white Gaussian noise on the signals obtained by simulating the deterministic systems. A sensitivity analysis considering different levels of noise has been performed. As expected, the performance is considerably worse than in the noise-free case, due to the chaotic behavior of the considered systems, which exponentially amplifies the noise on the initial condition.
This analysis confirms the ranking already obtained in the noise-free case: LSTM nets trained without teacher forcing turn out to be the best-performing architecture.

Another test takes into account a modified logistic map with a slowly varying growth rate (i.e., the logistic parameter). Testing the predictors on a slow-fast system is interesting because the forecasting task requires retaining information about both the slowly varying context (long-term memory) and the fast dynamics of the logistic map. Again, the recurrent structure of the LSTM nets provides better predictive accuracy than the feed-forward ones due to the LSTMs' dynamic nature: they have an internal memory, and the values of their gates change at each step.

Finally, we consider two real-world applications: solar irradiance measured in Como and ozone concentration measured in Chiavenna, Northern Italy. Both time series exhibit chaotic behavior (positive largest Lyapunov exponent) and thus represent appropriate case studies. In general, the results confirm that the LSTM without teacher forcing outperforms the competitors. However, the ranking seems to be more system-dependent than that obtained with the artificial systems. For instance, the LSTM with teacher forcing provides the worst performance on the solar irradiance dataset. Another interesting result is that the feed-forward multi-output net reaches a forecasting accuracy comparable to (though still worse than) that of the LSTM without teacher forcing on both time series.

Besides the accuracy of the forecasted values, another essential characteristic of forecasting models is their generalization capability, often referred to as domain adaptation in the neural network literature.
It refers to the possibility of storing knowledge gained while solving one task and applying it to different, though similar, datasets.

To test this feature, the neural networks developed to predict the solar irradiance at the Como station (source domain) have been used, without retraining, on other sites (target domains) spanning more than one degree of latitude and representing quite different geographical settings. The neural networks developed in our study have proved able to forecast solar radiation at other stations with a minimal loss of precision.
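The distinction between training with and without teacher forcing, central to the abstract, can be illustrated by how the training loss is computed in each case. The following is a minimal sketch under stated assumptions: a logistic map with a slightly wrong parameter (`r_hat`, a hypothetical value) stands in for an imperfectly trained one-step network, no LSTM is involved, and the function names are illustrative only.

```python
def logistic_series(x0, n, r=3.9):
    """Return n samples of the logistic map x[t+1] = r * x[t] * (1 - x[t])."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def step_model(x, r_hat):
    """Stand-in for one step of an imperfect learned model (assumption)."""
    return r_hat * x * (1.0 - x)


def teacher_forced_predictions(truth, r_hat):
    """With teacher forcing: every input is the ground truth of the previous step."""
    return [step_model(x, r_hat) for x in truth[:-1]]


def free_running_predictions(truth, r_hat):
    """Without teacher forcing: only the first input is ground truth;
    afterwards the model is fed its own previous prediction."""
    preds, x = [], truth[0]
    for _ in range(len(truth) - 1):
        x = step_model(x, r_hat)
        preds.append(x)
    return preds


def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


truth = logistic_series(0.2, 40, r=3.9)
r_hat = 3.9 + 1e-4  # hypothetical small model error

tf_loss = mse(teacher_forced_predictions(truth, r_hat), truth[1:])
fr_loss = mse(free_running_predictions(truth, r_hat), truth[1:])
```

Under teacher forcing the loss only measures one-step errors, so the small model error stays almost invisible. In the free-running rollout the same error is fed back and compounds along the sequence, so it dominates the loss and an optimizer would be pushed to reduce it. This mirrors the argument that errors must propagate during training for the network to learn to correct them.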