Experiments
In this section, we compare the prediction performance achieved by the recurrent neural network architectures presented in the previous sections on both the synthetic tasks and the real-world datasets. For each architecture, we report the optimal configuration of its hyperparameters for the task at hand and the best learning strategy adopted for training the model weights. Because the internal model weights are initialized stochastically, we perform several independent evaluations of the prediction results. The accuracy of the forecast is evaluated in terms of normalized mean squared error, and the results are reported both as numerical values and as graphical depictions of the predicted time series.
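To make the evaluation protocol concrete, the following is a minimal sketch of how the repeated, independently initialized runs and the error metric could be computed. It assumes the normalized mean squared error is obtained by dividing the MSE by the variance of the target series (a common convention, not necessarily the exact one used here), and the model interface (build_model, train, predict) is a hypothetical placeholder rather than the actual implementation.

```python
import numpy as np

def nmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error normalized by the variance of the target series (assumed convention)."""
    return float(np.mean((y_true - y_pred) ** 2) / np.var(y_true))

def repeated_evaluation(build_model, X_train, y_train, X_test, y_test, n_runs=10):
    """Train n_runs models with different random initializations and collect test NMSE."""
    scores = []
    for seed in range(n_runs):
        np.random.seed(seed)             # different stochastic initialization per run
        model = build_model()            # placeholder: returns an untrained RNN
        model.train(X_train, y_train)    # placeholder training call
        y_pred = model.predict(X_test)   # placeholder prediction call
        scores.append(nmse(y_test, y_pred))
    # report mean and standard deviation over the independent runs
    return float(np.mean(scores)), float(np.std(scores))
```

Reporting the mean and standard deviation over the independent runs separates the effect of the random weight initialization from the intrinsic difficulty of the forecasting task.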