Current students


Section: Computer Science and Engineering

Major research topic:
Meta Reinforcement Learning for Hyperparameter Tuning

Deep Reinforcement Learning (RL) methods have driven impressive advances in artificial intelligence in recent years, exceeding human performance in domains ranging from Atari to Go to no-limit poker. An RL agent interacts with its environment by observing the current state, choosing an action and receiving a reward; the goal is to learn a policy that maximizes the expected sum of rewards. Policy gradient methods are among the most effective techniques for solving complex control problems, but their application has several drawbacks. First, these algorithms are parametric, so their performance depends on the values of several hyperparameters (for example, the step size in Stochastic Gradient Ascent). To obtain good results, practitioners must tune these settings manually through a trial-and-error procedure. This procedure can be cast as a Meta Decision Process, in which learning itself provides the reward for an RL algorithm.

A second concern with standard algorithms is the ability of the learnt models to generalize to different tasks. Meta Learning aims at designing models that can learn new skills and rapidly adapt to new environments from few training examples. Current approaches to Meta Learning rely on a distance metric between tasks (Matching Networks, Context Embeddings), on recurrent neural architectures with an explicit storage buffer or with different update speeds (Meta-RL, RL^2, MANN), or on an external learning optimizer that exploits the search directions seen across different tasks (MAML, LSTM, Reptile). While these works are mainly devoted to generalization, they are seldom applied to hyperparameter tuning and learning optimization (Meta-SGD, Autonomous Optimization), and theoretical results on the convergence properties of these methods are still very scarce.
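To make the role of the step-size hyperparameter concrete, the following is a minimal sketch (not part of the thesis work) of a REINFORCE-style policy gradient update on a two-armed bandit; the environment, reward scales, and episode count are illustrative assumptions.

```python
import math
import random

def softmax(prefs):
    """Map action preferences to a probability distribution."""
    exps = [math.exp(p) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(true_means, step_size, episodes=2000, seed=0):
    """REINFORCE on a multi-armed bandit.

    `step_size` is the kind of hyperparameter the text refers to:
    its value must be tuned by hand to make learning work well.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(true_means)  # policy parameters (action preferences)
    for _ in range(episodes):
        probs = softmax(prefs)
        # sample an action from the current stochastic policy
        a = rng.choices(range(len(prefs)), weights=probs)[0]
        r = rng.gauss(true_means[a], 0.1)  # noisy reward from the environment
        # gradient of log pi(k) for a softmax policy: 1[k == a] - pi(k)
        for k in range(len(prefs)):
            grad_log = (1.0 if k == a else 0.0) - probs[k]
            prefs[k] += step_size * r * grad_log  # stochastic gradient ascent
    return softmax(prefs)

# with a reasonable step size the policy concentrates on the better arm
final_policy = reinforce_bandit([1.0, 0.2], step_size=0.1)
```

Changing `step_size` to a much larger or smaller value degrades or stalls learning, which is exactly why such settings are usually tuned by trial and error.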
The goal of this thesis is to develop Meta Reinforcement Learning techniques able to learn the best way to learn (and generalize), with strong theoretical guarantees.
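As a toy illustration of "learning as a reward" (a hypothetical sketch, not the method developed in the thesis), an outer agent can treat the choice of a step size as an action and use the inner learner's final performance as its reward; the inner objective, candidate step sizes, and epsilon-greedy outer loop are all illustrative assumptions.

```python
import random

def inner_learn(step_size, steps=50):
    """Inner learner: gradient ascent on f(x) = -(x - 3)^2 from x = 0.
    Returns the final objective value, which the outer loop treats
    as its reward ('learning itself provides the reward')."""
    x = 0.0
    for _ in range(steps):
        x += step_size * (-2.0 * (x - 3.0))  # gradient of f at x
    return -(x - 3.0) ** 2

def meta_tune(candidates, trials=60, eps=0.2, seed=0):
    """Outer 'meta' agent: an epsilon-greedy bandit over step sizes."""
    rng = random.Random(seed)
    values = {c: 0.0 for c in candidates}  # running average reward per step size
    counts = {c: 0 for c in candidates}
    for _ in range(trials):
        if rng.random() < eps:
            c = rng.choice(candidates)               # explore
        else:
            c = max(candidates, key=lambda k: values[k])  # exploit
        r = inner_learn(c)  # reward = how well the inner learner learned
        counts[c] += 1
        values[c] += (r - values[c]) / counts[c]
    return max(candidates, key=lambda k: values[k])

# 1.1 makes the inner update diverge; 0.3 converges fastest
best_step = meta_tune([0.05, 0.3, 1.1])
```

The outer loop discovers that the middle step size yields the best final objective, while the too-large one diverges; the thesis aims at principled versions of this idea with convergence guarantees.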