PAPINI MATTEO
Cycle: XXXIII
Section: Computer Science and Engineering
Tutor: GATTI NICOLA
Advisor: RESTELLI MARCELLO
Major Research topic: Safe Policy Optimization
Abstract:
Reinforcement Learning (RL) makes it possible to solve control problems under uncertainty through repeated interaction between an artificial agent and an unknown environment. This approach has proven successful in applications such as games, where experience can be easily simulated and the agent's actions have no concrete consequences. The natural next step is to apply RL to real-world problems, such as the control of industrial processes. Among RL methods, Policy Optimization (PO) algorithms are the best suited to this task, owing to their ability to handle high-dimensional decision variables and noisy signals. However, the very trial-and-error nature of these methods poses additional challenges to their practical application. Collecting experience in real-world environments is often slow and expensive. Moreover, exploratory actions, which are essential for gathering information on how to improve the agent's strategy, can easily result in dangerous behavior and potentially harm machines and people. To apply PO to real-world problems, we need guarantees not only on the quality of the final learning outcome and the time needed to achieve it, but also on the safety of intermediate solutions. This requires identifying the conditions under which PO can reasonably be applied, understanding the theoretical properties of existing algorithms, and developing new, more reliable ones.