Thesis abstract:
Modern High Performance Computing (HPC) systems expose an unprecedented level of exploitable parallelism. They have successively moved from superscalar architectures to many-core systems, and today higher-level solutions include clusters of GPUs. This drive for parallelism is explained by the continuously increasing performance demanded by customers, who generally require a high quality of service (QoS) for their applications, usually in terms of execution time. The problem is that applications are becoming more and more complex, given the enormous amount of data available today (the so-called "big data" problem), and this aspect pushes HPC research even further. Furthermore, over the last years reconfigurable hardware has also entered this field, and some commercial solutions already exploit this kind of accelerator. The advantage of reconfigurable hardware, such as FPGAs, over conventional solutions is that it can be configured on demand to implement highly specialized accelerators.
An important aspect of these HPC systems is that, due to their huge fixed and maintenance costs, they are generally run by a service provider that sells computational time and power to end users. In this scenario the increase in available parallelism benefits both the end user and the service provider, which can in this way satisfy a larger number of requests. However, one of the main issues in this situation is the capability to efficiently handle multiple users' workloads while respecting the desired QoS. The solution adopted nowadays consists in a spatial or temporal allocation of datacenter resources to a single user: a single node (or a part of it) is allocated to one user for a predefined time slot. The amount of resources to allocate to a job is generally decided in advance by the end user, who, as a consequence, usually over-provisions the resources needed for the workload.
Work over the last years has focused on the possibility of guaranteeing predictable performance to applications co-located on the same node, allowing the provider to switch off unused resources to save energy. The problem with these kinds of solutions, however, is that co-located applications tend to influence each other due to contention on shared resources. Reconfigurable hardware is a valuable way to address this contention problem, since one of the advantages of this technology is that the implemented accelerators can run independently of one another, effectively reducing the need for resource sharing between co-located workloads. A major drawback that limits the spread of this technology in large-scale installations, however, is the great expertise required to build these hardware accelerators.
The research project aims to analyze and solve the issues that limit the adoption of reconfigurable hardware components in HPC systems. Its goal is the realization of an operating system management layer able to support efficient sharing of these hardware accelerators among multiple end users.
In more detail, the research will focus on realizing a software layer that manages the system with the goal of respecting the QoS of co-located workloads. This layer will exploit the availability of reconfigurable hardware and will adopt mechanisms to effectively share these hardware resources among different users. As an example, one feature that will be studied and implemented is the capability of scaling applications up and down, making use, for instance, of so-called "malleable pipelines". This feature will exploit the availability of different parallel implementations, either crafted by the user or automatically derived from a high-level analysis of the application. Indeed, great attention will be given to the possibility of automatically generating different parallel implementations of a given application starting from high-level code analysis and/or profiling of representative executions. Note that this software management layer is aimed not at extracting maximum performance, but at giving users the performance they request and balancing resource usage among different users. The layer can be implemented at different levels of the OS, and a study has to be conducted to determine the best solution. Furthermore, given the shared nature of the target system, it will be necessary to define standard interfaces between all the actors in the system. This part is of great importance, since a wrong decision at this level may deeply influence system performance.
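To make the malleable-pipeline idea above concrete, the following is a minimal sketch of a pipeline stage whose degree of parallelism can be changed at run time, as a QoS manager might do when rebalancing co-located workloads. The class name, the queue-based design, and the `scale_to` interface are illustrative assumptions for this sketch, not part of the thesis:

```python
import threading
import queue

class MalleableStage:
    """Hypothetical pipeline stage whose worker count can be grown
    or shrunk at run time (the 'malleable pipeline' idea)."""

    _STOP = object()  # sentinel that retires exactly one worker

    def __init__(self, fn, workers=1):
        self.fn = fn                 # per-item transformation
        self.inbox = queue.Queue()   # items flowing into the stage
        self.outbox = queue.Queue()  # results flowing out
        self._threads = []
        self._lock = threading.Lock()
        self.scale_to(workers)

    def _worker(self):
        while True:
            item = self.inbox.get()
            if item is self._STOP:   # retire this worker only
                break
            self.outbox.put(self.fn(item))

    def scale_to(self, n):
        """Adjust the stage's parallelism to n workers."""
        with self._lock:
            while len(self._threads) < n:   # scale up: spawn workers
                t = threading.Thread(target=self._worker, daemon=True)
                t.start()
                self._threads.append(t)
            while len(self._threads) > n:   # scale down: retire workers
                self.inbox.put(self._STOP)
                self._threads.pop()

# Usage: a stage that squares numbers, rescaled while work is queued.
stage = MalleableStage(lambda x: x * x, workers=1)
for i in range(4):
    stage.inbox.put(i)
stage.scale_to(3)  # e.g. the management layer reacts to a QoS violation
results = sorted(stage.outbox.get() for _ in range(4))
print(results)     # [0, 1, 4, 9]
```

In a real system the `scale_to` decision would be taken by the management layer based on monitored QoS, and a worker could equally well dispatch items to a hardware accelerator instead of running `fn` on a CPU thread.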