Current students


Section: Computer Science and Engineering

Major research topic:
High-Level Synthesis for Deep Learning

One of the current challenges in the field of Deep Learning (and Machine Learning in general) is the push towards running inference on edge devices, such as mobile phones or autonomous cars. Training can be run efficiently on HPC clusters with many CPU and GPU nodes. Inference, on the other hand, is often subject to strict latency and power consumption constraints, and could therefore benefit from the use of Field Programmable Gate Arrays (FPGAs).
FPGAs offer an interesting middle ground in the trade-off between the high performance of specialized hardware accelerators (ASICs) and the ease of programming general-purpose processors (CPUs and GPUs). Their low power consumption, combined with the possibility of fast reconfiguration, has made them attractive targets for the development of Deep Learning accelerators. However, programming FPGAs is still a non-trivial task, especially for software developers: without solid knowledge of hardware design, writing Verilog/VHDL code is slow and error-prone, and may still yield a design that does not fully exploit the performance of the FPGA.
High-Level Synthesis (HLS) tools address this problem by automatically translating software descriptions (typically written in C or C++) into optimized Verilog/VHDL code, ready for logic synthesis and implementation. The goal of this research is to specialize an existing High-Level Synthesis framework (PandA – Bambu, developed at Politecnico di Milano) for Deep Learning applications, exploring domain-specific optimizations and offering an interface to the popular software frameworks used to develop Deep Learning algorithms. The approach will not be based on a fixed acceleration engine template: this solution has been proposed many times in the scientific literature, but it becomes a limiting factor in a field where the size, complexity, and diversity of emerging algorithms are constantly increasing. Instead, HLS keeps the design flow flexible and allows innovations to be introduced with less effort than redesigning an entire processor-like template.
Previous work has also used commercial HLS tools by lowering a Deep Learning algorithm to C/C++ and manually annotating the code to indicate the required transformations, but in this way the HLS tool remains completely unaware of the specific nature of Deep Learning workloads. The proposed workflow instead relies on Domain Specific Languages (DSLs), which preserve high-level information about the computation that would be lost by translating it to a low-level representation such as C or LLVM IR. Dedicated techniques will then be applied to exploit the inherent parallelism of FPGAs and their ability to perform computations with customized bit-widths.
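As a rough illustration of the kind of input a C-based HLS flow consumes, and of where customized bit-widths come into play, the sketch below shows a small quantized fully connected layer in plain C. It is not taken from this project or from Bambu: the function name `dense_int8`, the layer sizes, and the 8-bit quantization scheme are hypothetical choices made for illustration, and no tool-specific directives are included. In C the arithmetic must use standard types such as `int32_t`, whereas an FPGA datapath can give each signal exactly the width it needs: the product of two signed 8-bit values fits in 16 bits, and summing four such products needs at most two more bits before the bias is added.

```c
#include <stdint.h>

/* Hypothetical layer sizes, fixed at compile time so that loop bounds are static. */
#define DENSE_IN 4
#define DENSE_OUT 2

/* Hypothetical quantized fully connected layer: 8-bit inputs and weights,
   32-bit bias and accumulator, followed by a ReLU activation. */
void dense_int8(const int8_t in[DENSE_IN],
                const int8_t w[DENSE_OUT][DENSE_IN],
                const int32_t bias[DENSE_OUT],
                int32_t out[DENSE_OUT]) {
    for (int o = 0; o < DENSE_OUT; o++) {
        /* Start from the bias of this output neuron. */
        int32_t acc = bias[o];
        for (int i = 0; i < DENSE_IN; i++) {
            /* Each 8x8-bit product fits in 16 bits; C widens it to int32_t. */
            acc += (int32_t)in[i] * (int32_t)w[o][i];
        }
        /* ReLU: clip negative pre-activation values to zero. */
        out[o] = acc > 0 ? acc : 0;
    }
}
```

For example, with inputs {1, 2, 3, 4}, weights {{1, 0, -1, 2}, {-3, 1, 1, -1}} and biases {0, 1}, the two outputs are 6 and 0 (the second pre-activation value, -1, is clipped by the ReLU). The fixed-size arrays and static loop bounds are what make such a kernel easy for an HLS tool to unroll or pipeline; whether a given tool also narrows the 32-bit accumulator automatically depends on its bit-width analysis.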