CURZEL SERENA | Cycle: XXXV
Section: Computer Science and Engineering
Advisor: FERRANDI FABRIZIO
Tutor: SILVANO CRISTINA
Major Research topic:
High-Level Synthesis for Deep Learning
Abstract:
One of the current challenges in the field of Deep Learning (and Machine Learning in general) is the push towards running inference on edge devices, such as mobile phones or autonomous cars. Training can be deployed efficiently on HPC clusters containing multiple CPU and GPU nodes; inference, on the other hand, is often bound by strict latency and power consumption constraints, and can therefore benefit from the use of Field Programmable Gate Arrays (FPGAs).
FPGAs are an interesting alternative in the trade-off between the high performance of specialized hardware accelerators (ASICs) and the ease of writing software for general purpose processors (CPUs and GPUs). Their low power consumption, coupled with the possibility of fast reconfiguration, makes them desirable targets for the development of Deep Learning accelerators. However, programming FPGAs is still a non-trivial task, especially for software developers: without a good knowledge of hardware design, writing Verilog/VHDL code can be long and error-prone, and may ultimately yield a design that does not fully exploit the capabilities of the FPGA.
High-Level Synthesis (HLS) tools provide a solution to this problem by automatically translating software descriptions (typically C or C++) into optimized Verilog/VHDL code ready for logic synthesis and implementation. The goal of this research is to specialize an existing High-Level Synthesis framework (PandA – Bambu, developed at Politecnico di Milano) for Deep Learning applications, exploring domain-specific optimizations and offering an interface towards the popular software frameworks used to develop Deep Learning algorithms. The approach will not be based on a fixed acceleration engine template: this solution has been proposed many times in the scientific literature, but it becomes a limiting factor in a field where the size, complexity, and diversity of emerging algorithms are constantly increasing. Instead, HLS will keep the design flow flexible and enable the introduction of innovations with less effort than is required to redesign a whole processor-like template.
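As an illustration of the kind of input an HLS flow consumes, the following plain-C matrix-vector multiply is a minimal sketch (the kernel, sizes, and names are hypothetical, not code from the project) of the software descriptions that tools such as Bambu translate into Verilog/VHDL:

/* Hypothetical example: a small matrix-vector multiply kernel written in
 * plain C, the kind of software description an HLS tool can turn into an
 * FPGA accelerator. Array sizes and identifiers are illustrative only. */
#include <stdio.h>

#define ROWS 4
#define COLS 4

void matvec(const float a[ROWS][COLS], const float x[COLS], float y[ROWS])
{
    for (int i = 0; i < ROWS; i++) {
        float acc = 0.0f;
        for (int j = 0; j < COLS; j++)
            acc += a[i][j] * x[j]; /* multiply-accumulate, the core operation of DL layers */
        y[i] = acc;
    }
}

int main(void)
{
    const float a[ROWS][COLS] = {{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}};
    const float x[COLS] = {1.0f, 2.0f, 3.0f, 4.0f};
    float y[ROWS];
    matvec(a, x, y);
    for (int i = 0; i < ROWS; i++)
        printf("y[%d] = %f\n", i, y[i]);
    return 0;
}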
Previous works also exploited commercial HLS tools by lowering a Deep Learning algorithm to C/C++ and manually annotating the code to indicate the required transformations, but in this way the HLS tools remain completely unaware of the specific nature of Deep Learning workloads. The proposed workflow instead relies on Domain Specific Languages (DSLs), maintaining high-level details about the computation that would be lost if it were translated to a low-level representation such as C or LLVM IR. Specific techniques will be applied to exploit the inherently parallel nature of FPGAs and the possibility to perform computations with customized bit-widths.
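To make the contrast concrete, here is a hedged sketch of the manually annotated C/C++ style used in those previous works (pragma syntax follows the Xilinx Vivado/Vitis HLS convention; the kernel and names are hypothetical). The annotations request generic loop transformations and carry no information about the Deep Learning layer the loop implements, which is exactly the knowledge a DSL-based flow can preserve:

/* Hypothetical example of the annotation-based approach: a 1D convolution
 * loop nest with HLS pragmas requesting pipelining and unrolling. The tool
 * sees only loops and arrays, not a convolutional layer. */
#define N 16
#define K 3

void conv1d(const int in[N + K - 1], const int w[K], int out[N])
{
    for (int i = 0; i < N; i++) {
#pragma HLS PIPELINE II=1   /* start one new output element per clock cycle */
        int acc = 0;
        for (int k = 0; k < K; k++) {
#pragma HLS UNROLL          /* replicate the multiply-accumulate hardware K times */
            acc += in[i + k] * w[k];
        }
        out[i] = acc;
    }
}

A DSL-level description, by contrast, can also carry quantization information (for instance, 8-bit weights and activations), allowing the synthesis flow to generate datapaths with exactly the required bit-width instead of defaulting to standard 32-bit types.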