Machine learning nowadays empowers many different consumer applications, in which artificial intelligence is supported by statistical models and mathematical algorithms that allow computer systems to perform specific tasks accurately. Deep neural networks (DNNs) have dramatically enhanced classification and recognition operations by exploiting a general-purpose learning procedure in a multi-layer architecture.
However, these architectures have several limitations. First, the training and inference of DNNs on standard digital systems are time-consuming and power-hungry. Second, a trained DNN cannot adapt to a constantly changing environment: biological organisms constantly acquire and modulate knowledge about the environment in which they live (lifelong learning), whereas DNNs suffer from catastrophic forgetting whenever new data are learnt. Because of these limitations, the scientific community and industry are looking for novel methods to improve the performance and efficiency of DNNs.
By taking advantage of the latest advances in innovative computing approaches such as in-memory computing, the training and testing of DNNs could be greatly improved in terms of speed and power efficiency. In-memory computing emerges as a very effective method for overcoming the limitations of typical von Neumann architectures, since it massively parallelizes operations and performs calculations where the data are stored, avoiding the so-called “von Neumann bottleneck”.
In particular, in-memory computing requires memory elements capable of storing data and performing calculations at the same time. Emerging non-volatile memories (NVMs), such as phase change memory (PCM) and resistive switching RAM (RRAM), meet these requirements: they offer small size, fast switching, multilevel capability, time-dependent dynamics, and low-voltage operation, and they can be arranged in array architectures.
As the main operations in DNNs are dense matrix-matrix multiplications, the trained weights can be mapped onto NVM crossbar arrays as conductance values, exploiting Ohm’s and Kirchhoff’s laws to perform matrix-vector multiplication (MVM). This approach could outperform current GPUs and CPUs in terms of power consumption and speed, since multiply-and-accumulate (MAC) operations are performed in a single step through MVM.
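In its simplest formulation, each weight $w_{ij}$ is stored as a conductance $G_{ij}$ and each input activation is applied as a row voltage $V_i$; Ohm’s law then gives the current through every device, and Kirchhoff’s current law sums these contributions along each column:
\[
I_j = \sum_{i} G_{ij} \, V_i ,
\]
so that every output current $I_j$ carries a complete multiply-and-accumulate result obtained in a single read operation.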
This doctoral dissertation aims at improving DNNs from a technical point of view along two different research paths. The first concerns the introduction of bio-inspired methods into the network architecture to improve the capability of DNNs to recognize unknown images. Spike-timing-dependent plasticity, brain-inspired homeostasis, and neural redundancy are some of the elements included in the network to stabilize the learning processes.
The second research line concerns the design of a mixed-signal integrated circuit based on PCM synapses for the development of deep neural accelerators. Following the in-memory computing hardware approach, the DNN weights are mapped onto NVM arrays. A generic 1-layer fully connected (FC) multilayer perceptron (MLP) is proposed, where the trained weights are quantized into 4-bit unsigned (uint4) digital words whose ‘0’ and ‘1’ bits exploit the wide resistive window of PCM devices, being stored in the high resistive state (HRS) and the low resistive state (LRS), respectively.
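As a minimal sketch of this mapping, the Python snippet below quantizes non-negative weights to uint4 words and assigns each bit to one PCM cell. The linear quantization scheme, the conductance levels, and the function names are illustrative assumptions; the dissertation only specifies the uint4 encoding, with ‘0’ in the HRS and ‘1’ in the LRS.

```python
import numpy as np

HRS, LRS = 1e-6, 1e-4  # illustrative conductance levels (S) for '0' and '1'

def quantize_uint4(weights):
    """Linearly quantize non-negative weights to 4-bit unsigned integers
    (the linear scaling is an assumption; the thesis only specifies uint4)."""
    w = np.clip(np.asarray(weights, dtype=float), 0.0, None)
    scale = w.max() / 15 if w.max() > 0 else 1.0
    return np.round(w / scale).astype(np.uint8), scale

def to_pcm_cells(word):
    """Map one uint4 word to four PCM cells: bit '1' -> LRS, bit '0' -> HRS."""
    bits = [(int(word) >> b) & 1 for b in range(3, -1, -1)]  # MSB first
    return [LRS if bit else HRS for bit in bits]

words, scale = quantize_uint4([0.02, 0.31, 0.77])
print(words, [to_pcm_cells(w) for w in words])
```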
The circuit has been designed using the STMicroelectronics BJT-CMOS-DMOS (BCD) 90 nm design kit with an embedded 5 kb 1T1R PCM array. The memory cells are manufactured with an optimized Ge-rich chalcogenide alloy and are stacked over the CMOS circuitry in the back end of line. The design faces several circuit challenges, such as the implementation of the analogue-to-digital interface between the array and the input-output (IO) peripherals and the signal processing needed to drive the PCM devices. The circuit performs intelligent recognition tasks on handwritten digits (MNIST) at high bandwidth (500 kHz) and low power (~200 mW). The whole MNIST inference activity is performed in less than 0.8 s (256 mega operations per second, MOPS), far faster than state-of-the-art standard von Neumann processors.
Furthermore, the whole chip exhibits significant robustness to the non-idealities of PCM devices, since the results are resilient to both drift and resistance variability, achieving almost the same classification accuracy on the MNIST dataset as the software baseline (~85%).
This work highlights the main features, problems, and design requirements for efficiently implementing a hardware-integrated DNN using PCM cells. The adopted solutions and the obtained results are described extensively, pointing out the advantages of analogue in-memory computing for the realization of arithmetic calculations.
In the following, a summary of the main chapters of this doctoral dissertation is given.
Chapter 1 gives a short overview of the current learning and computing methods exploited in artificial intelligence, focusing mainly on the so-called “in-memory computing” and describing its advantages and theoretical hardware implementation.
Chapter 2 focuses on the description of emerging non-volatile memories such as phase change memory (PCM), resistive switching RAM (RRAM), magnetic-tunnel-junction-based MRAM, and ferroelectric RAM (FeRAM). This chapter explains their main physical properties and their suitability for implementing synaptic elements in neuromorphic engineering.
Chapter 3 presents the digital development of a hybrid supervised-unsupervised neural network on the Xilinx Zynq-7000 System-on-Chip (SoC) for performing lifelong learning. The supervised part is a convolutional neural network that extracts generic features from the training dataset; the unsupervised part is a spike-timing-dependent plasticity (STDP) network performing a winner-take-all (WTA) procedure. The inference results are validated for the correct classification of up to 5 untrained classes of the MNIST and Fashion-MNIST datasets and compared with PCM-based approaches.
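As an illustrative sketch of the unsupervised section, the snippet below implements a strongly simplified competitive update under a hard winner-take-all rule; the learning rule, the network sizes, and the parameters are assumptions made for illustration and do not reproduce the spiking STDP implementation of Chapter 3.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 784, 10            # illustrative sizes
W = rng.random((n_out, n_in))    # synaptic weights in [0, 1]

def stdp_wta_step(x, W, lr=0.01):
    """One unsupervised update: the most activated neuron wins (WTA) and
    its synapses move towards the input pattern, a strongly simplified
    STDP-like rule (potentiate synapses from active inputs, depress the rest)."""
    winner = int(np.argmax(W @ x))               # hard winner-take-all
    W[winner] += lr * (x - W[winner])            # competitive, STDP-like update
    np.clip(W[winner], 0.0, 1.0, out=W[winner])  # keep weights bounded
    return winner

x = (rng.random(n_in) > 0.8).astype(float)       # a sparse binary input pattern
print("winner neuron:", stdp_wta_step(x, W))
```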
Chapter 4 covers the design of a new kind of hardware that complements the accuracy of convolutional neural networks with the flexibility of bio-inspired spike-timing-dependent plasticity. To enable the cohesion between the stable and the plastic sections of the network, the bio-inspired spike-frequency adaptation of the neurons is exploited, as it enhances the efficiency and accuracy of the network.

Chapters 5 and 6 are the core of this doctoral dissertation, since they deal with an innovative approach to the integrated design of artificial neural networks.
Chapter 5 introduces the main characteristics and application requirements of the 1-layer MLP integrated design. Monte Carlo simulations, based on experimental measurements of the PCM 1T1R cell, have been performed; the obtained results support the choice of the hardware architecture and the methodology followed for the implementation of the weights.
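A minimal sketch of such a Monte Carlo study is given below, assuming a lognormal spread of the programmed conductance and the widely used power-law model of PCM resistance drift, $G(t) = G_0 (t/t_0)^{-\nu}$; all parameter values are illustrative placeholders rather than the measured statistics of the 1T1R cell.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_conductance(g_target, sigma=0.1, t=1.0, t0=1.0, nu=0.05, n=10000):
    """Monte Carlo draw of programmed PCM conductances: lognormal
    programming variability combined with power-law resistance drift,
    G(t) = G0 * (t / t0)**(-nu). All parameter values are illustrative."""
    g0 = g_target * rng.lognormal(mean=0.0, sigma=sigma, size=n)
    return g0 * (t / t0) ** (-nu)

# Spread of a high-conductance (LRS) level after 1 s and after 1 hour of drift
for t in (1.0, 3600.0):
    g = sample_conductance(1e-4, t=t)
    print(f"t = {t:6.0f} s: mean = {g.mean():.3e} S, std = {g.std():.3e} S")
```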
Chapter 6 describes the circuit design, the simulation results, and the physical realization (layout) of each of the circuit blocks that make up the hardware implementation. The analogue part is responsible for the correct readout and amplification of the MVM operation, while the digital part processes the signals to determine the correct neuronal classification.
Finally, as an appendix, this doctoral thesis offers an insight into the state of the art of bio-inspired computation, which has led this research activity to significant choices in the integrated design of neural networks.