2020
Controlling aggregate signal amplitude from device arrays by segmentation and time-gating
Geoffrey W. Burr, Pritish Narayanan
current mirror, resistive touchscreen, current, signal, voltage, high dynamic range, amplitude, scaling, topology, physics
Abstract
High dynamic range resistive arrays are provided. An array of resistive elements provides a vector of current outputs equal to the analog vector-matrix product between (i) a vector of voltage inputs to the array encoding a vector of analog input values and (ii) a matrix of analog resistive weights within the array. First stage current mirrors are electrically coupled to a subset of the resistive elements through a local current accumulation wire. A second stage current mirror is electrically coupled to the first stage current mirrors through a global accumulation wire. Each of the first stage current mirrors includes at least one component having respective scaling factors selectable to scale up or down the current in the local current accumulation wire, thus controlling the aggregate current on the global accumulation wire.
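As a rough illustration of the segmentation-and-scaling idea, the sketch below computes the analog vector-matrix product segment by segment and models each first-stage current mirror as a selectable multiplicative factor on its local accumulation current. The segment size, scaling factors, and conductance range are assumptions chosen for illustration, not values from the patent.

```python
# Minimal numerical sketch (not the patented circuit): an analog vector-matrix
# product y = G.T @ v computed per segment, where each first-stage current
# mirror scales its segment's local accumulation current by a selectable
# factor before the second stage sums onto the global accumulation wire.
import numpy as np

rng = np.random.default_rng(0)

n_rows, n_cols = 512, 1          # one output column for simplicity
segment_size = 64                # rows sharing one local current accumulation wire
v = rng.uniform(0.0, 0.2, n_rows)               # input voltage vector (V)
G = rng.uniform(1e-6, 10e-6, (n_rows, n_cols))  # resistive-element conductances (S)

# Hypothetical per-segment scaling factors (e.g., powers of two) chosen so the
# summed current stays within the dynamic range of the downstream circuitry.
n_segments = n_rows // segment_size
scale = np.full(n_segments, 0.25)

i_global = np.zeros(n_cols)
for s in range(n_segments):
    rows = slice(s * segment_size, (s + 1) * segment_size)
    i_local = G[rows].T @ v[rows]     # current on the local accumulation wire
    i_global += scale[s] * i_local    # first-stage mirror scales, second stage sums

print("aggregate current on global wire (A):", i_global)
```

Choosing smaller scaling factors for segments that carry large currents keeps the aggregate on the global wire bounded, which is one way to read the dynamic-range control described in the abstract.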
2019
COMPRESSION OF FULLY CONNECTED / RECURRENT LAYERS OF DEEP NETWORK(S) THROUGH ENFORCING SPATIAL LOCALITY TO WEIGHT MATRICES AND EFFECTING FREQUENCY COMPRESSION
Chia-yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Suyog Gupta, Pritish Narayanan
weight distribution, matrix, component, executable, transformation, artificial neural network, compression, set, algorithm, computer science
Abstract
A system, having a memory that stores computer executable components and a processor that executes the computer executable components, reduces data size in connection with training a neural network by exploiting spatial locality in weight matrices and effecting frequency transformation and compression. A receiving component receives neural network data in the form of an initial weight matrix. A segmentation component segments the initial weight matrix into original sub-components, wherein respective original sub-components have spatial weights. A sampling component applies a generalized weight distribution to the respective original sub-components to generate respective normalized sub-components. A transform component applies a transform to the respective normalized sub-components. A cropping component crops high-frequency weights of the respective transformed normalized sub-components to yield a set of low-frequency normalized sub-components, thereby generating a compressed representation of the original sub-components.
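The segment, normalize, transform, and crop flow could be sketched roughly as follows; the choice of a discrete cosine transform, the 8x8 block size, and the simple mean/standard-deviation normalization are illustrative assumptions rather than details taken from the patent.

```python
# Hedged sketch of the described flow: segment -> normalize -> transform ->
# crop high-frequency weights -> (optionally) reconstruct to check fidelity.
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 64))       # initial fully connected weight matrix
block, keep = 8, 4                  # 8x8 sub-components, keep a 4x4 low-frequency corner

W_hat = np.zeros_like(W)
for i in range(0, W.shape[0], block):
    for j in range(0, W.shape[1], block):
        sub = W[i:i+block, j:j+block]
        mu, sigma = sub.mean(), sub.std() + 1e-12       # stand-in for the normalization step
        coeffs = dctn((sub - mu) / sigma, norm="ortho")  # frequency transform
        cropped = np.zeros_like(coeffs)
        cropped[:keep, :keep] = coeffs[:keep, :keep]     # crop high-frequency weights
        W_hat[i:i+block, j:j+block] = idctn(cropped, norm="ortho") * sigma + mu

print("relative reconstruction error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```

In this sketch the retained data per sub-component shrinks by (keep/block)^2; weights with strong spatial locality concentrate their energy in the kept low-frequency coefficients, which is the property the abstract says is being exploited.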
Analog-to-Digital Conversion With Reconfigurable Function Mapping for Neural Networks Activation Function Acceleration
Massimo Giordano, Giorgio Cristiano, Koji Ishibashi, Stefano Ambrogio, Hsinyu Tsai, Geoffrey W. Burr, Pritish Narayanan
activation function, hardware acceleration, circuit design, routing, artificial neural network, acceleration, undersampling, process, computer hardware, computer science
Abstract
Hardware acceleration of deep neural networks (DNNs) using non-volatile memory arrays has the potential to achieve orders of magnitude power and performance benefits versus digital von-Neumann architectures by implementing the critical multiply-accumulate operations at the location of the weight data. However, realizing these system-level improvements requires careful consideration of the circuit design tradeoffs involved. For instance, neuron circuitry at the periphery, in addition to accumulating current and having mechanisms for routing, must also implement a non-linear activation function (for forward propagate) or a derivative (for reverse propagate). While it is possible to do this with analog-to-digital converters (ADCs) followed by digital arithmetic circuitry, this approach is power-hungry, suffers from undersampling, and could occupy a large area footprint. These large circuit blocks may therefore need to be time-multiplexed across multiple neurons, reducing the overall parallelism and diminishing the performance benefits. In this paper, we propose a new function mapping ADC that directly implements non-linear functions as a part of the process of conversion into the digital domain. The design is applicable to both inference and training, since it is capable of implementing both the activation function and its derivative using the same hardware. It is capable of fast and parallel conversion across all neuron values, while also being flexible and reconfigurable. We describe the design, followed by detailed circuit-level simulations demonstrating the viability and flexibility of the approach and quantifying the power and performance numbers. The simulation results show a total conversion time of 207 ns for 512 neurons in parallel, while the total energy consumption is found to be 9.95 nJ, which corresponds to 19.4 pJ per neuron.
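The behavioral sketch below illustrates how a single reference sweep can fold a reconfigurable non-linear mapping into the conversion itself; the ramp-and-latch scheme, step count, and voltage range are assumptions for illustration, not the circuit described in the paper.

```python
# Behavioral sketch of a function-mapping ADC: a shared reference ramp is swept
# once, and each neuron latches, at its crossing instant, a digital code drawn
# from a programmable table f(ramp level), so conversion and activation happen together.
import numpy as np

def function_mapping_adc(analog_inputs, f, n_steps=256, v_min=0.0, v_max=1.0):
    ramp = np.linspace(v_min, v_max, n_steps)          # shared reference ramp
    codes = f(ramp)                                    # reconfigurable mapping table
    out = np.empty_like(analog_inputs)
    latched = np.zeros(analog_inputs.shape, dtype=bool)
    for k, level in enumerate(ramp):                   # one sweep serves all neurons in parallel
        cross = (~latched) & (analog_inputs <= level)
        out[cross] = codes[k]                          # latch f(level) at the crossing
        latched |= cross
    out[~latched] = codes[-1]                          # saturate values above the ramp
    return out

x = np.random.default_rng(2).uniform(0.0, 1.0, 512)    # 512 accumulated neuron values
y = function_mapping_adc(x, lambda v: np.tanh(4 * (v - 0.5)))  # activation chosen at runtime
print(np.allclose(y, np.tanh(4 * (x - 0.5)), atol=0.02))
```

Because the mapping table is just data, the same sweep can be loaded with an activation function for forward propagation or with its derivative for reverse propagation, in the spirit of the reconfigurability the paper describes.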
2018
Apparatus for deep learning operations on resistive crossbar array
Pritish Narayanan, Scott C. Lewis
transistor, node, voltage, pulse width modulation, line, neuromorphic engineering, resistive touchscreen, word, electrical engineering, computer science
Abstract
A system and method are shown for both forward and reverse read operations in a neuromorphic crossbar array that is part of an artificial neural network (ANN). During a forward read operation, a plurality of neuron activations are encoded as pulse widths driving array word lines that gate a cell access transistor. A source-follower transistor is biased at a source follower voltage (VRDP) and a column voltage node (BLV) is held at a read voltage (VREAD). During a reverse read operation, the cell access transistor operates as another source follower by: encoding a neuron error signal onto the column voltage node (BLV), driving the gate line of the cell access transistor to the source follower voltage (VRDP), and holding the intermediate node between the cell access transistor and the source-follower transistor at the read voltage (VREAD).
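A minimal sketch of the forward-read arithmetic follows; the pulse-width range, read voltage, and conductance values are illustrative assumptions, and the transistor biasing details are deliberately abstracted away.

```python
# Illustrative sketch of the forward-read idea only (device/bias details omitted):
# neuron activations are encoded as word-line pulse widths, and each column held
# at VREAD integrates charge proportional to sum_j G_ij * t_j.
import numpy as np

rng = np.random.default_rng(3)
activations = rng.uniform(0.0, 1.0, 128)            # upstream neuron activations in [0, 1]
G = rng.uniform(1e-6, 10e-6, (128, 64))             # crosspoint conductances (S)

T_MAX = 100e-9                                      # assumed maximum pulse width (s)
V_READ = 0.2                                        # assumed read voltage on the column (V)

t = activations * T_MAX                             # pulse-width encoding of activations
charge = V_READ * (G.T @ t)                         # charge accumulated per column (C)
weighted_sum = charge / (V_READ * T_MAX)            # normalized weighted-sum result

print(weighted_sum[:4])
```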
SYSTEM, METHOD AND ARTICLE OF MANUFACTURE FOR SYNCHRONIZATION-FREE TRANSMITTAL OF NEURON VALUES IN A HARDWARE ARTIFICIAL NEURAL NETWORK
Geoffrey W. Burr, Pritish Narayanan
clock synchronization, artificial neural network, synchronization, crossbar switch, transfer, encode, topology, computation, layer, computer science
Abstract
Computations in artificial neural networks (ANNs) are accomplished using simple processing units, called neurons, with data embodied by the connections between neurons, called synapses, and by the strengths of these connections, the synaptic weights. Crossbar arrays may be used to represent one layer of the ANN, with non-volatile memory (NVM) elements at each crosspoint whose conductances encode the synaptic weights; a highly parallel current summation on the array then achieves a weighted-sum operation representative of the values of the output neurons. A method is outlined to transfer such neuron values from the outputs of one array to the inputs of a second array with no need for global clock synchronization, irrespective of the distance between the arrays, and to use such values at the next array and/or convert them into digital bits at the next array.
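A toy sketch of the duration-encoding idea follows; the linear value-to-duration mapping and the receiver's free-running local counter are assumptions for illustration. The point is that only the pulse duration, not a globally synchronized arrival time, carries the neuron value.

```python
# Minimal sketch: the sender encodes each neuron value as a pulse duration, and the
# receiver recovers it by counting ticks of its own local clock; no shared clock
# or synchronization between the two arrays is assumed.
import numpy as np

T_MAX = 200e-9                                   # assumed maximum pulse duration (s)

def encode(values, t_max=T_MAX):
    """Sender: neuron values in [0, 1] become pulse durations."""
    return np.clip(values, 0.0, 1.0) * t_max

def decode(durations, local_clock_hz=1e9, t_max=T_MAX):
    """Receiver: count local clock ticks spanned by each pulse; only the duration
    (not the arrival time) carries the value, so no global sync is needed."""
    counts = np.floor(durations * local_clock_hz)
    return counts / (t_max * local_clock_hz)

x = np.random.default_rng(4).uniform(0.0, 1.0, 8)
print(np.round(decode(encode(x)), 3), np.round(x, 3))
```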
TEMPORAL MEMORY ADAPTED FOR SINGLE-SHOT LEARNING AND DISAMBIGUATION OF MULTIPLE PREDICTIONS
Geoffrey W. Burr, Pritish Narayanan
hierarchical temporal memory, component, sequence, pattern recognition, computer science, artificial intelligence, distributed representation, single shot
Abstract
Single-shot learning and disambiguation of multiple predictions in hierarchical temporal memory is provided. In various embodiments an input sequence is read. The sequence comprises first, second, and third time-ordered components. Each of the time-ordered components is encoded in a sparse distributed representation. The sparse distributed representation of the first time-ordered component is inputted into a first portion of a hierarchical temporal memory. The sparse distributed representation of the second time-ordered component is inputted into a second portion of the hierarchical temporal memory. The second portion is connected to the first portion by a first plurality of synapses. A plurality of predictions as to the third time-ordered component is generated within a third portion of the hierarchical temporal memory. The third portion is connected to the second portion by a second plurality of synapses. Based on the plurality of predictions, additional synaptic connections are added between the first portion and the second portion.
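A toy sketch of the disambiguation mechanism is below; the dictionary-based "synapses" are a deliberately simplified stand-in for the sparse distributed representations and hierarchical temporal memory structures in the patent.

```python
# Toy sketch: after single-shot learning of "A B C" and "X B Y", predicting the third
# component from the second alone is ambiguous; additional connections carrying
# first-component context resolve which prediction applies.
sequences = [("A", "B", "C"), ("X", "B", "Y")]

pairwise = {}      # connections from the second portion to the third (predictions)
contextual = {}    # additional connections carrying first-component context
for first, second, third in sequences:
    pairwise.setdefault(second, set()).add(third)
    contextual[(first, second)] = third

def predict(first, second):
    preds = pairwise.get(second, set())
    if len(preds) > 1 and (first, second) in contextual:
        return {contextual[(first, second)]}     # disambiguated by earlier context
    return preds

print(predict(None, "B"))   # ambiguous: {'C', 'Y'}
print(predict("A", "B"))    # {'C'}
print(predict("X", "B"))    # {'Y'}
```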