RRAM-Based In-Memory Computing Architecture Designs.
Record type:
Bibliographic - Language material, manuscript : Monograph/item
Title/Author:
RRAM-Based In-Memory Computing Architecture Designs./
Author:
Wang, Xinxin.
Physical description:
1 online resource (128 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 85-03, Section: B.
Contained By:
Dissertations Abstracts International, 85-03B.
Subject:
Elementary education.
Electronic resource:
click for full text (PQDT)
ISBN:
9798380374125
LDR
:05619ntm a22004217 4500
001
1141807
005
20240414211524.5
006
m o d
007
cr mn ---uuuuu
008
250605s2023 xx obm 000 0 eng d
020
$a
9798380374125
035
$a
(MiAaPQ)AAI30748384
035
$a
(MiAaPQ)umichrackham005247
035
$a
AAI30748384
040
$a
MiAaPQ
$b
eng
$c
MiAaPQ
$d
NTU
100
1
$a
Wang, Xinxin.
$3
1191817
245
1 0
$a
RRAM-Based In-Memory Computing Architecture Designs.
264
0
$c
2023
300
$a
1 online resource (128 pages)
336
$a
text
$b
txt
$2
rdacontent
337
$a
computer
$b
c
$2
rdamedia
338
$a
online resource
$b
cr
$2
rdacarrier
500
$a
Source: Dissertations Abstracts International, Volume: 85-03, Section: B.
500
$a
Advisor: Lu, Wei D.
502
$a
Thesis (Ph.D.)--University of Michigan, 2023.
504
$a
Includes bibliographical references
520
$a
New computing applications, e.g., deep neural network (DNN) training and inference, have been a driving force reshaping the semiconductor industry landscape. The data-intensive nature of DNN applications usually leads to high computation costs and complexity, and different hardware accelerators have thus been developed to improve the efficiency of running these models. For example, NVIDIA GPUs, Google TPUs, and near-memory computing architectures can enhance DNN performance (inferences/second) and energy efficiency (TOPS/W) compared to conventional CPUs. Beyond that, in-memory computing (IMC) methods can circumvent the fundamental von Neumann bottleneck and enable highly parallel computing, leading to even higher hardware efficiency and performance. In this thesis, we examine several aspects of IMC accelerators based on emerging memory devices such as RRAM, which can potentially offer high computation density, throughput, and energy efficiency.

To start, we present a reconfigurable IMC design that can accelerate general arithmetic and logic functions. The system consists of small look-up tables (LUTs), a memory block, and search auxiliary blocks, all implemented in the same RRAM crossbar array. External data access and data conversions are eliminated to allow operations to be performed fully in memory. Logic and arithmetic functions such as addition, AND, and multiplication are described in terms of search and writeback steps. A compact instruction set is demonstrated for this architecture through circuit-level simulations. Performance evaluations show that the proposed IMC architecture is suitable for handling data-intensive tasks with very low power consumption.

Next, we discuss DNN accelerator designs using a tiled IMC architecture. Popular models including VGG-16 and MobileNet are successfully mapped and tested on the RRAM-based tiled IMC architecture. Effects of finite RRAM array size and quantized partial sums (Psums) due to ADC precision constraints are analyzed, and methods are developed to address these challenges while preserving DNN accuracy and IMC performance gains. For practical IMC implementations and to support larger models, we develop a Tiled Architecture for In-memory Computing and Heterogeneous Integration (TAICHI), a general IMC DNN accelerator design. TAICHI is based on tiled RRAM crossbar arrays heterogeneously integrated with local arithmetic units and global co-processors, allowing the same chip to efficiently map different models while maintaining high energy efficiency and throughput. A hierarchical mesh network-on-chip is implemented to facilitate communication among clusters in TAICHI and to balance reconfigurability and efficiency. Detailed implementations of the different circuit components are presented, and system performance is benchmarked at several technology nodes. The heterogeneous design also allows the system to accommodate models larger than the on-chip storage capacity, making the hardware system future-proof.

In general, large-scale implementations of IMC accelerators face two technological challenges: high ADC overhead and device variability. We note that these challenges can be addressed by restricting neuron activations to single-bit values, i.e., spikes, and by employing binary weights, respectively. Based on these principles, we propose an efficient hardware implementation of binary-weight spiking neural networks (BSNNs) that can be achieved using current RRAM devices and simple circuits. Binary activations also provide opportunities for intra- and inter-layer data routing and neuron circuit design optimizations. Through high-precision backpropagation-through-time (HP-BPTT) and a proper neuron design, we show that BSNNs can achieve accuracies comparable to floating-point models. With these co-designs, the proposed architecture can achieve very high energy efficiency and accuracy on common SNN datasets. The robustness of the BSNN model against device non-idealities is further verified through experimental chip measurements.

Finally, we discuss other opportunities to further enhance IMC architecture performance, including pipelining optimization, mapping strategies, and BSNN training optimization strategies.
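A minimal sketch of the quantized partial-sum (Psum) effect described in the abstract above: a layer's weight matrix is split across finite-size crossbar tiles, each tile's analog dot product passes through a limited-precision ADC model, and the quantized Psums are accumulated digitally. The tile size, ADC bit-width, and uniform quantizer below are illustrative assumptions, not details taken from the thesis.

```python
# Illustrative sketch only: finite crossbar tiles + ADC-quantized partial sums.
import numpy as np

def quantize_psum(psum, adc_bits=6, full_scale=None):
    """Model a limited-precision ADC with a uniform quantizer (assumed model)."""
    if full_scale is None:
        full_scale = float(np.max(np.abs(psum))) or 1.0
    levels = 2 ** adc_bits - 1
    step = 2.0 * full_scale / levels
    half = levels // 2
    return np.clip(np.round(psum / step), -half, half) * step

def tiled_crossbar_mvm(weights, inputs, tile_rows=128, adc_bits=6):
    """Split a weight matrix across finite-size crossbar tiles, quantize each
    tile's partial sum (Psum), then accumulate the quantized Psums digitally."""
    out_dim, in_dim = weights.shape
    output = np.zeros(out_dim)
    for start in range(0, in_dim, tile_rows):
        tile = weights[:, start:start + tile_rows]   # one crossbar's worth of inputs
        x = inputs[start:start + tile_rows]
        psum = tile @ x                               # analog dot product within a tile
        output += quantize_psum(psum, adc_bits)       # ADC precision limits each Psum
    return output

# Usage: compare quantized accumulation against the ideal matrix-vector product.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512)) * 0.05
x = rng.standard_normal(512)
ideal = W @ x
approx = tiled_crossbar_mvm(W, x, tile_rows=128, adc_bits=6)
print("mean |error| from Psum quantization:", np.mean(np.abs(ideal - approx)))
```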
533
$a
Electronic reproduction.
$b
Ann Arbor, Mich. :
$c
ProQuest,
$d
2024
538
$a
Mode of access: World Wide Web
650
4
$a
Elementary education.
$3
1148439
650
4
$a
Computer engineering.
$3
569006
650
4
$a
Electrical engineering.
$3
596380
653
$a
Computing architecture
653
$a
Deep neural network
653
$a
Hardware system
653
$a
Spiking Neural Network
653
$a
Binary activations
655
7
$a
Electronic books.
$2
local
$3
554714
690
$a
0524
690
$a
0544
690
$a
0464
690
$a
0800
710
2
$a
ProQuest Information and Learning Co.
$3
1178819
710
2
$a
University of Michigan.
$b
Electrical and Computer Engineering.
$3
1372805
773
0
$t
Dissertations Abstracts International
$g
85-03B.
856
4 0
$u
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30748384
$z
click for full text (PQDT)