National Formosa University

Navigating Heterogeneity and Scalability in Modern Chip Design.

Record type:	Bibliographic - language material, manuscript : Monograph/item
Title/Author:	Navigating Heterogeneity and Scalability in Modern Chip Design./
Author:	Orenes-Vera, Marcelo.
Physical description:	1 online resource (187 pages)
Note:	Source: Dissertations Abstracts International, Volume: 85-12, Section: B.
Contained By:	Dissertations Abstracts International 85-12B.
Subject:	Computer engineering.
Electronic resource:	click for full text (PQDT)
ISBN:	9798382810263

Thesis (Ph.D.)--Princeton University, 2024.

Includes bibliographical references

Computing systems have become ubiquitous in the modern world but their design is far from one-size-fits-all. From battery-powered devices to supercomputers, deployment requirements are a primary driver of heterogeneity in computer design. As modern systems rely on parallelism and specialization to achieve their performance and power goals, new challenges arise. The system's complexity grows with the number of distinct hardware modules, complicating the verification of correct and secure behavior. Moreover, expanding parallelization across more processing units (PUs) increases the pressure on the memory hierarchy and inter-PU network, which results in severe bottlenecks for applications traversing graph-like data structures with indirect memory accesses (IMAs). These challenges call for re-thinking software abstractions and hardware designs to achieve scalable and efficient systems, as well as introducing robust methodologies to ensure their correctness. My dissertation aims to tackle these challenges with three main thrusts.

First, to facilitate hardware designers applying formal verification to their modules, this dissertation introduces AutoSVA, a toolflow that generates formal verification testbenches from module interface annotations. Testbenches generated with AutoSVA have uncovered bugs in open-source projects, including a widely used RISC-V CPU.

Second, to alleviate IMA latency without increasing verification complexity, this dissertation introduces MAPLE, a network-connected memory-access engine that supports data pipelining and prefetching without requiring PU modifications. As such, off-the-shelf PUs can offload IMAs to MAPLE, and consume data via software-managed queues. Using MAPLE effectively mitigates memory latency, providing 2x speedups over software- and hardware-only prefetching.

Third, to further the scalability of graph and sparse workloads, this dissertation co-designs scale-out architectures with a data-centric execution model, Dalorex, where IMAs are split into tasks that only access a confined address range and execute at the PU with dedicated access to that memory range. The parallelization of breadth-first-search on a billion-edge graph across a million PUs results in nearly an order of magnitude faster runtimes than Graph500's top entries.

By introducing novel hardware designs, execution models, and verification tools, this dissertation contributes towards addressing the challenges posed by the increasing demand for high-performance, energy-efficient, and cost-effective computing systems.

Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2024

Mode of access: World Wide Web

ISBN: 9798382810263

Subjects--Topical Terms:

569006
Computer engineering.

Subjects--Index Terms:

Processing units

Index Terms--Genre/Form:

554714
Electronic books.

LDR:03894ntm a22003737 4500
001 1148400
005 20240924101916.5
006 m o d
007 cr bn ---uuuuu
008 250605s2024 xx obm 000 0 eng d
020 $a 9798382810263
035 $a (MiAaPQ)AAI31294177
035 $a AAI31294177
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Orenes-Vera, Marcelo. $3 1474354
245 1 0 $a Navigating Heterogeneity and Scalability in Modern Chip Design.
264 0 $c 2024
300 $a 1 online resource (187 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertations Abstracts International, Volume: 85-12, Section: B.
500 $a Advisor: Martonosi, Margaret;Wentzlaff, David.
502 $a Thesis (Ph.D.)--Princeton University, 2024.
504 $a Includes bibliographical references
520 $a Computing systems have become ubiquitous in the modern world but their design is far from one-size-fits-all. From battery-powered devices to supercomputers, deployment requirements are a primary driver of heterogeneity in computer design. As modern systems rely on parallelism and specialization to achieve their performance and power goals, new challenges arise. The system's complexity grows with the number of distinct hardware modules, complicating the verification of correct and secure behavior. Moreover, expanding parallelization across more processing units (PUs) increases the pressure on the memory hierarchy and inter-PU network, which results in severe bottlenecks for applications traversing graph-like data structures with indirect memory accesses (IMAs). These challenges call for re-thinking software abstractions and hardware designs to achieve scalable and efficient systems, as well as introducing robust methodologies to ensure their correctness. My dissertation aims to tackle these challenges with three main thrusts. First, to facilitate hardware designers applying formal verification to their modules, this dissertation introduces AutoSVA, a toolflow that generates formal verification testbenches from module interface annotations. Testbenches generated with AutoSVA have uncovered bugs in open-source projects, including a widely used RISC-V CPU. Second, to alleviate IMA latency without increasing verification complexity, this dissertation introduces MAPLE, a network-connected memory-access engine that supports data pipelining and prefetching without requiring PU modifications. As such, off-the-shelf PUs can offload IMAs to MAPLE, and consume data via software-managed queues. Using MAPLE effectively mitigates memory latency, providing 2x speedups over software- and hardware-only prefetching. Third, to further the scalability of graph and sparse workloads, this dissertation co-designs scale-out architectures with a data-centric execution model, Dalorex, where IMAs are split into tasks that only access a confined address range and execute at the PU with dedicated access to that memory range. The parallelization of breadth-first-search on a billion-edge graph across a million PUs results in nearly an order of magnitude faster runtimes than Graph500's top entries. By introducing novel hardware designs, execution models, and verification tools, this dissertation contributes towards addressing the challenges posed by the increasing demand for high-performance, energy-efficient, and cost-effective computing systems.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2024
538 $a Mode of access: World Wide Web
650 4 $a Computer engineering. $3 569006
650 4 $a Computer science. $3 573171
653 $a Processing units
653 $a Indirect memory accesses
653 $a Computing systems
653 $a Hardware designs
655 7 $a Electronic books. $2 local $3 554714
690 $a 0464
690 $a 0984
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a Princeton University. $b Computer Science. $3 1179801
773 0 $t Dissertations Abstracts International $g 85-12B.
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31294177 $z click for full text (PQDT)