National Formosa University (國立虎尾科技大學)

Automatic Cache Partitioning Method for High-level Synthesis.

Record Type:	Language materials, printed : Monograph/item
Title/Author:	Automatic Cache Partitioning Method for High-level Synthesis.
Author:	Jones, Bryant M.
Published:	Ann Arbor : ProQuest Dissertations & Theses, 2018
Description:	169 p.
Notes:	Source: Dissertation Abstracts International, Volume: 79-12(E), Section: B.
Contained By:	Dissertation Abstracts International 79-12B(E).
Subject:	Computer engineering.
Online resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10751888
ISBN:	9780438184688

Automatic Cache Partitioning Method for High-level Synthesis. - Ann Arbor : ProQuest Dissertations & Theses, 2018 - 169 p.

Source: Dissertation Abstracts International, Volume: 79-12(E), Section: B.

Thesis (Ph.D.)--Oakland University, 2018.

Existing algorithms can be automatically translated from software to hardware using High-Level Synthesis (HLS), allowing for quick prototyping or deployment of embedded designs. HLS has been gaining popularity over the past several years as vendor tools have begun adopting the methodology, making it easily available to developers. High-level software is written with a single main memory in mind, whereas hardware designs can take advantage of many parallel memories. For high-performance designs, it is important to effectively translate and optimize memory access and architectures. Tools provide optimizations on memory structures targeting data reuse and partitioning, but generally these are applied separately for a given object in memory. Memory access that cannot be effectively optimized is serialized to the memory, hindering any further parallelization of the surrounding generated hardware.
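
The contrast the abstract draws between a single serialized memory and many parallel memories can be illustrated with a small cycle-count model. This is only a sketch under stated assumptions, not the dissertation's method: the function names, the single-port memory model, the cyclic (address modulo bank count) partitioning scheme, and the access pattern are all hypothetical choices made for illustration.

```python
from collections import Counter


def cycles_single_memory(accesses_per_iteration):
    """Assumed model: one single-port memory, so every access
    in an iteration takes its own cycle (fully serialized)."""
    return sum(len(addrs) for addrs in accesses_per_iteration)


def cycles_cyclic_banks(accesses_per_iteration, num_banks):
    """Assumed model: cyclic partitioning, where address a lives in
    bank a % num_banks. Banks operate in parallel, so an iteration
    costs as many cycles as its busiest bank has accesses."""
    total = 0
    for addrs in accesses_per_iteration:
        per_bank = Counter(a % num_banks for a in addrs)
        total += max(per_bank.values(), default=0)
    return total


# Hypothetical access pattern: a loop unrolled by 4 that reads
# a[i], a[i+1], a[i+2], a[i+3] in each iteration.
pattern = [[i, i + 1, i + 2, i + 3] for i in range(0, 16, 4)]

single = cycles_single_memory(pattern)
banked = cycles_cyclic_banks(pattern, 4)
print(f"single memory: {single} cycles, 4 banks: {banked} cycles")
```

In this toy model, accesses that land in the same bank still serialize, which is why the choice of partitioning matters as much as the number of banks.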

ISBN: 9780438184688

Subjects--Topical Terms:

569006
Computer engineering.

Automatic Cache Partitioning Method for High-level Synthesis.
LDR:02906nam a2200313 4500
001 931647
005 20190716101634.5
008 190815s2018 ||||||||||||||||| ||eng d
020 $a 9780438184688
035 $a (MiAaPQ)AAI10751888
035 $a (MiAaPQ)oakland:10077
035 $a AAI10751888
040 $a MiAaPQ $c MiAaPQ
100 1 $a Jones, Bryant M. $3 1213841
245 1 0 $a Automatic Cache Partitioning Method for High-level Synthesis.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2018
300 $a 169 p.
500 $a Source: Dissertation Abstracts International, Volume: 79-12(E), Section: B.
500 $a Adviser: Darrin M. Hanna.
502 $a Thesis (Ph.D.)--Oakland University, 2018.
520 $a Existing algorithms can be automatically translated from software to hardware using High-Level Synthesis (HLS), allowing for quick prototyping or deployment of embedded designs. HLS has been gaining popularity over the past several years as vendor tools have begun adopting the methodology, making it easily available to developers. High-level software is written with a single main memory in mind, whereas hardware designs can take advantage of many parallel memories. For high-performance designs, it is important to effectively translate and optimize memory access and architectures. Tools provide optimizations on memory structures targeting data reuse and partitioning, but generally these are applied separately for a given object in memory. Memory access that cannot be effectively optimized is serialized to the memory, hindering any further parallelization of the surrounding generated hardware.
520 $a In this work, we present an automated optimization method for creating custom cache memory architectures for HLS-generated designs. Our optimization uses runtime profiling data and is performed at a localized scope. This method combines data reuse savings and memory partitioning to further increase the potential parallelism and alleviate serialized memory access, increasing performance. Comparisons are made against architectures without this optimization and against other HLS caching approaches. Results are presented for several benchmarks from multiple application domains, showing that this method provides a reduction of up to 89% in the number of execution cycles and 87% in execution time compared to designs with no caches. Benchmarks are further tested with increasing data set sizes and external memory latency, and show that this method performs well in both cases. Furthermore, this method is not dependent on the Flowpaths toolchain, and an example is given showing that applying this method to other tools provides similar results and savings.
590 $a School code: 0446.
650 4 $a Computer engineering. $3 569006
650 4 $a Electrical engineering. $3 596380
690 $a 0464
690 $a 0544
710 2 $a Oakland University. $b Engineering. $3 1182277
773 0 $t Dissertation Abstracts International $g 79-12B(E).
790 $a 0446
791 $a Ph.D.
792 $a 2018
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10751888