National Formosa University

Harmonic CUDA: Asynchronous Programming on GPUs.

Record type:	Bibliographic - language material, manuscript : Monograph/item
Title/Author:	Harmonic CUDA :/
Other title:	Asynchronous Programming on GPUs.
Author:	Wapman, Jonathan.
Extent:	1 online resource (51 pages)
Notes:	Source: Masters Abstracts International, Volume: 85-01.
Contained By:	Masters Abstracts International, 85-01.
Subject:	Computer science.
Electronic resource:	click for full text (PQDT)
ISBN:	9798379776329

Thesis (M.S.)--University of California, Davis, 2023.

Includes bibliographical references

We introduce Harmonic CUDA, a dataflow programming model for GPUs that allows programmers to describe algorithms as a dependency graph of producers and consumers where data flows continuously through the graph for the duration of the kernel. This makes it easier for programmers to exploit asynchrony, warp specialization, and hardware acceleration. Using Harmonic CUDA, we implement two example applications: Matrix Multiplication and GraphSage. The matrix multiplication kernel demonstrates how a key kernel can break down into more granular building blocks, with results that show a geomean average of 80% of cuBLAS performance, and up to 92% when omitting small matrices, as well as an analysis of how to improve performance in the future. GraphSage shows how asynchrony and warp specialization can provide significant performance improvements by reusing the same building blocks as the matrix multiplication kernel. We show performance improvements of 34% by changing to a warp-specialized version compared to a bulk-synchronous implementation. This thesis evaluates the strengths and weaknesses of Harmonic CUDA based on these test cases and suggests future work to improve the programming model.

Electronic reproduction. Ann Arbor, Mich.: ProQuest, 2024.

Mode of access: World Wide Web

ISBN: 9798379776329

Subjects--Topical Terms:
573171 Computer science.

Subjects--Index Terms:
Asynchronous

Index Terms--Genre/Form:
554714 Electronic books.

LDR:02517ntm a22004097 4500
001 1143552
005 20240517104557.5
006 m o d
007 cr mn ---uuuuu
008 250605s2023 xx obm 000 0 eng d
020 $a 9798379776329
035 $a (MiAaPQ)AAI30000843
035 $a AAI30000843
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Wapman, Jonathan. $3 1468274
245 1 0 $a Harmonic CUDA : $b Asynchronous Programming on GPUs.
264 0 $c 2023
300 $a 1 online resource (51 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Masters Abstracts International, Volume: 85-01.
500 $a Advisor: Owens, John D.
502 $a Thesis (M.S.)--University of California, Davis, 2023.
504 $a Includes bibliographical references
520 $a We introduce Harmonic CUDA, a dataflow programming model for GPUs that allows programmers to describe algorithms as a dependency graph of producers and consumers where data flows continuously through the graph for the duration of the kernel. This makes it easier for programmers to exploit asynchrony, warp specialization, and hardware acceleration. Using Harmonic CUDA, we implement two example applications: Matrix Multiplication and GraphSage. The matrix multiplication kernel demonstrates how a key kernel can break down into more granular building blocks, with results that show a geomean average of 80% of cuBLAS performance, and up to 92% when omitting small matrices, as well as an analysis of how to improve performance in the future. GraphSage shows how asynchrony and warp specialization can provide significant performance improvements by reusing the same building blocks as the matrix multiplication kernel. We show performance improvements of 34% by changing to a warp-specialized version compared to a bulk-synchronous implementation. This thesis evaluates the strengths and weaknesses of Harmonic CUDA based on these test cases and suggests future work to improve the programming model.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2024
538 $a Mode of access: World Wide Web
650 4 $a Computer science. $3 573171
650 4 $a Computer engineering. $3 569006
650 4 $a Electrical engineering. $3 596380
653 $a Asynchronous
653 $a CUDA
653 $a GEMM
653 $a GPU
653 $a GraphSage
653 $a Programming model
655 7 $a Electronic books. $2 local $3 554714
690 $a 0984
690 $a 0464
690 $a 0544
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a University of California, Davis. $b Electrical and Computer Engineering. $3 1178925
773 0 $t Masters Abstracts International $g 85-01.
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30000843 $z click for full text (PQDT)