Accelerating Graph Neural Network Computation on CPUs and GPUs.
Record Type:
Bibliographic - Language material, manuscript : Monograph/item
Title/Author:
Accelerating Graph Neural Network Computation on CPUs and GPUs./
Author:
Fu, Qiang.
Description:
1 online resource (124 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 85-06, Section: A.
Advisor: Huang, H. Howie.
Thesis (Ph.D.)--The George Washington University, 2024.
Includes bibliographical references.
Contained By:
Dissertations Abstracts International, 85-06A.
Subjects:
Computer engineering. Computer science. Information science.
Index Terms:
Code generation. Compiler-based software. Graph Neural Networks. Performance profiling.
Genre/Form:
Electronic books.
Added Corporate Names:
ProQuest Information and Learning Co. The George Washington University, Computer Engineering.
Electronic Resource:
click for full text (PQDT): http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30638889
ISBN:
9798380887229
Abstract:
Graph Neural Networks (GNNs) are becoming popular because of their effectiveness in extracting structural information from graph data. Recent years have seen revolutionary results from GNNs in various applications, e.g., chemistry, social science, knowledge graphs, recommendation systems, and neuroscience. However, high-performance computation of GNNs is challenging. This dissertation strives to understand and accelerate GNN computation on multi-core CPUs and general-purpose GPUs through performance profiling, code generation, and efficient workload scheduling.

The dissertation first presents GIN, a novel compiler-based software framework optimized for GNN inference on CPUs (Chapter 3), which offers a user-friendly interface, via an intuitive programming model, for defining graph neural network models. GIN builds high-level dataflow graphs as intermediate representations, which are transformed into highly efficient code and then compiled into binary inference kernels. Our evaluation shows that GIN significantly accelerates inference on billion-edge graphs, outperforming three state-of-the-art GNN solutions as well as Ligra, a traditional graph processing system.

Chapter 4 describes TLPGNN, a lightweight two-level parallelism paradigm for GNN computation on single and multiple GPUs. First, we divide the GNN computation into two levels, i.e., vertex parallelism at the first level and feature parallelism at the second. Next, we employ a novel hybrid dynamic workload assignment to address imbalanced workload distribution. Furthermore, we fuse kernels to reduce the number of kernel launches and cache frequently accessed data in registers to avoid unnecessary memory traffic. To scale TLPGNN to multi-GPU environments, we propose an edge-aware row-wise 1-D partitioning method that ensures a balanced workload distribution across GPU devices. Together, these techniques allow TLPGNN to significantly outperform existing GNN computation systems, and evaluation of multi-GPU TLPGNN demonstrates both linear scalability and a well-balanced workload distribution.

Chapter 5 proposes JITSPMM, a just-in-time (JIT) assembly code generation framework that accelerates SpMM (sparse matrix-matrix multiplication) computation, an important component of GNN implementations, on multi-core CPUs with SIMD extensions. First, JITSPMM integrates JIT assembly code generation into three widely used workload-division methods for SpMM, i.e., row-split, nnz-split, and merge-split, to achieve balanced workload distribution among CPU threads. Next, with runtime information available, JITSPMM employs a novel technique, coarse-grain column merging, to maximize instruction-level parallelism by unrolling the performance-critical loop. Furthermore, JITSPMM intelligently allocates registers to cache frequently accessed data, minimizing memory accesses, and employs selected SIMD instructions to enhance arithmetic throughput. Together, these techniques allow JITSPMM to outperform two ahead-of-time (AOT) SpMM implementations, including Intel MKL, by 3.8x and 1.4x on average, respectively.
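All three chapters accelerate the same core operation: aggregating each vertex's neighbor features over the graph's sparse adjacency structure. As a point of reference, here is a minimal single-threaded sketch of that aggregation over a CSR graph; it is not code from the dissertation, and all names (CsrGraph, aggregate_cpu) are illustrative.

```cuda
#include <cstdio>
#include <vector>

// Minimal CSR graph: the neighbors of vertex v live at
// col_idx[row_ptr[v] .. row_ptr[v+1]). Names are illustrative.
struct CsrGraph {
    std::vector<int> row_ptr;  // size = num_vertices + 1
    std::vector<int> col_idx;  // size = num_edges
};

// Reference single-threaded aggregation: each output row is the sum of the
// neighbors' feature rows, i.e., SpMM of the (unweighted) adjacency matrix
// with the dense feature matrix.
void aggregate_cpu(const CsrGraph& g, const float* feat, float* out, int f_dim) {
    int n = (int)g.row_ptr.size() - 1;
    for (int v = 0; v < n; ++v) {
        for (int k = 0; k < f_dim; ++k) out[v * f_dim + k] = 0.0f;
        for (int e = g.row_ptr[v]; e < g.row_ptr[v + 1]; ++e)
            for (int k = 0; k < f_dim; ++k)
                out[v * f_dim + k] += feat[(size_t)g.col_idx[e] * f_dim + k];
    }
}

int main() {
    // Toy graph: 0->{1,2}, 1->{2}, 2->{}; two features per vertex.
    CsrGraph g{{0, 2, 3, 3}, {1, 2, 2}};
    float feat[] = {1, 10, 2, 20, 3, 30}, out[6];
    aggregate_cpu(g, feat, out, 2);
    printf("out[0] = {%g, %g}\n", out[0], out[1]);  // expect {5, 50}
    return 0;
}
```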
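Chapter 4's two-level decomposition maps naturally onto CUDA's execution hierarchy. The sketch below is a hedged illustration of the idea as stated in the abstract, assuming one warp per destination vertex for vertex parallelism and warp lanes striding over the feature dimension for feature parallelism, with a register accumulator standing in for the register caching the abstract mentions. It omits TLPGNN's hybrid dynamic workload assignment and kernel fusion; the kernel shape and names are assumptions, not TLPGNN's actual code.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Two-level parallelism sketch: level 1 assigns one warp per destination
// vertex (vertex parallelism); level 2 strides the 32 warp lanes across
// the feature dimension (feature parallelism).
__global__ void aggregate_two_level(const int* row_ptr, const int* col_idx,
                                    const float* feat, float* out,
                                    int num_vertices, int f_dim) {
    int warp = (blockIdx.x * blockDim.x + threadIdx.x) / 32;  // vertex id
    int lane = threadIdx.x & 31;                              // feature lane
    if (warp >= num_vertices) return;
    int begin = row_ptr[warp], end = row_ptr[warp + 1];
    for (int k = lane; k < f_dim; k += 32) {
        float acc = 0.0f;  // register-cached partial sum: never written to
        for (int e = begin; e < end; ++e)  // global memory until complete
            acc += feat[col_idx[e] * f_dim + k];
        out[warp * f_dim + k] = acc;
    }
}

int main() {
    // Same toy graph as above: 0->{1,2}, 1->{2}, 2->{}.
    int h_rp[] = {0, 2, 3, 3}, h_ci[] = {1, 2, 2};
    float h_ft[] = {1, 10, 2, 20, 3, 30};
    int *rp, *ci; float *ft, *out;
    cudaMallocManaged(&rp, sizeof h_rp);  cudaMallocManaged(&ci, sizeof h_ci);
    cudaMallocManaged(&ft, sizeof h_ft);  cudaMallocManaged(&out, 6 * sizeof(float));
    memcpy(rp, h_rp, sizeof h_rp); memcpy(ci, h_ci, sizeof h_ci);
    memcpy(ft, h_ft, sizeof h_ft);
    aggregate_two_level<<<1, 96>>>(rp, ci, ft, out, 3, 2);  // 3 warps = 3 vertices
    cudaDeviceSynchronize();
    printf("out[0] = {%g, %g}\n", out[0], out[1]);  // expect {5, 50}
    return 0;
}
```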
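Chapter 5's workload-division methods differ in what they equalize across threads. The host-side sketch below (hypothetical function names; merge-split is omitted since it requires a merge-path search) contrasts the first two: row-split hands each thread an equal count of rows regardless of their nonzeros, while nnz-split walks row_ptr so each thread receives a roughly equal share of nonzeros, which is what keeps a skewed sparsity pattern from idling threads.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>

// Row-split: give every thread an equal share of rows, ignoring how many
// nonzeros each row holds.
std::vector<int> row_split(int num_rows, int num_threads) {
    std::vector<int> b(num_threads + 1);
    for (int t = 0; t <= num_threads; ++t)
        b[t] = (int)((long long)num_rows * t / num_threads);
    return b;
}

// Nnz-split: binary-search row_ptr so every thread gets a roughly equal
// share of nonzeros, rounding each boundary up to the next row start.
std::vector<int> nnz_split(const std::vector<int>& row_ptr, int num_threads) {
    int num_rows = (int)row_ptr.size() - 1;
    std::vector<int> b(num_threads + 1);
    for (int t = 0; t <= num_threads; ++t) {
        int target = (int)((long long)row_ptr[num_rows] * t / num_threads);
        b[t] = (int)(std::lower_bound(row_ptr.begin(), row_ptr.end(), target)
                     - row_ptr.begin());
    }
    return b;
}

void report(const char* name, const std::vector<int>& rp, const std::vector<int>& b) {
    printf("%-10s", name);
    for (size_t t = 0; t + 1 < b.size(); ++t)
        printf("  thread %zu: rows [%d,%d), nnz %d",
               t, b[t], b[t + 1], rp[b[t + 1]] - rp[b[t]]);
    printf("\n");
}

int main() {
    // Skewed matrix: rows with 4, 4, 1, 1, 1, 1, 1, 1 nonzeros.
    std::vector<int> row_ptr = {0, 4, 8, 9, 10, 11, 12, 13, 14};
    report("row-split", row_ptr, row_split(8, 2));       // 10 vs 4 nonzeros
    report("nnz-split", row_ptr, nnz_split(row_ptr, 2)); //  8 vs 6 nonzeros
    return 0;
}
```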
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2024.
Mode of access: World Wide Web.