Kumawat, Sachin.
An OpenCL Framework for Real-time Inference of Next-Generation Convolutional Neural Networks on FPGAs.
Record type: Bibliographic - language material, manuscript : Monograph/item
Title/Author: An OpenCL Framework for Real-time Inference of Next-Generation Convolutional Neural Networks on FPGAs. / Kumawat, Sachin.
Physical description: 1 online resource (93 pages)
Notes: Source: Masters Abstracts International, Volume: 57-05.
Contained by: Masters Abstracts International, 57-05(E).
Subject: Electrical engineering.
Electronic resource: click for full text (PQDT)
ISBN: 9780355764413
Kumawat, Sachin. An OpenCL Framework for Real-time Inference of Next-Generation Convolutional Neural Networks on FPGAs. - 1 online resource (93 pages)
Source: Masters Abstracts International, Volume: 57-05.
Thesis (M.S.)--University of California, Davis, 2017.
Includes bibliographical references.
Modern Convolutional Neural Networks (CNNs) require billions of multiplications and additions, which calls for parallel computing units such as GPUs, FPGAs, and other DSP processors. Consequently, General-Purpose GPU (GPGPU) computing has taken this field by storm. At the same time, there has been increasing interest in FPGA-based acceleration of CNN inference.
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018.
Mode of access: World Wide Web.
ISBN: 9780355764413
Subjects--Topical Terms: Electrical engineering.
Index Terms--Genre/Form: Electronic books.
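The scale the abstract cites ("billions of multiplications and additions") is easy to check with back-of-the-envelope arithmetic. A minimal sketch, using the standard published VGG-16 shapes for the first two convolution layers (these shapes are reference values, not taken from this record):

```python
def conv_macs(h_out, w_out, c_in, c_out, k):
    """Multiply-accumulate (MAC) count for one k x k convolution layer."""
    return h_out * w_out * c_out * (k * k * c_in)

# VGG-16 conv1_1 and conv1_2: 224x224 outputs, 3x3 kernels.
conv1_1 = conv_macs(224, 224, 3, 64, 3)    # ~87 million MACs
conv1_2 = conv_macs(224, 224, 64, 64, 3)   # ~1.85 billion MACs
print(conv1_1 + conv1_2)                   # ~1.9 billion for just two layers
```

Summed over all 13 convolution layers, VGG-16 comes to roughly 15 billion MACs per image — the workload an FPGA accelerator must sustain for real-time inference.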
LDR    03215ntm a2200349Ki 4500
001    917076
005    20181005115847.5
006    m o u
007    cr mn||||a|a||
008    190606s2017 xx obm 000 0 eng d
020    $a 9780355764413
035    $a (MiAaPQ)AAI10682711
035    $a (MiAaPQ)ucdavis:17550
035    $a AAI10682711
040    $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1  $a Kumawat, Sachin. $3 1191002
245 13 $a An OpenCL Framework for Real-time Inference of Next-Generation Convolutional Neural Networks on FPGAs.
264  0 $c 2017
300    $a 1 online resource (93 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Masters Abstracts International, Volume: 57-05.
500    $a Adviser: Soheil Ghiasi.
502    $a Thesis (M.S.)--University of California, Davis, 2017.
504    $a Includes bibliographical references.
520    $a Modern Convolutional Neural Networks (CNNs) require billions of multiplications and additions, which calls for parallel computing units such as GPUs, FPGAs, and other DSP processors. Consequently, General-Purpose GPU (GPGPU) computing has taken this field by storm. At the same time, there has been increasing interest in FPGA-based acceleration of CNN inference.
520    $a In this work, we present FICaffe, a framework for FPGA-based Inference with Caffe, which provides fully automated generation and mapping of CNN accelerators on FPGAs. We target applications with critical latency requirements and design high-efficiency accelerators for CNN processing. The architecture is structured as a highly concurrent OpenCL library, which enables High-Level Synthesis tools to effectively exploit data, task, and pipeline parallelism. We propose a unified memory model that drives exploration of the optimal design by matching the on-chip and off-chip memory bandwidths available on FPGA platforms. We also identify the origins of all clock-cycle stalls and overheads inherent to CNN acceleration designs and provide a detailed model that predicts runtime latency to within 4% of on-board tests. Furthermore, FICaffe supports cross-network synthesis, so that it can process a variety of CNNs with reasonable efficiency and without hours of recompilation. FICaffe is integrated with the popular deep learning framework Caffe and is deployable to a wide variety of CNNs. FICaffe's efficacy is shown by mapping to a 28 nm Stratix V GXA7 chip, and both network-specific and cross-network performance are reported for AlexNet, VGG, SqueezeNet, and GoogLeNet. We show a processing efficiency of 95.8% for the widely reported VGG benchmark, which outperforms prior work. To the best of our knowledge, FICaffe also achieves more than a 2x speedup on the Stratix V GXA7 over the best previously published results on this chip.
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538    $a Mode of access: World Wide Web
650  4 $a Electrical engineering. $3 596380
650  4 $a Computer engineering. $3 569006
655  7 $a Electronic books. $2 local $3 554714
690    $a 0544
690    $a 0464
710 2  $a ProQuest Information and Learning Co. $3 1178819
710 2  $a University of California, Davis. $b Electrical and Computer Engineering. $3 1178925
773 0  $t Masters Abstracts International $g 57-05(E).
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10682711 $z click for full text (PQDT)
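The abstract describes a unified memory model that matches on-chip and off-chip bandwidths, and a latency model accurate to within 4% of on-board tests. The thesis's actual model additionally accounts for clock-cycle stalls and overheads; purely as an illustration of the bandwidth-matching idea, here is a roofline-style lower bound with hypothetical platform numbers (none taken from the thesis):

```python
def layer_latency_s(macs, bytes_moved, peak_macs_per_s, dram_bytes_per_s):
    """Roofline-style lower bound on one layer's latency: the layer is
    limited by compute throughput or off-chip bandwidth, whichever is slower."""
    compute_s = macs / peak_macs_per_s
    memory_s = bytes_moved / dram_bytes_per_s
    return max(compute_s, memory_s)

# Hypothetical accelerator: 1 TMAC/s of DSP throughput, 12 GB/s DRAM bandwidth.
# VGG-16 conv1_2 (~1.85e9 MACs, ~7 MB of weights and feature maps moved)
# comes out compute-bound under these assumptions.
t = layer_latency_s(1.85e9, 7e6, 1e12, 12e9)   # ~1.85 ms
```

A design is balanced when no layer sits far below the roofline: if `memory_s` dominates, on-chip buffering and data reuse must increase; if `compute_s` dominates, adding parallel compute units pays off — which is the exploration the abstract's unified memory model drives.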