Kulkarni, Adwaya Amey.
Programmable Manycore Accelerator for Machine Learning, Convolution Neural Network and Binary Neural Network.
Record type: Language material, manuscript : Monograph/item
Title/Author: Programmable Manycore Accelerator for Machine Learning, Convolution Neural Network and Binary Neural Network.
Author: Kulkarni, Adwaya Amey.
Extent: 1 online resource (112 pages)
Notes: Source: Masters Abstracts International, Volume: 57-04.
Contained by: Masters Abstracts International, 57-04(E).
Subject: Computer engineering.
Electronic resource: click for full text (PQDT)
ISBN: 9780355674163
Kulkarni, Adwaya Amey. Programmable Manycore Accelerator for Machine Learning, Convolution Neural Network and Binary Neural Network. - 1 online resource (112 pages)
Source: Masters Abstracts International, Volume: 57-04.
Thesis (M.S.)--University of Maryland, Baltimore County, 2017.
Includes bibliographical references
Lightweight Machine Learning (ML) and Convolutional Neural Networks (CNNs) can offer solutions for wearable cognitive devices and resource-constrained Internet of Things (IoT) platforms. However, the implementation of ML and CNN kernels is computationally intensive and faces memory-storage issues on tiny embedded platforms. In recent years, heterogeneous hardware acceleration, where compute-intensive tasks are performed on kernel-specific cores, has gained attention, with growing industry interest in tiny, lightweight manycore accelerators that address these issues. In this thesis, we propose two extended versions of an existing manycore architecture, "PENC: Power Efficient Nano Cluster," which efficiently implement common ML and CNN kernels with much-reduced computation and memory complexity. First, we propose "PACENet: Programmable many-core ACcElerator," which has CNN-specific instructions for frequently used kernels such as convolution, activation functions such as ReLU, and max-pooling (MP), as well as machine-learning-specific instructions for Manhattan distance calculation (MNT). Second, we propose "BiNMAC: Binarized Neural network Manycore ACcelerator," which implements binary neural networks. Reducing weights to a binary format not only relieves the memory-access bottleneck but also reduces computation, since most arithmetic operations are replaced with bit-wise operations. To add binarized-CNN capability, we implemented instructions such as batch XOR and XNOR, PCNT (population count), PCH (patch selection), and BCAST (a communication-based instruction) in the existing instruction-set hardware. Both PACENet and BiNMAC cores were fully synthesized, placed, and routed in TSMC 65 nm CMOS technology. Each single processing core of PACENet occupies 98.7 µm² and consumes 32.2 mW operating at 1 GHz and 1 V, while a single BiNMAC core occupies 97.9 µm² and consumes 31.1 mW.
Compared to the existing PENC manycore architecture, PACENet achieves a 13.3% area reduction and a 14.1% power reduction at 1 GHz, while BiNMAC achieves a 17.1% area reduction and a 13.2% power reduction. To conclude this work, we also evaluated the performance of the PACENet and BiNMAC accelerators on personalized biomedical applications, namely stress detection and seizure detection, and on a computer-vision application, namely object detection. The stress-detection and seizure-detection applications are evaluated on the ARL dataset and the Boston hospital dataset, respectively, using the k-nearest-neighbor algorithm. The proposed PACENet shows a 59% increase in throughput and a 43.7% reduction in energy consumption for stress detection, and a 60% increase in throughput and a 43.6% reduction in energy consumption for seizure detection, in comparison to the PENC manycore. For the computer-vision application, we evaluated a ResNet-20 network trained on the CIFAR-10 dataset on both the PACENet and BiNMAC accelerators. PACENet achieves 2.3x higher throughput per watt and consumes 57.3% less energy compared to the PENC manycore. For the SensorNet implementation, the proposed BiNMAC achieves 1.8x higher throughput and consumes 13x less energy than the PENC manycore, while its ResNet-20 implementation achieves 36x higher throughput while consuming 195x less energy.
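The binarized arithmetic the abstract describes (XNOR plus population count replacing multiply-accumulate once weights and activations are constrained to ±1) can be sketched as follows. This is an illustrative sketch of the general technique, not the thesis's BiNMAC hardware; the function name and bit encoding are assumptions.

```python
# Illustrative sketch of a binarized dot product via XNOR + popcount.
# With values constrained to {-1, +1} and encoded as bits (1 -> +1, 0 -> -1),
# the dot product of two n-bit vectors packed into integers a and b is
# 2 * popcount(~(a ^ b) & mask) - n: agreements minus disagreements.
def binarized_dot(a: int, b: int, n_bits: int) -> int:
    mask = (1 << n_bits) - 1
    xnor = ~(a ^ b) & mask          # 1 where bits agree (+1*+1 or -1*-1)
    matches = bin(xnor).count("1")  # population count (the PCNT idea)
    return 2 * matches - n_bits     # matches - mismatches

# Example: a = 0b1011 (+1,-1,+1,+1) and b = 0b1001 (+1,-1,-1,+1)
# agree in 3 positions and disagree in 1, so the dot product is 3 - 1 = 2.
print(binarized_dot(0b1011, 0b1001, 4))  # -> 2
```

A whole binarized convolution reduces to many such packed XNOR/popcount steps, which is why replacing full-precision weights with bits shrinks both memory traffic and arithmetic cost.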
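The k-nearest-neighbor evaluation mentioned above pairs naturally with the Manhattan-distance (MNT) instruction: classification is a distance sort plus a majority vote. The sketch below illustrates that kernel in general terms; the toy data points and function names are made up for illustration and are not from the thesis.

```python
# Illustrative k-NN classification with Manhattan (L1) distance, the kernel
# the abstract says PACENet accelerates with its MNT instruction.
def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def knn_predict(train, labels, query, k=3):
    # Sort training points by L1 distance to the query, vote among the top k.
    nearest = sorted(range(len(train)), key=lambda i: manhattan(train[i], query))[:k]
    votes = {}
    for i in nearest:
        votes[labels[i]] = votes.get(labels[i], 0) + 1
    return max(votes, key=votes.get)

# Toy usage: two clusters labelled "a" and "b".
train = [(0, 0), (1, 0), (10, 10), (9, 9), (0, 1)]
labels = ["a", "a", "b", "b", "a"]
print(knn_predict(train, labels, (0.5, 0.5)))  # -> a
```

On an accelerator, the per-point L1 distances are the dominant cost, which is why a dedicated Manhattan-distance instruction pays off for this workload.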
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018.
Mode of access: World Wide Web.
ISBN: 9780355674163
Subjects--Topical Terms: Computer engineering.
Index Terms--Genre/Form: Electronic books.
LDR  04679ntm a2200337Ki 4500
001  916837
005  20180928111502.5
006  m o u
007  cr mn||||a|a||
008  190606s2017 xx obm 000 0 eng d
020  $a 9780355674163
035  $a (MiAaPQ)AAI10683279
035  $a (MiAaPQ)umbc:11778
035  $a AAI10683279
040  $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1  $a Kulkarni, Adwaya Amey. $3 1190687
245 10 $a Programmable Manycore Accelerator for Machine Learning, Convolution Neural Network and Binary Neural Network.
264  0 $c 2017
300  $a 1 online resource (112 pages)
336  $a text $b txt $2 rdacontent
337  $a computer $b c $2 rdamedia
338  $a online resource $b cr $2 rdacarrier
500  $a Source: Masters Abstracts International, Volume: 57-04.
500  $a Adviser: Prof. Tinoosh Mohsenin.
502  $a Thesis (M.S.)--University of Maryland, Baltimore County, 2017.
504  $a Includes bibliographical references
520  $a Lightweight Machine Learning (ML) and Convolutional Neural Networks (CNNs) can offer solutions for wearable cognitive devices and resource-constrained Internet of Things (IoT) platforms. However, the implementation of ML and CNN kernels is computationally intensive and faces memory-storage issues on tiny embedded platforms. In recent years, heterogeneous hardware acceleration, where compute-intensive tasks are performed on kernel-specific cores, has gained attention, with growing industry interest in tiny, lightweight manycore accelerators that address these issues. In this thesis, we propose two extended versions of an existing manycore architecture, "PENC: Power Efficient Nano Cluster," which efficiently implement common ML and CNN kernels with much-reduced computation and memory complexity. First, we propose "PACENet: Programmable many-core ACcElerator," which has CNN-specific instructions for frequently used kernels such as convolution, activation functions such as ReLU, and max-pooling (MP), as well as machine-learning-specific instructions for Manhattan distance calculation (MNT). Second, we propose "BiNMAC: Binarized Neural network Manycore ACcelerator," which implements binary neural networks. Reducing weights to a binary format not only relieves the memory-access bottleneck but also reduces computation, since most arithmetic operations are replaced with bit-wise operations. To add binarized-CNN capability, we implemented instructions such as batch XOR and XNOR, PCNT (population count), PCH (patch selection), and BCAST (a communication-based instruction) in the existing instruction-set hardware. Both PACENet and BiNMAC cores were fully synthesized, placed, and routed in TSMC 65 nm CMOS technology. Each single processing core of PACENet occupies 98.7 µm² and consumes 32.2 mW operating at 1 GHz and 1 V, while a single BiNMAC core occupies 97.9 µm² and consumes 31.1 mW. Compared to the existing PENC manycore architecture, PACENet achieves a 13.3% area reduction and a 14.1% power reduction at 1 GHz, while BiNMAC achieves a 17.1% area reduction and a 13.2% power reduction. To conclude this work, we also evaluated the performance of the PACENet and BiNMAC accelerators on personalized biomedical applications, namely stress detection and seizure detection, and on a computer-vision application, namely object detection. The stress-detection and seizure-detection applications are evaluated on the ARL dataset and the Boston hospital dataset, respectively, using the k-nearest-neighbor algorithm. The proposed PACENet shows a 59% increase in throughput and a 43.7% reduction in energy consumption for stress detection, and a 60% increase in throughput and a 43.6% reduction in energy consumption for seizure detection, in comparison to the PENC manycore. For the computer-vision application, we evaluated a ResNet-20 network trained on the CIFAR-10 dataset on both the PACENet and BiNMAC accelerators. PACENet achieves 2.3x higher throughput per watt and consumes 57.3% less energy compared to the PENC manycore. For the SensorNet implementation, the proposed BiNMAC achieves 1.8x higher throughput and consumes 13x less energy than the PENC manycore, while its ResNet-20 implementation achieves 36x higher throughput while consuming 195x less energy.
533  $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538  $a Mode of access: World Wide Web
650  4 $a Computer engineering. $3 569006
650  4 $a Artificial intelligence. $3 559380
655  7 $a Electronic books. $2 local $3 554714
690  $a 0464
690  $a 0800
710 2  $a ProQuest Information and Learning Co. $3 1178819
710 2  $a University of Maryland, Baltimore County. $b Engineering, Computer. $3 1179338
773 0  $t Masters Abstracts International $g 57-04(E).
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10683279 $z click for full text (PQDT)