Video Understanding with Deep Networks.
Ng, Joe Yue-Hei.
Record type:
Language material, manuscript : Monograph/item
Title/Author:
Video Understanding with Deep Networks.
Author:
Ng, Joe Yue-Hei.
Description:
1 online resource (130 pages)
Notes:
Source: Dissertation Abstracts International, Volume: 79-12(E), Section: B.
Contained By:
Dissertation Abstracts International 79-12B(E).
Subject:
Computer science.
Electronic resource:
click for full text (PQDT)
ISBN:
9780438154162
LDR    03839ntm a2200385Ki 4500
001    916892
005    20180928111502.5
006    m o u
007    cr mn||||a|a||
008    190606s2018 xx obm 000 0 eng d
020    $a 9780438154162
035    $a (MiAaPQ)AAI10790412
035    $a (MiAaPQ)umd:18918
035    $a AAI10790412
040    $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1  $a Ng, Joe Yue-Hei. $3 1190756
245 10 $a Video Understanding with Deep Networks.
264  0 $c 2018
300    $a 1 online resource (130 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Dissertation Abstracts International, Volume: 79-12(E), Section: B.
500    $a Adviser: Larry S. Davis.
502    $a Thesis (Ph.D.)--University of Maryland, College Park, 2018.
504    $a Includes bibliographical references
520    $a Video understanding is one of the fundamental problems in computer vision. Videos add a temporal component to the image recognition task, through which motion and other information can additionally be used. Encouraged by the success of deep convolutional neural networks (CNNs) on image classification, we extend deep convolutional networks to video understanding by modeling both spatial and temporal information.
520    $a To effectively utilize deep networks, we need a comprehensive understanding of convolutional neural networks. We first study the networks in the domain of image retrieval. We show that, for instance-level image retrieval, lower layers often perform better than the last layers of convolutional neural networks. We present an approach that extracts convolutional features from different layers of the networks and adopts VLAD encoding to aggregate the features into a single vector for each image. Our work provides guidance for transferring deep convolutional networks to other tasks.
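The retrieval approach in the abstract above treats each spatial position of a convolutional feature map as a local descriptor and aggregates the descriptors with VLAD. Below is a minimal, generic NumPy sketch of that encoding step; the layer choice, codebook size, and normalization scheme are illustrative assumptions, not the dissertation's actual pipeline.

import numpy as np

def vlad_encode(descriptors, codebook):
    # descriptors: (N, D) local features from one image; here, each of the
    # H*W spatial positions of a conv feature map is one D-dim descriptor.
    # codebook: (K, D) visual words, e.g. learned by k-means over many images.
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    assign = dists.argmin(axis=1)          # nearest visual word per descriptor
    K, D = codebook.shape
    vlad = np.zeros((K, D))
    for k in range(K):
        members = descriptors[assign == k]
        if len(members):
            vlad[k] = (members - codebook[k]).sum(axis=0)  # residual sum
    vlad = vlad.ravel()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))           # signed square root
    return vlad / (np.linalg.norm(vlad) + 1e-12)           # L2 normalization

# Example: a (C, H, W) conv feature map becomes H*W C-dim local descriptors.
feat = np.random.randn(256, 7, 7)
descriptors = feat.reshape(256, -1).T      # (49, 256)
codebook = np.random.randn(16, 256)        # stand-in for a k-means codebook
print(vlad_encode(descriptors, codebook).shape)  # (4096,)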
520    $a We then propose and evaluate several deep neural network architectures that combine image information across a video over longer time periods than previously attempted. We propose two methods capable of handling full-length videos. The first explores various convolutional temporal feature pooling architectures, examining the design choices that need to be made when adapting a CNN for this task. The second explicitly models the video as an ordered sequence of frames, employing a recurrent neural network with Long Short-Term Memory (LSTM) cells connected to the output of the underlying CNN.
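The second method above connects an LSTM to per-frame CNN outputs. A minimal PyTorch sketch of that pattern follows; the tiny stand-in CNN, layer sizes, and classification from the last time step are assumptions for illustration, not the dissertation's architecture.

import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, num_classes=101):
        super().__init__()
        # Stand-in frame encoder; the thesis builds on a pretrained image CNN.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, video):                  # video: (B, T, 3, H, W)
        B, T = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1))  # encode every frame: (B*T, feat_dim)
        out, _ = self.lstm(feats.view(B, T, -1))  # sequence model over frames
        return self.head(out[:, -1])           # classify from the last time step

logits = CNNLSTMClassifier()(torch.randn(2, 8, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 101])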
520    $a Next, we propose ActionFlowNet, a multitask learning model that trains a single-stream network directly from raw pixels to jointly estimate optical flow while recognizing actions with convolutional neural networks, capturing both appearance and motion in a single model. Experiments show that our model effectively learns video representations from motion information on unlabeled videos.
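The multitask idea described above (one single-stream network over raw pixels with a flow-regression head and an action-classification head trained jointly) can be sketched as follows. The stand-in encoder, all shapes, and the loss weighting are illustrative assumptions, not ActionFlowNet's actual design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultitaskFlowAction(nn.Module):
    def __init__(self, num_classes=101):
        super().__init__()
        # Shared encoder over a pair of consecutive RGB frames (6 channels).
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.flow_head = nn.Conv2d(64, 2, 3, padding=1)  # per-pixel (dx, dy)
        self.action_head = nn.Linear(64, num_classes)    # global classifier

    def forward(self, frame_pair):             # frame_pair: (B, 6, H, W)
        h = self.encoder(frame_pair)
        flow = self.flow_head(h)               # (B, 2, H, W) flow estimate
        logits = self.action_head(h.mean(dim=(2, 3)))  # pooled features
        return flow, logits

model = MultitaskFlowAction()
flow, logits = model(torch.randn(4, 6, 64, 64))
# Joint objective: flow regression (here against dummy targets) plus action
# cross-entropy; the weight balancing the two tasks is a tunable assumption.
loss = F.mse_loss(flow, torch.zeros_like(flow)) \
     + 1.0 * F.cross_entropy(logits, torch.randint(0, 101, (4,)))
loss.backward()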
520    $a While recent deep models for videos show improvement by incorporating optical flow or aggregating high-level appearance across frames, they focus on modeling either long-term temporal relations or short-term motion. We propose Temporal Difference Networks (TDN), which model both long-term relations and short-term motion from videos. We leverage a simple but effective motion representation, the difference of CNN features, and jointly model motion at multiple scales in a single CNN.
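The motion representation named above, differences of CNN features taken at multiple scales of a single network, can be sketched as follows; the two-stage stand-in backbone and all shapes are illustrative assumptions rather than TDN's exact architecture.

import torch
import torch.nn as nn

class FeatureDifference(nn.Module):
    def __init__(self):
        super().__init__()
        # Two stages of a stand-in backbone give features at two scales.
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, frames):                 # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        x = frames.flatten(0, 1)               # fold time into the batch
        f1 = self.stage1(x)                    # (B*T, 32, H/2, W/2)
        f2 = self.stage2(f1)                   # (B*T, 64, H/4, W/4)
        diffs = []
        for f in (f1, f2):                     # one motion map per scale
            f = f.view(B, T, *f.shape[1:])
            diffs.append(f[:, 1:] - f[:, :-1]) # neighboring-frame differences
        return diffs

d1, d2 = FeatureDifference()(torch.randn(2, 5, 3, 64, 64))
print(d1.shape, d2.shape)  # (2, 4, 32, 32, 32) and (2, 4, 64, 16, 16)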
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538    $a Mode of access: World Wide Web
650  4 $a Computer science. $3 573171
650  4 $a Artificial intelligence. $3 559380
655  7 $a Electronic books. $2 local $3 554714
690    $a 0984
690    $a 0800
710 2  $a ProQuest Information and Learning Co. $3 1178819
710 2  $a University of Maryland, College Park. $b Computer Science. $3 1180862
773 0  $t Dissertation Abstracts International $g 79-12B(E).
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10790412 $z click for full text (PQDT)