National Formosa University (國立虎尾科技大學)

Automatic Video Captioning using Deep Neural Network.

Record Type:	Language materials, manuscript : Monograph/item
Title/Author:	Automatic Video Captioning using Deep Neural Network./
Author:	Nguyen, Thang Huy.
Description:	1 online resource (90 pages)
Notes:	Source: Masters Abstracts International, Volume: 56-06.
Contained By:	Masters Abstracts International 56-06(E).
Subject:	Computer engineering.
Online resource:	click for full text (PQDT)
ISBN:	9780355160499

Thesis (M.S.)--Rochester Institute of Technology, 2017.

Includes bibliographical references

Video understanding has become increasingly important as surveillance, social, and informational videos weave themselves into our everyday lives. Video captioning offers a simple way to summarize, index, and search the data. Most video captioning models use a video encoder and captioning decoder framework. Hierarchical encoders can abstractly capture clip-level temporal features to represent a video, but the clips are at fixed time steps. This thesis introduces two models: a hierarchical model with steered captioning and a Multi-stream Hierarchical Boundary model. The steered captioning model is the first to guide an attention model to appropriate locations in a video by using visual attributes. The Multi-stream Hierarchical Boundary model combines a fixed-hierarchy recurrent architecture with a soft hierarchy layer, using intrinsic feature boundary cuts within a video to define clips. This thesis also introduces a novel parametric Gaussian attention, which removes the restriction of soft attention techniques that require fixed-length video streams. By carefully incorporating Gaussian attention in designated layers, the proposed models demonstrate state-of-the-art video captioning results on recent datasets.
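The abstract does not give the formulation of the parametric Gaussian attention, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes attention weights come from a Gaussian over normalized time in [0, 1], so the same two parameters apply to a video of any length. The function name `gaussian_attention` and the parameters `mu` and `sigma` are hypothetical, chosen for illustration.

```python
import numpy as np


def gaussian_attention(features, mu, sigma):
    """Pool a (T, D) feature sequence with Gaussian weights over time.

    Hypothetical illustration: mu and sigma are expressed in normalized
    time units (0 = first frame, 1 = last frame), and sigma must be > 0.
    """
    features = np.asarray(features, dtype=float)
    num_steps = features.shape[0]
    # Normalized positions make (mu, sigma) independent of sequence length.
    positions = np.linspace(0.0, 1.0, num_steps)
    # Unnormalized log-weights of a Gaussian centered at mu.
    logits = -0.5 * ((positions - mu) / sigma) ** 2
    # Subtract the max before exponentiating for numerical stability.
    logits -= logits.max()
    weights = np.exp(logits)
    weights /= weights.sum()
    # Weighted sum over time gives a single D-dimensional context vector.
    context = weights @ features
    return weights, context


# The same parameters work for a 5-step and a 40-step sequence.
short_weights, short_context = gaussian_attention(np.ones((5, 3)), 0.5, 0.1)
long_weights, long_context = gaussian_attention(np.ones((40, 3)), 0.5, 0.1)
```

Because only two scalars parameterize the weights, a model could predict them per decoding step instead of one score per frame; whether the thesis does so is not stated in this record.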

Electronic reproduction.
Ann Arbor, Mich. :
ProQuest,
2018

Mode of access: World Wide Web

ISBN: 9780355160499
Subjects--Topical Terms:

569006
Computer engineering.
Index Terms--Genre/Form:

554714
Electronic books.

Automatic Video Captioning using Deep Neural Network.
LDR:02432ntm a2200325Ki 4500
001 918321
005 20181114145235.5
006 m o u
007 cr mn||||a|a||
008 190606s2017 xx obm 000 0 eng d
020 $a 9780355160499
035 $a (MiAaPQ)AAI10618993
035 $a (MiAaPQ)rit:12735
035 $a AAI10618993
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Nguyen, Thang Huy. $3 1192612
245 1 0 $a Automatic Video Captioning using Deep Neural Network.
264 0 $c 2017
300 $a 1 online resource (90 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Masters Abstracts International, Volume: 56-06.
500 $a Adviser: Raymond Ptucha.
502 $a Thesis (M.S.)--Rochester Institute of Technology, 2017.
504 $a Includes bibliographical references
520 $a Video understanding has become increasingly important as surveillance, social, and informational videos weave themselves into our everyday lives. Video captioning offers a simple way to summarize, index, and search the data. Most video captioning models use a video encoder and captioning decoder framework. Hierarchical encoders can abstractly capture clip-level temporal features to represent a video, but the clips are at fixed time steps. This thesis introduces two models: a hierarchical model with steered captioning and a Multi-stream Hierarchical Boundary model. The steered captioning model is the first to guide an attention model to appropriate locations in a video by using visual attributes. The Multi-stream Hierarchical Boundary model combines a fixed-hierarchy recurrent architecture with a soft hierarchy layer, using intrinsic feature boundary cuts within a video to define clips. This thesis also introduces a novel parametric Gaussian attention, which removes the restriction of soft attention techniques that require fixed-length video streams. By carefully incorporating Gaussian attention in designated layers, the proposed models demonstrate state-of-the-art video captioning results on recent datasets.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Computer engineering. $3 569006
655 7 $a Electronic books. $2 local $3 554714
690 $a 0464
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a Rochester Institute of Technology. $b Computer Engineering. $3 1184443
773 0 $t Masters Abstracts International $g 56-06(E).
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10618993 $z click for full text (PQDT)