Human Activity Analysis using Multi-modalities and Deep Learning.
Record type: Bibliographic - Language material, manuscript : Monograph/item
Title/Author: Human Activity Analysis using Multi-modalities and Deep Learning. / Zhang, Chenyang.
Author: Zhang, Chenyang.
Description: 1 online resource (115 pages)
Notes: Source: Dissertation Abstracts International, Volume: 78-04(E), Section: B.
Subject: Computer science.
Electronic resource: click for full text (PQDT)
ISBN: 9781369148558
Zhang, Chenyang.
Human Activity Analysis using Multi-modalities and Deep Learning. - 1 online resource (115 pages)
Source: Dissertation Abstracts International, Volume: 78-04(E), Section: B.
Thesis (Ph.D.)--The City College of New York, 2016.
Includes bibliographical references.
With the rapid development of video recording devices and sharing platforms, visual media has become a significant component of everyday life. To better organize and understand this tremendous amount of visual data, computer vision and machine learning have become the key technologies for addressing the problem at such a scale. Among the topics in computer vision research, human activity analysis is one of the most challenging and promising areas; it is dedicated to detecting, recognizing, and understanding the context and meaning of human activities in visual media. This dissertation focuses on two aspects of human activity analysis: 1) how to utilize a multi-modality approach, including depth sensors and traditional RGB cameras, for human action modeling; and 2) how to utilize more advanced machine learning technologies, such as deep learning and sparse coding, to address more sophisticated problems such as attribute learning and automatic video captioning.
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018.
Mode of access: World Wide Web.
ISBN: 9781369148558
Subjects--Topical Terms: Computer science.
Index Terms--Genre/Form: Electronic books.
LDR    03115ntm a2200325K 4500
001    915248
005    20180727125211.5
006    m o u
007    cr mn||||a|a||
008    190606s2016 xx obm 000 0 eng d
020    $a 9781369148558
035    $a (MiAaPQ)AAI10159927
035    $a (MiAaPQ)ccny.cuny:10107
035    $a AAI10159927
040    $a MiAaPQ $b eng $c MiAaPQ
100 1  $a Zhang, Chenyang. $3 1188551
245 10 $a Human Activity Analysis using Multi-modalities and Deep Learning.
264 0  $c 2016
300    $a 1 online resource (115 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Dissertation Abstracts International, Volume: 78-04(E), Section: B.
500    $a Adviser: Yingli Tian.
502    $a Thesis (Ph.D.)--The City College of New York, 2016.
504    $a Includes bibliographical references.
520    $a With the rapid development of video recording devices and sharing platforms, visual media has become a significant component of everyday life. To better organize and understand this tremendous amount of visual data, computer vision and machine learning have become the key technologies for addressing the problem at such a scale. Among the topics in computer vision research, human activity analysis is one of the most challenging and promising areas; it is dedicated to detecting, recognizing, and understanding the context and meaning of human activities in visual media. This dissertation focuses on two aspects of human activity analysis: 1) how to utilize a multi-modality approach, including depth sensors and traditional RGB cameras, for human action modeling; and 2) how to utilize more advanced machine learning technologies, such as deep learning and sparse coding, to address more sophisticated problems such as attribute learning and automatic video captioning.
520    $a To explore the utilization of depth cameras, we first present a depth camera-based image descriptor called the histogram of 3D facets (H3DF), its application to human action and hand gesture recognition, and a holistic depth video representation for human actions. To unify inputs from both depth cameras and RGB cameras, this dissertation first discusses a joint framework that models human affect from both facial expressions and body gestures via multi-modality fusion. Then we present deep learning-based frameworks for human attribute learning and automatic video captioning. Compared to human action detection and recognition, automatic video captioning is more challenging because it involves complex language models and visual context. Extensive experiments have been conducted on several public datasets to demonstrate that the frameworks proposed in this dissertation outperform state-of-the-art approaches in this research area.
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538    $a Mode of access: World Wide Web.
650 4  $a Computer science. $3 573171
655 7  $a Electronic books. $2 local $3 554714
690    $a 0984
710 2  $a ProQuest Information and Learning Co. $3 1178819
710 2  $a The City College of New York. $b Electrical Engineering. $3 1185904
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10159927 $z click for full text (PQDT)
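For readers unfamiliar with the MARC 21 layout above (a three-digit tag, optional one-character indicators, then subfields introduced by "$" codes), the short Python sketch below parses one line of this compact display into a structured form. It is an illustration only, not part of the catalog record; the parse_marc_line helper is hypothetical, it assumes the one-line-per-field layout used above, and a real workflow would use a dedicated MARC library such as pymarc.

import re

# Matches a subfield marker like "$a " or "$3 "; the code is one
# lowercase letter or digit, optionally followed by a space.
SUBFIELD = re.compile(r"\$([a-z0-9]) ?")

def parse_marc_line(line: str) -> dict:
    """Parse one line of the compact display above, e.g.
    '100 1  $a Zhang, Chenyang. $3 1188551'
    into its tag, indicators, and (code, value) subfield pairs.
    Illustrative only; a value containing a literal '$' would
    need real MARC tooling instead."""
    tag, rest = line[:3], line[3:].strip()
    if "$" not in rest:
        # LDR and control fields 001-008 carry a single value, no subfields.
        return {"tag": tag, "indicators": "", "subfields": [], "value": rest}
    indicators, _, body = rest.partition("$")
    parts = SUBFIELD.split("$" + body)[1:]  # alternating codes and values
    subfields = [(parts[i], parts[i + 1].strip()) for i in range(0, len(parts), 2)]
    return {"tag": tag, "indicators": indicators.strip(), "subfields": subfields, "value": ""}

if __name__ == "__main__":
    print(parse_marc_line("100 1  $a Zhang, Chenyang. $3 1188551"))
    # -> {'tag': '100', 'indicators': '1',
    #     'subfields': [('a', 'Zhang, Chenyang.'), ('3', '1188551')], 'value': ''}

The $3 subfields (e.g. 573171, 554714) are the catalog's internal authority-record links, which is why the same numbers appeared next to the subject and genre headings in the display above.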