Deep Learning-Based Human Action Understanding in Videos.
Record type:
Language material, manuscript : Monograph/item
Title/Author:
Deep Learning-Based Human Action Understanding in Videos.
Author:
Vahdani, Elahe.
Physical description:
1 online resource (158 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 85-08, Section: B.
Contained By:
Dissertations Abstracts International, 85-08B.
Subject:
Computer engineering.
Electronic resource:
click for full text (PQDT)
ISBN:
9798381681093
LDR    03373ntm a22003737 4500
001    1152026
005    20241125080214.5
006    m o d
007    cr mn ---uuuuu
008    250605s2024 xx obm 000 0 eng d
020    $a 9798381681093
035    $a (MiAaPQ)AAI30990374
035    $a AAI30990374
040    $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1  $a Vahdani, Elahe. $3 1478900
245 10 $a Deep Learning-Based Human Action Understanding in Videos.
264  0 $c 2024
300    $a 1 online resource (158 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Dissertations Abstracts International, Volume: 85-08, Section: B.
500    $a Advisor: Tian, Yingli.
502    $a Thesis (Ph.D.)--City University of New York, 2024.
504    $a Includes bibliographical references
520    $a
The understanding of human actions in videos holds immense potential for technological advancement and societal betterment. This thesis explores fundamental aspects of this field, including action recognition in trimmed clips and action localization in untrimmed videos. Trimmed videos contain only one action instance, with moments before or after the action excluded from the video. However, the majority of videos captured in unconstrained environments, often referred to as untrimmed videos, are naturally unsegmented. Untrimmed videos are typically lengthy and may encompass multiple action instances, along with the moments preceding or following each action, as well as transitions between actions. In the task of action recognition in trimmed clips, the primary objective is to classify action categories. In contrast, action detection in untrimmed videos aims to accurately identify the starting and ending moments of actions within untrimmed videos while also assigning the corresponding action labels. Action understanding in videos has significant implications across various sectors. It is invaluable in surveillance for identifying potential threats and in healthcare for monitoring patient movements. Importantly, it serves as an indispensable tool for interpreting sign language, facilitating communication with the deaf and hard-of-hearing community. This research presents innovative frameworks for video-based action recognition and detection. Annotating temporal boundaries and action labels for all action instances in untrimmed videos is a labor-intensive and expensive process. To mitigate the need for exhaustive annotations, this work introduces pioneering frameworks that rely on limited supervision. The proposed models demonstrate significant performance improvements over the current state-of-the-art on benchmark datasets. Furthermore, the applications of action understanding in sign language videos are explored by pioneering automated detection of signing errors. The effectiveness of the models is evaluated on the collected sign language datasets.
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2024
538    $a Mode of access: World Wide Web
650  4 $a Computer engineering. $3 569006
650  4 $a Computer science. $3 573171
653    $a Computer vision
653    $a Deep learning
653    $a Video understanding
655  7 $a Electronic books. $2 local $3 554714
690    $a 0984
690    $a 0464
690    $a 0800
710 2  $a City University of New York. $b Computer Science. $3 1184450
710 2  $a ProQuest Information and Learning Co. $3 1178819
773 0  $t Dissertations Abstracts International $g 85-08B.
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30990374 $z click for full text (PQDT)
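
The 520 abstract above draws a concrete distinction between action recognition (one label per trimmed clip) and temporal action detection (start/end times plus labels within an untrimmed video). The following minimal Python sketch, with entirely hypothetical names and toy logic not taken from the thesis, illustrates the difference in task interfaces by reducing detection to recognition over sliding windows:

    from dataclasses import dataclass

    @dataclass
    class ActionInstance:
        start: float   # seconds from the beginning of the untrimmed video
        end: float     # seconds; end of the localized action
        label: str     # action category
        score: float   # model confidence

    def recognize(clip: list[float]) -> tuple[str, float]:
        """Action recognition: a trimmed clip holds one action; return (label, score).
        Toy stand-in: 'wave' if mean motion energy is high, else 'background'."""
        energy = sum(clip) / len(clip)
        return ("wave", energy) if energy > 0.5 else ("background", 1.0 - energy)

    def detect(video: list[float], fps: float = 1.0, win: int = 4) -> list[ActionInstance]:
        """Temporal action detection: localize and label every action instance.
        Reduced here to recognition over sliding windows (no NMS, for brevity)."""
        found = []
        for i in range(len(video) - win + 1):
            label, score = recognize(video[i:i + win])
            if label != "background":
                found.append(ActionInstance(i / fps, (i + win) / fps, label, score))
        return found

    # Untrimmed toy "video": per-frame motion energy with one action in the middle.
    video = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1]
    print(detect(video))  # overlapping ActionInstance spans around the high-energy frames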
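
The tagged MARC lines above follow a fixed display layout: a three-character tag, an indicator area, then $-prefixed subfields. Here is a small, self-contained Python sketch that splits one such line into its parts; it assumes the human-readable display layout shown above (not binary ISO 2709 MARC), and the helper name is hypothetical:

    import re

    def parse_display_line(line: str) -> dict:
        """Split a tagged display line like '650  4 $a Computer engineering. $3 569006'
        into tag, indicator area, and (code, value) subfield pairs."""
        tag = line[:3]
        body = line[3:]
        first = body.find("$")  # indicators sit before the first '$'
        indicators = (body[:first] if first >= 0 else body).strip()
        subfields = re.findall(r"\$(\w)\s*([^$]*)", body)
        return {
            "tag": tag,
            "indicators": indicators,
            "subfields": [(code, value.strip()) for code, value in subfields],
        }

    print(parse_display_line("650  4 $a Computer engineering. $3 569006"))
    # -> {'tag': '650', 'indicators': '4',
    #     'subfields': [('a', 'Computer engineering.'), ('3', '569006')]}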