Towards Comprehensive Visual Understanding.
Record type:
Bibliographic - Language material, manuscript : Monograph/item
Title/Author:
Towards Comprehensive Visual Understanding. /
Author:
Jia, Menglin.
Physical description:
1 online resource (186 pages)
Note:
Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
Contained By:
Dissertations Abstracts International, 84-12B.
Subject:
Information science.
Electronic resource:
click for full text (PQDT)
ISBN:
9798379722371
LDR  02972ntm a22003977 4500
001  1144460
005  20240611104233.5
006  m o d
007  cr mn ---uuuuu
008  250605s2023 xx obm 000 0 eng d
020  $a 9798379722371
035  $a (MiAaPQ)AAI30248533
035  $a AAI30248533
040  $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1  $a Jia, Menglin. $3 1469500
245 10 $a Towards Comprehensive Visual Understanding.
264  0 $c 2023
300  $a 1 online resource (186 pages)
336  $a text $b txt $2 rdacontent
337  $a computer $b c $2 rdamedia
338  $a online resource $b cr $2 rdacarrier
500  $a Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
500  $a Advisor: Cardie, Claire.
502  $a Thesis (Ph.D.)--Cornell University, 2023.
504  $a Includes bibliographical references
520  $a An image is worth a thousand words, conveying information that goes beyond the visual content therein. Traditional computer vision tasks focus on the recognition of tangible properties of images, such as objects and scenes. Relatively little attention has been paid to tasks that involve private states where subjectivity analysis is relevant. This area includes detecting cyberbullying and hate speech, identifying emotions, and understanding rhetoric and intentions. This dissertation presents our work in exploring new challenges and approaches towards comprehensive visual understanding, with both subjectivity and objectivity in images in mind. Specifically, on the challenge side, we focus on a specific aspect of subjectivity: the intent behind social media images. We introduce an intent dataset, Intentonomy, annotated with 28 intent categories derived from a social psychology taxonomy. We then systematically study whether, and to what extent, commonly used visual information, i.e., object and context, contributes to human intent understanding. On the approach side, we present three approaches: (1) an intent classifier that attends to object and context classes in images as well as textual information in the form of hashtags; (2) a streamlined pre-training method that uses pseudo labels derived from human responses to social media posts; and (3) a parameter-efficient transfer learning method for adapting ever-increasing pre-trained vision models. We find our dataset to be very challenging for visual recognition systems and our approaches to be empirically effective on representative visual understanding tasks.
533  $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2024
538  $a Mode of access: World Wide Web
650  4 $a Information science. $3 561178
650  4 $a Computer science. $3 573171
653  $a Computer vision
653  $a Machine learning
653  $a Visual information
653  $a Social media posts
653  $a Textual information
655  7 $a Electronic books. $2 local $3 554714
690  $a 0984
690  $a 0800
690  $a 0723
710 2  $a Cornell University. $b Information Science. $3 1179518
710 2  $a ProQuest Information and Learning Co. $3 1178819
773 0  $t Dissertations Abstracts International $g 84-12B.
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30248533 $z click for full text (PQDT)