Learning Counterfactual Reasoning by Answering Counterfactual Questions from Videos.
Record type:
Bibliographic - Language material, manuscript : Monograph/item
Title/Author:
Learning Counterfactual Reasoning by Answering Counterfactual Questions from Videos./
Author:
Hu, Qingyuan.
Physical description:
1 online resource (48 pages)
Notes:
Source: Masters Abstracts International, Volume: 85-01.
Contained By:
Masters Abstracts International 85-01.
Subject:
Computer science.
Electronic resource:
click for full text (PQDT)
ISBN:
9798379951870
LDR    02917ntm a22003977 4500
001    1146242
005    20240812064350.5
006    m o d
007    cr bn ---uuuuu
008    250605s2023 xx obm 000 0 eng d
020    $a 9798379951870
035    $a (MiAaPQ)AAI30524075
035    $a AAI30524075
040    $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1  $a Hu, Qingyuan. $3 1471600
245 10 $a Learning Counterfactual Reasoning by Answering Counterfactual Questions from Videos.
264  0 $c 2023
300    $a 1 online resource (48 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Masters Abstracts International, Volume: 85-01.
500    $a Advisor: Peng, Nanyun.
502    $a Thesis (M.S.)--University of California, Los Angeles, 2023.
504    $a Includes bibliographical references
520    $a Multimodal counterfactual reasoning is a vital yet challenging ability for AI systems. It involves predicting the outcomes of hypothetical circumstances from vision and language inputs, which enables AI models to learn from failures and to explore hypothetical scenarios. Despite its importance, only a few datasets target the counterfactual reasoning abilities of multimodal models, and they cover only synthetic environments or specific event types (e.g., traffic collisions), making it hard to reliably benchmark model generalization across diverse real-world scenarios and reasoning dimensions. To overcome these limitations, we develop ACQUIRED, a video question answering dataset of 3.9K annotated videos that encompasses a wide range of event types and incorporates both first- and third-person viewpoints, ensuring a focus on real-world diversity. In addition, each video is annotated with questions spanning three distinct reasoning dimensions (physical, social, and temporal) so that models' counterfactual abilities can be evaluated comprehensively along multiple aspects. We benchmark several state-of-the-art language-only and multimodal models on our dataset, and the experimental results demonstrate a significant performance gap (> 13%) between models and humans. These findings suggest that multimodal counterfactual reasoning remains an open challenge and that ACQUIRED is a comprehensive and reliable benchmark for inspiring future research in this direction.
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2024
538    $a Mode of access: World Wide Web
650  4 $a Computer science. $3 573171
650  4 $a Information technology. $3 559429
653    $a Counterfactual reasoning
653    $a AI models
653    $a Language inputs
653    $a Performance gap
653    $a Real-world diversity
655  7 $a Electronic books. $2 local $3 554714
690    $a 0984
690    $a 0800
690    $a 0489
710 2  $a ProQuest Information and Learning Co. $3 1178819
710 2  $a University of California, Los Angeles. $b Computer Science 0201. $3 1182286
773 0  $t Masters Abstracts International $g 85-01.
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30524075 $z click for full text (PQDT)
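
For anyone who needs to consume this record programmatically, the following is a minimal sketch using the open-source pymarc library. The file name record.mrc is a hypothetical placeholder for a binary MARC (ISO 2709) export of the record above; the field and subfield access shown follows pymarc's standard get_fields/get_subfields API.

    from pymarc import MARCReader

    # Hypothetical input: a binary MARC export of the record shown above.
    with open("record.mrc", "rb") as fh:
        for record in MARCReader(fh):
            # Control fields such as 001 expose their raw data directly.
            print("Control number:", record["001"].data)
            # Data fields are addressed by tag, subfields by code.
            print("Title:", record["245"].get_subfields("a")[0])
            print("ISBN:", record["020"].get_subfields("a")[0])
            # Repeatable fields, e.g. the 653 index terms, come back as a list.
            for field in record.get_fields("653"):
                print("Index term:", field.get_subfields("a")[0])

Run against this record, the final loop would print the five 653 index terms: Counterfactual reasoning, AI models, Language inputs, Performance gap, and Real-world diversity.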