Grounded Compositional Concept Learning.
Record type:
Language material, manuscript : Monograph/item
Title/Author:
Grounded Compositional Concept Learning.
Author:
Xu, Guangyue.
Description:
1 online resource (103 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
Contained By:
Dissertations Abstracts International, 85-11B.
Subject:
Computer engineering.
Electronic resource:
click for full text (PQDT)
ISBN:
9798382726328
Grounded Compositional Concept Learning.
Xu, Guangyue.
Grounded Compositional Concept Learning. - 1 online resource (103 pages)
Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
Thesis (Ph.D.)--Michigan State University, 2024.
Includes bibliographical references
Humans learn concepts in a grounded and compositional manner. Such compositional and grounding abilities enable humans to understand an endless variety of scenarios and expressions. Although deep learning models have pushed performance to new limits on many Natural Language Processing and Computer Vision tasks, we still know little about how these models process compositional structures and whether they can accomplish human-like meaning composition. The goal of this thesis is to advance compositional generalization research in both the evaluation and the design of learning models. In this direction, we make the following contributions.

First, we introduce a transductive learning method that uses unlabeled data to learn the distribution of both seen and novel compositions. To tackle the grounding challenge, we use a cross-attention mechanism to align and ground the linguistic concepts in specific regions of the image. Unlike traditional training, we use episodic training, where each training item consists of one image and sampled positive and negative compositional labels. We select the image's compositional label by computing matching scores between the image and the candidate labels. Our empirical results show that combining episodic training and transductive learning does help compositional learning.

Second, we develop a new prompting technique for compositional learning that considers the interaction between element concepts. In our proposed technique, GIPCOL, we construct a textual input that contains rich compositional information when prompting the foundation vision-language model. We use CLIP as the pre-trained backbone vision-language model and improve its compositional zero-shot learning ability with our novel soft-prompting approach. GIPCOL freezes the majority of CLIP's parameters and learns only CLIP's word embedding layer through a graph neural network. By concatenating the learnable soft prompt and the updated word embeddings, GIPCOL achieves better results than other prompting-based methods.

Third, since retrieval plays a critical role in human learning, our work studies how retrieval can help compositional learning. We propose MetaReVision, a new retrieval-enhanced meta-learning model that addresses the visually grounded compositional concept learning problem. Given an image with a novel compositional concept, MetaReVision first uses a retrieval module to find relevant items in the training set. It then constructs an episode in which the retrieved items form the support set and the test item forms the query set. The retrieved support set mimics the primitive concept learning scenario, while the query set encourages compositional strategy learning through meta-learning's bi-level optimization objective. The experimental results show that such a retrieval-enhanced meta-learning framework helps the vision-language model's compositional learning. Moreover, we create two new benchmarks, CompCOCO and CompFlickr, for the evaluation of grounded compositional concept learning.

Finally, we evaluate large generative vision-and-language models on compositional zero-shot learning within the in-context learning framework. We highlight their shortcomings and propose retriever and ranker modules to improve their performance on this challenging problem. These two modules select the most informative in-context examples and present them in their most effective order to guide the backbone generative model. Our approach is novel in the context of grounded compositional learning, and our experimental results show improved performance compared with basic in-context learning.
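The GIPCOL idea described above (a frozen CLIP backbone, a graph neural network that updates the element-concept word embeddings, and a learnable soft prompt concatenated in front of them) can be illustrated with a minimal sketch. The snippet below is a hypothetical, self-contained PyTorch illustration, not the thesis code: the embedding size, the single-layer graph update, the stand-in text and image encoders, and all names (GipcolStylePrompt, gnn_weight, and so on) are assumptions made only for demonstration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GipcolStylePrompt(nn.Module):
    """Hypothetical sketch of GIPCOL-style soft prompting (illustration only)."""

    def __init__(self, num_concepts, embed_dim=512, prompt_len=3):
        super().__init__()
        # Trainable word embeddings for the element concepts (attributes/objects).
        self.concept_emb = nn.Embedding(num_concepts, embed_dim)
        # Learnable soft-prompt vectors prepended to every composition.
        self.soft_prompt = nn.Parameter(0.02 * torch.randn(prompt_len, embed_dim))
        # One graph-convolution-style layer that propagates information over the
        # concept graph (adjacency matrix supplied by the caller).
        self.gnn_weight = nn.Linear(embed_dim, embed_dim)

    def forward(self, adjacency, attr_ids, obj_ids):
        # Update every concept embedding from its neighbours in the concept graph.
        h = self.concept_emb.weight                     # (C, D)
        h = F.relu(self.gnn_weight(adjacency @ h))      # (C, D)
        attr = h[attr_ids]                              # (B, D)
        obj = h[obj_ids]                                # (B, D)
        # Prompt = [soft tokens; updated attribute embedding; updated object embedding]
        soft = self.soft_prompt.unsqueeze(0).expand(attr.size(0), -1, -1)
        return torch.cat([soft, attr.unsqueeze(1), obj.unsqueeze(1)], dim=1)  # (B, L+2, D)

# Toy usage: score image features against the composed prompts.
model = GipcolStylePrompt(num_concepts=10)
adjacency = torch.eye(10)                                # placeholder concept graph
prompts = model(adjacency, torch.tensor([0, 1]), torch.tensor([5, 6]))  # (2, 5, 512)
text_feat = F.normalize(prompts.mean(dim=1), dim=-1)     # stand-in for CLIP's frozen text encoder
image_feat = F.normalize(torch.randn(2, 512), dim=-1)    # stand-in for frozen image features
scores = image_feat @ text_feat.t()                      # cosine-similarity matching scores

In this sketch only concept_emb, soft_prompt, and gnn_weight would receive gradients; a real implementation would keep CLIP's image and text encoders frozen, as the abstract describes.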
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2024.
Mode of access: World Wide Web.
ISBN: 9798382726328
Subjects--Topical Terms: Computer engineering.
Subjects--Index Terms: Compositional learning
Index Terms--Genre/Form: Electronic books.
LDR    05100ntm a22003977 4500
001    1144979
005    20240617111732.5
006    m o d
007    cr mn ---uuuuu
008    250605s2024 xx obm 000 0 eng d
020    $a 9798382726328
035    $a (MiAaPQ)AAI30995878
035    $a AAI30995878
040    $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1  $a Xu, Guangyue. $3 1470174
245 10 $a Grounded Compositional Concept Learning.
264  0 $c 2024
300    $a 1 online resource (103 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
500    $a Advisor: Kordjamshidi, Parisa; Chai, Joyce Y.
502    $a Thesis (Ph.D.)--Michigan State University, 2024.
504    $a Includes bibliographical references
520    $a Humans learn concepts in a grounded and compositional manner. Such compositional and grounding abilities enable humans to understand an endless variety of scenarios and expressions. Although deep learning models have pushed performance to new limits on many Natural Language Processing and Computer Vision tasks, we still know little about how these models process compositional structures and whether they can accomplish human-like meaning composition. The goal of this thesis is to advance compositional generalization research in both the evaluation and the design of learning models. In this direction, we make the following contributions. First, we introduce a transductive learning method that uses unlabeled data to learn the distribution of both seen and novel compositions. To tackle the grounding challenge, we use a cross-attention mechanism to align and ground the linguistic concepts in specific regions of the image. Unlike traditional training, we use episodic training, where each training item consists of one image and sampled positive and negative compositional labels. We select the image's compositional label by computing matching scores between the image and the candidate labels. Our empirical results show that combining episodic training and transductive learning does help compositional learning. Second, we develop a new prompting technique for compositional learning that considers the interaction between element concepts. In our proposed technique, GIPCOL, we construct a textual input that contains rich compositional information when prompting the foundation vision-language model. We use CLIP as the pre-trained backbone vision-language model and improve its compositional zero-shot learning ability with our novel soft-prompting approach. GIPCOL freezes the majority of CLIP's parameters and learns only CLIP's word embedding layer through a graph neural network. By concatenating the learnable soft prompt and the updated word embeddings, GIPCOL achieves better results than other prompting-based methods. Third, since retrieval plays a critical role in human learning, our work studies how retrieval can help compositional learning. We propose MetaReVision, a new retrieval-enhanced meta-learning model that addresses the visually grounded compositional concept learning problem. Given an image with a novel compositional concept, MetaReVision first uses a retrieval module to find relevant items in the training set. It then constructs an episode in which the retrieved items form the support set and the test item forms the query set. The retrieved support set mimics the primitive concept learning scenario, while the query set encourages compositional strategy learning through meta-learning's bi-level optimization objective. The experimental results show that such a retrieval-enhanced meta-learning framework helps the vision-language model's compositional learning. Moreover, we create two new benchmarks, CompCOCO and CompFlickr, for the evaluation of grounded compositional concept learning. Finally, we evaluate large generative vision-and-language models on compositional zero-shot learning within the in-context learning framework. We highlight their shortcomings and propose retriever and ranker modules to improve their performance on this challenging problem. These two modules select the most informative in-context examples and present them in their most effective order to guide the backbone generative model. Our approach is novel in the context of grounded compositional learning, and our experimental results show improved performance compared with basic in-context learning.
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2024
538    $a Mode of access: World Wide Web
650  4 $a Computer engineering. $3 569006
650  4 $a Computer science. $3 573171
653    $a Compositional learning
653    $a In-context learning
653    $a Large vision-language model
653    $a Meta-learning
653    $a Prompt-based learning
653    $a Retrieval Augmented Generation
655  7 $a Electronic books. $2 local $3 554714
690    $a 0984
690    $a 0464
710 2  $a Michigan State University. $b Computer Science - Doctor of Philosophy. $3 1188138
710 2  $a ProQuest Information and Learning Co. $3 1178819
773 0  $t Dissertations Abstracts International $g 85-11B.
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30995878 $z click for full text (PQDT)