Robust Learning From Uncurated Data.
Record type:
Bibliographic - Language material, manuscript : Monograph/item
Title/Author:
Robust Learning From Uncurated Data.
Author:
Chuang, Ching-Yao.
Description:
1 online resource (221 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 85-10, Section: B.
Contained By:
Dissertations Abstracts International 85-10B.
Subject:
Computer science.
Electronic resource:
click for full text (PQDT)
ISBN:
9798381958966
LDR      03287ntm a22003977 4500
001      1146481
005      20240812064626.5
006      m o d
007      cr bn ---uuuuu
008      250605s2023 xx obm 000 0 eng d
020      $a 9798381958966
035      $a (MiAaPQ)AAI31091625
035      $a (MiAaPQ)MIT1721_1_152764
035      $a AAI31091625
040      $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1    $a Chuang, Ching-Yao. $3 1471874
245 1 0  $a Robust Learning From Uncurated Data.
264   0  $c 2023
300      $a 1 online resource (221 pages)
336      $a text $b txt $2 rdacontent
337      $a computer $b c $2 rdamedia
338      $a online resource $b cr $2 rdacarrier
500      $a Source: Dissertations Abstracts International, Volume: 85-10, Section: B.
500      $a Advisor: Jegelka, Stefanie; Torralba, Antonio.
502      $a Thesis (Ph.D.)--Massachusetts Institute of Technology, 2023.
504      $a Includes bibliographical references
520      $a The field of machine learning has witnessed a growing interest in learning from uncurated data, which involves training models from data that has not been carefully curated or labeled. However, this type of data is typically noisy, incomplete, and riddled with errors, making it challenging for machine learning algorithms to learn effectively. This thesis focuses on the development of robust learning methods that can effectively leverage uncurated data while being resilient to the inherent noise and errors in the data. Specifically, we investigate the robustness of contrastive learning, a prominent technique for self-supervised representation learning by contrasting semantically similar and dissimilar pairs of samples. Firstly, we delve into the fundamental challenge inherent in learning from unlabeled data. We find that eliminating false negatives and encouraging hard negatives notably enhance downstream performance and training efficiency. Subsequently, we shift our focus to the omnipresent noise within the dataset. We pay particular attention to the emergence of false positive pairs, a phenomenon particularly prevalent in multimodal contrastive learning settings. In the final segment of our study, we contemplate the efficient eradication of biases from large-scale models. It is observed that, when models are pretrained on biased, uncurated data, they frequently inherit numerous inappropriate biases, which consequentially lead to skewed predictions. In an effort to rectify this, we devise a debiasing algorithm that operates independently of any data or training requirements. Throughout the dissertation, the common thread tying these three components together is a robust and comprehensive approach to mitigating the unique error types associated with unlabeled, noisy, and biased data respectively, offering substantial contributions to the realm of machine learning research.
533      $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2024
538      $a Mode of access: World Wide Web
650   4  $a Computer science. $3 573171
650   4  $a Electrical engineering. $3 596380
653      $a Machine learning
653      $a Training models
653      $a Downstream performance
653      $a Debiasing algorithm
655   7  $a Electronic books. $2 local $3 554714
690      $a 0984
690      $a 0544
690      $a 0800
710 2    $a ProQuest Information and Learning Co. $3 1178819
710 2    $a Massachusetts Institute of Technology. $b Department of Electrical Engineering and Computer Science. $3 1467552
773 0    $t Dissertations Abstracts International $g 85-10B.
856 4 0  $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31091625 $z click for full text (PQDT)