Unsupervised Model Evaluation.
Record type: Bibliographic - language material, manuscript : Monograph/item
Title/Author: Unsupervised Model Evaluation.
Author: Deng, Weijian.
Description: 1 online resource (139 pages)
Notes: Source: Dissertations Abstracts International, Volume: 85-06, Section: B.
Contained By: Dissertations Abstracts International, 85-06B.
Subject: Statistics.
Electronic resource: click for full text (PQDT)
ISBN: 9798380866019
LDR    05905ntm a22004097 4500
001    1145355
005    20240618081826.5
006    m o d
007    cr mn ---uuuuu
008    250605s2023 xx obm 000 0 eng d
020    $a 9798380866019
035    $a (MiAaPQ)AAI30856502
035    $a (MiAaPQ)AustNatlU1885296827
035    $a AAI30856502
040    $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1  $a Deng, Weijian. $3 1470653
245 10 $a Unsupervised Model Evaluation.
264  0 $c 2023
300    $a 1 online resource (139 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Dissertations Abstracts International, Volume: 85-06, Section: B.
500    $a Advisor: Zheng, Liang; Gould, Stephen; Suh, Yumin.
502    $a Thesis (Ph.D.)--The Australian National University (Australia), 2023.
504    $a Includes bibliographical references
520    $a Understanding model decisions under novel test scenarios is central to machine learning. The standard textbook practice is to evaluate a model on a held-out test set that is fully labeled and drawn from the same distribution as the training set. However, this supervised manner of evaluation is often infeasible for real-world deployment, where test environments undergo distribution shifts and data annotations are not provided. Moreover, real-world machine learning deployments are often characterized by a discrepancy between the training and test distributions that can cause significant performance drops. Ignoring such potential model failures can lead to serious safety concerns. It is therefore important to develop new evaluation schemes for real-world scenarios where annotated data is unavailable.
In this thesis, we explore the answer to an interesting question: are labels always necessary for model evaluation? Motivated by this question, we investigate an important but under-explored problem called unsupervised model evaluation, where the goal is to estimate model generalization on various unlabeled out-of-distribution test sets. In particular, this thesis contributes to unsupervised model evaluation from four different aspects.
In Chapter 3, we examine how distribution shift affects model performance. We report a strong negative linear correlation between model performance and distribution shift. Based on this observation, we propose to predict model accuracy from dataset-level statistics and present two regression methods (linear regression and network regression) for accuracy estimation.
In Chapter 4, we propose to use self-supervision as a criterion for evaluating models. Specifically, we train supervised semantic image classification and self-supervised rotation prediction jointly in a multi-task manner. Across a series of datasets, we report an interesting finding: semantic classification accuracy exhibits a strong linear relationship with the performance of the rotation prediction task. This finding allows us to use linear regression to estimate classifier performance from rotation prediction accuracy, which can be measured on the test set using freely self-generated rotation labels.
In Chapter 5, we exploit the information contained in the prediction matrix for accuracy prediction. Unlike recent methods that use only prediction confidence, we further consider prediction dispersity. Confidence reflects whether an individual prediction is certain; dispersity indicates how the overall predictions are distributed across all categories. Our key insight is that a well-performing model should give predictions with both high confidence and high dispersity, so we consider both properties to make more accurate estimates. To this end, we use the nuclear norm, which has been shown to characterize both properties, and we show that it yields more accurate and stable accuracy estimates than existing methods.
In Chapter 6, from a model-centric perspective, we study the relationship between model generalization and invariance. The former characterizes how well a model performs when encountering in-distribution or out-of-distribution test data, while the latter captures whether the model gives consistent predictions when the input data is transformed. We introduce effective invariance (EI), a simple and reasonable measure of model invariance that does not require image labels. Using invariance scores computed by EI, we perform large-scale quantitative correlation studies between generalization and invariance. We observe that the generalization and invariance of different models exhibit a strong linear relationship on both in-distribution and out-of-distribution datasets. This finding allows us to assess and rank the performance of various models on a new dataset.
All in all, this thesis focuses on evaluating models on previously unseen distributions without annotations. This is an important but under-explored challenge with implications for increasing the reliability of machine learning models. We study the problem from four different perspectives and contribute simple, effective, and novel approaches. Our extensive analysis demonstrates that these studies are a promising step toward estimating the performance drop under distribution shift and lay the groundwork for future research on unsupervised model evaluation.
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2024
538    $a Mode of access: World Wide Web
650  4 $a Statistics. $3 556824
650  4 $a Computer science. $3 573171
653    $a Machine learning
653    $a Data annotations
653    $a Distribution shift
653    $a Self-supervised rotation prediction
653    $a Effective invariance
655  7 $a Electronic books. $2 local $3 554714
690    $a 0984
690    $a 0800
690    $a 0463
710 2  $a ProQuest Information and Learning Co. $3 1178819
710 2  $a The Australian National University (Australia). $3 1186624
773 0  $t Dissertations Abstracts International $g 85-06B.
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30856502 $z click for full text (PQDT)
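The abstract in the record above outlines a common recipe across Chapters 3-5: compute a label-free score on each test set (for Chapter 5, the nuclear norm of the softmax prediction matrix, which reflects both confidence and dispersity), calibrate a linear regressor on a few labeled shifted sets, and use it to estimate accuracy on unlabeled data. The sketch below is an illustrative reconstruction under those assumptions, not code from the thesis; the normalization of the nuclear norm and the use of scikit-learn's LinearRegression are choices made here for concreteness.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def nuclear_norm_score(softmax_probs: np.ndarray) -> float:
    """Label-free score for an N x K matrix of softmax outputs.

    The nuclear norm (sum of singular values) is large when predictions
    are both confident (rows close to one-hot) and dispersed across the
    K classes. Dividing by sqrt(N * min(N, K)) roughly bounds the score
    by 1; this exact scaling is an illustrative choice, not the thesis's.
    """
    n, k = softmax_probs.shape
    return float(np.linalg.norm(softmax_probs, ord="nuc") / np.sqrt(n * min(n, k)))

def fit_accuracy_regressor(scores, accuracies) -> LinearRegression:
    """Fit accuracy ~ score on a handful of labeled, distribution-shifted sets."""
    X = np.asarray(scores, dtype=float).reshape(-1, 1)
    y = np.asarray(accuracies, dtype=float)
    return LinearRegression().fit(X, y)

# Hypothetical usage: probs_per_set holds N x K softmax matrices produced by
# the same classifier on several labeled shifted sets with known accuracies.
# calib_scores = [nuclear_norm_score(p) for p in probs_per_set]
# reg = fit_accuracy_regressor(calib_scores, calib_accuracies)
# est_acc = reg.predict([[nuclear_norm_score(target_probs)]])[0]
```

Chapter 6's effective invariance (EI) is described in the abstract only at a high level: a label-free measure of whether a model's prediction stays consistent, and confident, when the input is transformed. The snippet below is one plausible simplified reading of that idea (agreement-gated confidence), not necessarily the exact definition used in the thesis; the choice of transformation (e.g., rotation) is left to the caller.

```python
import numpy as np

def effective_invariance(probs_orig: np.ndarray, probs_trans: np.ndarray) -> float:
    """Simplified EI-style score for one sample; no labels required.

    probs_orig / probs_trans: softmax vectors for the original and the
    transformed input (e.g., a rotated image). The score is nonzero only
    when the two predicted classes agree, and grows with the confidence
    assigned to that class; the exact form here is an assumption.
    """
    y_orig = int(np.argmax(probs_orig))
    y_trans = int(np.argmax(probs_trans))
    if y_orig != y_trans:
        return 0.0
    return float(np.sqrt(probs_orig[y_orig] * probs_trans[y_orig]))

# Averaging this score over an unlabeled test set gives a model-level
# invariance estimate, which the abstract reports correlates linearly
# with generalization across models.
```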