Scaling Human Feedback.
Record type: Language material, manuscript : Monograph/item
Title / Author: Scaling Human Feedback.
Author: Candidate, Minae Kwon.
Extent: 1 online resource (120 pages)
Note: Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
Contained by: Dissertations Abstracts International, 85-11B.
Subject: Robotics.
Electronic resource: click for full text (PQDT)
ISBN: 9798382642543
Candidate, Minae Kwon.
Scaling Human Feedback. - 1 online resource (120 pages)
Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
Thesis (Ph.D.)--Stanford University, 2023.
Includes bibliographical references
Human-generated data has been pivotal for significant advancements in artificial intelligence (AI). As AI models scale and are applied to a wider range of tasks, the demand for more and increasingly specialized human data will grow. However, current methods of acquiring human feedback, such as learning from demonstrations or preferences, and designing objective functions or prompts, are becoming unsustainable due to their high cost and the extensive effort or domain knowledge they require from users. We address this challenge by developing algorithms that reduce the cost and effort of providing human feedback. We leverage foundation models to aid users in offering feedback. Users initially define their objectives (through language or a small dataset), and foundation models expand this into more detailed feedback. A key contribution is an algorithm, based on a large language model, that allows users to cheaply define their objectives and train a reinforcement learning agent without needing to develop a complex reward function or provide extensive data. For situations where initial objectives are poorly defined or biased, we introduce an algorithm that efficiently queries humans for more information, reducing the number of queries needed. Finally, we propose an information-gathering algorithm that eliminates the requirement for human intervention altogether, streamlining the feedback process. By making it cheaper for users to give feedback, either during training or when queried for more information, we hope to make learning from human feedback more scalable.
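The abstract describes a pipeline in which a user states an objective in natural language and a foundation model expands it into reward feedback for a reinforcement learning agent. As a loose, hypothetical illustration only (the record gives no implementation details; query_llm, the prompt wording, and the sample objective below are invented placeholders, not the dissertation's code), one common way to wire such a setup is to query a large language model as a proxy reward function after each episode:

# Hypothetical sketch: an LLM used as a proxy reward signal for RL training.
# The user's objective is stated once in natural language; after each episode
# the LLM judges a text summary of the agent's behaviour, and its yes/no answer
# is mapped to a scalar reward.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to any large language model API; replace with a
    real completion call. This stub always answers 'no' so the file runs."""
    return "no"

def llm_reward(objective: str, episode_summary: str) -> float:
    """Return 1.0 if the LLM judges the behaviour to satisfy the objective."""
    prompt = (
        f"Objective: {objective}\n"
        f"Agent behaviour: {episode_summary}\n"
        "Did the agent satisfy the objective? Answer yes or no."
    )
    answer = query_llm(prompt).strip().lower()
    return 1.0 if answer.startswith("yes") else 0.0

if __name__ == "__main__":
    objective = "Negotiate a price below $50 while staying polite."
    episode = "The agent offered $45, thanked the seller, and closed the deal."
    print(llm_reward(objective, episode))  # scalar reward fed to any RL update rule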
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2024.
Mode of access: World Wide Web
ISBN: 9798382642543
Subjects--Topical Terms: Robotics.
Index Terms--Genre/Form: Electronic books.
LDR    02778ntm a22003257 4500
001    1146470
005    20240812064624.5
006    m o d
007    cr bn ---uuuuu
008    250605s2023 xx obm 000 0 eng d
020    $a 9798382642543
035    $a (MiAaPQ)AAI31049679
035    $a (MiAaPQ)STANFORDsy876pv8068
035    $a AAI31049679
040    $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1  $a Candidate, Minae Kwon. $3 1471859
245 10 $a Scaling Human Feedback.
264  0 $c 2023
300    $a 1 online resource (120 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
500    $a Advisor: Goodman, Noah; Yang, Diyi; Sadigh, Dorsa.
502    $a Thesis (Ph.D.)--Stanford University, 2023.
504    $a Includes bibliographical references
520    $a Human-generated data has been pivotal for significant advancements in artificial intelligence (AI). As AI models scale and are applied to a wider range of tasks, the demand for more and increasingly specialized human data will grow. However, current methods of acquiring human feedback, such as learning from demonstrations or preferences, and designing objective functions or prompts, are becoming unsustainable due to their high cost and the extensive effort or domain knowledge they require from users. We address this challenge by developing algorithms that reduce the cost and effort of providing human feedback. We leverage foundation models to aid users in offering feedback. Users initially define their objectives (through language or a small dataset), and foundation models expand this into more detailed feedback. A key contribution is an algorithm, based on a large language model, that allows users to cheaply define their objectives and train a reinforcement learning agent without needing to develop a complex reward function or provide extensive data. For situations where initial objectives are poorly defined or biased, we introduce an algorithm that efficiently queries humans for more information, reducing the number of queries needed. Finally, we propose an information-gathering algorithm that eliminates the requirement for human intervention altogether, streamlining the feedback process. By making it cheaper for users to give feedback, either during training or when queried for more information, we hope to make learning from human feedback more scalable.
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2024
538    $a Mode of access: World Wide Web
650  4 $a Robotics. $3 561941
650  4 $a Games. $3 595446
650  4 $a Negotiations. $3 1471105
650  4 $a Robots. $3 654842
655  7 $a Electronic books. $2 local $3 554714
690    $a 0771
710 2  $a Stanford University. $3 1184533
710 2  $a ProQuest Information and Learning Co. $3 1178819
773 0  $t Dissertations Abstracts International $g 85-11B.
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31049679 $z click for full text (PQDT)