Scaling Human Feedback.
Record type: Language material, manuscript : Monograph/item
Title / Author: Scaling Human Feedback.
Author: Candidate, Minae Kwon.
Extent: 1 online resource (120 pages)
Note: Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
Contained by: Dissertations Abstracts International, 85-11B.
Subject: Robotics.
Electronic resource: click for full text (PQDT)
ISBN: 9798382642543
Candidate, Minae Kwon.
Scaling Human Feedback. - 1 online resource (120 pages)
Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
Thesis (Ph.D.)--Stanford University, 2023.
Includes bibliographical references
Human-generated data has been pivotal for significant advancements in artificial intelligence (AI). As AI models scale and are applied to a wider range of tasks, the demand for more and increasingly specialized human data will grow. However, current methods of acquiring human feedback, such as learning from demonstrations or preferences, and designing objective functions or prompts, are becoming unsustainable due to their high cost and the extensive effort or domain knowledge they require from users. We address this challenge by developing algorithms that reduce the cost and effort of providing human feedback. We leverage foundation models to aid users in offering feedback. Users initially define their objectives (through language or a small dataset), and foundation models expand this into more detailed feedback. A key contribution is an algorithm, based on a large language model, that allows users to cheaply define their objectives and train a reinforcement learning agent without needing to develop a complex reward function or provide extensive data. For situations where initial objectives are poorly defined or biased, we introduce an algorithm that efficiently queries humans for more information, reducing the number of queries needed. Finally, we propose an information-gathering algorithm that eliminates the requirement for human intervention altogether, streamlining the feedback process. By making it cheaper for users to give feedback, either during training or when queried for more information, we hope to make learning from human feedback more scalable.
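The abstract describes a pipeline in which a user states an objective in natural language and a foundation model expands it into reward feedback for a reinforcement learning agent. As a loose, hypothetical illustration only (the record gives no implementation details; query_llm, the prompt wording, and the sample objective below are invented placeholders, not the dissertation's code), one common way to wire such a setup is to query a large language model as a proxy reward function after each episode:

# Hypothetical sketch: an LLM used as a proxy reward signal for RL training.
# The user's objective is stated once in natural language; after each episode
# the LLM judges a text summary of the agent's behaviour, and its yes/no answer
# is mapped to a scalar reward.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to any large language model API; replace with a
    real completion call. This stub always answers 'no' so the file runs."""
    return "no"

def llm_reward(objective: str, episode_summary: str) -> float:
    """Return 1.0 if the LLM judges the behaviour to satisfy the objective."""
    prompt = (
        f"Objective: {objective}\n"
        f"Agent behaviour: {episode_summary}\n"
        "Did the agent satisfy the objective? Answer yes or no."
    )
    answer = query_llm(prompt).strip().lower()
    return 1.0 if answer.startswith("yes") else 0.0

if __name__ == "__main__":
    objective = "Negotiate a price below $50 while staying polite."
    episode = "The agent offered $45, thanked the seller, and closed the deal."
    print(llm_reward(objective, episode))  # scalar reward fed to any RL update rule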
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2024.
Mode of access: World Wide Web
ISBN: 9798382642543
Subjects--Topical Terms: Robotics.
Index Terms--Genre/Form: Electronic books.
LDR    02778ntm a22003257 4500
001    1146470
005    20240812064624.5
006    m o d
007    cr bn ---uuuuu
008    250605s2023 xx obm 000 0 eng d
020    $a 9798382642543
035    $a (MiAaPQ)AAI31049679
035    $a (MiAaPQ)STANFORDsy876pv8068
035    $a AAI31049679
040    $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1  $a Candidate, Minae Kwon. $3 1471859
245 10 $a Scaling Human Feedback.
264  0 $c 2023
300    $a 1 online resource (120 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
500    $a Advisor: Goodman, Noah; Yang, Diyi; Sadigh, Dorsa.
502    $a Thesis (Ph.D.)--Stanford University, 2023.
504    $a Includes bibliographical references
520    $a Human-generated data has been pivotal for significant advancements in artificial intelligence (AI). As AI models scale and are applied to a wider range of tasks, the demand for more and increasingly specialized human data will grow. However, current methods of acquiring human feedback, such as learning from demonstrations or preferences, and designing objective functions or prompts, are becoming unsustainable due to their high cost and the extensive effort or domain knowledge they require from users. We address this challenge by developing algorithms that reduce the cost and effort of providing human feedback. We leverage foundation models to aid users in offering feedback. Users initially define their objectives (through language or a small dataset), and foundation models expand this into more detailed feedback. A key contribution is an algorithm, based on a large language model, that allows users to cheaply define their objectives and train a reinforcement learning agent without needing to develop a complex reward function or provide extensive data. For situations where initial objectives are poorly defined or biased, we introduce an algorithm that efficiently queries humans for more information, reducing the number of queries needed. Finally, we propose an information-gathering algorithm that eliminates the requirement for human intervention altogether, streamlining the feedback process. By making it cheaper for users to give feedback, either during training or when queried for more information, we hope to make learning from human feedback more scalable.
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2024
538    $a Mode of access: World Wide Web
650  4 $a Robotics. $3 561941
650  4 $a Games. $3 595446
650  4 $a Negotiations. $3 1471105
650  4 $a Robots. $3 654842
655  7 $a Electronic books. $2 local $3 554714
690    $a 0771
710 2  $a Stanford University. $3 1184533
710 2  $a ProQuest Information and Learning Co. $3 1178819
773 0  $t Dissertations Abstracts International $g 85-11B.
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31049679 $z click for full text (PQDT)