Schulman, John.
Optimizing Expectations : From Deep Reinforcement Learning to Stochastic Computation Graphs.
Record type:
Bibliographic - Language material, manuscript : Monograph/item
Title/Author:
Optimizing Expectations :
Other title:
From Deep Reinforcement Learning to Stochastic Computation Graphs.
Author:
Schulman, John.
Physical description:
1 online resource (103 pages)
Notes:
Source: Dissertation Abstracts International, Volume: 78-10(E), Section: B.
Subject:
Computer science.
Electronic resource:
click for full text (PQDT)
ISBN:
9781369842777
LDR    02915ntm a2200325K 4500
001    915275
005    20180727125212.5
006    m o u
007    cr mn||||a|a||
008    190606s2016 xx obm 000 0 eng d
020    $a 9781369842777
035    $a (MiAaPQ)AAI10252040
035    $a (MiAaPQ)berkeley:16768
035    $a AAI10252040
040    $a MiAaPQ $b eng $c MiAaPQ
100 1  $a Schulman, John. $3 1188584
245 10 $a Optimizing Expectations : $b From Deep Reinforcement Learning to Stochastic Computation Graphs.
264  0 $c 2016
300    $a 1 online resource (103 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Dissertation Abstracts International, Volume: 78-10(E), Section: B.
500    $a Adviser: Pieter Abbeel.
502    $a Thesis (Ph.D.)--University of California, Berkeley, 2016.
504    $a Includes bibliographical references
520    $a This thesis is mostly focused on reinforcement learning, which is viewed as an optimization problem: maximize the expected total reward with respect to the parameters of the policy. The first part of the thesis is concerned with making policy gradient methods more sample-efficient and reliable, especially when used with expressive nonlinear function approximators such as neural networks. Chapter 3 considers how to ensure that policy updates lead to monotonic improvement, and how to optimally update a policy given a batch of sampled trajectories. After providing a theoretical analysis, we propose a practical method called trust region policy optimization (TRPO), which performs well on two challenging tasks: simulated robotic locomotion, and playing Atari games using screen images as input. Chapter 4 looks at improving sample complexity of policy gradient methods in a way that is complementary to TRPO: reducing the variance of policy gradient estimates using a state-value function. Using this method, we obtain state-of-the-art results for learning locomotion controllers for simulated 3D robots.
520    $a Reinforcement learning can be viewed as a special case of optimizing an expectation, and similar optimization problems arise in other areas of machine learning; for example, in variational inference, and when using architectures that include mechanisms for memory and attention. Chapter 5 provides a unifying view of these problems, with a general calculus for obtaining gradient estimators of objectives that involve a mixture of sampled random variables and differentiable operations. This unifying view motivates applying algorithms from reinforcement learning to other prediction and probabilistic modeling problems.
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538    $a Mode of access: World Wide Web
650  4 $a Computer science. $3 573171
655  7 $a Electronic books. $2 local $3 554714
690    $a 0984
710 2  $a ProQuest Information and Learning Co. $3 1178819
710 2  $a University of California, Berkeley. $b Computer Science. $3 1179511
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10252040 $z click for full text (PQDT)
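The first abstract (520) field above frames reinforcement learning as maximizing the expected total reward with respect to the policy parameters, and describes reducing the variance of policy gradient estimates with a state-value function. Below is a minimal illustrative sketch of that general idea (a REINFORCE-style update with a learned state-value baseline); the toy chain environment, tabular softmax policy, and step sizes are assumptions for illustration, not the TRPO or variance-reduction methods developed in the thesis.

    # Sketch: policy gradient ascent on E[total reward], with a state-value
    # baseline for variance reduction (toy setup, assumed for illustration).
    import numpy as np

    N_STATES, N_ACTIONS, HORIZON = 5, 2, 10
    rng = np.random.default_rng(0)

    theta = np.zeros((N_STATES, N_ACTIONS))   # softmax policy parameters
    v = np.zeros(N_STATES)                    # state-value baseline

    def policy(s):
        logits = theta[s]
        p = np.exp(logits - logits.max())
        return p / p.sum()

    def step(s, a):
        # Toy chain dynamics: action 1 moves right, action 0 moves left;
        # reward 1 for reaching the rightmost state.
        s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
        return s_next, float(s_next == N_STATES - 1)

    for _ in range(500):
        # Sample one trajectory with the current policy.
        s, traj = 0, []
        for _ in range(HORIZON):
            a = rng.choice(N_ACTIONS, p=policy(s))
            s_next, r = step(s, a)
            traj.append((s, a, r))
            s = s_next
        # Undiscounted returns-to-go for each visited step.
        G, returns = 0.0, []
        for _, _, r in reversed(traj):
            G += r
            returns.append(G)
        returns.reverse()
        # Advantage = return - baseline; update baseline and policy.
        for (s, a, _), G in zip(traj, returns):
            adv = G - v[s]
            v[s] += 0.1 * adv                   # move baseline toward observed return
            grad_log = -policy(s)
            grad_log[a] += 1.0                  # gradient of log softmax pi(a|s) wrt theta[s]
            theta[s] += 0.01 * adv * grad_log   # REINFORCE update with baseline

    print(np.round([policy(s) for s in range(N_STATES)], 2))

In the thesis itself, the plain gradient step is replaced by the trust-region update of TRPO, which constrains how far each update can move the policy so that improvement is more reliable; the sketch keeps only the vanilla gradient form.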
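The second abstract (520) field describes a general calculus for differentiating objectives that mix sampled random variables with differentiable operations (stochastic computation graphs). The short check below illustrates the score-function (likelihood-ratio) identity that such estimators build on; the Gaussian sampling distribution and the objective f(x) = x**2 are assumptions chosen so the Monte Carlo estimate can be compared against a known analytic gradient.

    # Sketch: score-function gradient identity,
    #   d/dtheta E_{x ~ p_theta}[f(x)] = E[ f(x) * d/dtheta log p_theta(x) ],
    # checked on an assumed toy case: p_theta = Normal(theta, 1), f(x) = x**2,
    # where the analytic gradient of E[x^2] = theta^2 + 1 is 2*theta.
    import numpy as np

    rng = np.random.default_rng(0)
    theta = 1.5

    x = rng.normal(loc=theta, scale=1.0, size=200_000)  # samples through the stochastic node
    f = x ** 2                                           # downstream objective
    score = x - theta                                    # d/dtheta log N(x; theta, 1)

    mc_grad = np.mean(f * score)   # Monte Carlo score-function estimate
    print(mc_grad, 2.0 * theta)    # estimate vs. analytic value 2*theta

The high variance of this kind of estimator is what motivates the baselines and value functions discussed in the first part of the abstract.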