Schulman, John.
Optimizing Expectations : From Deep Reinforcement Learning to Stochastic Computation Graphs.
Record type:
Bibliographic - Language material, manuscript : Monograph/item
Title/Author:
Optimizing Expectations :
Other title:
From Deep Reinforcement Learning to Stochastic Computation Graphs.
Author:
Schulman, John.
Physical description:
1 online resource (103 pages)
Notes:
Source: Dissertation Abstracts International, Volume: 78-10(E), Section: B.
Subject:
Computer science.
Electronic resource:
click for full text (PQDT)
ISBN:
9781369842777
LDR    02915ntm a2200325K 4500
001    915275
005    20180727125212.5
006    m o u
007    cr mn||||a|a||
008    190606s2016 xx obm 000 0 eng d
020    $a 9781369842777
035    $a (MiAaPQ)AAI10252040
035    $a (MiAaPQ)berkeley:16768
035    $a AAI10252040
040    $a MiAaPQ $b eng $c MiAaPQ
100 1  $a Schulman, John. $3 1188584
245 10 $a Optimizing Expectations : $b From Deep Reinforcement Learning to Stochastic Computation Graphs.
264  0 $c 2016
300    $a 1 online resource (103 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Dissertation Abstracts International, Volume: 78-10(E), Section: B.
500    $a Adviser: Pieter Abbeel.
502    $a Thesis (Ph.D.)--University of California, Berkeley, 2016.
504    $a Includes bibliographical references
520    $a This thesis is mostly focused on reinforcement learning, which is viewed as an optimization problem: maximize the expected total reward with respect to the parameters of the policy. The first part of the thesis is concerned with making policy gradient methods more sample-efficient and reliable, especially when used with expressive nonlinear function approximators such as neural networks. Chapter 3 considers how to ensure that policy updates lead to monotonic improvement, and how to optimally update a policy given a batch of sampled trajectories. After providing a theoretical analysis, we propose a practical method called trust region policy optimization (TRPO), which performs well on two challenging tasks: simulated robotic locomotion, and playing Atari games using screen images as input. Chapter 4 looks at improving sample complexity of policy gradient methods in a way that is complementary to TRPO: reducing the variance of policy gradient estimates using a state-value function. Using this method, we obtain state-of-the-art results for learning locomotion controllers for simulated 3D robots.
520    $a Reinforcement learning can be viewed as a special case of optimizing an expectation, and similar optimization problems arise in other areas of machine learning; for example, in variational inference, and when using architectures that include mechanisms for memory and attention. Chapter 5 provides a unifying view of these problems, with a general calculus for obtaining gradient estimators of objectives that involve a mixture of sampled random variables and differentiable operations. This unifying view motivates applying algorithms from reinforcement learning to other prediction and probabilistic modeling problems.
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538    $a Mode of access: World Wide Web
650  4 $a Computer science. $3 573171
655  7 $a Electronic books. $2 local $3 554714
690    $a 0984
710 2  $a ProQuest Information and Learning Co. $3 1178819
710 2  $a University of California, Berkeley. $b Computer Science. $3 1179511
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10252040 $z click for full text (PQDT)
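The first abstract (520) field above frames reinforcement learning as maximizing the expected total reward with respect to the policy parameters, and describes reducing the variance of policy gradient estimates with a state-value function. Below is a minimal illustrative sketch of that general idea (a REINFORCE-style update with a learned state-value baseline); the toy chain environment, tabular softmax policy, and step sizes are assumptions for illustration, not the TRPO or variance-reduction methods developed in the thesis.

    # Sketch: policy gradient ascent on E[total reward], with a state-value
    # baseline for variance reduction (toy setup, assumed for illustration).
    import numpy as np

    N_STATES, N_ACTIONS, HORIZON = 5, 2, 10
    rng = np.random.default_rng(0)

    theta = np.zeros((N_STATES, N_ACTIONS))   # softmax policy parameters
    v = np.zeros(N_STATES)                    # state-value baseline

    def policy(s):
        logits = theta[s]
        p = np.exp(logits - logits.max())
        return p / p.sum()

    def step(s, a):
        # Toy chain dynamics: action 1 moves right, action 0 moves left;
        # reward 1 for reaching the rightmost state.
        s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
        return s_next, float(s_next == N_STATES - 1)

    for _ in range(500):
        # Sample one trajectory with the current policy.
        s, traj = 0, []
        for _ in range(HORIZON):
            a = rng.choice(N_ACTIONS, p=policy(s))
            s_next, r = step(s, a)
            traj.append((s, a, r))
            s = s_next
        # Undiscounted returns-to-go for each visited step.
        G, returns = 0.0, []
        for _, _, r in reversed(traj):
            G += r
            returns.append(G)
        returns.reverse()
        # Advantage = return - baseline; update baseline and policy.
        for (s, a, _), G in zip(traj, returns):
            adv = G - v[s]
            v[s] += 0.1 * adv                   # move baseline toward observed return
            grad_log = -policy(s)
            grad_log[a] += 1.0                  # gradient of log softmax pi(a|s) wrt theta[s]
            theta[s] += 0.01 * adv * grad_log   # REINFORCE update with baseline

    print(np.round([policy(s) for s in range(N_STATES)], 2))

In the thesis itself, the plain gradient step is replaced by the trust-region update of TRPO, which constrains how far each update can move the policy so that improvement is more reliable; the sketch keeps only the vanilla gradient form.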
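The second abstract (520) field describes a general calculus for differentiating objectives that mix sampled random variables with differentiable operations (stochastic computation graphs). The short check below illustrates the score-function (likelihood-ratio) identity that such estimators build on; the Gaussian sampling distribution and the objective f(x) = x**2 are assumptions chosen so the Monte Carlo estimate can be compared against a known analytic gradient.

    # Sketch: score-function gradient identity,
    #   d/dtheta E_{x ~ p_theta}[f(x)] = E[ f(x) * d/dtheta log p_theta(x) ],
    # checked on an assumed toy case: p_theta = Normal(theta, 1), f(x) = x**2,
    # where the analytic gradient of E[x^2] = theta^2 + 1 is 2*theta.
    import numpy as np

    rng = np.random.default_rng(0)
    theta = 1.5

    x = rng.normal(loc=theta, scale=1.0, size=200_000)  # samples through the stochastic node
    f = x ** 2                                           # downstream objective
    score = x - theta                                    # d/dtheta log N(x; theta, 1)

    mc_grad = np.mean(f * score)   # Monte Carlo score-function estimate
    print(mc_grad, 2.0 * theta)    # estimate vs. analytic value 2*theta

The high variance of this kind of estimator is what motivates the baselines and value functions discussed in the first part of the abstract.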