National Formosa University

Information Theoretic Learning Methods for Markov Decision Processes With Parametric Uncertainty.

Record type:	Bibliographic - language material, manuscript : Monograph/item
Title/Author:	Information Theoretic Learning Methods for Markov Decision Processes With Parametric Uncertainty./
Author:	Kumar, Peeyush.
Physical description:	1 online resource (126 pages)
Note:	Source: Dissertation Abstracts International, Volume: 79-12(E), Section: B.
Contained By:	Dissertation Abstracts International 79-12B(E).
Subject:	Operations research.
Electronic resource:	click for full text (PQDT)
ISBN:	9780438177499


Thesis (Ph.D.)--University of Washington, 2018.

Includes bibliographical references

Markov decision processes (MDPs) model a class of stochastic sequential decision problems with applications in engineering, medicine, and business analytics. There is considerable interest in the literature in MDPs with imperfect information, where the search for well-performing policies faces many challenges. There is no rigorous universally accepted optimality criterion. The decision-maker suffers from the curse-of-dimensionality. Finding good policies requires careful balancing of the trade-off between exploration to acquire information and exploitation of this information to earn high rewards. This dissertation contributes to this area by building a rigorous framework rooted in information theory for solving MDPs with model uncertainty.

Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018

Mode of access: World Wide Web

ISBN: 9780438177499

Subjects--Topical Terms:
573517 Operations research.

Index Terms--Genre/Form:
554714 Electronic books.

LDR:04155ntm a2200397Ki 4500
001 916925
005 20180928111503.5
006 m o u
007 cr mn||||a|a||
008 190606s2018 xx obm 000 0 eng d
020 $a 9780438177499
035 $a (MiAaPQ)AAI10828591
035 $a (MiAaPQ)washington:18875
035 $a AAI10828591
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Kumar, Peeyush. $3 1190798
245 1 0 $a Information Theoretic Learning Methods for Markov Decision Processes With Parametric Uncertainty.
264 0 $c 2018
300 $a 1 online resource (126 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertation Abstracts International, Volume: 79-12(E), Section: B.
500 $a Adviser: Archis V. Ghate.
502 $a Thesis (Ph.D.)--University of Washington, 2018.
504 $a Includes bibliographical references
520 $a Markov decision processes (MDPs) model a class of stochastic sequential decision problems with applications in engineering, medicine, and business analytics. There is considerable interest in the literature in MDPs with imperfect information, where the search for well-performing policies faces many challenges. There is no rigorous universally accepted optimality criterion. The decision-maker suffers from the curse-of-dimensionality. Finding good policies requires careful balancing of the trade-off between exploration to acquire information and exploitation of this information to earn high rewards. This dissertation contributes to this area by building a rigorous framework rooted in information theory for solving MDPs with model uncertainty.
520 $a In the first part, the value of a parameter that characterizes the transition probabilities is unknown to the decision-maker. Information Directed Policy Sampling (IDPS) is proposed to manage the exploration-exploitation trade-off. A generalization of Hoeffding's inequality is employed to derive a regret bound. Numerical results on a stylized example, an auction-design problem, and a response-guided dosing problem are discussed.
520 $a Uncertainty in transition probabilities arises from two levels in the second part. The top level corresponds to the ambiguity about the system model. Bottom-level uncertainty is rooted in the unknown parameter values for each possible model. Prior-update formulas using a hierarchical Bayesian framework are derived and incorporated into two learning algorithms: Thompson Sampling and a hierarchical extension of IDPS. Analytical performance bounds are developed. Numerical results on the response-guided dosing problem are presented.
520 $a The third part extends the above to partially observable Markov decision processes (POMDPs). A connection between POMDPs and the first two chapters is exploited to devise algorithms and provide analytical performance guarantees in three cases: a) uncertainty in the transition probabilities; b) uncertainty in the measurement outcome probabilities; and c) uncertainty in both. Numerical results on partially observed response-guided dosing are included.
520 $a The fourth part develops a formal information theoretic framework inspired by stochastic thermodynamics. It utilizes the idea that information is physical. An explicit link between information entropy and stochastic dynamics of a system coupled to an environment is developed from fundamental principles. Unlike the heuristic idea of the information ratio, this provides an optimization program that is built from system dynamics, problem objective, and feedback from observations. To the best of my knowledge, this is the first framework that is entirely grounded in system and informational dynamics without relying on heuristic criteria.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Operations research. $3 573517
650 4 $a Artificial intelligence. $3 559380
650 4 $a Computer science. $3 573171
655 7 $a Electronic books. $2 local $3 554714
690 $a 0796
690 $a 0800
690 $a 0984
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a University of Washington. $b Industrial and Systems Engineering. $3 1183424
773 0 $t Dissertation Abstracts International $g 79-12B(E).
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10828591 $z click for full text (PQDT)