National Formosa University

Bayesian Hidden Topic Markov Models.

Record type:	Bibliographic - language material, manuscript : Monograph/item
Title/Author:	Bayesian Hidden Topic Markov Models.
Author:	Wilcox, Kenneth Tyler.
Physical description:	1 online resource (119 pages)
Note:	Source: Masters Abstracts International, Volume: 56-04.
Subject:	Statistics.
Electronic resource:	full text (PQDT)
ISBN:	9781369795042

Bayesian Hidden Topic Markov Models. - 1 online resource (119 pages)

Thesis (M.S.)--Rochester Institute of Technology, 2017.

Includes bibliographical references

Recent developments in topic modeling for text corpora have incorporated Markov models in the latent space to better learn contextual content. Known as the Hidden Topic Markov Model (HTMM), this natural extension of probabilistic mixture models relaxes the "bag-of-words" assumption of the foundational latent Dirichlet allocation (LDA) topic model by allowing the discrete latent variables, or topics, to follow a special first-order Markov process. In the original formulation, parameters are estimated with an expectation-maximization (EM) algorithm, and the dimensionality of the topic space is fixed (Gruber, Rosen-Zvi, and Weiss 2007). I fully derive the state space and the EM algorithm for the HTMM. I then extend the HTMM into a fully Bayesian framework: the necessary full conditional distributions are derived and a Gibbs sampling algorithm is proposed. I implement both the HTMM EM algorithm (Gruber, Rosen-Zvi, and Weiss 2007) and the HTMM Gibbs sampling algorithm in the R and C++ programming languages. The performance of both inferential algorithms is evaluated on twelve simulated data sets and on a collection of proceedings from the Conference on Neural Information Processing Systems (NIPS). The results suggest that the Gibbs sampling algorithm recovers the topic space better than a combination of the EM and Viterbi algorithms, while point estimates of the parameters are comparable under both algorithms. The convergence of the Gibbs sampler is studied and found to be reliable for reasonably large data sets. Evaluation of both algorithms on the NIPS corpus suggests that the HTMM handles polysemy better than LDA and yields coherent, contiguous topics. On the NIPS corpus, predictive accuracy as measured by perplexity is also better for the HTMM than for LDA on both training and test documents. Introducing Markovian dynamics in the topic space provides better topical segmentation of a corpus and greater predictive accuracy on unseen documents.
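The "special first-order Markov process" and the perplexity measure mentioned in the abstract can be made concrete with a short sketch. It assumes the formulation of Gruber, Rosen-Zvi, and Weiss (2007), in which all words of a sentence share one topic and, only at a sentence boundary, a new topic is drawn from the document's topic proportions with probability epsilon. The Python below samples a document from that process, computes the exact log-likelihood with the forward algorithm, and converts it to per-word perplexity. The function names and the parameters `theta`, `beta`, and `epsilon` are hypothetical illustrations, not the thesis's R/C++ implementation. The parameters are also assumed known rather than estimated by EM or Gibbs sampling.

```python
import math
import random


def sample_htmm_document(theta, beta, epsilon, sentence_lengths, rng):
    """Sample one document from an HTMM-style generative process (sketch).

    theta: topic proportions for the document (length K, sums to 1).
    beta: K word distributions, each of length V.
    epsilon: probability of a topic transition at a sentence boundary.
    sentence_lengths: number of words in each sentence.
    Returns (words, topics, starts); starts[n] is True if word n opens a sentence.
    """
    num_topics = len(theta)
    vocab_size = len(beta[0])
    words, topics, starts = [], [], []
    z = None
    for length in sentence_lengths:
        # The first sentence always draws a topic from theta; later sentences
        # redraw from theta with probability epsilon, otherwise keep the topic.
        if z is None or rng.random() < epsilon:
            z = rng.choices(range(num_topics), weights=theta)[0]
        for i in range(length):
            starts.append(i == 0)
            topics.append(z)  # every word in a sentence shares one topic
            words.append(rng.choices(range(vocab_size), weights=beta[z])[0])
    return words, topics, starts


def htmm_log_likelihood(words, starts, theta, beta, epsilon):
    """Exact log p(words) for known parameters via the normalized forward algorithm."""
    num_topics = len(theta)
    log_lik = 0.0
    filtered = None  # p(z_n | w_1, ..., w_n)
    for w, is_start in zip(words, starts):
        if filtered is None:
            prior = list(theta)
        elif is_start:
            # Transition: p(z_n = k | z_{n-1} = j) = eps*theta_k + (1-eps)*[j == k]
            prior = [epsilon * theta[k] + (1.0 - epsilon) * filtered[k]
                     for k in range(num_topics)]
        else:
            prior = filtered  # no topic change inside a sentence
        joint = [prior[k] * beta[k][w] for k in range(num_topics)]
        norm = sum(joint)  # p(w_n | w_1, ..., w_{n-1})
        log_lik += math.log(norm)
        filtered = [j / norm for j in joint]
    return log_lik


def perplexity(words, starts, theta, beta, epsilon):
    """Per-word perplexity: exp(-log p(words) / N)."""
    return math.exp(-htmm_log_likelihood(words, starts, theta, beta, epsilon)
                    / len(words))


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.6, 0.4]
    beta = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
    words, topics, starts = sample_htmm_document(theta, beta, 0.2, [4, 3, 5], rng)
    print("topics:", topics)
    print("perplexity:", perplexity(words, starts, theta, beta, 0.2))
```

The forward recursion avoids summing over every topic path. Because the transition matrix is a mix of `theta` and the identity, each step costs O(K), so the whole document costs O(NK) for N words and K topics.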

Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018

Mode of access: World Wide Web

ISBN: 9781369795042
Subjects--Topical Terms:

Statistics.
Index Terms--Genre/Form:

Electronic books.

Bayesian Hidden Topic Markov Models.
LDR 03124ntm a2200337K 4500
001 913828
005 20180628103545.5
006 m o u
007 cr mn||||a|a||
008 190606s2017 xx obm 000 0 eng d
020 $a 9781369795042
035 $a (MiAaPQ)AAI10283610
035 $a (MiAaPQ)rit:12641
035 $a AAI10283610
040 $a MiAaPQ $b eng $c MiAaPQ
100 1 $a Wilcox, Kenneth Tyler. $3 1186830
245 1 0 $a Bayesian Hidden Topic Markov Models.
264 0 $c 2017
300 $a 1 online resource (119 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Masters Abstracts International, Volume: 56-04.
500 $a Adviser: Ernest P. Fokoue.
502 $a Thesis (M.S.)--Rochester Institute of Technology, 2017.
504 $a Includes bibliographical references
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Statistics. $3 556824
650 4 $a Computer science. $3 573171
650 4 $a Linguistics. $3 557829
655 7 $a Electronic books. $2 local $3 554714
690 $a 0463
690 $a 0984
690 $a 0290
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a Rochester Institute of Technology. $b Applied Statistics. $3 1186831
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10283610 $z click for full text (PQDT)