Yerebakan, Halid Ziya.
Hierarchical Non-Parametric Bayesian Mixture Models and Applications on Big Data.
Record type:
Bibliographic - Language material, manuscript : Monograph/item
Title/Author:
Hierarchical Non-Parametric Bayesian Mixture Models and Applications on Big Data./
Author:
Yerebakan, Halid Ziya.
Description:
1 online resource (105 pages)
Note:
Source: Dissertation Abstracts International, Volume: 78-12(E), Section: B.
Contained By:
Dissertation Abstracts International, 78-12B(E).
Subject:
Computer science.
Electronic resource:
click for full text (PQDT)
ISBN:
9780355096118
Yerebakan, Halid Ziya.
Hierarchical Non-Parametric Bayesian Mixture Models and Applications on Big Data. - 1 online resource (105 pages)
Source: Dissertation Abstracts International, Volume: 78-12(E), Section: B.
Thesis (Ph.D.)--Purdue University, 2017.
Includes bibliographical references
In the Bayesian nonparametric family, the Dirichlet Process (DP) is a prior distribution that can learn the number of clusters in a mixture model from the data; the corresponding mixture model is therefore nonparametric in the number of clusters. However, each cluster is still represented by a single parametric distribution. Further flexibility is needed for real-world applications whose clusters cannot be modeled by a single parametric distribution, especially when cluster shapes are skewed or multimodal. In this dissertation, we show that introducing a hierarchy into the cluster distributions is an effective way to build more flexible generative models without significantly expanding the parameter space or the computational cost. Referring to the two-layer structure, we named our method Infinite Mixtures of Infinite Gaussian Mixtures (I2GMM).
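The abstract describes DP mixture models that learn the number of clusters from the data. As a rough illustration of that idea only (not the dissertation's I2GMM code), the sketch below runs a toy Chinese Restaurant Process Gibbs sweep for a one-dimensional Gaussian mixture; the values of alpha, sigma, the base-measure variance, and the sample data are all assumptions made for the example.

```python
# Toy CRP-based Gibbs sweep for a 1-D Gaussian DP mixture (illustrative
# sketch only). Each point is removed from its cluster and reassigned to
# an existing cluster (weight ~ cluster size x likelihood) or to a brand
# new cluster (weight ~ alpha x marginal likelihood under the base).
import math
import random

random.seed(0)

def crp_gibbs(data, alpha=1.0, sigma=1.0, iters=50):
    """Return a cluster label for each point after `iters` Gibbs sweeps.
    Clusters have fixed variance sigma^2 and mean equal to the current
    average of their members; the base measure is assumed to be N(0, 1)."""
    z = [0] * len(data)                  # start with everything in one cluster
    for _ in range(iters):
        for i, x in enumerate(data):
            z[i] = -1                    # remove point i from its cluster
            members = {}
            for j, k in enumerate(z):
                if k >= 0:
                    members.setdefault(k, []).append(data[j])
            options, weights = [], []
            for k, xs in members.items():
                mu = sum(xs) / len(xs)   # current cluster mean
                lik = math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
                options.append(k)
                weights.append(len(xs) * lik)
            # option of opening a new cluster, marginalizing over N(0, 1)
            options.append(max(members) + 1 if members else 0)
            weights.append(alpha * math.exp(-x ** 2 / (2 * (sigma ** 2 + 1.0))))
            z[i] = random.choices(options, weights=weights)[0]
    return z

# Two well-separated groups; the sampler should settle on about two clusters.
data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
labels = crp_gibbs(data)
print(labels)
```

This is the base DP mixture the dissertation starts from; I2GMM adds a second layer so that each top-level cluster is itself an infinite Gaussian mixture.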
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018.
Mode of access: World Wide Web.
ISBN: 9780355096118
Subjects--Topical Terms:
573171
Computer science.
Index Terms--Genre/Form:
554714
Electronic books.
LDR    03185ntm a2200385Ki 4500
001    908955
005    20180419104822.5
006    m o u
007    cr mn||||a|a||
008    190606s2017 xx obm 000 0 eng d
020    $a 9780355096118
035    $a (MiAaPQ)AAI10271777
035    $a (MiAaPQ)purdue:21302
035    $a AAI10271777
040    $a MiAaPQ $b eng $c MiAaPQ
099    $a TUL $f hyy $c available through World Wide Web
100 1  $a Yerebakan, Halid Ziya. $3 1179389
245 10 $a Hierarchical Non-Parametric Bayesian Mixture Models and Applications on Big Data.
264  0 $c 2017
300    $a 1 online resource (105 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Dissertation Abstracts International, Volume: 78-12(E), Section: B.
500    $a Adviser: Mehmet M. Dundar.
502    $a Thesis (Ph.D.) $c Purdue University $d 2017.
504    $a Includes bibliographical references
520    $a In the Bayesian nonparametric family, the Dirichlet Process (DP) is a prior distribution that can learn the number of clusters in a mixture model from the data; the corresponding mixture model is therefore nonparametric in the number of clusters. However, each cluster is still represented by a single parametric distribution. Further flexibility is needed for real-world applications whose clusters cannot be modeled by a single parametric distribution, especially when cluster shapes are skewed or multimodal. In this dissertation, we show that introducing a hierarchy into the cluster distributions is an effective way to build more flexible generative models without significantly expanding the parameter space or the computational cost. Referring to the two-layer structure, we named our method Infinite Mixtures of Infinite Gaussian Mixtures (I2GMM).
520    $a We present a collapsed Gibbs sampler for inference in I2GMM; the hierarchical structure is what makes parallelization possible. However, the collapsed sampler does not account for load balancing, so it underutilizes the resources of modern multi-core architectures. We therefore introduce a new sampling algorithm that combines the uncollapsed and collapsed samplers to improve the degree of parallelization.
520    $a Our experiments include flow cytometry and remote sensing data as well as standard benchmark datasets. I2GMM achieves a better mean F1 score in clustering than both parametric and nonparametric alternatives. Applying the new parallel sampler to the IGMM and I2GMM models yields a further speed-up in computation time while maintaining clustering accuracy comparable to that of the collapsed Gibbs sampler.
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538    $a Mode of access: World Wide Web
650  4 $a Computer science. $3 573171
650  4 $a Artificial intelligence. $3 559380
650  4 $a Statistics. $3 556824
655  7 $a Electronic books. $2 local $3 554714
690    $a 0984
690    $a 0800
690    $a 0463
710 2  $a ProQuest Information and Learning Co. $3 1178819
710 2  $a Purdue University. $b Computer Sciences. $3 1179390
773 0  $t Dissertation Abstracts International $g 78-12B(E).
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10271777 $z click for full text (PQDT)
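The 520 notes above contrast the collapsed sampler, where each assignment depends on every other assignment and so must be updated sequentially, with an uncollapsed step that conditions on explicit cluster parameters, making the point assignments mutually independent and therefore parallelizable. A minimal illustration of that independence follows; it is not the dissertation's implementation, and the data, fixed means, and nearest-mean assignment rule are all assumptions for the sketch.

```python
# Once cluster parameters are drawn explicitly (an uncollapsed step),
# each point's assignment depends only on the data point and the fixed
# parameters, so the assignments can be computed in parallel.
from concurrent.futures import ThreadPoolExecutor

def assign(x, means):
    # nearest-mean assignment under equal-weight, equal-variance Gaussians
    return min(range(len(means)), key=lambda k: (x - means[k]) ** 2)

data = [0.0, 0.2, 4.8, 5.1, -0.3]
means = [0.0, 5.0]   # assumed fixed after an uncollapsed parameter draw

with ThreadPoolExecutor(max_workers=2) as pool:
    labels = list(pool.map(lambda x: assign(x, means), data))
print(labels)  # -> [0, 0, 1, 1, 0]
```

A collapsed sweep cannot be split this way, because removing or adding one point changes the predictive distribution seen by every other point; the combined sampler in the dissertation trades some of the collapsed sampler's statistical efficiency for this kind of parallel structure.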