National Formosa University (國立虎尾科技大學)

Statistical Learning Methods for Big Biomedical Data.

Record type:	Bibliographic - language material, manuscript : Monograph/item
Title/Author:	Statistical Learning Methods for Big Biomedical Data.
Author:	Li, Ziyi.
Extent:	1 online resource (143 pages)
Note:	Source: Dissertation Abstracts International, Volume: 79-12(E), Section: B.
Contained By:	Dissertation Abstracts International 79-12B(E).
Subject:	Biostatistics.
Electronic resource:	click for full text (PQDT)
ISBN:	9780438238848

Thesis (Ph.D.)--Emory University, 2018.

Includes bibliographical references

The rapid advancement of biological and clinical technologies has generated several distinct types of big biomedical data, including -omics data and electronic health record data. Such data and their distinct features have created challenges in obtaining meaningful and applicable research findings. In this dissertation, we develop three statistical learning methods for the analysis of big biomedical data.

Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018

Mode of access: World Wide Web

Subjects--Topical Terms: Biostatistics. (783654)
Index Terms--Genre/Form: Electronic books. (554714)

Statistical Learning Methods for Big Biomedical Data.
LDR:06047ntm a2200385Ki 4500
001 918705
005 20181030085012.5
006 m o u
007 cr mn||||a|a||
008 190606s2018 xx obm 000 0 eng d
020 $a 9780438238848
035 $a (MiAaPQ)AAI10954394
035 $a (MiAaPQ)m326m179b
035 $a AAI10954394
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Li, Ziyi. $3 1193102
245 1 0 $a Statistical Learning Methods for Big Biomedical Data.
264 0 $c 2018
300 $a 1 online resource (143 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertation Abstracts International, Volume: 79-12(E), Section: B.
500 $a Adviser: Qi Long.
502 $a Thesis (Ph.D.)--Emory University, 2018.
504 $a Includes bibliographical references
520 $a The rapid advancement of biological and clinical technologies has generated several distinct types of big biomedical data, including -omics data and electronic health record data. Such data and their distinct features have created challenges in obtaining meaningful and applicable research findings. In this dissertation, we develop three statistical learning methods for the analysis of big biomedical data.
520 $a Principal component analysis (PCA) is a popular tool for dimensionality reduction, data mining, and visualization of high-dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks that are often represented by graphs. Recent work has shown that incorporating such biological information improves feature selection and prediction performance in regression analysis, but there has been limited work on extending this approach to PCA. In the first project, we propose two new sparse PCA methods, called Fused and Grouped sparse PCA, that enable incorporation of prior biological information in variable selection. This leads to improved feature selection and more interpretable principal component loadings, and potentially provides insight into the molecular underpinnings of complex diseases. Our simulation studies suggest that, compared to existing sparse PCA methods, the proposed methods achieve higher sensitivity and specificity when the graph structure is correctly specified, and are fairly robust to misspecified graph structures. Application to a glioblastoma gene expression dataset identified pathways that are suggested in the literature to be related to glioblastoma.
520 $a Electronic health record (EHR) data provide a promising opportunity to explore personalized treatment regimes and to make clinical predictions. Compared with genomics data, EHR data are known for their irregularity and complexity. In addition, analyzing EHR data involves privacy issues, and sharing such data among multiple research sites may not be feasible due to privacy concerns and regulatory hurdles. Recent work has used contextual embedding models to build a single predictive model for EHR data from multiple sites covering more than seventy common diagnoses. Although the existing approach achieves relatively high predictive accuracy, it cannot build a global model without sharing data among sites. In the second project, we propose three novel contextual embedding methods for building predictive models: Naive updates, Dropout updates, and Distributed Noise Contrastive Estimation (Distributed NCE). We also propose Distributed NCE with DP, an updated version of Distributed NCE, to obtain reliable privacy protection. Our simulation study with a real dataset demonstrates that the proposed methods can build predictive models in a distributed manner with privacy protection, while preserving the model structure well and achieving prediction accuracy comparable to that of the hidden-truth model built with all the data.
520 $a Biclustering techniques can identify local patterns in a data matrix by clustering rows and columns at the same time. Various biclustering methods have been proposed and successfully applied to analyze gene expression data. While existing biclustering methods have many desirable features, most of them were developed for continuous data and none of them can handle genomic data of various types, for example, binomial data as in single nucleotide polymorphism (SNP) data or negative binomial data as in RNA-seq data. In addition, none of the existing methods can utilize biological information such as that from functional genomics or proteomics. Recent work has shown that incorporating biological information can improve variable selection and prediction performance in analyses such as linear regression and multivariate analysis. In the third project, we propose a novel Bayesian biclustering method that can handle multiple data types, including Gaussian, binomial, negative binomial, and Poisson data. In addition, our method uses a Bayesian adaptive structured shrinkage prior that enables feature selection guided by biological information such as that from functional genomics. Our simulation studies and application to multiple genomics datasets demonstrate robust and superior performance of the proposed method compared to other existing biclustering methods.
520 $a For future work, we can extend the first topic by combining sparse PCA with neural networks, extend the second topic by replacing Word2Vec with more recently proposed embedding approaches, or extend the third topic by incorporating subject-level phenotype information into the biclustering process.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Biostatistics. $3 783654
650 4 $a Bioinformatics. $3 583857
655 7 $a Electronic books. $2 local $3 554714
690 $a 0308
690 $a 0715
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a Emory University. $b Biostatistics. $3 1193103
773 0 $t Dissertation Abstracts International $g 79-12B(E).
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10954394 $z click for full text (PQDT)
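
Illustrative note on the first project in the abstract: the record does not give the formulation of Fused or Grouped sparse PCA, so the sketch below is not the dissertation's method. It shows only the generic idea of sparse principal component loadings, using a plain L1-style soft-thresholded power iteration for the first component. The function names `soft_threshold` and `sparse_pc1`, the penalty parameter `lam`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(x, lam):
    """Elementwise soft-thresholding operator commonly used with L1 penalties."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_pc1(X, lam, n_iter=200, tol=1e-8):
    """Hypothetical helper: first sparse loading vector of X (rows = samples).

    Alternates a power-iteration step on X^T X with soft-thresholding,
    then renormalizes to unit length. With lam = 0 this reduces to the
    ordinary first principal component loading.
    """
    Xc = X - X.mean(axis=0)  # center each column
    # Start from the ordinary PCA loading (leading right singular vector).
    v = np.linalg.svd(Xc, full_matrices=False)[2][0]
    for _ in range(n_iter):
        w = soft_threshold(Xc.T @ (Xc @ v), lam)
        norm = np.linalg.norm(w)
        if norm == 0.0:  # penalty too strong: every loading was zeroed out
            return np.zeros_like(v)
        w = w / norm
        converged = np.linalg.norm(w - v) < tol
        v = w
        if converged:
            break
    return v

# Synthetic example (assumed data): 3 correlated "signal" variables, 7 noise variables.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = 0.1 * rng.normal(size=(200, 10))
X[:, :3] += z[:, None]
loadings = sparse_pc1(X, lam=20.0)
print(np.nonzero(loadings)[0])
```

The threshold `lam` is on the scale of `X^T X v`, so a sensible value depends on the sample size and variable scaling; in practice it would be tuned. A graph-guided penalty such as the ones named in the abstract would replace the plain soft-thresholding step.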