National Formosa University

Novel Random Forest and Variable Importance Methods for Clustered Data.

Record type:	Bibliographic - language material, manuscript : Monograph/item
Title/Author:	Novel Random Forest and Variable Importance Methods for Clustered Data./
Author:	Calhoun, Peter Montgomery.
Extent:	1 online resource (124 pages)
Notes:	Source: Dissertation Abstracts International, Volume: 79-01(E), Section: B.
Contained By:	Dissertation Abstracts International 79-01B(E).
Subject:	Statistics. -
Electronic resource:	click for full text (PQDT)
ISBN:	9780355214468

Thesis (Ph.D.)--The Claremont Graduate University, 2017.

Includes bibliographical references

Tree-based methods are becoming increasingly popular because they make few statistical assumptions and give accurate predictions. Classification and Regression Trees (CART) can handle a variety of data structures and give easy-to-interpret prediction rules. However, CART has several limitations: it requires independent outcomes, has high variance, gives poor predictive performance, and induces a variable selection bias. In this dissertation, we discuss these limitations and propose algorithms that resolve these issues.

Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018

Mode of access: World Wide Web

ISBN: 9780355214468
Subjects--Topical Terms:

556824
Statistics.
Index Terms--Genre/Form:

554714
Electronic books.

Novel Random Forest and Variable Importance Methods for Clustered Data.
LDR:05051ntm a2200385Ki 4500
001 918630
005 20181030085011.5
006 m o u
007 cr mn||||a|a||
008 190606s2017 xx obm 000 0 eng d
020 $a 9780355214468
035 $a (MiAaPQ)AAI10284219
035 $a (MiAaPQ)cgu:11080
035 $a AAI10284219
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Calhoun, Peter Montgomery. $3 1193000
245 1 0 $a Novel Random Forest and Variable Importance Methods for Clustered Data.
264 0 $c 2017
300 $a 1 online resource (124 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertation Abstracts International, Volume: 79-01(E), Section: B.
500 $a Adviser: Juanjuan Fan.
502 $a Thesis (Ph.D.)--The Claremont Graduate University, 2017.
504 $a Includes bibliographical references
520 $a Tree-based methods are becoming increasingly popular because they make few statistical assumptions and give accurate predictions. Classification and Regression Trees (CART) can handle a variety of data structures and give easy-to-interpret prediction rules. However, CART has several limitations: it requires independent outcomes, has high variance, gives poor predictive performance, and induces a variable selection bias. In this dissertation, we discuss these limitations and propose algorithms that resolve these issues.
520 $a In Chapter 1, we introduce CART and discuss the advantages of tree-based methods. We show that CART handles interactions and nonlinear relationships and provides easy-to-interpret prediction rules. We conclude with an example and discuss some of the limitations of the standard CART implementation.
520 $a In Chapter 2, we discuss the MST R package, which extends the CART implementation to handle multivariate survival data. We introduce multivariate survival trees, illustrate how they can be constructed in R, and describe some of the features of the MST R package. We analyze a dental study to predict tooth loss and estimate survival of molars and non-molars. We conclude with future directions for the MST R package.
520 $a In Chapter 3, we introduce random forests. Random forests reduce the variance of CART and are among the most accurate machine learning methods for making predictions and analyzing studies. However, the variable selection bias found in CART still occurs with random forests. We propose a variant of the random forest called completely randomized with acceptance-rejection trees (CRAR). We compare our proposed method with three other methods of constructing random forests: standard random forest (RF), smooth sigmoid surrogate trees (SSS), and extremely randomized trees (ER). We find that CRAR and ER have the best overall accuracy and performance for classification problems. They have the lowest misclassification rates, reduce or eliminate the variable selection bias, and are the fastest algorithms. The best algorithm for regression problems may be selected based on the overall objective, whether that is high accuracy, variable selection, or speed. We recommend considering all four algorithms based on the study and objective.
520 $a In Chapter 4, we propose the repeated measures random forest (RMRF) algorithm, which extends the standard random forest implementation to handle longitudinal designs. The RMRF algorithm uses subsamples, the robust Wald statistic, and an accept-reject quality control step to grow an ensemble of trees. We adopt an area under the curve (AUC) based permuted importance method to assess variable importance. We show that the RMRF algorithm outperforms other algorithms that naively assume independence under a variety of data simulations. An algorithm that ignores the dependence will favor patient-level variables when responses are strongly correlated. We also show that the RMRF algorithm outperforms RF and ER at identifying the informative variable.
520 $a The final chapter uses the RMRF algorithm to identify factors associated with nocturnal hypoglycemia. We adopt a permuted importance method to test the significance of factors with random forests. We find that hemoglobin A1c (P=0.01), bedtime blood glucose (P=0.01), insulin on board (P=0.03), time system activated (P=0.02), exercise (P=0.01), and daytime hypoglycemia (P=0.01) are associated with nocturnal hypoglycemia. We show that interaction effects influence hypoglycemia and explore the significance of time system activated. Finally, we assign risk profiles to each night and show that the RMRF algorithm accurately predicts nocturnal hypoglycemia. We conclude that the proposed RMRF algorithm can identify influential variables while handling dependent outcomes.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Statistics. $3 556824
655 7 $a Electronic books. $2 local $3 554714
690 $a 0463
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a The Claremont Graduate University. $b School of Mathematical Sciences. $3 1193001
773 0 $t Dissertation Abstracts International $g 79-01B(E).
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10284219 $z click for full text (PQDT)
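
The AUC-based permuted importance named in the abstract can be illustrated with a minimal sketch. This is not the dissertation's RMRF implementation: the scoring function `toy_predict`, the synthetic data, and the helper names `auc` and `permuted_auc_importance` are all assumptions made for illustration, and the sketch treats observations as independent rather than clustered. It only shows the general idea of shuffling one variable at a time and recording the average drop in AUC.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permuted_auc_importance(predict, rows, labels, n_features, n_repeats=20, seed=0):
    """Mean drop in AUC after shuffling each feature column in turn.

    `predict` is any function mapping one row (a list of feature values)
    to a score; here it stands in for a fitted model (assumption).
    """
    rng = random.Random(seed)
    baseline = auc(labels, [predict(r) for r in rows])
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the outcome
            permuted_rows = []
            for r, v in zip(rows, column):
                new_row = list(r)
                new_row[j] = v
                permuted_rows.append(new_row)
            permuted_auc = auc(labels, [predict(r) for r in permuted_rows])
            drops.append(baseline - permuted_auc)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Synthetic data (assumption): feature 0 determines the label, feature 1 is noise.
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]


def toy_predict(row):
    """Hypothetical stand-in for a fitted model: scores a row by feature 0 only."""
    return row[0]


baseline, importances = permuted_auc_importance(toy_predict, rows, labels, n_features=2)
print("baseline AUC:", baseline)
print("importance per feature:", importances)
```

In this toy setup the informative feature shows a large AUC drop when shuffled, while the noise feature shows none because the stand-in model never uses it. With clustered or repeated-measures data, shuffling rows as if they were independent ignores within-subject dependence, which is the problem the abstract describes.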