National Formosa University (國立虎尾科技大學)

Distributed and Robust Statistical Learning.

Record type:	Bibliographic - language material, manuscript : Monograph/item
Title/Author:	Distributed and Robust Statistical Learning./
Author:	Zhu, Ziwei.
Extent:	1 online resource (268 pages)
Notes:	Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.
Contained By:	Dissertation Abstracts International 79-10B(E).
Subject:	Statistics.
Electronic resource:	click for full text (PQDT)
ISBN:	9780438047990

Thesis (Ph.D.)--Princeton University, 2018.

Includes bibliographical references

Decentralized and corrupted data are now ubiquitous and pose fundamental challenges for modern statistical analysis. Illustrative examples include massive, decentralized data produced by the distributed data collection systems of giant IT companies, corrupted measurements in genetic microarray analysis, and heavy-tailed stock returns. These notorious features of modern data often contradict conventional theoretical assumptions in statistics research and invalidate standard statistical procedures. My dissertation addresses these problems by proposing new methodologies with strong statistical guarantees. When data are distributed over different places with a limited communication budget, we propose to perform local statistical analysis first and then aggregate the local results, rather than the data themselves, to generate a final result. We apply this approach to low-dimensional regression, high-dimensional sparse regression, and principal component analysis. When the data are not over-scattered, our distributed approach is proved to achieve the same statistical performance as the full-sample oracle, i.e., the standard procedure based on all the data. To handle heavy-tailed corruption, we propose a generic principle of data shrinkage for robust estimation and inference. To illustrate this principle, we apply it to estimate regression coefficients in the trace regression model and the generalized linear model with heavy-tailed noise and design. The proposed method achieves nearly the same statistical error rate as the standard procedure while requiring only bounded moment conditions on the data. This widens the scope of high-dimensional techniques, relaxing the distributional requirement from sub-exponential or sub-Gaussian tails to merely bounded second or fourth moments.

Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018

Mode of access: World Wide Web

ISBN: 9780438047990

Subjects--Topical Terms:
556824 Statistics.

Index Terms--Genre/Form:
554714 Electronic books.

Distributed and Robust Statistical Learning.
LDR:02983ntm a2200337Ki 4500
001 919038
005 20181106103646.5
006 m o u
007 cr mn||||a|a||
008 190606s2018 xx obm 000 0 eng d
020 $a 9780438047990
035 $a (MiAaPQ)AAI10815951
035 $a (MiAaPQ)princeton:12532
035 $a AAI10815951
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Zhu, Ziwei. $3 1193516
245 1 0 $a Distributed and Robust Statistical Learning.
264 0 $c 2018
300 $a 1 online resource (268 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.
500 $a Adviser: Jianqing Fan.
502 $a Thesis (Ph.D.)--Princeton University, 2018.
504 $a Includes bibliographical references
520 $a Decentralized and corrupted data are now ubiquitous and pose fundamental challenges for modern statistical analysis. Illustrative examples include massive, decentralized data produced by the distributed data collection systems of giant IT companies, corrupted measurements in genetic microarray analysis, and heavy-tailed stock returns. These notorious features of modern data often contradict conventional theoretical assumptions in statistics research and invalidate standard statistical procedures. My dissertation addresses these problems by proposing new methodologies with strong statistical guarantees. When data are distributed over different places with a limited communication budget, we propose to perform local statistical analysis first and then aggregate the local results, rather than the data themselves, to generate a final result. We apply this approach to low-dimensional regression, high-dimensional sparse regression, and principal component analysis. When the data are not over-scattered, our distributed approach is proved to achieve the same statistical performance as the full-sample oracle, i.e., the standard procedure based on all the data. To handle heavy-tailed corruption, we propose a generic principle of data shrinkage for robust estimation and inference. To illustrate this principle, we apply it to estimate regression coefficients in the trace regression model and the generalized linear model with heavy-tailed noise and design. The proposed method achieves nearly the same statistical error rate as the standard procedure while requiring only bounded moment conditions on the data. This widens the scope of high-dimensional techniques, relaxing the distributional requirement from sub-exponential or sub-Gaussian tails to merely bounded second or fourth moments.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Statistics. $3 556824
650 4 $a Operations research. $3 573517
655 7 $a Electronic books. $2 local $3 554714
690 $a 0463
690 $a 0796
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a Princeton University. $b Operations Research and Financial Engineering. $3 1182940
773 0 $t Dissertation Abstracts International $g 79-10B(E).
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10815951 $z click for full text (PQDT)