National Formosa University (國立虎尾科技大學)

Complexity Penalized Methods for Structured and Unstructured Data.

Record type:	Bibliographic - language material, manuscript : Monograph/item
Title/Author:	Complexity Penalized Methods for Structured and Unstructured Data.
Author:	Goeva, Aleksandrina Valerieva.
Extent:	1 online resource (136 pages)
Note:	Source: Dissertation Abstracts International, Volume: 79-04(E), Section: B.
Contained By:	Dissertation Abstracts International, 79-04B(E).
Subject:	Statistics.
Electronic resource:	click for full text (PQDT)
ISBN:	9780355460612

Thesis (Ph.D.)--Boston University, 2017.

Includes bibliographical references

A fundamental goal of statisticians is to make inferences from a sample about characteristics of the underlying population. This is an inverse problem, since we are trying to recover a feature of the input from observations of an output. Towards this end, we consider complexity penalized methods, because they balance goodness of fit and generalizability of the solution. The data from the underlying population may come in diverse formats, structured or unstructured, such as probability distributions, text tokens, or graph characteristics. Depending on the defining features of the problem, we can choose the appropriate complexity penalized approach and assess the quality of the estimate it produces. Favorable characteristics are strong theoretical guarantees of closeness to the true value and interpretability. Our work fits within this framework and spans the areas of simulation optimization, text mining, and network inference. The first problem we consider is model calibration under the assumption that, given a hypothesized input model, we can use stochastic simulation to obtain its corresponding output observations. We formulate it as a stochastic program by maximizing the entropy of the input distribution subject to moment matching. We then propose an iterative scheme via simulation to approximately solve it. We prove convergence of the proposed algorithm under appropriate conditions and demonstrate its performance via numerical studies. The second problem we consider is summarizing text documents through an inferred set of topics. We propose a frequentist reformulation of a Bayesian regularization scheme. Through our complexity-penalized perspective we lend further insight into the nature of the loss function and the regularization achieved through the priors in the Bayesian formulation. The third problem is concerned with the impact of sampling on the degree distribution of a network. Under many sampling designs, we have a linear inverse problem characterized by an ill-conditioned matrix. We investigate the theoretical properties of an approximate solution for the degree distribution found by regularizing the solution of the ill-conditioned least squares objective. In particular, we study the rate at which the penalized solution tends to the true value as a function of network size and sampling rate.
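
The abstract's third problem describes regularizing an ill-conditioned least squares objective. As a rough illustration only, the sketch below uses a generic ridge (Tikhonov) penalty, which is an assumption and not necessarily the penalty studied in the thesis. The names `solve`, `ridge`, `p`, `y`, and `lam` are hypothetical, and the small matrix in the example is made up for demonstration.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copy rows so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge(p, y, lam):
    """Minimize ||p x - y||^2 + lam * ||x||^2 via (p^T p + lam I) x = p^T y."""
    rows, cols = len(p), len(p[0])
    gram = [
        [
            sum(p[k][i] * p[k][j] for k in range(rows)) + (lam if i == j else 0.0)
            for j in range(cols)
        ]
        for i in range(cols)
    ]
    rhs = [sum(p[k][i] * y[k] for k in range(rows)) for i in range(cols)]
    return solve(gram, rhs)


# Illustrative, made-up example: nearly collinear columns make p ill-conditioned.
p_example = [[1.0, 1.0], [1.0, 1.0001], [1.0, 0.9999]]
y_example = [2.0, 2.0001, 1.9999]
x_hat = ridge(p_example, y_example, lam=1e-3)
```

The penalty weight `lam` trades fidelity to the observations against stability of the solution; how it should scale with network size and sampling rate is the kind of question the abstract says the thesis analyzes, and this sketch does not address it.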

Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018

Mode of access: World Wide Web

ISBN: 9780355460612

Subjects--Topical Terms:
Statistics. (556824)

Index Terms--Genre/Form:
Electronic books. (554714)

Complexity Penalized Methods for Structured and Unstructured Data.
LDR:03612ntm a2200349Ki 4500
001 920618
005 20181203094030.5
006 m o u
007 cr mn||||a|a||
008 190606s2017 xx obm 000 0 eng d
020 $a 9780355460612
035 $a (MiAaPQ)AAI10268226
035 $a (MiAaPQ)bu:12954
035 $a AAI10268226
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Goeva, Aleksandrina Valerieva. $3 1195473
245 1 0 $a Complexity Penalized Methods for Structured and Unstructured Data.
264 0 $c 2017
300 $a 1 online resource (136 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertation Abstracts International, Volume: 79-04(E), Section: B.
500 $a Advisers: Eric D. Kolaczyk; Henry Lam.
502 $a Thesis (Ph.D.)--Boston University, 2017.
504 $a Includes bibliographical references
520 $a A fundamental goal of statisticians is to make inferences from a sample about characteristics of the underlying population. This is an inverse problem, since we are trying to recover a feature of the input from observations of an output. Towards this end, we consider complexity penalized methods, because they balance goodness of fit and generalizability of the solution. The data from the underlying population may come in diverse formats, structured or unstructured, such as probability distributions, text tokens, or graph characteristics. Depending on the defining features of the problem, we can choose the appropriate complexity penalized approach and assess the quality of the estimate it produces. Favorable characteristics are strong theoretical guarantees of closeness to the true value and interpretability. Our work fits within this framework and spans the areas of simulation optimization, text mining, and network inference. The first problem we consider is model calibration under the assumption that, given a hypothesized input model, we can use stochastic simulation to obtain its corresponding output observations. We formulate it as a stochastic program by maximizing the entropy of the input distribution subject to moment matching. We then propose an iterative scheme via simulation to approximately solve it. We prove convergence of the proposed algorithm under appropriate conditions and demonstrate its performance via numerical studies. The second problem we consider is summarizing text documents through an inferred set of topics. We propose a frequentist reformulation of a Bayesian regularization scheme. Through our complexity-penalized perspective we lend further insight into the nature of the loss function and the regularization achieved through the priors in the Bayesian formulation. The third problem is concerned with the impact of sampling on the degree distribution of a network. Under many sampling designs, we have a linear inverse problem characterized by an ill-conditioned matrix. We investigate the theoretical properties of an approximate solution for the degree distribution found by regularizing the solution of the ill-conditioned least squares objective. In particular, we study the rate at which the penalized solution tends to the true value as a function of network size and sampling rate.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Statistics. $3 556824
650 4 $a Mathematics. $3 527692
650 4 $a Applied mathematics. $3 1069907
655 7 $a Electronic books. $2 local $3 554714
690 $a 0463
690 $a 0405
690 $a 0364
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a Boston University. $b Mathematics and Statistics. $3 1195474
773 0 $t Dissertation Abstracts International $g 79-04B(E).
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10268226 $z click for full text (PQDT)