Essays in Econometrics and Machine Learning.
Record type:
Bibliographic - Language material, manuscript : Monograph/item
Title/Author:
Essays in Econometrics and Machine Learning. /
Author:
Yao, Qingsong.
Physical description:
1 online resource (200 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 85-10, Section: B.
Contained By:
Dissertations Abstracts International, 85-10B.
Subject:
Computer science.
Electronic resource:
click for full text (PQDT)
ISBN:
9798382191386
Essays in Econometrics and Machine Learning.
Yao, Qingsong.
Essays in Econometrics and Machine Learning.
- 1 online resource (200 pages)
Source: Dissertations Abstracts International, Volume: 85-10, Section: B.
Thesis (Ph.D.)--Boston College, 2024.
Includes bibliographical references.
This dissertation consists of three chapters demonstrating how current econometric problems can be solved using machine learning techniques.

In the first chapter, I propose new approaches to estimating large-dimensional monotone index models. This class of models has been popular in the applied and theoretical econometrics literatures, as it includes discrete choice, nonparametric transformation, and duration models. A main advantage of my approach is computational. For instance, rank estimation procedures such as those proposed in Han (1987) and Cavanagh and Sherman (1998) optimize a nonsmooth, nonconvex objective function and are difficult to use with more than a few regressors, which limits their use with economic data sets. For such monotone index models with increasing dimension, I propose a new class of estimators based on batched gradient descent (BGD), involving nonparametric methods such as kernel or sieve estimation, and study their asymptotic properties. The BGD algorithm uses an iterative procedure whose key step exploits a strictly convex objective function, resulting in computational advantages. A further contribution of my approach is that the model is large-dimensional and semiparametric, and so does not require parametric distributional assumptions.

The second chapter studies the estimation of semiparametric monotone index models when the sample size n is extremely large and conventional approaches fail due to prohibitive computational burdens. Motivated by the mini-batch gradient descent (MBGD) algorithm widely used as a stochastic optimization tool in machine learning, this chapter proposes a novel subsample- and iteration-based estimation procedure. In particular, starting from any initial guess of the true parameter, the estimator is progressively updated using a sequence of subsamples randomly drawn from the data set, each of sample size much smaller than n. The update is based on the gradient of a well-chosen loss function, where the nonparametric component of the model is replaced with its Nadaraya-Watson kernel estimator, also constructed from the random subsamples. The proposed algorithm essentially generalizes the MBGD algorithm to the semiparametric setup. Since the new method uses only a subsample to perform the Nadaraya-Watson kernel estimation and conduct the update, it reduces the computational time of the full-sample-based iterative method by roughly n times when the subsample size and the kernel function are chosen properly, and so can easily be applied when the sample size n is large. Moreover, this chapter shows that if averages are further taken across the estimators produced during the iterations, the difference between the averaged estimator and the full-sample-based estimator is 1/√n-trivial. Consequently, the averaged estimator is 1/√n-consistent and asymptotically normally distributed. In other words, the new estimator substantially improves computational speed while maintaining estimation accuracy. Extensive Monte Carlo experiments and real data analysis illustrate the excellent computational efficiency of the novel algorithm when the sample size is extremely large.

The third chapter studies a robust inference procedure for treatment effects in panel data with flexible relationships across units via the random forest method. The key contribution of this chapter is twofold. First, it proposes a direct construction of prediction intervals for the treatment effect, exploiting the information in the joint distribution of the cross-sectional units to construct counterfactuals using random forests. In particular, it proposes a Quantile Control Method (QCM) based on the Quantile Random Forest (QRF) to accommodate flexible cross-sectional structure as well as high dimensionality. Second, it establishes the asymptotic consistency of the QRF under the panel/time-series setup with high dimensionality, which is of theoretical interest in its own right. In addition, Monte Carlo simulations show that prediction intervals via the QCM have excellent coverage probability for the treatment effects compared to existing methods in the literature, and are robust to heteroskedasticity, autocorrelation, and various types of model misspecification. Finally, an empirical application studying the effect of economic integration between Hong Kong and mainland China on Hong Kong's economy highlights the potential of the proposed method.
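The second chapter's subsample-and-iterate idea can be sketched on a toy single-index model y = F(x1 + b·x2) + u with unknown monotone link F: at each step a random subsample replaces F with its Nadaraya-Watson kernel estimate, a gradient step updates b, and the post-burn-in iterates are averaged. This is only an illustration under assumed choices (tanh link, Gaussian kernel, bandwidth, step size, a numerical gradient, and the normalization fixing the first coefficient at 1), not the dissertation's actual specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-index data y = F(x1 + b*x2) + u with unknown monotone link F.
# (Illustrative DGP; scale normalization fixes the first coefficient at 1.)
n, b_true = 2000, 0.7
X = rng.normal(size=(n, 2))
y = np.tanh(X[:, 0] + b_true * X[:, 1]) + 0.1 * rng.normal(size=n)

def nw_fit(index, values, h=0.3):
    """Nadaraya-Watson kernel regression of values on index, at the index points."""
    w = np.exp(-0.5 * ((index[:, None] - index[None, :]) / h) ** 2)
    return (w @ values) / w.sum(axis=1)

def subsample_loss(b, Xs, ys):
    """Squared loss with the unknown link replaced by its kernel estimate."""
    fhat = nw_fit(Xs[:, 0] + b * Xs[:, 1], ys)
    return np.mean((ys - fhat) ** 2)

def mbgd_nw(X, y, m=200, steps=200, lr=0.5, burn=50):
    """Mini-batch updates on random subsamples of size m << n; average iterates."""
    b, iterates = 0.0, []
    for t in range(steps):
        idx = rng.choice(len(y), size=m, replace=False)
        Xs, ys = X[idx], y[idx]
        eps = 1e-3  # central finite difference for the subsample gradient
        g = (subsample_loss(b + eps, Xs, ys)
             - subsample_loss(b - eps, Xs, ys)) / (2 * eps)
        b -= lr * g
        if t >= burn:
            iterates.append(b)
    return np.mean(iterates)  # averaging stabilizes the noisy subsample updates

b_hat = mbgd_nw(X, y)
```

Each update touches only m = 200 of the n = 2000 observations, which is where the computational saving described in the abstract comes from; the final averaging mimics the chapter's averaged estimator.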
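The third chapter's Quantile Control Method can likewise be illustrated with a stripped-down counterfactual exercise. Here ordinary least squares plus empirical residual quantiles stand in for the chapter's quantile random forest, and the panel dimensions, control weights, noise level, and effect size are all made-up illustration values: fit the treated unit on control units over pre-treatment periods, then turn residual quantiles into a prediction interval for the post-treatment counterfactual and hence for the treatment effect.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy panel: one treated unit, 5 control units, T0 pre- and T1 post-treatment periods.
T0, T1, k = 80, 20, 5
controls = rng.normal(size=(T0 + T1, k))
w_true = np.array([0.5, 0.3, -0.2, 0.1, 0.4])
treated = controls @ w_true + 0.3 * rng.normal(size=T0 + T1)
effect_true = 2.0
treated[T0:] += effect_true          # treatment shifts the outcome after T0

# Fit the counterfactual relation on pre-treatment data
# (OLS stand-in for the chapter's quantile random forest).
Xpre = np.column_stack([np.ones(T0), controls[:T0]])
coef, *_ = np.linalg.lstsq(Xpre, treated[:T0], rcond=None)
resid = treated[:T0] - Xpre @ coef
lo, hi = np.quantile(resid, [0.025, 0.975])   # empirical 95% residual band

# Post-treatment counterfactual prediction and effect interval.
Xpost = np.column_stack([np.ones(T1), controls[T0:]])
cf = Xpost @ coef                     # predicted untreated outcome
eff_point = treated[T0:] - cf         # period-by-period effect estimate
eff_lo = treated[T0:] - (cf + hi)     # lower end of the effect interval
eff_hi = treated[T0:] - (cf + lo)     # upper end of the effect interval
```

The interval construction mirrors the QCM logic: quantiles of the counterfactual's conditional distribution translate directly into a prediction interval for the treatment effect; the quantile random forest in the chapter supplies those quantiles without the linearity assumption used here.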
Electronic reproduction.
Ann Arbor, Mich. :
ProQuest,
2024
Mode of access: World Wide Web
ISBN: 9798382191386
Subjects--Topical Terms:
Computer science.
Subjects--Index Terms:
Batched gradient descent
Genre/Form:
Electronic books.
Essays in Econometrics and Machine Learning.
LDR
:06001ntm a22003977 4500
001
1147566
005
20240909103814.5
006
m o d
007
cr bn ---uuuuu
008
250605s2024 xx obm 000 0 eng d
020
$a
9798382191386
035
$a
(MiAaPQ)AAI31143259
035
$a
AAI31143259
040
$a
MiAaPQ
$b
eng
$c
MiAaPQ
$d
NTU
100
1
$a
Yao, Qingsong.
$3
1473323
245
1 0
$a
Essays in Econometrics and Machine Learning.
264
0
$c
2024
300
$a
1 online resource (200 pages)
336
$a
text
$b
txt
$2
rdacontent
337
$a
computer
$b
c
$2
rdamedia
338
$a
online resource
$b
cr
$2
rdacarrier
500
$a
Source: Dissertations Abstracts International, Volume: 85-10, Section: B.
500
$a
Advisor: Khan, Shakeeb; Xiao, Zhijie.
502
$a
Thesis (Ph.D.)--Boston College, 2024.
504
$a
Includes bibliographical references.
520
$a
This dissertation consists of three chapters demonstrating how current econometric problems can be solved using machine learning techniques.

In the first chapter, I propose new approaches to estimating large-dimensional monotone index models. This class of models has been popular in the applied and theoretical econometrics literatures, as it includes discrete choice, nonparametric transformation, and duration models. A main advantage of my approach is computational. For instance, rank estimation procedures such as those proposed in Han (1987) and Cavanagh and Sherman (1998) optimize a nonsmooth, nonconvex objective function and are difficult to use with more than a few regressors, which limits their use with economic data sets. For such monotone index models with increasing dimension, I propose a new class of estimators based on batched gradient descent (BGD), involving nonparametric methods such as kernel or sieve estimation, and study their asymptotic properties. The BGD algorithm uses an iterative procedure whose key step exploits a strictly convex objective function, resulting in computational advantages. A further contribution of my approach is that the model is large-dimensional and semiparametric, and so does not require parametric distributional assumptions.

The second chapter studies the estimation of semiparametric monotone index models when the sample size n is extremely large and conventional approaches fail due to prohibitive computational burdens. Motivated by the mini-batch gradient descent (MBGD) algorithm widely used as a stochastic optimization tool in machine learning, this chapter proposes a novel subsample- and iteration-based estimation procedure. In particular, starting from any initial guess of the true parameter, the estimator is progressively updated using a sequence of subsamples randomly drawn from the data set, each of sample size much smaller than n. The update is based on the gradient of a well-chosen loss function, where the nonparametric component of the model is replaced with its Nadaraya-Watson kernel estimator, also constructed from the random subsamples. The proposed algorithm essentially generalizes the MBGD algorithm to the semiparametric setup. Since the new method uses only a subsample to perform the Nadaraya-Watson kernel estimation and conduct the update, it reduces the computational time of the full-sample-based iterative method by roughly n times when the subsample size and the kernel function are chosen properly, and so can easily be applied when the sample size n is large. Moreover, this chapter shows that if averages are further taken across the estimators produced during the iterations, the difference between the averaged estimator and the full-sample-based estimator is 1/√n-trivial. Consequently, the averaged estimator is 1/√n-consistent and asymptotically normally distributed. In other words, the new estimator substantially improves computational speed while maintaining estimation accuracy. Extensive Monte Carlo experiments and real data analysis illustrate the excellent computational efficiency of the novel algorithm when the sample size is extremely large.

The third chapter studies a robust inference procedure for treatment effects in panel data with flexible relationships across units via the random forest method. The key contribution of this chapter is twofold. First, it proposes a direct construction of prediction intervals for the treatment effect, exploiting the information in the joint distribution of the cross-sectional units to construct counterfactuals using random forests. In particular, it proposes a Quantile Control Method (QCM) based on the Quantile Random Forest (QRF) to accommodate flexible cross-sectional structure as well as high dimensionality. Second, it establishes the asymptotic consistency of the QRF under the panel/time-series setup with high dimensionality, which is of theoretical interest in its own right. In addition, Monte Carlo simulations show that prediction intervals via the QCM have excellent coverage probability for the treatment effects compared to existing methods in the literature, and are robust to heteroskedasticity, autocorrelation, and various types of model misspecification. Finally, an empirical application studying the effect of economic integration between Hong Kong and mainland China on Hong Kong's economy highlights the potential of the proposed method.
533
$a
Electronic reproduction.
$b
Ann Arbor, Mich. :
$c
ProQuest,
$d
2024
538
$a
Mode of access: World Wide Web
650
4
$a
Computer science.
$3
573171
653
$a
Batched gradient descent
653
$a
Machine learning
653
$a
Nadaraya-Watson kernel estimator
653
$a
Quantile Control Method
653
$a
Quantile Random Forest
655
7
$a
Electronic books.
$2
local
$3
554714
690
$a
0501
690
$a
0984
690
$a
0800
710
2
$a
ProQuest Information and Learning Co.
$3
1178819
710
2
$a
Boston College.
$b
GSAS - Economics.
$3
1473324
773
0
$t
Dissertations Abstracts International
$g
85-10B.
856
4 0
$u
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31143259
$z
click for full text (PQDT)