Machine Learning for Detecting Trends and Topics From Research Papers and Proceedings.
Record type: Bibliographic - language material, manuscript : Monograph/item
Title proper/Author: Machine Learning for Detecting Trends and Topics From Research Papers and Proceedings.
Author: Dixon, Jose.
Extent: 1 online resource (112 pages)
Notes: Source: Masters Abstracts International, Volume: 84-11.
Contained By: Masters Abstracts International, 84-11.
Subject: Information science.
Electronic resource: full text (PQDT), http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30313079
ISBN: 9798379533373
Thesis (M.S.)--Morgan State University, 2023.
Includes bibliographical references
1,000 Portable Document Format (PDF) files from the World Health Organization COVID-19 Research Downloadable Articles and PubMed Central databases are divided into five labels for positive and negative papers. The PDF files are converted into unstructured raw text files. Tokenization and lemmatization are done using the Natural Language Toolkit (NLTK) library after removing punctuation. Training size and subsampling are varied experimentally to determine their effect on the performance measures. Supervised learning classification is performed using the Scikit-learn library and the following classifiers: Random Forest, Naive Bayes, Decision Tree, XGBoost, and Logistic Regression. Imbalanced sampling techniques are implemented using the Imbalanced-learn library to address the uneven distribution of positive and negative samples: Synthetic Minority Oversampling Technique (SMOTE), Random Oversampling (ROS), Random Undersampling, TomekLinks, and NearMiss. R and the tidyverse are used to conduct statistical and exploratory data analysis on the performance metrics. The machine learning classifiers achieve an average precision score of 78% and a recall score of 77%, while the sampling techniques achieve higher average precision and recall scores of 80% and 81%, respectively. Correcting for imbalanced sampling yielded significant p-values for NearMiss, ROS, and SMOTE on precision and recall scores. This work shows that training size variation, subsampling, and imbalanced sampling techniques combined with machine learning algorithms can improve precision, recall, accuracy, and area under the curve scores, including the analysis of variance.
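The abstract names Random Oversampling and Random Undersampling among its resampling techniques and reports precision and recall. The sketch below illustrates only those ideas, in pure Python, on made-up toy data. It is not the thesis code and does not use the Imbalanced-learn or Scikit-learn APIs. The function names `random_oversample`, `random_undersample`, and `precision_recall`, the `seed` parameter, and the `paper_N` identifiers are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def _group_by_label(samples, labels):
    """Map each class label to the list of samples carrying it."""
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)
    return groups


def random_oversample(samples, labels, seed=0):
    """Keep every sample and add randomly drawn duplicates of the smaller
    classes until each class matches the size of the largest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = max(len(items) for items in groups.values())
    out_samples, out_labels = [], []
    for label, items in groups.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = min(len(items) for items in groups.values())
    out_samples, out_labels = [], []
    for label, items in groups.items():
        out_samples.extend(rng.sample(items, target))
        out_labels.extend([label] * target)
    return out_samples, out_labels


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, made-up data: 2 positive and 8 negative "papers".
docs = [f"paper_{i}" for i in range(10)]
labels = [1] * 2 + [0] * 8

_, over_labels = random_oversample(docs, labels)
_, under_labels = random_undersample(docs, labels)
print(sorted(Counter(over_labels).items()))   # [(0, 8), (1, 8)]
print(sorted(Counter(under_labels).items()))  # [(0, 2), (1, 2)]
print(precision_recall([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))
```

Oversampling keeps all original samples and balances the classes by duplication, while undersampling balances them by discarding majority-class samples. In practice, resampling would normally be applied to the training split only, so that duplicated samples do not leak into the evaluation data.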
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2024
Mode of access: World Wide Web
ISBN: 9798379533373
Subjects--Topical Terms: Information science. (561178)
Subjects--Index Terms: Imbalanced sampling
Index Terms--Genre/Form: Electronic books. (554714)
LDR 03123ntm a22004097 4500
001 1142565
005 20240422071019.5
006 m o d
007 cr mn ---uuuuu
008 250605s2023 xx obm 000 0 eng d
020 $a 9798379533373
035 $a (MiAaPQ)AAI30313079
035 $a AAI30313079
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Dixon, Jose. $3 1466941
245 1 0 $a Machine Learning for Detecting Trends and Topics From Research Papers and Proceedings.
264 0 $c 2023
300 $a 1 online resource (112 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Masters Abstracts International, Volume: 84-11.
500 $a Includes supplementary digital materials.
500 $a Advisor: Rahman, Md Mahmudur.
502 $a Thesis (M.S.)--Morgan State University, 2023.
504 $a Includes bibliographical references
520 $a 1,000 Portable Document Format (PDF) files from the World Health Organization COVID-19 Research Downloadable Articles and PubMed Central databases are divided into five labels for positive and negative papers. The PDF files are converted into unstructured raw text files. Tokenization and lemmatization are done using the Natural Language Toolkit (NLTK) library after removing punctuation. Training size and subsampling are varied experimentally to determine their effect on the performance measures. Supervised learning classification is performed using the Scikit-learn library and the following classifiers: Random Forest, Naive Bayes, Decision Tree, XGBoost, and Logistic Regression. Imbalanced sampling techniques are implemented using the Imbalanced-learn library to address the uneven distribution of positive and negative samples: Synthetic Minority Oversampling Technique (SMOTE), Random Oversampling (ROS), Random Undersampling, TomekLinks, and NearMiss. R and the tidyverse are used to conduct statistical and exploratory data analysis on the performance metrics. The machine learning classifiers achieve an average precision score of 78% and a recall score of 77%, while the sampling techniques achieve higher average precision and recall scores of 80% and 81%, respectively. Correcting for imbalanced sampling yielded significant p-values for NearMiss, ROS, and SMOTE on precision and recall scores. This work shows that training size variation, subsampling, and imbalanced sampling techniques combined with machine learning algorithms can improve precision, recall, accuracy, and area under the curve scores, including the analysis of variance.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2024
538 $a Mode of access: World Wide Web
650 4 $a Information science. $3 561178
650 4 $a Computer science. $3 573171
653 $a Imbalanced sampling
653 $a Machine learning
653 $a Statistical analysis
653 $a Subsampling
653 $a Text classification
653 $a Text retrieval
655 7 $a Electronic books. $2 local $3 554714
690 $a 0984
690 $a 0723
710 2 $a Morgan State University. $b Computer Science and Bioinformatics Program. $3 1466942
710 2 $a ProQuest Information and Learning Co. $3 1178819
773 0 $t Masters Abstracts International $g 84-11.
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30313079 $z click for full text (PQDT)