National Formosa University (國立虎尾科技大學)

Fraud Detection Using Optimized Machine Learning Tools Under Imbalance Classes.

Record type:	Bibliographic, language material, manuscript: Monograph/item
Title/Author:	Fraud Detection Using Optimized Machine Learning Tools Under Imbalance Classes.
Author:	Isangediok, Mary.
Physical description:	1 online resource (56 pages)
Note:	Source: Masters Abstracts International, Volume: 85-03.
Contained By:	Masters Abstracts International, 85-03.
Subject:	Mathematics.
Electronic resource:	click for full text (PQDT)
ISBN:	9798380340427

Thesis (M.Sc.)--Texas A&M University - Corpus Christi, 2023.

Includes bibliographical references

Fraud detection is challenging because fraud patterns change over time and few fraud examples are available from which to learn such sophisticated patterns. Fraud detection with the aid of optimized machine learning (ML) tools is therefore essential to assure safety. Fraud detection is primarily an ML classification task; however, the performance of the corresponding ML tool depends on using the best hyperparameter values. Moreover, classification under imbalanced classes is difficult because most ML classification techniques neglect the minority class and therefore perform poorly on it. Thus, we investigate four ML techniques that are suitable for handling imbalanced classes, namely logistic regression, decision trees, random forest, and extreme gradient boost, with the aim of maximizing recall and thereby reducing false negatives. First, these classifiers are trained on two original, imbalanced benchmark fraud detection datasets: phishing website URLs and fraudulent credit card transactions. Then, three synthetically balanced datasets are produced from each original dataset by applying three sampling frameworks: random under sampler, synthetic minority oversampling technique (SMOTE), and SMOTE edited nearest neighbor (SMOTEENN). The optimum hyperparameters for all 16 experiments are found using RandomizedSearchCV. The 16 approaches are compared in the context of fraud detection using two benchmark performance metrics: area under the receiver operating characteristic curve (AUC ROC) and area under the precision-recall curve (AUC PR). For both datasets (phishing website URLs and credit card fraud transactions), the results indicate that extreme gradient boost trained on the original data performs reliably on the imbalanced dataset and outperforms the other three methods in terms of both AUC ROC and AUC PR.
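As an illustration only, and not code from the thesis, the sketch below shows in plain Python what random undersampling and the two evaluation metrics named in the abstract compute. The function names, the toy labels, and the toy scores are all hypothetical. The AUC PR estimate used here is average precision, one common step-wise summary of the precision-recall curve, and it assumes distinct scores. In practice one would more likely use library implementations such as scikit-learn's `roc_auc_score` and `average_precision_score` and imbalanced-learn's `RandomUnderSampler`, `SMOTE`, and `SMOTEENN`.

```python
import random


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal counts."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def roc_auc(y_true, scores):
    """AUC ROC: probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Average precision: mean of the precision at each true positive,
    with examples ranked by descending score (assumes distinct scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical toy data: 8 legitimate (0) and 2 fraudulent (1) examples.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 0.8, 0.95]
X = list(range(len(y)))  # stand-in feature rows

X_bal, y_bal = random_undersample(X, y)
auc_roc = roc_auc(y, scores)
auc_pr = average_precision(y, scores)
print(len(y_bal), sum(y_bal), auc_roc, auc_pr)
```

On this toy data a single legitimate example scored above one fraud case lowers AUC ROC to 0.9375 but average precision to about 0.83, which hints at why a precision-recall summary is often reported alongside AUC ROC when the positive class is rare.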

Electronic reproduction. Ann Arbor, Mich.: ProQuest, 2024.

Mode of access: World Wide Web

ISBN: 9798380340427

Subjects--Topical Terms:
Mathematics. (527692)

Subjects--Index Terms:
Class imbalance

Index Terms--Genre/Form:
Electronic books. (554714)

Fraud Detection Using Optimized Machine Learning Tools Under Imbalance Classes.
LDR:03379ntm a22004097 4500
001 1152123
005 20241122094129.5
006 m o d
007 cr mn ---uuuuu
008 250605s2023 xx obm 000 0 eng d
020 $a 9798380340427
035 $a (MiAaPQ)AAI30576083
035 $a AAI30576083
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Isangediok, Mary. $3 1479013
245 1 0 $a Fraud Detection Using Optimized Machine Learning Tools Under Imbalance Classes.
264 0 $c 2023
300 $a 1 online resource (56 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Masters Abstracts International, Volume: 85-03.
500 $a Advisor: Gajamannage, K. H.
502 $a Thesis (M.Sc.)--Texas A&M University - Corpus Christi, 2023.
504 $a Includes bibliographical references
520 $a Fraud detection is challenging because fraud patterns change over time and few fraud examples are available from which to learn such sophisticated patterns. Fraud detection with the aid of optimized machine learning (ML) tools is therefore essential to assure safety. Fraud detection is primarily an ML classification task; however, the performance of the corresponding ML tool depends on using the best hyperparameter values. Moreover, classification under imbalanced classes is difficult because most ML classification techniques neglect the minority class and therefore perform poorly on it. Thus, we investigate four ML techniques that are suitable for handling imbalanced classes, namely logistic regression, decision trees, random forest, and extreme gradient boost, with the aim of maximizing recall and thereby reducing false negatives. First, these classifiers are trained on two original, imbalanced benchmark fraud detection datasets: phishing website URLs and fraudulent credit card transactions. Then, three synthetically balanced datasets are produced from each original dataset by applying three sampling frameworks: random under sampler, synthetic minority oversampling technique (SMOTE), and SMOTE edited nearest neighbor (SMOTEENN). The optimum hyperparameters for all 16 experiments are found using RandomizedSearchCV. The 16 approaches are compared in the context of fraud detection using two benchmark performance metrics: area under the receiver operating characteristic curve (AUC ROC) and area under the precision-recall curve (AUC PR). For both datasets (phishing website URLs and credit card fraud transactions), the results indicate that extreme gradient boost trained on the original data performs reliably on the imbalanced dataset and outperforms the other three methods in terms of both AUC ROC and AUC PR.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2024
538 $a Mode of access: World Wide Web
650 4 $a Mathematics. $3 527692
650 4 $a Computer science. $3 573171
650 4 $a Information technology. $3 559429
653 $a Class imbalance
653 $a Credit card
653 $a Cyber crime
653 $a Fraud detection
653 $a Machine learning
653 $a Phishing website URLs
655 7 $a Electronic books. $2 local $3 554714
690 $a 0405
690 $a 0984
690 $a 0489
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a Texas A&M University - Corpus Christi. $b Mathematics. $3 1186088
773 0 $t Masters Abstracts International $g 85-03.
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30576083 $z click for full text (PQDT)