National Formosa University (國立虎尾科技大學)

The Role of Model Implementation in Neuroscientific Applications of Machine Learning.

Record type:	Bibliographic - language material, manuscript : Monograph/item
Title proper/Author:	The Role of Model Implementation in Neuroscientific Applications of Machine Learning./
Author:	Abe, Taiga.
Extent:	1 online resource (233 pages)
Note:	Source: Dissertations Abstracts International, Volume: 85-07, Section: B.
Contained By:	Dissertations Abstracts International 85-07B.
Subject:	Neurosciences.
Electronic resource:	click for full text (PQDT)
ISBN:	9798381276411

Abe, Taiga.

The Role of Model Implementation in Neuroscientific Applications of Machine Learning. - 1 online resource (233 pages)

Source: Dissertations Abstracts International, Volume: 85-07, Section: B.

Thesis (Ph.D.)--Columbia University, 2024.

Includes bibliographical references

In modern neuroscience, large-scale machine learning models are becoming increasingly critical components of data analysis. Despite the accelerating adoption of these large-scale machine learning tools, there are fundamental challenges to their use in scientific applications that remain largely unaddressed. In this thesis, I focus on one such challenge: variability in the predictions of large-scale machine learning models arising from seemingly trivial differences in their implementation. Existing research has shown that the performance of large-scale machine learning models (more so than that of traditional models such as linear regression) is meaningfully entangled with design choices such as the hardware components, operating system, software dependencies, and random seed on which the model depends. Within the bounds of current practice, there are few ways of controlling this kind of implementation variability across the broad community of neuroscience researchers (making data analysis less reproducible), and little understanding of how data analyses might be designed to mitigate these issues (making data analysis unreliable). This dissertation presents two broad research directions that address these shortcomings. First, I describe a novel cloud-based platform for sharing data analysis tools reproducibly and at scale. This platform, called NeuroCAAS, enables developers of novel data analyses to precisely specify an implementation of their entire data analysis, which any other user can then run automatically on custom-built cloud resources. I show that this approach efficiently supports a wide variety of existing data analysis tools, as well as novel tools that would not be feasible to build and share outside of a platform like NeuroCAAS. Second, I conduct two large-scale studies on the behavior of deep ensembles.
Deep ensembles are a class of machine learning models that use implementation variability to improve the quality of model predictions, specifically by aggregating the predictions of deep networks across stochastic initialization and training. Deep ensembles simultaneously provide a way to control the impact of implementation variability (by aggregating predictions across random seeds) and a way to understand what kind of predictive diversity this particular form of implementation variability generates. I present a number of surprising results that contradict widely held intuitions about the performance of deep ensembles and the mechanisms behind their success, and show that in many respects the behavior of a deep ensemble is similar to that of an appropriately chosen single neural network. As a whole, this dissertation presents novel methods and insights focused on the role of implementation variability in large-scale machine learning models, and more generally on the challenges of working with such large models in neuroscience data analysis. I conclude by discussing other ongoing efforts to improve the reproducibility and accessibility of large-scale machine learning in neuroscience, as well as long-term goals to speed the adoption and improve the reliability of such methods in a scientific context.

Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2024.

Mode of access: World Wide Web

ISBN: 9798381276411

Subjects--Topical Terms:
Neurosciences.

Subjects--Index Terms:
Data analysis

Index Terms--Genre/Form:
Electronic books.

LDR 04652ntm a22004337 4500
001 1148784
005 20240930100147.5
006 m o d
007 cr bn ---uuuuu
008 250605s2024 xx obm 000 0 eng d
020 $a 9798381276411
035 $a (MiAaPQ)AAI30818660
035 $a AAI30818660
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Abe, Taiga. $3 1474841
245 1 4 $a The Role of Model Implementation in Neuroscientific Applications of Machine Learning.
264 0 $c 2024
300 $a 1 online resource (233 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertations Abstracts International, Volume: 85-07, Section: B.
500 $a Advisor: Cunningham, John P.
502 $a Thesis (Ph.D.)--Columbia University, 2024.
504 $a Includes bibliographical references
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2024
538 $a Mode of access: World Wide Web
650 4 $a Neurosciences. $3 593561
650 4 $a Computer science. $3 573171
650 4 $a Bioinformatics. $3 583857
653 $a Data analysis
653 $a Ensembles
653 $a Infrastructure
653 $a Machine learning
653 $a Reproducibility
653 $a Robustness
653 $a Accessibility
655 7 $a Electronic books. $2 local $3 554714
690 $a 0317
690 $a 0984
690 $a 0800
690 $a 0715
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a Columbia University. $b Neurobiology and Behavior. $3 1186904
773 0 $t Dissertations Abstracts International $g 85-07B.
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30818660 $z click for full text (PQDT)