Thomas, Tara Elizabeth.
Application of Machine Learning in Improving System Reliability and Performance.
Record type:
Bibliographic - language material, manuscript : Monograph/item
Title proper/Author:
Application of Machine Learning in Improving System Reliability and Performance./
Author:
Thomas, Tara Elizabeth.
Extent:
1 online resource (68 pages)
Notes:
Source: Masters Abstracts International, Volume: 57-01.
Contained By:
Masters Abstracts International 57-01(E).
Subject:
Computer engineering.
Electronic resource:
click for full text (PQDT)
ISBN:
9780355306309
Thomas, Tara Elizabeth.
Application of Machine Learning in Improving System Reliability and Performance.
- 1 online resource (68 pages)
Source: Masters Abstracts International, Volume: 57-01.
Thesis (M.S.E.C.E.)
Includes bibliographical references
Improving reliability and performance is of utmost importance for any system. This thesis presents two machine-learning-based techniques: one improves the reliability of parallel programs by detecting silent data corruption, and the other improves the performance of factory operations by optimizing the scheduling algorithm and detecting bottlenecks.
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018
Mode of access: World Wide Web
ISBN: 9780355306309
Subjects--Topical Terms:
569006
Computer engineering.
Index Terms--Genre/Form:
554714
Electronic books.
LDR 04821ntm a2200361Ki 4500
001 910811
005 20180517112611.5
006 m o u
007 cr mn||||a|a||
008 190606s2017 xx obm 000 0 eng d
020 $a 9780355306309
035 $a (MiAaPQ)AAI10616320
035 $a (MiAaPQ)purdue:21933
035 $a AAI10616320
040 $a MiAaPQ $b eng $c MiAaPQ
099 $a TUL $f hyy $c available through World Wide Web
100 1 $a Thomas, Tara Elizabeth. $3 1182278
245 1 0 $a Application of Machine Learning in Improving System Reliability and Performance.
264 0 $c 2017
300 $a 1 online resource (68 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Masters Abstracts International, Volume: 57-01.
500 $a Adviser: Saurabh Bagchi.
502 $a Thesis (M.S.E.C.E.) $c Purdue University $d 2017.
504 $a Includes bibliographical references
520 $a Improving reliability and performance is of utmost importance for any system. This thesis presents two machine-learning-based techniques: one improves the reliability of parallel programs by detecting silent data corruption, and the other improves the performance of factory operations by optimizing the scheduling algorithm and detecting bottlenecks.
520 $a The size and complexity of supercomputing clusters are rapidly increasing to cater to the needs of complex scientific applications. At the same time, the feature size and operating voltage level of the internal components are decreasing. This dual trend makes these machines extremely vulnerable to soft errors, or random bit flips. For complex parallel applications, these soft errors can lead to silent data corruption, which could lead to large inaccuracies in the final computational results. Hence, it is important to determine the presence and severity of such errors early on, so that proper countermeasures can be taken. In this paper, we introduce a tool called Sirius, which can accurately identify silent data corruptions based on the simple insight that there exists spatial and temporal locality within most variables in such programs. Spatial locality means that values of the variable at nodes that are close by in a network sense are also close numerically. Similarly, temporal locality means that the values change slowly and in a continuous manner with time. Sirius uses neural networks to learn such locality patterns, separately for each critical variable, and produces probabilistic assertions that can be embedded in the code of the parallel program to detect silent data corruptions. We have implemented this technique on two parallel benchmark programs, LULESH and CoMD. Our evaluations show that Sirius can detect silent errors in the code with much higher accuracy than previously proposed methods. Sirius detected 98% of the silent data corruptions with a false positive rate of less than 0.02, compared with the false positive rate of 0.06 incurred by the state-of-the-art acceleration-based prediction (ABP) technique.
520 $a As advancements in electronics and computer engineering have led to improved simulation tools and software, there is a strong push toward simulation-based optimization of factory operations. This is particularly useful because it lets the person running the simulation observe the effect of changes on the factory model without impacting actual production. Improper and inefficient scheduling and the presence of system bottlenecks are two major factors that affect the throughput, and thereby the profits, of a factory. We introduce Minerva, a machine-learning-based technique that can be applied to simulations of factory models to ensure optimal scheduling and to identify bottlenecks. Minerva uses reinforcement learning to provide a schedule that performs significantly better than popular scheduling techniques in the case of a more realistic extension of Job Shop Scheduling Problems. Minerva also uses neural networks to detect bottleneck resources in the system with much higher accuracy than traditional bottleneck identification methods. We evaluated Minerva on two representative benchmarks and found that Minerva performs significantly better than popular scheduling techniques in the case of a more realistic factory model. For a given scheduling algorithm, Minerva is able to detect the system bottleneck with a high accuracy of 95.2%, which is almost 25% better than the best of the popular bottleneck identification methods.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Computer engineering. $3 569006
655 7 $a Electronic books. $2 local $3 554714
690 $a 0464
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a Purdue University. $b Electrical and Computer Engineering. $3 1148521
773 0 $t Masters Abstracts International $g 57-01(E).
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10616320 $z click for full text (PQDT)
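
Illustrative note on the locality insight described in the abstract: the sketch below shows, under simplifying assumptions, how spatial and temporal locality can be turned into a corruption check. It is not the Sirius implementation. The abstract says Sirius learns locality patterns with neural networks; here a plain linear extrapolation and a neighbor average stand in for the learned predictors, and every name (`temporal_prediction`, `spatial_prediction`, `looks_corrupted`, `tolerance`) is hypothetical.

```python
from statistics import mean


def temporal_prediction(history):
    """Linearly extrapolate the next value from the last two samples.

    Stand-in for a learned temporal model (assumption, not from the thesis).
    """
    if len(history) < 2:
        raise ValueError("need at least two past samples")
    return 2 * history[-1] - history[-2]


def spatial_prediction(neighbor_values):
    """Average the values held by nearby nodes.

    Stand-in for a learned spatial model (assumption, not from the thesis).
    """
    return mean(neighbor_values)


def looks_corrupted(observed, history, neighbor_values, tolerance):
    """Flag a value only if it breaks both temporal and spatial locality."""
    temporal_error = abs(observed - temporal_prediction(history))
    spatial_error = abs(observed - spatial_prediction(neighbor_values))
    return temporal_error > tolerance and spatial_error > tolerance


# A value that follows the recent trend and matches its neighbors passes.
print(looks_corrupted(1.04, [1.00, 1.02], [1.03, 1.05], tolerance=0.1))
# A value far from both predictions (as a bit flip might produce) is flagged.
print(looks_corrupted(257.04, [1.00, 1.02], [1.03, 1.05], tolerance=0.1))
```

Requiring both checks to fail is one possible design choice that trades some detection coverage for fewer false alarms; a real detector would tune the threshold per variable.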