Robert, Yves.
Fault-Tolerance Techniques for High-Performance Computing
Record type:
Bibliographic - language material, printed : Monograph/item
Title/Author:
Fault-Tolerance Techniques for High-Performance Computing / edited by Thomas Herault, Yves Robert.
Other authors:
Herault, Thomas.
Physical description:
IX, 320 p. 113 illus. : online resource.
Contained By:
Springer Nature eBook
Subject:
Computer system failures.
Electronic resource:
https://doi.org/10.1007/978-3-319-20943-2
ISBN:
9783319209432
Fault-Tolerance Techniques for High-Performance Computing
[electronic resource] / edited by Thomas Herault, Yves Robert. - 1st ed. 2015. - IX, 320 p. 113 illus. : online resource. - (Computer Communications and Networks, 1617-7975).
Part I: General Overview -- Fault-Tolerance Techniques for High-Performance Computing -- Part II: Technical Contributions -- Errors and Faults -- Fault-Tolerant MPI -- Using Replication for Resilience on Exascale Systems -- Energy-Aware Checkpointing Strategies.
This timely text/reference presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models.
Topics and features: includes self-contained contributions from an international selection of preeminent experts; provides a survey of resilience methods and performance models; examines the various sources for errors and faults in large-scale systems, detailing their characteristics, with a focus on modeling, detection and prediction; reviews the spectrum of techniques that can be applied to design a fault-tolerant message passing interface; investigates different approaches to replication, comparing these to the traditional checkpoint-recovery approach; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems, proposing a methodology to estimate such energy consumption.
This authoritative volume is essential reading for all researchers and graduate students involved in high-performance computing.
Dr. Thomas Herault is a Research Scientist in the Innovative Computing Laboratory (ICL) at the University of Tennessee, Knoxville, TN, USA. Dr. Yves Robert is a Professor in the Laboratory of Parallel Computing at the Ecole Normale Supérieure de Lyon, France, and a Visiting Research Scholar in the ICL.
ISBN: 9783319209432
Standard No.: 10.1007/978-3-319-20943-2 (doi)
Subjects--Topical Terms:
782237
Computer system failures.
LC Class. No.: QA76.9.E94
Dewey Class. No.: 004.24
LDR 03488nam a22004095i 4500
001 970489
003 DE-He213
005 20200705154427.0
007 cr nn 008mamaa
008 201211s2015 gw | s |||| 0|eng d
020 $a 9783319209432 $9 978-3-319-20943-2
024 7 $a 10.1007/978-3-319-20943-2 $2 doi
035 $a 978-3-319-20943-2
050 4 $a QA76.9.E94
072 7 $a UYD $2 bicssc
072 7 $a COM074000 $2 bisacsh
072 7 $a UYD $2 thema
082 0 4 $a 004.24 $2 23
245 1 0 $a Fault-Tolerance Techniques for High-Performance Computing $h [electronic resource] / $c edited by Thomas Herault, Yves Robert.
250 $a 1st ed. 2015.
264 1 $a Cham : $b Springer International Publishing : $b Imprint: Springer, $c 2015.
300 $a IX, 320 p. 113 illus. $b online resource.
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
347 $a text file $b PDF $2 rda
490 1 $a Computer Communications and Networks, $x 1617-7975
505 0 $a Part I: General Overview -- Fault-Tolerance Techniques for High-Performance Computing -- Part II: Technical Contributions -- Errors and Faults -- Fault-Tolerant MPI -- Using Replication for Resilience on Exascale Systems -- Energy-Aware Checkpointing Strategies.
520 $a This timely text/reference presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Topics and features: includes self-contained contributions from an international selection of preeminent experts; provides a survey of resilience methods and performance models; examines the various sources for errors and faults in large-scale systems, detailing their characteristics, with a focus on modeling, detection and prediction; reviews the spectrum of techniques that can be applied to design a fault-tolerant message passing interface; investigates different approaches to replication, comparing these to the traditional checkpoint-recovery approach; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems, proposing a methodology to estimate such energy consumption. This authoritative volume is essential reading for all researchers and graduate students involved in high-performance computing. Dr. Thomas Herault is a Research Scientist in the Innovative Computing Laboratory (ICL) at the University of Tennessee, Knoxville, TN, USA. Dr. Yves Robert is a Professor in the Laboratory of Parallel Computing at the Ecole Normale Supérieure de Lyon, France, and a Visiting Research Scholar in the ICL.
650 0 $a Computer system failures. $3 782237
650 0 $a Computer software--Reusability. $3 1254984
650 0 $a Numerical analysis. $3 527939
650 1 4 $a System Performance and Evaluation. $3 669346
650 2 4 $a Performance and Reliability. $3 669802
650 2 4 $a Numeric Computing. $3 669943
700 1 $a Herault, Thomas. $4 edt $4 http://id.loc.gov/vocabulary/relators/edt $3 1067022
700 1 $a Robert, Yves. $4 edt $4 http://id.loc.gov/vocabulary/relators/edt $3 1067023
710 2 $a SpringerLink (Online service) $3 593884
773 0 $t Springer Nature eBook
776 0 8 $i Printed edition: $z 9783319209449
776 0 8 $i Printed edition: $z 9783319209425
776 0 8 $i Printed edition: $z 9783319355603
830 0 $a Computer Communications and Networks, $x 1617-7975 $3 1255420
856 4 0 $u https://doi.org/10.1007/978-3-319-20943-2
912 $a ZDB-2-SCS
912 $a ZDB-2-SXCS
950 $a Computer Science (SpringerNature-11645)
950 $a Computer Science (R0) (SpringerNature-43710)