National Formosa University

Optimizing Access to Scientific Data for Storage, Analysis and Visualization.

Record type:	Bibliographic - language material, manuscript : Monograph/item
Title/Author:	Optimizing Access to Scientific Data for Storage, Analysis and Visualization.
Author:	Ionkov, Latchesar.
Physical description:	1 online resource (170 pages)
Notes:	Source: Dissertation Abstracts International, Volume: 79-09(E), Section: B.
Contained By:	Dissertation Abstracts International 79-09B(E).
Subject:	Computer science.
Electronic resource:	click for full text (PQDT)
ISBN:	9780355864984

Thesis (Ph.D.)--University of California, Santa Cruz, 2018.

Includes bibliographical references

Scientific workflows contain an increasing number of interacting applications, often with a large disparity between the data formats produced and consumed by different applications. This mismatch can degrade performance, because retrieving the data requires multiple read operations (often to a remote storage system) in order to convert it. In recent years, with the large increase in the amount of data and computational power available, there is demand for applications to support data access in situ, or close to the simulation, to provide application steering, analytics, and visualization.

Electronic reproduction. Ann Arbor, Mich.: ProQuest, 2018

Mode of access: World Wide Web

ISBN: 9780355864984
Subjects--Topical Terms: Computer science. (573171)
Index Terms--Genre/Form: Electronic books. (554714)

LDR:05902ntm a2200409Ki 4500
001 916644
005 20180927111920.5
006 m o u
007 cr mn||||a|a||
008 190606s2018 xx obm 000 0 eng d
020 $a 9780355864984
035 $a (MiAaPQ)AAI10748843
035 $a (MiAaPQ)ucsc:11451
035 $a AAI10748843
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Ionkov, Latchesar. $3 1190442
245 1 0 $a Optimizing Access to Scientific Data for Storage, Analysis and Visualization.
264 0 $c 2018
300 $a 1 online resource (170 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertation Abstracts International, Volume: 79-09(E), Section: B.
500 $a Adviser: Carlos Maltzahn.
502 $a Thesis (Ph.D.)--University of California, Santa Cruz, 2018.
504 $a Includes bibliographical references
520 $a Scientific workflows contain an increasing number of interacting applications, often with a large disparity between the data formats produced and consumed by different applications. This mismatch can degrade performance, because retrieving the data requires multiple read operations (often to a remote storage system) in order to convert it. In recent years, with the large increase in the amount of data and computational power available, there is demand for applications to support data access in situ, or close to the simulation, to provide application steering, analytics, and visualization.
520 $a Although some parallel filesystems and middleware libraries attempt to identify access patterns and optimize data retrieval, they frequently fail if the patterns are complex. It is evident that more knowledge of the structure of the datasets at the storage systems level will provide many opportunities for further performance improvements.
520 $a For most developers of scientific applications, storing the application data, and its particular format on disk, is not an essential part of the application. Although they acknowledge the importance of I/O performance, their expertise lies mostly in numerical simulations and the particular models their application simulates. Most of their effort is spent on ensuring that the application produces correct numerical results. Ideally, they would like a library call that reads a subset of the data from storage (no matter what its format is) and places it in the data structures the simulation defines in memory. Since the data needs to be analyzed and visualized, and has to be accessible from third-party tools, the scientists are forced to know more about the data formats.
520 $a In this dissertation we investigate multiple techniques that use dataset descriptions to improve performance and overall data availability for HPC applications. We introduce a declarative data description language that can be used to define the complete dataset as well as parts of it. These descriptions are used to generate transformation rules that allow data to be converted between different physical layouts on storage and in memory.
520 $a First, we define the DRepl dataset description language and use it to implement divergent data views and replicas as POSIX files. We evaluate the performance of this approach and demonstrate its advantages, both in its transparency to the application and in the combined performance when the application is paired with analytics and/or visualization code that reads the data in a different format. DRepl decouples the data producers and consumers, and the data layouts they use, from the way the data is stored on the storage system. DRepl has shown up to a 2x improvement in cumulative performance when data is accessed using optimized replicas.
520 $a Second, we extend the previous approach to the parallel environment used in HPC. Instead of using POSIX files, the new method allows data to be accessed in larger chunks (fragments) in the way it will be laid out in memory. The developers can define what data structures they have in the process's memory and the overall format of the dataset on storage, and the runtime will automatically take care of transforming the data between the two. Both the in-memory and on-disk formats are described with the DRepl language. Replacing the ability to read the data as an array of bytes with operations that use descriptions of the data structure provides better opportunities for the storage system to optimize access to the persistent data. The integration of this technique in Ceph demonstrates the potential advantages of this approach. The experiments show performance improvements of up to 5 times for writes and 10 times for reads, compared to collective MPI I/O.
520 $a Third, we explore the future directions of extending the DRepl language to support more complex datasets. The additions would allow scientists to use different resolutions for different parts of a multi-dimensional space, and define how to transform the data between resolutions. The changes would also allow completely abstract definitions of datasets, not only for continuums but also for primitive types like real and integer numbers. The fragments of the dataset that are present in memory or on disk will have concrete types that are compatible with the abstract types used in the dataset.
520 $a Finally, we provide foundations on how to extend the previous functionality to the most complicated data structures used in scientific applications -- unstructured meshes.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Computer science. $3 573171
655 7 $a Electronic books. $2 local $3 554714
690 $a 0984
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a University of California, Santa Cruz. $b Computer Science. $3 1184383
773 0 $t Dissertation Abstracts International $g 79-09B(E).
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10748843 $z click for full text (PQDT)