National Formosa University

Data Management Solutions for Tackling Big Data Variety.

Record type:	Bibliographic - language material, manuscript : Monograph/item
Title/Author:	Data Management Solutions for Tackling Big Data Variety./
Author:	Arora, Vaibhav.
Extent:	1 online resource (211 pages)
Note:	Source: Dissertation Abstracts International, Volume: 79-09(E), Section: B.
Subject:	Computer science. -
Electronic resource:	click for full text (PQDT)
ISBN:	9780355876734

Thesis (Ph.D.)--University of California, Santa Barbara, 2018.

Includes bibliographical references

Variety is one of the three defining characteristics of Big Data, the others being Volume and Velocity. There are several aspects of this data variety: diversity in data formats (text, video, audio) and structure (relational, graph, etc.), variety in access methodologies (OLTP, OLAP), and distribution heterogeneity within the workloads (read-heavy, high-contention). Data management solutions for modern-day applications need to tackle this variety.

Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018

Mode of access: World Wide Web

ISBN: 9780355876734

Subjects--Topical Terms:
573171 Computer science.

Index Terms--Genre/Form:
554714 Electronic books.

Data Management Solutions for Tackling Big Data Variety.
LDR:03879ntm a2200325K 4500
001 915357
005 20180727125213.5
006 m o u
007 cr mn||||a|a||
008 190606s2018 xx obm 000 0 eng d
020 $a 9780355876734
035 $a (MiAaPQ)AAI10752195
035 $a (MiAaPQ)ucsb:13833
035 $a AAI10752195
040 $a MiAaPQ $b eng $c MiAaPQ
100 1 $a Arora, Vaibhav. $3 1188684
245 1 0 $a Data Management Solutions for Tackling Big Data Variety.
264 0 $c 2018
300 $a 1 online resource (211 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertation Abstracts International, Volume: 79-09(E), Section: B.
500 $a Advisers: Divyakant Agrawal; Amr El Abbadi.
502 $a Thesis (Ph.D.)--University of California, Santa Barbara, 2018.
504 $a Includes bibliographical references
520 $a Variety is one of the three defining characteristics of Big Data, the others being Volume and Velocity. There are several aspects of this data variety: diversity in data formats (text, video, audio) and structure (relational, graph, etc.), variety in access methodologies (OLTP, OLAP), and distribution heterogeneity within the workloads (read-heavy, high-contention). Data management solutions for modern-day applications need to tackle this variety.
520 $a This dissertation provides an understanding of the challenges associated with the different elements of variety, and proposes several solutions for efficiently handling its various aspects. First, the dissertation studies the challenges related to variety in data structure and access methodologies, and the resultant heterogeneity at the data infrastructure level. Applications now employ several data-processing engines with different underlying representations, like row, column, graph, etc., to process their data. We propose Janus, which introduces a novel data-movement pipeline that enables the use of different representations to support both high throughput of transactions and diverse analytics, while still ensuring consistent real-time analytics in a scale-out setting. Janus partitions the data at different representations, and allows distributed transactions and diverse partitioning strategies at the representations. Then, we propose Typhon and Cerberus, which define and enforce consistency semantics for application data spread across representations. Second, this dissertation proposes solutions for handling distribution heterogeneity within the workloads. Workloads can have a skewed distribution in terms of operation type, data access, or temporal variation. We propose strongly-consistent quorum reads for Raft-like consensus protocols, which can be utilized to scale read-heavy workloads. For supporting high-contention transaction workloads, we integrate an existing dynamic timestamp allocation based concurrency control mechanism in a distributed OLTP setting, and analyze its performance. Third, we study IoT applications, which have to deal with both the physical heterogeneity of the sensors and diverse data-processing demands. We propose a multi-representation-based architecture catering to IoT applications, and also present the initial design of M-stream, a computation framework for enabling integration and monitoring of uncertain data from multiple sensors. Through analysis, illustrative examples, and extensive evaluation of the proposed protocols, this dissertation demonstrates that the proposed solutions can be employed for efficiently handling the different aspects of variety in data-intensive applications.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Computer science. $3 573171
655 7 $a Electronic books. $2 local $3 554714
690 $a 0984
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a University of California, Santa Barbara. $b Computer Science. $3 1182528
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10752195 $z click for full text (PQDT)