Big Data Analytics: Methods and Applications
Record Type:
Language materials, manuscript : Monograph/item
Title/Author:
Big Data Analytics
Remainder of title:
Methods and Applications.
Author:
Paulson, Erik Steven.
Description:
1 online resource (119 pages)
Notes:
Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.
Subject:
Computer science.
Online resource:
click for full text (PQDT)
ISBN:
9780438016330
Big Data Analytics: Methods and Applications
Paulson, Erik Steven.
Big Data Analytics :
Methods and Applications. - 1 online resource (119 pages)
Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.
Thesis (Ph.D.)--The University of Wisconsin - Madison, 2018.
Includes bibliographical references
Big Data is now pervasive. This has driven a critical need to develop novel methods to store and process data at large scale, as well as to develop new applications to use and make sense of this data. This dissertation makes two contributions toward addressing this need. First, we study methods for large-scale data analysis. In particular, we compare the popular MapReduce model to parallel relational database management systems, and empirically analyze their strengths and weaknesses. We evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a collection of benchmarks that we have run on an open-source version of MR as well as on two parallel DBMSs. For each benchmark, we measure each system's performance for various degrees of parallelism on a cluster of 100 shared-nothing nodes. Our results reveal some interesting trade-offs. We speculate about the causes of the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures.
Electronic reproduction.
Ann Arbor, Mich. :
ProQuest,
2018
Mode of access: World Wide Web
ISBN: 9780438016330
Subjects--Topical Terms:
573171
Computer science.
Index Terms--Genre/Form:
554714
Electronic books.
Big Data Analytics: Methods and Applications
LDR
:03392ntm a2200337K 4500
001
915378
005
20180727125214.5
006
m o u
007
cr mn||||a|a||
008
190606s2018 xx obm 000 0 eng d
020
$a
9780438016330
035
$a
(MiAaPQ)AAI10827019
035
$a
(MiAaPQ)wisc:15367
035
$a
AAI10827019
040
$a
MiAaPQ
$b
eng
$c
MiAaPQ
100
1
$a
Paulson, Erik Steven.
$3
1188710
245
1 0
$a
Big Data Analytics :
$b
Methods and Applications.
264
0
$c
2018
300
$a
1 online resource (119 pages)
336
$a
text
$b
txt
$2
rdacontent
337
$a
computer
$b
c
$2
rdamedia
338
$a
online resource
$b
cr
$2
rdacarrier
500
$a
Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.
500
$a
Adviser: AnHai Doan.
502
$a
Thesis (Ph.D.)--The University of Wisconsin - Madison, 2018.
504
$a
Includes bibliographical references
520
$a
Big Data is now pervasive. This has driven a critical need to develop novel methods to store and process data at large scale, as well as to develop new applications to use and make sense of this data. This dissertation makes two contributions toward addressing this need. First, we study methods for large-scale data analysis. In particular, we compare the popular MapReduce model to parallel relational database management systems, and empirically analyze their strengths and weaknesses. We evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a collection of benchmarks that we have run on an open-source version of MR as well as on two parallel DBMSs. For each benchmark, we measure each system's performance for various degrees of parallelism on a cluster of 100 shared-nothing nodes. Our results reveal some interesting trade-offs. We speculate about the causes of the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures.
520
$a
In the second contribution, we examine how Big Data scaling methods can be used to build scalable and flexible cloud-based entity matching applications, and what lessons can be learned for future development of similar applications. Entity matching (EM) finds disparate data instances that refer to the same real-world entity. EM has long been studied, is crucial to many fields, and will become even more so in the age of Big Data. However, it is still very difficult for domain scientists to use EM systems, especially at scale. In response, we have developed CloudMatcher, a cloud/crowd service for EM. CloudMatcher aims to be a fast, easy-to-use, scalable, and highly available EM service on the Web. As far as we can tell, no such application has been developed for EM in the data management research community. We describe CloudMatcher's development and deployment, providing a detailed analysis of its performance over several representative datasets and in several scale-up experiments, and discussing lessons learned. Taken together, our contributions in this dissertation advance the topic of Big Data analytics in both methods and applications.
533
$a
Electronic reproduction.
$b
Ann Arbor, Mich. :
$c
ProQuest,
$d
2018
538
$a
Mode of access: World Wide Web
650
4
$a
Computer science.
$3
573171
650
4
$a
Information technology.
$3
559429
655
7
$a
Electronic books.
$2
local
$3
554714
690
$a
0984
690
$a
0489
710
2
$a
ProQuest Information and Learning Co.
$3
1178819
710
2
$a
The University of Wisconsin - Madison.
$b
Computer Sciences.
$3
1179878
856
4 0
$u
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10827019
$z
click for full text (PQDT)