National Formosa University

Big Data Analytics : Methods and Applications.

Record type:	Bibliographic - manuscript language material : Monograph/item
Title proper/Statement of responsibility:	Big Data Analytics :
Other title:	Methods and Applications.
Author:	Paulson, Erik Steven.
Physical description:	1 online resource (119 pages)
Note:	Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.
Subject:	Computer science.
Electronic resource:	click for full text (PQDT)
ISBN:	9780438016330

Paulson, Erik Steven.

Big Data Analytics : Methods and Applications. -- 1 online resource (119 pages)

Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.

Thesis (Ph.D.)--The University of Wisconsin - Madison, 2018.

Includes bibliographical references

Big Data is now pervasive. This has driven a critical need to develop novel methods to store and process data at large scale, as well as new applications that use and make sense of this data. This dissertation makes two contributions toward addressing this need. First, we study methods for large-scale data analysis. In particular, we compare the popular MapReduce (MR) model to parallel relational database management systems (DBMSs) and empirically analyze their strengths and weaknesses in terms of both performance and development complexity. To this end, we define a collection of benchmarks and run them on an open-source implementation of MR as well as on two parallel DBMSs. For each benchmark, we measure each system's performance at various degrees of parallelism on a cluster of 100 shared-nothing nodes. Our results reveal trade-offs between the two kinds of systems. We speculate about the causes of the dramatic performance differences we observe and consider which implementation concepts future systems should take from each architecture.

Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018

Mode of access: World Wide Web

ISBN: 9780438016330

Subjects--Topical Terms:
Computer science.

Index Terms--Genre/Form:
Electronic books.

LDR:03392ntm a2200337K 4500
001 915378
005 20180727125214.5
006 m o u
007 cr mn||||a|a||
008 190606s2018 xx obm 000 0 eng d
020 $a 9780438016330
035 $a (MiAaPQ)AAI10827019
035 $a (MiAaPQ)wisc:15367
035 $a AAI10827019
040 $a MiAaPQ $b eng $c MiAaPQ
100 1 $a Paulson, Erik Steven. $3 1188710
245 1 0 $a Big Data Analytics : $b Methods and Applications.
264 0 $c 2018
300 $a 1 online resource (119 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.
500 $a Adviser: AnHai Doan.
502 $a Thesis (Ph.D.)--The University of Wisconsin - Madison, 2018.
504 $a Includes bibliographical references
520 $a Big Data is now pervasive. This has driven a critical need to develop novel methods to store and process data at large scale, as well as new applications that use and make sense of this data. This dissertation makes two contributions toward addressing this need. First, we study methods for large-scale data analysis. In particular, we compare the popular MapReduce (MR) model to parallel relational database management systems (DBMSs) and empirically analyze their strengths and weaknesses in terms of both performance and development complexity. To this end, we define a collection of benchmarks and run them on an open-source implementation of MR as well as on two parallel DBMSs. For each benchmark, we measure each system's performance at various degrees of parallelism on a cluster of 100 shared-nothing nodes. Our results reveal trade-offs between the two kinds of systems. We speculate about the causes of the dramatic performance differences we observe and consider which implementation concepts future systems should take from each architecture.
520 $a In the second contribution, we examine how Big Data scaling methods can be used to build scalable and flexible cloud-based entity matching applications, and what lessons can be learned for the future development of similar applications. Entity matching (EM) finds disparate data instances that refer to the same real-world entity. EM has long been studied, is crucial to many fields, and will become even more so in the age of Big Data. However, it is still very difficult for domain scientists to use EM systems, especially at scale. In response, we have developed CloudMatcher, a cloud/crowd service for EM. CloudMatcher aims to be a fast, easy-to-use, scalable, and highly available EM service on the Web. To our knowledge, no such application has previously been developed for EM in the data management research community. We describe CloudMatcher's development and deployment, provide a detailed analysis of its performance on several representative datasets and in several scale-up experiments, and discuss lessons learned. Taken together, the contributions of this dissertation advance Big Data analytics in terms of both methods and applications.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Computer science. $3 573171
650 4 $a Information technology. $3 559429
655 7 $a Electronic books. $2 local $3 554714
690 $a 0984
690 $a 0489
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a The University of Wisconsin - Madison. $b Computer Sciences. $3 1179878
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10827019 $z click for full text (PQDT)