National Formosa University (國立虎尾科技大學)

High-Performance Systems for Crowdsourced Data Analysis.

Record type:	Bibliographic - language material, manuscript : Monograph/item
Title proper/Author:	High-Performance Systems for Crowdsourced Data Analysis.
Author:	Haas, Daniel.
Extent:	1 online resource (153 pages)
Note:	Source: Dissertation Abstracts International, Volume: 79-05(E), Section: B.
Contained By:	Dissertation Abstracts International 79-05B(E).
Subject:	Computer science.
Electronic resource:	click for full text (PQDT)
ISBN:	9780355576993

Thesis (Ph.D.)

Includes bibliographical references

In spite of the dramatic recent progress in automated techniques for computer vision and natural language understanding, human effort, often in the form of crowd workers recruited on marketplaces such as Amazon's Mechanical Turk, remains a necessary part of data analysis workflows for machine learning and data cleaning. However, embedding manual steps in automated workflows comes with a performance cost, since humans seldom process data at the speed of computers. In order to rapidly iterate between hypotheses and evidence, data analysts need tools that can provide human processing at close to machine latencies.

Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018

Mode of access: World Wide Web

ISBN: 9780355576993

Subjects--Topical Terms:
573171 Computer science.

Index Terms--Genre/Form:
554714 Electronic books.

High-Performance Systems for Crowdsourced Data Analysis.
LDR:03165ntm a2200373Ki 4500
001 909023
005 20180419104824.5
006 m o u
007 cr mn||||a|a||
008 190606s2017 xx obm 000 0 eng d
020 $a 9780355576993
035 $a (MiAaPQ)AAI10617692
035 $a (MiAaPQ)berkeley:17221
035 $a AAI10617692
040 $a MiAaPQ $b eng $c MiAaPQ
099 $a TUL $f hyy $c available through World Wide Web
100 1 $a Haas, Daniel. $3 1179510
245 1 0 $a High-Performance Systems for Crowdsourced Data Analysis.
264 0 $c 2017
300 $a 1 online resource (153 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertation Abstracts International, Volume: 79-05(E), Section: B.
500 $a Adviser: Michael J. Franklin.
502 $a Thesis (Ph.D.) $c University of California, Berkeley $d 2017.
504 $a Includes bibliographical references
520 $a In spite of the dramatic recent progress in automated techniques for computer vision and natural language understanding, human effort, often in the form of crowd workers recruited on marketplaces such as Amazon's Mechanical Turk, remains a necessary part of data analysis workflows for machine learning and data cleaning. However, embedding manual steps in automated workflows comes with a performance cost, since humans seldom process data at the speed of computers. In order to rapidly iterate between hypotheses and evidence, data analysts need tools that can provide human processing at close to machine latencies.
520 $a In this dissertation, I describe the design, theory, and implementation of performant crowd-powered systems. After discussing the performance implications of involving humans in data analysis workflows, I present an example of a data cleaning system that requires low-latency crowd input. Then, I describe CLAMShell, a system that accurately labels large-scale datasets in one to two minutes, and its evaluation on over a thousand workers processing nearly a quarter million tasks. Next, I consider the design of multi-tenant crowd systems running many heterogeneous applications at once. I describe Cioppino, a system designed to improve throughput and reduce cost in this setting, while taking into account worker preferences. Finally, I explore the theory of identifying fast individuals in an unknown population of workers, which can be modeled as an instance of the infinite-armed bandit problem. The analysis results in novel near-optimal algorithms with applications to broader statistical theory. Together, these components provide for the implementation of human computation systems that are cost-efficient, scalable, and fast enough to integrate into existing data analysis workflows without compromising performance.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Computer science. $3 573171
650 4 $a Information science. $3 561178
650 4 $a Artificial intelligence. $3 559380
655 7 $a Electronic books. $2 local $3 554714
690 $a 0984
690 $a 0723
690 $a 0800
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a University of California, Berkeley. $b Computer Science. $3 1179511
773 0 $t Dissertation Abstracts International $g 79-05B(E).
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10617692 $z click for full text (PQDT)