國立虎尾科技大學 |

The complexity of massive data set computations.

紀錄類型:	書目-語言資料,手稿 : Monograph/item
正題名/作者:	The complexity of massive data set computations./
作者:	Bar-Yossef, Ziv.
面頁冊數:	1 online resource (214 pages)
附註:	Source: Dissertations Abstracts International, Volume: 67-05, Section: B.
Contained By:	Dissertations Abstracts International67-05B.
標題:	Computer science. -
電子資源:	click for full text (PQDT)
ISBN:	9780542251726

The complexity of massive data set computations.
Bar-Yossef, Ziv.

The complexity of massive data set computations. - 1 online resource (214 pages)

Source: Dissertations Abstracts International, Volume: 67-05, Section: B.

Thesis (Ph.D.)--University of California, Berkeley, 2002.

Includes bibliographical references

Numerous massive data sets, ranging from flows of Internet traffic to logs of supermarket transactions, have emerged during the past few years. Their overwhelming size and the typically restricted access to them call for new computational models. This thesis studies three such models: sampling computations, data stream computations, and sketch computations. While most of the previous work focused on designing algorithms in the new models, this thesis revolves around the limitations of the models. We develop a suite of lower bound techniques that characterize the complexity of functions in these models, indicating which problems can be solved efficiently in them. We derive specific bounds for a multitude of practical problems, arising from applications in database, networking, and information retrieval, such as frequency statistics, selection functions, statistical moments, and distance estimation. We present general, powerful, and easy to use lower bound techniques for the sampling model. The techniques apply to all functions and address both oblivious and adaptive sampling. They frequently produce optimal bounds for a wide range of functions. They are stated in terms of new combinatorial and statistical properties of functions, which are easy to calculate. We obtain lower bounds for the data stream and sketch models through one-way and simultaneous communication complexity. We develop lower bounds for the latter via a new information-theoretic view of communication complexity. A highlight of this work is an optimal simultaneous communication complexity lower bound for the important multi-party set-disjointness problem. Finally, we present a powerful method for proving lower bounds for general communication complexity. The method is based on a direct sum property of a new measure of complexity for communication complexity protocols and on a novel statistical view of communication complexity. We use the technique to obtain improved communication complexity and data stream lower bounds for several problems, including multi-party set-disjointness, frequency moments, and Lp distance estimation. These results solve open problems of Alon, Matias, and Szegedy and of Saks and Sun.

Electronic reproduction.
Ann Arbor, Mich. :
ProQuest,
2024

Mode of access: World Wide Web

ISBN: 9780542251726Subjects--Topical Terms:

573171
Computer science.
Subjects--Index Terms:

Computational complexityIndex Terms--Genre/Form:

554714
Electronic books.

The complexity of massive data set computations.
LDR:03536ntm a22003617 4500 001 1147126
005 20240909062221.5
006 m o d
007 cr bn ---uuuuu
008 250605s2002 xx obm 000 0 eng d
020 $a 9780542251726
035 $a (MiAaPQ)AAI3183783
035 $a AAI3183783
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Bar-Yossef, Ziv. $3 1472778
245 1 4 $a The complexity of massive data set computations.
264 0 $c 2002
300 $a 1 online resource (214 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertations Abstracts International, Volume: 67-05, Section: B.
500 $a Publisher info.: Dissertation/Thesis.
500 $a Advisor: Papadimitriou, Christos H.
502 $a Thesis (Ph.D.)--University of California, Berkeley, 2002.
504 $a Includes bibliographical references
520 $a Numerous massive data sets, ranging from flows of Internet traffic to logs of supermarket transactions, have emerged during the past few years. Their overwhelming size and the typically restricted access to them call for new computational models. This thesis studies three such models: sampling computations, data stream computations, and sketch computations. While most of the previous work focused on designing algorithms in the new models, this thesis revolves around the limitations of the models. We develop a suite of lower bound techniques that characterize the complexity of functions in these models, indicating which problems can be solved efficiently in them. We derive specific bounds for a multitude of practical problems, arising from applications in database, networking, and information retrieval, such as frequency statistics, selection functions, statistical moments, and distance estimation. We present general, powerful, and easy to use lower bound techniques for the sampling model. The techniques apply to all functions and address both oblivious and adaptive sampling. They frequently produce optimal bounds for a wide range of functions. They are stated in terms of new combinatorial and statistical properties of functions, which are easy to calculate. We obtain lower bounds for the data stream and sketch models through one-way and simultaneous communication complexity. We develop lower bounds for the latter via a new information-theoretic view of communication complexity. A highlight of this work is an optimal simultaneous communication complexity lower bound for the important multi-party set-disjointness problem. Finally, we present a powerful method for proving lower bounds for general communication complexity. The method is based on a direct sum property of a new measure of complexity for communication complexity protocols and on a novel statistical view of communication complexity. We use the technique to obtain improved communication complexity and data stream lower bounds for several problems, including multi-party set-disjointness, frequency moments, and Lp distance estimation. These results solve open problems of Alon, Matias, and Szegedy and of Saks and Sun.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2024
538 $a Mode of access: World Wide Web
650 4 $a Computer science. $3 573171
653 $a Computational complexity
653 $a Massive data set
653 $a Theory of computation
655 7 $a Electronic books. $2 local $3 554714
690 $a 0984
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a University of California, Berkeley. $3 1183587
773 0 $t Dissertations Abstracts International $g 67-05B.
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3183783 $z click for full text (PQDT)