National Formosa University (國立虎尾科技大學)

Deep Web Data Analytics.

Record type:	Bibliographic - language material, manuscript : Monograph/item
Title/Author:	Deep Web Data Analytics./
Author:	Lu, Yachao.
Physical description:	1 online resource (75 pages)
Note:	Source: Dissertation Abstracts International, Volume: 79-12(E), Section: B.
Contained By:	Dissertation Abstracts International 79-12B(E).
Subject:	Computer science.
Electronic resource:	click for full text (PQDT)
ISBN:	9780438163379

Thesis (Ph.D.)--The George Washington University, 2018.

Includes bibliographical references

A large portion of the data available on the web resides in the so-called "Deep Web". The deep web consists of private or hidden databases that lie behind form-like query interfaces, which allow users to browse these databases in a controlled manner. While hidden database interfaces are normally designed to let users execute search queries, for certain applications it is also useful to perform data analytics over such databases. Data analytics over online social networks is one of the most popular topics in this area.

Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018

Mode of access: World Wide Web

ISBN: 9780438163379

Subjects--Topical Terms:
573171 Computer science.

Index Terms--Genre/Form:
554714 Electronic books.

LDR:03709ntm a2200373Ki 4500
001 917160
005 20181005115849.5
006 m o u
007 cr mn||||a|a||
008 190606s2018 xx obm 000 0 eng d
020 $a 9780438163379
035 $a (MiAaPQ)AAI10825995
035 $a (MiAaPQ)gwu:14261
035 $a AAI10825995
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Lu, Yachao. $3 1191115
245 1 0 $a Deep Web Data Analytics.
264 0 $c 2018
300 $a 1 online resource (75 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertation Abstracts International, Volume: 79-12(E), Section: B.
500 $a Includes supplementary digital materials.
500 $a Adviser: Nan Zhang.
502 $a Thesis (Ph.D.)--The George Washington University, 2018.
504 $a Includes bibliographical references
520 $a A large portion of the data available on the web resides in the so-called "Deep Web". The deep web consists of private or hidden databases that lie behind form-like query interfaces, which allow users to browse these databases in a controlled manner. While hidden database interfaces are normally designed to let users execute search queries, for certain applications it is also useful to perform data analytics over such databases. Data analytics over online social networks is one of the most popular topics in this area.
520 $a In the first part of my research, we addressed these challenges in a general setting, using data analytics techniques that rely only on the public interfaces of the databases while respecting the data access limitations (e.g., query rate limits) imposed by the data owners. We developed the HYDRA (Hidden Database Research and Analytics) system, which enables fast sampling and data analytics over a hidden web database that provides nothing but a form-like web search interface as its only access channel. Broadly, it consists of three major components: (1) SAMPLE-GEN, which produces samples according to a given sampling distribution; (2) SAMPLE-EVAL, which evaluates the samples produced by SAMPLE-GEN and also generates estimates for a given aggregate query; and (3) TIMBR, which enables fast and easy construction of a wrapper that models both the input and output interfaces of the web database, thereby translating supported search queries into HTTP requests and retrieving top-k query answers from HTTP responses.
520 $a In another part of my research, we targeted the challenges facing existing Markov Chain Monte Carlo methods, such as random-walk-based sampling algorithms, on websites that have graph browsing interfaces (e.g., online social networks). The problem with such an approach, however, is the large number of queries often required (i.e., a long "burn-in time") for a random walk to reach a desired (stationary) sampling distribution. To reduce the burn-in time, we introduce a "Cross-Community Random Walk" algorithm that leverages community affiliation information. By increasing the weight of edges that cross between different communities, our random walk can move through all of the communities. We demonstrated the superiority of the Cross-Community Random Walk over traditional simple random walks through theoretical analysis and extensive experiments on real-world online social networks.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Computer science. $3 573171
650 4 $a Information science. $3 561178
655 7 $a Electronic books. $2 local $3 554714
690 $a 0984
690 $a 0723
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a The George Washington University. $b Computer Science. $3 1148676
773 0 $t Dissertation Abstracts International $g 79-12B(E).
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10825995 $z click for full text (PQDT)
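
Illustrative note on the abstract: the Cross-Community Random Walk idea summarized in the 520 fields (giving more weight to edges that cross between communities) can be sketched as a weighted random walk. This is only a minimal sketch under stated assumptions, not the thesis's algorithm: the function names, the toy graph, the community labels, and the `boost` parameter are all hypothetical, and the record does not say how the thesis sets edge weights or corrects for sampling bias.

```python
import random

def cross_community_walk(graph, community, start, steps, boost=5.0, seed=0):
    """Weighted random walk on an undirected graph.

    Assumption (not from the record): an edge whose endpoints belong to
    different communities gets weight `boost`; every other edge gets 1.0.
    With boost=1.0 this reduces to a simple random walk.
    """
    rng = random.Random(seed)
    node = start
    path = [node]
    for _ in range(steps):
        neighbors = graph[node]
        weights = [boost if community[n] != community[node] else 1.0
                   for n in neighbors]
        # Pick the next node with probability proportional to edge weight.
        node = rng.choices(neighbors, weights=weights, k=1)[0]
        path.append(node)
    return path

def node_weight(graph, community, node, boost=5.0):
    """Sum of the weights of the edges incident to `node`.

    For a random walk on a connected undirected weighted graph, the
    stationary probability of a node is proportional to this value, so
    it can be used to reweight samples drawn from the walk.
    """
    return sum(boost if community[n] != community[node] else 1.0
               for n in graph[node])

# Hypothetical toy graph: two triangles joined by the single edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
         3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

path = cross_community_walk(graph, community, start=0, steps=200)
visited = {community[n] for n in path}
print(sorted(visited))
```

In this toy graph the only cross-community edge is 2-3, so raising `boost` makes the walk more likely to leave its starting community each time it reaches node 2 or node 3. Because the boosted walk no longer samples nodes in proportion to their degree, any estimate built on its samples would need reweighting, for example by `node_weight`.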