國立虎尾科技大學 (National Formosa University)

Methods for Extracting Data from the Internet.

Record Type:	Language materials, manuscript : Monograph/item
Title/Author:	Methods for Extracting Data from the Internet./
Author:	Willers, Joel.
Description:	1 online resource (102 pages)
Notes:	Source: Masters Abstracts International, Volume: 57-02.
Contained By:	Masters Abstracts International 57-02(E).
Subject:	Sociology.
Online resource:	click for full text (PQDT)
ISBN:	9780355337754

Thesis (M.S.)

Includes bibliographical references

The advent of the Internet has yielded exciting new opportunities for the collection of large amounts of structured and unstructured social scientific data. This thesis describes two methods for harvesting such data from websites and web services: web scraping and connecting to an application programming interface (API). I describe the development and implementation of tools for each of these methods. In my review of the two related yet distinct data collection methods, I provide concrete examples of each. To illustrate the first method, 'scraping' data from publicly available data repositories (specifically the Google Books Ngram Corpus), I developed a tool and made it available to the public on a website. The Google Books Ngram Corpus contains groups of words used in millions of books that were digitized and catalogued. The corpus has been made available for public use, but in its current form, accessing the data is tedious, time-consuming, and error-prone. For the second method, utilizing an API from a web service (specifically the Twitter Streaming API), I used a code library and the R programming language to develop a program that connects to the Twitter API to collect public posts known as tweets. I review prior studies that have used these data, after which I report results from a case study involving references to countries. The relative prestige of nations is compared based on the frequency of mentions in English literature and mentions in tweets.

Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018

Mode of access: World Wide Web

ISBN: 9780355337754

Subjects--Topical Terms:

551705
Sociology.
Index Terms--Genre/Form:

554714
Electronic books.

LDR:02703ntm a2200349Ki 4500
001 909874
005 20180426091049.5
006 m o u
007 cr mn||||a|a||
008 190606s2017 xx obm 000 0 eng d
020 $a 9780355337754
035 $a (MiAaPQ)AAI10605965
035 $a (MiAaPQ)iastate:16725
035 $a AAI10605965
040 $a MiAaPQ $b eng $c MiAaPQ
099 $a TUL $f hyy $c available through World Wide Web
100 1 $a Willers, Joel. $3 1180865
245 1 0 $a Methods for Extracting Data from the Internet.
264 0 $c 2017
300 $a 1 online resource (102 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Masters Abstracts International, Volume: 57-02.
500 $a Adviser: Shawn Dorius.
502 $a Thesis (M.S.) $c Iowa State University $d 2017.
504 $a Includes bibliographical references
520 $a The advent of the Internet has yielded exciting new opportunities for the collection of large amounts of structured and unstructured social scientific data. This thesis describes two methods for harvesting such data from websites and web services: web scraping and connecting to an application programming interface (API). I describe the development and implementation of tools for each of these methods. In my review of the two related yet distinct data collection methods, I provide concrete examples of each. To illustrate the first method, 'scraping' data from publicly available data repositories (specifically the Google Books Ngram Corpus), I developed a tool and made it available to the public on a website. The Google Books Ngram Corpus contains groups of words used in millions of books that were digitized and catalogued. The corpus has been made available for public use, but in its current form, accessing the data is tedious, time-consuming, and error-prone. For the second method, utilizing an API from a web service (specifically the Twitter Streaming API), I used a code library and the R programming language to develop a program that connects to the Twitter API to collect public posts known as tweets. I review prior studies that have used these data, after which I report results from a case study involving references to countries. The relative prestige of nations is compared based on the frequency of mentions in English literature and mentions in tweets.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Sociology. $3 551705
650 4 $a Web studies. $3 1148502
655 7 $a Electronic books. $2 local $3 554714
690 $a 0626
690 $a 0646
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a Iowa State University. $b Sociology. $3 1180866
773 0 $t Masters Abstracts International $g 57-02(E).
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10605965 $z click for full text (PQDT)