Natural Language Processing and Network-Based Methods for Studying Social Media Communities.
Record type:
Bibliographic - language material, manuscript : Monograph/item
Title/Author:
Natural Language Processing and Network-Based Methods for Studying Social Media Communities.
Author:
Lai, Angela.
Physical description:
1 online resource (120 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 85-04, Section: B.
Contained By:
Dissertations Abstracts International, 85-04B.
Subject:
Computer science.
Electronic resource:
click for full text (PQDT)
ISBN:
9798380619745
Lai, Angela.
Natural Language Processing and Network-Based Methods for Studying Social Media Communities.
- 1 online resource (120 pages)
Source: Dissertations Abstracts International, Volume: 85-04, Section: B.
Thesis (Ph.D.)--New York University, 2023.
Includes bibliographical references
Social media has democratized access to information and facilitated community building and discussions with people from all across the world. As these online communities and discussions gained prominence for their influence on politics and current events, so too grew the need to understand user and information dynamics in those spheres. Studying these at scale using large volumes of social media data requires quantitative methods. In this dissertation, we develop and apply natural language processing and network-based methods to examine questions about echo chambers, bot activity, and conspiracy information diets using social media data. Though each chapter focuses on a different substantive area, they share the aim of understanding information and community dynamics on social media at a large scale.

Chapter 1 presents a fast and scalable method for estimating the ideology of political YouTube videos. The subfield of estimating ideology as a latent variable has often focused on traditional actors such as legislators, while more recent work has used social media data to estimate the ideology of ordinary users, political elites, and media sources. We build on this work to estimate the ideology of a political YouTube video. First, we start with a matrix of political Reddit posts linking to YouTube videos and apply correspondence analysis to place those videos in an ideological space. Second, we train a language model with those estimated ideologies as training labels, enabling us to estimate the ideologies of videos not posted on Reddit. These predicted ideologies are then validated against human labels. We demonstrate the utility of this method by applying it to the watch histories of survey respondents to evaluate the prevalence of echo chambers on YouTube, as well as the association between video ideology and viewer engagement. Our approach gives video-level scores based only on supplied text metadata, is scalable, and can be easily adjusted to account for changes in the ideological landscape. This method enabled our analysis of the YouTube recommendation algorithm's contribution to rabbit holes, echo chambers, and system-wide ideological bias in M. A. Brown et al. (2022). This work was co-authored with Megan Brown and James Bisbee.

In Chapter 2, we develop context-specific classifiers to study bot behavior on Twitter during the 2016 U.S. presidential election. We specifically examine the hypothesis that bots disproportionately favored certain candidates. Using a context-specific classifier trained on human-labeled data, we categorize over six million Twitter accounts as bots or non-bots. We also train a language model to label monthly account stance toward relevant presidential candidates: Hillary Clinton, Bernie Sanders, and Donald Trump. The predicted labels allow us to compare bots and non-bots based on their expressed support of Clinton vs. Trump. We find that bots tend to tweet neutrally about candidates at higher rates than non-bots and that a higher proportion of bots mentioning Trump do so in a positive manner when compared to non-bots. Further, the ratio of Clinton partisan to Trump partisan accounts is far higher among non-bots than among bots, so the presence of bots might have attenuated the presence of Clinton partisan accounts on the overall platform. We also examine the "strategies" employed by bot and non-bot accounts by looking at whether they tweet purely positively or negatively about a candidate. Though higher proportions of bot accounts tend to employ the simple strategy of either cheerleading or campaigning against a candidate, partisanship appears to have the largest influence on an account's apparent strategies, as bots and non-bots on the same side are more similar overall when compared to bots or non-bots supporting different candidates. Overall, despite our conservative approach to bot detection, we find meaningful differences in attitudes toward candidates among bots and non-bots during the 2016 election. This work was co-authored with Denis Stukal and Zhanna Terechshenko.

Chapter 3 compares the types of information posted in conspiracy and mainstream subreddits (Reddit communities centered around shared interests and beliefs) to better understand the contributors to the diverging narratives in each. We use a network-based approach to expand existing sets of labeled conspiracy and mainstream subreddits to enable a comprehensive comparison of conspiracy and mainstream information diets and preferences. For a more valid content comparison, we align posts at the topic level and obtain topic labels from a transformer-based topic model. We find that, relative to mainstream posts, conspiracy posts have higher levels of negative sentiment and a higher proportion of links to other social media domains. Domains shared in conspiracy posts tend to have lower quality ratings, and the mean political ideology of domains and YouTube videos shared in conspiracy posts is more conservative. This suggests that conspiracy and mainstream communities have different information preferences. Additionally, when we compare how these communities respond to posts linking to domains of similar quality, we find that for certain topics, higher-quality domains are more negatively received in conspiracy posts.
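The first step described for Chapter 1 (placing videos in an ideological space via correspondence analysis) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function name `correspondence_analysis`, the toy count matrix, and the choice of rows as sharing contexts (for example, subreddits) are all assumptions made for illustration. The sketch uses the standard SVD formulation of correspondence analysis with NumPy.

```python
import numpy as np

def correspondence_analysis(counts, n_dims=1):
    """Return principal coordinates for rows and columns of a count matrix."""
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: scale singular vectors by singular values
    # and divide by the square root of the masses.
    row_coords = (U[:, :n_dims] * sigma[:n_dims]) / np.sqrt(r)[:, None]
    col_coords = (Vt.T[:, :n_dims] * sigma[:n_dims]) / np.sqrt(c)[:, None]
    return row_coords, col_coords

# Hypothetical toy data: rows are sharing contexts, columns are videos,
# entries count how often each video was linked in each context.
counts = np.array([
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [0, 0, 4, 5],
])

row_scores, video_scores = correspondence_analysis(counts)
print(video_scores.ravel())
```

In this toy matrix the first dimension separates videos 0 and 1 from videos 2 and 3. The sign of the axis is arbitrary, so which end counts as "left" or "right" would have to be fixed against known anchors. Scores of this kind are what the abstract describes as training labels for the second-step language model.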
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2024
Mode of access: World Wide Web
ISBN: 9798380619745
Subjects--Topical Terms:
573171
Computer science.
Subjects--Index Terms:
Ideology estimation
Index Terms--Genre/Form:
554714
Electronic books.
LDR
:06722ntm a22003977 4500
001
1147445
005
20240918072812.5
006
m o d
007
cr bn ---uuuuu
008
250605s2023 xx obm 000 0 eng d
020
$a
9798380619745
035
$a
(MiAaPQ)AAI30632875
035
$a
AAI30632875
040
$a
MiAaPQ
$b
eng
$c
MiAaPQ
$d
NTU
100
1
$a
Lai, Angela.
$3
1473182
245
1 0
$a
Natural Language Processing and Network-Based Methods for Studying Social Media Communities.
264
0
$c
2023
300
$a
1 online resource (120 pages)
336
$a
text
$b
txt
$2
rdacontent
337
$a
computer
$b
c
$2
rdamedia
338
$a
online resource
$b
cr
$2
rdacarrier
500
$a
Source: Dissertations Abstracts International, Volume: 85-04, Section: B.
500
$a
Advisor: Bonneau, Richard;Tucker, Joshua A.
502
$a
Thesis (Ph.D.)--New York University, 2023.
504
$a
Includes bibliographical references
533
$a
Electronic reproduction.
$b
Ann Arbor, Mich. :
$c
ProQuest,
$d
2024
538
$a
Mode of access: World Wide Web
650
4
$a
Computer science.
$3
573171
650
4
$a
Political science.
$3
558774
650
4
$a
Web studies.
$3
1148502
653
$a
Ideology estimation
653
$a
Natural language processing
653
$a
Social media
653
$a
Social networks
653
$a
Reddit communities
655
7
$a
Electronic books.
$2
local
$3
554714
690
$a
0984
690
$a
0615
690
$a
0646
710
2
$a
ProQuest Information and Learning Co.
$3
1178819
710
2
$a
New York University.
$b
Center for Data Science.
$3
1468521
773
0
$t
Dissertations Abstracts International
$g
85-04B.
856
4 0
$u
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30632875
$z
click for full text (PQDT)