National Formosa University

Towards Population of Knowledge Bases From Conversational Sources.

Record type:	Bibliographic - language material, manuscript : Monograph/item
Title/Author:	Towards Population of Knowledge Bases From Conversational Sources.
Author:	Gao, Ning.
Physical description:	1 online resource (181 pages)
Notes:	Source: Dissertation Abstracts International, Volume: 79-12(E), Section: A.
Contained By:	Dissertation Abstracts International 79-12A(E).
Subject:	Information science.
Electronic resource:	click for full text (PQDT)
ISBN:	9780438144545

Thesis (Ph.D.)--University of Maryland, College Park, 2018.

Includes bibliographical references

With an increasing amount of data created daily, it is challenging for users to organize and discover information from massive collections of digital content (e.g., text and speech). The population of knowledge bases requires linking information from unstructured sources (e.g., news articles and web pages) to structured external knowledge bases (e.g., Wikipedia), which has the potential to advance information archiving and access, and to support knowledge discovery and reasoning. Because of the complexity of this task, knowledge base population is composed of multiple sub-tasks, including the entity linking task, defined as linking mentions of entities (e.g., persons, organizations, and locations) found in documents to their referents in external knowledge bases, and the event task, defined as extracting related information for events that should be entered in the knowledge base.

Electronic reproduction.
Ann Arbor, Mich. :
ProQuest,
2018

Mode of access: World Wide Web

ISBN: 9780438144545
Subjects--Topical Terms:

561178
Information science.
Index Terms--Genre/Form:

554714
Electronic books.

Towards Population of Knowledge Bases From Conversational Sources.
LDR:04744ntm a2200349Ki 4500
001 919160
005 20181116131021.5
006 m o u
007 cr mn||||a|a||
008 190606s2018 xx obm 000 0 eng d
020 $a 9780438144545
035 $a (MiAaPQ)AAI10784172
035 $a (MiAaPQ)umd:18823
035 $a AAI10784172
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Gao, Ning. $3 1193669
245 1 0 $a Towards Population of Knowledge Bases From Conversational Sources.
264 0 $c 2018
300 $a 1 online resource (181 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertation Abstracts International, Volume: 79-12(E), Section: A.
500 $a Adviser: Douglas W. Oard.
502 $a Thesis (Ph.D.)--University of Maryland, College Park, 2018.
504 $a Includes bibliographical references
520 $a With an increasing amount of data created daily, it is challenging for users to organize and discover information from massive collections of digital content (e.g., text and speech). The population of knowledge bases requires linking information from unstructured sources (e.g., news articles and web pages) to structured external knowledge bases (e.g., Wikipedia), which has the potential to advance information archiving and access, and to support knowledge discovery and reasoning. Because of the complexity of this task, knowledge base population is composed of multiple sub-tasks, including the entity linking task, defined as linking mentions of entities (e.g., persons, organizations, and locations) found in documents to their referents in external knowledge bases, and the event task, defined as extracting related information for events that should be entered in the knowledge base.
520 $a Most prior work on tasks related to knowledge base population has focused on dissemination-oriented sources written in the third person (e.g., news articles) that benefit from two characteristics: the content is written in formal language and is to some degree self-contextualized, and the entities mentioned (e.g., persons) are likely to be widely known to the public, so that rich information can be found in existing general knowledge bases (e.g., Wikipedia and DBpedia). The work proposed in this thesis focuses on tasks related to knowledge base population for conversational sources written in the first person (e.g., emails and phone recordings), which pose new challenges. One challenge is that most conversations (e.g., 68% of the person names and 53% of the organization names in Enron emails) refer to entities that are known to the conversational participants but not widely known. Thus, existing entity linking techniques relying on general knowledge bases are not appropriate. Another challenge is that some of the shared context between participants in first-person conversations may be implicit and thus challenging to model, increasing the difficulty, even for human annotators, of identifying the true referents.
520 $a This thesis focuses on several tasks relating to the population of knowledge bases for conversational content: the population of collection-specific knowledge bases for organization entities and meetings from email collections; the entity linking task that resolves mentions of three types of entities (person, organization, and location) found in both conversational text (emails) and speech (phone recordings) sources to multiple knowledge bases, including a general knowledge base built from Wikipedia and collection-specific knowledge bases; the meeting linking task that links meeting-related email messages to the referenced meeting entries in the collection-specific meeting knowledge base; and speaker identification techniques to improve the entity linking task for phone recordings without known speakers. Following the model-based evaluation paradigm, three collections (namely, Enron emails, Avocado emails, and Enron phone recordings) are used as the representations of conversational sources, new test collections are created for each task, and experiments are conducted for each task to evaluate the efficacy of the proposed methods and to provide a comparison to existing state-of-the-art systems. This work has implications for the research fields of e-discovery, scientific collaboration, speaker identification, speech retrieval, and privacy protection.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Information science. $3 561178
655 7 $a Electronic books. $2 local $3 554714
690 $a 0723
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a University of Maryland, College Park. $b Library & Information Services. $3 1185305
773 0 $t Dissertation Abstracts International $g 79-12A(E).
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10784172 $z click for full text (PQDT)