National Formosa University

Improving Search via Named Entity Recognition in Morphologically Rich Languages : A Case Study in Urdu.

Record type:	Bibliographic - language material, manuscript : Monograph/item
Title proper/Author:	Improving Search via Named Entity Recognition in Morphologically Rich Languages
Other title:	A Case Study in Urdu.
Author:	Riaz, Kashif H.
Physical description:	1 online resource (249 pages)
Note:	Source: Dissertation Abstracts International, Volume: 79-08(E), Section: B.
Contained By:	Dissertation Abstracts International 79-08B(E).
Subject:	Computer science.
Electronic resource:	Full text (PQDT)
ISBN:	9780355807479

Riaz, Kashif H.

Improving Search via Named Entity Recognition in Morphologically Rich Languages : A Case Study in Urdu. -- 1 online resource (249 pages)

Source: Dissertation Abstracts International, Volume: 79-08(E), Section: B.

Thesis (Ph.D.)--University of Minnesota, 2018.

Includes bibliographical references

Search is not a solved problem, even in a world of state-of-the-art engines such as Google and Bing. Google and similar search engines are keyword-based, and keyword-based search suffers from the vocabulary mismatch problem: the terms in a document and in the user's information request do not overlap, as with "cars" and "automobiles." This phenomenon is called synonymy. Similarly, the user's term may be polysemous: a user may be asking about a river's bank, but documents about financial institutions are matched. Vocabulary mismatch is exacerbated when search takes place in a Morphologically Rich Language (MRL). Concept search techniques such as dimensionality reduction do not improve search in MRLs. Names occur frequently in news text and determine the "what," "where," "when," and "who" of a news story. Named Entity Recognition (NER) attempts to recognize names in text automatically, but NER techniques are far from mature for MRLs, especially Arabic-script languages. Urdu is the main focus among the MRLs studied in this dissertation (alongside Arabic, Farsi, Hindi, and Russian), but it lacks the enabling technologies for NER and search. A corpus, a stop-word generation algorithm, a light stemmer, a baseline, and an NER algorithm are created so that NER-aware search can be accomplished for Urdu. This dissertation demonstrates that NER-aware search on Arabic, Russian, Urdu, and English shows significant improvement over the baseline. Furthermore, it highlights the challenges of conducting research on low-resource MRLs.

Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018.

Mode of access: World Wide Web

ISBN: 9780355807479

Subjects--Topical Terms:
Computer science.

Index Terms--Genre/Form:
Electronic books.

LDR 02812ntm a2200349Ki 4500
001 920910
005 20181227095853.5
006 m o u
007 cr mn||||a|a||
008 190606s2018 xx obm 000 0 eng d
020 $a 9780355807479
035 $a (MiAaPQ)AAI10747478
035 $a (MiAaPQ)umn:19008
035 $a AAI10747478
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Riaz, Kashif H. $3 1195847
245 1 0 $a Improving Search via Named Entity Recognition in Morphologically Rich Languages : $b A Case Study in Urdu.
264 0 $c 2018
300 $a 1 online resource (249 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertation Abstracts International, Volume: 79-08(E), Section: B.
500 $a Advisers: Vipin Kumar; Blake Howald.
502 $a Thesis (Ph.D.)--University of Minnesota, 2018.
504 $a Includes bibliographical references
520 $a Search is not a solved problem, even in a world of state-of-the-art engines such as Google and Bing. Google and similar search engines are keyword-based, and keyword-based search suffers from the vocabulary mismatch problem: the terms in a document and in the user's information request do not overlap, as with "cars" and "automobiles." This phenomenon is called synonymy. Similarly, the user's term may be polysemous: a user may be asking about a river's bank, but documents about financial institutions are matched. Vocabulary mismatch is exacerbated when search takes place in a Morphologically Rich Language (MRL). Concept search techniques such as dimensionality reduction do not improve search in MRLs. Names occur frequently in news text and determine the "what," "where," "when," and "who" of a news story. Named Entity Recognition (NER) attempts to recognize names in text automatically, but NER techniques are far from mature for MRLs, especially Arabic-script languages. Urdu is the main focus among the MRLs studied in this dissertation (alongside Arabic, Farsi, Hindi, and Russian), but it lacks the enabling technologies for NER and search. A corpus, a stop-word generation algorithm, a light stemmer, a baseline, and an NER algorithm are created so that NER-aware search can be accomplished for Urdu. This dissertation demonstrates that NER-aware search on Arabic, Russian, Urdu, and English shows significant improvement over the baseline. Furthermore, it highlights the challenges of conducting research on low-resource MRLs.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Computer science. $3 573171
650 4 $a Linguistics. $3 557829
650 4 $a Language. $3 571568
655 7 $a Electronic books. $2 local $3 554714
690 $a 0984
690 $a 0290
690 $a 0679
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a University of Minnesota. $b Computer Science. $3 1180176
773 0 $t Dissertation Abstracts International $g 79-08B(E).
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10747478 $z click for full text (PQDT)