University of Minnesota.
Improving Search via Named Entity Recognition in Morphologically Rich Languages: A Case Study in Urdu.
Record type:
Bibliographic - language material, manuscript : Monograph/item
Title/Author:
Improving Search via Named Entity Recognition in Morphologically Rich Languages :/
Other title:
A Case Study in Urdu.
Author:
Riaz, Kashif H.
Extent:
1 online resource (249 pages)
Notes:
Source: Dissertation Abstracts International, Volume: 79-08(E), Section: B.
Contained By:
Dissertation Abstracts International79-08B(E).
Subject:
Computer science. -
Electronic resource:
click for full text (PQDT)
ISBN:
9780355807479
Thesis (Ph.D.)--University of Minnesota, 2018.
Includes bibliographical references
Search is not a solved problem, even in a world of state-of-the-art engines such as Google and Bing. Google and similar search engines are keyword based. Keyword-based searching suffers from the vocabulary mismatch problem: the terms in a document and in the user's information request do not overlap. For example, a query for "cars" fails to match documents that say "automobiles." This phenomenon is called synonymy. Similarly, the user's term may be polysemous: a user inquires about a river's bank, but documents about financial institutions are matched. Vocabulary mismatch is exacerbated when the search occurs in a Morphologically Rich Language (MRL). Concept search techniques such as dimensionality reduction do not improve search in Morphologically Rich Languages. Names occur frequently in news text and determine the "what," "where," "when," and "who" of the news text. Named Entity Recognition (NER) attempts to recognize names automatically in text, but these techniques are far from mature in MRLs, especially in Arabic-script languages. Urdu is one of the focus MRLs of this dissertation, along with Arabic, Farsi, Hindi, and Russian, but it does not have the enabling technologies for NER and search. A corpus, a stop word generation algorithm, a light stemmer, a baseline, and an NER algorithm are created so that NER-aware search can be accomplished for Urdu. This dissertation demonstrates that NER-aware search on Arabic, Russian, Urdu, and English shows significant improvement over the baseline. Furthermore, this dissertation highlights the challenges of research in low-resource MRLs.
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018
Mode of access: World Wide Web
ISBN: 9780355807479
Subjects--Topical Terms:
573171
Computer science.
Index Terms--Genre/Form:
554714
Electronic books.
LDR :02812ntm a2200349Ki 4500
001 920910
005 20181227095853.5
006 m o u
007 cr mn||||a|a||
008 190606s2018 xx obm 000 0 eng d
020 $a 9780355807479
035 $a (MiAaPQ)AAI10747478
035 $a (MiAaPQ)umn:19008
035 $a AAI10747478
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Riaz, Kashif H. $3 1195847
245 1 0 $a Improving Search via Named Entity Recognition in Morphologically Rich Languages : $b A Case Study in Urdu.
264 0 $c 2018
300 $a 1 online resource (249 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertation Abstracts International, Volume: 79-08(E), Section: B.
500 $a Advisers: Vipin Kumar; Blake Howald.
502 $a Thesis (Ph.D.)--University of Minnesota, 2018.
504 $a Includes bibliographical references
520 $a
Search is not a solved problem, even in a world of state-of-the-art engines such as Google and Bing. Google and similar search engines are keyword based. Keyword-based searching suffers from the vocabulary mismatch problem: the terms in a document and in the user's information request do not overlap. For example, a query for "cars" fails to match documents that say "automobiles." This phenomenon is called synonymy. Similarly, the user's term may be polysemous: a user inquires about a river's bank, but documents about financial institutions are matched. Vocabulary mismatch is exacerbated when the search occurs in a Morphologically Rich Language (MRL). Concept search techniques such as dimensionality reduction do not improve search in Morphologically Rich Languages. Names occur frequently in news text and determine the "what," "where," "when," and "who" of the news text. Named Entity Recognition (NER) attempts to recognize names automatically in text, but these techniques are far from mature in MRLs, especially in Arabic-script languages. Urdu is one of the focus MRLs of this dissertation, along with Arabic, Farsi, Hindi, and Russian, but it does not have the enabling technologies for NER and search. A corpus, a stop word generation algorithm, a light stemmer, a baseline, and an NER algorithm are created so that NER-aware search can be accomplished for Urdu. This dissertation demonstrates that NER-aware search on Arabic, Russian, Urdu, and English shows significant improvement over the baseline. Furthermore, this dissertation highlights the challenges of research in low-resource MRLs.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Computer science. $3 573171
650 4 $a Linguistics. $3 557829
650 4 $a Language. $3 571568
655 7 $a Electronic books. $2 local $3 554714
690 $a 0984
690 $a 0290
690 $a 0679
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a University of Minnesota. $b Computer Science. $3 1180176
773 0 $t Dissertation Abstracts International $g 79-08B(E).
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10747478 $z click for full text (PQDT)