University of Minnesota.
Improving Search via Named Entity Recognition in Morphologically Rich Languages: A Case Study in Urdu.
Record Type: Language materials, manuscript : Monograph/item
Title/Author: Improving Search via Named Entity Recognition in Morphologically Rich Languages :
Remainder of title: A Case Study in Urdu.
Author: Riaz, Kashif H.
Description: 1 online resource (249 pages)
Notes: Source: Dissertation Abstracts International, Volume: 79-08(E), Section: B.
Contained By: Dissertation Abstracts International 79-08B(E).
Subject: Computer science.
Online resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10747478 (full text, PQDT)
ISBN: 9780355807479
Thesis (Ph.D.)--University of Minnesota, 2018.
Includes bibliographical references
Search is not a solved problem, even in a world of state-of-the-art engines such as Google and Bing. Google and similar search engines are keyword based. Keyword-based searching suffers from the vocabulary mismatch problem: the terms in the document and in the user's information request do not overlap, as with "cars" and "automobiles." This phenomenon is called synonymy. Similarly, the user's term may be polysemous: a user inquires about a river's bank, but documents about financial institutions are matched. Vocabulary mismatch is exacerbated when the search occurs in a Morphologically Rich Language (MRL). Concept search techniques such as dimensionality reduction do not improve search in MRLs. Names occur frequently in news text and determine the "what," "where," "when," and "who" of a story. Named Entity Recognition (NER) attempts to recognize names automatically in text, but these techniques are far from mature for MRLs, especially Arabic-script languages. Urdu is the focus MRL of this dissertation, among Arabic, Farsi, Hindi, and Russian, but it lacks the enabling technologies for NER and search. A corpus, a stop word generation algorithm, a light stemmer, a baseline, and an NER algorithm are created so that NER-aware search can be accomplished for Urdu. This dissertation demonstrates that NER-aware search on Arabic, Russian, Urdu, and English shows significant improvement over the baseline. Furthermore, it highlights the challenges of research in low-resource MRLs.
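The vocabulary mismatch described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's method: the `keyword_match` function and the three sample documents are hypothetical, and matching is reduced to naive lowercase token overlap purely for illustration.

```python
def keyword_match(query: str, document: str) -> bool:
    """Return True if query and document share at least one lowercase token.

    Hypothetical helper: a deliberately naive stand-in for keyword search.
    """
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    return bool(query_terms & doc_terms)


# Hypothetical sample documents, invented for this illustration.
docs = {
    "autos": "used automobiles for sale",
    "finance": "the bank raised interest rates",
    "river": "fishing along the river bank",
}

# Synonymy: the relevant document is missed because "cars" != "automobiles".
synonymy_hits = [name for name, text in docs.items() if keyword_match("cars", text)]

# Polysemy: "bank" matches both the financial and the river document.
polysemy_hits = [name for name, text in docs.items() if keyword_match("bank", text)]

print(synonymy_hits)
print(polysemy_hits)
```

In this toy setting the query "cars" retrieves nothing, while "bank" retrieves documents about two unrelated senses, which are the two failure modes the abstract names.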
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018
Mode of access: World Wide Web
ISBN: 9780355807479
Subjects--Topical Terms:
573171 Computer science.
Index Terms--Genre/Form:
554714 Electronic books.
LDR  02812ntm a2200349Ki 4500
001  920910
005  20181227095853.5
006  m o u
007  cr mn||||a|a||
008  190606s2018 xx obm 000 0 eng d
020  $a 9780355807479
035  $a (MiAaPQ)AAI10747478
035  $a (MiAaPQ)umn:19008
035  $a AAI10747478
040  $a MiAaPQ $b eng $c MiAaPQ $d NTU
100  1 $a Riaz, Kashif H. $3 1195847
245  1 0 $a Improving Search via Named Entity Recognition in Morphologically Rich Languages : $b A Case Study in Urdu.
264  0 $c 2018
300  $a 1 online resource (249 pages)
336  $a text $b txt $2 rdacontent
337  $a computer $b c $2 rdamedia
338  $a online resource $b cr $2 rdacarrier
500  $a Source: Dissertation Abstracts International, Volume: 79-08(E), Section: B.
500  $a Advisers: Vipin Kumar; Blake Howald.
502  $a Thesis (Ph.D.)--University of Minnesota, 2018.
504  $a Includes bibliographical references
520  $a Search is not a solved problem, even in a world of state-of-the-art engines such as Google and Bing. Google and similar search engines are keyword based. Keyword-based searching suffers from the vocabulary mismatch problem: the terms in the document and in the user's information request do not overlap, as with "cars" and "automobiles." This phenomenon is called synonymy. Similarly, the user's term may be polysemous: a user inquires about a river's bank, but documents about financial institutions are matched. Vocabulary mismatch is exacerbated when the search occurs in a Morphologically Rich Language (MRL). Concept search techniques such as dimensionality reduction do not improve search in MRLs. Names occur frequently in news text and determine the "what," "where," "when," and "who" of a story. Named Entity Recognition (NER) attempts to recognize names automatically in text, but these techniques are far from mature for MRLs, especially Arabic-script languages. Urdu is the focus MRL of this dissertation, among Arabic, Farsi, Hindi, and Russian, but it lacks the enabling technologies for NER and search. A corpus, a stop word generation algorithm, a light stemmer, a baseline, and an NER algorithm are created so that NER-aware search can be accomplished for Urdu. This dissertation demonstrates that NER-aware search on Arabic, Russian, Urdu, and English shows significant improvement over the baseline. Furthermore, it highlights the challenges of research in low-resource MRLs.
533  $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538  $a Mode of access: World Wide Web
650  4 $a Computer science. $3 573171
650  4 $a Linguistics. $3 557829
650  4 $a Language. $3 571568
655  7 $a Electronic books. $2 local $3 554714
690  $a 0984
690  $a 0290
690  $a 0679
710  2 $a ProQuest Information and Learning Co. $3 1178819
710  2 $a University of Minnesota. $b Computer Science. $3 1180176
773  0 $t Dissertation Abstracts International $g 79-08B(E).
856  4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10747478 $z click for full text (PQDT)