Toward Building End-to-End Entity Matching Solutions.
Record type:
Bibliographic - language material, manuscript : Monograph/item
Title/Author:
Toward Building End-to-End Entity Matching Solutions.
Author:
Gnanaprakash Christopher, Paul Suganthan.
Physical description:
1 online resource (135 pages)
Notes:
Source: Dissertation Abstracts International, Volume: 79-05(E), Section: B.
Contained By:
Dissertation Abstracts International 79-05B(E).
Subject:
Computer science.
Electronic resource:
click for full text (PQDT)
ISBN:
9780355590319
Thesis (Ph.D.)
Includes bibliographical references
Entity matching (EM) finds data records that refer to the same real-world entity. Numerous EM solutions have been proposed. These solutions, however, suffer from two main problems. First, they are not end-to-end. The EM workflow consists of multiple steps, such as cleaning, blocking, matching, sampling, labeling, and debugging, yet current work has focused mostly on blocking and matching, ignoring the remaining steps. Second, most current solutions are designed primarily for power users and are very difficult for lay users to use. In this dissertation I develop solutions to address these two problems. For the first problem, I work together with several colleagues to develop Magellan, an end-to-end EM solution approach. Within the context of Magellan, I develop a solution to help users extract missing attribute values from textual data (so that EM can be performed more accurately). For the second problem, I develop a solution that lay users can use to perform EM end-to-end easily on the cloud, using a cluster of machines, and optionally using crowdsourcing. I then focus on string matching, a special case of EM, and develop an effective end-to-end solution for lay users. Finally, I describe how the above solutions have been implemented (mostly as open-source software) and deployed to solve real-world applications. In particular, the open-source implementation of several solutions has been deployed on Kaggle, a large and well-known data science and competition platform with well over 0.5M users.
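The blocking and matching steps named in the abstract can be illustrated with a minimal sketch. This is not the dissertation's method and does not use Magellan's API. The function names, the first-letter blocking rule, the 0.8 threshold, and the sample records are all assumptions made for illustration, and the similarity measure is the standard library's `difflib.SequenceMatcher` rather than anything the record describes.

```python
from difflib import SequenceMatcher


def block_key(record):
    # Hypothetical blocking rule: records can only match if their
    # names share the same first letter (case-insensitive).
    return record["name"].strip().lower()[:1]


def similarity(a, b):
    # Character-level similarity in [0, 1] from the standard library.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(table_a, table_b, threshold=0.8):
    # Blocking: index table_b by key so each record in table_a is
    # compared only against candidates in the same block.
    blocks = {}
    for rb in table_b:
        blocks.setdefault(block_key(rb), []).append(rb)

    # Matching: keep candidate pairs whose similarity meets the threshold.
    matches = []
    for ra in table_a:
        for rb in blocks.get(block_key(ra), []):
            if similarity(ra["name"], rb["name"]) >= threshold:
                matches.append((ra["id"], rb["id"]))
    return matches


# Illustrative records (made up for this sketch).
table_a = [
    {"id": "a1", "name": "Dave Smith"},
    {"id": "a2", "name": "Joe Wilson"},
]
table_b = [
    {"id": "b1", "name": "Dave Smith Jr."},
    {"id": "b2", "name": "Dan Brown"},
    {"id": "b3", "name": "Joe Wilson"},
]

print(match(table_a, table_b))
```

Blocking trades recall for speed: a pair whose names start with different letters is never compared, which is one reason the abstract lists debugging as its own workflow step.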
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018
Mode of access: World Wide Web
ISBN: 9780355590319
Subjects--Topical Terms: 573171 Computer science.
Index Terms--Genre/Form: 554714 Electronic books.
LDR 02778ntm a2200337Ki 4500
001 909230
005 20180419121558.5
006 m o u
007 cr mn||||a|a||
008 190606s2018 xx obm 000 0 eng d
020 $a 9780355590319
035 $a (MiAaPQ)AAI10743625
035 $a (MiAaPQ)wisc:15104
035 $a AAI10743625
040 $a MiAaPQ $b eng $c MiAaPQ
099 $a TUL $f hyy $c available through World Wide Web
100 1 $a Gnanaprakash Christopher, Paul Suganthan. $3 1179877
245 10 $a Toward Building End-to-End Entity Matching Solutions.
264 0 $c 2018
300 $a 1 online resource (135 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertation Abstracts International, Volume: 79-05(E), Section: B.
500 $a Adviser: AnHai Doan.
502 $a Thesis (Ph.D.) $c The University of Wisconsin - Madison $d 2018.
504 $a Includes bibliographical references
520 $a Entity matching (EM) finds data records that refer to the same real-world entity. Numerous EM solutions have been proposed. These solutions however suffer from two main problems. First, they are not end-to-end. That is, the EM workflow consists of multiple steps, such as cleaning, blocking, matching, sampling, labeling, debugging, etc. Current work however has focused mostly on blocking and matching, ignoring the remaining steps. Second, most current works are designed primarily for power users. They are very difficult for lay users to use. In this dissertation I develop solutions to address the above two problems. For the first problem, I work together with several colleagues to develop Magellan, an end-to-end EM solution approach. Within the context of Magellan, I develop a solution to help users extract missing attribute values from textual data (so that EM can be performed more accurately). For the second problem, I develop a solution that lay users can use to perform EM end-to-end easily on the cloud, using a cluster of machines, and optionally using crowdsourcing. I then focus on string matching, a special case of EM, and develop an effective end-to-end solution for lay users. Finally, I describe how the above solutions have been implemented (mostly as open-source software) and deployed to solve real-world applications. The open-source implementation of several solutions in particular has been deployed on Kaggle, a large and well-known data science and competition platform with well over 0.5M users.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Computer science. $3 573171
655 7 $a Electronic books. $2 local $3 554714
690 $a 0984
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a The University of Wisconsin - Madison. $b Computer Sciences. $3 1179878
773 0 $t Dissertation Abstracts International $g 79-05B(E).
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10743625 $z click for full text (PQDT)