ProQuest Information and Learning Co.
Achieving consumable big data analytics by distributing data mining algorithms.
Record type:
Bibliographic - language material, manuscript : Monograph/item
Title proper/Author:
Achieving consumable big data analytics by distributing data mining algorithms./
Author:
Khalifa, Shady Samir Mohamed.
Physical description:
1 online resource (129 pages)
Note:
Source: Dissertation Abstracts International, Volume: 75-01C.
Subject:
Computer science.
Electronic resource:
click for full text (PQDT)
Thesis (Ph.D.)--Queen's University (Canada), 2017.
Includes bibliographical references
Businesses look at Big Data as an opportunity to gain insights for improving their services. Deriving such insights requires a range of data mining techniques. Mature data mining tools such as WEKA and R have been in development for years; they implement a large number of data mining algorithms and support sophisticated analytics. However, these tools are designed to run on a single machine, which makes them unsuitable for Big Data. Using them also requires knowledge of data mining and statistics, and some, like R, are hard to learn. Businesses do not always have the technical skills needed to carry out such analytics. Even when they do, it is challenging to find a tool that offers the needed algorithms and also supports distributed processing to cope with the high arrival velocity and large volume of Big Data. Businesses' analytical requirements can be addressed by Consumable Big Data Analytics, that is, solutions that allow businesses to perform Big Data Analytics themselves using their in-house expertise. In this work, we provide a Consumable Analytics solution to meet businesses' analytical needs. First, we survey existing analytics solutions to identify areas where they can be improved to provide Consumable Analytics. Second, instead of developing new distributed data mining algorithms to handle Big Data, we develop the Data Mining Distribution (DMD) algorithm and the Label-Aware Disjoint Partitioning (LADP) algorithm, which distribute the execution of all existing single-machine data mining algorithms without rewriting a single line of their code. This gives users the flexibility to use any available data mining library, makes algorithms such as Hoeffding Tree run 70% to 95% faster, and achieves up to an 18% increase in prediction accuracy. Third, we develop QDrill, a free and open-source solution that implements our DMD and LADP algorithms for distributed analytics.
QDrill implements our proposed Distributed Analytics Query Language (DAQL) interface, which adds analytics capabilities to regular SQL syntax and allows integration with Business Intelligence (BI) tools. This allows businesses to use their in-house expertise to perform Big Data Analytics through the spreadsheets and visualizations of their BI tools.
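The abstract describes distributing unmodified single-machine learners over label-aware, disjoint data partitions. The Python sketch below is a generic illustration of that idea under stated assumptions; it is not the DMD or LADP algorithm from the thesis, whose details this record does not give. It assumes rows are numeric feature tuples, deals each label's rows round-robin into k disjoint partitions so every partition sees a similar label mix, trains one copy of an ordinary single-machine learner per partition in a thread pool (standing in for cluster workers), and combines the partial models by majority vote. The names `label_aware_partition`, `distributed_fit`, `ensemble_predict`, and the toy `NearestCentroid` learner are hypothetical.

```python
"""Hypothetical sketch: label-aware disjoint partitioning plus an ensemble of
unmodified single-machine learners. Not the thesis's DMD/LADP algorithms."""
import math
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def label_aware_partition(labels, k):
    """Split row indices into k disjoint partitions, dealing each label's rows
    round-robin so that every partition receives a similar label mix."""
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)
    parts = [[] for _ in range(k)]
    slot = 0
    for indices in by_label.values():
        for i in indices:
            parts[slot % k].append(i)
            slot += 1
    return [p for p in parts if p]  # drop empty partitions when k > len(labels)


class NearestCentroid:
    """Toy stand-in for an existing single-machine learner (hypothetical)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        # Per-label mean of each feature column.
        self.centroids = {
            label: tuple(sum(col) / len(rows) for col in zip(*rows))
            for label, rows in groups.items()
        }
        return self

    def predict(self, x):
        # Return the label whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda label: math.dist(x, self.centroids[label]))


def distributed_fit(X, y, k, make_learner):
    """Train one unmodified learner per partition; threads stand in for workers."""
    parts = label_aware_partition(y, k)

    def train(indices):
        return make_learner().fit([X[i] for i in indices], [y[i] for i in indices])

    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        return list(pool.map(train, parts))


def ensemble_predict(models, x):
    """Combine the partition models by simple majority vote (an assumption)."""
    votes = Counter(model.predict(x) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
    y = ["low", "low", "low", "high", "high", "high"]
    models = distributed_fit(X, y, k=3, make_learner=NearestCentroid)
    print(ensemble_predict(models, (0.1, 0.1)))  # low
    print(ensemble_predict(models, (5.0, 5.0)))  # high
```

Because the learner is used only through `fit` and `predict`, any single-machine model with that shape could be swapped in without touching its code, which is the property the abstract emphasizes. Whether the thesis combines partial models by voting, or partitions exactly this way, is not stated in this record and is assumed here.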
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018.
Mode of access: World Wide Web
Subjects--Topical Terms:
573171
Computer science.
Index Terms--Genre/Form:
554714
Electronic books.
LDR :03325ntm a2200289K 4500
001 913941
005 20180628100931.5
006 m o u
007 cr mn||||a|a||
008 190606s2017 xx obm 000 0 eng d
035 $a (MiAaPQ)AAI10589571
035 $a (MiAaPQ)QueensUCan197415460
035 $a AAI10589571
040 $a MiAaPQ $b eng $c MiAaPQ
100 1 $a Khalifa, Shady Samir Mohamed. $3 1186976
245 1 0 $a Achieving consumable big data analytics by distributing data mining algorithms.
264 0 $c 2017
300 $a 1 online resource (129 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertation Abstracts International, Volume: 75-01C.
502 $a Thesis (Ph.D.)--Queen's University (Canada), 2017.
504 $a Includes bibliographical references
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538 $a Mode of access: World Wide Web
650 4 $a Computer science. $3 573171
655 7 $a Electronic books. $2 local $3 554714
690 $a 0984
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a Queen's University (Canada). $3 1148613
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10589571 $z click for full text (PQDT)