Scalable Nonlinear Spectral Dimensionality Reduction Methods for Streaming Data.
Record Type:
Bibliographic - Language material, manuscript : Monograph/item
Title/Author:
Scalable Nonlinear Spectral Dimensionality Reduction Methods for Streaming Data.
Author:
Mahapatra, Suchismit.
Description:
1 online resource (125 pages)
Notes:
Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.
Contained By:
Dissertation Abstracts International, 79-10B(E).
Subject:
Computer science.
Electronic Resource:
click for full text (PQDT)
ISBN:
9780438050020
Scalable Nonlinear Spectral Dimensionality Reduction Methods for Streaming Data.
LDR
:04546ntm a2200349Ki 4500
001
919212
005
20181116131021.5
006
m o u
007
cr mn||||a|a||
008
190606s2018 xx obm 000 0 eng d
020
$a
9780438050020
035
$a
(MiAaPQ)AAI10823954
035
$a
(MiAaPQ)buffalo:15849
035
$a
AAI10823954
040
$a
MiAaPQ
$b
eng
$c
MiAaPQ
$d
NTU
100
1
$a
Mahapatra, Suchismit.
$3
1193728
245
1 0
$a
Scalable Nonlinear Spectral Dimensionality Reduction Methods for Streaming Data.
264
0
$c
2018
300
$a
1 online resource (125 pages)
336
$a
text
$b
txt
$2
rdacontent
337
$a
computer
$b
c
$2
rdamedia
338
$a
online resource
$b
cr
$2
rdacarrier
500
$a
Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.
500
$a
Adviser: Varun Chandola.
502
$a
Thesis (Ph.D.)--State University of New York at Buffalo, 2018.
504
$a
Includes bibliographical references
520
$a
High-dimensional data is inherently difficult to explore and analyze owing to the "curse of dimensionality," which renders many statistical and Machine Learning (ML) techniques (e.g., clustering, classification, and model fitting) inadequate. In this context, nonlinear spectral dimensionality reduction (NLSDR) methods have proved to be an indispensable tool. However, standard NLSDR methods, e.g., Isomap or Locally Linear Embedding (LLE), were designed for offline or batch processing. Consequently, they are computationally too expensive or impractical in cases where dimensionality reduction must be applied to a data stream. Processing data streams efficiently with standard approaches is also challenging in general, given that streams require real-time processing and cannot be stored permanently. Any form of analysis, including NLSDR and concept-drift detection, requires an adequate summarization that can cope with these inherent constraints and approximate the characteristics of the stream well. In spite of advances in hardware and the development of novel processing frameworks, the scalability of ML algorithms remains an issue. The scalability of an algorithm is measured by how its performance is affected as the problem size increases. Scalable algorithms should be able to work with any amount of data without consuming ever-growing amounts of memory and computation. The challenge is often to find a trade-off between quality and processing time, i.e., obtaining "good enough" solutions as quickly and efficiently as possible.
520
$a
In this thesis, I propose a generalized framework for streaming NLSDR that can work with different manifold learning approaches, e.g., Isomap and LLE, and can deal effectively with data streams whose underlying distributions may be multi-modal and non-uniformly sampled. In particular, I developed streaming Isomap, or S-Isomap, an algorithm that, via a clever approximation, discovers the low-dimensional embedding scalably at a fraction of the computational cost without significantly affecting quality.
520
$a
However, S-Isomap was limited in its scope, i.e., it could only deal with unimodal, uniformly sampled distributions. Hence arose the need for S-Isomap++, which remedies the flaws of its predecessor by handling multi-modal and/or unevenly sampled distributions. However, S-Isomap++ can only detect manifolds that it encounters in its batch learning phase, not those it might encounter in the streaming phase. Thus, S-Isomap++ ceases to "learn" and evolve in a way that would limit the embedding error for points in the data stream. This motivated GP-Isomap, which, via a novel positive-definite geodesic-distance-based kernel and by using Gaussian Processes to measure variance, is able to detect concept-drift, i.e., to distinguish among different manifolds and embed streaming samples effectively. Subsequently, we developed the streaming LLE algorithm for processing streams using LLE, and we discuss a generalized Out-of-Sample Extension methodology for streaming NLSDR that is applicable to different manifold learning algorithms. Lastly, we provide theoretical bounds for S-Isomap and GP-Isomap as part of this work.
533
$a
Electronic reproduction.
$b
Ann Arbor, Mich. :
$c
ProQuest,
$d
2018
538
$a
Mode of access: World Wide Web
650
4
$a
Computer science.
$3
573171
655
7
$a
Electronic books.
$2
local
$3
554714
690
$a
0984
710
2
$a
ProQuest Information and Learning Co.
$3
1178819
710
2
$a
State University of New York at Buffalo.
$b
Computer Science and Engineering.
$3
1180201
773
0
$t
Dissertation Abstracts International
$g
79-10B(E).
856
4 0
$u
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10823954
$z
click for full text (PQDT)