Reale, Christopher.
Multimodal Approaches to Computer Vision Problems.
Record type:
Bibliographic record - Language material, manuscript : Monograph/item
Title / Author:
Multimodal Approaches to Computer Vision Problems.
Author:
Reale, Christopher.
Physical description:
1 online resource (136 pages)
Notes:
Source: Dissertation Abstracts International, Volume: 79-07(E), Section: B.
Contained By:
Dissertation Abstracts International, 79-07B(E).
Subject:
Electrical engineering.
Electronic resources:
click for full text (PQDT)
ISBN:
9780355636154
Multimodal Approaches to Computer Vision Problems.
Reale, Christopher.
Multimodal Approaches to Computer Vision Problems.
- 1 online resource (136 pages)
Source: Dissertation Abstracts International, Volume: 79-07(E), Section: B.
Thesis (Ph.D.)--University of Maryland, College Park, 2017.
Includes bibliographical references
The goal of computer vision research is to automatically extract high-level information from images and videos. The vast majority of this research focuses specifically on visible light imagery. In this dissertation, we present approaches to computer vision problems that incorporate data obtained from alternative modalities including thermal infrared imagery, near-infrared imagery, and text. We consider approaches where other modalities are used in place of visible imagery as well as approaches that use other modalities to improve the performance of traditional computer vision algorithms. The bulk of this dissertation focuses on Heterogeneous Face Recognition (HFR). HFR is a variant of face recognition where the probe and gallery face images are obtained with different sensing modalities. We also present a method to incorporate text information into human activity recognition algorithms.
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2018.
Mode of access: World Wide Web
ISBN: 9780355636154
Subjects--Topical Terms:
Electrical engineering.
Index Terms--Genre/Form:
Electronic books.
Multimodal Approaches to Computer Vision Problems.
LDR
:05727ntm a2200373Ki 4500
001
920672
005
20181203094031.5
006
m o u
007
cr mn||||a|a||
008
190606s2017 xx obm 000 0 eng d
020
$a
9780355636154
035
$a
(MiAaPQ)AAI10640914
035
$a
(MiAaPQ)umd:18562
035
$a
AAI10640914
040
$a
MiAaPQ
$b
eng
$c
MiAaPQ
$d
NTU
100
1
$a
Reale, Christopher.
$3
1195539
245
1 0
$a
Multimodal Approaches to Computer Vision Problems.
264
0
$c
2017
300
$a
1 online resource (136 pages)
336
$a
text
$b
txt
$2
rdacontent
337
$a
computer
$b
c
$2
rdamedia
338
$a
online resource
$b
cr
$2
rdacarrier
500
$a
Source: Dissertation Abstracts International, Volume: 79-07(E), Section: B.
500
$a
Adviser: Rama Chellappa.
502
$a
Thesis (Ph.D.)--University of Maryland, College Park, 2017.
504
$a
Includes bibliographical references
520
$a
The goal of computer vision research is to automatically extract high-level information from images and videos. The vast majority of this research focuses specifically on visible light imagery. In this dissertation, we present approaches to computer vision problems that incorporate data obtained from alternative modalities including thermal infrared imagery, near-infrared imagery, and text. We consider approaches where other modalities are used in place of visible imagery as well as approaches that use other modalities to improve the performance of traditional computer vision algorithms. The bulk of this dissertation focuses on Heterogeneous Face Recognition (HFR). HFR is a variant of face recognition where the probe and gallery face images are obtained with different sensing modalities. We also present a method to incorporate text information into human activity recognition algorithms.
520
$a
We first present a kernel task-driven coupled dictionary model to represent the data across multiple domains for thermal infrared HFR. We extend a linear coupled dictionary model to use the kernel method to process the signals in a high dimensional space; this effectively enables the dictionaries to represent the data non-linearly in the original feature space. We further improve the model by making the dictionaries task-driven. This allows us to tune the dictionaries to perform well on the classification task at hand rather than the standard reconstruction task. We show that our algorithms outperform algorithms based on standard coupled dictionaries on three datasets for thermal infrared to visible face recognition.
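As an illustration of the coupled-dictionary idea, the sketch below learns one dictionary over stacked visible/thermal feature pairs so that both modalities share a sparse code, then matches a thermal probe to visible gallery images in code space. This shows only the linear starting point the abstract describes; the kernel and task-driven extensions are not shown, and the feature dimensions, atom count, and sparsity settings are assumed values, not details from the dissertation.

```python
# Minimal linear coupled dictionary sketch (assumed sizes; random stand-in data).
import numpy as np
from sklearn.decomposition import DictionaryLearning, SparseCoder

rng = np.random.default_rng(0)
n_pairs, d_vis, d_thm, n_atoms = 200, 64, 64, 32

# Paired visible / thermal feature vectors (random stand-ins for face features).
X_vis = rng.standard_normal((n_pairs, d_vis))
X_thm = rng.standard_normal((n_pairs, d_thm))

# Learning one dictionary over the stacked pair forces both modalities to share
# a sparse code; the first and last 64 columns of each atom form the coupled
# visible and thermal dictionaries D_v and D_t.
X_joint = np.hstack([X_vis, X_thm])
dl = DictionaryLearning(n_components=n_atoms, transform_algorithm="lasso_lars",
                        transform_alpha=0.1, max_iter=20, random_state=0)
dl.fit(X_joint)
D_v, D_t = dl.components_[:, :d_vis], dl.components_[:, d_vis:]

# At test time a thermal probe is coded against D_t alone, and its shared code
# is matched to gallery codes computed against D_v.
probe_code = SparseCoder(D_t, transform_algorithm="lasso_lars",
                         transform_alpha=0.1).transform(X_thm[:1])
gallery_codes = SparseCoder(D_v, transform_algorithm="lasso_lars",
                            transform_alpha=0.1).transform(X_vis)
match = int(np.argmax(gallery_codes @ probe_code.ravel()))
```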
520
$a
Next, we present a deep learning-based approach to near-infrared (NIR) HFR. Most approaches to HFR involve modeling the relationship between corresponding images from the visible and sensing domains. Due to data constraints, this is typically done at the patch level and/or with shallow models to prevent overfitting. In this approach, rather than modeling local patches or using a simple model, we use a complex, deep model to learn the relationship between the entirety of cross-modal face images. We describe a deep convolutional neural network-based method that leverages a large visible image face dataset to prevent overfitting. We present experimental results on two benchmark data sets showing its effectiveness.
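The transfer-learning pattern described here can be sketched as follows: a backbone pretrained on abundant visible-light imagery is fine-tuned so that paired NIR and visible faces map to nearby points in a shared embedding. The backbone choice, embedding size, and the simple pairwise loss are illustrative assumptions rather than the dissertation's actual architecture.

```python
# Hedged sketch: pretrained backbone fine-tuned into a shared NIR/visible embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class CrossModalEmbedder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        # A backbone pretrained on plentiful visible-light imagery stands in
        # for the large visible face dataset mentioned in the abstract.
        self.backbone = resnet18(weights="IMAGENET1K_V1")
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, dim)

    def forward(self, x):
        # L2-normalised embedding shared by both modalities.
        return F.normalize(self.backbone(x), dim=1)

def pair_loss(emb_nir, emb_vis, margin=0.5):
    # Pull genuine NIR/visible pairs together, push mismatched pairs apart.
    sim = emb_nir @ emb_vis.t()
    eye = torch.eye(sim.size(0), dtype=torch.bool)
    pos = (1 - sim.diagonal()).mean()
    neg = torch.relu(sim[~eye] - margin).mean()
    return pos + neg

model = CrossModalEmbedder()
nir = torch.randn(8, 3, 224, 224)   # placeholder NIR batch (channel replicated to 3)
vis = torch.randn(8, 3, 224, 224)   # placeholder visible batch
loss = pair_loss(model(nir), model(vis))
loss.backward()                     # fine-tunes the whole pretrained backbone
```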
520
$a
Third, we present a model order selection algorithm for deep neural networks. In recent years, deep learning has emerged as a dominant methodology in machine learning. While it has been shown to produce state-of-the-art results for a variety of applications, one aspect of deep networks that has not been extensively researched is how to determine the optimal network structure. This problem is generally solved by ad hoc methods. In this work we address a sub-problem of this task: determining the breadth (number of nodes) of each layer. We show how to use group-sparsity-inducing regularization to automatically select these hyper-parameters. We demonstrate the proposed method by using it to reduce the size of networks while maintaining performance for our NIR HFR deep-learning algorithm. Additionally, we demonstrate the generality of our algorithm by applying it to image classification tasks.
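A minimal sketch of the group-sparsity idea: penalize the L2 norm of each hidden unit's incoming weights so whole units can be driven toward zero and pruned, which selects the breadth of the layer automatically. Layer sizes, the regularization weight, and the pruning threshold below are assumed values; in practice a proximal update is typically used to obtain exact zeros.

```python
# Group-sparsity penalty for selecting layer breadth (assumed sizes and hyper-parameters).
import torch
import torch.nn as nn

layer = nn.Linear(256, 512)       # candidate hidden layer, deliberately wide
head = nn.Linear(512, 10)
opt = torch.optim.SGD(list(layer.parameters()) + list(head.parameters()), lr=0.1)
lam = 1e-3                        # group-lasso strength (assumed value)

x = torch.randn(64, 256)          # placeholder batch
y = torch.randint(0, 10, (64,))

for _ in range(200):
    opt.zero_grad()
    task_loss = nn.functional.cross_entropy(head(torch.relu(layer(x))), y)
    # One group per hidden unit = one row of the 512 x 256 weight matrix.
    group_norms = layer.weight.norm(dim=1)
    (task_loss + lam * group_norms.sum()).backward()
    opt.step()

# Plain subgradient steps only shrink the groups; a proximal update would give
# exact zeros. A small threshold stands in for that pruning step here.
kept = int((layer.weight.norm(dim=1) > 1e-2).sum())
print(f"selected breadth: {kept} of 512 hidden units")
```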
520
$a
Finally, we present a method to improve activity recognition algorithms through the use of multitask learning and information extracted from a large text corpus. Current state-of-the-art deep learning approaches are limited by the size and scope of the data set they use to train the networks. We present a multitask learning approach to expand the training data set. Specifically, we train the neural networks to recognize objects in addition to activities. This allows us to expand our training set with large, publicly available object recognition data sets and thus use deeper, state-of-the-art network architectures. Additionally, when learning about the target activities, the algorithms are limited to the information contained in the training set. It is virtually impossible to capture all variations of the target activities in a training set. In this work, we extract information about the target activities from a large text corpus. We incorporate this information into the training algorithm by using it to select relevant object recognition classes for the multitask learning approach. We present experimental results on a benchmark activity recognition data set showing the effectiveness of our approach.
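The multitask setup outlined above can be sketched as a shared backbone with an activity head and an auxiliary object head trained jointly. The backbone, head sizes, loss weighting, and the list of text-selected object classes below are hypothetical placeholders; the dissertation derives the relevant object classes from statistics mined from a large text corpus.

```python
# Hedged sketch of multitask activity + object recognition with a shared backbone.
import torch
import torch.nn as nn
from torchvision.models import resnet18

N_ACTIVITIES = 50                    # assumed number of activity classes
relevant_objects = [3, 17, 42, 101]  # hypothetical object classes kept after
                                     # text-corpus filtering

backbone = resnet18(weights="IMAGENET1K_V1")
feat_dim = backbone.fc.in_features
backbone.fc = nn.Identity()          # expose shared features to both heads

activity_head = nn.Linear(feat_dim, N_ACTIVITIES)
object_head = nn.Linear(feat_dim, len(relevant_objects))

def multitask_loss(frames, activity_labels, object_labels, w_obj=0.3):
    feats = backbone(frames)
    loss_act = nn.functional.cross_entropy(activity_head(feats), activity_labels)
    loss_obj = nn.functional.cross_entropy(object_head(feats), object_labels)
    # The object task is an auxiliary signal; w_obj trades it off against activities.
    return loss_act + w_obj * loss_obj

frames = torch.randn(8, 3, 224, 224)                 # placeholder video frames
loss = multitask_loss(frames,
                      torch.randint(0, N_ACTIVITIES, (8,)),
                      torch.randint(0, len(relevant_objects), (8,)))
loss.backward()
```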
533
$a
Electronic reproduction.
$b
Ann Arbor, Mich. :
$c
ProQuest,
$d
2018
538
$a
Mode of access: World Wide Web
650
4
$a
Electrical engineering.
$3
596380
655
7
$a
Electronic books.
$2
local
$3
554714
690
$a
0544
710
2
$a
ProQuest Information and Learning Co.
$3
1178819
710
2
$a
University of Maryland, College Park.
$b
Electrical Engineering.
$3
845418
773
0
$t
Dissertation Abstracts International
$g
79-07B(E).
856
4 0
$u
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10640914
$z
click for full text (PQDT)