National Formosa University

Rich Embedding Techniques to Improve Scene Understanding.

Record type:	Bibliographic - language material, manuscript : Monograph/item
Title/Author:	Rich Embedding Techniques to Improve Scene Understanding./
Author:	VidalMata, Rosaura G.
Extent:	1 online resource (143 pages)
Notes:	Source: Dissertations Abstracts International, Volume: 85-06, Section: B.
Contained By:	Dissertations Abstracts International 85-06B.
Subject:	Computer science.
Electronic resource:	click for full text (PQDT)
ISBN:	9798381153583

Thesis (Ph.D.)--University of Notre Dame, 2024.

Includes bibliographical references

Scene understanding lies at the core of numerous computer vision applications, ranging from object recognition and semantic segmentation to surveillance, media forensics, and autonomous navigation. Recent strides in deep learning models and embedding techniques have propelled the field forward, yet a significant challenge remains: the ability to effectively interpret media captured under real-world conditions. Techniques that perform on par with humans on standardized benchmarks often struggle when applied in real-world scenarios where the environment variables are often unconstrained and there are numerous artifact sources in the imaging acquisition pipeline (e.g., rain, haze, motion blur, interlacing). As computer vision pushes further into real-world applications, the question arises: How can we effectively engineer a computer vision system that can interpret media captured under these scenarios?

This doctoral thesis aims to address these challenges and develop scene recognition algorithms and techniques that provide high-fidelity situational awareness and understanding of complex scenes. Initially, we analyze the limitations of traditional methods and assess the impact of image restoration and enhancement on automatic visual recognition. The scope of investigation covers a wide range of tasks, including image classification, object detection, manipulation detection, and localization.

Exploratory work is conducted to identify effective image pre-processing algorithms, in combination with robust features and supervised machine learning approaches, that are well-suited for challenging scenarios involving motion blur, adverse weather conditions, and misfocus. Additionally, we present a comprehensive review of the state of the art in image manipulation detection, focusing on deep learning-based and learning-free methods. Findings indicate that both types of methods are susceptible to being fooled by high-quality manipulations and can benefit from introducing pre-processing techniques designed to accentuate anomalies present in manipulated regions. These techniques leverage image enhancement approaches to highlight manipulated regions, enabling both human observers and machine learning approaches to localize tampered regions more accurately.

Furthermore, recognizing the limitations of image-based approaches in capturing the dynamic nature of scenes, the thesis delves into the potential of incorporating contextual information and temporal relationships into the embedding process. Drawing inspiration from human perception, which leverages temporal segmentation to comprehend complex activities, the research explores the fusion of multiple modalities, such as visual and temporal information, to create more informative and discriminative embeddings. Various approaches are investigated to process data across both the visual and temporal domains, seeking to obtain a more accurate understanding of the scene structure. The objective is not only to comprehend the impact of aberrations and contextual noise present in the data but also to acquire a cleaner representation of captured scenes, optimizing their suitability for recognition tasks.

In conclusion, this doctoral thesis presents a novel approach to scene understanding by harnessing rich embedding techniques in real-world computer vision applications. By addressing the limitations of traditional methods, exploring the potential of temporal relationships, and incorporating image enhancement strategies, the research propels the field toward achieving high-fidelity situational awareness. Furthermore, the study sheds light on the challenges of object and manipulation detection and underscores the importance of pre-processing techniques to enhance detection accuracy. This research paves the way for the development of robust computer vision systems capable of interpreting real-world scenes and opens new avenues for advancing the field's capabilities in media forensics, surveillance, augmented reality, and beyond.

Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2024

Mode of access: World Wide Web

ISBN: 9798381153583

Subjects--Topical Terms:
573171 Computer science.

Subjects--Index Terms:
Scene understanding

Index Terms--Genre/Form:
554714 Electronic books.

LDR:05337ntm a22003617 4500
001 1150296
005 20241028114739.5
006 m o d
007 cr bn ---uuuuu
008 250605s2024 xx obm 000 0 eng d
020 $a 9798381153583
035 $a (MiAaPQ)AAI30691818
035 $a AAI30691818
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a VidalMata, Rosaura G. $3 1476755
245 1 0 $a Rich Embedding Techniques to Improve Scene Understanding.
264 0 $c 2024
300 $a 1 online resource (143 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertations Abstracts International, Volume: 85-06, Section: B.
500 $a Advisor: Scheirer, Walter J.;Bowyer, Kevin W.
502 $a Thesis (Ph.D.)--University of Notre Dame, 2024.
504 $a Includes bibliographical references
520 $a Scene understanding lies at the core of numerous computer vision applications, ranging from object recognition and semantic segmentation to surveillance, media forensics, and autonomous navigation. Recent strides in deep learning models and embedding techniques have propelled the field forward, yet a significant challenge remains: the ability to effectively interpret media captured under real-world conditions. Techniques that perform on par with humans on standardized benchmarks often struggle when applied in real-world scenarios where the environment variables are often unconstrained and there are numerous artifact sources in the imaging acquisition pipeline (e.g., rain, haze, motion blur, interlacing). As computer vision pushes further into real-world applications, the question arises: How can we effectively engineer a computer vision system that can interpret media captured under these scenarios? This doctoral thesis aims to address these challenges and develop scene recognition algorithms and techniques that provide high-fidelity situational awareness and understanding of complex scenes. Initially, we analyze the limitations of traditional methods and assess the impact of image restoration and enhancement on automatic visual recognition. The scope of investigation covers a wide range of tasks, including image classification, object detection, manipulation detection, and localization. Exploratory work is conducted to identify effective image pre-processing algorithms, in combination with robust features and supervised machine learning approaches, that are well-suited for challenging scenarios involving motion blur, adverse weather conditions, and misfocus. Additionally, we present a comprehensive review of the state of the art in image manipulation detection, focusing on deep learning-based and learning-free methods. Findings indicate that both types of methods are susceptible to being fooled by high-quality manipulations and can benefit from introducing pre-processing techniques designed to accentuate anomalies present in manipulated regions. These techniques leverage image enhancement approaches to highlight manipulated regions, enabling both human observers and machine learning approaches to localize tampered regions more accurately. Furthermore, recognizing the limitations of image-based approaches in capturing the dynamic nature of scenes, the thesis delves into the potential of incorporating contextual information and temporal relationships into the embedding process. Drawing inspiration from human perception, which leverages temporal segmentation to comprehend complex activities, the research explores the fusion of multiple modalities, such as visual and temporal information, to create more informative and discriminative embeddings. Various approaches are investigated to process data across both the visual and temporal domains, seeking to obtain a more accurate understanding of the scene structure. The objective is not only to comprehend the impact of aberrations and contextual noise present in the data but also to acquire a cleaner representation of captured scenes, optimizing their suitability for recognition tasks. In conclusion, this doctoral thesis presents a novel approach to scene understanding by harnessing rich embedding techniques in real-world computer vision applications. By addressing the limitations of traditional methods, exploring the potential of temporal relationships, and incorporating image enhancement strategies, the research propels the field toward achieving high-fidelity situational awareness. Furthermore, the study sheds light on the challenges of object and manipulation detection and underscores the importance of pre-processing techniques to enhance detection accuracy. This research paves the way for the development of robust computer vision systems capable of interpreting real-world scenes and opens new avenues for advancing the field's capabilities in media forensics, surveillance, augmented reality, and beyond.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2024
538 $a Mode of access: World Wide Web
650 4 $a Computer science. $3 573171
650 4 $a Computer engineering. $3 569006
653 $a Scene understanding
653 $a Scene recognition algorithms
653 $a Embedding techniques
655 7 $a Electronic books. $2 local $3 554714
690 $a 0984
690 $a 0464
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a University of Notre Dame. $b Computer Science and Engineering. $3 1413548
773 0 $t Dissertations Abstracts International $g 85-06B.
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30691818 $z click for full text (PQDT)