Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning.
Record type: Bibliographic - language material, manuscript : Monograph/item
Title/Author: Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning.
Author: Larsson, Gustav Martin.
Physical description: 1 online resource (131 pages)
Notes: Source: Dissertation Abstracts International, Volume: 79-02(E), Section: B.
Subject: Computer science.
Electronic resource: click for full text (PQDT)
ISBN: 9780355234251
LDR    05150ntm a2200361K 4500
001    914371
005    20180703084808.5
006    m o u
007    cr mn||||a|a||
008    190606s2017 xx obm 000 0 eng d
020    $a 9780355234251
035    $a (MiAaPQ)AAI10603743
035    $a (MiAaPQ)uchicago:13932
035    $a AAI10603743
040    $a MiAaPQ $b eng $c MiAaPQ
100 1  $a Larsson, Gustav Martin. $3 1187606
245 10 $a Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning.
264  0 $c 2017
300    $a 1 online resource (131 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Dissertation Abstracts International, Volume: 79-02(E), Section: B.
500    $a Advisers: Yali Amit; Gregory Shakhnarovich.
502    $a Thesis (Ph.D.)--The University of Chicago, 2017.
504    $a Includes bibliographical references.
520    $a The success of deep learning in computer vision is rooted in the ability of deep networks to scale up model complexity as demanded by challenging visual tasks. As complexity is increased, so is the demand for large amounts of labeled data to train the model. This is associated with a costly human annotation effort. Modern vision networks often rely on a two-stage training process to satisfy this thirst for training data: the first stage, pretraining, is done on a general vision task where a large collection of annotated data is available. This primes the network with semantic knowledge that is general to a wide variety of vision tasks. The second stage, fine-tuning, continues the training of the network, this time for the target task where annotations are often scarce. The reliance on supervised pretraining anchors future progress to a constant human annotation effort, especially for new or ever-changing domains. To address this concern, with the long-term goal of leveraging the abundance of cheap unlabeled data, we explore methods of unsupervised pretraining. In particular, we propose to use self-supervised automatic image colorization.
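The two-stage regime described in this abstract can be made concrete with a short sketch. PyTorch, torchvision's ResNet-18 with ImageNet weights, and the ten-class target task are illustrative assumptions here, not details drawn from the record:

import torch
import torch.nn as nn
from torchvision import models

# Stage 1 (pretraining): start from a network trained on a large
# annotated corpus, here the ImageNet weights shipped with torchvision.
net = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Stage 2 (fine-tuning): swap the head for the target task, where
# annotations are scarce, and continue training.
num_target_classes = 10  # hypothetical target task
net.fc = nn.Linear(net.fc.in_features, num_target_classes)

optimizer = torch.optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 3, 224, 224)                 # stand-in image batch
y = torch.randint(0, num_target_classes, (4,))  # stand-in labels
optimizer.zero_grad()
loss = criterion(net(x), y)
loss.backward()
optimizer.step()

In practice the pretrained trunk is often updated at a reduced learning rate rather than uniformly with the new head; the sketch trains all parameters alike for brevity.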
520    $a We begin by evaluating two baselines for leveraging unlabeled data for representation learning. The first is based on training a mixture model for each layer in a greedy manner. We show that this method excels on relatively simple tasks in the small-sample regime. It can also be used to produce a well-organized feature space that is equivariant to cyclic transformations, such as rotation. The second is the autoencoder, which is trained end-to-end and thus avoids the main concerns of greedy training. However, its per-pixel loss is not a good analog to perceptual similarity, and the representation suffers as a consequence. Both of these methods leave a wide gap between unsupervised and supervised pretraining.
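A minimal version of the autoencoder baseline, with the per-pixel objective whose weakness is noted above, might look as follows; the architecture is a hypothetical stand-in, not the model evaluated in the dissertation:

import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1),   # 32x32 -> 16x16
            nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1),  # 16x16 -> 8x8
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ConvAutoencoder()
x = torch.rand(8, 3, 32, 32)   # stand-in unlabeled batch
recon = model(x)
# Per-pixel MSE: trained end-to-end, but a weak analog to perceptual
# similarity, which is the shortcoming discussed above.
loss = nn.functional.mse_loss(recon, x)
loss.backward()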
520    $a As a precursor to our improvements in unsupervised representation learning, we develop a novel method for automatic colorization of grayscale images and focus initially on its use as a graphics application. We set a new state of the art that handles a wide variety of scenes and contexts. Our method makes it possible to revitalize old black-and-white photography, without requiring human effort or expertise. In order for the model to appropriately re-color a grayscale object, it must first be able to identify it. Since such high-level semantic knowledge benefits colorization, we found success employing the two-stage training process with supervised pretraining. This raises the question: If colorization and classification both benefit from the same visual semantics, can we reverse the relationship and use colorization to benefit classification?
520    $a Using colorization as a pretraining method does not require data annotations, since labeled training pairs are automatically constructed by separating intensity and color. The task is what is called self-supervised. Colorization joins a growing family of self-supervision methods as a front-runner with state-of-the-art results. We show that up to a certain sample size, labeled data can be entirely replaced by a large collection of unlabeled data. If these techniques continue to improve, they may one day supplant supervised pretraining altogether. We provide a significant step toward this goal.
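The pair construction is simple enough to sketch directly: in an opponent color space the lightness channel becomes the network input and the color channels become the free supervisory signal. The choice of CIE Lab and the use of scikit-image are assumptions of this sketch:

import numpy as np
from skimage.color import rgb2lab

rgb = np.random.rand(64, 64, 3)   # stand-in for an unlabeled RGB image
lab = rgb2lab(rgb)                # CIE Lab: channel 0 is lightness

x = lab[:, :, :1]                 # input: grayscale intensity (L)
y = lab[:, :, 1:]                 # target: color channels (a, b)
# A colorization network is trained to predict y from x, with no human
# annotation involved; its features can later be fine-tuned for a
# target recognition task.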
520    $a As a future direction for self-supervision, we investigate whether multiple proxy tasks can be combined to improve generalization in the representation. A wide range of combination methods is explored: offline methods that fuse or distill already-trained networks, and online methods that actively train multiple tasks together. In controlled experiments, we demonstrate significant gains using both offline and online methods. However, the benefits do not translate to self-supervised pretraining, leaving the question of multi-proxy self-supervision an open and interesting problem.
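The "online" combination described above amounts to multi-task training of a shared trunk. The trunk, heads, and proxy targets below are hypothetical placeholders, not the tasks studied in the dissertation:

import torch
import torch.nn as nn

# Shared trunk whose representation both proxy tasks must serve.
trunk = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
head_color = nn.Linear(32, 2)  # e.g. regress a mean (a, b) color
head_rot = nn.Linear(32, 4)    # e.g. classify rotation in {0, 90, 180, 270}

params = (list(trunk.parameters()) + list(head_color.parameters())
          + list(head_rot.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)

x = torch.randn(8, 1, 32, 32)           # stand-in grayscale batch
color_target = torch.randn(8, 2)
rot_target = torch.randint(0, 4, (8,))

optimizer.zero_grad()
features = trunk(x)
# Actively train both tasks together by summing their losses.
loss = (nn.functional.mse_loss(head_color(features), color_target)
        + nn.functional.cross_entropy(head_rot(features), rot_target))
loss.backward()
optimizer.step()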
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538    $a Mode of access: World Wide Web.
650  4 $a Computer science. $3 573171
655  7 $a Electronic books. $2 local $3 554714
690    $a 0984
710 2  $a ProQuest Information and Learning Co. $3 1178819
710 2  $a The University of Chicago. $b Computer Science. $3 845504
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10603743 $z click for full text (PQDT)