Visual analysis with limited supervision

Visual analysis is an active research topic in computer vision, with two critical directions: visual retrieval and visual classification. In recent years, visual retrieval has been investigated and deployed in many real-world applications, for instance person re-identification, while visual classification is widely studied in tasks such as image classification. Typical visual analysis methods are supervised learning algorithms, which demand extensive labeled data to train models that achieve acceptable performance. However, annotated data is difficult to collect in the real world because of limited resources, such as the human labor required for annotation. It is therefore important to develop methods that accomplish visual analysis with limited supervision. In this thesis, we propose to address the visual analysis problem with limited supervision. Specifically, we treat the limited-supervision problem in three scenarios according to the amount of labeled data. In the first scenario, no labeled data are provided and only limited human labor for annotation is available. In the second scenario, scarce labeled data and abundant unlabeled data are accessible. In the third scenario, only a few instances in the target dataset are labeled, and multiple sources of labeled data from different domains are available. In Chapter 2 and Chapter 3, we discuss the first scenario, in which no labeled data are provided and only limited human labor for annotation is available. We propose to solve this problem via active learning. Unlike conventional active learning, which usually starts with a set of labeled data as a reference, in this thesis we adopt active learning with no pre-given labeled data. We refer to these algorithms as Early Active Learning.
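As a simplified illustration of selecting instances with no pre-given labels, a purely geometric criterion such as greedy k-center selection can pick a small, representative subset directly from unlabeled features for annotation. The function name and selection criterion below are illustrative assumptions, not the algorithm developed in the thesis.

```python
import numpy as np

def early_active_selection(X, k):
    """Greedy k-center selection of k instances to annotate from the
    unlabeled feature matrix X (n, d), starting with no labeled set.
    Illustrative criterion only, not the thesis's exact method."""
    # Seed with the instance closest to the data mean (most "central").
    first = int(np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    selected = [first]
    # Distance from every point to its nearest selected point.
    d = np.linalg.norm(X - X[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d))  # farthest from current set = most novel
        selected.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return selected
```

The selected indices would then be sent to human annotators, and a supervised model trained on just those few labels.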
In this thesis, we first select the most contributive instances for annotation, which are later used to train supervised models. We demonstrate that, even with only a few annotated instances, the proposed method achieves comparable performance in visual retrieval. Second, we extend instance-based early active learning to pair-based early active learning. Rather than selecting individual instances, the pair-based method selects the most informative pairs for annotation, which is essential in visual retrieval. In Chapter 4, for the second scenario, we address the visual retrieval problem when scarce labeled data and abundant unlabeled data are available. We propose to utilize both the labeled and the unlabeled data in a semi-supervised attribute learning scheme that jointly learns latent attributes with appropriate dimensionality and estimates the pairwise probabilities of the data. In Chapter 5 and Chapter 6, for the third scenario, we focus on visual classification with few or no labels in the target dataset but with labeled data available from other domains. To improve performance in the target domain, we adopt transfer learning algorithms that transfer helpful knowledge from the labeled source domains. First, in Chapter 5, we consider the few-shot visual classification problem: multiple well-labeled source datasets are available, but only a limited set of labeled data exists in the target dataset. An Analogical Transfer Learning scheme is proposed for this problem. It transfers knowledge from the source domains to enhance the target-domain models, and an analogy-revision scheme is designed to select only the helpful source instances.
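To sketch the idea of keeping only helpful source instances, one naive stand-in criterion is to retain a source instance only when its nearest labeled target neighbor agrees with its label. Everything below (names and criterion) is an assumed simplification, not the analogy-revision schema itself.

```python
import numpy as np

def select_helpful_sources(Xs, ys, Xt, yt):
    """Keep source instances whose nearest labeled target instance shares
    their label -- a simple proxy for 'helpful' source selection in
    few-shot transfer. Illustrative only, not the thesis's method."""
    keep = []
    for i, x in enumerate(Xs):
        dists = np.linalg.norm(Xt - x, axis=1)  # distances to target set
        nearest = int(np.argmin(dists))
        if yt[nearest] == ys[i]:  # label agreement => likely helpful
            keep.append(i)
    return keep
```

Source instances failing the check would be discarded or down-weighted before training the target-domain classifier.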
Second, in Chapter 6, we tackle a more difficult visual retrieval problem in which no labeled data exist in the target domain. A Domain-aware Unsupervised Cross-dataset Transfer Learning algorithm is proposed to address this problem. It values universal and domain-unique appearances simultaneously, letting both contribute to representation learning, and thereby leverages the common and domain-unique representations across datasets for unsupervised visual retrieval.
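The notion of separating shared from domain-unique structure can be hinted at with a minimal unsupervised baseline: remove each dataset's own mean shift (a crude proxy for its domain-unique appearance) and fit one shared subspace on the pooled data. This is only a hedged sketch under those assumptions, not the proposed algorithm.

```python
import numpy as np

def shared_subspace(X_a, X_b, k):
    """Center each dataset (removing its domain-unique mean shift), then fit
    a shared k-dimensional PCA basis on the pooled data -- a minimal,
    assumed sketch of learning common cross-dataset structure without labels."""
    Xa = X_a - X_a.mean(axis=0)          # strip domain A's mean appearance
    Xb = X_b - X_b.mean(axis=0)          # strip domain B's mean appearance
    pooled = np.vstack([Xa, Xb])
    # Top-k principal directions of the pooled, centered data.
    _, _, Vt = np.linalg.svd(pooled, full_matrices=False)
    W = Vt[:k].T                         # shared projection, shape (d, k)
    return Xa @ W, Xb @ W
```

Both datasets end up in one common k-dimensional space in which cross-dataset retrieval by nearest neighbors becomes meaningful.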