SOLVING COMPUTER VISION PROBLEMS UNDER THE COMPOSITIONALITY PRINCIPLE

XU ZIWEI

Abstract

This thesis presents solutions to computer vision problems using the principle of compositionality, which explains how complex concepts are formed from primitive ones. First, a Motion Capsule Autoencoder (MCAE) is introduced to learn transformation-invariant representations of motion signals. Second, a Blocked Message Passing Network (BMP-Net) is developed to recognise objects described by adjective-noun pairs in images, minimising the system's bias toward seen pairs. Finally, for temporal segmentation of human activity videos, a Differentiable Temporal Logic (DTL) framework is proposed to reduce logical errors in the outputs of deep learning models. By efficiently and robustly relating different aspects of these problems compositionally, the proposed solutions enable a better understanding of the rich visual world.