Staff Publications

Now showing 1 - 10 of 129
  • Publication
    Mixed-dish Recognition with Contextual Relation Networks
    (2019-10-21) Lixi Deng; Jingjing Chen; Qianru Sun; Xiangnan He; Sheng Tang; Zhaoyan Ming; Yongdong Zhang; Tat-Seng Chua; DEPARTMENT OF COMPUTER SCIENCE
    Mixed dish is a food category in which different dishes are mixed on one plate, and it is popular in East and Southeast Asia. Recognizing the individual dishes in a mixed-dish image is important for health-related applications, e.g., calculating nutrition values. However, most existing methods focus on single-dish classification and are not applicable to mixed-dish recognition. The new challenges in recognizing mixed-dish images are the complex ingredient combinations and the severe overlap among different dishes. To tackle these problems, we propose a novel approach called contextual relation networks (CR-Nets), which encodes the implicit and explicit contextual relations among multiple dishes using region-level features and label-level co-occurrence, respectively. This is inspired by the intuition that people tend to choose dishes according to common eating habits, e.g., covering diverse nutrition while avoiding repeated ingredients. In addition, we collect a large-scale dataset of 9,254 mixed-dish images from 6 school canteens in Singapore. Extensive experiments on both our dataset and a smaller-scale public dataset validate that our CR-Nets achieve top performance in localizing the dishes and recognizing their food categories. © 2019 Association for Computing Machinery.
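    The following is a minimal sketch, not the authors' released code, of the "explicit" contextual relation idea above: re-scoring per-region dish predictions with a label co-occurrence prior estimated from training data. All names, shapes, and the fusion rule are illustrative assumptions.

        # Sketch: fuse region-level dish logits with a label co-occurrence prior.
        import torch

        def rescore_with_cooccurrence(region_logits, cooc):
            """region_logits: (R, C) logits for R detected regions over C dish classes.
            cooc: (C, C) row-normalized label co-occurrence matrix (assumed given)."""
            probs = region_logits.softmax(dim=-1)
            # Context for each region: the average belief of all other regions.
            context = (probs.sum(dim=0, keepdim=True) - probs) / max(probs.shape[0] - 1, 1)
            prior = context @ cooc                          # propagate via co-occurrence
            return region_logits + torch.log(prior + 1e-8)  # contextually re-scored logits

        region_logits = torch.randn(5, 160)                 # e.g., 5 regions, 160 classes
        cooc = torch.rand(160, 160)
        cooc = cooc / cooc.sum(dim=-1, keepdim=True)
        print(rescore_with_cooccurrence(region_logits, cooc).shape)  # torch.Size([5, 160])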
  • Publication
    Shorter-is-Better: Venue Category Estimation from Micro-Video
    (Association for Computing Machinery, Inc, 2016-10-15) Jianglong Zhang; Liqiang Nie; Xiang Wang; Xiangnan He; Xianglin Huang; Tat-Seng Chua; DEPARTMENT OF COMPUTER SCIENCE
    According to our statistics on over 2 million micro-videos, only 1.22% of them are associated with venue information, which greatly hinders location-oriented applications and personalized services. To alleviate this problem, we aim to label bite-sized video clips with venue categories. This is, however, nontrivial for three reasons: 1) no benchmark dataset is available; 2) the videos carry insufficient information, low quality, and information loss; and 3) the venue categories exhibit complex relatedness. Towards this end, we propose a scheme comprising two components. In the first, we crawl a representative set of micro-videos from Vine and extract a rich set of features from the textual, visual, and acoustic modalities. In the second, we build a tree-guided multi-task multi-modal learning model to estimate the venue category of each unseen micro-video. This model jointly learns a common space from multiple modalities and leverages the predefined Foursquare hierarchical structure to regularize the relatedness among venue categories. Extensive experiments have well validated our model. As a side research contribution, we have released our data, codes, and involved parameters. © 2016 ACM.
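    As a rough illustration of the tree-guided regularization described above, the sketch below penalizes venue-category weight vectors group-wise along a toy hierarchy, so that categories sharing an ancestor share structure; the hierarchy, shapes, and penalty form are assumptions, not the paper's exact formulation.

        # Sketch: tree-guided group penalty over task (venue-category) weights.
        import torch

        def tree_guided_penalty(W, groups, lam=1e-3):
            """W: (C, D) one weight row per venue category.
            groups: leaf-index sets, one per node of the category tree."""
            return lam * sum(W[idx].norm(p=2) for idx in groups)

        W = torch.randn(6, 32, requires_grad=True)
        # Toy hierarchy: root -> {food: 0,1,2} and {outdoors: 3,4,5}, plus leaves.
        groups = [[0, 1, 2], [3, 4, 5], [0], [1], [2], [3], [4], [5]]
        tree_guided_penalty(W, groups).backward()  # couples categories within a subtree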
  • Publication
    Multi-Source Domain Adaptation for Visual Sentiment Classification
    (2020-02-07) Chuang Lin; Sicheng Zhao; Lei Meng; Tat-Seng Chua; DEPARTMENT OF COMPUTER SCIENCE
    Existing domain adaptation methods for visual sentiment classification are typically investigated under the single-source scenario, where the knowledge learned from a source domain with sufficient labeled data is transferred to a target domain with loosely labeled or unlabeled data. In practice, however, data from a single source domain usually have limited volume and can hardly cover the characteristics of the target domain. In this paper, we propose a novel multi-source domain adaptation (MDA) method, termed Multi-source Sentiment Generative Adversarial Network (MSGAN), for visual sentiment classification. To handle data from multiple source domains, it learns to find a unified sentiment latent space in which data from both the source and target domains share a similar distribution. This is achieved via cycle-consistent adversarial learning in an end-to-end manner. Extensive experiments conducted on four benchmark datasets demonstrate that MSGAN significantly outperforms state-of-the-art MDA approaches for visual sentiment classification.
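    A minimal sketch of the two loss ingredients the abstract names, adversarial alignment of a shared latent space plus cycle consistency, is given below; the encoder, decoder, and discriminator are illustrative stand-ins, not MSGAN's actual architecture.

        # Sketch: adversarial + cycle-consistency losses for latent-space alignment.
        import torch
        import torch.nn as nn

        enc = nn.Linear(128, 64)        # feature -> shared sentiment space
        dec = nn.Linear(64, 128)        # shared space -> feature (for the cycle)
        disc = nn.Linear(64, 1)         # domain discriminator on the latents
        bce = nn.BCEWithLogitsLoss()

        src, tgt = torch.randn(8, 128), torch.randn(8, 128)
        z_src, z_tgt = enc(src), enc(tgt)

        # Discriminator tries to separate domains; the encoder is trained to fool it.
        d_loss = bce(disc(z_src), torch.ones(8, 1)) + bce(disc(z_tgt), torch.zeros(8, 1))
        # Cycle term: encode-then-decode should reconstruct the input features.
        cycle_loss = (dec(z_src) - src).abs().mean() + (dec(z_tgt) - tgt).abs().mean()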
  • Publication
    I Know What You Want to Express: Sentence Element Inference by Incorporating External Knowledge Base
    (IEEE Computer Society, 2016-10-27) Xiaochi Wei; Heyan Huang; Liqiang Nie; Hanwang Zhang; Xian-Ling Mao; Tat-Seng Chua; DEPARTMENT OF COMPUTER SCIENCE
    Sentence auto-completion is an important feature that saves users many keystrokes by providing suggestions as they type. Despite its value, existing sentence auto-completion methods, such as query completion models, can hardly solve the object completion problem in sentences of the form (subject, verb, object), due to the complexity of natural language descriptions and the data deficiency problem. To address this, we treat an SVO sentence as a three-element triple (subject, sentence pattern, object) and cast the sentence object completion problem as an element inference problem. The elements of all triples are encoded into a unified low-dimensional embedding space by our proposed TRANSFER model, which leverages an external knowledge base to strengthen the representation learning. With such representations, we can provide reliable candidates for the desired missing element with a linear model. Extensive experiments on a real-world dataset have well validated our model. Meanwhile, we have successfully applied the proposed model to answer candidate selection in factoid question answering systems, which further demonstrates the applicability of the TRANSFER model. © 2016 IEEE.
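    To make the element-inference setup concrete, here is a hedged sketch that ranks candidate objects for a (subject, sentence pattern, ?) triple with a translation-style score in a shared embedding space; whether TRANSFER uses exactly this score function is an assumption, and all sizes are illustrative.

        # Sketch: rank candidate objects for (subject, pattern, ?) by s + p ≈ o.
        import torch

        n_entities, n_patterns, dim = 1000, 200, 50
        ent = torch.nn.Embedding(n_entities, dim)
        pat = torch.nn.Embedding(n_patterns, dim)

        def rank_objects(subj_id, pattern_id, k=5):
            query = ent.weight[subj_id] + pat.weight[pattern_id]
            scores = -(ent.weight - query).norm(dim=-1)   # closer entities score higher
            return scores.topk(k).indices                 # top-k candidate objects

        print(rank_objects(subj_id=3, pattern_id=7))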
  • Publication
    Conversational Recommendation: Formulation, Methods, and Evaluation
    (Association for Computing Machinery, Inc, 2020-07-25) Wenqiang Lei; Xiangnan He; Maarten de Rijke; Tat-Seng Chua; DEPARTMENT OF COMPUTER SCIENCE
    Recommender systems have demonstrated great success in information seeking. However, traditional recommender systems work in a static way, estimating user preferences on items from past interaction history. This prevents them from capturing users' dynamic and fine-grained preferences. Conversational recommender systems bring a revolution to existing recommender systems: they can communicate with users in natural language and explicitly ask whether a user likes an attribute or not. With the preferred attributes, a recommender system can make more accurate and personalized recommendations. Therefore, although still a relatively new topic, conversational recommender systems have attracted great research attention. We identify four emerging directions: (1) the exploration-exploitation trade-off in the cold-start recommendation setting; (2) attribute-centric conversational recommendation; (3) strategy-focused conversational recommendation; and (4) dialogue understanding and response generation. This tutorial covers these four directions, providing a review of existing approaches and progress on the topic. By presenting the emerging and promising topic of conversational recommender systems, we aim to provide takeaways for practitioners to build their own systems. We also want to stimulate more ideas and discussions with the audience on core problems of this topic, such as task formalization, dataset collection, algorithm development, and evaluation, with the ambition of facilitating the development of conversational recommender systems. © 2020 Owner/Author.
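    The attribute-asking loop the abstract describes can be illustrated with the following toy sketch, where each yes/no answer prunes the candidate items; the catalogue and question order are made-up assumptions.

        # Sketch: prune candidates by asking about one attribute at a time.
        items = {
            "A": {"italian", "cheap"},
            "B": {"italian", "expensive"},
            "C": {"japanese", "cheap"},
        }

        def converse(user_likes, ask_order=("italian", "cheap")):
            candidates = dict(items)
            for attr in ask_order:
                liked = attr in user_likes   # the user's yes/no answer
                candidates = {i: a for i, a in candidates.items() if (attr in a) == liked}
                if len(candidates) <= 1:
                    break                    # confident enough to recommend
            return list(candidates)

        print(converse(user_likes={"italian", "cheap"}))  # ['A']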
  • Publication
    Video Visual Relation Detection
    (Association for Computing Machinery, Inc, 2017-10-23) Xindi Shang; Tongwei Ren; Jingfan Guo; Hanwang Zhang; Tat-Seng Chua; DEPARTMENT OF COMPUTER SCIENCE
    As a bridge connecting vision and language, visual relations between objects in the form of relation triplets, such as "person-touch-dog" and "cat-above-sofa", provide a more comprehensive understanding of visual content beyond objects. In this paper, we propose a novel vision task named Video Visual Relation Detection (VidVRD), which performs visual relation detection in videos instead of still images (ImgVRD). Compared to still images, videos provide a more natural set of features for detecting visual relations, such as dynamic relations like "A-follow-B" and "A-towards-B", and temporally changing relations like "A-chase-B" followed by "A-hold-B". However, VidVRD is technically more challenging than ImgVRD due to the difficulty of accurate object tracking and the diversity of relation appearances in the video domain. To this end, we propose a VidVRD method that consists of object tracklet proposal, short-term relation prediction, and greedy relational association. Moreover, we contribute the first dataset for VidVRD evaluation, which contains 1,000 videos with manually labeled visual relations, to validate the proposed method. On this dataset, our method achieves the best performance in comparison with state-of-the-art baselines. © 2017 ACM.
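    The greedy relational association step named above can be sketched as follows: short-term predictions from adjacent video segments are merged whenever they share the same triplet. The data structures are assumptions for illustration, not the paper's implementation.

        # Sketch: greedily merge per-segment relation predictions across time.
        def associate(segment_predictions):
            """segment_predictions: per segment, a list of
            (subject_track, predicate, object_track, score) predictions."""
            merged = []  # entries: [triplet, start_seg, end_seg, score_sum]
            for t, preds in enumerate(segment_predictions):
                for subj, pred, obj, score in preds:
                    for m in merged:
                        if m[0] == (subj, pred, obj) and m[2] == t - 1:
                            m[2], m[3] = t, m[3] + score   # extend the relation
                            break
                    else:
                        merged.append([(subj, pred, obj), t, t, score])
            return merged

        segs = [[("person1", "follow", "dog1", 0.9)],
                [("person1", "follow", "dog1", 0.8)]]
        print(associate(segs))  # one "person1-follow-dog1" relation spanning segments 0-1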
  • Publication
    Embedding Factorization Models for Jointly Recommending Items and User Generated Lists
    (Association for Computing Machinery, Inc, 2017-08-07) Da Cao; Liqiang Nie; Xiangnan He; Xiaochi Wei; Shuizhi Zhu; Shunxiang Wu; Tat-Seng Chua; DEPARTMENT OF COMPUTER SCIENCE; ELECTRICAL AND COMPUTER ENGINEERING
    Existing recommender algorithms have mainly focused on recommending individual items by utilizing user-item interactions. However, little attention has been paid to recommending user-generated lists (e.g., playlists and booklists). On one hand, user-generated lists contain a rich signal about item co-occurrence, as items within a list are usually gathered around a specific theme. On the other hand, a user's preference for a list also indicates her preference for the items within the list. We believe that 1) if the rich relevance signal within user-generated lists is properly leveraged, enhanced recommendation of individual items can be provided, and 2) if user-item and user-list interactions are properly utilized, and the relationship between a list and its contained items is discovered, the performance of user-item and user-list recommendation can be mutually reinforced. Towards this end, we devise embedding factorization models, which extend traditional factorization methods by incorporating item-item (item-item-list) co-occurrence via embedding-based algorithms. Specifically, we employ a factorization model to capture users' preferences over items and lists, and utilize embedding-based models to discover the co-occurrence information among items and lists. The gap between the two types of models is bridged by sharing the items' latent factors. Remarkably, our proposed framework is capable of solving the new-item cold-start problem, where items have never been consumed by users but exist in user-generated lists. Overall performance comparisons and micro-level analyses demonstrate the promising performance of our proposed approaches. © 2017 Copyright held by the owner/author(s).
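    The shared-latent-factor bridge described above can be sketched as a joint objective in which one item factor matrix serves both a factorization (preference) term and an embedding (co-occurrence) term; the loss form and sizes below are assumptions for illustration.

        # Sketch: item factors V are shared by the preference and co-occurrence terms.
        import torch

        n_users, n_items, dim = 100, 500, 32
        U = torch.randn(n_users, dim, requires_grad=True)
        V = torch.randn(n_items, dim, requires_grad=True)   # shared item factors

        def joint_loss(u, i, rating, ctx_i, ctx_j):
            mf = (U[u] @ V[i] - rating) ** 2                       # preference fit
            cooc = -torch.log(torch.sigmoid(V[ctx_i] @ V[ctx_j]))  # co-listed items
            return mf + cooc

        joint_loss(u=0, i=10, rating=1.0, ctx_i=10, ctx_j=42).backward()
        # V[10] receives gradient from both terms: the bridge between the two models.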
  • Publication
    Learning and Reasoning on Graph for Recommendations
    (2020-02-03) Xiang Wang; Xiangnan He; Tat-Seng Chua; DEPARTMENT OF COMPUTER SCIENCE
    Recommendation methods construct predictive models to estimate the likelihood of a user-item interaction. Previous models largely follow a general supervised learning paradigm: treating each interaction as a separate data instance and building a supervised learning model on this isolated information. Such a paradigm, however, overlooks the relations among data instances and hence easily results in suboptimal performance, especially in sparse scenarios. Moreover, due to their black-box nature, most models hardly exhibit the reasons behind a prediction, making the recommendation process opaque. In this tutorial, we revisit the recommendation problem from the perspective of graph learning and reasoning. Common data sources for recommendation can be organized into graphs, such as bipartite user-item interaction graphs, social networks, and item knowledge graphs (heterogeneous graphs), among others. Such a graph-based organization connects the isolated data instances and exhibits relationships among instances as high-order connectivities, thereby encoding meaningful patterns for collaborative filtering, content-based filtering, social influence modeling, and knowledge-aware reasoning. Inspired by this, prior studies have incorporated graph analysis (e.g., random walk) and graph learning (e.g., network embedding) into recommender models and achieved great success. Together with the recent success of graph neural networks (GNNs), graph-based models have exhibited the potential to become the technologies for next-generation recommender systems. This tutorial provides a review of graph-based learning methods for recommendation, with a special focus on recent developments in GNNs. By introducing this emerging and promising topic, we expect the audience to gain a deep understanding of and accurate insight into the space, to stimulate more ideas and discussions, and to promote the development of these technologies. © 2020 Copyright held by the owner/author(s).
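    As a concrete taste of the GNN-based direction the tutorial reviews, the sketch below propagates user and item embeddings over a normalized interaction graph and averages the layers, in the spirit of LightGCN; the normalization and layer count are assumptions, not the tutorial's prescribed model.

        # Sketch: embedding propagation over the (users + items) interaction graph.
        import torch

        def propagate(A_norm, emb, n_layers=2):
            """A_norm: (N, N) symmetrically normalized adjacency; emb: (N, d)."""
            out, layers = emb, [emb]
            for _ in range(n_layers):
                out = A_norm @ out                   # aggregate neighbor embeddings
                layers.append(out)
            return torch.stack(layers).mean(dim=0)   # mix in high-order connectivity

        N, d = 10, 8
        A = torch.rand(N, N); A = (A + A.T) / 2      # toy symmetric adjacency
        deg = A.sum(dim=-1)
        A_norm = A / torch.sqrt(deg[:, None] * deg[None, :])
        final_emb = propagate(A_norm, torch.randn(N, d))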
  • Publication
    Predicting Personalized Emotion Perceptions of Social Images
    (Association for Computing Machinery, Inc, 2016-10-15) Sicheng Zhao; Hongxun Yao; Yue Gao; Rongrong Ji; Wenlong Xie; Xiaolei Jiang; Tat-Seng Chua; DEPARTMENT OF COMPUTER SCIENCE
    Images can convey rich semantics and induce various emotions in viewers. Most existing works on affective image analysis focus on predicting the dominant emotion for the majority of viewers. However, such a dominant emotion is often insufficient in real-world applications, as the emotions induced by an image are highly subjective and vary across viewers. In this paper, we propose to predict the personalized emotion perceptions of images for each individual viewer. Different types of factors that may affect personalized image emotion perception, including visual content, social context, temporal evolution, and location influence, are jointly investigated. Rolling multi-task hypergraph learning is presented to consistently combine these factors, and a learning algorithm is designed for automatic optimization. For evaluation, we set up a large-scale image emotion dataset from Flickr, named Image-Emotion-Social-Net, with both dimensional and categorical emotion representations, over 1 million images, and about 8,000 users. Experiments conducted on this dataset demonstrate that the proposed method achieves significant performance gains on personalized emotion classification compared to several state-of-the-art approaches. © 2016 ACM.
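    A rough sketch of the hypergraph machinery behind the method above: images connected by a hyperedge (a shared visual, social, temporal, or location factor) are encouraged to receive similar emotion scores. This is plain hypergraph label propagation with uniform edge weights, an illustrative simplification rather than the paper's rolling multi-task formulation.

        # Sketch: propagate emotion scores over a factor-pooled hypergraph.
        import numpy as np

        def hypergraph_smooth(H, y, alpha=0.5, iters=20):
            """H: (n_images, n_hyperedges) incidence matrix; y: initial scores."""
            Dv, De = H.sum(axis=1), H.sum(axis=0)
            P = (H / Dv[:, None]) @ (H.T / De[:, None])  # vertex->edge->vertex walk
            f = y.astype(float)
            for _ in range(iters):
                f = alpha * P @ f + (1 - alpha) * y      # propagate, stay near labels
            return f

        H = np.array([[1, 0], [1, 1], [0, 1]])  # 3 images, 2 factor hyperedges
        print(hypergraph_smooth(H, np.array([1.0, 0.0, 0.0])))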
  • Publication
    Adversarial Training Towards Robust Multimedia Recommender System
    (2019-01-18) Jinhui Tang; Xiaoyu Du; Xiangnan He; Fajie Yuan; Qi Tian; Tat-Seng Chua; DEPARTMENT OF COMPUTER SCIENCE