Paper Digest: ACM Multimedia 2021 Papers & Highlights
Interested users can read all MM-2021 papers in our digest console, which offers additional features.
To search for papers presented at MM-2021 on a specific topic, use the search by venue (MM-2021) service. To summarize the latest research published at MM-2021 on a specific topic, use the review by venue (MM-2021) service. To synthesize the findings from MM 2021 into comprehensive reports, try the MM-2021 Research service. If you prefer to browse papers by author, we provide a comprehensive list of all MM-2021 authors and their papers.
This curated list was created by the Paper Digest Team. Experience the cutting-edge capabilities of Paper Digest, an AI-powered research platform that delivers personalized, comprehensive updates on the latest research in your field. It also helps you read articles, write articles, get answers, conduct literature reviews, and generate research reports.
Experience the full potential of our services today!
TABLE 1: Paper Digest: ACM Multimedia 2021 Papers & Highlights
| # | Paper | Author(s) |
|---|---|---|
| 1 | Multi-modal Representation Learning for Video Advertisement Content Structuring. Highlight: In this paper, we propose a multi-modal encoder to learn multi-modal representation from video advertisements by interacting between video-audio and text. | Daya Guo; Zhaoyang Zeng |
| 2 | Joint Implicit Image Function for Guided Depth Super-Resolution. Highlight: Inspired by the recent progress in implicit neural representation, we propose to formulate the guided super-resolution as a neural implicit image interpolation problem, where we take the form of a general image interpolation but use a novel Joint Implicit Image Function (JIIF) representation to learn both the interpolation weights and values. | Jiaxiang Tang; Xiaokang Chen; Gang Zeng |
| 3 | Long-tailed Distribution Adaptation. Highlight: In this study, we formulate Long-tailed recognition as Domain Adaptation (LDA), by modeling the long-tailed distribution as an unbalanced domain and the general distribution as a balanced domain. | Zhiliang Peng; Wei Huang; Zonghao Guo; Xiaosong Zhang; Jianbin Jiao; Qixiang Ye |
| 4 | Group-based Distinctive Image Captioning with Memory Attention. Highlight: In this paper, we improve the distinctiveness of image captions using a Group-based Distinctive Captioning Model (GdisCap), which compares each image with other images in one similar group and highlights the uniqueness of each image. | Jiuniu Wang; Wenjia Xu; Qingzhong Wang; Antoni B. Chan |
| 5 | Towards Bridging Video and Language By Caption Generation and Sentence Localization. Highlight: Caption generation and sentence localization are two representative tasks for connecting video and language, and my research is focused on these two tasks. In this extended abstract, I present approaches for tackling each of these tasks by exploiting fine-grained information in videos, together with ideas about how these two tasks can be connected. | Shaoxiang Chen |
| 6 | PyTorchVideo: A Deep Learning Library for Video Understanding. Highlight: We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing. | Haoqi Fan; Tullie Murrell; Heng Wang; Kalyan Vasudev Alwala; Yanghao Li; Yilei Li; Bo Xiong; Nikhila Ravi; Meng Li; Haichuan Yang; Jitendra Malik; Ross Girshick; Matt Feiszli; Aaron Adcock; Wan-Yen Lo; Christoph Feichtenhofer |
| 7 | Hybrid Reasoning Network for Video-based Commonsense Captioning. Highlight: In this paper, we propose a Hybrid Reasoning Network (HybridNet) to endow the neural networks with the capability of semantic-level reasoning and word-level reasoning. | Weijiang Yu; Jian Liang; Lei Ji; Lu Li; Yuejian Fang; Nong Xiao; Nan Duan |
| 8 | Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus. Highlight: To tackle the difficulty in unseen singer modeling, we propose Multi-Singer, a fast multi-singer vocoder with generative adversarial networks. | Rongjie Huang; Feiyang Chen; Yi Ren; Jinglin Liu; Chenye Cui; Zhou Zhao |
| 9 | One-Stage Visual Grounding Via Semantic-Aware Feature Filter. Highlight: However, these methods fuse the textual feature and visual feature map by simple concatenation, which ignores the textual semantics and limits these models' ability in cross-modal understanding. To overcome this weakness, we propose a semantic-aware framework that utilizes both queries' structured knowledge and context-sensitive representations to filter the visual feature maps to localize the referents more accurately. | Jiabo Ye; Xin Lin; Liang He; Dingbang Li; Qin Chen |
| 10 | Co-Transport for Class-Incremental Learning. Highlight: As a result, we propose CO-transport for class Incremental Learning (COIL), which learns to relate across incremental tasks with the class-wise semantic relationship. | Da-Wei Zhou; Han-Jia Ye; De-Chuan Zhan |
| 11 | Towards Robust Deep Hiding Under Non-Differentiable Distortions for Practical Blind Watermarking. Highlight: Deep learning has been widely used in data hiding, for which inserting an attack simulation layer (ASL) after the watermarked image has been widely recognized as the most effective approach for improving the pipeline robustness against distortions. Despite its wide usage, the gain of enhanced robustness from ASL is usually interpreted through the lens of augmentation, while our work explores this gain from a new perspective by disentangling the forward and backward propagation of such ASL. | Chaoning Zhang; Adil Karjauv; Philipp Benz; In So Kweon |
| 12 | Once and for All: Self-supervised Multi-modal Co-training on One-billion Videos at Alibaba. Highlight: In this work, we aim to develop a once-and-for-all pretraining technique for diverse modalities and downstream tasks. | Lianghua Huang; Yu Liu; Xiangzeng Zhou; Ansheng You; Ming Li; Bin Wang; Yingya Zhang; Pan Pan; Xu Yinghui |
| 13 | MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding. Highlight: We present MMOCR—an open-source toolbox which provides a comprehensive pipeline for text detection and recognition, as well as their downstream tasks such as named entity recognition and key information extraction. | Zhanghui Kuang; Hongbin Sun; Zhizhong Li; Xiaoyu Yue; Tsui Hin Lin; Jianyong Chen; Huaqiang Wei; Yiqin Zhu; Tong Gao; Wenwei Zhang; Kai Chen; Wayne Zhang; Dahua Lin |
| 14 | Enhanced Invertible Encoding for Learned Image Compression. Highlight: However, few efforts are devoted to structuring a better transformation between the image space and the latent feature space. In this paper, instead of employing previous autoencoder style networks to build this transformation, we propose an enhanced Invertible Encoding Network with invertible neural networks (INNs) to largely mitigate the information loss problem for better compression. | Yueqi Xie; Ka Leong Cheng; Qifeng Chen |
| 15 | TransFusion: Multi-Modal Fusion for Video Tag Inference Via Translation-based Knowledge Embedding. Highlight: This, however, does not apply to inferring generic tags or taxonomy that are less relevant to video content, such as video originality or its broader category, which are important in practice. In this paper, we claim that these generic tags can be modeled through the semantic relations between videos and tags, and can be utilized simultaneously with the multi-modal features to achieve better video tagging. | Di Jin; Zhongang Qi; Yingmin Luo; Ying Shan |
| 16 | Improving Robustness and Accuracy Via Relative Information Encoding in 3D Human Pose Estimation. Highlight: Despite the great progress achieved by these approaches, they are not robust to global motion, and lack the ability to accurately predict local motion with a small movement range. To alleviate these two problems, we propose a relative information encoding method that yields positional and temporal enhanced representations. | Wenkang Shan; Haopeng Lu; Shanshe Wang; Xinfeng Zhang; Wen Gao |
| 17 | STST: Spatial-Temporal Specialized Transformer for Skeleton-based Action Recognition. Highlight: Due to occlusion, sensors, raw video quality, etc., there is noise in both the temporal and spatial dimensions of the extracted skeleton data, which reduces the recognition capabilities of models. To adapt to this imperfect information condition, we propose a multi-task self-supervised learning method by providing confusing samples in different situations to improve the robustness of our model. | Yuhan Zhang; Bo Wu; Wen Li; Lixin Duan; Chuang Gan |
| 18 | Counterfactual Debiasing Inference for Compositional Action Recognition. Highlight: For that, in this work we propose a novel learning framework called Counterfactual Debiasing Network (CDN) to improve the model generalization ability by removing the interference introduced by visual appearances of objects/subjects. | Pengzhan Sun; Bo Wu; Xunsong Li; Wen Li; Lixin Duan; Chuang Gan |
| 19 | Overview of Tencent Multi-modal Ads Video Understanding. Highlight: This task will advance the foundation of comprehensive ads video understanding, which has a significant impact on many applications in ads, such as video recommendation and user behavior analysis. This paper presents an overview of the video structuring task in our grand challenge, including the background of ads videos, an elaborate description of this task, our proposed dataset, the evaluation protocol, and our baseline model. | Zhenzhi Wang; Zhimin Li; Liyu Wu; Jiangfeng Xiong; Qinglin Lu |
| 20 | MMFashion: An Open-Source Toolbox for Visual Fashion Analysis. Highlight: This toolbox and the benchmark could serve the flourishing research community by providing a flexible toolkit to deploy existing models and develop new ideas and approaches. We welcome all contributions to this still-growing effort towards open science: https://github.com/open-mmlab/mmfashion. | Xin Liu; Jiancheng Li; Jiaqi Wang; Ziwei Liu |
| 21 | Joint Optimization in Edge-Cloud Continuum for Federated Unsupervised Person Re-identification. Highlight: We present FedUReID, a federated unsupervised person ReID system to learn person ReID models without any labels while preserving privacy. | Weiming Zhuang; Yonggang Wen; Shuai Zhang |
| 22 | Privacy-Preserving Portrait Matting. Highlight: We systematically evaluate both trimap-free and trimap-based matting methods on P3M-10k and find that existing matting methods show different generalization capabilities when following the Privacy-Preserving Training (PPT) setting, i.e., training on face-blurred images and testing on arbitrary images. To devise a better trimap-free portrait matting model, we propose P3M-Net, which leverages the power of a unified framework for both semantic perception and detail matting, and specifically emphasizes the interaction between them and the encoder to facilitate the matting process. | Jizhizi Li; Sihan Ma; Jing Zhang; Dacheng Tao |
| 23 | R-GAN: Exploring Human-like Way for Reasonable Text-to-Image Synthesis Via Generative Adversarial Networks. Highlight: The main challenges lie in the ambiguous semantics of a complex description and the intricate scene of an image with various objects, different positional relationships, and diverse appearances. To address these challenges, we propose R-GAN, which can generate reasonable images according to the given text in a human-like way. | Yanyuan Qiao; Qi Chen; Chaorui Deng; Ning Ding; Yuankai Qi; Mingkui Tan; Xincheng Ren; Qi Wu |
| 24 | A Large-Scale Benchmark for Food Image Segmentation. Highlight: In addition, we propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge. In experiments, we use three popular semantic segmentation methods (i.e., Dilated Convolution based [20], Feature Pyramid based [25], and Vision Transformer based [60]) as baselines, and evaluate them as well as ReLeM on our new datasets. | Xiongwei Wu; Xin Fu; Ying Liu; Ee-Peng Lim; Steven C.H. Hoi; Qianru Sun |
| 25 | Fully Quantized Image Super-Resolution Networks. Highlight: Here, we propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy. | Hu Wang; Peng Chen; Bohan Zhuang; Chunhua Shen |
| 26 | Video Background Music Generation with Controllable Music Transformer. Highlight: In this work, we address the task of video background music generation. | Shangzhe Di; Zeren Jiang; Si Liu; Zhaokai Wang; Leyan Zhu; Zexin He; Hongming Liu; Shuicheng Yan |
| 27 | Dynamic Momentum Adaptation for Zero-Shot Cross-Domain Crowd Counting. Highlight: Here, we propose a novel Crowd Counting framework built upon an external Momentum Template, termed C2MoT, which enables the encoding of domain-specific information via an external template representation. | Qiangqiang Wu; Jia Wan; Antoni B. Chan |
| 28 | Non-Linear Fusion for Self-Paced Multi-View Clustering. Highlight: In this paper, inspired by the effectiveness of non-linear combination in instance learning and the auto-weighted approaches, we propose Non-Linear Fusion for Self-Paced Multi-View Clustering (NSMVC), which is totally different from the conventional linear-weighting algorithms. | Zongmo Huang; Yazhou Ren; Xiaorong Pu; Lifang He |
| 29 | EVRNet: Efficient Video Restoration on Edge Devices. Highlight: To restore videos on recipient edge devices in real-time, we introduce an efficient video restoration network, EVRNet. | Sachin Mehta; Amit Kumar; Fitsum Reda; Varun Nasery; Vikram Mulukutla; Rakesh Ranjan; Vikas Chandra |
| 30 | Multi-Perspective Video Captioning. Highlight: This work targets the problems of comprehensive video captioning and the generation of multiple descriptions from different perspectives, termed as Multi-Perspective Video Captioning. | Yi Bin; Xindi Shang; Bo Peng; Yujuan Ding; Tat-Seng Chua |
| 31 | Question-controlled Text-aware Image Captioning. Highlight: We propose a novel Geometry and Question Aware Model (GQAM). | Anwen Hu; Shizhe Chen; Qin Jin |
| 32 | SM-SGE: A Self-Supervised Multi-Scale Skeleton Graph Encoding Framework for Person Re-Identification. Highlight: In this paper, we for the first time propose a Self-supervised Multi-scale Skeleton Graph Encoding (SM-SGE) framework that comprehensively models the human body, component relations, and skeleton dynamics from unlabeled skeleton graphs of various scales to learn an effective skeleton representation for person Re-ID. | Haocong Rao; Xiping Hu; Jun Cheng; Bin Hu |
| 33 | Learning Kinematic Formulas from Multiple View Videos. Highlight: Given a set of multiple view videos, which record the motion trajectory of an object, we propose to recover the object's kinematic formulas with neural rendering techniques. | Liangchen Song; Sheng Liu; Celong Liu; Zhong Li; Yuqi Ding; Yi Xu; Junsong Yuan |
| 34 | Handling Difficult Labels for Multi-label Image Classification Via Uncertainty Distillation. Highlight: To handle difficult labels of multi-label image classification, we propose to calibrate the model, which not only predicts the labels but also estimates the uncertainty of the prediction. | Liangchen Song; Jialian Wu; Ming Yang; Qian Zhang; Yuan Li; Junsong Yuan |
| 35 | A Multimodal Framework for Video Ads Understanding. Highlight: Taking the 2021 TAAC competition as an opportunity, we developed a multimodal system to improve the ability of structured analysis of advertising video content. | Zejia Weng; Lingchen Meng; Rui Wang; Zuxuan Wu; Yu-Gang Jiang |
| 36 | Unifying Multimodal Transformer for Bi-directional Image and Text Generation. Highlight: In this work, we propose a unified image-and-text generative framework based on a single multimodal model to jointly study the bi-directional tasks. | Yupan Huang; Hongwei Xue; Bei Liu; Yutong Lu |
| 37 | A Picture Is Worth A Thousand Words: A Unified System for Diverse Captions and Rich Images Generation. Highlight: A creative image-and-text generative AI system mimics humans' extraordinary abilities to provide users with diverse and comprehensive caption suggestions, as well as rich image creations. In this work, we demonstrate such an AI creation system to produce both diverse captions and rich images. | Yupan Huang; Bei Liu; Jianlong Fu; Yutong Lu |
| 38 | One-Stage Incomplete Multi-view Clustering Via Late Fusion. Highlight: However, it is widely observed that there are incomplete views for partial samples in practice. In this paper, we propose One-Stage Late Fusion Incomplete Multi-view Clustering (OS-LF-IMVC) to address this issue. | Yi Zhang; Xinwang Liu; Siwei Wang; Jiyuan Liu; Sisi Dai; En Zhu |
| 39 | RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection. Highlight: Numerous previous works have developed various structures for bidirectional feature fusion, all of which are shown to improve the detection performance effectively. We observe that these complicated network structures require feature pyramids to be stacked in a fixed order, which introduces longer pipelines and reduces the inference speed. | Zhuofan Zong; Qianggang Cao; Biao Leng |
| 40 | Towards Fast and High-Quality Sign Language Production. Highlight: However, by generating target pose frames conditioned on the previously generated ones, these models are prone to bringing issues such as error accumulation and high inference latency. In this paper, we argue that such issues are mainly caused by adopting an autoregressive manner. | Wencan Huang; Wenwen Pan; Zhou Zhao; Qi Tian |
| 41 | ROECS: A Robust Semi-direct Pipeline Towards Online Extrinsics Correction of The Surround-view System. Highlight: As an attempt to fill this research gap to some extent, in this work, we present a novel extrinsics correction pipeline designed specifically for the SVS, namely ROECS (Robust Online Extrinsics Correction of the Surround-view system). | Tianjun Zhang; Nlong Zhao; Ying Shen; Xuan Shao; Lin Zhang; Yicong Zhou |
| 42 | TSA-Net: Tube Self-Attention Network for Action Quality Assessment. Highlight: Most existing approaches usually tackle this problem by directly migrating the model from action recognition tasks, which ignores the intrinsic differences within the feature map such as foreground and background information. To address this issue, we propose a Tube Self-Attention Network (TSA-Net) for action quality assessment (AQA). | Shunli Wang; Dingkang Yang; Peng Zhai; Chixiao Chen; Lihua Zhang |
| 43 | Better Learning Shot Boundary Detection Via Multi-task. Highlight: To deal with the variations, we propose a multi-task architecture called Transnet++. | Haoxin Zhang; Zhimin Li; Qinglin Lu |
| 44 | Deep Reasoning Network for Few-shot Semantic Segmentation. Highlight: In this paper, we propose a Dynamic Reasoning Network (DRNet) to adaptively generate the parameters of predicting layers and infer the segmentation mask for each unseen category. | Yunzhi Zhuge; Chunhua Shen |
| 45 | Scene Text Image Super-Resolution Via Parallelly Contextual Attention Network. Highlight: In this paper, we propose a Parallelly Contextual Attention Network (PCAN), which effectively learns sequence-dependent features and focuses more on high-frequency information of the reconstruction in text images. | Cairong Zhao; Shuyang Feng; Brian Nlong Zhao; Zhijun Ding; Jun Wu; Fumin Shen; Heng Tao Shen |
| 46 | VidVRD 2021: The Third Grand Challenge on Video Relation Detection. Highlight: The goal of this task is to promote research on developing video semantic understanding models, so as to perform complex inferences and mining of visual knowledge in videos. In this paper, we make a comprehensive and detailed introduction of this task, summarize the algorithms proposed in the last few years, and propose future directions for research in this task. | Wei Ji; Yicong Li; Meng Wei; Xindi Shang; Junbin Xiao; Tongwei Ren; Tat-Seng Chua |
| 47 | Parametric Reshaping of Portraits in Videos. Highlight: To this end, we present a robust and easy-to-use parametric method to reshape the portrait in a video to produce smooth retouched results. | Xiangjun Tang; WenXin Sun; Yong-Liang Yang; Xiaogang Jin |
| 48 | Interventional Video Relation Detection. Highlight: As a result, the models' prediction will be easily biased towards the popular head predicates (e.g., next-to and in-front-of), thus leading to poor generalizability. To fill the research gap, this paper proposes an Interventional Video Relation Detection (IVRD) approach that aims to improve not only the accuracy but also the robustness of the model prediction. | Yicong Li; Xun Yang; Xindi Shang; Tat-Seng Chua |
| 49 | Knowledge Perceived Multi-modal Pretraining in E-commerce. Highlight: In this paper, we address multi-modal pretraining of product data in the field of E-commerce. | Yushan Zhu; Huaixiao Zhao; Wen Zhang; Ganqiang Ye; Hui Chen; Ningyu Zhang; Huajun Chen |
| 50 | Conditional Directed Graph Convolution for 3D Human Pose Estimation. Highlight: In this paper, we propose to represent the human skeleton as a directed graph with the joints as nodes and bones as edges that are directed from parent joints to child joints. | Wenbo Hu; Changgong Zhang; Fangneng Zhan; Lei Zhang; Tien-Tsin Wong |
| 51 | Attention-driven Graph Clustering Network. Highlight: To this end, we propose a novel deep clustering method named Attention-driven Graph Clustering Network (AGCN). | Zhihao Peng; Hui Liu; Yuheng Jia; Junhui Hou |
| 52 | Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers. Highlight: In this paper, we delve into this topic and empirically find that direct feature distribution alignment on the CNN backbone only brings limited improvements, as it does not guarantee domain-invariant sequence features in the transformer for prediction. To address this issue, we propose a novel Sequence Feature Alignment (SFA) method that is specially designed for the adaptation of detection transformers. | Wen Wang; Yang Cao; Jing Zhang; Fengxiang He; Zheng-Jun Zha; Yonggang Wen; Dacheng Tao |
| 53 | Actions Speak Louder Than Listening: Evaluating Music Style Transfer Based on Editing Experience. Highlight: In this paper, we propose an editing test to evaluate users' editing experience of music generation models in a systematic way. | Wei-Tsung Lu; Meng-Hsuan Wu; Yuh-Ming Chiu; Li Su |
| 54 | Semi-Autoregressive Image Captioning. Highlight: Towards that end, we propose a novel two-stage framework, referred to as Semi-Autoregressive Image Captioning (SAIC), to make a better trade-off between performance and speed. | Xu Yan; Zhengcong Fei; Zekang Li; Shuhui Wang; Qingming Huang; Qi Tian |
| 55 | Learning Hierarchical Embedding for Video Instance Segmentation. Highlight: In this paper, we address video instance segmentation using a new generative model that learns effective representations of the target and background appearance. | Zheyun Qin; Xiankai Lu; Xiushan Nie; Xiantong Zhen; Yilong Yin |
| 56 | Towards Realistic Visual Dubbing with Heterogeneous Sources. Highlight: In practice, it may be intractable to collect the perfect homologous data in some cases, for example, audio-corrupted or picture-blurry videos. To explore this kind of data and support high-fidelity few-shot visual dubbing, in this paper, we propose a novel, simple yet efficient two-stage framework with a higher flexibility of mining heterogeneous data. | Tianyi Xie; Liucheng Liao; Cheng Bi; Benlai Tang; Xiang Yin; Jianfei Yang; Mingjie Wang; Jiali Yao; Yang Zhang; Zejun Ma |
| 57 | Multi-Source Fusion and Automatic Predictor Selection for Zero-Shot Video Object Segmentation. Highlight: In this paper, we propose a novel multi-source fusion network for zero-shot video object segmentation. | Xiaoqi Zhao; Youwei Pang; Jiaxing Yang; Lihe Zhang; Huchuan Lu |
| 58 | Former-DFER: Dynamic Facial Expression Recognition Transformer. Highlight: This paper proposes a dynamic facial expression recognition transformer (Former-DFER) for the in-the-wild scenario. | Zengqun Zhao; Qingshan Liu |
| 59 | SFE-Net: EEG-based Emotion Recognition with Symmetrical Spatial Feature Extraction. Highlight: In this paper, a spatial folding ensemble network (SFE-Net) is presented for EEG feature extraction and emotion recognition. | Xiangwen Deng; Junlin Zhu; Shangming Yang |
| 60 | Cross-Camera Feature Prediction for Intra-Camera Supervised Person Re-identification Across Distant Scenes. Highlight: In this work, we study intra-camera supervised person re-identification across distant scenes (ICS-DS Re-ID), which uses cross-camera unpaired data with intra-camera identity labels for training. | Wenhang Ge; Chunyan Pan; Ancong Wu; Hongwei Zheng; Wei-Shi Zheng |
| 61 | A Question Answering System for Unstructured Table Images. Highlight: Table parsing from images is nontrivial since it is closely related to not only NLP but also computer vision (CV) to parse the tabular structure from an image. In this demo, we present a question answering system for unstructured table images. | Wenyuan Xue; Siqi Cai; Wen Wang; Qingyong Li; Baosheng Yu; Yibing Zhan; Dacheng Tao |
| 62 | Q-Art Code: Generating Scanning-robust Art-style QR Codes By Deformable Convolution. Highlight: In this paper, we propose StyleCode-Net, a method to generate novel art-style QR codes which can better match the entire style of their carriers to improve the visual quality. | Hao Su; Jianwei Niu; Xuefeng Liu; Qingfeng Li; Ji Wan; Mingliang Xu |
| 63 | PFFN: Progressive Feature Fusion Network for Lightweight Image Super-Resolution. Highlight: Recently, the convolutional neural network (CNN) has been the core ingredient of modern models, triggering the surge of deep learning in super-resolution (SR). Despite the great success of these CNN-based methods, which tend to be deeper and heavier, it is impracticable to directly apply them to some low-budget devices due to the superfluous computational overhead. | Dongyang Zhang; Changyu Li; Ning Xie; Guoqing Wang; Jie Shao |
| 64 | CLIP4Caption: CLIP for Video Caption. Highlight: Existing video captioning models lack adequate visual representation due to the neglect of the existence of gaps between videos and texts. To bridge this gap, in this paper, we propose a CLIP4Caption framework that improves video captioning based on a CLIP-enhanced video-text matching network (VTM). | Mingkang Tang; Zhanyu Wang; Zhenhua LIU; Fengyun Rao; Dian Li; Xiu Li |
| 65 | Hierarchical View Predictor: Unsupervised 3D Global Feature Learning Through Hierarchical Prediction Among Unordered Views. Highlight: In this paper, we propose a view-based deep learning model called Hierarchical View Predictor (HVP) to learn 3D shape features from unordered views in an unsupervised manner. | Zhizhong Han; Xiyang Wang; Yu-Shen Liu; Matthias Zwicker |
| 66 | Recovering The Unbiased Scene Graphs from The Biased Ones. Highlight: In this paper we show that, due to the missing labels, SGG can be viewed as a Learning from Positive and Unlabeled data (PU learning) problem, where the reporting bias can be removed by recovering the unbiased probabilities from the biased ones by utilizing label frequencies, i.e., the per-class fraction of labeled, positive examples in all the positive examples. | Meng-Jiun Chiou; Henghui Ding; Hanshu Yan; Changhu Wang; Roger Zimmermann; Jiashi Feng |
| 67 | Towards A Unified Middle Modality Learning for Visible-Infrared Person Re-Identification. Highlight: In this paper, we propose a non-linear middle modality generator (MMG), which helps to reduce the modality discrepancy. | Yukang Zhang; Yan Yan; Yang Lu; Hanzi Wang |
| 68 | Information-Growth Attention Network for Image Super-Resolution. Highlight: In this paper, we propose a concise but effective Information-Growth Attention Network (IGAN) that shows incremental information is beneficial for SR. | Zhuangzi Li; Ge Li; Thomas Li; Shan Liu; Wei Gao |
| 69 | CDP: Towards Optimal Filter Pruning Via Class-wise Discriminative Power. Highlight: Alternatively, we propose a novel filter pruning strategy via class-wise discriminative power (CDP). | |
Tianshuo Xu; Yuhang Wu; Xiawu Zheng; Teng Xi; Gang Zhang; Errui Ding; Fei Chao; Rongrong Ji; |
| 70 | Video Relation Detection Via Tracklet Based Visual Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we apply the state-of-the-art video object tracklet detection pipeline MEGA [7] and deepSORT [27] to generate tracklet proposals. |
Kaifeng Gao; Long Chen; Yifeng Huang; Jun Xiao; |
| 71 | Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we argue that features extracted from pre-trained extractors, e.g., I3D, are trained for trimmed video action classification but are not specific to the WS-TAL task, leading to inevitable redundancy and sub-optimization. |
Fa-Ting Hong; Jia-Chang Feng; Dan Xu; Ying Shan; Wei-Shi Zheng; |
| 72 | Video Visual Relation Detection Via Iterative Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing VidVRD approaches classify these three relation components in either independent or cascaded manner, thus fail to fully exploit the inter-dependency among them. In order to utilize this inter-dependency in tackling the challenges of visual relation recognition in videos, we propose a novel iterative relation inference approach for VidVRD. |
Xindi Shang; Yicong Li; Junbin Xiao; Wei Ji; Tat-Seng Chua; |
| 73 | Contrastive Learning for Cold-Start Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, the representation learning is theoretically lower-bounded by the integration of two terms: mutual information between collaborative embeddings of users and items, and mutual information between collaborative embeddings and feature representations of items. To model such a learning process, we devise a new objective function founded upon contrastive learning and develop a simple yet efficient Contrastive Learning-based Cold-start Recommendation framework (CLCRec). |
Yinwei Wei; Xiang Wang; Qi Li; Liqiang Nie; Yan Li; Xuanping Li; Tat-Seng Chua; |
| 74 | ASFM-Net: Asymmetrical Siamese Feature Matching Network for Point Completion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We tackle the problem of object completion from point clouds and propose a novel point cloud completion network employing an Asymmetrical Siamese Feature Matching strategy, termed as ASFM-Net. |
Yaqi Xia; Yan Xia; Wei Li; Rui Song; Kailang Cao; Uwe Stilla; |
| 75 | Weakly-Supervised Temporal Action Localization Via Cross-Stream Collaborative Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches simply resort to either concatenation or weighted sum to learn how to take advantage of these two modalities for accurate action localization, which ignores the substantial variance between the two modalities. In this paper, we present Cross-Stream Collaborative Learning (CSCL) to address these issues. |
Yuan Ji; Xu Jia; Huchuan Lu; Xiang Ruan; |
| 76 | Self-Supervised Regional and Temporal Auxiliary Tasks for Facial Action Unit Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, many aspects with regard to some unique properties of AUs, such as the regional and relational characteristics, are not sufficiently explored in previous works. Motivated by this, we take the AU properties into consideration and propose two auxiliary AU related tasks to bridge the gap between limited annotations and the model performance in a self-supervised manner via the unlabeled data. |
Jingwei Yan; Jingjing Wang; Qiang Li; Chunmao Wang; Shiliang Pu; |
| 77 | Triangle-Reward Reinforcement Learning: A Visual-Linguistic Semantic Alignment for Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we argue that the XE objective is not sensitive to visual-linguistic alignment, which cannot discriminately penalize the semantic inconsistency and shrink the context gap. To solve these problems, we propose the Triangle-Reward Reinforcement Learning (TRRL) method. |
Weizhi Nie; Jiesi Li; Ning Xu; An-An Liu; Xuanya Li; Yongdong Zhang; |
| 78 | Constrained Graphic Layout Generation Via Latent Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For example, a title text almost always appears on top of other elements in a document. In this work, we generate graphic layouts that can flexibly incorporate such design semantics, either specified implicitly or explicitly by a user. |
Kotaro Kikuchi; Edgar Simo-Serra; Mayu Otani; Kota Yamaguchi; |
| 79 | Demystifying Commercial Video Conferencing Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we carry out an in-depth measurement and modeling study on the rate control algorithms used in six popular commercial video conferencing applications. |
Insoo Lee; Jinsung Lee; Kyunghan Lee; Dirk Grunwald; Sangtae Ha; |
| 80 | HAT: Hierarchical Aggregation Transformers for Person Re-identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Comprehensive experiments on four large-scale Re-ID benchmarks demonstrate that our method shows better results than several state-of-the-art methods. The code is released at https://github.com/AI-Zhpp/HAT. |
Guowen Zhang; Pingping Zhang; Jinqing Qi; Huchuan Lu; |
| 81 | DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing methods try to learn domain invariant features while suffering from large domain gaps that make it difficult to correctly align discrepant features, especially in the initial training phase. To address this issue, we propose a novel Dual Soft-Paste (DSP) method in this paper. |
Li Gao; Jing Zhang; Lefei Zhang; Dacheng Tao; |
| 82 | OsGG-Net: One-step Graph Generation Network for Unbiased Head Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose OsGG-Net, a One-step Graph Generation Network for estimating head poses from a single image by generating a landmark-connection graph to model the 3D angle associated with the landmark distribution robustly. |
Shentong Mo; Xin Miao; |
| 83 | Contrastive Disentangled Meta-Learning for Signer-Independent Sign Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we mainly attack the signer-independent setting and focus on augmenting the generalization ability of translation model. |
Tao Jin; Zhou Zhao; |
| 84 | FAMGAN: Fine-grained AUs Modulation Based Generative Adversarial Network for Micro-Expression Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is extremely difficult to elicit and label MEs, resulting in a lack of sufficient MEs data for MEs analysis. To address this challenge and inspired by the current face generation technology, in this paper we introduce Generative Adversarial Network based on fine-grained Action Units (AUs) modulation to generate MEs sequence (FAMGAN). |
Yifan Xu; Sirui Zhao; Huaying Tang; Xinglong Mao; Tong Xu; Enhong Chen; |
| 85 | MultiModal Language Modelling on Knowledge Graphs for Deep Video Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an equivalence representation of span-prediction based language models and knowledge-graphs to better leverage recent developments of language modelling for multi-modal problem statements. |
Vishal Anand; Raksha Ramesh; Boshen Jin; Ziyin Wang; Xiaoxiao Lei; Ching-Yung Lin; |
| 86 | A Stepwise Matching Method for Multi-modal Image Based on Cascaded Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Template matching of multi-modal images has long been a challenge in image matching, and it is difficult to balance speed and accuracy, especially for large images. To address this, we propose a stepwise image matching method that achieves precise localization through coarse-to-fine image matching using cascaded networks. |
Jinming Mu; Shuiping Gou; Shasha Mao; Shankui Zheng; |
| 87 | SimulSLT: End-to-End Simultaneous Sign Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the existing sign language translation methods need to read all the videos before starting the translation, which leads to a high inference latency and also limits their application in real-life scenarios. To solve this problem, we propose SimulSLT, the first end-to-end simultaneous sign language translation model, which can translate sign language videos into target text concurrently. |
Aoxiong Yin; Zhou Zhao; Jinglin Liu; Weike Jin; Meng Zhang; Xingshan Zeng; Xiaofei He; |
| 88 | Unsupervised Portrait Shadow Removal Via Generative Priors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an effective progressive optimization algorithm to learn the decomposition process. |
Yingqing He; Yazhou Xing; Tianjia Zhang; Qifeng Chen; |
| 89 | Implicit Feedbacks Are Not Always Favorable: Iterative Relabeled One-Class Collaborative Filtering Against Noisy Interactions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, they generally ignore the noise in observed interactions (i.e., a click does not necessarily represent positive feedback), which might induce performance degradation. To attack this issue, we propose a novel iterative relabeling framework to jointly mitigate the noise in both observed and unobserved interactions. |
Zitai Wang; Qianqian Xu; Zhiyong Yang; Xiaochun Cao; Qingming Huang; |
| 90 | Exploring Gradient Flow Based Saliency for DNN Model Compression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Traditional model pruning methods, such as l-1 pruning that evaluates channel significance for DNNs, pay too much attention to the local analysis of each channel and use the magnitude of the entire feature while ignoring its relevance to the batch normalization (BN) and ReLU layers that follow each convolutional operation. To overcome these problems, we propose a new model pruning method from a new perspective of gradient flow in this paper. |
Xinyu Liu; Baopu Li; Zhen Chen; Yixuan Yuan; |
| 91 | Graph Convolutional Multi-modal Hashing for Flexible Multimedia Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Flexible Graph Convolutional Multi-modal Hashing (FGCMH) method that adopts GCNs with linear complexity to preserve both the modality-individual and modality-fused structural similarity for discriminative hash learning. |
Xu Lu; Lei Zhu; Li Liu; Liqiang Nie; Huaxiang Zhang; |
| 92 | CanvasEmb: Learning Layout Representation with Large-scale Pre-training for Graphic Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the recent success of self-supervised pre-training techniques in various natural language processing tasks, in this paper, we propose CanvasEmb (Canvas Embedding), which pre-trains deep representations from unlabeled graphic designs by jointly conditioning on all the context elements in a canvas, with a multi-dimensional feature encoder and a multi-task learning objective. |
Yuxi Xie; Danqing Huang; Jinpeng Wang; Chin-Yew Lin; |
| 93 | Semi-supervised Domain Adaptive Retrieval Via Discriminative Hashing Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome the challenging SDAR, this paper proposes a novel method named Discriminative Hashing learning (DHLing), which mainly includes two modules, i.e., domain-specific optimization and domain-invariant memory bank. |
Haifeng Xia; Taotao Jing; Chen Chen; Zhengming Ding; |
| 94 | Cross Modal Compression: Towards Human-comprehensible Semantic Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, with the increasing demand for machine analysis and semantic monitoring in recent years, semantic fidelity rather than signal fidelity is becoming another emerging concern in image/video compression. With the recent advances in cross modal translation and generation, in this paper, we propose the cross modal compression (CMC), a semantic compression framework for visual data, to transform highly redundant visual data (such as image, video, etc.) into a compact, human-comprehensible domain (such as text, sketch, semantic map, attributes, etc.), while preserving the semantics. |
Jiguo Li; Chuanmin Jia; Xinfeng Zhang; Siwei Ma; Wen Gao; |
| 95 | Efficient Sparse Attacks on Videos Using Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: There is still a lack of effective adversarial methods that produce adversarial videos with small perturbations and limited query numbers at the same time. In this paper, an efficient and powerful method is proposed for adversarial video attacks in the black-box attack mode. |
Huanqian Yan; Xingxing Wei; |
| 96 | Intrinsic Temporal Regularization for High-resolution Human Video Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A key challenge for this task lies in the modeling of feature transformation across source and driving frames, where fine-grained transform helps promote visual details at garment regions, but often at the expense of intensified temporal flickering. To resolve this dilemma, we propose a novel framework with 1) a multi-scale transform estimation and feature fusion module to preserve fine-grained garment details, and 2) an intrinsic regularization loss to enforce temporal consistency of learned transform between adjacent frames. |
Lingbo Yang; Zhanning Gao; Siwei Ma; Wen Gao; |
| 97 | When Face Completion Meets Irregular Holes: An Attributes Guided Deep Inpainting Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel attributes-guided face completion network (AttrFaceNet), which comprises a facial attribute prediction subnet and a face completion subnet. |
Jie Xiao; Dandan Zhan; Haoran Qi; Zhi Jin; |
| 98 | Skeleton-Contrastive 3D Action Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In particular, we propose inter-skeleton contrastive learning, which learns from multiple different input skeleton representations in a cross-contrastive manner. |
Fida Mohammad Thoker; Hazel Doughty; Cees G. M. Snoek; |
| 99 | Visual Co-Occurrence Alignment Learning for Weakly-Supervised Video Moment Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to improve the visual feature representation with supervisions in the visual domain, obtaining discriminative visual features for cross-modal learning. |
Zheng Wang; Jingjing Chen; Yu-Gang Jiang; |
| 100 | Show, Read and Reason: Table Structure Recognition with Flexible Context Aggregator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the challenging problem of table structure recognition in this work. |
Hao Liu; Xin Li; Bing Liu; Deqiang Jiang; Yinsong Liu; Bo Ren; Rongrong Ji; |
| 101 | Object Point Cloud Classification Via Poly-Convolutional Architecture Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These classification methods design a number of effective permutation-invariant feature encoding kernels, but still suffer from the intrinsic challenge of large geometric feature variations caused by inconsistent point distributions along the object surface. In this paper, we address point cloud classification via deep graph representation learning that aggregates multiple convolutional feature kernels (namely, a poly-convolutional operation) anchored on each point and its local neighbours. |
Xuanxiang Lin; Ke Chen; Kui Jia; |
| 102 | IButter: Neural Interactive Bullet Time Generator for Human Free-viewpoint Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: During preview, we propose an interactive bullet-time design approach by extending the NeRF rendering to a real-time and dynamic setting and getting rid of the tedious per-scene training. |
Liao Wang; Ziyu Wang; Pei Lin; Yuheng Jiang; Xin Suo; Minye Wu; Lan Xu; Jingyi Yu; |
| 103 | E2Net: Excitative-Expansile Learning for Weakly Supervised Object Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, learning object localization forces CNNs to attend to non-salient regions under weak supervision, which may negatively influence image classification results. To address these challenges, this paper proposes a novel end-to-end Excitation-Expansion network, coined as E2Net, to localize entire objects with only image-level labels, which serves as the basis of many multimedia tasks. |
Zhiwei Chen; Liujuan Cao; Yunhang Shen; Feihong Lian; Yongjian Wu; Rongrong Ji; |
| 104 | Semantic-aware Transfer with Instance-adaptive Parsing for Crowded Scenes Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, simply applying this mechanism to crowded scenes pose estimation results in unsatisfactory performance due to several issues, in particular missing keypoints in crowds and ambiguous labeling during training. To tackle the above two issues, we introduce a novel method named Semantic-aware Transfer with Instance-adaptive Parsing (STIP). |
Xuanhan Wang; Lianli Gao; Yan Dai; Yixuan Zhou; Jingkuan Song; |
| 105 | Multi-Modal Sarcasm Detection with Interactive In-Modal and Cross-Modal Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate multi-modal sarcasm detection from a novel perspective, so as to determine the sentiment inconsistencies within a certain modality and across different modalities by constructing heterogeneous in-modal and cross-modal graphs (InCrossMGs) for each multi-modal example. |
Bin Liang; Chenwei Lou; Xiang Li; Lin Gui; Min Yang; Ruifeng Xu; |
| 106 | Long-Range Feature Propagating for Natural Image Matting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, we find that more than 50% of pixels in the unknown regions cannot be correlated to pixels in known regions due to the small effective receptive fields of common convolutional neural networks, which leads to inaccurate estimation when the pixels in the unknown regions cannot be inferred only from pixels within the receptive fields. To solve this problem, we propose Long-Range Feature Propagating Network (LFPNet), which learns long-range context features outside the receptive fields for alpha matte estimation. |
Qinglin Liu; Haozhe Xie; Shengping Zhang; Bineng Zhong; Rongrong Ji; |
| 107 | Robust Shadow Detection By Exploring Effective Shadow Contexts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Effective contexts for separating shadows from non-shadow objects can appear in different scales due to different object sizes. This paper introduces a new module, Effective-Context Augmentation (ECA), to utilize these contexts for robust shadow detection with deep structures. |
Xianyong Fang; Xiaohao He; Linbo Wang; Jianbing Shen; |
| 108 | Object-aware Long-short-range Spatial Alignment for Few-Shot Fine-Grained Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an object-aware long-short-range spatial alignment approach, which is composed of a foreground object feature enhancement (FOE) module, a long-range semantic correspondence (LSC) module and a short-range spatial manipulation (SSM) module. |
Yike Wu; Bo Zhang; Gang Yu; Weixi Zhang; Bin Wang; Tao Chen; Jiayuan Fan; |
| 109 | Distributed Attention for Grounded Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a simple yet effective method to alleviate the issue, termed as partial grounding problem in our paper. |
Nenglun Chen; Xingjia Pan; Runnan Chen; Lei Yang; Zhiwen Lin; Yuqiang Ren; Haolei Yuan; Xiaowei Guo; Feiyue Huang; Wenping Wang; |
| 110 | Locally Adaptive Structure and Texture Similarity for Image Quality Assessment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we describe a locally adaptive structure and texture similarity index for full-reference IQA, which we term A-DISTS. |
Keyan Ding; Yi Liu; Xueyi Zou; Shiqi Wang; Kede Ma; |
| 111 | Self-supervising Action Recognition By Statistical Moment and Subspace Descriptors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we build on a concept of self-supervision by taking RGB frames as input to learn to predict both action concepts and auxiliary descriptors, e.g., object descriptors. |
Lei Wang; Piotr Koniusz; |
| 112 | Co-learning: Learning from Noisy Labels with Self-supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by co-training with both supervised learning view and self-supervised learning view, we propose a simple yet effective method called Co-learning for learning with noisy labels. |
Cheng Tan; Jun Xia; Lirong Wu; Stan Z. Li; |
| 113 | Auto-MSFNet: Search Multi-scale Fusion Network for Salient Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a multi-scale features fusion framework based on Neural Architecture Search (NAS), named Auto-MSFNet. |
Miao Zhang; Tingwei Liu; Yongri Piao; Shunyu Yao; Huchuan Lu; |
| 114 | Consistency-Constancy Bi-Knowledge Learning for Pedestrian Detection in Night Surveillance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a consistency-constancy bi-knowledge learning (CCBL) for pedestrian detection in night surveillance, which is able to simultaneously acquire knowledge useful for night pedestrian detection from both day and night surveillance. |
Xiao Wang; Zheng Wang; Wu Liu; Xin Xu; Jing Chen; Chia-Wen Lin; |
| 115 | Linking The Characters: Video-oriented Social Graph Generation Via Hierarchical-cumulative GCN Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To that end, inspired by the human inference ability on social relationship, we propose a novel Hierarchical-Cumulative Graph Convolutional Network (HC-GCN) to generate the social relation graph for multiple characters in the video. |
Shiwei Wu; Joya Chen; Tong Xu; Liyi Chen; Lingfei Wu; Yao Hu; Enhong Chen; |
| 116 | ION: Instance-level Object Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new task of Instance Object Navigation (ION), where instance-level descriptions of targets are provided and instance-level navigation is required. |
Weijie Li; Xinhang Song; Yubing Bai; Sixian Zhang; Shuqiang Jiang; |
| 117 | Hybrid Network Compression Via Meta-Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Neural network pruning and quantization are two major lines of network compression. This raises a natural question of whether we can find the optimal compression by considering multiple network compression criteria in a unified framework. |
Jianming Ye; Shiliang Zhang; Jingdong Wang; |
| 118 | Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the aforementioned problems, we propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification that jointly models the coarse- and fine-grained skeleton motion patterns. |
Tailin Chen; Desen Zhou; Jian Wang; Shidong Wang; Yu Guan; Xuming He; Errui Ding; |
| 119 | Structure-aware Mathematical Expression Recognition with Sequence-Level Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, they may fail to capture the rigorous relations among different formula symbols as they consider MER as a common language generation task. To address these issues, we propose a Structure-Aware Sequence-Level (SASL) model for MER. |
Minli Li; Peilin Zhao; Yifan Zhang; Shuaicheng Niu; Qingyao Wu; Mingkui Tan; |
| 120 | Collocation and Try-on Network: Whether An Outfit Is Compatible Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current FCM studies still perform far from satisfactorily, because they only consider collocation compatibility modeling, while neglecting the natural human habit of evaluating outfit compatibility from both the collocation (discrete assessment) and try-on (unified assessment) perspectives. In light of the above analysis, we propose a Collocation and Try-On Network (CTO-Net) for FCM, combining both the collocation and try-on compatibilities. |
Na Zheng; Xuemeng Song; Qingying Niu; Xue Dong; Yibing Zhan; Liqiang Nie; |
| 121 | Learning Hierarchal Channel Attention for Fine-grained Visual Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the great merit of multi-modal data in introducing semantic knowledge and sequential analysis techniques in learning hierarchical feature representation for generating discriminative fine-grained features. |
Xiang Guan; Guoqing Wang; Xing Xu; Yi Bin; |
| 122 | Joint Learning for Relationship and Interaction Analysis in Video with Multimodal Feature Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a solution to the DVU task which applies joint learning of interaction and relationship prediction and multimodal feature fusion. |
Beibei Zhang; Fan Yu; Yanxin Gao; Tongwei Ren; Gangshan Wu; |
| 123 | Beyond OCR + VQA: Involving OCR Into The Flow for Robust and Accurate TextVQA Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This makes the performance of multimodal reasoning and question answering highly dependent on the accuracy of OCR. In this work, we address this issue from two perspectives. |
Gangyan Zeng; Yuan Zhang; Yu Zhou; Xiaomeng Yang; |
| 124 | RecycleNet: An Overlapped Text Instance Recovery Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, these approaches cannot handle overlapped instances that often appear in sheets like invoices, receipts and math exercises, where printed templates are generated beforehand and extra contents are added afterward on existing texts. In this paper, we aim to tackle this problem by proposing RecycleNet, which automatically extracts and reconstructs overlapped instances by fully recycling the intersecting pixels that used to be obstacles for recognition. |
Yiqing Hu; Yan Zheng; Xinghua Jiang; Hao Liu; Deqiang Jiang; Yinsong Liu; Bo Ren; Rongrong Ji; |
| 125 | TDI TextSpotter: Taking Data Imbalance Into Account in Scene Text Spotting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The default left-to-right reading direction leads to errors in unconventional text spotting. In this paper, we propose a novel scene text spotter TDI to solve these problems. |
Yu Zhou; Hongtao Xie; Shancheng Fang; Jing Wang; Zhengjun Zha; Yongdong Zhang; |
| 126 | WAS-VTON: Warping Architecture Search for Virtual Try-on Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite recent progress on image-based virtual try-on, current methods are constraint by shared warping networks and thus fail to synthesize natural try-on results when faced with clothing categories that require different warping operations. In this paper, we address this problem by finding clothing category-specific warping networks for the virtual try-on task via Neural Architecture Search (NAS). |
Zhenyu Xie; Xujie Zhang; Fuwei Zhao; Haoye Dong; Michael C. Kampffmeyer; Haonan Yan; Xiaodan Liang; |
| 127 | Missing Data Imputation for Solar Yield Prediction Using Temporal Multi-Modal Variational Auto-Encoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel temporal multi-modal variational auto-encoder (TMMVAE) model, to enhance the robustness of short-term solar power yield prediction with missing data. |
Meng Shen; Huaizheng Zhang; Yixin Cao; Fan Yang; Yonggang Wen; |
| 128 | Cut-Thumbnail: A Novel Data Augmentation for Convolutional Neural Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel data augmentation strategy named Cut-Thumbnail, that aims to improve the shape bias of the network. |
Tianshu Xie; Xuan Cheng; Xiaomin Wang; Minghui Liu; Jiali Deng; Tao Zhou; Ming Liu; |
| 129 | Stacked Semantically-Guided Learning for Image De-distortion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It can benefit many computational visual media applications that are primarily designed for high-quality images. In order to address this challenging issue, we propose a stacked semantically-guided network, which is the first attempt at this task. |
Huiyuan Fu; Changhao Tian; Xin Wang; Huadong Ma; |
| 130 | Rethinking The Impacts of Overfitting and Feature Quality on Small-scale Video Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose three major techniques to improve feature quality and another three to alleviate overfitting in an attempt to make lightweight models achieve higher performances. |
Xuansheng Wu; Feichi Yang; Tong Zhou; Xinyue Lin; |
| 131 | Multi-initialization Optimization Network for Accurate 3D Human Pose and Shape Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we reduce the ambiguity by optimizing multiple initializations. |
Zhiwei Liu; Xiangyu Zhu; Lu Yang; Xiang Yan; Ming Tang; Zhen Lei; Guibo Zhu; Xuetao Feng; Yan Wang; Jinqiao Wang; |
| 132 | Weighted Gaussian Loss Based Hamming Hashing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, nearly all existing Hamming hashing methods cannot effectively penalize the dissimilar pairs within the Hamming ball to push them out. To tackle this problem, in this paper, we propose a novel Weighted Gaussian Loss based Hamming Hashing, called WGLHH, which introduces a weighted Gaussian loss to optimize hashing model. |
Rong-Cheng Tu; Xian-Ling Mao; Cihang Kong; Zihang Shao; Ze-Lin Li; Wei Wei; Heyan Huang; |
| 133 | Facial Micro-Expression Generation Based on Deep Motion Retargeting and Transfer Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Second, to enhance the feature extraction ability, we applied deep transfer learning (DTL) by borrowing knowledge from macro-expression images. We evaluated our method on three datasets, CASME II, SMIC, and SAMM, and found that it showed satisfactory results on all of them. |
Xinqi Fan; Ali Raza Shahid; Hong Yan; |
| 134 | Is Visual Context Really Helpful for Knowledge Graph? A Representation Learning Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we probe the utility of the auxiliary visual context from knowledge graph representation learning perspective by designing a Relation Sensitive Multi-modal Embedding model, RSME for short. |
Meng Wang; Sen Wang; Han Yang; Zheng Zhang; Xi Chen; Guilin Qi; |
| 135 | ZiGAN: Fine-grained Chinese Calligraphy Font Generation Via A Few-shot Style Transfer Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple but powerful end-to-end Chinese calligraphy font generation framework ZiGAN, which does not require any manual operation or redundant preprocessing to generate fine-grained target style characters with few-shot references. |
Qi Wen; Shuang Li; Bingfeng Han; Yi Yuan; |
| 136 | How to Learn A Domain-Adaptive Event Simulator? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To align the statistics of synthetic events with those of target event cameras, existing simulators often need to be heuristically tuned with elaborate manual effort and thus cannot automatically adapt to various domains. To address this issue, this work proposes one of the first learning-based, domain-adaptive event simulators. |
Daxin Gu; Jia Li; Yu Zhang; Yonghong Tian; |
| 137 | Pose-Guided Feature Learning with Knowledge Distillation for Occluded Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve high accuracy while preserving low inference complexity, we propose a network named Pose-Guided Feature Learning with Knowledge Distillation (PGFL-KD), where the pose information is exploited to regularize the learning of semantics aligned features but is discarded in testing. |
Kecheng Zheng; Cuiling Lan; Wenjun Zeng; Jiawei Liu; Zhizheng Zhang; Zheng-Jun Zha; |
| 138 | Deep Interactive Video Inpainting: An Invisibility Cloak for Harry Potter Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new task of deep interactive video inpainting and an application for users to interact with machines. |
Cheng Chen; Jiayin Cai; Yao Hu; Xu Tang; Xinggang Wang; Chun Yuan; Xiang Bai; Song Bai; |
| 139 | Ego-Deliver: A Large-Scale Dataset For Egocentric Video Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Ego-Deliver, a new large-scale egocentric video benchmark recorded by takeaway riders about their daily work. |
Haonan Qiu; Pan He; Shuchun Liu; Weiyuan Shao; Feiyun Zhang; Jiajun Wang; Liang He; Feng Wang; |
| 140 | Multimodal Global Relation Knowledge Distillation for Egocentric Action Anticipation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider the task of action anticipation on egocentric videos. |
Yi Huang; Xiaoshan Yang; Changsheng Xu; |
| 141 | Stereo Video Super-Resolution Via Exploiting View-Temporal Correlations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Stereo Video Super-Resolution Network (SVSRNet) to fulfill the StereoVSR task via exploiting view-temporal correlations. |
Ruikang Xu; Zeyu Xiao; Mingde Yao; Yueyi Zhang; Zhiwei Xiong; |
| 142 | Efficient Graph Deep Learning in TensorFlow with Tf_geometric Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce tf_geometric, an efficient and friendly library for graph deep learning, which is compatible with both TensorFlow 1.x and 2.x. |
Jun Hu; Shengsheng Qian; Quan Fang; Youze Wang; Quan Zhao; Huaiwen Zhang; Changsheng Xu; |
| 143 | Multi-Level Counterfactual Contrast for Visual Commonsense Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel multi-level counterfactual contrastive learning network for VCR by jointly modeling the hierarchical visual contents and the inter-modality relationships between the visual and linguistic domains. |
Xi Zhang; Feifei Zhang; Changsheng Xu; |
| 144 | From Image to Imuge: Immunized Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Imuge, an image tamper resilient generative scheme for image self-recovery. |
Qichao Ying; Zhenxing Qian; Hang Zhou; Haisheng Xu; Xinpeng Zhang; Siyi Li; |
| 145 | Merging Multiple Template Matching Predictions in Intra Coding with Attentive Convolutional Neural Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the prediction indicated by the best template matching is not always the actually best prediction. To solve this problem, we propose a method that merges multiple template matching predictions through a convolutional neural network with an attention module. |
Qijun Wang; Guodong Zheng; |
| 146 | LSSNet: A Two-stream Convolutional Neural Network for Spotting Macro- and Micro-expression in Long Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an efficient two-stream network named location suppression based spotting network (LSSNet), which includes three parts. |
Wang-Wang Yu; Jingwen Jiang; Yong-Jie Li; |
| 147 | Face-based Voice Conversion: Learning The Voice Behind A Face Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since there is a strong relationship between human faces and voices, a promising approach would be to synthesize various voice characteristics from face representation. Therefore, we introduce a novel idea of generating different voice styles from different human face photos, which can facilitate new applications, e.g., personalized voice assistants. |
Hsiao-Han Lu; Shao-En Weng; Ya-Fan Yen; Hong-Han Shuai; Wen-Huang Cheng; |
| 148 | AdvHash: Set-to-set Targeted Attack on Deep Hashing with One Single Adversarial Patch Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose AdvHash, the first targeted mismatch attack on deep hashing through adversarial patch. |
Shengshan Hu; Yechao Zhang; Xiaogeng Liu; Leo Yu Zhang; Minghui Li; Hai Jin; |
| 149 | JDMAN: Joint Discriminative and Mutual Adaptation Networks for Cross-Domain Facial Expression Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose Joint Discriminative and Mutual Adaptation Networks (JDMAN), which collaboratively bridge the domain shift and semantic gap by domain- and category-level co-adaptation based on mutual information and discriminative metric learning techniques. |
Yingjian Li; Yingnan Gao; Bingzhi Chen; Zheng Zhang; Lei Zhu; Guangming Lu; |
| 150 | Multimodal Compatibility Modeling Via Exploring The Consistent and Complementary Correlations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the consistent and complementary correlations for better compatibility modeling. |
Weili Guan; Haokun Wen; Xuemeng Song; Chung-Hsing Yeh; Xiaojun Chang; Liqiang Nie; |
| 151 | Cross-Modal Generalization: Learning in Low Resource Modalities Via Meta-Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we formalize cross-modal generalization as a learning paradigm to train a model that can (1) quickly perform new tasks (from new domains) while (2) being originally trained on a different input modality. |
Paul Pu Liang; Peter Wu; Liu Ziyin; Louis-Philippe Morency; Ruslan Salakhutdinov; |
| 152 | Deep Self-Supervised T-SNE for Multi-modal Subspace Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods might be incapable of handling real problems with complex heterogeneous structures between different modalities, since the large heterogeneous structure makes it difficult to directly learn a discriminative shared self-representation for multi-modal clustering. To tackle this problem, in this paper, we propose a deep Self-supervised t-SNE method (StSNE) for multi-modal subspace clustering, which learns soft label features by multi-modal encoders and utilizes the common label feature to supervise the soft label feature of each modality via adversarial training and reconstruction networks. |
Qianqian Wang; Wei Xia; Zhiqiang Tao; Quanxue Gao; Xiaochun Cao; |
| 153 | Cycle-Consistent Inverse GAN for Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel unified framework of Cycle-consistent Inverse GAN (CI-GAN) for both text-to-image generation and text-guided image manipulation tasks. |
Hao Wang; Guosheng Lin; Steven C. H. Hoi; Chunyan Miao; |
| 154 | Mix-order Attention Networks for Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing CNN-based methods neglect the diversity of image contents and degradations in the corrupted images and treat channel-wise features equally, thus hindering the representation ability of CNNs. To address this issue, we propose deep mix-order attention networks (MAN) to extract features that capture rich feature statistics within networks. |
Tao Dai; Yalei Lv; Bin Chen; Zhi Wang; Zexuan Zhu; Shu-Tao Xia; |
| 155 | Meta Self-Paced Learning for Cross-Modal Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Meta Self-Paced Network (Meta-SPN) that automatically learns a weighting scheme from data for cross-modal matching. |
Jiwei Wei; Xing Xu; Zheng Wang; Guoqing Wang; |
| 156 | Learning Multi-context Aware Location Representations from Large-scale Geotagged Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a machine learning based approach, termed GPS2Vec+, which learns rich location representations by capitalizing on the world-wide geotagged images. |
Yifang Yin; Ying Zhang; Zhenguang Liu; Yuxuan Liang; Sheng Wang; Rajiv Ratn Shah; Roger Zimmermann; |
| 157 | AggNet for Self-supervised Monocular Depth Estimation: Go An Aggressive Step Further Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous methods usually adopt a one-stage MDE network, which is insufficient to achieve high performance. In this paper, we dig deep into this task to propose an aggressive framework termed AggNet. |
Zhi Chen; Xiaoqing Ye; Liang Du; Wei Yang; Liusheng Huang; Xiao Tan; Zhenbo Shi; Fumin Shen; Errui Ding; |
| 158 | InterBN: Channel Fusion for Adversarial Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: A classifier trained on one dataset rarely works on other datasets obtained under different conditions because of domain shift. Such a problem is usually solved by domain … |
Mengzhu Wang; Wei Wang; Baopu Li; Xiang Zhang; Long Lan; Huibin Tan; Tianyi Liang; Wei Yu; Zhigang Luo; |
| 159 | Inferring The Importance of Product Appearance with Semi-supervised Multi-modal Enhancement: A Step Towards The Screenless Retailing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to infer the significance of every item’s appearance in consumer decision making and identify the group of items that are suitable for screenless shopping. |
Yongshun Gong; Jinfeng Yi; Dong-Dong Chen; Jian Zhang; Jiayu Zhou; Zhihua Zhou; |
| 160 | Pre-training Graph Transformer with Multimodal Side Information for Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the recent success of pre-training models on natural language and images, we propose a pre-training strategy to learn item representations by considering both item side information and their relationships. |
Yong Liu; Susen Yang; Chenyi Lei; Guoxin Wang; Haihong Tang; Juyong Zhang; Aixin Sun; Chunyan Miao; |
| 161 | Weakly-Supervised Video Object Grounding Via Stable Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: (2) A few works have attempted to utilize contextual information to learn object representations, but found a significant decrease in performance due to the unstable training in cross-modal alignment. To address the above issues, in this paper, we propose a Stable Context Learning (SCL) framework for WSVOG which jointly enjoys the merits of stable learning and rich contextual information. |
Wei Wang; Junyu Gao; Changsheng Xu; |
| 162 | M3TR: Multi-modal Multi-label Recognition with Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To build the global scope of visual context as well as interactions between visual modality and linguistic modality, we propose the Multi-Modal Multi-label recognition TRansformers (M3TR) with the ternary relationship learning for inter-and intra-modalities. |
Jiawei Zhao; Yifan Zhao; Jia Li; |
| 163 | Meta-FDMixup: Cross-Domain Few-Shot Learning Guided By Labeled Target Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we realize that the labeled target data in CD-FSL has not been leveraged in any way to help the learning process. |
Yuqian Fu; Yanwei Fu; Yu-Gang Jiang; |
| 164 | Deep Neural Network Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Afterward, MLaaS returns a well-trained classifier to them. In this paper, we explore a potential novel task named deep neural network retrieval and its application which helps MLaaS to save computation resources. |
Nan Zhong; Zhenxing Qian; Xinpeng Zhang; |
| 165 | TACR-Net: Editing on Deep Video and Voice Portraits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: 3) the operation of forgery is always at the video level, without considering the forgery of the voice, especially the synchronization of the converted voice and the mouth. To address these distortion problems, we propose a novel deep learning framework, named Temporal-Refinement Autoregressive-Cascade Rendering Network (TACR-Net) for audio-driven dynamic talking face editing. |
Luchuan Song; Bin Liu; Guojun Yin; Xiaoyi Dong; Yufei Zhang; Jia-Xuan Bai; |
| 166 | I Know Your Keyboard Input: A Robust Keystroke Eavesdropper Based-on Acoustic Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a robust side-channel attack scheme to infer keystrokes on the surrounding keyboard, leveraging the smart devices’ microphones. |
Jia-Xuan Bai; Bin Liu; Luchuan Song; |
| 167 | Perception-Oriented Stereo Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve the perceptual performance, this paper proposes the first perception-oriented stereo image super-resolution approach by exploiting the feedback, provided by the evaluation on the perceptual quality of StereoSR results. |
Chenxi Ma; Bo Yan; Weimin Tan; Xuhao Jiang; |
| 168 | End-to-end Quality of Experience Evaluation for HTTP Adaptive Streaming Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we will investigate the stated metrics, best practices and evaluation methods, and available techniques with an aim to (i) design and develop practical and scalable measurement tools and prototypes, (ii) provide a better understanding of current technologies and techniques (e.g. |
Babak Taraghi; |
| 169 | Exploiting Invariance of Mining Facial Landmarks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an invariant learning method for facial landmark mining in a self-supervised manner. |
Jiangming Shi; Zixian Gao; Hao Liu; Zekuan Yu; Fengjun Li; |
| 170 | Disentangle Your Dense Object Detector Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the current training pipeline for dense detectors is compromised by many conjunctions that may not hold. In this paper, we investigate three such important conjunctions: 1) only samples assigned as positive in the classification head are used to train the regression head; 2) classification and regression share the same input feature and computational fields defined by the parallel head architecture; and 3) samples distributed in different feature pyramid layers are treated equally when computing the loss. |
Zehui Chen; Chenhongyi Yang; Qiaofei Li; Feng Zhao; Zheng-Jun Zha; Feng Wu; |
| 171 | Learning Transferrable and Interpretable Representations for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to learn a domain transformation space via a domain transformer network (DTN) which explicitly mines the relationship among multiple domains and constructs transferable feature representations for down-stream tasks by interpreting each feature as a semantically weighted combination of multiple domain-specific features. |
Zhekai Du; Jingjing Li; Ke Lu; Lei Zhu; Zi Huang; |
| 172 | GCM-Net: Towards Effective Global Context Modeling for Image Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new image inpainting method termed Global Context Modeling Network (GCM-Net). |
Huan Zheng; Zhao Zhang; Yang Wang; Zheng Zhang; Mingliang Xu; Yi Yang; Meng Wang; |
| 173 | Position-Augmented Transformers with Entity-Aligned Mesh for TextVQA Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: And traditional multimodal transformers cannot effectively capture relative position information and original image features. To address these issues in an intuitive but effective way, we propose a novel model, position-augmented transformers with entity-aligned mesh, for the TextVQA task. |
Xuanyu Zhang; Qing Yang; |
| 174 | CALLip: Lipreading Using Contrastive and Attribute Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite having reached a feasible performance, lipreading still faces two crucial challenges: 1) the considerable lip movement variations across different persons when they utter the same words; 2) the similar lip movements of people when they utter some confusable phonemes. To tackle these two problems, we propose a novel lipreading framework, CALLip, which employs attribute learning and contrastive learning. |
Yiyang Huang; Xuefeng Liang; Chaowei Fang; |
| 175 | Data-Free Ensemble Knowledge Distillation for Privacy-conscious Multimedia Model Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most well-behaved KD approaches require the original dataset, which is usually unavailable due to privacy issues, while existing data-free KD methods perform much worse than data-required counterparts. In this paper, we analyze previous data-free KD methods from the data perspective and point out that using a single pre-trained model limits the performance of these approaches. |
Zhiwei Hao; Yong Luo; Han Hu; Jianping An; Yonggang Wen; |
| 176 | SOGAN: 3D-Aware Shadow and Occlusion Robust GAN for Makeup Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate it, we propose a novel makeup transfer method, called 3D-Aware Shadow and Occlusion Robust GAN (SOGAN). |
Yueming Lyu; Jing Dong; Bo Peng; Wei Wang; Tieniu Tan; |
| 177 | Complementary Factorization Towards Outfit Compatibility Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although existing studies have achieved prominent progress, most of them overlook the essential global outfit representation learning, as well as the uncovering of the hidden complementary factors behind outfit compatibility. Towards this end, we propose an Outfit Compatibility Modeling scheme via Complementary Factorization, termed as OCM-CF. |
Tianyu Su; Xuemeng Song; Na Zheng; Weili Guan; Yan Li; Liqiang Nie; |
| 178 | StrucTexT: Structured Text Understanding with Multi-Modal Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a unified framework named StrucTexT, which is flexible and effective for handling both sub-tasks. |
Yulin Li; Yuxi Qian; Yuechen Yu; Xiameng Qin; Chengquan Zhang; Yan Liu; Kun Yao; Junyu Han; Jingtuo Liu; Errui Ding; |
| 179 | No-Reference Video Quality Assessment with Heterogeneous Knowledge Ensemble Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel no-reference VQA (NR-VQA) method with HEterogeneous Knowledge Ensemble (HEKE). |
Jinjian Wu; Yongxu Liu; Leida Li; Weisheng Dong; Guangming Shi; |
| 180 | Cross-modal Joint Prediction and Alignment for Composed Query Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the composed query image retrieval task, namely retrieving the target images that are similar to a composed query, in which a modification text is combined with a query image to describe a user’s accurate search intention. |
Yuchen Yang; Min Wang; Wengang Zhou; Houqiang Li; |
| 181 | Joint-teaching: Learning to Refine Knowledge for Resource-constrained Unsupervised Cross-modal Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an effective unsupervised learning framework named JOint-teachinG (JOG) to pursue a high-performance yet light-weight cross-modal retrieval model. |
Peng-Fei Zhang; Jiasheng Duan; Zi Huang; Hongzhi Yin; |
| 182 | CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel CONtextual QUery-awarE Ranking~(CONQUER) model for effective moment localization and ranking. |
Zhijian Hou; Chong-Wah Ngo; W. K. Chan; |
| 183 | CAA: Candidate-Aware Aggregation for Temporal Action Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they may neglect the underlying relationship among candidates unconsciously. In this paper, we propose a novel model termed Candidate-Aware Aggregation (CAA) to tackle this problem. |
Yifan Ren; Xing Xu; Fumin Shen; Yazhou Yao; Huimin Lu; |
| 184 | LightFEC: Network Adaptive FEC with A Lightweight Deep-Learning Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose LightFEC to make accurate and fast prediction of packet loss pattern. |
Han Hu; Sheng Cheng; Xinggong Zhang; Zongming Guo; |
| 185 | PRNet: A Progressive Recovery Network for Revealing Perceptually Encrypted Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new end-to-end method of analyzing the visual security of perceptually encrypted images, without any manual work or knowing any prior knowledge of the encryption scheme. |
Tao Xiang; Ying Yang; Shangwei Guo; Hangcheng Liu; Hantao Liu; |
| 186 | Searching Motion Graphs for Human Motion Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on a jump-sensitive graph path search algorithm proposed in this paper, our model can efficiently solve human motion completion over the motion graphs. |
Chenchen Liu; Yadong Mu; |
| 187 | Multimodal Dialog System: Relational Graph-based Context-aware Question Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a relational graph-based context-aware question understanding scheme, which enhances the user intention comprehension from local to global. |
Haoyu Zhang; Meng Liu; Zan Gao; Xiaoqiang Lei; Yinglong Wang; Liqiang Nie; |
| 188 | Why Do We Click: Visual Impression-aware News Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Besides, existing research pays little attention to the click decision-making process in designing multi-modal modeling modules. In this work, inspired by the fact that users make their click decisions mostly based on the visual impression they perceive when browsing news, we propose to capture such visual impression information with visual-semantic modeling for news recommendation. |
Jiahao Xun; Shengyu Zhang; Zhou Zhao; Jieming Zhu; Qi Zhang; Jingjie Li; Xiuqiang He; Xiaofei He; Tat-Seng Chua; Fei Wu; |
| 189 | Block Popularity Prediction for Multimedia Storage Systems Using Spatial-Temporal-Sequential Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We systematically evaluate our STSNN models against six baseline models from three different categories, i.e., heuristic methods, regression methods and neural network-based methods. |
Yingying Cheng; Fan Zhang; Gang Hu; Yiwen Wang; Hanhui Yang; Gong Zhang; Zhuo Cheng; |
| 190 | FaceX-Zoo: A PyTorch Toolbox for Face Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The demands are in three folds, including 1) modular training scheme, 2) standard and automatic evaluation, and 3) groundwork of deployment. To meet these demands, we present a novel open-source project, named FaceX-Zoo, which is constructed with modular and scalable design, and oriented to the academic and industrial community of face-related analysis. |
Jun Wang; Yinglu Liu; Yibo Hu; Hailin Shi; Tao Mei; |
| 191 | Multimodal Asymmetric Dual Learning for Unsupervised Eyeglasses Removal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a multimodal asymmetric dual learning method for unsupervised glasses removal. |
Qing Lin; Bo Yan; Weimin Tan; |
| 192 | Partial Tubal Nuclear Norm Regularized Multi-view Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, previous methods mainly focus on using the tensor nuclear norm for low-rank representation to explore the high correlation of multi-view features, which often causes the estimation bias of the tensor rank. To overcome these limitations, we propose the partial tubal nuclear norm regularized multi-view learning (PTN2ML) method, in which the partial tubal nuclear norm as a non-convex surrogate of the tensor tubal multi-rank, only minimizes the partial sum of the smaller tubal singular values to preserve the low-rank property of the self-representation tensor. |
Yongyong Chen; Shuqin Wang; Chong Peng; Guangming Lu; Yicong Zhou; |
| 193 | Get The Best of The Three Worlds: Real-Time Neural Image Compression in A Non-GPU Environment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By resolving the high GPU dependency and improving the low speed of neural models, this paper proposes two non-GPU models that get the best of the three worlds. |
Zekun Zheng; Xiaodong Wang; Xinye Lin; Shaohe Lv; |
| 194 | UniCon: Unified Context Network for Robust Active Speaker Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new efficient framework, the Unified Context Network (UniCon), for robust active speaker detection (ASD). |
Yuanhang Zhang; Susan Liang; Shuang Yang; Xiao Liu; Zhongqin Wu; Shiguang Shan; Xilin Chen; |
| 195 | Adaptive Affinity Loss and Erroneous Pseudo-Label Refinement for Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to embed affinity learning of multi-stage approaches in a single-stage model. |
Xiangrong Zhang; Zelin Peng; Peng Zhu; Tianyang Zhang; Chen Li; Huiyu Zhou; Licheng Jiao; |
| 196 | Facial Prior Based First Order Motion Model for Micro-expression Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Given a target face, we intend to drive the face to generate micro-expression videos according to the motion patterns of source videos. |
Yi Zhang; Youjun Zhao; Yuhang Wen; Zixuan Tang; Xinhua Xu; Mengyuan Liu; |
| 197 | Perceptual Quality Assessment of Internet Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the properties of Internet videos originated from Youku, we propose a spatio-temporal distortion-aware model (STDAM). |
Jiahua Xu; Jing Li; Xingguang Zhou; Wei Zhou; Baichao Wang; Zhibo Chen; |
| 198 | Do We Really Need Frame-by-Frame Annotation Datasets for Object Tracking? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we investigate the necessity of large-scale training data to ensure tracking algorithms’ performance. |
Lei Hu; Shaoli Huang; Shilei Wang; Wei Liu; Jifeng Ning; |
| 199 | A Simple and Effective Baseline for Robust Logo Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article introduces the solution of the third place team green hand for ACM Multimedia 2021 Security AI Challenger Phase 7: Robust defense competition for e-commerce logo detection. |
Weipeng Xu; Ye Liu; Daquan Lin; |
| 200 | Learning Spatial-angular Fusion for Compressive Light Field Imaging in A Cycle-consistent Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates the 4-D light field (LF) reconstruction from 2-D measurements captured by the coded aperture camera. To tackle such an ill-posed inverse problem, we propose a cycle-consistent reconstruction network (CR-Net). |
Xianqiang Lyu; Zhiyu Zhu; Mantang Guo; Jing Jin; Junhui Hou; Huanqiang Zeng; |
| 201 | AsyNCE: Disentangling False-Positives for Weakly-Supervised Video Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, naive inner production is suboptimal for the similarity measure of cross domains. To solve these issues, we propose a novel AsyNCE loss to flexibly disentangle the positive pairs from negative ones in frame-level MIL, which allows for mitigating the uncertainty of false-positive frames effectively. |
Cheng Da; Yanhao Zhang; Yun Zheng; Pan Pan; Yinghui Xu; Chunhong Pan; |
| 202 | DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new framework, called Document Image Transformer (DocTr), to address the issue of geometry and illumination distortion of the document images. |
Hao Feng; Yuechen Wang; Wengang Zhou; Jiajun Deng; Houqiang Li; |
| 203 | Recursive Fusion and Deformable Spatiotemporal Attention for Video Compression Artifact Reduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, to boost artifact removal, on the one hand, we propose a Recursive Fusion (RF) module to model the temporal dependency within a long temporal range. |
Minyi Zhao; Yi Xu; Shuigeng Zhou; |
| 204 | Learning Spatio-temporal Representation By Channel Aliasing Video Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel pretext task namely Channel Aliasing Video Perception (CAVP) for self-supervised video representation learning. |
Yiqi Lin; Jinpeng Wang; Manlin Zhang; Andy J. Ma; |
| 205 | Dual Learning Music Composition and Dance Choreography Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel extension, where we jointly model both tasks in a dual learning approach. |
Shuang Wu; Zhenguang Liu; Shijian Lu; Li Cheng; |
| 206 | Personality Recognition By Modelling Person-specific Cognitive Processes Using Graph Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent research shows that in dyadic and group interactions individuals’ nonverbal behaviours are influenced by the behaviours of their conversational partner(s). Therefore, in this work we hypothesise that during a dyadic interaction, the target subject’s facial reactions are driven by two main factors: (i) their internal (person-specific) cognition, and (ii) the externalised nonverbal behaviours of their conversational partner. |
Zilong Shao; Siyang Song; Shashank Jaiswal; Linlin Shen; Michel Valstar; Hatice Gunes; |
| 207 | BridgeNet: A Joint Learning Network of Depth Map Super-Resolution and Monocular Depth Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the multi-task learning, we propose a joint learning network of depth map super-resolution (DSR) and monocular depth estimation (MDE) without introducing additional supervision labels. |
Qi Tang; Runmin Cong; Ronghui Sheng; Lingzhi He; Dan Zhang; Yao Zhao; Sam Kwong; |
| 208 | View-normalized Skeleton Generation for Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a view normalization-based action recognition framework, which is composed of view-normalization generative adversarial network (VN-GAN) and classification network. |
Qingzhe Pan; Zhifu Zhao; Xuemei Xie; Jianan Li; Yuhan Cao; Guangming Shi; |
| 209 | Image Quality Caption with Attentive and Recurrent Semantic Attractor Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a novel quality caption model is inventively developed to assess the image quality with hierarchical semantics. |
Wen Yang; Jinjian Wu; Leida Li; Weisheng Dong; Guangming Shi; |
| 210 | Identity-Preserving Face Anonymization Via Adaptively Facial Attributes Obfuscation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a face anonymization framework that could obfuscate visual appearance while preserving the identity discriminability. |
Jingzhi Li; Lutong Han; Ruoyu Chen; Hua Zhang; Bing Han; Lili Wang; Xiaochun Cao; |
| 211 | Space-Angle Super-Resolution for Multi-View Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Intuitively, multi-view images can provide more angular references, and higher resolution can provide more high-frequency details. Therefore, we propose a one-stage space-angle super-resolution network called SASRnet, which simultaneously synthesizes real and virtual HR views. |
Yuqi Sun; Ri Cheng; Bo Yan; Shili Zhou; |
| 212 | Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Different from them, we reconsider the status of two modalities and propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD, which differentially models the dependence of two modalities according to the feature representations of different layers. |
Chen Zhang; Runmin Cong; Qinwei Lin; Lin Ma; Feng Li; Yao Zhao; Sam Kwong; |
| 213 | Searching A Hierarchically Aggregated Fusion Architecture for Fast Multi-Modality Image Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these handcrafted designs are unable to cope with highly demanding fusion tasks, resulting in blurred targets and lost textural details. To alleviate these issues, in this paper, we propose a novel approach, aiming at searching effective architectures according to various modality principles and fusion mechanisms. Specifically, we construct a hierarchically aggregated fusion architecture to extract and refine fused features from feature-level and object-level fusion perspectives, which is responsible for obtaining complementary target/detail representations. |
Risheng Liu; Zhu Liu; Jinyuan Liu; Xin Fan; |
| 214 | Mask Is All You Need: Rethinking Mask R-CNN for Dense and Arbitrary-Shaped Scene Text Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we argue that the performance degradation results from the learning confusion issue in the mask head. |
Xugong Qin; Yu Zhou; Youhui Guo; Dayan Wu; Zhihong Tian; Ning Jiang; Hongbin Wang; Weiping Wang; |
| 215 | Enhancing Knowledge Tracing Via Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, by leveraging the current advances in adversarial training (AT), we propose an efficient AT based KT method (ATKT) to enhance KT model’s generalization and thus push the limit of KT. |
Xiaopeng Guo; Zhijie Huang; Jie Gao; Mingyu Shang; Maojing Shu; Jun Sun; |
| 216 | Lifting The Veil of Frequency in Joint Segmentation and Depth Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we revisit the mutual enhancement for joint semantic segmentation and depth estimation. |
Tianhao Fu; Yingying Li; Xiaoqing Ye; Xiao Tan; Hao Sun; Fumin Shen; Errui Ding; |
| 217 | GCNIllustrator: Illustrating The Effect of Hyperparameters on Graph Convolutional Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Not only machine learning beginners, but also experienced practitioners often have difficulty tuning their models properly. We hypothesize that having a tool that visualizes the effect of hyperparameter choices on performance can accelerate model development and improve the understanding of these black-box models. |
Ivona Najdenkoska; Jeroen den Boef; Thomas Schneider; Justo van der Werf; Reinier de Ridder; Fajar Fathurrahman; Marcel Worring; |
| 218 | Transferrable Contrastive Learning for Visual Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Actually, the optimal region where the domain gap vanishes and the instance-level constraint that SSL pursues may not coincide at all. From this point, we present a particular paradigm of self-supervised learning tailored for domain adaptation, i.e., Transferrable Contrastive Learning (TCL), which links the SSL and the desired cross-domain transferability congruently. |
Yang Chen; Yingwei Pan; Yu Wang; Ting Yao; Xinmei Tian; Tao Mei; |
| 219 | DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel Dynamic Segment Aggregation (DSA) module to capture relationship among snippets. |
Wenhao Wu; Yuxiang Zhao; Yanwu Xu; Xiao Tan; Dongliang He; Zhikang Zou; Jin Ye; Yingying Li; Mingde Yao; Zichao Dong; Yifeng Shi; |
| 220 | Mitigating Generation Shifts for Generalized Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel generative flow framework that consists of multiple conditional affine coupling layers for learning unseen data generation. |
Zhi Chen; Yadan Luo; Sen Wang; Ruihong Qiu; Jingjing Li; Zi Huang; |
| 221 | Learning Contextual Transformer Network for Image Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, this paper proposes the Contextual Transformer Network (CTN) which not only learns relationships between the corrupted and the uncorrupted regions but also exploits their respective internal closeness. |
Ye Deng; Siqi Hui; Sanping Zhou; Deyu Meng; Jinjun Wang; |
| 222 | Mining Latent Structures for Multimedia Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a LATent sTructure mining method for multImodal reCommEndation, which we term LATTICE for brevity. |
Jinghao Zhang; Yanqiao Zhu; Qiang Liu; Shu Wu; Shuhui Wang; Liang Wang; |
| 223 | Game Theory-driven Rate Control for 360-Degree Video Coding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce game theory to find optimal inter/intra-frame bit allocations that maximize the overall RC performance in terms of utility function. |
Tiesong Zhao; Jielian Lin; Yanjie Song; Xu Wang; Yuzhen Niu; |
| 224 | Fully Functional Image Manipulation Using Scene Graphs in A Bounding-Box Free Way Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the two issues above, we propose a novel bounding box free approach, which consists of two parts: a Local Bounding Box Free (Local-BBox-Free) Mask Generation and a Global Bounding Box Free (Global-BBox-Free) Instance Generation. |
Sitong Su; Lianli Gao; Junchen Zhu; Jie Shao; Jingkuan Song; |
| 225 | Learning Fine-Grained Motion Embedding for Landscape Animation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we focus on landscape animation, which aims to generate time-lapse videos from a single landscape image. |
Hongwei Xue; Bei Liu; Huan Yang; Jianlong Fu; Houqiang Li; Jiebo Luo; |
| 226 | Exploring Contextual-Aware Representation and Linguistic-Diverse Expression for Visual Dialog Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, conventional methods suffer from the single-answer learning strategy, where only one correct answer is accepted without considering the diverse expressions of language (i.e., one identical meaning but multiple expressions via rephrasing, adopting synonyms, etc.). In this paper, we introduce Contextual-Aware Representation and linguistic-diverse Expression (CARE), a novel plug-and-play framework with contextual-based graph embedding and curriculum contrastive learning to solve the above two issues. |
Xiangpeng Li; Lianli Gao; Lei Zhao; Jingkuan Song; |
| 227 | Fine-grained Cross-modal Alignment Network for Text-Video Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the problem, we propose a novel Fine-grained Cross-modal Alignment Network (FCA-Net), which considers the interactions between visual semantic units (i.e., sub-actions/sub-events) in videos and phrases in sentences for cross-modal alignment. |
Ning Han; Jingjing Chen; Guangyi Xiao; Hao Zhang; Yawen Zeng; Hao Chen; |
| 228 | Mask and Predict: Multi-step Reasoning for Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing methods formulate SGG as a straightforward task limited to one-time prediction, which relies on a single-pass pipeline and predicts all the semantics at once. Therefore, to handle this problem, we propose a novel multi-step reasoning manner for SGG. |
Hongshuo Tian; Ning Xu; An-An Liu; Chenggang Yan; Zhendong Mao; Quan Zhang; Yongdong Zhang; |
| 229 | Self-supervised Consensus Representation Learning for Attributed Graph Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Attempting to fully exploit the rich information of topological structure and node features for attributed graph, we introduce self-supervised learning mechanism to graph representation learning and propose a novel Self-supervised Consensus Representation Learning (SCRL) framework. |
Changshu Liu; Liangjian Wen; Zhao Kang; Guangchun Luo; Ling Tian; |
| 230 | Efficient Reinforcement Learning Development with RLzoo Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce RLzoo, a new DRL library that aims to make the development of DRL agents efficient. |
Zihan Ding; Tianyang Yu; Hongming Zhang; Yanhua Huang; Guo Li; Quancheng Guo; Luo Mai; Hao Dong; |
| 231 | ViDA-MAN: Visual Dialog with Digital Humans Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate ViDA-MAN, a digital-human agent for multi-modal interaction, which offers realtime audio-visual responses to instant speech inquiries. |
Tong Shen; Jiawei Zuo; Fan Shi; Jin Zhang; Liqin Jiang; Meng Chen; Zhengchen Zhang; Wei Zhang; Xiaodong He; Tao Mei; |
| 232 | Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Segment Similarity and Alignment Network (SSAN) to deal with this challenge, which is the first model trained end-to-end in S-CBVR. |
Chen Jiang; Kaiming Huang; Sifeng He; Xudong Yang; Wei Zhang; Xiaobo Zhang; Yuan Cheng; Lei Yang; Qing Wang; Furong Xu; Tan Pan; Wei Chu; |
| 233 | Polar Ray: A Single-stage Angle-free Detector for Oriented Object Detection in Aerial Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new anchor-free oriented object detector, Polar Ray Network (PRNet), where object keypoints are represented by polar coordinates without angle regression. |
Shuai Liu; Lu Zhang; Shuai Hao; Huchuan Lu; You He; |
| 234 | GAMnet: Robust Feature Matching Via Graph Adversarial-Matching Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, it is also challenging to incorporate the discrete one-to-one matching constraints into the differentiable correspondence solver in deep matching network. To address these issues, we propose a novel Graph Adversarial Matching Network (GAMnet) for graph matching problem. |
Bo Jiang; Pengfei Sun; Ziyan Zhang; Jin Tang; Bin Luo; |
| 235 | AMSS-Net: Audio Manipulation on User-Specified Sources with Textual Queries Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a neural network that performs audio transformations to user-specified sources (e.g., vocals) of a given audio track according to a given description while preserving other sources not mentioned in the description. |
Woosung Choi; Minseok Kim; Marco A. Martínez Ramírez; Jaehwa Chung; Soonyoung Jung;
| 236 | DLA-Net for FG-SBIR: Dynamic Local Aligned Network for Fine-Grained Sketch-Based Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we emphasize the local features are more discriminating than global feature in FG-SBIR and explore an effective way to utilize local features. |
Jiaqing Xu; Haifeng Sun; Qi Qi; Jingyu Wang; Ce Ge; Lejian Zhang; Jianxin Liao; |
| 237 | Retinomorphic Sensing: A Novel Paradigm for Future Multimedia Computing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel solution, namely retinomorphic sensing, which integrates fovea-like and peripheral-like sampling mechanisms to generate asynchronous visual streams using a unified representation as the retina does. |
Zhaodong Kang; Jianing Li; Lin Zhu; Yonghong Tian; |
| 238 | InsPose: Instance-Aware Networks for Single-Stage Multi-Person Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Different from previous solutions that involve complex heuristic designs, we present a simple yet effective solution by employing instance-aware dynamic networks. |
Dahu Shi; Xing Wei; Xiaodong Yu; Wenming Tan; Ye Ren; Shiliang Pu; |
| 239 | Attribute-specific Control Units in StyleGAN for Fine-grained Image Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We evaluate our proposed method in various face attribute manipulation tasks. |
Rui Wang; Jian Chen; Gang Yu; Li Sun; Changqian Yu; Changxin Gao; Nong Sang; |
| 240 | Imbalanced Source-free Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, class-imbalance is a common phenomenon in real-world domain adaptation applications. To address this issue, we present Imbalanced Source-free Domain Adaptation (ISFDA) in this paper. |
Xinhao Li; Jingjing Li; Lei Zhu; Guoqing Wang; Zi Huang; |
| 241 | Boosting Mobile CNN Inference Through Semantic Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Human brains are known to be capable of speeding up visual recognition of repeatedly presented objects through faster memory encoding and accessing procedures on activated … |
Yun Li; Chen Zhang; Shihao Han; Li Lyna Zhang; Baoqun Yin; Yunxin Liu; Mengwei Xu; |
| 242 | Spatiotemporal Inconsistency Learning for DeepFake Video Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we present a novel temporal modeling paradigm in TIM by exploiting the temporal difference over adjacent frames along with both horizontal and vertical directions. |
Zhihao Gu; Yang Chen; Taiping Yao; Shouhong Ding; Jilin Li; Feiyue Huang; Lizhuang Ma; |
| 243 | NJU MCG – Sensetime Team Submission to Pre-training for Video Understanding Challenge Track II Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents the method that underlies our submission to the Pre-training for Video Understanding Challenge Track II. |
Liwei Jin; Haoyue Cheng; Su Xu; Wayne Wu; Limin Wang; |
| 244 | Video Semantic Segmentation Via Sparse Temporal Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Considering the segmentation accuracy and inference efficiency, we propose a novel Sparse Temporal Transformer (STT) to bridge temporal relation among video frames adaptively, which is also equipped with query selection and key selection. |
Jiangtong Li; Wentao Wang; Junjie Chen; Li Niu; Jianlou Si; Chen Qian; Liqing Zhang; |
| 245 | Multimodal Relation Extraction with Efficient Graph Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the multimodal relation extraction (MRE), a task that identifies textual relations with visual clues. |
Changmeng Zheng; Junhao Feng; Ze Fu; Yi Cai; Qing Li; Tao Wang; |
| 246 | Exploiting BERT for Multimodal Target Sentiment Classification Through Input Space Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a two-stream model that translates images in input space using an object-aware transformer followed by a single-pass non-autoregressive text generation approach. |
Zaid Khan; Yun Fu; |
| 247 | Zero-shot Video Emotion Recognition Via Multimodal Protagonist-aware Transformer Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate a new task of zero-shot video emotion recognition, which aims to recognize rare unseen emotions. |
Fan Qi; Xiaoshan Yang; Changsheng Xu; |
| 248 | Hierarchical Multi-Task Learning for Diagram Question Answering with Multi-Modal Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing separate two-stage methods for DQA are limited in ineffective feedback mechanisms. To address this problem, in this paper, we propose a novel structural parsing-integrated Hierarchical Multi-Task Learning (HMTL) model for diagram question answering based on a multi-modal transformer framework. |
Zhaoquan Yuan; Xiao Peng; Xiao Wu; Changsheng Xu; |
| 249 | Semantic-Guided Relation Propagation Network for Few-shot Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel semantic-guided relation propagation network (SRPN), which leverages semantic information together with visual information for few-shot action recognition. |
Xiao Wang; Weirong Ye; Zhongang Qi; Xun Zhao; Guangge Wang; Ying Shan; Hanzi Wang; |
| 250 | Phoenix: Combining Highest-Profit First Scheduling and Responsive Congestion Control for Delay-sensitive Multimedia Transmission Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, a high-performance hybrid control algorithm is urgently needed to ensure the user’s QoE. In response to this problem, this paper proposes a scheduling algorithm based on transmission profit and a responsive congestion control algorithm, and compares them in simulations across a variety of scenarios. |
Haozhe Li; |
| 251 | CARE: Cloudified Android OSes on The Cloud Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conducted an experiment by testing real-world applications on the percentage of unused resources to demonstrate the severity of this issue. Nearly 50% of resources are unused. To solve this problem, we present CARE, the first framework intended to reduce the system-level redundancy by cloudifying the system from monolithic to Cloud-native. |
Dongjie Tang; Cathy Bao; Yong Yao; Chao Xie; Qiming Shi; Marc Mao; Randy Xu; Linsheng Li; Mohammad R. Haghighat; Zhengwei Qi; Haibing Guan; |
| 252 | Dynamic Knowledge Distillation with Cross-Modality Knowledge Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To coordinate the training dynamic, we propose to imbue our model the ability of dynamic distilling from multiple knowledge sources. |
Guangzhi Wang; |
| 253 | Local Graph Convolutional Networks for Cross-Modal Hashing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the known supervised deep methods mainly rely on the labeled information of datasets, which is insufficient to characterize the latent structures that exist among different modalities. To mitigate this problem, in this paper, we propose to use Graph Convolutional Networks (GCNs) to exploit the local structure information of datasets for cross-modal hash learning. |
Yudong Chen; Sen Wang; Jianglin Lu; Zhi Chen; Zheng Zhang; Zi Huang; |
| 254 | Video Representation Learning with Graph Contrastive Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel contrastive self-supervised video representation learning framework, termed Graph Contrastive Augmentation (GCA), by constructing a video temporal graph and devising a graph augmentation that is designed to enhance the correlation across frames of videos and developing a new view for exploring temporal structure in videos. |
Jingran Zhang; Xing Xu; Fumin Shen; Yazhou Yao; Jie Shao; Xiaofeng Zhu; |
| 255 | Few-Shot Multi-Agent Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel metric-based multi-agent FSL framework which has three main components: an efficient communication mechanism that propagates compact and fine-grained query feature maps from query agents to support agents; an asymmetric attention mechanism that computes region-level attention weights between query and support feature maps; and a metric-learning module which calculates the image-level relevance between query and support data fast and accurately. |
Chenyou Fan; Junjie Hu; Jianwei Huang; |
| 256 | TBRA: Tiling and Bitrate Adaptation for Mobile 360-Degree Video Streaming Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce adaptive tiling into the conventional bitrate adaptation for mobile 360-degree video streaming. |
Lei Zhang; Yanyan Suo; Ximing Wu; Feng Wang; Yuchi Chen; Laizhong Cui; Jiangchuan Liu; Zhong Ming; |
| 257 | Unsupervised Image Deraining: Optimization Model Driven Deep CNN Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The opposite holds true for the model-driven unsupervised optimization methods. To overcome these problems, we propose a unified unsupervised learning framework which inherits the generalization and representation merits for real rain removal. |
Changfeng Yu; Yi Chang; Yi Li; Xile Zhao; Luxin Yan; |
| 258 | Exploring Graph-Structured Semantics for Cross-Modal Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a graph-based, semantic-constrained learning framework to comprehensively explore the intra- and inter-modality information for cross-modal retrieval. |
Lei Zhang; Leiting Chen; Chuan Zhou; Fan Yang; Xin Li; |
| 259 | Complementary Trilateral Decoder for Fast and Accurate Salient Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, the U-shape structure has some drawbacks and there is still a lot of room for improvement. Therefore, we propose a novel framework to treat semantic context, spatial detail and boundary information separately in the decoder part. |
Zhirui Zhao; Changqun Xia; Chenxi Xie; Jia Li; |
| 260 | Keyframe Extraction from Motion Capture Sequences with Graph Based Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing methods are optimization-based, which causes flexibility and efficiency issues and eventually constrains the interactions and controls available to animators. To address these limitations, we propose a novel graph based deep reinforcement learning method for efficient unsupervised keyframe selection. |
Clinton Mo; Kun Hu; Shaohui Mei; Zebin Chen; Zhiyong Wang; |
| 261 | Anchor-free 3D Single Stage Detector with Mask-Guided Attention for Point Cloud Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an attentive module to fit the sparse feature maps to dense mostly on the object regions through the deformable convolution tower and the supervised mask-guided attention. |
Jiale Li; Hang Dai; Ling Shao; Yong Ding; |
| 262 | From Voxel to Point: IoU-guided 3D Object Detection for Point Cloud with Voxel-to-Point Decoder Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present an Intersection-over-Union (IoU) guided two-stage 3D object detector with a voxel-to-point decoder. |
Jiale Li; Hang Dai; Ling Shao; Yong Ding; |
| 263 | Cascade Cross-modal Attention Network for Video Actor and Action Segmentation from A Sentence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the problem that selectively segments the actor and its action in the video clip given the sentence description. |
Weidong Chen; Guorong Li; Xinfeng Zhang; Hongyang Yu; Shuhui Wang; Qingming Huang; |
| 264 | Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to recycle the trained discriminator for another use: no-reference image quality assessment (NR-IQA). |
Yunan Zhu; Haichuan Ma; Jialun Peng; Dong Liu; Zhiwei Xiong; |
| 265 | VQMG: Hierarchical Vector Quantised and Multi-hops Graph Reasoning for Explicit Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose VQMG, a novel and unified framework for multi-hops relational reasoning and explicit representation learning. |
Lei Li; Chun Yuan; |
| 266 | FoodLogoDet-1500: A Dataset for Large-Scale Food Logo Detection Via Multi-Scale Feature Decoupling Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For that, we propose a novel food logo detection method, the Multi-scale Feature Decoupling Network (MFDNet), which decouples classification and regression into two branches and focuses on the classification branch to solve the problem of distinguishing multiple food logo categories. |
Qiang Hou; Weiqing Min; Jing Wang; Sujuan Hou; Yuanjie Zheng; Shuqiang Jiang; |
| 267 | Single Image 3D Object Estimation with Primitive Graph Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address those challenges, we adopt a primitive-based representation for 3D objects, and propose a two-stage graph network for primitive-based 3D object estimation, which consists of a sequential proposal module and a graph reasoning module. |
Qian He; Desen Zhou; Bo Wan; Xuming He; |
| 268 | Focusing on Persons: Colorizing Old Images Learning from Modern Historical Movies Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, a HistoryNet comprising three parts, namely classification, fine-grained semantic parsing, and colorization, is proposed. |
Xin Jin; Zhonglan Li; Ke Liu; Dongqing Zou; Xiaodong Li; Xingfan Zhu; Ziyin Zhou; Qilong Sun; Qingyu Liu; |
| 269 | Combining Attention with Flow for Person Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we observe that the commonly used spatial transformation blocks have complementary advantages. |
Yurui Ren; Yubo Wu; Thomas H. Li; Shan Liu; Ge Li; |
| 270 | Pixel-level Intra-domain Adaptation for Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a pixel-level intra-domain adaptation approach to reduce the intra-domain gaps within the target data. |
Zizheng Yan; Xianggang Yu; Yipeng Qin; Yushuang Wu; Xiaoguang Han; Shuguang Cui; |
| 271 | Metric Learning for Anti-Compression Facial Forgery Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As forgery images and videos are usually compressed into different formats such as JPEG and H264 when circulating on the Internet, existing forgery-detection methods trained on uncompressed data often suffer from significant performance degradation in identifying them. To solve this problem, we propose a novel anti-compression facial forgery detection framework, which learns a compression-insensitive embedding feature space utilizing both original and compressed forgeries. |
Shenhao Cao; Qin Zou; Xiuqing Mao; Dengpan Ye; Zhongyuan Wang; |
| 272 | Focal and Composed Vision-semantic Modeling for Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a focal and composed vision-semantic modeling method, which is a trainable end-to-end model, for better vision-semantic redundancy removal and compositionality modeling. |
Yudong Han; Yangyang Guo; Jianhua Yin; Meng Liu; Yupeng Hu; Liqiang Nie; |
| 273 | SINGA-Easy: An Easy-to-Use Framework for MultiModal Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Further, in terms of adaptability, elastic computation solutions are much needed as the actual serving workload fluctuates constantly, and scaling the hardware resources to handle the fluctuating workload is typically infeasible. To address these challenges, we introduce SINGA-Easy, a new deep learning framework that provides distributed hyper-parameter tuning at the training stage, dynamic computational cost control at the inference stage, and intuitive user interactions with multimedia contents facilitated by model explanation. |
Naili Xing; Sai Ho Yeung; Cheng-Hao Cai; Teck Khim Ng; Wei Wang; Kaiyuan Yang; Nan Yang; Meihui Zhang; Gang Chen; Beng Chin Ooi; |
| 274 | Cross-modal Self-Supervised Learning for Lip Reading: When Contrastive Learning Meets Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this work is to learn discriminative visual representations for lip reading without access to manual text annotation. |
Changchong Sheng; Matti Pietikäinen; Qi Tian; Li Liu; |
| 275 | Domain Generalization Via Feature Variation Decorrelation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the observation that the class-irrelevant information of a sample, in the form of semantic variation, would lead to negative transfer, we propose to linearly disentangle the variation from the sample in feature space and impose a novel class decorrelation regularization on the feature variation. |
Chang Liu; Lichen Wang; Kai Li; Yun Fu; |
| 276 | Delving Into Deep Image Prior for Adversarial Defense: A Novel Reconstruction-based Defense Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To defend against adversarial attacks in a training-free and attack-agnostic manner, this work proposes a novel and effective reconstruction-based defense framework by delving into deep image prior (DIP). |
Li Ding; Yongwei Wang; Xin Ding; Kaiwen Yuan; Ping Wang; Hua Huang; Z. Jane Wang; |
| 277 | Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the retrieval performance remains unsatisfactory due to a lack of consistent representation in both semantic and structural spaces. In this work, we propose to address the above issue from two aspects: (i) constructing intrinsic structure (along with relations) among the fragments of respective modalities, e.g., dog → play → ball in semantic structure for an image, and (ii) seeking explicit inter-modal structural and semantic correspondence between the visual and textual modalities. In this paper, we propose a novel Structured Multi-modal Feature Embedding and Alignment (SMFEA) model for image-sentence retrieval. |
Xuri Ge; Fuhai Chen; Joemon M. Jose; Zhilong Ji; Zhongqin Wu; Xiao Liu; |
| 278 | Self-Contrastive Learning with Hard Negative Sampling for Self-supervised Point Cloud Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Significant progress has been made in methods for point cloud analysis, which often require costly human annotation as supervision. To address this issue, we propose a novel self-contrastive learning framework for self-supervised point cloud representation learning, aiming to capture both local geometric patterns and nonlocal semantic primitives based on the nonlocal self-similarity of point clouds. |
Bi’an Du; Xiang Gao; Wei Hu; Xin Li; |
| 279 | Boosting End-to-end Multi-Object Tracking and Person Search Via Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we dissect the gap between two-step and end-to-end strategy and propose a simple yet effective end-to-end framework with knowledge distillation. |
Wei Zhang; Lingxiao He; Peng Chen; Xingyu Liao; Wu Liu; Qi Li; Zhenan Sun; |
| 280 | Learning to Understand Traffic Signs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, there has been no public dataset for traffic sign understanding research. Our work takes the first step towards addressing this problem. |
Yunfei Guo; Wei Feng; Fei Yin; Tao Xue; Shuqi Mei; Cheng-Lin Liu; |
| 281 | CDD: Multi-view Subspace Clustering Via Cross-view Diversity Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus we can directly obtain the final clustering result without any postprocessing as each connected component precisely corresponds to an individual cluster. We model the above concerns into a unified optimization framework. |
Shudong Huang; Ivor W. Tsang; Zenglin Xu; Jiancheng Lv; Quanhui Liu; |
| 282 | MovieREP: A New Movie Reproduction Framework for Film Soundtrack Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a novel optical-imaging-based reproduction framework is proposed, with the basic idea of restoring film audio damage in the image domain. |
Ruiqi Wang; Long Ye; Qin Zhang; |
| 283 | One-stage Context and Identity Hallucination Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, with the help of hallucination maps, we introduce an effectively improved reconstruction loss to utilize unlimited unpaired face images for training. |
Yinglu Liu; Mingcan Xiang; Hailin Shi; Tao Mei; |
| 284 | WePerson: Learning A Generalized Re-identification Model from All-weather Virtual Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, real data annotation is costly and model generalization ability is hindered by the lack of large-scale and diverse data. To address this problem, we propose a Weather Person pipeline that can generate a synthesized Re-ID dataset with different weather, scenes, and natural lighting conditions automatically. |
He Li; Mang Ye; Bo Du; |
| 285 | Quality Assessment of End-to-End Learned Image Compression: The Benchmark and Objective Measure Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we construct a large-scale image database for quality assessment of compressed images. |
Yang Li; Shiqi Wang; Xinfeng Zhang; Shanshe Wang; Siwei Ma; Yue Wang; |
| 286 | PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Parallel, Iterative and Mimicking Network (PIMNet) to balance accuracy and efficiency. |
Zhi Qiao; Yu Zhou; Jin Wei; Wei Wang; Yuan Zhang; Ning Jiang; Hongbin Wang; Weiping Wang; |
| 287 | End-to-End Video Object Detection with Spatial-Temporal Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present TransVOD, an end-to-end video object detection model based on a spatial-temporal Transformer architecture. |
Lu He; Qianyu Zhou; Xiangtai Li; Li Niu; Guangliang Cheng; Xiao Li; Wenxuan Liu; Yunhai Tong; Lizhuang Ma; Liqing Zhang; |
| 288 | MageAdd: Real-Time Interaction Simulation for Scene Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the other hand, manual scene modelling can always ensure high quality, but requires a cumbersome trial-and-error process. In this paper, we bridge the above gap by presenting a data-driven 3D scene synthesis framework that can intelligently infer objects to the scene by incorporating and simulating user preferences with minimum input. |
Shao-Kui Zhang; Yi-Xiao Li; Yu He; Yong-Liang Yang; Song-Hai Zhang; |
| 289 | Domain-Aware SE Network for Sketch-based Image Retrieval with Multiplicative Euclidean Margin Softmax Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a novel approach for Sketch-Based Image Retrieval (SBIR), for which the key is to bridge the gap between sketches and photos in terms of the data representation. |
Peng Lu; Gao Huang; Hangyu Lin; Wenming Yang; Guodong Guo; Yanwei Fu; |
| 290 | Multi-view Clustering Via Deep Matrix Factorization and Partition Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: iii) The partition level information has not been utilized in existing work. To solve the above issues, we propose a novel multi-view clustering algorithm via deep matrix decomposition and partition alignment. |
Chen Zhang; Siwei Wang; Jiyuan Liu; Sihang Zhou; Pei Zhang; Xinwang Liu; En Zhu; Changwang Zhang; |
| 291 | Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unlike the prior work where systems make decision instantaneously using short-term features, we propose a novel framework, named TalkNet, that makes decision by taking both short-term and long-term features into consideration. |
Ruijie Tao; Zexu Pan; Rohan Kumar Das; Xinyuan Qian; Mike Zheng Shou; Haizhou Li; |
| 292 | SimulLR: Simultaneous Lip Reading Transducer with Attention-Guided Adaptive Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although prior works that explore lip reading have obtained salient achievements, they are all trained in a non-simultaneous manner where the predictions are generated requiring access to the full video. To break through this constraint, we study the task of simultaneous lip reading and devise SimulLR, a simultaneous lip Reading transducer with attention-guided adaptive memory from three aspects: (1) To address the challenge of monotonic alignments while considering the syntactic structure of the generated sentences under the simultaneous setting, we build a transducer-based model and design several effective training strategies including CTC pre-training, model warm-up and curriculum learning to promote the training of the lip reading transducer. |
Zhijie Lin; Zhou Zhao; Haoyuan Li; Jinglin Liu; Meng Zhang; Xingshan Zeng; Xiaofei He; |
| 293 | On-demand Action Detection System Using Pose Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, it is still very difficult to apply these methods to industrial applications where the actions of interest might happen rarely in real scenarios, such as criminal or suspicious behaviors, because it is impossible to collect a large number of such training data for target actions. In this paper, we depart from these conventional methods and instead adopt an on-demand retrieval approach using pose information to handle the action detection task. |
Noboru Yoshida; Jianquan Liu; |
| 294 | Visual Language Based Succinct Zero-Shot Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Considering that the redundant modification may increase the risk of over-fitting in seen classes and reduce generalization performance on unseen classes, we propose a visual language based succinct zero-shot object detection framework, which only replaces the classification branch in the modern object detector with a lightweight visual-language network. |
Ye Zheng; Xi Huang; Li Cui; |
| 295 | Conceptual and Syntactical Cross-modal Alignment with Cross-level Consistency for Image-Text Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In theory, a sentence is not only a set of words or phrases but also a syntactic structure, consisting of a set of basic syntactic tuples (i.e., (attribute) object – predicate – (attribute) subject). Inspired by this, we propose a Conceptual and Syntactical Cross-modal Alignment with Cross-level Consistency (CSCC) for image-text matching, which simultaneously explores multiple-level cross-modal alignments across the conceptual and syntactical levels with a consistency constraint. |
Pengpeng Zeng; Lianli Gao; Xinyu Lyu; Shuaiqi Jing; Jingkuan Song; |
| 296 | MM21 Pre-training for Video Understanding Challenge: Video Captioning with Pretraining Techniques Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose single-modality pretrained feature fusion technique which is composed of reasonable multi-view feature extraction method and designed multi-modality feature fusion strategy. |
Sihan Chen; Xinxin Zhu; Dongze Hao; Wei Liu; Jiawei Liu; Zijia Zhao; Longteng Guo; Jing Liu; |
| 297 | Shape Controllable Virtual Try-on for Underwear Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we put forward a similar task that aims to dress clothing on underwear models. |
Xin Gao; Zhenjiang Liu; Zunlei Feng; Chengji Shen; Kairi Ou; Haihong Tang; Mingli Song; |
| 298 | Two-stage Visual Cues Enhancement Network for Referring Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, such matching behaviors are hard to learn and capture when the visual cues of referents (i.e., referred objects) are insufficient, as referents with weak visual cues tend to be easily confused with the cluttered background at boundaries or even overwhelmed by salient objects in the image. Moreover, the insufficient visual cues issue cannot be handled by the cross-modal fusion mechanisms used in previous work. In this paper, we tackle this problem from a novel perspective of enhancing the visual information for the referents by devising a Two-stage Visual cues enhancement Network (TV-Net), where a novel Retrieval and Enrichment Scheme (RES) and an Adaptive Multi-resolution feature Fusion (AMF) module are proposed. |
Yang Jiao; Zequn Jie; Weixin Luo; Jingjing Chen; Yu-Gang Jiang; Xiaolin Wei; Lin Ma; |
| 299 | PUGCQ: A Large Scale Dataset for Quality Assessment of Professional User-Generated Content Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we systematically conduct the comprehensive study on the perceptual quality of PUGC videos and introduce a database consisting of 10,000 PUGC videos with subjective ratings. |
Guo Li; Baoliang Chen; Lingyu Zhu; Qinwen He; Hongfei Fan; Shiqi Wang; |
| 300 | Diverse Image Inpainting with Bidirectional and Autoregressive Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose BAT-Fill, an innovative image inpainting framework that introduces a novel bidirectional autoregressive transformer (BAT) for image inpainting. |
Yingchen Yu; Fangneng Zhan; Rongliang Wu; Jianxiong Pan; Kaiwen Cui; Shijian Lu; Feiying Ma; Xuansong Xie; Chunyan Miao; |
| 301 | Trajectory Is Not Enough: Hidden Following Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the "hidden follower" detection (HFD) task and an HFD model based on gaze pattern extraction. |
Danni Xu; Ruimin Hu; Zixiang Xiong; Zheng Wang; Linbo Luo; Dengshi Li; |
| 302 | L2RS: A Learning-to-Rescore Mechanism for Hybrid Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to advance the performance of industrial ASR systems by exploring a more effective method for N-best rescoring, a critical step that greatly affects the final recognition accuracy. |
Yuanfeng Song; Di Jiang; Xuefang Zhao; Qian Xu; Raymond Chi-Wing Wong; Lixin Fan; Qiang Yang; |
| 303 | SmartSales: An AI-Powered Telemarketing Coaching System in FinTech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, training telesales representatives is always a pain point for enterprises since it is usually conducted manually and costs great effort and time. In this demonstration, we propose a telemarketing coaching system named SmartSales to help enterprises develop better salespeople. |
Yuanfeng Song; Xuefang Zhao; Di Jiang; Xiaoling Huang; Weiwei Zhao; Qian Xu; Raymond Chi-Wing Wong; Qiang Yang; |
| 304 | SmartMeeting: Automatic Meeting Transcription and Summarization for In-Person Conversations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A successful AMTS system relies on systematic integration of multiple natural language processing (NLP) techniques, such as automatic speech recognition, speaker identification, and meeting summarization, which are traditionally developed separately and validated offline with standard datasets. In this demonstration, we provide a novel productive meeting tool named SmartMeeting, which enables users to automatically record, transcribe, summarize, and manage the information in an in-person meeting. |
Yuanfeng Song; Di Jiang; Xuefang Zhao; Xiaoling Huang; Qian Xu; Raymond Chi-Wing Wong; Qiang Yang; |
| 305 | MGH: Metadata Guided Hypergraph Modeling for Unsupervised Person Re-identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In the real world, such metadata is normally available alongside captured images, and thus plays an important role in separating several hard ReID matches. With this motivation in mind, we propose MGH, a novel unsupervised person ReID approach that uses meta information to construct a hypergraph for feature learning and label refinement. |
Yiming Wu; Xintian Wu; Xi Li; Jian Tian; |
| 306 | Skeleton-Aware Neural Sign Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, skeleton has not been fully studied for Sign Language Translation (SLT), especially for end-to-end SLT. Therefore, in this paper, we propose a novel end-to-end Skeleton-Aware neural Network (SANet) for video-based SLT. |
Shiwei Gan; Yafeng Yin; Zhiwei Jiang; Lei Xie; Sanglu Lu; |
| 307 | Learning What and When to Drop: Adaptive Multimodal and Contextual Dynamics for Emotion Recognition in Conversation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present MetaDrop, a differentiable and end-to-end approach for the ERC task that learns module-wise decisions across modalities and conversation flows simultaneously, which supports adaptive information sharing pattern and dynamic fusion paths. |
Feiyu Chen; Zhengxiao Sun; Deqiang Ouyang; Xueliang Liu; Jie Shao; |
| 308 | Learning Sample-Specific Policies for Sequential Image Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a policy-driven sequential image augmentation approach for image-related tasks. |
Pu Li; Xiaobai Liu; Xiaohui Xie; |
| 309 | Deconfounded and Explainable Interactive Vision-Language Retrieval of Complex Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus we propose a deconfounded explainable vision-language retrieval system. |
Junda Wu; Tong Yu; Shuai Li; |
| 310 | HANet: Hierarchical Alignment Networks for Video-Text Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Some works exploit the local details by disentangling sentences, but overlook the corresponding videos, causing the asymmetry of video-text representation. To address the above limitations, we propose a Hierarchical Alignment Network (HANet) to align different level representations for video-text matching. |
Peng Wu; Xiangteng He; Mingqian Tang; Yiliang Lv; Jing Liu; |
| 311 | ReLLIE: Deep Reinforcement Learning for Customized Low-Light Image Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Low-light image enhancement (LLIE) is a pervasive yet challenging problem, since: 1) low-light measurements may vary due to different imaging conditions in practice; 2) images can be enlightened subjectively according to diverse preference by each individual. To tackle these two challenges, this paper presents a novel deep reinforcement learning based method, dubbed ReLLIE, for customized low-light enhancement. |
Rongkai Zhang; Lanqing Guo; Siyu Huang; Bihan Wen; |
| 312 | From Synthetic to Real: Image Dehazing Collaborating with Unlabeled Real Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Single image dehazing is a challenging task, for which the domain shift between synthetic training data and real-world testing images usually leads to degradation of existing methods. To address this issue, we propose a novel image dehazing framework collaborating with unlabeled real data. |
Ye Liu; Lei Zhu; Shunda Pei; Huazhu Fu; Jing Qin; Qing Zhang; Liang Wan; Wei Feng; |
| 313 | JPGNet: Joint Predictive Filtering and Generative Network for Image Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, for the first time, we formulate image inpainting as a mix of two problems, i.e., predictive filtering and deep generation. |
Qing Guo; Xiaoguang Li; Felix Juefei-Xu; Hongkai Yu; Yang Liu; Song Wang; |
| 314 | Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation) Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel multiview detector, MVDeTr, that adopts a newly introduced shadow transformer to aggregate multiview information. |
Yunzhong Hou; Liang Zheng; |
| 315 | Pairwise VLAD Interaction Network for Video Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: From human’s perspective, answering a video question should first summarize both visual and language information, and then explore their correlations for answer reasoning. In this paper, we propose a new method called Pairwise VLAD Interaction Network (PVI-Net) to address this problem. |
Hui Wang; Dan Guo; Xian-Sheng Hua; Meng Wang; |
| 316 | Neighbor-view Enhanced Model for Vision and Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a multi-module Neighbor-View Enhanced Model (NvEM) to adaptively incorporate visual contexts from neighbor views for better textual-visual matching. |
Dong An; Yuankai Qi; Yan Huang; Qi Wu; Liang Wang; Tieniu Tan; |
| 317 | VLAD-VSA: Cross-Domain Face Presentation Attack Detection with Vocabulary Separation and Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, the VLAD aggregation method is adopted to quantize local features with visual vocabulary locally partitioning the feature space, and hence preserve the local discriminability. |
Jiong Wang; Zhou Zhao; Weike Jin; Xinyu Duan; Zhen Lei; Baoxing Huai; Yiling Wu; Xiaofei He; |
| 318 | Multi-caption Text-to-Face Synthesis: Dataset and Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We accordingly propose a Semantic Embedding and Attention (SEA-T2F) network that allows multiple captions as input to generate highly semantically related face images. |
Jianxin Sun; Qi Li; Weining Wang; Jian Zhao; Zhenan Sun; |
| 319 | A Virtual Character Generation and Animation System for E-Commerce Live Streaming Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Virtual characters have been widely adopted in many areas, such as virtual assistants, virtual customer service, and robotics. In this paper, we focus on their application in e-commerce live streaming. |
Li Hu; Bang Zhang; Peng Zhang; Jinwei Qi; Jian Cao; Daiheng Gao; Haiming Zhao; Xiaoduan Feng; Qi Wang; Lian Zhuo; Pan Pan; Yinghui Xu; |
| 320 | Text-driven 3D Avatar Animation with Emotional and Expressive Behaviors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a practical system which drives both facial and body movements of 3D avatar by text input. |
Li Hu; Jinwei Qi; Bang Zhang; Pan Pan; Yinghui Xu; |
| 321 | Annotation-Efficient Semantic Segmentation with Shape Prior Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this research, we study the problem of reducing the annotation cost of segmentation network training with a focus on exploring the shape prior knowledge of objects. |
Yuhang Lu; |
| 322 | SuperFront: From Low-resolution to High-resolution Frontal Face Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Even the most impressive achievements in frontal face synthesis are challenged by large poses and low-quality data given a single side-view face. We propose a synthesizer called SuperFront GAN (SF-GAN) that accepts one or more low-resolution (LR) faces as input and outputs a high-resolution (HR) frontal face with various poses while preserving identity information. |
Yu Yin; Joseph P. Robinson; Songyao Jiang; Yue Bai; Can Qin; Yun Fu; |
| 323 | GLM-Net: Global and Local Motion Estimation Via Task-Oriented Encoder-Decoder Structure Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the problem of separating the global camera motion and the local dynamic motion from an optical flow. |
Yuchen Yang; Ye Xiang; Shuaicheng Liu; Lifang Wu; Boxuan Zhao; Bing Zeng; |
| 324 | Video Transformer for Deepfake Detection with Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel video transformer with incremental learning for detecting deepfake videos. |
Sohail Ahmed Khan; Hang Dai; |
| 325 | Theophany: Multimodal Speech Augmentation in Instantaneous Privacy Channels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These factors lead conversation participants to speak louder and more distinctively, exposing the content to potential eavesdroppers. To address these issues, we introduce Theophany, a privacy-preserving framework for augmenting speech. |
Abhishek Kumar; Tristan Braud; Lik Hang Lee; Pan Hui; |
| 326 | JokerGAN: Memory-Efficient Model for Handwritten Text Generation with Text Line Awareness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent work on handwritten text generation shows that generative models can be used as a data augmentation method to improve the performance of HTR systems. We propose a new method for handwritten text generation that uses generative adversarial networks with multi-class conditional batch normalization, which enables us to use character sequences with variable lengths as conditional input. |
Jan Zdenek; Hideki Nakayama; |
| 327 | Multi-Modal Multi-Instance Learning for Retinal Disease Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose in this paper Multi-Modal Multi-Instance Learning (MM-MIL) for selectively fusing CFP and OCT modalities. |
Xirong Li; Yang Zhou; Jie Wang; Hailan Lin; Jianchun Zhao; Dayong Ding; Weihong Yu; Youxin Chen; |
| 328 | Multi-Level Visual Representation with Semantic-Reinforced Learning for Video Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes our bronze-medal solution for the video captioning task of the ACMMM2021 Pre-Training for Video Understanding Challenge. |
Chengbo Dong; Xinru Chen; Aozhu Chen; Fan Hu; Zihan Wang; Xirong Li; |
| 329 | Viewing from Frequency Domain: A DCT-based Information Enhancement Network for Video Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To make full use of the rich information in video sequences, this paper proposes a Discrete Cosine Transform based Information Enhancement Network (DCT-IEN) to achieve more comprehensive spatio-temporal representation from frequency domain. |
Liangchen Liu; Xi Yang; Nannan Wang; Xinbo Gao; |
| 330 | HDA-Net: Horizontal Deformable Attention Network for Stereo Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an efficient horizontal attention module to adaptively capture the global correspondence clues. |
Qi Zhang; Xuesong Zhang; Baoping Li; Yuzhong Chen; Anlong Ming; |
| 331 | APF: An Adversarial Privacy-preserving Filter to Protect Portrait Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Online photo sharing services inadvertently serve as a main channel for malicious crawlers to exploit face recognition and compromise portrait privacy. In this demo, we propose an adversarial privacy-preserving filter that protects face images from malicious face recognition algorithms. |
Xian Zhao; Jiaming Zhang; Xiaowen Huang; |
| 332 | Heterogeneous Feature Fusion and Cross-modal Alignment for Composed Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the issues, we propose an end-to-end framework for composed image retrieval, which consists of three key components including Multi-modal Complementary Fusion (MCF), Cross-modal Guided Pooling (CGP), and Relative Caption-aware Consistency (RCC). |
Gangjian Zhang; Shikui Wei; Huaxin Pang; Yao Zhao; |
| 333 | Camera-Agnostic Person Re-Identification Via Adversarial Disentangling Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the success of single-domain person re-identification (ReID), current supervised models degrade dramatically when deployed to unseen domains, mainly due to the discrepancy across cameras. To tackle this issue, we propose an Adversarial Disentangling Learning (ADL) framework to decouple camera-related and ID-related features, which can be readily used for camera-agnostic person ReID. |
Hao Ni; Jingkuan Song; Xiaosu Zhu; Feng Zheng; Lianli Gao; |
| 334 | Simplifying Multimodal Emotion Recognition with Single Eye Movement Modality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To increase the feasibility and the generalization ability of emotion decoding without compromising the performance, we propose a generative adversarial network-based framework. |
Xu Yan; Li-Ming Zhao; Bao-Liang Lu; |
| 335 | Diffusing The Liveness Cues for Face Anti-spoofing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Patch-based Compact Graph Network (PCGN) to diffuse the subtle liveness cues from all the patches. |
Sheng Li; Xun Zhu; Guorui Feng; Xinpeng Zhang; Zhenxing Qian; |
| 336 | Understanding Chinese Video and Language Via Contrastive Multimodal Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel video-language understanding framework named Victor, which stands for VIdeo-language understanding via Contrastive mulTimOdal pRe-training. |
Chenyi Lei; Shixian Luo; Yong Liu; Wanggui He; Jiamang Wang; Guoxin Wang; Haihong Tang; Chunyan Miao; Houqiang Li; |
| 337 | Semantic Scalable Image Compression with Cross-Layer Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel end-to-end semantic scalable image compression method, which progressively compresses coarse-grained semantic features, fine-grained semantic features, and image signals. |
Hanyue Tu; Li Li; Wengang Zhou; Houqiang Li; |
| 338 | VoteHMR: Occlusion-Aware Voting Network for Robust 3D Human Mesh Recovery from Partial Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we make the first attempt to reconstruct reliable 3D human shapes from single-frame partial point clouds. To achieve this, we propose an end-to-end learnable method, named VoteHMR. |
Guanze Liu; Yu Rong; Lu Sheng; |
| 339 | Robust Logo Detection in E-Commerce Images By Data Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the paper, we introduce our solution for the ACM MM2021 Robust Logo Detection Grand Challenge. |
Hang Chen; Xiao Li; Zefan Wang; Xiaolin Hu; |
| 340 | Implicit Feature Refinement for Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel implicit feature refinement module for high-quality instance segmentation. |
Lufan Ma; Tiancai Wang; Bin Dong; Jiangpeng Yan; Xiu Li; Xiangyu Zhang; |
| 341 | MBRS: Enhancing Robustness of DNN-based Watermarking By Mini-Batch of Real and Simulated JPEG Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we found that none of the existing frameworks can ensure robustness against JPEG compression, which is non-differentiable but is an essential and important image processing operation. To address such limitations, we propose a novel end-to-end training architecture, which utilizes a Mini-Batch of Real and Simulated JPEG compression (MBRS) to enhance the JPEG robustness. |
Zhaoyang Jia; Han Fang; Weiming Zhang; |
| 342 | From Superficial to Deep: Language Bias Driven Curriculum Learning for Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the fact that VQA samples with different levels of language bias contribute differently for answer prediction, in this paper, we overcome the language prior problem by proposing a novel Language Bias driven Curriculum Learning (LBCL) approach, which employs an easy-to-hard learning strategy with a novel difficulty metric Visual Sensitive Coefficient (VSC). |
Mingrui Lao; Yanming Guo; Yu Liu; Wei Chen; Nan Pu; Michael S. Lew; |
| 343 | Automated Playtesting with A Cognitive Model of Sensorimotor Coordination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on a cognitive model of sensorimotor coordination that explains the human button input process, we propose a novel automated playtesting technique that predicts the game difficulty experienced by players with different skills in moving-target acquisition (MTA) games. |
Injung Lee; Hyunchul Kim; Byungjoo Lee; |
| 344 | Elastic Tactile Simulation Towards Tactile-Visual Perception Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Elastic Interaction of Particles (EIP) for tactile simulation, which is capable of reflecting the elastic property of the tactile sensor as well as characterizing the fine-grained physical interaction during contact. |
Yikai Wang; Wenbing Huang; Bin Fang; Fuchun Sun; Chang Li; |
| 345 | Scalable Multi-view Subspace Clustering with Unified Anchors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, the complementary multi-view information has not been well utilized since the graphs are constructed independently by the anchors from the corresponding views. To address these issues, we propose a Scalable Multi-view Subspace Clustering with Unified Anchors (SMVSC). |
Mengjing Sun; Pei Zhang; Siwei Wang; Sihang Zhou; Wenxuan Tu; Xinwang Liu; En Zhu; Changjian Wang; |
| 346 | Towards Controllable and Photorealistic Region-wise Image Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a generative model with auto-encoder architecture for per-region style manipulation. |
Ansheng You; Chenglin Zhou; Qixuan Zhang; Lan Xu; |
| 347 | MSO: Multi-Feature Space Joint Optimization Network for RGB-Infrared Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To solve it, in this paper, we present a novel multi-feature space joint optimization (MSO) network, which can learn modality-sharable features in both the single-modality space and the common space. |
Yajun Gao; Tengfei Liang; Yi Jin; Xiaoyan Gu; Wu Liu; Yidong Li; Congyan Lang; |
| 348 | Cross-View Exocentric to Egocentric Video Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the exocentric (third-person) view to egocentric (first-person) view video generation task. |
Gaowen Liu; Hao Tang; Hugo M. Latapie; Jason J. Corso; Yan Yan; |
| 349 | Deep Unsupervised 3D SfM Face Reconstruction Based on Massive Landmark Bundle Adjustment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel unsupervised 3D face reconstruction architecture by leveraging the multi-view geometry constraints to train accurate face pose and depth maps. |
Yuxing Wang; Yawen Lu; Zhihua Xie; Guoyu Lu; |
| 350 | Visible Watermark Removal Via Self-calibrated Localization and Background Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Modern watermark removal methods perform watermark localization and background restoration simultaneously, which could be viewed as a multi-task learning problem. |
Jing Liang; Li Niu; Fengjun Guo; Teng Long; Liqing Zhang; |
| 351 | Few-shot Fine-Grained Action Recognition Via Bidirectional Attention and Contrastive Meta-Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Fine-grained action recognition is attracting increasing attention due to the emerging demand of specific action understanding in real-world applications, whereas the data of rare fine-grained categories is very limited. Therefore, we propose the few-shot fine-grained action recognition problem, aiming to recognize novel fine-grained actions with only few samples given for each class. |
Jiahao Wang; Yunhong Wang; Sheng Liu; Annan Li; |
| 352 | A Novel Patch Convolutional Neural Network for View-based 3D Model Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To capture associations among views, in this work, we propose a novel patch convolutional neural network (PCNN) for view-based 3D model retrieval. |
Zan Gao; Yuxiang Shao; Weili Guan; Meng Liu; Zhiyong Cheng; Shengyong Chen; |
| 353 | SSconv: Explicit Spectral-to-Spatial Convolution for Pansharpening Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an explicit spectral-to-spatial convolution (SSconv) that aggregates spectral features into the spatial domain to perform the up-sampling operation, which can get better performance than the direct up-sampling. |
Yudong Wang; Liang-Jian Deng; Tian-Jing Zhang; Xiao Wu; |
| 354 | Chinese Character Inpainting with Contextual Semantic Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a semantics enhanced generative framework for Chinese character inpainting, where a global semantic supervising module (GSSM) is introduced to constrain contextual semantics. |
Jiahao Wang; Gang Pan; Di Sun; Jiawan Zhang; |
| 355 | Latent Memory-augmented Graph Transformer for Visual Storytelling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel Latent Memory-augmented Graph Transformer (LMGT), a Transformer-based framework for visual story generation. |
Mengshi Qi; Jie Qin; Di Huang; Zhiqiang Shen; Yi Yang; Jiebo Luo; |
| 356 | How Video Super-Resolution and Frame Interpolation Mutually Benefit Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we design a spatial-temporal super-resolution network based on exploring the interaction between VSR and VFI. |
Chengcheng Zhou; Zongqing Lu; Linge Li; Qiangyu Yan; Jing-Hao Xue; |
| 357 | Text to Scene: A System of Configurable 3D Indoor Scene Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show the Text to Scene system, which can configure a 3D indoor scene from natural language. |
Xinyan Yang; Fei Hu; Long Ye; |
| 358 | A System for Interactive and Intelligent AD Auxiliary Screening Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is inefficient and depends largely on the doctor’s subjective judgment and experience level. Therefore, we propose an Interactive and Intelligent AD Auxiliary Screening (IAS) system consisting of a speech-based Interactive Unit Testing Module (IUTM) and a truth-based Intelligent Analysis Module (IAM), both of which are developed with deep learning techniques. |
Sen Yang; Qike Zhao; Lanxin Miao; Min Chen; Lianli Gao; Jingkuan Song; Weidong Le; |
| 359 | A Multi-Domain Adaptive Graph Convolutional Network for EEG-based Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Multi-Domain Adaptive Graph Convolutional Network (MD-AGCN), fusing the knowledge of both the frequency domain and the temporal domain to fully utilize the complementary information of EEG signals. |
Rui Li; Yiting Wang; Bao-Liang Lu; |
| 360 | Scene Graph with 3D Information for Change Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is an interesting task under-explored with two main challenges: describing the relative position relationship between objects correctly and overcoming the disturbances from viewpoint changes. To address these issues, we propose a three-dimensional (3D) information aware Scene Graph based Change Captioning (SGCC) model. |
Zeming Liao; Qingbao Huang; Yu Liang; Mingyi Fu; Yi Cai; Qing Li; |
| 361 | A Transformer Based Approach for Image Manipulation Chain Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to the exponentially increased solution space and the complex interactions among operations, how to reveal a long chain from a processed image remains a long-standing problem in the multimedia forensic community. To address this challenge, in this paper, we propose a new direction for manipulation chain detection. |
Jiaxiang You; Yuanman Li; Jiantao Zhou; Zhongyun Hua; Weiwei Sun; Xia Li; |
| 362 | Token Shift Transformer for Video Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents Token Shift Module (i.e., TokShift), a novel, zero-parameter, zero-FLOPs operator, for modeling temporal relations within each transformer encoder. |
Hao Zhang; Yanbin Hao; Chong-Wah Ngo; |
| 363 | X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose X-modaler — a versatile and high-performance codebase that encapsulates the state-of-the-art cross-modal analytics into several general-purpose stages (e.g., pre-processing, encoder, cross-modal interaction, decoder, and decode strategy). |
Yehao Li; Yingwei Pan; Jingwen Chen; Ting Yao; Tao Mei; |
| 364 | CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing solutions dominantly capitalize on the multi-modal inputs with mask tokens to trigger mask-based proxy pre-training tasks (e.g., masked language modeling and masked object/frame prediction). In this work, we argue that such masked inputs would inevitably introduce noise for cross-modal matching proxy task, and thus leave the inherent vision-language association under-explored. |
Jianjie Luo; Yehao Li; Yingwei Pan; Ting Yao; Hongyang Chao; Tao Mei; |
| 365 | Embracing The Dark Knowledge: Domain Generalization Using Regularized Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple, effective, and plug-and-play training strategy named Knowledge Distillation for Domain Generalization (KDDG) which is built upon a knowledge distillation framework with the gradient filter as a novel regularization term. |
Yufei Wang; Haoliang Li; Lap-pui Chau; Alex C. Kot; |
| 366 | Personalized Multi-modal Video Retrieval on Mobile Devices Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current video retrieval systems on mobile devices cannot process complex natural language queries, especially if they contain personalized concepts, such as proper names. To address these shortcomings, we propose an efficient and privacy-preserving video retrieval system that works well with personalized queries containing proper names, without re-training using personalized labelled data from users. |
Haotian Zhang; Allan D. Jepson; Iqbal Mohomed; Konstantinos G. Derpanis; Ran Zhang; Afsaneh Fazly; |
| 367 | SVHAN: Sequential View Based Hierarchical Attention Network for 3D Shape Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Finally, roughly aggregating multi-view features leads to the loss of descriptive information, which limits the shape representation effectiveness. To handle these issues, we propose a novel sequential view based hierarchical attention network (SVHAN) for 3D shape recognition. |
Yue Zhao; Weizhi Nie; An-An Liu; Zan Gao; Yuting Su; |
| 368 | VideoDiscovery: An Automatic Short-Video Generation System for E-commerce Live-streaming Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate an end-to-end intelligent system of short-video generation for live-streaming, namely "VideoDiscovery", which aims to automatically produce batches of high-value short-videos by discovering and organizing highlight content for commodity delivery. |
Yanhao Zhang; Qiang Wang; Yun Zheng; Pan Pan; Yinghui Xu; |
| 369 | SSPU-Net: Self-Supervised Point Cloud Upsampling Via Differentiable Rendering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a self-supervised point cloud upsampling network (SSPU-Net) to generate dense point clouds without using ground truth. |
Yifan Zhao; Le Hui; Jin Xie; |
| 370 | Disentangled Representation Learning and Enhancement Network for Single Image De-Raining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a disentangled representation learning and enhancement network (DRLE-Net) to address the challenging single image de-raining problems, i.e., raindrop and rain streak removal. |
Guoqing Wang; Changming Sun; Xing Xu; Jingjing Li; Zheng Wang; Zeyu Ma; |
| 371 | Anti-Distillation Backdoor Attacks: Backdoors Can Really Survive in Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we, for the first time, propose a novel Anti-Distillation Backdoor Attack (ADBA), in which the backdoor embedded in the public teacher model can survive the knowledge distillation process and thus be transferred to secret distilled student models. |
Yunjie Ge; Qian Wang; Baolin Zheng; Xinlu Zhuang; Qi Li; Chao Shen; Cong Wang; |
| 372 | Dense Semantic Contrast for Self-Supervised Visual Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Concretely, these downstream tasks require more accurate representation, in other words, the pixels from the same object must belong to a shared semantic category, which is lacking in the previous methods. In this work, we present Dense Semantic Contrast (DSC) for modeling semantic category decision boundaries at a dense level to meet the requirement of these tasks. |
Xiaoni Li; Yu Zhou; Yifei Zhang; Aoting Zhang; Wei Wang; Ning Jiang; Haiying Wu; Weiping Wang; |
| 373 | Occlusion-aware Bi-directional Guided Network for Light Field Salient Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a learning-based network to exploit occlusion features from EPIs and integrate high-level features from the central view for accurate salient object detection. |
Dong Jing; Shuo Zhang; Runmin Cong; Youfang Lin; |
| 374 | Towards Reasoning Ability in Scene Text Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, our observations indicate that recent accuracy improvement in TextVQA is mainly contributed by stronger OCR engines, better pre-training strategies and more Transformer layers, instead of newly proposed networks. In this work, towards the reasoning ability, we 1) conduct module-wise contribution analysis to quantitatively investigate how existing works improve accuracies in TextVQA; 2) design a gradient-based explainability method to explore why TextVQA models answer what they answer and find evidence for their predictions; 3) perform qualitative experiments to visually analyze models' reasoning ability and explore potential reasons behind such a poor ability. |
Qingqing Wang; Liqiang Xiao; Yue Lu; Yaohui Jin; Hao He; |
| 375 | Instance-wise or Class-wise? A Tale of Neighbor Shapley for Concept-based Explanation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, in this paper, we propose a novel COncept-based NEighbor Shapley approach (dubbed as CONE-SHAP) to evaluate the importance of each concept by considering its physical and semantic neighbors, and interpret model knowledge with both instance-wise and class-wise explanations. |
Jiahui Li; Kun Kuang; Lin Li; Long Chen; Songyang Zhang; Jian Shao; Jun Xiao; |
| 376 | Integrating Semantic and Temporal Relationships in Facial Action Unit Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Facial action unit (AU) detection is a challenging task due to the variety and subtlety of individuals’ facial behavior. Facial muscle characteristics such as temporal … |
Zhihua Li; Xiang Deng; Xiaotian Li; Lijun Yin; |
| 377 | MHFC: Multi-Head Feature Collaboration for Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides the negative influence of cross-domain (e.g., the trained FEM can not adapt to the novel class flawlessly), the distribution of novel data may have a certain degree of deviation compared with the ground truth distribution, which is dubbed as distribution-shift-problem (DSP). To address the DSP, we propose Multi-Head Feature Collaboration (MHFC) algorithm, which attempts to project the multi-head features (e.g., multiple features extracted from a variety of FEMs) to a unified space and fuse them to capture more discriminative information. |
Shuai Shao; Lei Xing; Yan Wang; Rui Xu; Chunyan Zhao; Yanjiang Wang; Baodi Liu; |
| 378 | AICoacher: A System Framework for Online Realtime Workout Coach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a real-time online fitness system framework called AICoacher, which offers different online coaches. |
Haocong Ying; Tie Liu; Mingxin Ai; Jiali Ding; Yuanyuan Shang; |
| 379 | Fast and Accurate Lane Detection Via Frequency Domain Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To ensure the diversity of features and subsequently maintain information as much as possible, we introduce multi-frequency analysis into lane detection. |
Yulin He; Wei Chen; Zhengfa Liang; Dan Chen; Yusong Tan; Xin Luo; Chen Li; Yulan Guo; |
| 380 | Exploring The Quality of GAN Generated Images for Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the in-depth characteristics of ReID samples and solve the problem of "What makes a GAN-generated image good for ReID". |
Yiqi Jiang; Weihua Chen; Xiuyu Sun; Xiaoyu Shi; Fan Wang; Hao Li; |
| 381 | Deep Marginal Fisher Analysis Based CNN for Image Representation and Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It uses a graph embedding framework for convolution filter optimization by maximizing the inter-class discriminability among marginal points while minimizing intra-class distance. |
Xun Cai; Jiajing Chai; Yanbo Gao; Shuai Li; Bo Zhu; |
| 382 | VmAP: A Fair Metric for Video Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we show several disadvantages of mAP as a metric for evaluating video based object detection. |
Anupam Sobti; Vaibhav Mavi; M Balakrishnan; Chetan Arora; |
| 383 | Learning to Decode Contextual Information for Efficient Contour Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel lightweight system for contour detection that achieves state-of-the-art performance while keeping an ultra-slim model size. |
Ruoxi Deng; Shengjun Liu; Jinxin Wang; Huibing Wang; Hanli Zhao; Xiaoqin Zhang; |
| 384 | The ACM Multimedia 2021 Meet Deadline Requirements Grand Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To encourage the research community to address this challenge, we organize the Meet Deadline Requirements Grand Challenge at ACM Multimedia 2021. This grand challenge provides a simulation platform onto which the participants can implement their block scheduler and bandwidth estimator and then benchmark against each other using a common set of application traces and network traces. |
Jie Zhang; Junjie Deng; Mowei Wang; Yong Cui; Wei Tsang Ooi; Jiangchuan Liu; Xinyu Zhang; Kai Zheng; Yi Li; |
| 385 | Open Set Face Anti-Spoofing in Unseen Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an end-to-end open set face anti-spoofing (OSFA) approach for unseen attack recognition. |
Xin Dong; Hao Liu; Weiwei Cai; Pengyuan Lv; Zekuan Yu; |
| 386 | Curriculum-Based Meta-learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a Curriculum-Based Meta-learning (CubMeta) method to train the meta-learner using tasks from easy to hard. |
Ji Zhang; Jingkuan Song; Yazhou Yao; Lianli Gao; |
| 387 | Two-pronged Strategy: Lightweight Augmented Graph Network Hashing for Scalable Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the problems, in this paper, we propose a simple and efficient Lightweight Augmented Graph Network Hashing (LAGNH) method with a two-pronged strategy. |
Hui Cui; Lei Zhu; Jingjing Li; Zhiyong Cheng; Zheng Zhang; |
| 388 | Discriminative Latent Semantic Graph for Video Captioning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our main contribution is to identify three key problems in a joint framework for future video summarization tasks. |
Yang Bai; Junyan Wang; Yang Long; Bingzhang Hu; Yang Song; Maurice Pagnucco; Yu Guan; |
| 389 | An EM Framework for Online Incremental Learning of Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it remains challenging to acquire novel classes in an online fashion for the segmentation task, mainly due to its continuously-evolving semantic label space, partial pixelwise ground-truth annotations, and constrained data availability. To address this, we propose an incremental learning strategy that can quickly adapt deep segmentation models without catastrophic forgetting, using streaming input data with pixel annotations on the novel classes only. |
Shipeng Yan; Jiale Zhou; Jiangwei Xie; Songyang Zhang; Xuming He; |
| 390 | DehazeFlow: Multi-scale Conditional Flow Network for Single Image Dehazing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing methods learn a deterministic one-to-one mapping between a hazy image and its ground-truth, which ignores the ill-posedness of the dehazing task. To solve this problem, we propose DehazeFlow, a novel single image dehazing framework based on conditional normalizing flow. |
Hongyu Li; Jia Li; Dong Zhao; Long Xu; |
| 391 | Underwater Species Detection Using Channel Sharpening Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With the continuous exploration of marine resources, underwater artificial intelligent robots play an increasingly important role in the fish industry. However, the detection of … |
Lihao Jiang; Yi Wang; Qi Jia; Shengwei Xu; Yu Liu; Xin Fan; Haojie Li; Risheng Liu; Xinwei Xue; Ruili Wang; |
| 392 | End-to-end Boundary Exploration for Weakly-supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that using a single model to accomplish both image- and pixel-level classification forces a balance between multiple targets and consequently weakens the recognition capability. |
Jianjun Chen; Shancheng Fang; Hongtao Xie; Zheng-Jun Zha; Yue Hu; Jianlong Tan; |
| 393 | CG-GAN: Class-Attribute Guided Generative Adversarial Network for Old Photo Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, the existing methods to solve conventional restoration tasks are difficult to generalize. To solve this problem, we propose a novel method based on generative adversarial network. |
Jixin Liu; Rui Chen; Shipeng An; Heng Zhang; |
| 394 | Neural-based Rendering and Application Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we mainly introduce two neural rendering works: one on rendering simulation and the other on image-based novel view rendering. |
Peng Dai; |
| 395 | Deep Human Dynamics Prior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, due to the non-enumerability of human motion, the trained model from large-scale training data often fails to comprehensively cover incomputable action categories, which may lead to a sharp decline in the performance of deep learning-based methods. To handle these limitations, we propose an untrained deep generative model, in which Graph Convolutional Networks (GCNs) are utilized to efficiently capture complicated topological relationships of human joints. |
Qiongjie Cui; Huaijiang Sun; Yue Kong; Xiaoning Sun; |
| 396 | Adversarial Learning with Mask Reconstruction for Text-Guided Image Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an adversarial learning framework with mask reconstruction (ALMR) for image inpainting with textual guidance, which consists of a two-stage generator and dual discriminators. |
Xingcai Wu; Yucheng Xie; Jiaqi Zeng; Zhenguo Yang; Yi Yu; Qing Li; Wenyin Liu; |
| 397 | TriTransNet: RGB-D Salient Object Detection with A Triplet Transformer Embedding Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recently U-Net framework is widely used, and continuous convolution and pooling operations generate multi-level features which are complementary with each other. In view of the more contribution of high-level features for the performance, we propose a triplet transformer embedding module to enhance them by learning long-range dependencies across layers. |
Zhengyi Liu; Yuan Wang; Zhengzheng Tu; Yun Xiao; Bin Tang; |
| 398 | Image Re-composition Via Regional Content-Style Decoupling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Regarding the composition stage, we propose a cycle consistency loss to constrain the network preserving the content and style information during the composition. |
Rong Zhang; Wei Li; Yiqun Zhang; Hong Zhang; Jinhui Yu; Ruigang Yang; Weiwei Xu; |
| 399 | Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a generative adversarial network to synthesize 3D pose sequences of co-speech upper-body gestures with appropriate affective expressions. |
Uttaran Bhattacharya; Elizabeth Childs; Nicholas Rewkowski; Dinesh Manocha; |
| 400 | ROSITA: Enhancing Vision-and-Language Semantic Alignments Via Cross- and Intra-modal Knowledge Integration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce a new VLP method called ROSITA, which integrates the cross- and intra-modal knowledge in a unified scene graph to enhance the semantic alignments. |
Yuhao Cui; Zhou Yu; Chunqi Wang; Zhongzhou Zhao; Ji Zhang; Meng Wang; Jun Yu; |
| 401 | Selective Dependency Aggregation for Action Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel selective dependency aggregation (SDA) module, which adaptively exploits multiple types of video dependencies to refine the features. |
Yi Tan; Yanbin Hao; Xiangnan He; Yinwei Wei; Xun Yang; |
| 402 | Source Data-free Unsupervised Domain Adaptation for Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the problem, we propose to construct a set of source domain virtual data to mimic the source domain distribution by identifying the target domain high-confidence samples predicted by the pre-trained source model. |
Mucong Ye; Jing Zhang; Jinpeng Ouyang; Ding Yuan; |
| 403 | Remember and Reuse: Cross-Task Blind Image Quality Assessment Via Relevance-aware Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, limited by the static model and once-for-all learning strategy, they failed to perform the cross-task evaluations in many practical applications, where diverse evaluation criteria and distortion types are constantly emerging. To address this issue, in this paper, we propose a dynamic Remember and Reuse (R&R) network, which efficiently performs the cross-task BIQA based on a novel relevance-aware incremental learning strategy. |
Rui Ma; Hanxiao Luo; Qingbo Wu; King Ngi Ngan; Hongliang Li; Fanman Meng; Linfeng Xu; |
| 404 | Vehicle Counting Network with Attention-based Mask Refinement and Spatial-awareness Block Loss Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a well-designed Vehicle Counting Network (VCNet) is novelly proposed to alleviate the problem of scale variation and inconsistent spatial distribution in congested traffic scenes. |
Ji Zhang; Jian-Jun Qiao; Xiao Wu; Wei Li; |
| 405 | X-GGM: Graph Generative Modeling for Out-of-distribution Generalization in Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we formulate OOD generalization in VQA as a compositional generalization problem and propose a graph generative modeling-based training scheme (X-GGM) to handle the problem implicitly. |
Jingjing Jiang; Ziyi Liu; Yifan Liu; Zhixiong Nan; Nanning Zheng; |
| 406 | Multifocal Attention-Based Cross-Scale Network for Image De-raining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Albeit existing deep learning-based image de-raining methods have achieved promising results, most of them only extract single scale features, and neglect the fact that similar rain streaks appear repeatedly across different scales. Therefore, this paper aims to explore the cross-scale cues in a multi-scale fashion. |
Zheyu Zhang; Yurui Zhu; Xueyang Fu; Zhiwei Xiong; Zheng-Jun Zha; Feng Wu; |
| 407 | Extending 6-DoF VR Experience Via Multi-Sphere Images Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the design of MSI naturally limits the range of authentic 6-DoF experiences, as existing mechanisms for MSI rendering cannot fully utilize multi-layer information when synthesizing novel views between multiple MSIs. To tackle this problem and extend the 6-DoF range, we propose an MSI interpolation pipeline that utilizes adjacent MSIs’ 3D information embedded inside their layers. |
Jisheng Li; Yuze He; Jinghui Jiao; Yubin Hu; Yuxing Han; Jiangtao Wen; |
| 408 | Progressive Semantic Matching for Video-Text Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although some methods employ coarse-to-fine or multi-expert networks to encode one or more common spaces for easier matching, they almost directly optimize one matching space, which is challenging, because of the huge semantic gap between different modalities. To address this issue, we aim at narrowing semantic gap by a progressive learning process with a coarse-to-fine architecture, and propose a novel Progressive Semantic Matching (PSM) method. |
Hongying Liu; Ruyi Luo; Fanhua Shang; Mantang Niu; Yuanyuan Liu; |
| 409 | Video-to-Image Casting: A Flatting Method for Video Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to flatten the video and construct a Spatio-temporal Image (STI), i.e., squeezing the temporal dimension into a spatial plane. |
Xu Chen; Chenqiang Gao; Feng Yang; Xiaohan Wang; Yi Yang; Yahong Han; |
| 410 | Legitimate Adversarial Patches: Evading Human Eyes and Detection Models in The Physical World Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study legitimate adversarial attacks that evade both human eyes and detection models in the physical world. |
Jia Tan; Nan Ji; Haidong Xie; Xueshuang Xiang; |
| 411 | Milliseconds Color Stippling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel method to create high-quality color stippling from an input image in milliseconds. |
Lei Ma; Jian Shi; Yanyun Chen; |
| 412 | LSTC: Boosting Atomic Action Detection with Long-Short-Term Context Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we place the atomic action detection problem into a Long-Short Term Context (LSTC) to analyze how the temporal reliance among video signals affects the action detection results. |
Yuxi Li; Boshen Zhang; Jian Li; Yabiao Wang; Weiyao Lin; Chengjie Wang; Jilin Li; Feiyue Huang; |
| 413 | Target-guided Adaptive Base Class Reweighting for Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a Target-guided Base Class Reweighting (TBR) approach, which uses a reweighting-in-the-loop optimization algorithm to assign a set of weights for base classes adaptively given a target task. |
Jiliang Yan; Deming Zhai; Junjun Jiang; Xianming Liu; |
| 414 | Domain Adaptive Semantic Segmentation Without Source Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Since there is no supervision from the source domain data, many self-training methods tend to fall into the winner-takes-all dilemma, where the majority classes totally dominate the segmentation networks and the networks fail to classify the minority classes. Consequently, we propose an effective framework for this challenging problem with two components: positive learning and negative learning. |
Fuming You; Jingjing Li; Lei Zhu; Zhi Chen; Zi Huang; |
| 415 | CausalRec: Causal Inference for Visual Debiasing in Visually-Aware Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing visually-aware models make use of the visual features as a separate collaborative signal similarly to other features to directly predict the user’s preference without considering a potential bias, which gives rise to a visually biased recommendation. In this paper, we derive a causal graph to identify and analyze the visual bias of these existing methods. |
Ruihong Qiu; Sen Wang; Zhi Chen; Hongzhi Yin; Zi Huang; |
| 416 | An Adaptive Iterative Inpainting Method with More Information Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new method by combining three innovative ideas. |
Shengjie Chen; Zhenhua Guo; Bo Yuan; |
| 417 | DPT: Deformable Patch-based Transformer for Visual Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing methods usually use a fixed-size patch embedding which might destroy the semantics of objects. To address this problem, we propose a new Deformable Patch (DePatch) module which learns to adaptively split the images into patches with different positions and scales in a data-driven way rather than using predefined fixed patches. |
Zhiyang Chen; Yousong Zhu; Chaoyang Zhao; Guosheng Hu; Wei Zeng; Jinqiao Wang; Ming Tang; |
| 418 | Improving Weakly Supervised Object Localization Via Causal Intervention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous works endeavor to perceive the interval objects from the small and sparse discriminative attention map, yet ignoring the co-occurrence confounder (e.g., duck and water), which makes the model inspection (e.g., CAM) hard to distinguish between the object and context. In this paper, we make an early attempt to tackle this challenge via causal intervention (CI). |
Feifei Shao; Yawei Luo; Li Zhang; Lu Ye; Siliang Tang; Yi Yang; Jun Xiao; |
| 419 | DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel Deep Surroundings-person Separation Learning (DSSL) model in this paper to effectively extract and match person information, and hence achieve a superior retrieval accuracy. |
Aichun Zhu; Zijie Wang; Yifeng Li; Xili Wan; Jing Jin; Tian Wang; Fangqiang Hu; Gang Hua; |
| 420 | DEPA: Self-Supervised Audio Embedding for Depression Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes DEPA, a self-supervised, pretrained depression audio embedding method for depression detection. |
Pingyue Zhang; Mengyue Wu; Heinrich Dinkel; Kai Yu; |
| 421 | Spatio-Temporal Interaction Graph Parsing Networks for Human-Object Interaction Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, Spatio-Temporal Interaction Graph Parsing Networks (STIGPN) are constructed, which encode the videos with a graph composed of human and object nodes. |
Ning Wang; Guangming Zhu; Liang Zhang; Peiyi Shen; Hongsheng Li; Cong Hua; |
| 422 | Aesthetic Evaluation and Guidance for Mobile Photography Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to use computational aesthetics to automatically teach people without photography training to take excellent photos. |
Hao Lou; Heng Huang; Chaoen Xiao; Xin Jin; |
| 423 | Dense Contrastive Visual-Linguistic Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the success of BERT, several multimodal representation learning approaches have been proposed that jointly represent image and text. |
Lei Shi; Kai Shuang; Shijie Geng; Peng Gao; Zuohui Fu; Gerard de Melo; Yunpeng Chen; Sen Su; |
| 424 | Transfer Vision Patterns for Multi-Task Pixel Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the observation of cross-task interdependencies of visual patterns, we propose a multi-task vision pattern transformation (VPT) method to adaptively correlate and transfer cross-task visual patterns by leveraging the powerful transformer mechanism. |
Xiaoya Zhang; Ling Zhou; Yong Li; Zhen Cui; Jin Xie; Jian Yang; |
| 425 | Image Style Transfer with Generative Adversarial Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing style transfer methods work well with relatively uniform content, but they often fail to capture geometric or structural patterns that reflect the quality of generated images. The goal of this doctoral research is to investigate image style transfer approaches, and to design advanced and useful methods that solve existing problems. |
Ru Li; |
| 426 | Boosting Lightweight Single Image Super-resolution Via Joint-distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel joint-distillation (JDSR) framework to boost the representation of various off-the-shelf lightweight SR models. |
Xiaotong Luo; Qiuyuan Liang; Ding Liu; Yanyun Qu; |
| 427 | Face Hallucination Via Split-Attention in Split-Attention Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most of them fail to take into account the overall facial profile and fine texture details simultaneously, resulting in reduced naturalness and fidelity of the reconstructed face, and further impairing the performance of downstream tasks (e.g., face detection, facial recognition). To tackle this issue, we propose a novel external-internal split attention group (ESAG), which encompasses two paths responsible for facial structure information and facial texture details, respectively. |
Tao Lu; Yuanzhi Wang; Yanduo Zhang; Yu Wang; Liu Wei; Zhongyuan Wang; Junjun Jiang; |
| 428 | SalS-GAN: Spatially-Adaptive Latent Space in StyleGAN for Real Image Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: How to better preserve details in the real image is still a challenge. To solve this problem, we propose a spatially-adaptive latent space, called SA latent space, and adopt it as the optimization latent space in GAN inversion task. |
Lingyun Zhang; Xiuxiu Bai; Yao Gao; |
| 429 | Deadline and Priority-aware Congestion Control for Delay-sensitive Multimedia Streaming Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our work, we propose DAP (Deadline And Priority-aware congestion control) to achieve high throughput within acceptable end-to-end latency, especially to send high-priority packets while meeting deadline requirements. |
Chao Zhou; Wenjun Wu; Dan Yang; Tianchi Huang; Liang Guo; Bing Yu; |
| 430 | Exploring Logical Reasoning for Referring Expression Comprehension Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a logic-guided approach to explore logical knowledge for referring expression comprehension in a hierarchical modular-based framework. |
Ying Cheng; Ruize Wang; Jiashuo Yu; Rui-Wei Zhao; Yuejie Zhang; Rui Feng; |
| 431 | Edit Like A Designer: Modeling Design Workflows for Unaligned Fashion Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead of directly manipulating the real fashion item image, it is more intuitive for designers to modify it via the design draft. In this paper, we model design workflows for a novel task of unaligned fashion editing, allowing the user to edit a fashion item through manipulating its corresponding design draft. |
Qiyu Dai; Shuai Yang; Wenjing Wang; Wei Xiang; Jiaying Liu; |
| 432 | A Statistical Approach to Mining Semantic Similarity for Deep Unsupervised Hashing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, since the features of images for each kind of semantics usually scatter in high-dimensional space with unknown distribution, previous methods could introduce a large number of false positives and negatives for boundary points of distributions in the local semantic structure based on pairwise cosine distances. Towards this limitation, we propose a general distribution-based metric to depict the pairwise distance between images. |
Xiao Luo; Daqing Wu; Zeyu Ma; Chong Chen; Minghua Deng; Jianqiang Huang; Xian-Sheng Hua; |
| 433 | Discriminator-free Generative Adversarial Attack Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we find that the discriminator could be not necessary for generative-based adversarial attack, and propose the Symmetric Saliency-based Auto-Encoder (SSAE) to generate the perturbations, which is composed of the saliency map module and the angle-norm disentanglement of the features module. |
Shaohao Lu; Yuqiao Xian; Ke Yan; Yi Hu; Xing Sun; Xiaowei Guo; Feiyue Huang; Wei-Shi Zheng; |
| 434 | AFD-Net: Adaptive Fully-Dual Network for Few-Shot Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods solve this problem by performing subtasks of classification and localization utilizing a shared component in the detector, yet few of them take the distinct preferences towards feature embedding of two subtasks into consideration. In this paper, we carefully analyze the characteristics of FSOD, and present that a few-shot detector should consider the explicit decomposition of two subtasks, as well as leveraging information from both of them to enhance feature representations. |
Longyao Liu; Bo Ma; Yulin Zhang; Xin Yi; Haozhi Li; |
| 435 | Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recognizing the emotional state of people is a basic but challenging task in video understanding. In this paper, we propose a new task in this field, named Pairwise Emotional Relationship Recognition (PERR). |
Xun Gao; Yin Zhao; Jie Zhang; Longjun Cai; |
| 436 | Yes, Attention Is All You Need, for Exemplar Based Colorization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, a general attention-based colorization framework is proposed in this work, where the color histogram of the reference image is adopted as a prior to eliminate the ambiguity in the database. |
Wang Yin; Peng Lu; Zhaoran Zhao; Xujun Peng; |
| 437 | Image Search with Text Feedback By Deep Hierarchical Attention Mutual Information Maximization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Image retrieval with text feedback is an emerging research topic with the objective of integrating inputs from multiple modalities as queries. In this paper, queries contain a reference image plus text feedback that describes modifications between this image and the desired image. |
Chunbin Gu; Jiajun Bu; Zhen Zhang; Zhi Yu; Dongfang Ma; Wei Wang; |
| 438 | HetEmotionNet: Two-Stream Heterogeneous Graph Recurrent Neural Network for Multi-modal Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel two-stream heterogeneous graph recurrent neural network, named HetEmotionNet, fusing multi-modal physiological signals for emotion recognition. |
Ziyu Jia; Youfang Lin; Jing Wang; Zhiyang Feng; Xiangheng Xie; Caijie Chen; |
| 439 | Self-Representation Subspace Clustering for Incomplete Multi-view Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is widely observed that the self-representation subspace method enjoys better clustering performance than the factorization-based one. Therefore, we adapt it to incomplete data by jointly performing data imputation and self-representation learning. |
Jiyuan Liu; Xinwang Liu; Yi Zhang; Pei Zhang; Wenxuan Tu; Siwei Wang; Sihang Zhou; Weixuan Liang; Siqi Wang; Yuexiang Yang; |
| 440 | Text As Neural Operator: Image Manipulation By Text Instruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study a setting that allows users to edit an image with multiple objects using complex text instructions to add, remove, or change the objects. |
Tianhao Zhang; Hung-Yu Tseng; Lu Jiang; Weilong Yang; Honglak Lee; Irfan Essa; |
| 441 | Neighbor-Vote: Improving Monocular 3D Object Detection Through Neighbor Distance Voting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel neighbor-voting method that incorporates neighbor predictions to ameliorate object detection from severely deformed pseudo-LiDAR point clouds. |
Xiaomeng Chu; Jiajun Deng; Yao Li; Zhenxun Yuan; Yanyong Zhang; Jianmin Ji; Yu Zhang; |
| 442 | Extracting Useful Knowledge from Noisy Web Images Via Data Purification for Fine-Grained Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the noisy label problem and propose a method that can specifically distinguish in- and out-of-distribution noisy samples. |
Chuanyi Zhang; Yazhou Yao; Xing Xu; Jie Shao; Jingkuan Song; Zechao Li; Zhenmin Tang; |
| 443 | Differentiated Learning for Multi-Modal Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel Differentiated Learning framework to make use of the diversity between multiple modalities for more effective domain adaptation. |
Jianming Lv; Kaijie Liu; Shengfeng He; |
| 444 | Heterogeneous Face Recognition with Attention-guided Feature Disentangling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an attention-guided feature disentangling framework (AgFD) to eliminate the large cross-modality discrepancy for Heterogeneous Face Recognition (HFR). |
Shanmin Yang; Xiao Yang; Yi Lin; Peng Cheng; Yi Zhang; Jianwei Zhang; |
| 445 | SmartEye: An Open Source Framework for Real-Time Video Analytics with Edge-Cloud Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper provides the multimedia research community with an open source framework named SmartEye for real-time video analytics by leveraging the edge-cloud collaboration. |
Xuezhi Wang; Guanyu Gao; |
| 446 | Adaptive Normalized Representation Learning for Generalizable Face Anti-Spoofing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, little attention has been paid to the feature extraction process for the FAS task, especially the influence of normalization, which also has a great impact on the generalization of the learned representation. To address this issue, we propose a novel perspective of face anti-spoofing that focuses on the normalization selection in the feature extraction process. |
ShuBao Liu; Ke-Yue Zhang; Taiping Yao; Mingwei Bi; Shouhong Ding; Jilin Li; Feiyue Huang; Lizhuang Ma; |
| 447 | Multi-branch Channel-wise Enhancement Network for Fine-grained Visual Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To generate channel-wise complementary and discriminative features in beneficial details of FGVC, we propose a multi-branch channel-wise enhancement network (MCEN), which includes a multi-pattern spatial disruption mechanism, an inter-channel complementarity module (ICM), and a novel soft target loss. |
Guangjun Li; Yongxiong Wang; Fengting Zhu; |
| 448 | Generating Point Cloud from Single Image in The Few Shot Scenario Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, most available 3D data covers only a limited amount of classes, which further restricts the models’ generalization ability to novel classes. To mitigate these issues, we propose a novel few-shot single-view point cloud generation framework by considering both class-specific and class-agnostic 3D shape priors. |
Yu Lin; Jinghui Guo; Yang Gao; Yi-fan Li; Zhuoyi Wang; Latifur Khan; |
| 449 | Coarse to Fine: Domain Adaptive Crowd Counting Via Adversarial Scoring Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite such progress, trained data-dependent models usually can not generalize well to unseen scenarios because of the inherent domain shift. To address this issue, this paper proposes a novel adversarial scoring network (ASNet) to gradually bridge the gap across domains from coarse to fine granularity. |
Zhikang Zou; Xiaoye Qu; Pan Zhou; Shuangjie Xu; Xiaoqing Ye; Wenhao Wu; Jin Ye; |
| 450 | Capsule-based Object Tracking with Natural Language Specification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Query-focused tracking should pay more attention to context modeling to promote the correlation between these two features. To address these issues, we propose a capsule-based network, referred to as CapsuleTNL, which performs regression tracking with a natural language query. |
Ding Ma; Xiangqian Wu; |
| 451 | Text2Video: Automatic Video Generation Based on Text Scripts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To make video creation simpler, in this paper we present Text2Video, a novel system to automatically produce videos using only text-editing for novice users. |
Yipeng Yu; Zirui Tu; Longyu Lu; Xiao Chen; Hui Zhan; Zixun Sun; |
| 452 | A2W: Context-Aware Recommendation System for Mobile Augmented Reality Web Browser Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an AR web browser with an integrated context-aware AR-to-Web content recommendation service named as A2W browser, to provide continuously user-centric web browsing experiences driven by AR headsets. |
Kit Yung Lam; Lik Hang Lee; Pan Hui; |
| 453 | Seeing Is Believing? Effects of Visualization on Smart Device Privacy Perceptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Research on smart device privacy has consistently highlighted that privacy is an important concern for users, but that users fail to act on their concerns. |
Carlos Bermejo Fernandez; Petteri Nurmi; Pan Hui; |
| 454 | Towards Cross-Granularity Few-Shot Learning: Coarse-to-Fine Pseudo-Labeling with Visual-Semantic Meta-Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we advance the few-shot classification paradigm towards a more challenging scenario, i.e., cross-granularity few-shot classification, where the model observes only coarse labels during training while being expected to perform fine-grained classification during testing. |
Jinhai Yang; Hua Yang; Lin Chen; |
| 455 | Progressive Graph Attention Network for Video Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods generally explore the single interactions between objects or between frames, which are insufficient to deal with the sophisticated scenes in videos. To tackle this problem, we propose a novel model, termed Progressive Graph Attention Network (PGAT), which can jointly explore the multiple visual relations on object-level, frame-level and clip-level. |
Liang Peng; Shuangji Yang; Yi Bin; Guoqing Wang; |
| 456 | AFEC: Adaptive Feature Extraction Modules for Learned Image Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to extract high-fidelity image features adaptively with local textures as the basic unit, which significantly improves the quality of the extracted information and enhances the compactness of the latent representation of the image. |
Yi Ma; Yongqi Zhai; Jiayu Yang; Chunhui Yang; Ronggang Wang; |
| 457 | QoE Ready to Respond: A QoE-aware MEC Selection Scheme for DASH-based Adaptive Video Streaming to Mobile Users Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces QoE Ready to Respond (QoE-R2R), a QoE-aware MEC Selection scheme for DASH-based mobile adaptive video streaming for optimizing video transmission in a MEC-supported network environment. |
Wanxin Shi; Qing Li; Ruishan Zhang; Gengbiao Shen; Yong Jiang; Zhenhui Yuan; Gabriel-Miro Muntean; |
| 458 | AKECP: Adaptive Knowledge Extraction from Feature Maps for Fast and Efficient Channel Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Adaptive Knowledge Extraction for Channel Pruning (AKECP), which can compress the network fast and efficiently. |
Haonan Zhang; Longjun Liu; Hengyi Zhou; Wenxuan Hou; Hongbin Sun; Nanning Zheng; |
| 459 | GAN-aided Serial Dependence Study in Medical Image Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The specific goals of this project are to establish, identify and mitigate the impact of VSD on visual search tasks in clinical settings. |
Zhihang Ren; |
| 460 | Multiple Objects-Aware Visual Question Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a target answer is often related to multiple key objects in an image, which focuses on only one object may mislead its model to generate questions that are only related to partial fragments of the answer. To address this problem, we propose a multi-objects aware generation model to capture all key objects related to an answer and generate the corresponding question. |
Jiayuan Xie; Yi Cai; Qingbao Huang; Tao Wang; |
| 461 | Unsupervised Cross-Modal Distillation for Thermal Infrared Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We take advantage of the two-branch architecture of the baseline tracker, i.e. DiMP, for cross-modal distillation working on two components of the tracker. |
Jingxian Sun; Lichao Zhang; Yufei Zha; Abel Gonzalez-Garcia; Peng Zhang; Wei Huang; Yanning Zhang; |
| 462 | Towards Adversarial Patch Analysis and Certified Defense Against Crowd Counting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a robust attack strategy called Adversarial Patch Attack with Momentum (APAM) to systematically evaluate the robustness of crowd counting models, where the attacker’s goal is to create an adversarial perturbation that severely degrades their performances, thus leading to public safety accidents (e.g., stampede accidents). |
Qiming Wu; Zhikang Zou; Pan Zhou; Xiaoqing Ye; Binghui Wang; Ang Li; |
| 463 | Towards Robust Cross-domain Image Understanding with Unsupervised Noise Removal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method, termed Noise Tolerant Domain Adaptation (NTDA), for WSDA. |
Lei Zhu; Zhaojing Luo; Wei Wang; Meihui Zhang; Gang Chen; Kaiping Zheng; |
| 464 | FakeTagger: Robust Safeguards Against DeepFake Dissemination Via Provenance Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, existing studies that propose various approaches to fighting DeepFakes and determining whether a facial image is real or fake are still at an early stage. |
Run Wang; Felix Juefei-Xu; Meng Luo; Yang Liu; Lina Wang; |
| 465 | Knowledge-Supervised Learning: Knowledge Consensus Constraints for Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The consensus of multiple views on the same data will provide extra regularization, thereby improving accuracy. Based on this idea, we propose a novel Knowledge-Supervised Learning (KSL) method for person re-identification (Re-ID), which can improve the performance without introducing extra inference cost. |
Li Wang; Baoyu Fan; Zhenhua Guo; Yaqian Zhao; Runze Zhang; Rengang Li; Weifeng Gong; Endong Wang; |
| 466 | Generally Boosting Few-Shot Learning with HandCrafted Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Handcrafted features, such as Histogram of Oriented Gradients (HOG) and Local Binary Patterns (LBP), impose no requirements on the amount of training data and used to perform quite well in many small-scale data scenarios, since their extraction involves no learning process and relies mainly on empirically observed and summarized prior feature-engineering knowledge. In this paper, we develop a general and simple approach for boosting FSL by exploiting such prior knowledge in the feature learning phase. |
Yi Zhang; Sheng Huang; Fengtao Zhou; |
| 467 | Robust Real-World Image Super-Resolution Against Adversarial Attacks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a robust deep learning framework for real-world SR that randomly erases potential adversarial noises in the frequency domain of input images or features. |
Jiutao Yue; Haofeng Li; Pengxu Wei; Guanbin Li; Liang Lin; |
| 468 | TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing works usually adopt dynamic graph networks to indirectly model the intra/inter-modal interactions, making it difficult for the model to distinguish the referred object from distractors due to the monolithic representations of visual and linguistic contents. In this work, we exploit the Transformer for its natural suitability to permutation-invariant 3D point cloud data and propose a TransRefer3D network to extract entity-and-relation aware multimodal context among objects for more discriminative feature learning. |
Dailan He; Yusheng Zhao; Junyu Luo; Tianrui Hui; Shaofei Huang; Aixi Zhang; Si Liu; |
| 469 | Convolutional Transformer Based Dual Discriminator Generative Adversarial Networks for Video Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose Convolutional Transformer based Dual Discriminator Generative Adversarial Networks (CT-D2GAN) to perform unsupervised video anomaly detection. |
Xinyang Feng; Dongjin Song; Yuncong Chen; Zhengzhang Chen; Jingchao Ni; Haifeng Chen; |
| 470 | Text Is NOT Enough: Integrating Visual Impressions Into Open-domain Dialogue Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we point out that hidden images, termed visual impressions (VIs), can be mined from text-only data to enhance dialogue understanding and help generate better responses. |
Lei Shen; Haolan Zhan; Xin Shen; Yonghao Song; Xiaofang Zhao; |
| 471 | D³Net: Dual-Branch Disturbance Disentangling Network for Facial Expression Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Dual-branch Disturbance Disentangling Network (D3Net), mainly consisting of an expression branch and a disturbance branch, to perform effective FER. |
Rongyun Mo; Yan Yan; Jing-Hao Xue; Si Chen; Hanzi Wang; |
| 472 | Faster-PPN: Towards Real-Time Semantic Segmentation with Dual Mutual Learning for Ultra-High Resolution Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, in this paper, we propose a novel and efficient collaborative global-local framework based on PPN, named Faster-PPN, for semantic segmentation of high and ultra-high resolution images, which achieves a better trade-off between efficiency and effectiveness at real-time speed. |
Bicheng Dai; Kaisheng Wu; Tong Wu; Kai Li; Yanyun Qu; Yuan Xie; Yun Fu; |
| 473 | Post2Story: Automatically Generating Storylines from Microblogging Platforms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we demonstrate Post2Story, which aims to detect events and generate storylines on microblog posts. |
Xujian Zhao; Chongwei Wang; Peiquan Jin; Hui Zhang; Chunming Yang; Bo Li; |
| 474 | I2V-GAN: Unpaired Infrared-to-Visible Video Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In such a case, an effective video-to-video translation method from the infrared domain to its visible-light counterpart is strongly needed to overcome the intrinsic huge gap between the infrared and visible fields. To address this challenging problem, we propose an infrared-to-visible (I2V) video translation method, I2V-GAN, to generate fine-grained and spatio-temporally consistent visible-light videos given unpaired infrared videos. |
Shuang Li; Bingfeng Han; Zhenjie Yu; Chi Harold Liu; Kai Chen; Shuigen Wang; |
| 475 | Neural Free-Viewpoint Performance Rendering Under Complex Human-object Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a neural human performance capture and rendering system to generate both high-quality geometry and photo-realistic texture of both human and objects under challenging interaction scenarios in arbitrary novel views, from only sparse RGB streams. |
Guoxing Sun; Xin Chen; Yizhang Chen; Anqi Pang; Pei Lin; Yuheng Jiang; Lan Xu; Jingyi Yu; Jingya Wang; |
| 476 | AdvFilter: Predictive Perturbation-aware Filtering Against Adversarial Attack Via Multi-domain Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we also observe that the robustness results of the filtering-based method rely on the perturbation amplitude of adversarial examples used for training. To address this problem, we propose predictive perturbation-aware and pixel-wise filtering, where dual-perturbation filtering and an uncertainty-aware fusion module are designed and employed to automatically perceive the perturbation amplitude during the training and testing process. |
Yihao Huang; Qing Guo; Felix Juefei-Xu; Lei Ma; Weikai Miao; Yang Liu; Geguang Pu; |
| 477 | Progressive and Selective Fusion Network for High Dynamic Range Imaging Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel method that can better fuse the features based on two ideas. |
Qian Ye; Jun Xiao; Kin-man Lam; Takayuki Okatani; |
| 478 | Learning to Compose Stylistic Calligraphy Artwork with Emotions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such defects prevent them from generating aesthetic, stylistic, and diverse calligraphy artworks, producing only a static handwriting font library instead. To address this problem, we propose a novel cross-modal approach to automatically generate stylistic and diverse Chinese calligraphy artwork driven by different emotions. |
Shaozu Yuan; Ruixue Liu; Meng Chen; Baoyang Chen; Zhijie Qiu; Xiaodong He; |
| 479 | Multi-label Pattern Image Retrieval Via Attention Mechanism Driven Graph Convolutional Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is difficult to accurately represent and retrieve pattern images which include complex details and multiple elements. Therefore, in this paper, we collect a new pattern image dataset with multiple labels per image for the pattern image retrieval task. |
Ying Li; Hongwei Zhou; Yeyu Yin; Jiaquan Gao; |
| 480 | Transformer-based Feature Reconstruction Network for Robust Multimodal Sentiment Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a transformer-based feature reconstruction network (TFR-Net) is proposed to improve the robustness of models for the random missing in non-aligned modality sequences. |
Ziqi Yuan; Wei Li; Hua Xu; Wenmeng Yu; |
| 481 | Fast and Flexible Human Pose Estimation with HyperPose Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce HyperPose, a novel flexible and high-performance pose estimation library. |
Yixiao Guo; Jiawei Liu; Guo Li; Luo Mai; Hao Dong; |
| 482 | WeClick: Weakly-Supervised Video Semantic Segmentation with Click Annotations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an effective weakly-supervised video semantic segmentation pipeline with click annotations, called WeClick, for saving laborious annotating effort by segmenting an instance of the semantic class with only a single click. |
Peidong Liu; Zibin He; Xiyu Yan; Yong Jiang; Shu-Tao Xia; Feng Zheng; Hu Maowei; |
| 483 | Decoupled IoU Regression for Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that the complex definition of IoU and feature misalignment make it difficult to predict IoU accurately. In this paper, we propose a novel Decoupled IoU Regression (DIR) model to handle these problems. |
Yan Gao; Qimeng Wang; Xu Tang; Haochen Wang; Fei Ding; Jing Li; Yao Hu; |
| 484 | Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a re-parameterizable building block, namely Edge-oriented Convolution Block (ECB), for efficient SR design. |
Xindong Zhang; Hui Zeng; Lei Zhang; |
| 485 | Automatic Channel Pruning with Hyper-parameter Search and Dynamic Masking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most current model pruning algorithms depend on hand-crafted rules or require the pruning ratio as input beforehand. To overcome this problem, we propose a learning-based automatic channel pruning algorithm for deep neural networks, inspired by recent automatic machine learning (AutoML). |
Baopu Li; Yanwen Fan; Zhihong Pan; Yuchen Bian; Gang Zhang; |
| 486 | ARShoe: Real-Time Augmented Reality Shoe Try-on System on Smartphones Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most previous works focus on virtual try-on for clothes while neglecting shoes, which is also a promising task. To this end, this work proposes a real-time augmented reality virtual shoe try-on system for smartphones, namely ARShoe. |
Shan An; Guangfu Che; Jinghao Guo; Haogang Zhu; Junjie Ye; Fangru Zhou; Zhaoqi Zhu; Dong Wei; Aishan Liu; Wei Zhang; |
| 487 | Vision-guided Music Source Separation Via A Fine-grained Cycle-Separation Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to correctly separate mixed sources, we propose a novel Fine-grained Cycle-Separation Network (FCSN) for vision-guided music source separation. |
Ma Shuo; Yanli Ji; Xing Xu; Xiaofeng Zhu; |
| 488 | Modeling The Uncertainty for Self-supervised 3D Skeleton Action Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider that a good representation learning encoder can distinguish the underlying features of different actions, bringing similar motions closer while pushing dissimilar motions away. |
Yukun Su; Guosheng Lin; Ruizhou Sun; Yun Hao; Qingyao Wu; |
| 489 | Learning Human Motion Prediction Via Stochastic Differential Equations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel approach in modeling the motion prediction problem based on stochastic differential equations and path integrals. |
Kedi Lyu; Zhenguang Liu; Shuang Wu; Haipeng Chen; Xuhong Zhang; Yuyu Yin; |
| 490 | Pose-guided Inter- and Intra-part Relational Transformer for Occluded Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The use of local information for feature extraction and matching is still necessary. Therefore, we propose a Pose-guided inter- and intra-part relational transformer (Pirt) for occluded person Re-ID, which builds part-aware long-term correlations by introducing transformers. |
Zhongxing Ma; Yifan Zhao; Jia Li; |
| 491 | Armor: A Benchmark for Meta-evaluation of Artificial Music Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Armor, a complex and cross-domain benchmark dataset that serves this purpose. |
Songhe Wang; Zheng Bao; Jingtong E; |
| 492 | Identity-aware Graph Memory Network for Action Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explicitly highlight the identity information of the actors in terms of both long-term and short-term context through a graph memory network, namely identity-aware graph memory network (IGMN). |
Jingcheng Ni; Jie Qin; Di Huang; |
| 493 | Salient Error Detection Based Refinement for Wide-baseline Image Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a refinement strategy based on salient error detection to improve the result of existing approaches of wide-baseline image interpolation, where we combine the advantages of methods based on piecewise-linear transformation and methods based on variational model. |
Yuan Chang; Yisong Chen; Guoping Wang; |
| 494 | Metaverse for Social Good: A University Campus Prototype Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we highlight the representative applications for social good. |
Haihan Duan; Jiaye Li; Sizheng Fan; Zhonghao Lin; Xiao Wu; Wei Cai; |
| 495 | MusicBERT: A Self-supervised Learning of Music Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a self-supervised learning model for music representation. |
Hongyuan Zhu; Ye Niu; Di Fu; Hao Wang; |
| 496 | SSFlow: Style-guided Neural Spline Flows for Face Image Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose SSFlow to achieve identity-preserved semantic face manipulation in StyleGAN latent space based on conditional Neural Spline Flows. |
Hanbang Liang; Xianxu Hou; Linlin Shen; |
| 497 | MV-TON: Memory-based Video Virtual Try-on Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing video-based virtual try-on methods usually require clothing templates and they can only generate blurred and low-resolution results. To address these challenges, we propose a Memory-based Video virtual Try-On Network (MV-TON), which seamlessly transfers desired clothes to a target person without using any clothing templates and generates high-resolution realistic videos. |
Xiaojing Zhong; Zhonghua Wu; Taizhe Tan; Guosheng Lin; Qingyao Wu; |
| 498 | Hierarchical Fusion for Practical Ghost-free High Dynamic Range Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we decompose HDR imaging into ghost-free image fusion and ghost-based image restoration, and propose a novel practical Hierarchical Fusion Network (HFNet), which contains three sub-networks: Mask Fusion Network, Mask Compensation Network, and Refine Network. |
Pengfei Xiong; Yu Chen; |
| 499 | Self-Supervised Pre-training on The Target Domain for Cross-Domain Person Re-identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Self-supervised Pre-training method on the Target Domain (SPTD), which pre-trains the model on both the source and target domains in a self-supervised manner. |
Junyin Zhang; Yongxin Ge; Xinqian Gu; Boyu Hua; Tao Xiang; |
| 500 | Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To effectively learn semantic alignments among product images and bilingual texts in translation, we design a unified product-oriented cross-modal cross-lingual model for pre-training and fine-tuning. |
Yuqing Song; Shizhe Chen; Qin Jin; Wei Luo; Jun Xie; Fei Huang; |
This table only includes 500 papers selected based on our selection algorithm. To continue with the full list, please visit Paper Digest: MM-2021 (Full List).