Paper Digest: ACM Multimedia 2022 Papers & Highlights
Interested users can choose to read all MM-2022 papers in our digest console, which supports more features.
To search for papers presented at MM-2022 on a specific topic, use the search by venue (MM-2022) service. To summarize the latest research published at MM-2022 on a specific topic, use the review by venue (MM-2022) service. To synthesize the findings from MM-2022 into comprehensive reports, try MM-2022 Research. If you are interested in browsing papers by author, we have a comprehensive list of all MM-2022 authors and their papers.
This curated list is created by the Paper Digest Team. Experience the cutting-edge capabilities of Paper Digest, an innovative AI-powered research platform that delivers personalized, comprehensive updates on the latest research in your field. It also empowers you to read articles, write articles, get answers, conduct literature reviews, and generate research reports.
Experience the full potential of our services today!
TABLE 1: Paper Digest: ACM Multimedia 2022 Papers & Highlights
| # | Paper | Author(s) |
|---|---|---|
| 1 | PVSeRF: Joint Pixel-, Voxel- and Surface-Aligned Radiance Field for Single-Image Novel View Synthesis. Highlight: We present PVSeRF, a learning framework that reconstructs neural radiance fields from single-view RGB images, for novel view synthesis. | Xianggang Yu; Jiapeng Tang; Yipeng Qin; Chenghong Li; Xiaoguang Han; Linchao Bao; Shuguang Cui |
| 2 | LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking. Highlight: In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking. | Yupan Huang; Tengchao Lv; Lei Cui; Yutong Lu; Furu Wei |
| 3 | DiT: Self-supervised Pre-training for Document Image Transformer. Highlight: In this paper, we propose DiT, a self-supervised pre-trained Document Image Transformer model using large-scale unlabeled text images for Document AI tasks, which is essential since no supervised counterparts exist due to the lack of human-labeled document images. | Junlong Li; Yiheng Xu; Tengchao Lv; Lei Cui; Cha Zhang; Furu Wei |
| 4 | Real-time Streaming Video Denoising with Bidirectional Buffers. Highlight: In this paper, we propose a Bidirectional Streaming Video Denoising (BSVD) framework to achieve high-fidelity real-time denoising for streaming videos with both past and future temporal receptive fields. | Chenyang Qi; Junming Chen; Xin Yang; Qifeng Chen |
| 5 | Transcript to Video: Efficient Clip Sequencing from Texts. Highlight: To meet the demands of non-experts, we present Transcript-to-Video, a weakly-supervised framework that uses texts as input to automatically create video sequences from an extensive collection of shots. | Yu Xiong; Fabian Caba Heilbron; Dahua Lin |
| 6 | RepSR: Training Efficient VGG-style Super-Resolution Networks with Structural Re-Parameterization and Batch Normalization. Highlight: In this work, we revisit those primary designs and investigate essential components for re-parameterizing SR networks. | Xintao Wang; Chao Dong; Ying Shan |
| 7 | ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech. Highlight: In this work, we propose ProDiff, a progressive fast diffusion model for high-quality text-to-speech. | Rongjie Huang; Zhou Zhao; Huadai Liu; Jinglin Liu; Chenye Cui; Yi Ren |
| 8 | SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation. Highlight: In this work, we propose SingGAN, a generative adversarial network designed for high-fidelity singing voice synthesis. | Rongjie Huang; Chenye Cui; Feiyang Chen; Yi Ren; Jinglin Liu; Zhou Zhao; Baoxing Huai; Zhefeng Wang |
| 9 | Composite Photograph Harmonization with Complete Background Cues. Highlight: In this work, we unify the two objectives in a single framework to obtain realistic portrait image composites. | Yazhou Xing; Yu Li; Xintao Wang; Ye Zhu; Qifeng Chen |
| 10 | Multi-view Gait Video Synthesis. Highlight: To tackle the challenge caused by the entanglement of viewpoint, texture, and body structure, we present a network with two collaborative branches to decouple the novel view rendering process into two streams for human appearances (texture) and silhouettes (structure), respectively. | Weilai Xiang; Hongyu Yang; Di Huang; Yunhong Wang |
| 11 | Generalized Inter-class Loss for Gait Recognition. Highlight: To this end, we propose a generalized inter-class loss that resolves the inter-class variance from both the sample-level and the class-level feature distribution. | Weichen Yu; Hongyuan Yu; Yan Huang; Liang Wang |
| 12 | Continual Multi-view Clustering. Highlight: However, most of them deal with the clustering problem in which all data views are available in advance, and overlook scenarios where data observations of new views are accumulated over time. To solve this issue, we propose a continual approach on the basis of the late fusion multi-view clustering framework. | Xinhang Wan; Jiyuan Liu; Weixuan Liang; Xinwang Liu; Yi Wen; En Zhu |
| 13 | Towards Unbiased Visual Emotion Recognition Via Causal Intervention. Highlight: Such dataset characteristics are usually treated as dataset bias, which damages the robustness and generalization performance of these recognition systems. In this work, we scrutinize this problem from the perspective of causal inference, where such a dataset characteristic is termed a confounder which misleads the system to learn spurious correlations. | Yuedong Chen; Xu Yang; Tat-Jen Cham; Jianfei Cai |
| 14 | ReCoRo: Region-Controllable Robust Light Enhancement with User-Specified Imprecise Masks. Abstract: Low-light enhancement is an increasingly important function in image editing and visual creation. Most existing enhancing algorithms are trained to enlighten a given image in a … | Dejia Xu; Hayk Poghosyan; Shant Navasardyan; Yifan Jiang; Humphrey Shi; Zhangyang Wang |
| 15 | Adaptive Anti-Bottleneck Multi-Modal Graph Learning Network for Personalized Micro-video Recommendation. Highlight: In this paper, we propose a novel Adaptive Anti-Bottleneck Multi-Modal Graph Learning Network for personalized micro-video recommendation. | Desheng Cai; Shengsheng Qian; Quan Fang; Jun Hu; Changsheng Xu |
| 16 | MegaPortraits: One-shot Megapixel Neural Head Avatars. Highlight: In this work, we advance neural head avatar technology to megapixel resolution while focusing on the particularly challenging task of cross-driving synthesis, i.e., when the appearance of the driving image is substantially different from the animated source image. | Nikita Drobyshev; Jenya Chelishev; Taras Khakhulin; Aleksei Ivakhnenko; Victor Lempitsky; Egor Zakharov |
| 17 | Domain Reconstruction and Resampling for Robust Salient Object Detection. Highlight: Our main idea is to reconstruct the data distribution of each scene from the sampled images and then resample from the distribution domain. | Senbo Yan; Liang Peng; Chuer Yu; Zheng Yang; Haifeng Liu; Deng Cai |
| 18 | Single Image Shadow Detection Via Complementary Mechanism. Highlight: In this paper, we present a novel shadow detection framework by investigating the mutual complementary mechanisms contained in this specific task. | Yurui Zhu; Xueyang Fu; Chengzhi Cao; Xi Wang; Qibin Sun; Zheng-Jun Zha |
| 19 | CrossHuman: Learning Cross-guidance from Multi-frame Images for Human Reconstruction. Highlight: We propose CrossHuman, a novel method that learns cross-guidance from a parametric human model and multi-frame RGB images to achieve high-quality 3D human reconstruction. | Liliang Chen; Jiaqi Li; Han Huang; Yandong Guo |
| 20 | Symmetric Uncertainty-Aware Feature Transmission for Depth Super-Resolution. Highlight: However, the extracted high-frequency information usually contains textures that are not present in depth maps in the presence of the cross-modality gap, and the noise would be further aggravated by interpolation due to the resolution gap between the RGB and depth images. To tackle these challenges, we propose a novel Symmetric Uncertainty-aware Feature Transmission (SUFT) for color-guided DSR. | Wuxuan Shi; Mang Ye; Bo Du |
| 21 | Efficient Modeling of Future Context for Image Captioning. Highlight: However, it is still under-explored how to effectively and efficiently incorporate the future context. To respond to this issue, inspired by the fact that Non-Autoregressive Image Captioning (NAIC) can leverage two-side relations with a modified mask operation, we aim to graft this advance onto the conventional Autoregressive Image Captioning (AIC) model while maintaining inference efficiency without extra time cost. | Zhengcong Fei |
| 22 | Bipartite Graph-based Discriminative Feature Learning for Multi-View Clustering. Highlight: In this paper, we propose bipartite graph-based discriminative feature learning for multi-view clustering, which combines bipartite graph learning and discriminative feature learning into a unified framework. | Weiqing Yan; Jindong Xu; Jinglei Liu; Guanghui Yue; Chang Tang |
| 23 | Backdoor Attacks on Crowd Counting. Highlight: In this paper, we investigate the vulnerability of deep learning based crowd counting models to backdoor attacks, a major security threat to deep learning. | Yuhua Sun; Tailai Zhang; Xingjun Ma; Pan Zhou; Jian Lou; Zichuan Xu; Xing Di; Yu Cheng; Lichao Sun |
| 24 | Saliency in Augmented Reality. Highlight: However, studies on how this superimposition influences human visual attention are lacking. Therefore, in this paper, we mainly analyze the interaction effect between background (BG) scenes and AR contents, and study the saliency prediction problem in AR. | Huiyu Duan; Wei Shen; Xiongkuo Min; Danyang Tu; Jing Li; Guangtao Zhai |
| 25 | Multiple Kernel Clustering with Dual Noise Minimization. Highlight: We discover that the noise can be disassembled into separable dual parts, i.e., N-noise and C-noise (null-space noise and column-space noise). In this paper, we rigorously define dual noise and propose a novel parameter-free MKC algorithm by minimizing them. | Junpu Zhang; Liang Li; Siwei Wang; Jiyuan Liu; Yue Liu; Xinwang Liu; En Zhu |
| 26 | Non-Autoregressive Cross-Modal Coherence Modelling. Highlight: To this end, we propose a Non-Autoregressive Cross-modal Ordering Net (NACON) adopting a basic encoder-decoder architecture. | Yi Bin; Wenhao Shi; Jipeng Zhang; Yujuan Ding; Yang Yang; Heng Tao Shen |
| 27 | Mix-DANN and Dynamic-Modal-Distillation for Video Domain Adaptation. Highlight: In this paper, we propose Mix-Domain-Adversarial Neural Network and Dynamic-Modal-Distillation (MD-DMD), a novel multi-modal adversarial learning framework for unsupervised video domain adaptation. | Yuehao Yin; Bin Zhu; Jingjing Chen; Lechao Cheng; Yu-Gang Jiang |
| 28 | Cross-Modality High-Frequency Transformer for MR Image Super-Resolution. Highlight: In this work, to further advance this research field, we make an early effort to build a Transformer-based MR image super-resolution framework, with careful designs on exploring valuable domain prior knowledge. | Chaowei Fang; Dingwen Zhang; Liang Wang; Yulun Zhang; Lechao Cheng; Junwei Han |
| 29 | PIA: Parallel Architecture with Illumination Allocator for Joint Enhancement and Detection in Low-Light. Highlight: In this paper, we make efforts to simultaneously realize low-light enhancement and detection from two aspects. | Tengyu Ma; Long Ma; Xin Fan; Zhongxuan Luo; Risheng Liu |
| 30 | Robust Attention Deraining Network for Synchronous Rain Streaks and Raindrops Removal. Highlight: In this paper, we propose a new and universal SID model with novel modules, termed Robust Attention Deraining Network (RadNet), whose strong robustness and generalization ability are reflected in two main aspects. | Yanyan Wei; Zhao Zhang; Mingliang Xu; Richang Hong; Jicong Fan; Shuicheng Yan |
| 31 | Sample Weighted Multiple Kernel K-means Via Min-Max Optimization. Highlight: As a result, it does not sufficiently consider the different contributions of each sample to clustering, and thus cannot effectively obtain the ideal similarity structure, leading to unsatisfying performance. To address this issue, this paper proposes a novel sample weighted multiple kernel k-means via min-max optimization (SWMKKM), which sufficiently considers the relationships between each sample and the others to represent the sample weights. | Yi Zhang; Weixuan Liang; Xinwang Liu; Sisi Dai; Siwei Wang; Liyang Xu; En Zhu |
| 32 | Draw Your Art Dream: Diverse Digital Art Synthesis with Multimodal Guided Diffusion. Highlight: Current digital art synthesis methods usually use single-modality inputs as guidance, thereby limiting the expressiveness of the model and the diversity of generated results. To solve this problem, we propose the multimodal guided artwork diffusion (MGAD) model, a diffusion-based digital artwork generation approach that utilizes multimodal prompts as guidance to control the classifier-free diffusion model. | Nisha Huang; Fan Tang; Weiming Dong; Changsheng Xu |
| 33 | More Is Better: Multi-source Dynamic Parsing Attention for Occluded Person Re-identification. Highlight: Multi-source knowledge ensemble has been proven effective for domain adaptation. Inspired by this, we propose a multi-source dynamic parsing attention (MSDPA) mechanism that leverages knowledge learned from different source datasets to generate reliable semantic cues and dynamically integrates and adapts them in a self-supervised manner via an attention mechanism. | Xinhua Cheng; Mengxi Jia; Qian Wang; Jian Zhang |
| 34 | Label-Efficient Domain Generalization Via Collaborative Exploration and Generalization. Highlight: To escape the dilemma between domain generalization and annotation costs, in this paper we introduce a novel task named label-efficient domain generalization (LEDG) to enable model generalization with label-limited source domains. To address this challenging task, we propose a novel framework called Collaborative Exploration and Generalization (CEG) which jointly optimizes active exploration and semi-supervised generalization. | Junkun Yuan; Xu Ma; Defang Chen; Kun Kuang; Fei Wu; Lanfen Lin |
| 35 | TAGPerson: A Target-Aware Generation Pipeline for Person Re-identification. Highlight: In previous methods, the generation process is based on virtual scenes, and their synthetic training data cannot be adapted to different target real scenes automatically. To handle this problem, we propose a novel Target-Aware Generation pipeline, called TAGPerson, to produce synthetic person images. | Kai Chen; Weihua Chen; Tao He; Rong Du; Fan Wang; Xiuyu Sun; Yuchen Guo; Guiguang Ding |
| 36 | Visual Knowledge Graph for Human Action Reasoning in Videos. Highlight: However, such a manner lacks detailed and semantic understanding of body movement, which is critical knowledge for explaining and inferring complex human actions. To fill this gap, we propose to summarize a novel visual knowledge graph from over 15M detailed human annotations, describing an action as a distinct composition of body parts, part movements, and interactive objects in videos. | Yue Ma; Yali Wang; Yue Wu; Ziyu Lyu; Siran Chen; Xiu Li; Yu Qiao |
| 37 | Disentangled Representation Learning for Multimodal Emotion Recognition. Highlight: However, a serious problem is that distribution gaps and information redundancy often exist across heterogeneous modalities, so the learned multimodal representations may be unrefined. Motivated by these observations, we propose a Feature-Disentangled Multimodal Emotion Recognition (FDMER) method, which learns common and private feature representations for each modality. | Dingkang Yang; Shuai Huang; Haopeng Kuang; Yangtao Du; Lihua Zhang |
| 38 | Learning Modality-Specific and -Agnostic Representations for Asynchronous Multimodal Language Sequences. Highlight: For multimodal fusion of asynchronous sequences, existing methods focus on projecting multiple modalities into a common latent space and learning hybrid representations, which neglects the diversity of each modality and the commonality across different modalities. Motivated by this observation, we propose a Multimodal Fusion approach for learning modality-Specific and modality-Agnostic representations (MFSA) to refine multimodal representations and leverage the complementarity across different modalities. | Dingkang Yang; Haopeng Kuang; Shuai Huang; Lihua Zhang |
| 39 | EASE: Robust Facial Expression Recognition Via Emotion Ambiguity-SEnsitive Cooperative Networks. Highlight: Therefore, it is challenging to train a robust model for FER. To address this, we propose Emotion Ambiguity-SEnsitive cooperative networks (EASE), which contain two components. | Lijuan Wang; Guoli Jia; Ning Jiang; Haiying Wu; Jufeng Yang |
| 40 | UConNet: Unsupervised Controllable Network for Image and Video Deraining. Highlight: In this work, we propose the first Unsupervised Controllable Network (UConNet) to flexibly tackle different rain scenarios by adaptively controlling the network at the inference stage. | Junhao Zhuang; Yisi Luo; Xile Zhao; Taixiang Jiang; Bichuan Guo |
| 41 | Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition. Highlight: Inspired by the observation that humans learn to recognize texts through both reading and writing, we propose to learn discrimination and generation by integrating contrastive learning and masked image modeling in our self-supervised method. | Mingkun Yang; Minghui Liao; Pu Lu; Jing Wang; Shenggao Zhu; Hualin Luo; Qi Tian; Xiang Bai |
| 42 | Multi-Granular Semantic Mining for Weakly Supervised Semantic Segmentation. Highlight: In particular, we propose a heterogeneous graph neural network (Hgnn) to model the heterogeneity of multi-granular semantics within a set of input images. | Meijie Zhang; Jianwu Li; Tianfei Zhou |
| 43 | Towards Complex Document Understanding By Discrete Reasoning. Highlight: In this work, we introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages comprising semi-structured table(s) and unstructured text, as well as 16,558 question-answer pairs. | Fengbin Zhu; Wenqiang Lei; Fuli Feng; Chao Wang; Haozhou Zhang; Tat-Seng Chua |
| 44 | Augmented Dual-Contrastive Aggregation Learning for Unsupervised Visible-Infrared Person Re-Identification. Highlight: For unsupervised visible-infrared person re-identification (USL-VI-ReID), the large cross-modality discrepancies make it difficult to generate reliable cross-modality labels and learn modality-invariant features without any annotations. To address this problem, we propose a novel Augmented Dual-Contrastive Aggregation (ADCA) learning framework. | Bin Yang; Mang Ye; Jun Chen; Zesen Wu |
| 45 | Meta Reconciliation Normalization for Lifelong Person Re-Identification. Highlight: In this work, we aim to raise awareness of the importance of training proper batch normalization layers by proposing a new meta reconciliation normalization (MRN) method specifically designed for tackling LReID. | Nan Pu; Yu Liu; Wei Chen; Erwin M. Bakker; Michael S. Lew |
| 46 | Dynamic Spatio-Temporal Modular Network for Video Question Answering. Highlight: Besides, the performance of existing methods tends to drop when answering compositional questions involving realistic scenarios. To tackle these challenges, we propose a Dynamic Spatio-Temporal Modular Network (DSTN) model, which utilizes a spatio-temporal modular network to simulate the compositional reasoning procedure of human beings. | Zi Qian; Xin Wang; Xuguang Duan; Hong Chen; Wenwu Zhu |
| 47 | Equivariant and Invariant Grounding for Video Question Answering. Highlight: Taking a causal look at VideoQA, we devise a self-interpretable framework, Equivariant and Invariant Grounding for Interpretable VideoQA (EIGV). | Yicong Li; Xiang Wang; Junbin Xiao; Tat-Seng Chua |
| 48 | Magic ELF: Image Deraining Meets Association Learning and Transformer. Highlight: However, little effort has been made to effectively and efficiently harmonize these two architectures for image deraining. This paper aims to unify the two architectures to take advantage of their learning merits for image deraining. | Kui Jiang; Zhongyuan Wang; Chen Chen; Zheng Wang; Laizhong Cui; Chia-Wen Lin |
| 49 | VQ-DcTr: Vector-Quantized Autoencoder With Dual-channel Transformer Points Splitting for 3D Point Cloud Completion. Highlight: To address this challenge, we concentrate on discrete representations, which are potentially a more natural fit for the modalities of the point cloud. Therefore, we propose to employ a Vector Quantization (VQ) Auto-Encoder and Dual-channel Transformer for point cloud completion (VQ-DcTr). | Ben Fei; Weidong Yang; Wen-Ming Chen; Lipeng Ma |
| 50 | Multi-directional Knowledge Transfer for Few-Shot Learning. Highlight: Specifically, (1) we use two independent unidirectional knowledge self-transfer strategies to calibrate the distributions of the novel categories from base categories in the visual and the textual space. | Shuo Wang; Xinyu Zhang; Yanbin Hao; Chengbing Wang; Xiangnan He |
| 51 | Relation-enhanced Negative Sampling for Multimodal Knowledge Graph Completion. Highlight: To that end, in this paper we propose a MultiModal Relation-enhanced Negative Sampling (MMRNS) framework for the multimodal KGC task. | Derong Xu; Tong Xu; Shiwei Wu; Jingbo Zhou; Enhong Chen |
| 52 | Exploring The Effectiveness of Video Perceptual Representation in Blind Video Quality Assessment. Highlight: With the insight that distortion impairs perceived video quality and results in a curved trajectory of the perceptual representation, we propose a temporal perceptual quality index (TPQI) to measure temporal distortion by describing the graphic morphology of the representation. | Liang Liao; Kangmin Xu; Haoning Wu; Chaofeng Chen; Wenxiu Sun; Qiong Yan; Weisi Lin |
| 53 | Parameterization of Cross-token Relations with Relative Positional Encoding for Vision MLP. Highlight: However, the heavily parameterized token-mixing layers naturally lack mechanisms to capture local information and multi-granular non-local relations, so their discriminative power is restrained. To tackle this issue, we propose a new positional spatial gating unit (PoSGU). | Zhicai Wang; Yanbin Hao; Xingyu Gao; Hao Zhang; Shuo Wang; Tingting Mu; Xiangnan He |
| 54 | Hierarchical Hourglass Convolutional Network for Efficient Video Classification. Highlight: In particular, rigid temporal convolution fails to capture correct motions when a specific target moves out of the receptive field of the temporal convolution between adjacent frames. To tackle large visual displacements between temporal neighbors, we propose a new temporal convolution named Hourglass Convolution (HgC). | Yi Tan; Yanbin Hao; Hao Zhang; Shuo Wang; Xiangnan He |
| 55 | Event-guided Video Clip Generation from Blurry Images. Highlight: Event sequences offer high temporal resolution and high dynamic range, while intensity images easily suffer from motion blur due to the low frame rate of APS. In this paper, we present an end-to-end convolutional neural network based method under the local and global constraints of events to restore clear, sharp intensity frames through collaborative learning from a blurry image and its associated event streams. | Xin Ding; Tsuyoshi Takatani; Zhongyuan Wang; Ying Fu; Yinqiang Zheng |
| 56 | Point to Rectangle Matching for Image Text Retrieval. Highlight: We argue that such a deterministic point mapping is insufficient to represent a potential set of retrieval results for one-to-many correspondence, despite its noticeable progress. As a remedy, we propose a Point to Rectangle Matching (P2RM) mechanism, which is in essence a geometric representation learning method for image-text retrieval. | Zheng Wang; Zhenwei Gao; Xing Xu; Yadan Luo; Yang Yang; Heng Tao Shen |
| 57 | Reflecting on Experiences for Response Generation. Highlight: Multimodal dialogue systems have attracted much attention recently, but they are far from mastering skills such as: 1) automatically generating context-specific responses instead of safe but general ones; 2) naturally coordinating different information modalities (e.g., text and image) in responses; 3) intuitively explaining the reasons for generated responses and improving a specific response without re-training the whole model. To approach these goals, we propose a different angle on the task: Reflecting Experiences for Response Generation (RERG). | Chenchen Ye; Lizi Liao; Suyu Liu; Tat-Seng Chua |
| 58 | Ordered Attention for Coherent Visual Storytelling. Highlight: We address the problem of visual storytelling, i.e., generating a story for a given sequence of images. | Tom Braude; Idan Schwartz; Alex Schwing; Ariel Shamir |
| 59 | CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training. Highlight: However, most recent studies focus on cross-modal contrastive learning (CMCL), which promotes image-text alignment by pulling embeddings of positive sample pairs together while pushing those of negative pairs apart; this ignores the natural asymmetry between different modalities and requires a large-scale image-text corpus to make arduous progress. To mitigate this predicament, we propose CMAL, a Cross-Modal Associative Learning framework with anchor point detection and cross-modal associative learning for VLP. | Zhiyuan Ma; Jianjun Li; Guohui Li; Kaiyan Huang |
| 60 | Video Coding Using Learned Latent GAN Compression. Highlight: We propose in this paper a new paradigm for facial video compression. | Mustafa Shukor; Bharath Bhushan Damodaran; Xu Yao; Pierre Hellier |
| 61 | MVPTR: Multi-Level Semantic Alignment for Vision-Language Pre-Training Via Multi-Stage Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Meanwhile, multi-level alignments are inherently consistent and able to facilitate the representation learning synergistically. Therefore, in this paper, we propose to learn Multi-level semantic alignment for Vision-language Pre-TRaining (MVPTR). |
Zejun Li; Zhihao Fan; Huaixiao Tou; Jingjing Chen; Zhongyu Wei; Xuanjing Huang; |
| 62 | ARRA: Absolute-Relative Ranking Attack Against Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method termed Absolute-Relative Ranking Attack (ARRA) that considers a more practical attack scenario. |
Siyuan Li; Xing Xu; Zailei Zhou; Yang Yang; Guoqing Wang; Heng Tao Shen; |
| 63 | Robust Industrial UAV/UGV-Based Unsupervised Domain Adaptive Crack Recognitions with Depth and Edge Awareness: From System and Database Constructions to Real-Site Inspections Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a complete crack inspection system with three main components, including the autonomous system setup, the geographic-information-system-based 3D reconstruction, and the database construction as well as domain adaptive algorithms design. |
Kangcheng Liu; |
| 64 | Token Embeddings Alignment for Cross-Modal Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Token Embeddings AlignMent (TEAM) block, it first explicitly aligns visual tokens and textual tokens, then produces token-level matching scores to measure fine-grained similarity between input image and text. |
Chen-Wei Xie; Jianmin Wu; Yun Zheng; Pan Pan; Xian-Sheng Hua; |
| 65 | DHHN: Dual Hierarchical Hybrid Network for Weakly-Supervised Audio-Visual Video Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such a design has two defects: 1) The various semantic information hidden in temporal lengths is neglected, which may lead the models to learn incorrect information; 2) Due to the joint context modeling, the unique features of different modalities are not fully explored. In this paper, we propose a novel AVVP framework termed Dual Hierarchical Hybrid Network (DHHN) to tackle the above two problems. |
Xun Jiang; Xing Xu; Zhiguo Chen; Jingran Zhang; Jingkuan Song; Fumin Shen; Huimin Lu; Heng Tao Shen; |
| 66 | Learning to Retrieve Videos By Asking Questions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This can be suboptimal if the initial query has ambiguities, which would lead to many falsely retrieved videos. To overcome this limitation, we propose a novel framework for Video Retrieval using Dialog (ViReD), which enables the user to interact with an AI agent via multiple rounds of dialog. |
Avinash Madasu; Junier Oliva; Gedas Bertasius; |
| 67 | ParseMVS: Learning Primitive-aware Surface Representations for Sparse Multi-view Stereopsis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tackle the challenging MVS task from sparsely sampled views (up to an order of magnitude fewer images), which is more practical and cost-efficient in applications. |
Haiyang Ying; Jinzhi Zhang; Yuzhe Chen; Zheng Cao; Jing Xiao; Ruqi Huang; Lu Fang; |
| 68 | Unsupervised Video Hashing with Multi-granularity Contextualization and Multi-structure Preservation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Multi-granularity Contextualized and Multi-Structure preserved Hashing (MCMSH) method, exploring multiple axial contexts for discriminative video representation generation and various structural information for unsupervised learning simultaneously. |
Yanbin Hao; Jingru Duan; Hao Zhang; Bin Zhu; Pengyuan Zhou; Xiangnan He; |
| 69 | Understanding Political Polarization Via Jointly Modeling Users, Connections and Multimodal Contents on Heterogeneous Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we adopt a heterogeneous graph neural network to jointly model user characteristics, multimodal post contents as well as user-item relations in a bipartite graph to learn a comprehensive and effective user embedding without requiring ideology labels. |
Hanjia Lyu; Jiebo Luo; |
| 70 | You Only Align Once: Bidirectional Interaction for Spatial-Temporal Video Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an effective yet efficient recurrent network with bidirectional interaction for ST-VSR, where only one alignment and fusion is needed. |
Mengshun Hu; Kui Jiang; Zhixiang Nie; Zheng Wang; |
| 71 | Progressive Spatial-temporal Collaborative Network for Video Frame Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most video frame interpolation (VFI) algorithms infer the intermediate frame with the help of adjacent frames through cascaded motion estimation and content refinement. However, the intrinsic correlations between motion and content are barely investigated, commonly producing interpolated results with inconsistency and blurry contents. Specifically, we first discover a simple yet essential piece of domain knowledge that content and motion characteristics should be homogeneous to a certain degree for the same objects, and formulate this consistency into the loss function for model optimization. Based on this, we propose to learn the collaborative representation between motions and contents, and construct a novel Progressive Spatial-temporal Collaborative Network (Prost-Net) for video frame interpolation, developing a content-guided motion module (CGMM) and a motion-guided content module (MGCM) for individual content and motion representation. |
Mengshun Hu; Kui Jiang; Liang Liao; Zhixiang Nie; Jing Xiao; Zheng Wang; |
| 72 | End-to-End 3D Face Reconstruction with Expressions and Specular Albedos from Single In-the-wild Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a convolutional neural network based framework to regress the face model from a single image in the wild. |
Qixin Deng; Binh H. Le; Aobo Jin; Zhigang Deng; |
| 73 | Video Moment Retrieval with Hierarchical Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we introduce a hierarchical contrastive learning method to better align the query and video by maximizing the mutual information (MI) between query and three different granularities of video to learn informative representations. |
Bolin Zhang; Chao Yang; Bin Jiang; Xiaokang Zhou; |
| 74 | AVA-AVD: Audio-visual Speaker Diarization in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing audio-visual diarization datasets are mainly focused on indoor environments like meeting rooms or news studios, which are quite different from in-the-wild videos in many scenarios such as movies, documentaries, and audience sitcoms. To develop diarization methods for these challenging videos, we create the AVA Audio-Visual Diarization (AVA-AVD) dataset. |
Eric Zhongcong Xu; Zeyang Song; Satoshi Tsutsui; Chao Feng; Mang Ye; Mike Zheng Shou; |
| 75 | On Generating Identifiable Virtual Faces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formalize and tackle the problem of generating identifiable virtual face images. |
Zhuowen Yuan; Zhengxin You; Sheng Li; Zhenxing Qian; Xinpeng Zhang; Alex Kot; |
| 76 | Learning Granularity-Unified Representations for Text-to-Image Person Re-identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an end-to-end framework based on transformers to learn granularity-unified representations for both modalities, denoted as LGUR. |
Zhiyin Shao; Xinyu Zhang; Meng Fang; Zhifeng Lin; Jian Wang; Changxing Ding; |
| 77 | Self-Supervised Text Erasing with Controllable Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we study an unsupervised scenario by proposing a novel Self-supervised Text Erasing (STE) framework that jointly learns to synthesize training images with erasure ground-truth and accurately erase texts in the real world. |
Gangwei Jiang; Shiyao Wang; Tiezheng Ge; Yuning Jiang; Ying Wei; Defu Lian; |
| 78 | GT-MUST: Gated Try-on By Learning The Mannequin-Specific Transformation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, “shape-only transformation” ignores the local structures and results in unnatural distortions. To address this issue, we propose a Gated Try-on method by learning the ManneqUin-Specific Transformation (GT-MUST). |
Ning Wang; Jing Zhang; Lefei Zhang; Dacheng Tao; |
| 79 | Improving Meeting Inclusiveness Using Speech Interruption Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One challenge is that VRH is used in less than 1% of all meetings. In order to drive adoption of its usage to improve inclusiveness (and participation), we present a machine learning-based system that predicts when a meeting participant attempts to obtain the floor, but fails to interrupt (termed a ‘failed interruption’). |
Szu-Wei Fu; Yaran Fan; Yasaman Hosseinkashi; Jayant Gupchup; Ross Cutler; |
| 80 | DDGHM: Dual Dynamic Graph with Hybrid Metric Training for Cross-Domain Sequential Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the challenges, we propose DDGHM, a novel framework for the CDSR problem, which includes two main modules, i.e., dual dynamic graph modeling and hybrid metric training. |
Xiaolin Zheng; Jiajie Su; Weiming Liu; Chaochao Chen; |
| 81 | Efficient Multiple Kernel Clustering Via Spectral Perturbation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, based on the spectral perturbation theory, we propose an efficient MKC method that reduces the computational complexity from O(n³) to O(nk² + k³), with n and k denoting the number of data samples and the number of clusters, respectively. |
Chang Tang; Zhenglai Li; Weiqing Yan; Guanghui Yue; Wei Zhang; |
| 82 | Unsupervised and Pseudo-Supervised Vision-Language Alignment in Visual Dialog Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most current work either focuses on attention-based fusion or pre-training on large-scale image-text pairs, ignoring the critical role of explicit vision-language alignment in visual dialog. To remedy this defect, we propose a novel unsupervised and pseudo-supervised vision-language alignment approach for visual dialog (AlignVD). |
Feilong Chen; Duzhen Zhang; Xiuyi Chen; Jing Shi; Shuang Xu; Bo XU; |
| 83 | You Only Hypothesize Once: Point Cloud Registration with Rotation-equivariant Descriptors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel local descriptor-based framework, called You Only Hypothesize Once (YOHO), for the registration of two unaligned point clouds. |
Haiping Wang; Yuan Liu; Zhen Dong; Wenping Wang; |
| 84 | CRNet: Unsupervised Color Retention Network for Blind Motion Deblurring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there is a great chromatic aberration between the latent and original images, directly degrading the performance. In this paper, we therefore propose a novel unsupervised color retention network termed CRNet to perform blind motion deblurring. |
Suiyi Zhao; Zhao Zhang; Richang Hong; Mingliang Xu; Haijun Zhang; Meng Wang; Shuicheng Yan; |
| 85 | X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, this paper presents a novel multi-grained contrastive model, namely X-CLIP, for video-text retrieval. |
Yiwei Ma; Guohai Xu; Xiaoshuai Sun; Ming Yan; Ji Zhang; Rongrong Ji; |
| 86 | Sketch Transformer: Asymmetrical Disentanglement Learning from Dynamic Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel asymmetrical disentanglement and dynamic synthesis learning method in the transformer framework (SketchTrans) to handle modality discrepancy by combining modality-shared information with modality-specific information. |
Cuiqun Chen; Mang Ye; Meibin Qi; Bo Du; |
| 87 | Complementarity-Enhanced and Redundancy-Minimized Collaboration Network for Multi-agent Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this paper presents a complementarity-enhanced and redundancy-minimized collaboration network (CRCNet), for efficiently guiding and supervising the fusion among shared features. |
Guiyang Luo; Hui Zhang; Quan Yuan; Jinglin Li; |
| 88 | Cross-Domain and Cross-Modal Knowledge Distillation in Domain Adaptation for 3D Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel model (Dual-Cross) that integrates Cross-Domain Knowledge Distillation (CDKD) and Cross-Modal Knowledge Distillation (CMKD) to mitigate domain shift. |
Miaoyu Li; Yachao Zhang; Yuan Xie; Zuodong Gao; Cuihua Li; Zhizhong Zhang; Yanyun Qu; |
| 89 | MIntRec: A New Dataset for Multimodal Intent Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Most existing intent recognition methods have limitations in leveraging the multimodal information due to the restrictions of the benchmark datasets with only text information. This paper introduces a novel dataset for multimodal intent recognition (MIntRec) to address this issue. |
Hanlei Zhang; Hua Xu; Xin Wang; Qianrui Zhou; Shaojie Zhao; Jiayan Teng; |
| 90 | R-FEC: RL-based FEC Adjustment for Better QoE in WebRTC Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For instance, the higher FEC may enhance the tolerance to packet losses, but it may increase latency due to FEC processing overhead and hurt the video quality due to the additional bandwidth used for FEC. To address this issue, we propose R-FEC which is a reinforcement learning (RL) based framework for video and FEC bitrate decisions in video conferencing. |
Insoo Lee; Seyeon Kim; Sandesh Sathyanarayana; Kyungmin Bin; Song Chong; Kyunghan Lee; Dirk Grunwald; Sangtae Ha; |
| 91 | Depth-inspired Label Mining for Unsupervised RGB-D Salient Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Compared with the well-studied unsupervised SOD in the RGB domain, deep unsupervised RGB-D SOD is a less explored direction in the literature. In this paper, we propose to tackle this task by introducing a novel systemic design for high-quality pseudo-label mining. |
Teng Yang; Yue Wang; Lu Zhang; Jinqing Qi; Huchuan Lu; |
| 92 | VMRF: View Matching Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We design VMRF, an innovative view matching NeRF that enables effective NeRF training without requiring prior knowledge in camera poses or camera pose distributions. |
Jiahui Zhang; Fangneng Zhan; Rongliang Wu; Yingchen Yu; Wenqing Zhang; Bai Song; Xiaoqin Zhang; Shijian Lu; |
| 93 | FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This causes cumbersome deployment and degradation of speech quality due to error propagation; 2) The audio reconstruction algorithm used by these models limits the inference speed and audio quality, while neural vocoders are not available for these models since their output spectrograms are not accurate enough; 3) The autoregressive model suffers from high inference latency, while the flow-based model has high memory occupancy: neither of them is efficient enough in both time and memory usage. To tackle these problems, we propose FastLTS, a non-autoregressive end-to-end model which can directly synthesize high-quality speech audios from unconstrained talking videos with low latency, and has a relatively small model size. |
Yongqi Wang; Zhou Zhao; |
| 94 | Look Less Think More: Rethinking Compositional Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Beyond that, we propose a simple yet effective framework, Look Less Think More (LLTM), to reduce the strong association between visual objects and action-level labels (Look Less), and then discover the commonsense relationships between object categories and human actions (Think More). |
Rui Yan; Peng Huang; Xiangbo Shu; Junhao Zhang; Yonghua Pan; Jinhui Tang; |
| 95 | Co-Completion for Occluded Facial Expression Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an intuitive and simplified workflow, Co-Completion, which combines occlusion discarding and feature completion together to reduce the impact of occlusions on facial expression recognition. |
Zhen Xing; Weimin Tan; Ruian He; Yangle Lin; Bo Yan; |
| 96 | Multi-Level Region Matching for Fine-Grained Sketch-Based Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, we argue that matching different regions between each sketch-image pair can further boost model robustness. Therefore, we propose Multi-Level Region Matching (MLRM) for FG-SBIR, which consists of two modules: a Discriminative Region Extraction module (DRE) and a Region and Level Attention module (RLA). |
Zhixin Ling; Zhen Xing; Jiangtong Li; Li Niu; |
| 97 | Enhancement By Your Aesthetic: An Intelligible Unsupervised Personalized Enhancer for Low-Light Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an intelligible unsupervised personalized enhancer (iUP-Enhancer) for low-light images, which establishes the correlations between the low-light and the unpaired reference images with regard to three user-friendly attributions (brightness, chromaticity, and noise). |
Naishan Zheng; Jie Huang; Qi Zhu; Man Zhou; Feng Zhao; Zheng-Jun Zha; |
| 98 | CoHOZ: Contrastive Multimodal Prompt Tuning for Hierarchical Open-set Zero-shot Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The existing literature mostly addresses these two problems separately. In this paper, we take on the ambition of solving the combination of these two problems, semantically recognizing the unknown classes detected in OSR via zero-shot prediction. |
Ning Liao; Yifeng Liu; Li Xiaobo; Chenyi Lei; Guoxin Wang; Xian-Sheng Hua; Junchi Yan; |
| 99 | Global Meets Local: Effective Multi-Label Image Classification Via Category-Aware Weak Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose category-aware weak supervision to concentrate on non-existent categories so as to provide deterministic information for local feature learning, restricting the local branch to focus on more high-quality regions of interest. |
Jiawei Zhan; Jun Liu; Wei Tang; Guannan Jiang; Xi Wang; Bin-Bin Gao; Tianliang Zhang; Wenlong Wu; Wei Zhang; Chengjie Wang; Yuan Xie; |
| 100 | Dual Contrastive Learning for Spatio-temporal Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Simply regarding them as positive pairs will draw the model to the static background rather than the motion pattern. To tackle this challenge, this paper presents a novel dual contrastive formulation. |
Shuangrui Ding; Rui Qian; Hongkai Xiong; |
| 101 | Heterogeneous Learning for Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to explicitly emphasize the heterogeneity in SGG, we propose a plug-and-play Heterogeneous Learning Branch (HLB), which enhances the independent representation capability of relation features. |
Yunqing He; Tongwei Ren; Jinhui Tang; Gangshan Wu; |
| 102 | MAPLE: Masked Pseudo-Labeling AutoEncoder for Semi-supervised Point Cloud Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Masked Pseudo-Labeling autoEncoder (MAPLE) framework to learn effective representations with much fewer annotations for point cloud action recognition. |
Xiaodong Chen; Wu Liu; Xinchen Liu; Yongdong Zhang; Jungong Han; Tao Mei; |
| 103 | Time and Memory Efficient Large-Scale Canonical Correlation Analysis in Fourier Domain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the high complexity involved in pursuing eigenvector lays a heavy burden on the memory and computational time, making CCA nearly impractical in large-scale cases. In this paper, we attempt to overcome this issue by representing the data in the Fourier domain. |
Xiang-Jun Shen; Zhaorui Xu; Liangjun Wang; Zechao Li; |
| 104 | RONF: Reliable Outlier Synthesis Under Noisy Feature Space for Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Reliable Outlier synthesis under Noisy Feature space (RONF), which synthesizes reliable virtual outliers in noisy feature space to provide supervision signals for model regularization. |
Rundong He; Zhongyi Han; Xiankai Lu; Yilong Yin; |
| 105 | No-Reference Image Quality Assessment Using Dynamic Complex-Valued Neural Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims at improving the power of CNNs for NR-IQA in two aspects. |
Zihan Zhou; Yong Xu; Ruotao Xu; Yuhui Quan; |
| 106 | Confederated Learning: Going Beyond Centralization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider a generalized paradigm: different roles are granted multiple permissions to complete their corresponding jobs, called Confederated Learning. |
Zitai Wang; Qianqian Xu; Ke Ma; Xiaochun Cao; Qingming Huang; |
| 107 | Correspondence Matters for Video Referring Expression Comprehension Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel Dual Correspondence Network (dubbed as DCNet) which explicitly enhances the dense associations in both the inter-frame and cross-modal manners. |
Meng Cao; Ji Jiang; Long Chen; Yuexian Zou; |
| 108 | Keyword Spotting in The Homomorphic Encrypted Domain Using Deep Complex-Valued CNN Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a non-interactive scheme to achieve end-to-end keyword spotting in the homomorphic encrypted domain using deep learning techniques. |
Peijia Zheng; Zhiwei Cai; Huicong Zeng; Jiwu Huang; |
| 109 | DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We thereby propose a novel Dynamical Semantic Evolution GAN (DSE-GAN) to re-compose each stage’s text features under a novel single adversarial multi-stage architecture. |
Mengqi Huang; Zhendong Mao; Penghui Wang; Quan Wang; Yongdong Zhang; |
| 110 | ELMformer: Efficient Raw Image Restoration with A Locally Multiplicative Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to get raw images of high quality for downstream Image Signal Process (ISP), in this paper we present an Efficient Locally Multiplicative Transformer called ELMformer for raw image restoration. |
Jiaqi Ma; Shengyuan Yan; Lefei Zhang; Guoli Wang; Qian Zhang; |
| 111 | Towards All Weather and Unobstructed Multi-Spectral Image Stitching: Algorithm and Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To remedy the deficient imaging of optical sensors, we investigate the complementarity across infrared and visible images to improve the perception of scenes in terms of visual information and viewing ranges. |
Zhiying Jiang; Zengxi Zhang; Xin Fan; Risheng Liu; |
| 112 | Mutual Adaptive Reasoning for Monocular 3D Multi-Person Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing bottom-up methods treat camera-centric 3D human pose estimation as two unrelated subtasks: 2.5D pose estimation and camera-centric depth estimation. In this paper, we propose a unified model that leverages the mutual benefits of both these subtasks. |
Juze Zhang; Jingya Wang; Ye Shi; Fei Gao; Lan Xu; Jingyi Yu; |
| 113 | MM-ALT: A Multimodal Automatic Lyric Transcription System Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, ALT with audio data alone is a notoriously difficult task due to instrumental accompaniment and musical constraints resulting in degradation of both the phonetic cues and the intelligibility of sung lyrics. To tackle this challenge, we propose the MultiModal Automatic Lyric Transcription system (MM-ALT), together with a new dataset, N20EM, which consists of audio recordings, videos of lip movements, and inertial measurement unit (IMU) data of an earbud worn by the performing singer. |
Xiangming Gu; Longshen Ou; Danielle Ong; Ye Wang; |
| 114 | Prompt-based Zero-shot Video Moment Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To avoid the lack of visual features, we propose a Prompt-based Zero-shot Video Moment Retrieval (PZVMR) method. |
Guolong Wang; Xun Wu; Zhaoyuan Liu; Junchi Yan; |
| 115 | Towards Adversarial Attack on Vision-Language Pre-training Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper studied the adversarial attack on popular VLP models and V+L tasks. |
Jiaming Zhang; Qi Yi; Jitao Sang; |
| 116 | Self-Supervised Multi-view Stereo Via Adjacent Geometry Guided Volume Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we observe that adjacent geometry shares certain commonality that can help to infer the correct geometry of the challenging or low-confident regions. |
Luoyuan Xu; Tao Guan; Yuesong Wang; Yawei Luo; Zhuo Chen; Wenkai Liu; Wei Yang; |
| 117 | Prompting for Multi-Modal Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, multi-modal tracking still severely suffers from data deficiency, thus resulting in the insufficient learning of fusion modules. Instead of building such a fusion module, in this paper, we provide a new perspective on multi-modal tracking by attaching importance to the multi-modal visual prompts. |
Jinyu Yang; Zhe Li; Feng Zheng; Ales Leonardis; Jingkuan Song; |
| 118 | The More, The Better? Active Silencing of Non-Positive Transfer for Efficient Multi-Domain Few-Shot Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, with extensive empirical evidence, we find more is not always better: current models do not necessarily benefit from pre-training on more base classes and domains, since the pre-trained knowledge might be non-positive for a downstream task. In this work, we hypothesize that such redundant pre-training can be avoided without compromising the downstream performance. |
Xingxing Zhang; Zhizhe Liu; Weikai Yang; Liyuan Wang; Jun Zhu; |
| 119 | Free-Lunch for Cross-Domain Few-Shot Learning: Style-Aware Episodic Training with Robust Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Compared to the well-studied few-shot learning problem, the difficulty for CDFSL lies in that the available training data from test tasks is not only extremely limited but also presents severe class differences from training tasks. To tackle this challenge, we propose Style-aware Episodic Training with Robust Contrastive Learning (SET-RCL), which is motivated by the key observation that a remarkable style-shift between tasks from source and target domains plays a negative role in cross-domain generalization. |
Ji Zhang; Jingkuan Song; Lianli Gao; Hengtao Shen; |
| 120 | Simple Self-supervised Multiplex Graph Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple method to achieve efficient and effective SMGRL. |
Yujie Mo; Yuhuan Chen; Liang Peng; Xiaoshuang Shi; Xiaofeng Zhu; |
| 121 | Counterfactually Measuring and Eliminating Social Bias in Vision-Language Pre-training Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we (1) introduce a counterfactual-based bias measurement CounterBias to quantify the social bias in VLP models by comparing the [MASK]ed prediction probabilities of factual and counterfactual samples; (2) construct a novel VL-Bias dataset including 24K image-text pairs for measuring gender bias in VLP models, from which we observed that significant gender bias is prevalent in VLP models; and (3) propose a VLP debiasing method FairVLP to minimize the difference in the [MASK]ed prediction probabilities between factual and counterfactual image-text pairs for VLP debiasing. |
Yi Zhang; Junyang Wang; Jitao Sang; |
| 122 | Invariant Representation Learning for Multimedia Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For example, suppose a user bought two dresses on the same model, this co-occurrence would produce a correlation between the model and purchases, but the correlation is spurious from the view of fashion recommendation. Existing work alleviates this issue by customizing preference-aware representations, requiring high-cost analysis and design. In this paper, we propose an Invariant Representation Learning Framework (InvRL) to alleviate the impact of the spurious correlations. |
Xiaoyu Du; Zike Wu; Fuli Feng; Xiangnan He; Jinhui Tang; |
| 123 | DuetFace: Collaborative Privacy-Preserving Face Recognition Via Channel Splitting in The Frequency Domain Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents DuetFace, a novel privacy-preserving face recognition method that employs collaborative inference in the frequency domain. |
Yuxi Mi; Yuge Huang; Jiazhen Ji; Hongquan Liu; Xingkun Xu; Shouhong Ding; Shuigeng Zhou; |
| 124 | Skeleton-based Action Recognition Via Adaptive Cross-Form Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these methods (either adapting structure of GCNs or model ensemble) require the co-existence of all skeleton forms during both training and inference stages, while a typical situation in real life is the existence of only partial forms for inference. To tackle this, we present Adaptive Cross-Form Learning (ACFL), which empowers well-designed GCNs to generate complementary representation from single-form skeletons without changing model capacity. |
Xuanhan Wang; Yan Dai; Lianli Gao; Jingkuan Song; |
| 125 | Unsupervised Domain Adaptation Integrating Transformer and Mutual Information for Cross-Corpus Speech Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper focuses on an interesting task, i.e., unsupervised cross-corpus Speech Emotion Recognition (SER), in which the labelled training (source) corpus and the unlabelled testing (target) corpus have different feature distributions, resulting in the discrepancy between the source and target domains. To address this issue, this paper proposes an unsupervised domain adaptation method integrating Transformers and Mutual Information (MI) for cross-corpus SER. |
Shiqing Zhang; Ruixin Liu; Yijiao Yang; Xiaoming Zhao; Jun Yu; |
| 126 | TGDM: Target Guided Dynamic Mixup for Cross-Domain Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To help knowledge transfer, this paper introduces an intermediate domain generated by mixing images in the source and the target domain. |
Linhai Zhuo; Yuqian Fu; Jingjing Chen; Yixin Cao; Yu-Gang Jiang; |
| 127 | Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a new task, named Temporal Emotion Localization in videos (TEL), which aims to detect human emotions and localize their corresponding temporal boundaries in untrimmed videos with aligned subtitles. |
Juncheng Li; Junlin Xie; Linchao Zhu; Long Qian; Siliang Tang; Wenqiao Zhang; Haochen Shi; Shengyu Zhang; Longhui Wei; Qi Tian; Yueting Zhuang; |
| 128 | Bi-directional Heterogeneous Graph Hashing Towards Efficient Outfit Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, the majority of existing methods mainly focus on improving the recommendation effectiveness, while overlooking the recommendation efficiency. Inspired by this, we devise a novel bi-directional heterogeneous graph hashing scheme, called BiHGH, towards efficient personalized outfit recommendation. |
Weili Guan; Xuemeng Song; Haoyu Zhang; Meng Liu; Chung-Hsing Yeh; Xiaojun Chang; |
| 129 | Cycle Encoding of A StyleGAN Encoder for Improved Reconstruction and Editability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a simple yet effective method, named cycle encoding, for a high-quality pivot code. |
Xudong Mao; Liujuan Cao; Aurele Tohokantche Gnanha; Zhenguo Yang; Qing Li; Rongrong Ji; |
| 130 | PreyNet: Preying on Camouflaged Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we strive to seek answers for accurate COD and propose PreyNet, which mimics the two processes of predation, namely, initial detection (sensory mechanism) and predator learning (cognitive mechanism). |
Miao Zhang; Shuang Xu; Yongri Piao; Dongxiang Shi; Shusen Lin; Huchuan Lu; |
| 131 | A Lightweight Graph Transformer Network for Human Mesh Reconstruction from 2D Human Pose Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present GTRS, a lightweight pose-based method that can reconstruct human mesh from 2D human pose. |
Ce Zheng; Matias Mendieta; Pu Wang; Aidong Lu; Chen Chen; |
| 132 | Temporal Sentiment Localization: Listen and Look in Untrimmed Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing works simply classify a video into a single sentimental category, ignoring the fact that sentiment in untrimmed videos may appear in multiple segments with varying lengths and unknown locations. To address this, we propose a challenging task, i.e., Temporal Sentiment Localization (TSL), to find which parts of the video convey sentiment. |
Zhicheng Zhang; Jufeng Yang; |
| 133 | Dynamic Graph Reasoning for Multi-person 3D Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose GR-M3D, which models the Multi-person 3D pose estimation with dynamic Graph Reasoning. |
Zhongwei Qiu; Qiansheng Yang; Jian Wang; Dongmei Fu; |
| 134 | IVT: An End-to-End Instance-guided Video Transformer for 3D Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we simplify the paradigm into an end-to-end framework, Instance-guided Video Transformer (IVT), which enables learning spatiotemporal contextual depth information from visual features effectively and predicts 3D poses directly from video frames. |
Zhongwei Qiu; Qiansheng Yang; Jian Wang; Dongmei Fu; |
| 135 | Fine-tuning with Multi-modal Entity Prompts for News Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome the problem, we investigate a concise and flexible paradigm to achieve global entity awareness by introducing a prompting mechanism with fine-tuning of pre-trained models, named Fine-tuning with Multi-modal Entity Prompts for News Image Captioning (NewsMEP). |
Jingjing Zhang; Shancheng Fang; Zhendong Mao; Zhiwei Zhang; Yongdong Zhang; |
| 136 | Global-Local Cross-View Fisher Discrimination for View-Invariant Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: View change brings a significant challenge to action representation and recognition due to pose occlusion and deformation. We propose a Global-Local Cross-View Fisher Discrimination (GL-CVFD) algorithm to tackle this problem. |
Lingling Gao; Yanli Ji; Yang Yang; HengTao Shen; |
| 137 | Learning A Dynamic Cross-Modal Network for Multispectral Pedestrian Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Further, we argue that a local tight correspondence across modalities is desired for multi-modal feature aggregation. To address these issues, we introduce a multispectral pedestrian detection framework that comprises a novel dynamic cross-modal network (DCMNet), which strives to adaptively utilize the local and non-local complementary information between multi-modal features. |
Jin Xie; Rao Muhammad Anwer; Hisham Cholakkal; Jing Nie; Jiale Cao; Jorma Laaksonen; Fahad Shahbaz Khan; |
| 138 | A Parameter-free Multi-view Information Bottleneck Clustering Method By Cross-view Weighting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel parameter-free multi-view information bottleneck (PMIB) clustering method to automatically identify and exploit useful complementary information among views, thus reducing the negative impact from the harmful views. |
Shizhe Hu; Ruilin Geng; Zhaoxu Cheng; Chaoyang Zhang; Guoliang Zou; Zhengzheng Lou; Yangdong Ye; |
| 139 | Adaptive Hypergraph Convolutional Network for No-Reference 360-degree Image Quality Assessment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Third, in the graph construction, they only consider the spatial location of the viewport, ignoring its content characteristics. Accordingly, to address these issues, we propose an adaptive hypergraph convolutional network for NR 360IQA, denoted as AHGCN. |
Jun Fu; Chen Hou; Wei Zhou; Jiahua Xu; Zhibo Chen; |
| 140 | RKformer: Runge-Kutta Transformer with Random-Connection Attention for Infrared Small Target Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to the small scale of targets as well as noise and clutter in the background, current deep neural network-based methods struggle in extracting features with discriminative semantics while preserving fine details. In this paper, we address this problem by proposing a novel RKformer model with an encoder-decoder structure, where four specifically designed Runge-Kutta transformer (RKT) blocks are stacked sequentially in the encoder. |
Mingjin Zhang; Haichen Bai; Jing Zhang; Rui Zhang; Chaoyue Wang; Jie Guo; Xinbo Gao; |
| 141 | Exploring Feature Compensation and Cross-level Correlation for Infrared Small Target Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, SIRST detection is challenging due to the low-contrast between small targets and noisy background in infrared images. To address this challenge, we propose a novel FC3-Net by exploring feature compensation and cross-level correlation for SIRST detection. |
Mingjin Zhang; Ke Yue; Jing Zhang; Yunsong Li; Xinbo Gao; |
| 142 | MAVT-FG: Multimodal Audio-Visual Transformer for Weakly-supervised Fine-Grained Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Multimodal Audio-Visual Transformer for Weakly-supervised Fine-Grained Recognition (MAVT-FG) model which incorporates audio-visual modalities. |
Xiaoyu Zhou; Xiaotong Song; Hao Wu; Jingran Zhang; Xing Xu; |
| 143 | MMT: Image-guided Story Ending Generation with Multimodal Memory Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To date, existing methods for IgSEG ignore the relationships between the multimodal information and do not integrate multimodal features appropriately. Therefore, in this work, we propose Multimodal Memory Transformer (MMT), an end-to-end framework that models and fuses both contextual and visual information to effectively capture the multimodal dependency for IgSEG. |
Dizhan Xue; Shengsheng Qian; Quan Fang; Changsheng Xu; |
| 144 | Gaze- and Spacing-flow Unveil Intentions: Hidden Follower Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Fortunately, from a socio-cognitive perspective, we found and verified the phenomenon that the gaze-flow pattern and the spacing-flow pattern between hidden and normal followers are different. |
Danni Xu; Ruimin Hu; Zheng Wang; Linbo Luo; Dengshi Li; Wenjun Zeng; |
| 145 | KnifeCut: Refining Thin Part Segmentation with Cutting Lines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To refine the thin parts for unsatisfactory pre-segmentation, we propose an efficient interaction mode, where users only need to draw a line across the mislabeled thin part like cutting with a knife. |
Zheng Lin; Zheng-Peng Duan; Zhao Zhang; Chun-Le Guo; Ming-Ming Cheng; |
| 146 | You Can Even Annotate Text with Voice: Transcription-only-Supervised Text Spotting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a query-based paradigm to learn implicit location features via the interaction of text queries and image embeddings. |
Jingqun Tang; Su Qiao; Benlei Cui; Yuhang Ma; Sheng Zhang; Dimitrios Kanoulas; |
| 147 | Unbiased Directed Object Attention Graph for Object Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, in this paper, we propose a directed object attention (DOA) graph to guide the agent in explicitly learning the attention relationships between objects, thereby reducing the object attention bias. |
Ronghao Dang; Zhuofan Shi; Liuyi Wang; Zongtao He; Chengju Liu; Qijun Chen; |
| 148 | Few-shot Open-set Recognition Using Background As Unknowns Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to solve the problem from two novel aspects. |
Nan Song; Chi Zhang; Guosheng Lin; |
| 149 | Towards Open-Ended Text-to-Face Generation, Combination and Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a generative network for open-ended text-to-face generation, which is termed OpenFaceGAN. |
Jun Peng; Han Pan; Yiyi Zhou; Jing He; Xiaoshuai Sun; Yan Wang; Yongjian Wu; Rongrong Ji; |
| 150 | Learning Dynamic Prior Knowledge for Text-to-Face Pixel Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We define T2F as a pixel synthesis problem conditioned on the texts and propose a novel dynamic pixel synthesis network, PixelFace, for end-to-end T2F generation in this paper. |
Jun Peng; Xiaoxiong Du; Yiyi Zhou; Jing He; Yunhang Shen; Xiaoshuai Sun; Rongrong Ji; |
| 151 | Gait Recognition in The Wild with Multi-hop Temporal Switch Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current methods that obtain state-of-the-art performance on in-the-lab benchmarks achieve much worse accuracy on the recently proposed in-the-wild datasets because these methods can hardly model the varied temporal dynamics of gait sequences in unconstrained scenes. Therefore, this paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes. |
Jinkai Zheng; Xinchen Liu; Xiaoyan Gu; Yaoqi Sun; Chuang Gan; Jiyong Zhang; Wu Liu; Chenggang Yan; |
| 152 | Difference Residual Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The analytic solution to this objective function is determined by both graph topology and node attributes, which theoretically proves that DRC can prevent the over-smoothing issue. |
Liang Yang; Weihang Peng; Wenmiao Zhou; Bingxin Niu; Junhua Gu; Chuan Wang; Yuanfang Guo; Dongxiao He; Xiaochun Cao; |
| 153 | Wavelet-enhanced Weakly Supervised Local Feature Learning for Face Forgery Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there are still non-negligible problems: a) local feature learning requires patch-level labels to circumvent label noise, which is not practical in real-world scenarios; b) the commonly used DCT (FFT) transform loses all spatial information, which brings difficulty in handling local details. To compensate for such limitations, a novel wavelet-enhanced weakly supervised local feature learning framework is proposed in this paper. |
Jiaming Li; Hongtao Xie; Lingyun Yu; Yongdong Zhang; |
| 154 | Relative Alignment Network for Source-Free Multimodal Video Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing source-free domain adaptation methods cannot be directly applied to this task, since videos always suffer from domain discrepancy along both the multimodal and temporal aspects, which brings difficulties in domain adaptation especially when the source data are unavailable. In this paper, we propose a Multimodal and Temporal Relative Alignment Network (MTRAN) to deal with the above challenges. |
Yi Huang; Xiaoshan Yang; Ji Zhang; Changsheng Xu; |
| 155 | Proxy Probing Decoder for Weakly Supervised Object Localization: A Baseline Investigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such fine-tuning scheme would cause the model to degrade, e.g. affect the classification performance and generalization capabilities of the pre-trained model. In this paper, we propose a novel method named Proxy Probing Decoder (PPD) to meet these challenges, which utilizes the segmentation property of self-attention map in the self-supervised vision transformer and breaks through model fine-tuning with a novel proxy probing decoder. |
Jingyuan Xu; Hongtao Xie; Chuanbin Liu; Yongdong Zhang; |
| 156 | JPEG Compression-aware Image Forgery Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods suffer a severe performance drop when the forged images are JPEG compressed, which is widely applied in social media transmission. To tackle this issue, we propose a wavelet-based compression representation learning scheme for the specific JPEG-resistant image forgery localization. |
Menglu Wang; Xueyang Fu; Jiawei Liu; Zheng-Jun Zha; |
| 157 | Micro-video Tagging Via Jointly Modeling Social Influence and Tag Relation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on annotating micro-videos with tags. |
Xiao Wang; Tian Gan; Yinwei Wei; Jianlong Wu; Dai Meng; Liqiang Nie; |
| 158 | Detach and Attach: Stylized Image Captioning Without Paired Stylized Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the problems above, we propose a two-step method based on Transformer: firstly detach style representations from large-scaled stylized text-only corpus to provide more holistic style supervision, and secondly attach the style representations to image content to generate stylized captions. |
Yutong Tan; Zheng Lin; Peng Fu; Mingyu Zheng; Lanrui Wang; Yanan Cao; Weipinng Wang; |
| 159 | Counterfactual Reasoning for Out-of-distribution Multimodal Sentiment Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: From the graph, we find that the spurious correlations are attributed to the direct effect of textual modality on the model prediction while the indirect one is more reliable by considering multimodal semantics. Inspired by this, we devise a model-agnostic counterfactual framework for multimodal sentiment analysis, which captures the direct effect of textual modality via an extra text model and estimates the indirect one by a multimodal model. |
Teng Sun; Wenjie Wang; Liqaing Jing; Yiran Cui; Xuemeng Song; Liqiang Nie; |
| 160 | CycleHand: Increasing 3D Pose Estimation Ability on In-the-wild Monocular Image Through Cyclic Flow Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current methods for 3D hand pose estimation fail to generalize well to in-the-wild new scenarios due to varying camera viewpoints, self-occlusions, and complex environments. To address this problem, we propose CycleHand to improve the generalization ability of the model in a self-supervised manner. |
Daiheng Gao; Xindi Zhang; Xingyu Chen; Andong Tan; Bang Zhang; Pan Pan; Ping Tan; |
| 161 | Counterexample Contrastive Learning for Spurious Correlation Elimination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that there actually exist valuable samples within the original dataset which have the potential to help the model circumvent spurious correlations. |
Jinqiang Wang; Rui Hu; Chaoquan Jiang; Rui Hu; Jitao Sang; |
| 162 | Attribute-guided Dynamic Routing Graph Network for Transductive Few-shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing attribute-based methods usually treat the importance of different attributes as equals to conclude the sample relations, which cannot distinguish the classes with many similar attributes well. In order to address this problem, we propose an Attribute-guided Dynamic Routing Graph Network (ADRGN) to explicitly learn task-dependent attribute importance scores to help explore the sample relations in a fine-grained manner for adaptive graph-based inference. |
Chaofan Chen; Xiaoshan Yang; Ming Yan; Changsheng Xu; |
| 163 | SPTS: Single-Point Text Spotting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an end-to-end scene text spotting method that tackles scene text spotting as a sequence prediction task. |
Dezhi Peng; Xinyu Wang; Yuliang Liu; Jiaxin Zhang; Mingxin Huang; Songxuan Lai; Jing Li; Shenggao Zhu; Dahua Lin; Chunhua Shen; Xiang Bai; Lianwen Jin; |
| 164 | Rethinking The Metric in Few-shot Learning: From An Adaptive Multi-Distance Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, for the first time, we investigate the contributions of different distance metrics, and propose an adaptive fusion scheme, bringing significant improvements in few-shot classification. |
Jinxiang Lai; Siqian Yang; Guannan Jiang; Xi Wang; Yuxi Li; Zihui Jia; Xiaochen Chen; Jun Liu; Bin-Bin Gao; Wei Zhang; Yuan Xie; Chengjie Wang; |
| 165 | A Knowledge Augmented and Multimodal-Based Framework for Video Summarization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a knowledge augmented and multimodal-based video summarization method, termed KAMV, to address the problem above. |
Jiehang Xie; Xuanbai Chen; Shao-Ping Lu; Yulu Yang; |
| 166 | BadHash: Invisible Backdoor Attacks Against Deep Hashing with Clean Label Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In order to improve the attack effectiveness, we introduce a label-based contrastive learning network LabCLN to exploit the semantic characteristics of different labels, which are subsequently used for confusing and misleading the target model to learn the embedded trigger. |
Shengshan Hu; Ziqi Zhou; Yechao Zhang; Leo Yu Zhang; Yifeng Zheng; Yuanyuan He; Hai Jin; |
| 167 | Searching Lightweight Neural Network for Image Signal Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most of these networks require heavy computation burden and thus are far from sufficient to be deployed on resource-limited platforms, including but not limited to mobile devices and FPGA. To tackle this challenge, we propose an automated search framework that derives ISP models with high image quality while satisfying the low-computation requirement. |
Haojia Lin; Lijiang Li; Xiawu Zheng; Fei Chao; Rongrong Ji; |
| 168 | Paired Cross-Modal Data Augmentation for Fine-Grained Image-to-Text Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates an open research problem of generating text-image pairs to improve the training of fine-grained image-to-text cross-modal retrieval task, and proposes a novel framework for paired data augmentation by uncovering the hidden semantic information of StyleGAN2 model. |
Hao Wang; Guosheng Lin; Steven Hoi; Chunyan Miao; |
| 169 | Learned Internet Congestion Control for Short Video Uploading Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present DuGu, a novel learning-based CC algorithm designed by considering the unique properties of video uploading via the probing phase and internet networking via the control phase. |
Tianchi Huang; Chao Zhou; Lianchen Jia; Rui-Xiao Zhang; Lifeng Sun; |
| 170 | Clustering Generative Adversarial Networks for Story Visualization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to build a concise and single-GAN-based network, neither depending on additional semantic information nor captioning networks. |
Bowen Li; Philip H. S. Torr; Thomas Lukasiewicz; |
| 171 | Alleviating Style Sensitivity Then Adapting: Source-free Domain Adaptation for Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel style-insensitive source-free domain adaptation framework (SI-SFDA) for medical image segmentation to reduce the impacts of style shifts. |
Yalan Ye; Ziqi Liu; Yangwuyong Zhang; Jingjing Li; Hengtao Shen; |
| 172 | Learning Visible Surface Area Estimation for Irregular Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel problem setting, deep learning for visible surface area estimation, which is the first trial to estimate the visible surface area for irregular objects from monocular images. |
Xu Liu; Jianing Li; Xianqi Zhang; Jingyuan Sun; Xiaopeng Fan; Yonghong Tian; |
| 173 | IDEA: Increasing Text Diversity Via Online Multi-Label Recognition for Vision-Language Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the observation that the texts incorporate incomplete fine-grained image information, we introduce IDEA, which stands for increasing text diversity via online multi-label recognition for VLP. |
Xinyu Huang; Youcai Zhang; Ying Cheng; Weiwei Tian; Ruiwei Zhao; Rui Feng; Yuejie Zhang; Yaqian Li; Yandong Guo; Xiaobo Zhang; |
| 174 | Hierarchical Few-Shot Object Detection: Problem, Benchmark and Method Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose and solve a new problem called hierarchical few-shot object detection (Hi-FSOD), which aims to detect objects with hierarchical categories in the FSOD paradigm. |
Lu Zhang; Yang Wang; Jiaogen Zhou; Chenbo Zhang; Yinglu Zhang; Jihong Guan; Yatao Bian; Shuigeng Zhou; |
| 175 | Zero-shot Video Classification with Appropriate Web and Task Knowledge Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, the category-object relationships are usually extracted from commonsense knowledge or word embedding, which is not consistent with video modality. To tackle these issues, we propose to mine associated objects and category-object relationships for each category from retrieved web images. |
Junbao Zhuo; Yan Zhu; Shuhao Cui; Shuhui Wang; Bin Ma; Qingming Huang; Xiaoming Wei; Xiaolin Wei;
| 176 | PIMoG: An Effective Screen-shooting Noise-Layer Simulation for Deep-Learning-Based Watermarking Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In order to design an effective noise layer for screen-shooting robustness, we offer a new insight in this paper: it is not necessary to quantitatively simulate the overall screen-shooting procedure in the noise layer; including only the most influential distortions is enough to generate an effective noise layer with strong robustness. To verify this insight, we propose a screen-shooting noise layer dubbed PIMoG. |
Han Fang; Zhaoyang Jia; Zehua Ma; Ee-Chien Chang; Weiming Zhang; |
| 177 | Boosting Video-Text Retrieval with Explicit High-Level Semantics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods typically employ completely heterogeneous visual-textual information to align video and text, whilst lacking the awareness of homogeneous high-level semantic information residing in both modalities. To fill this gap, in this work, we propose a novel visual-linguistic aligning model named HiSE for VTR, which improves the cross-modal representation by incorporating explicit high-level semantics. |
Haoran Wang; Di Xu; Dongliang He; Fu Li; Zhong Ji; Jungong Han; Errui Ding; |
| 178 | Comprehensive Relationship Reasoning for Composed Query Based Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a comprehensive relationship reasoning network by fully exploring the four types of information for CQBIR, which mainly includes two key designs. |
Feifei Zhang; Ming Yan; Ji Zhang; Changsheng Xu; |
| 179 | Boat in The Sky: Background Decoupling and Object-aware Pooling for Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel self-supervised framework to mitigate the semantic and spatial ambiguity from the perspectives of background bias and object perception. |
Jianjun Xu; Hongtao Xie; Hai Xu; Yuxin Wang; Sun-ao Liu; Yongdong Zhang; |
| 180 | ME-D2N: Multi-Expert Domain Decompositional Network for Cross-Domain Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Alternatively, learning CD-FSL models with a few labeled target domain examples, which is more realistic and promising, is advocated in previous work. Thus, in this paper, we stick to this setting and technically contribute a novel Multi-Expert Domain Decompositional Network (ME-D2N). |
Yuqian Fu; Yu Xie; Yanwei Fu; Jingjing Chen; Yu-Gang Jiang; |
| 181 | Image Quality Assessment: From Mean Opinion Score to Opinion Score Distribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a convolutional neural network based on fuzzy theory to predict the opinion score distribution of image quality. |
Yixuan Gao; Xiongkuo Min; Yucheng Zhu; Jing Li; Xiao-Ping Zhang; Guangtao Zhai; |
| 182 | Arbitrary Bit-width Network: A Joint Layer-Wise Quantization and Adaptive Inference Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to feed different data samples with varying quantization schemes to achieve a data-dependent dynamic inference, at a fine-grained layer level. |
Chen Tang; Haoyu Zhai; Kai Ouyang; Zhi Wang; Yifei Zhu; Wenwu Zhu; |
| 183 | Adaptive Structural Similarity Preserving for Unsupervised Cross Modal Hashing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, most of them mainly focus on association mining and alignment among pairwise instances in continuous space but ignore the latent structural correlations contained in the semantic hashing space. In this paper, we propose an unsupervised hash learning framework ASSPH to solve the above problems. |
Liang Li; Baihua Zheng; Weiwei Sun; |
| 184 | Self-Supervised Human Pose Based Multi-Camera Video Synchronization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the human-centric video analysis and propose a self-supervised framework for the automatic multi-camera video synchronization. |
Liqiang Yin; Ruize Han; Wei Feng; Song Wang; |
| 185 | Real-World Blind Super-Resolution Via Feature Matching with Implicit High-Resolution Priors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Feature Matching SR (FeMaSR), which restores realistic HR images in a much more compact feature space. |
Chaofeng Chen; Xinyu Shi; Yipeng Qin; Xiaoming Li; Xiaoguang Han; Tao Yang; Shihui Guo; |
| 186 | Unified Normalization for Accelerating and Stabilizing Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We find that this dilemma is caused by abnormal behaviors of activation statistics, including large fluctuations over iterations and extreme outliers across layers. To tackle these issues, we propose Unified Normalization (UN), which can speed up the inference by being fused with other linear operations and achieve comparable performance on par with LN. |
Qiming Yang; Kai Zhang; Chaoxiang Lan; Zhi Yang; Zheyang Li; Wenming Tan; Jun Xiao; Shiliang Pu; |
| 187 | A Deep Learning Based No-reference Quality Assessment Model for UGC Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous UGC video quality assessment (VQA) studies either use the image recognition model or the image quality assessment (IQA) models to extract frame-level features of UGC videos for quality regression, which are regarded as the sub-optimal solutions because of the domain shifts between these tasks and the UGC VQA task. In this paper, we propose a very simple but effective UGC VQA model, which tries to address this problem by training an end-to-end spatial feature extraction network to directly learn the quality-aware spatial feature representation from raw pixels of the video frames. |
Wei Sun; Xiongkuo Min; Wei Lu; Guangtao Zhai; |
| 188 | Repainting and Imitating Learning for Lane Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we target at finding an enhanced feature space where the lane features are distinctive while maintaining a similar distribution of lanes in the wild. |
Yue He; Minyue Jiang; Xiaoqing Ye; Liang Du; Zhikang Zou; Wei Zhang; Xiao Tan; Errui Ding; |
| 189 | Beyond Geo-localization: Fine-grained Orientation of Street-view Images By Cross-view Matching with Satellite Imagery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we re-state the importance of finding fine-grained orientation for street-view images, formally define the problem and provide a set of evaluation metrics to assess the quality of the orientation estimation. |
Wenmiao Hu; Yichen Zhang; Yuxuan Liang; Yifang Yin; Andrei Georgescu; An Tran; Hannes Kruppa; See-Kiong Ng; Roger Zimmermann; |
| 190 | Deep Multi-Resolution Mutual Learning for Image Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current deep inpainting networks still tend to produce unreasonable structures and blurry textures due to the ill-posed properties of the task, i.e., image inpainting is still a challenging topic. In this paper, we therefore propose a novel deep multi-resolution mutual learning (DMRML) strategy, which can fully explore the information from various resolutions. |
Huan Zheng; Zhao Zhang; Haijun Zhang; Yi Yang; Shuicheng Yan; Meng Wang; |
| 191 | Dynamic Scene Graph Generation Via Temporal Prior Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Real-world videos are composed of complex actions with inherent temporal continuity (e.g., person-touching-bottle is usually followed by person-holding-bottle). In this work, we propose a novel method to mine such temporal continuity for dynamic scene graph generation (DSGG), namely Temporal Prior Inference (TPI). |
Shuang Wang; Lianli Gao; Xinyu Lyu; Yuyu Guo; Pengpeng Zeng; Jingkuan Song; |
| 192 | Unified Multimodal Model with Unlikelihood Training for Visual Dialog Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the likelihood objective often leads to frequent and dull outputs and fails to exploit the useful knowledge from negative instances (involving incorrect answers). In this paper, we propose a Unified Multimodal Model with UnLikelihood Training, named UniMM-UL, to tackle this problem. |
Zihao Wang; Junli Wang; Changjun Jiang; |
| 193 | Graph Reasoning Transformer for Image Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Graph Reasoning Transformer (GReaT) for image parsing to enable image patches to interact following a relation reasoning pattern. |
Dong Zhang; Jinhui Tang; Kwang-Ting Cheng; |
| 194 | Exposure-Consistency Representation Learning for Exposure Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The key to addressing this issue is consistently learning underexposure and overexposure corrections. To achieve this goal, we propose an Exposure-Consistency Processing (ECP) module to consistently learn the representation of both underexposure and overexposure in the feature space. |
Jie Huang; Man Zhou; Yajing Liu; Mingde Yao; Feng Zhao; Zhiwei Xiong; |
| 195 | Cloud2Sketch: Augmenting Clouds with Imaginary Sketches Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present an interesting task that augments clouds in the sky with imagined sketches. |
Zhaoyi Wan; Dejia Xu; Zhangyang Wang; Jian Wang; Jiebo Luo; |
| 196 | Physical Backdoor Attacks to Lane Detection Systems in Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we introduce two attack methodologies (poison-annotation and clean-annotation) to generate poisoned samples. |
Xingshuo Han; Guowen Xu; Yuan Zhou; Xuehuan Yang; Jiwei Li; Tianwei Zhang; |
| 197 | Multi-Modal Experience Inspired AI Creation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, in reality, humans usually make creations according to their experiences, which may involve different modalities and be sequentially correlated. To model such human capabilities, in this paper, we define and solve a novel AI creation problem based on human experiences. |
Qian Cao; Xu Chen; Ruihua Song; Hao Jiang; Guang Yang; Zhao Cao; |
| 198 | Universal Domain Adaptive Object Detector Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose US-DAF, namely Universal Scale-Aware Domain Adaptive Faster RCNN with Multi-Label Learning, to reduce the negative transfer effect during training while maximizing transferability as well as discriminability in both domains under a variety of scales. |
Wenxu Shi; Lei Zhang; Weijie Chen; Shiliang Pu; |
| 199 | UDoc-GAN: Unpaired Document Illumination Correction with Background Light Prior Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose UDoc-GAN, the first framework to address the problem of document illumination correction under the unpaired setting. |
Yonghui Wang; Wengang Zhou; Zhenbo Lu; Houqiang Li; |
| 200 | Dual Part Discovery Network for Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel Dual Part Discovery Network (DPDN) that considers both attribute and category discriminative information by discovering attribute-guided parts and category-guided parts simultaneously to improve knowledge transfer. |
Jiannan Ge; Hongtao Xie; Shaobo Min; Pandeng Li; Yongdong Zhang; |
| 201 | Transductive Aesthetic Preference Propagation for Personalized Image Aesthetics Assessment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a fixed fine-tuning strategy may cause under/over-fitting on limited personal data, and it also brings additional training cost. To alleviate these issues, we employ a meta learning-based Transductive Aesthetic Preference Propagation (TAPP-PIAA) algorithm in a regression manner to substitute for the fine-tuning strategy. |
Yaohui Li; Yuzhe Yang; Huaxiong Li; Haoxing Chen; Liwu Xu; Leida Li; Yaqian Li; Yandong Guo; |
| 202 | Task-adaptive Spatial-Temporal Video Sampler for Few-shot Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, existing frame sampling strategies may omit critical action information in temporal and spatial dimensions, which further impacts video utilization efficiency. In this paper, we propose a novel video frame sampler for few-shot action recognition to address this issue, where task-specific spatial-temporal frame sampling is achieved via a temporal selector (TS) and a spatial amplifier (SA). |
Huabin Liu; Weixian Lv; John See; Weiyao Lin; |
| 203 | Learning An Inference-accelerated Network from A Pre-trained Model with Frequency-enhanced Feature Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to accelerate the inference time of CNNs, in this paper, we propose to resize the low-level and middle-level feature maps to smaller scales to reduce the spatial computation costs of CNNs. |
Xuesong Niu; Jili Gu; Guoxin Zhang; Pengfei Wan; Zhongyuan Wang; |
| 204 | Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, with intensive pilot experiments, we quantify the objects located at different regions and find that the "truncated instances" (i.e., at the border regions of each image) are the main bottleneck hindering the performance of DETR3D. |
Zehui Chen; Zhenyu Li; Shiquan Zhang; Liangji Fang; Qinhong Jiang; Feng Zhao; |
| 205 | Curriculum-NAS: Curriculum Weight-Sharing Neural Architecture Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Contrastively, in this paper, we empirically discover that different data samples have different influences on architectures, e.g., some data samples are easy to fit by certain architectures but hard by others. |
Yuwei Zhou; Xin Wang; Hong Chen; Xuguang Duan; Chaoyu Guan; Wenwu Zhu; |
| 206 | Rethinking The Reference-based Distinctive Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we design a two-stage matching mechanism, which strictly controls the similarity between the target and reference images at object-/attribute- level (vs. scene-level). |
Yangjun Mao; Long Chen; Zhihong Jiang; Dong Zhang; Zhimeng Zhang; Jian Shao; Jun Xiao; |
| 207 | Imitated Detectors: Stealing Knowledge of Black-box Object Detectors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we treat it as black-box knowledge distillation and propose a teacher-student framework named Imitated Detector to transfer the knowledge of the victim model to the imitated model. |
Siyuan Liang; Aishan Liu; Jiawei Liang; Longkang Li; Yang Bai; Xiaochun Cao; |
| 208 | Hierarchical Walking Transformer for Object Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, the transformer, based purely on the attention mechanism, has been applied to a wide range of tasks and achieved impressive performance. Though extensive efforts have been made, there are still drawbacks to the transformer architecture that hinder its further application: (i) the quadratic complexity brought by the attention mechanism; (ii) barely incorporated inductive bias. In this paper, we present a new hierarchical walking attention, which provides a scalable, flexible, and interpretable sparsification strategy to reduce the complexity from quadratic to linear, and meanwhile evidently boosts the performance. |
Xudong Tian; Jun Liu; Zhizhong Zhang; Chengjie Wang; Yanyun Qu; Yuan Xie; Lizhuang Ma; |
| 209 | DualSign: Semi-Supervised Sign Language Production with Balanced Multi-Modal Multi-Task Dual Transformation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, to reduce reliance on gloss annotations in two-staged approaches, we propose DualSign, a semi-supervised two-staged SLP framework, which can effectively utilize partially gloss-annotated text-pose pairs and monolingual gloss data. |
Wencan Huang; Zhou Zhao; Jinzheng He; Mingmin Zhang; |
| 210 | Maze: A Cost-Efficient Video Deduplication System at Web-scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The duplicate contents are more likely to be discovered as clips inside the videos, demanding processing techniques that pay close attention to details. To address the above-mentioned issues, we propose Maze, a full-fledged video deduplication system. |
An Qin; Mengbai Xiao; Ben Huang; Xiaodong Zhang; |
| 211 | Pay Attention to Your Positive Pairs: Positive Pair Aware Contrastive Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Positive pair Aware Contrastive Knowledge Distillation (PACKD) framework to extend the contrastive distillation with more positive pairs to capture more abundant knowledge from the teacher. |
Zhipeng Yu; Qianqian Xu; Yangbangyan Jiang; Haoyu Qin; Qingming Huang; |
| 212 | HERO: HiErarchical Spatio-tempoRal ReasOning with Contrastive Action Correspondence for End-to-End Video Object Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is a challenging vision-language task that necessitates constructing the correct cross-modal correspondence and modeling the appropriate spatio-temporal context of the query video and caption, thereby localizing the specific objects accurately. In this paper, we tackle this task by a novel framework called HiErarchical spatio-tempoRal reasOning (HERO) with contrastive action correspondence. |
Mengze Li; Tianbao Wang; Haoyu Zhang; Shengyu Zhang; Zhou Zhao; Wenqiao Zhang; Jiaxu Miao; Shiliang Pu; Fei Wu; |
| 213 | Semi-supervised Crowd Counting Via Density Agency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new agency-guided semi-supervised counting approach. |
Hui Lin; Zhiheng Ma; Xiaopeng Hong; Yaowei Wang; Zhou Su; |
| 214 | LVI-ExC: A Target-free LiDAR-Visual-Inertial Extrinsic Calibration Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose LVI-ExC, an integrated LiDAR-Visual-Inertial Extrinsic Calibration framework, which takes natural multi-modal data as input and yields sensor-to-sensor extrinsics end-to-end without any auxiliary object (site) or manual assistance. |
Zhong Wang; Lin Zhang; Ying Shen; Yicong Zhou; |
| 215 | SGINet: Toward Sufficient Interaction Between Single Image Deraining and Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Semantic Guided Interactive Network (SGINet), which considers the sufficient interaction between SID and semantic segmentation using a three-stage deraining manner, i.e., coarse deraining, semantic information extraction, and semantics guided deraining. |
Yanyan Wei; Zhao Zhang; Huan Zheng; Richang Hong; Yi Yang; Meng Wang; |
| 216 | ARMANI: Part-level Garment-Text Alignment for Unified Cross-Modal Fashion Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To facilitate accurate generation, cross-modal synthesis methods typically rely on Contrastive Language-Image Pre-training (CLIP) to align textual and garment information. In this work, we argue that simply aligning texture and garment information is not sufficient to capture the semantics of the visual information and therefore propose MaskCLIP. |
Xujie Zhang; Yu Sha; Michael C. Kampffmeyer; Zhenyu Xie; Zequn Jie; Chengwen Huang; Jianqing Peng; Xiaodan Liang; |
| 217 | AesUST: Towards Aesthetic-Enhanced Universal Style Transfer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing approaches suffer from the aesthetic-unrealistic problem that introduces disharmonious patterns and evident artifacts, making the results easy to spot from real paintings. To address this limitation, we propose AesUST, a novel Aesthetic-enhanced Universal Style Transfer approach that can generate aesthetically more realistic and pleasing results for arbitrary styles. |
Zhizhong Wang; Zhanjie Zhang; Lei Zhao; Zhiwen Zuo; Ailin Li; Wei Xing; Dongming Lu; |
| 218 | Approximate Shifted Laplacian Reconstruction for Multiple Kernel Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the main challenge is that the kernel matrix of size n x n leads to O(n^2) memory complexity and O(n^3) computational complexity. To mitigate this challenge, taking the graph Laplacian as a breakthrough, this paper proposes a novel and simple MKC method, dubbed approximate shifted Laplacian reconstruction (ASLR). |
Jiali You; Zhenwen Ren; Quansen Sun; Yuan Sun; Xingfeng Li; |
| 219 | GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, in this paper we propose a novel two-stage framework that focuses on utilizing such bidirectional relations within verbs and roles. |
Zhi-Qi Cheng; Qi Dai; Siyao Li; Teruko Mitamura; Alexander Hauptmann; |
| 220 | Domain-Specific Conditional Jigsaw Adaptation for Enhancing Transferability and Discriminability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the traditional methods' lack of transferability or discriminability results in a limited ability to generalize to the target domain. To remedy this issue, a novel unsupervised domain adaptation framework called Domain-specific Conditional Jigsaw Adaptation Network (DCJAN) is proposed for UDA, which simultaneously encourages the network to extract transferable and discriminative features. |
Qi He; Zhaoquan Yuan; Xiao Wu; Jun-Yan He; |
| 221 | GroupDancer: Music to Multi-People Dance Synthesis with Style Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on MDC, we present a novel framework, GroupDancer, consisting of three stages: Dancer Collaboration, Motion Choreography and Motion Transition. |
Zixuan Wang; Jia Jia; Haozhe Wu; Junliang Xing; Jinghe Cai; Fanbo Meng; Guowen Chen; Yanfeng Wang; |
| 222 | DoF-NeRF: Depth-of-Field Meets Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This limits their applicability, as images captured from the real world often have finite depth-of-field (DoF). To mitigate this issue, we introduce DoF-NeRF, a novel neural rendering approach that can deal with shallow DoF inputs and simulate the DoF effect. |
Zijin Wu; Xingyi Li; Juewen Peng; Hao Lu; Zhiguo Cao; Weicai Zhong; |
| 223 | Learning-Based Video Coding with Joint Deep Compression and Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an end-to-end deep video codec with jointly optimized compression and enhancement modules (JCEVC). |
Tiesong Zhao; Weize Feng; HongJi Zeng; Yiwen Xu; Yuzhen Niu; Jiaying Liu; |
| 224 | Learnability Enhancement for Low-light Raw Denoising: Where Paired Real Data Meets Noise Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the limited data volume and complicated noise distribution have constituted a learnability bottleneck for paired real data, which limits the denoising performance of learning-based methods. To address this issue, we present a learnability enhancement strategy to reform paired real data according to noise modeling. |
Hansen Feng; Lizhi Wang; Yuzhi Wang; Hua Huang; |
| 225 | FMNet: Frequency-Aware Modulation Network for SDR-to-HDR Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the Discrete Cosine Transform (DCT), in this paper, we propose a Frequency-aware Modulation Network (FMNet) to enhance the contrast in a frequency-adaptive way for SDR-to-HDR translation. |
Gang Xu; Qibin Hou; Le Zhang; Ming-Ming Cheng; |
| 226 | CharFormer: A Glyph Fusion Based Attentive Framework for High-precision Character Image Denoising Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel generic framework based on glyph fusion and attention mechanisms, i.e., CharFormer, for precisely recovering character images without changing their inherent glyphs. |
Daqian Shi; Xiaolei Diao; Lida Shi; Hao Tang; Yang Chi; Chuntao Li; Hao Xu; |
| 227 | RCRN: Real-world Character Image Restoration Network Via Skeleton Extraction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: There are limitations when applying current image restoration methods to such real-world character images, since (i) the categories of noise in character images are different from those in general images; (ii) real-world character images usually contain more complex image degradation, e.g., mixed noise at different noise levels. To address these problems, we propose a real-world character restoration network (RCRN) to effectively restore degraded character images, where character skeleton information and scale-ensemble feature extraction are utilized to obtain better restoration performance. |
Daqian Shi; Xiaolei Diao; Hao Tang; Xiaomin Li; Hao Xing; Hao Xu; |
| 228 | Hierarchical Graph Embedded Pose Regularity Learning Via Spatio-Temporal Transformer for Abnormal Behavior Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a hierarchical graph embedded pose regularity learning framework via spatio-temporal transformer, which leverages the strength of graph representation in encoding strongly-structured skeleton feature. |
Chao Huang; Yabo Liu; Zheng Zhang; Chengliang Liu; Jie Wen; Yong Xu; Yaowei Wang; |
| 229 | Towards Robust Video Object Segmentation with Adaptive Object Calibration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Accordingly, we propose a new deep network, which can adaptively construct object representations and calibrate object masks to achieve stronger robustness. |
Xiaohao Xu; Jinglu Wang; Xiang Ming; Yan Lu; |
| 230 | Consistency-Contrast Learning for Conceptual Coding Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: As an emerging compression scheme, conceptual coding usually encodes images into structural and textural representations and decodes them in a deep synthesis fashion. However, … |
Jianhui Chang; Jian Zhang; Youmin Xu; Jiguo Li; Siwei Ma; Wen Gao; |
| 231 | Everything Is There in Latent Space: Attribute Editing and Attribute Style Manipulation By StyleGAN Latent Space Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel sampling method to sample latent from the manifold, enabling us to generate a diverse set of attribute styles beyond the styles present in the training set. |
Rishubh Parihar; Ankit Dhiman; Tejan Karmali; Venkatesh R; |
| 232 | Face Forgery Detection Via Symmetric Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a symmetric transformer for channel and spatial feature extraction, which is because the channel and spatial features of a robust forgery detector should be consistent in the temporal domain. |
Luchuan Song; Xiaodan Li; Zheng Fang; Zhenchao Jin; YueFeng Chen; Chenliang Xu; |
| 233 | In-N-Out Generative Learning for Dense Unsupervised Video Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on unsupervised learning for Video Object Segmentation (VOS) which learns visual correspondence (i.e., the similarity between pixel-level features) from unlabeled videos. |
Xiao Pan; Peike Li; Zongxin Yang; Huiling Zhou; Chang Zhou; Hongxia Yang; Jingren Zhou; Yi Yang; |
| 234 | Neighbor Correspondence Matching for Flow-based Video Frame Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing methods cannot handle small objects or large motion well, especially in high-resolution videos such as 4K videos. To eliminate such limitations, we introduce a neighbor correspondence matching (NCM) algorithm for flow-based frame synthesis. |
Zhaoyang Jia; Yan Lu; Houqiang Li; |
| 235 | D2Animator: Dual Distillation of StyleGAN For High-Resolution Face Animation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we devise a dual distillation strategy (D2Animator) for generating animated high-resolution face videos conditioned on identities and poses from different images. |
Zhuo Chen; Chaoyue Wang; Haimei Zhao; Bo Yuan; Xiu Li; |
| 236 | Reducing The Vision and Language Bias for Temporal Sentence Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although previous TSG methods have achieved decent performance, they tend to capture the selection biases of frequently appeared video-query pairs in the dataset rather than present robust multimodal reasoning abilities, especially for the rarely appeared pairs. In this paper, we study the above issue of selection biases and accordingly propose a Debiasing-TSG (D-TSG) model to filter and remove the negative biases in both vision and language modalities for enhancing the model generalization ability. |
Daizong Liu; Xiaoye Qu; Wei Hu; |
| 237 | Skimming, Locating, Then Perusing: A Human-Like Framework for Natural Language Video Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, inspired by how humans perceive and localize a segment, we propose a two-step human-like framework called Skimming-Locating-Perusing (SLP). |
Daizong Liu; Wei Hu; |
| 238 | Image-Text Matching with Fine-Grained Relational Dependency and Bidirectional Attention-Based Generative Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: At the same time, it is usually ignored that the mutual transformation between modalities also facilitates the embedding of modalities. Given these problems, we propose a method called BiKA (Bidirectional Knowledge-assisted embedding and Attention-based generation). |
Jianwei Zhu; Zhixin Li; Yufei Zeng; Jiahui Wei; Huifang Ma; |
| 239 | Pixel-Level Anomaly Detection Via Uncertainty-aware Prototypical Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present an uncertainty-aware prototypical transformer (UPformer), which takes into account both the diversity and uncertainty of anomaly to achieve accurate pixel-level visual anomaly detection. |
Chao Huang; Chengliang Liu; Zheng Zhang; Zhihao Wu; Jie Wen; Qiuping Jiang; Yong Xu; |
| 240 | Generating Transferable Adversarial Examples Against Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing adversarial methods suffer from weak transferable attacking ability due to the overlook of these architectural features. To address the problem, we propose an Architecture-oriented Transferable Attacking (ATA) framework to generate transferable adversarial examples by activating the uncertain attention and perturbing the sensitive embedding. Specifically, we first locate the patch-wise attentional regions that mostly affect model perception, therefore intensively activating the uncertainty of the attention mechanism and confusing the model decisions in turn. Furthermore, we search the pixel-wise attacking positions that are more likely to derange the embedded tokens using sensitive embedding perturbation, which could serve as a strong transferable attacking pattern. By jointly confusing the unique yet widely-used architectural features among transformer-based models, we can activate strong attacking transferability among diverse ViTs. |
Yuxuan Wang; Jiakai Wang; Zixin Yin; Ruihao Gong; Jingyi Wang; Aishan Liu; Xianglong Liu; |
| 241 | Background Layout Generation and Object Knowledge Transfer for Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a close inspection of their generated images shows two major limitations: 1) the background (e.g., fence, lake) of the generated image with the complicated, real-world scene tends to be unrealistic; 2) the object (e.g., elephant, zebra) in the generated image often presents a highly distorted shape or key parts missing. To address these limitations, we propose a two-stage T2I approach, where the first stage redesigns the text-to-layout process to incorporate the background layout with the existing object layout, and the second stage transfers the object knowledge from an existing class-to-image model to the layout-to-image process to improve the object fidelity. |
Zhuowei Chen; Zhendong Mao; Shancheng Fang; Bo Hu; |
| 242 | Neural Network Model Protection with Piracy Identification and Tampering Localization Capability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a model hash generator method to protect neural network models. |
Cheng Xiong; Guorui Feng; Xinran Li; Xinpeng Zhang; Chuan Qin; |
| 243 | Estimation of Reliable Proposal Quality for Temporal Action Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper reveals the temporal misalignment between the two tasks hindering further progress. To address this, we propose a new method that gives insights into moment and region perspectives simultaneously to align the two tasks by acquiring reliable proposal quality. |
Junshan Hu; Chaoxu Guo; Liansheng Zhuang; Biao Wang; Tiezheng Ge; Yuning Jiang; Houqiang Li; |
| 244 | Adaptive Camera Margin for Mask-guided Domain Adaptive Person Re-identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, how to reduce the intra-domain variations and narrow the inter-domain gaps is far from solved and remains a challenging problem under this framework. In this paper, we address these issues from two aspects. |
Rui Wang; Feng Chen; Jun Tang; Pu Yan; |
| 245 | Normalization-based Feature Selection and Restitution for Pan-sharpening Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The commonly challenging issue of pan-sharpening is how to correctly select consistent features and propagate them, and properly handle inconsistent ones between PAN and MS modalities. To solve this issue, we propose a Normalization-based Feature Selection and Restitution mechanism, which is capable of filtering out the inconsistent features and promoting to learn the consistent ones. |
Man Zhou; Jie Huang; Keyu Yan; Gang Yang; Aiping Liu; Chongyi Li; Feng Zhao; |
| 246 | Adaptively Learning Low-high Frequency Information Integration for Pan-sharpening Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel pan-sharpening framework by adaptively learning low-high frequency information integration in the spatial and frequency dual domains. |
Man Zhou; Jie Huang; Chongyi Li; Hu Yu; Keyu Yan; Naishan Zheng; Feng Zhao; |
| 247 | Learn to Understand Negation in Video Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a learning based method for training a negation-aware video retrieval model. |
Ziyue Wang; Aozhu Chen; Fan Hu; Xirong Li; |
| 248 | Semantic Data Augmentation Based Distance Metric Learning for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Domain generalization (DG) aims to learn a model on one or more different but related source domains that could be generalized into an unseen target domain. |
Mengzhu Wang; Jianlong Yuan; Qi Qian; Zhibin Wang; Hao Li; |
| 249 | Learning for Motion Deblurring with Hybrid Frames and Events Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, inspired by the two-pathway visual system, a novel dual-stream based framework is proposed for motion deblurring (DS-Deblur), which flexibly utilizes the respective advantages from frame and event. |
Wen Yang; Jinjian Wu; Jupo Ma; Leida Li; Weisheng Dong; Guangming Shi; |
| 250 | Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a systematic and effective approach to enhance Med-VLP by structured medical knowledge from three perspectives. |
Zhihong Chen; Guanbin Li; Xiang Wan; |
| 251 | From Abstract to Details: A Generative Multimodal Fusion Framework for Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, these concatenation-based models treat all modalities equally for each user and overlook the fact that users tend to pay unequal attention to information of various modalities when browsing items in the real scenario. To address the above issues, this paper proposes a generative multimodal fusion framework (GMMF) for CTR prediction task. |
Fangxiong Xiao; Lixi Deng; Jingjing Chen; Houye Ji; Xiaorui Yang; Zhuoye Ding; Bo Long; |
| 252 | TxVAD: Improved Video Action Detection By Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a conceptually simple paradigm for video action detection using Transformers, which effectively removes the need for specialized components and achieves superior performance. |
Zhenyu Wu; Zhou Ren; Yi Wu; Zhangyang Wang; Gang Hua; |
| 253 | SER30K: A Large-Scale Dataset for Sticker Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since the characteristics in stickers from the same theme are similar, we can only accurately predict the emotion by capturing the local information (e.g., expressions, poses) and understanding the global information (e.g., relations among objects). To tackle this challenge, we propose a LOcal Re-Attention multimodal network (LORA) to learn sticker emotions in an end-to-end manner. |
Shengzhe Liu; Xin Zhang; Jufeng Yang; |
| 254 | Adaptive Mixture of Experts Learning for Generalizable Face Anti-Spoofing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they neglect individual source domains’ discriminative characteristics and diverse domain-specific information of the unseen domains, and the trained model is not sufficient to be adapted to various unseen domains. To address this issue, we propose an Adaptive Mixture of Experts Learning (AMEL) framework, which exploits the domain-specific information to adaptively establish the link among the seen source domains and unseen target domains to further improve the generalization. |
Qianyu Zhou; Ke-Yue Zhang; Taiping Yao; Ran Yi; Shouhong Ding; Lizhuang Ma; |
| 255 | Adjustable Memory-efficient Image Super-resolution Via Individual Kernel Sparsity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an individual kernel sparsity (IKS) method for memory-efficient and sparsity-adjustable image SR to aid deep network deployment in memory-limited devices. |
Xiaotong Luo; Mingliang Dai; Yulun Zhang; Yuan Xie; Ding Liu; Yanyun Qu; Yun Fu; Junping Zhang; |
| 256 | ChartStamp: Robust Chart Embedding for Real-World Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ChartStamp, the first chart embedding method that is robust to real-world printing and displaying (printed on paper and displayed on screen, respectively, and then captured with a camera) while maintaining a good perceptual quality. |
Jiayun Fu; Bin B. Zhu; Haidong Zhang; Yayi Zou; Song Ge; Weiwei Cui; Yun Wang; Dongmei Zhang; Xiaojing Ma; Hai Jin; |
| 257 | Situational Perception Guided Image Matting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Situational Perception Guided Image Matting (SPG-IM) method that mitigates subjective bias of matting annotations and captures sufficient situational perception information for better global saliency distilled from the visual-to-textual task. |
Bo Xu; Jiake Xie; Han Huang; Ziwen Li; Cheng Lu; Yong Tang; Yandong Guo; |
| 258 | Distilling Resolution-robust Identity Knowledge for Texture-Enhanced Face Hallucination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a texture and identity integration network (TIIN) to effectively incorporate identity information into face hallucination tasks. |
Qiqi Bao; Rui Zhu; Bowen Gang; Pengyang Zhao; Wenming Yang; Qingmin Liao; |
| 259 | Learning Smooth Representation for Multi-view Subspace Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to achieve a smooth representation for each view and thus facilitate the downstream clustering task. |
Shudong Huang; Yixi Liu; Yazhou Ren; Ivor W. Tsang; Zenglin Xu; Jiancheng Lv; |
| 260 | Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a Transformer-based double-helix model, namely HelixFormer, to achieve the cross-image object semantic relation mining in a bidirectional and symmetrical manner. |
Bo Zhang; Jiakang Yuan; Baopu Li; Tao Chen; Jiayuan Fan; Botian Shi; |
| 261 | Cycle-Interactive Generative Adversarial Network for Robust Unsupervised Low-Light Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Herein, we propose a novel Cycle-Interactive Generative Adversarial Network (CIGAN) for unsupervised low-light image enhancement, which is capable of not only better transferring illumination distributions between low/normal-light images but also manipulating detailed signals between two domains, e.g., suppressing/synthesizing realistic noise in the cyclic enhancement/degradation process. |
Zhangkai Ni; Wenhan Yang; Hanli Wang; Shiqi Wang; Lin Ma; Sam Kwong; |
| 262 | PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the two-stage pipeline usually suffers from the performance limitation of low-quality proposals in the first stage and the loss of spatial details caused by region feature pooling, as well as complicated strategies designed for things and stuff categories separately. To alleviate these drawbacks, we propose a one-stage end-to-end Pixel-Phrase Matching Network (PPMN), which directly matches each phrase to its corresponding pixels instead of region proposals and outputs panoptic segmentation by simple combination. |
Zihan Ding; Zi-han Ding; Tianrui Hui; Junshi Huang; Xiaoming Wei; Xiaolin Wei; Si Liu; |
| 263 | Set-Based Face Recognition Beyond Disentanglement: Burstiness Suppression With Variance Vocabulary Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus we propose to separate the identity features from the variance features in a lightweight set-based disentanglement framework. Beyond disentanglement, the variance features are fully utilized to indicate face quality and burstiness in a set, rather than being discarded after training. |
Jiong Wang; Zhou Zhao; Fei Wu; |
| 264 | T-former: An Efficient Transformer for Image Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we design a novel attention linearly related to the resolution according to Taylor expansion. |
Ye Deng; Siqi Hui; Sanping Zhou; Deyu Meng; Jinjun Wang; |
| 265 | Adaptive Affine Transformation: A Simple and Effective Operation for Spatial Misaligned Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Different from those dense flow based methods, we propose one simple but effective operator named AdaAT (Adaptive Affine Transformation) to realize misaligned image generation. |
Zhimeng Zhang; Yu Ding; |
| 266 | Inferential Visual Question Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Other methods rely heavily on elaborate but expensive artificial preprocessing to generate questions. To overcome these limitations, we propose a method to generate inferential questions from the image with noisy captions. |
Chao Bi; Shuhui Wang; Zhe Xue; Shengbo Chen; Qingming Huang; |
| 267 | SIR-Former: Stereo Image Restoration Using Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing methods adopt convolutional neural networks to align the views and fuse the information locally, which has difficulty in capturing the global correspondence across stereo images for view alignment and makes it hard to integrate the long-term information across views. In this paper, we propose to address the stereo image restoration with transformer by leveraging its powerful capability of modeling long-range context dependencies. |
Zizheng Yang; Mingde Yao; Jie Huang; Man Zhou; Feng Zhao; |
| 268 | Cross-Compatible Embedding and Semantic Consistent Feature Construction for Sketch Re-identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the idea of transplantation without rejection, we propose a Cross-Compatible Embedding (CCE) approach to narrow the gap. |
Yafei Zhang; Yongzeng Wang; Huafeng Li; Shuang Li; |
| 269 | DisCo: Disentangled Implicit Content and Rhythm Learning for Diverse Co-Speech Gestures Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This imbalance results in the difficulty of generating low frequent occurrence motions and it cannot be easily solved by resampling, due to the inherent many-to-many mapping between content and rhythm. Therefore, we present DisCo, which disentangles motion into implicit content and rhythm features by contrastive loss for adopting different data balance strategies. |
Haiyang Liu; Naoya Iwamoto; Zihao Zhu; Zhengqing Li; You Zhou; Elif Bozkurt; Bo Zheng; |
| 270 | Quality Assessment of Image Super-Resolution: Balancing Deterministic and Statistical Fidelity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we observe an interesting trend from more traditional SR algorithms that are typically inclined to optimize for DF while losing SF, to more recent generative adversarial network (GAN) based approaches that by contrast exhibit strong advantages in achieving high SF but sometimes appear weak at maintaining DF. |
Wei Zhou; Zhou Wang; |
| 271 | ICNet: Joint Alignment and Reconstruction Via Iterative Collaboration for Video Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel many-to-many VSR framework with Iterative Collaboration (ICNet), which employs concurrent operation via iterative collaboration between alignment and reconstruction, proving to be more efficient and effective than existing recurrent and sliding-window frameworks. |
Jiaxu Leng; Jia Wang; Xinbo Gao; Bo Hu; Ji Gan; Chenqiang Gao; |
| 272 | Interact with Open Scenes: A Life-long Evolution Framework for Interactive Segmentation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing methods overlook the generalization ability of their models when encountering new target scenes. To overcome this problem, we propose a life-long evolution framework for interactive models in this paper, which provides a possible solution for dealing with dynamic target scenes with one single model. |
Ruitong Gan; Junsong Fan; Yuxi Wang; Zhaoxiang Zhang; |
| 273 | REMOT: A Region-to-Whole Framework for Realistic Human Motion Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods always generate obvious artifacts due to the dramatic differences in poses, scales, and shifts between the source person and the driving person. To overcome these challenges, this paper presents a novel REgion-to-whole human MOtion Transfer (REMOT) framework based on GANs. |
Quanwei Yang; Xinchen Liu; Wu Liu; Hongtao Xie; Xiaoyan Gu; Lingyun Yu; Yongdong Zhang; |
| 274 | Finding The Host from The Lesion By Iteratively Mining The Registration Graph Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates an interesting problem that finds the host organ of a lesion without actually labeling the organ. |
Zijie Yang; Lingxi Xie; Xinyue Huo; Sheng Tang; Qi Tian; Yongdong Zhang; |
| 275 | MMDV: Interpreting DNNs Via Building Evaluation Metrics, Manual Manipulation and Decision Visualization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents Manual Manipulation and Decision Visualization (MMDV) which makes Human-in-the-loop improve the interpretability of deep neural networks. |
Keyang Cheng; Yu Si; Hao Zhou; Rabia Tahir; |
| 276 | Few-shot Image Generation Using Discrete Content Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we make the first attempt to adapt a few-shot image translation method to the few-shot image generation task. |
Yan Hong; Li Niu; Jianfu Zhang; Liqing Zhang; |
| 277 | CariPainter: Sketch Guided Interactive Caricature Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose CariPainter, the first interactive caricature generating and editing method. |
Xin Huang; Dong Liang; Hongrui Cai; Juyong Zhang; Jinyuan Jia; |
| 278 | Target-Driven Structured Transformer Planner for Vision-Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this article, we propose a Target-Driven Structured Transformer Planner (TD-STP) for long-horizon goal-guided and room layout-aware navigation. |
Yusheng Zhao; Jinyu Chen; Chen Gao; Wenguan Wang; Lirong Yang; Haibing Ren; Huaxia Xia; Si Liu; |
| 279 | Modality Eigen-Encodings Are Keys to Open Modality Informative Containers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Differently, we propose a novel scheme for multi-modal fusion named Vision Language Interaction (VLI). |
Yiyuan Zhang; Yuqi Ji; |
| 280 | Self-Paced Label Distribution Learning for In-The-Wild Facial Expression Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a plug-and-play method of self-paced label distribution learning (SPLDL) for in-the-wild FER. |
Jianjian Shao; Zhenqian Wu; Yuanyan Luo; Shudong Huang; Xiaorong Pu; Yazhou Ren; |
| 281 | Learning Parallax Transformer Network for Stereo Image JPEG Artifacts Removal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel parallax transformer network (PTNet) to integrate the information from stereo image pairs for stereo image JPEG artifacts removal. |
Xuhao Jiang; Weimin Tan; Ri Cheng; Shili Zhou; Bo Yan; |
| 282 | RPPformer-Flow: Relative Position Guided Point Transformer for Scene Flow Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide a full transformer based solution for scene flow estimation. |
Hanlin Li; Guanting Dong; Yueyi Zhang; Xiaoyan Sun; Zhiwei Xiong; |
| 283 | Hybrid Conditional Deep Inverse Tone Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we try to tackle the conversion task from SDR to HDR-WCG for media contents and consumer displays. |
Tong Shao; Deming Zhai; Junjun Jiang; Xianming Liu; |
| 284 | TVFormer: Trajectory-guided Visual Quality Assessment on 360° Images with Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Transformer-based approach for trajectory-guided VQA on 360° images (named TVFormer), in which both the tasks of head trajectory prediction and BVQA can be accomplished for 360° images. |
Li Yang; Mai Xu; Tie Liu; Liangyu Huo; Xinbo Gao; |
| 285 | DS-MVSNet: Unsupervised Multi-view Stereo Via Depth Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the DS-MVSNet, an end-to-end unsupervised MVS structure with the source depths synthesis. |
Jingliang Li; Zhengda Lu; Yiqun Wang; Ying Wang; Jun Xiao; |
| 286 | CLOP: Video-and-Language Pre-Training with Knowledge Regularizations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: There are related works that propose object-aware approaches to inject similar knowledge as inputs. |
Guohao Li; Hu Yang; Feng He; Zhifan Feng; Yajuan Lyu; Hua Wu; Haifeng Wang; |
| 287 | Patch-based Knowledge Distillation for Lifelong Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Similar to other lifelong learning tasks, it severely suffers from the so-called catastrophic forgetting problem, which refers to the notable performance degradation on previously-seen data after adapting the model to some newly incoming data. |
Zhicheng Sun; Yadong MU; |
| 288 | Speech Fusion to Face: Bridging The Gap Between Human’s Vocal Characteristics and Facial Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing solutions to the problem of speech2face render limited image quality and fail to preserve facial similarity due to the lack of a quality dataset for training and appropriate integration of vocal features. In this paper, we investigate these key technical challenges and propose Speech Fusion to Face, or SF2F in short, attempting to address the issue of facial image quality and the poor connection between the vocal feature domain and modern image generation models. |
Yeqi BAI; Tao Ma; Lipo Wang; Zhenjie Zhang; |
| 289 | AdsCVLR: Commercial Visual-Linguistic Representation Modeling in Sponsored Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the problem of multi-modal modeling in sponsored search, which models the relevance between user query and commercial ads with multi-modal structured information. |
Yongjie Zhu; Chunhui Han; Yuefeng Zhan; Bochen Pang; Zhaoju Li; Hao Sun; Si Li; Boxin Shi; Nan Duan; Weiwei Deng; Ruofei Zhang; Liangjie Zhang; Qi Zhang; |
| 290 | DEAL: An Unsupervised Domain Adaptive Framework for Graph-level Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel GNN framework named DEAL by incorporating both source graphs and target graphs, which is featured by two modules, i.e., adversarial perturbation and pseudo-label distilling. |
Nan Yin; Li Shen; Baopu Li; Mengzhu Wang; Xiao Luo; Chong Chen; Zhigang Luo; Xian-Sheng Hua; |
| 291 | Cross-modal Semantic Alignment Pre-training for Vision-and-Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the significant progress achieved by large-scale pre-training methods, in this paper, we propose CSAP, a new method of Cross-modal Semantic Alignment Pre-training for Vision-and-Language Navigation. |
Siying Wu; Xueyang Fu; Feng Wu; Zheng-Jun Zha; |
| 292 | Structure-Inferred Bi-level Model for Underwater Image Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a Structural-Inferred Bi-level Model (SIBM) that incorporates different modalities of knowledge (i.e., semantic domain, gradient domain, and pixel domain) to hierarchically enhance underwater images. |
Pan Mu; Haotian Qian; Cong Bai; |
| 293 | Where Are You Looking? A Large-Scale Dataset of Head and Gaze Behavior for 360-Degree Videos and A Pilot Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we first propose a quantitative taxonomy for 360° videos that contains three objective technical metrics. Based on this taxonomy, we collect a dataset containing users’ head and gaze behaviors simultaneously, which outperforms existing datasets with rich dimensions, large scale, strong diversity, and high frequency. |
Yili Jin; Junhua Liu; Fangxin Wang; Shuguang Cui; |
| 294 | Region-based Pixels Integration Mechanism for Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To explore more pixel-level semantic information and recognize all pixels within the objects for segmentation, we propose a Region-based Pixels Integration Mechanism (RPIM) which discovers the intra-region and inter-region information. |
Chen Qian; Hui Zhang; |
| 295 | Mixed Supervision for Instance Learning in Object Detection with Few-shot Annotation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the mixed supervision instance learning (MSIL), as a novel MSOD framework to leverage a handful of instance-level annotations to provide more explicit and implicit supervision. |
Yi Zhong; Chengyao Wang; Shiyong Li; Zhu Zhou; Yaowei Wang; Wei-Shi Zheng; |
| 296 | DomainPlus: Cross Transform Domain Learning Towards High Dynamic Range Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a cross-transform domain neural network for efficient HDR imaging. |
Bolun Zheng; Xiaokai Pan; Hua Zhang; Xiaofei Zhou; Gregory Slabaugh; Chenggang Yan; Shanxin Yuan; |
| 297 | LFBCNet: Light Field Boundary-aware and Cascaded Interaction Network for Salient Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a light field boundary-aware and cascaded interaction network based on light field macro-EPI, named LFBCNet. |
Mianzhao Wang; Fan Shi; Xu Cheng; Meng Zhao; Yao Zhang; Chen Jia; Weiwei Tian; Shengyong Chen; |
| 298 | Structure- and Texture-Aware Learning for Low-Light Image Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing methods tend to learn the structure and texture of low-light images in a coupled manner, without well considering the heterogeneity between them, which challenges the capability of the model to learn both adequately. In this paper, we tackle this problem in a divide and conquer strategy, based on the observation that the structure and texture representations are highly separated in the frequency spectrum. |
Jinghao Zhang; Jie Huang; Mingde Yao; Man Zhou; Feng Zhao; |
| 299 | ScatterNet: Point Cloud Learning Via Scatters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ScatterNet, a novel 3D local feature learning approach for exploring and aggregating hypothetical scatters of the point clouds. |
Qi Liu; Nianjuan Jiang; Jiangbo Lu; Mingang Chen; Ran Yi; Lizhuang Ma; |
| 300 | PRO-Face: A Generic Framework for Privacy-preserving Recognizable Obfuscation of Face Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the privacy-utility challenge, we propose a novel, generic, effective, yet lightweight framework for Privacy-preserving Recognizable Obfuscation of Face images (named as PRO-Face). |
Lin Yuan; Linguo Liu; Xiao Pu; Zhao Li; Hongbo Li; Xinbo Gao; |
| 301 | Text’s Armor: Optimized Local Adversarial Perturbation Against Scene Text Editing Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to actively defeat text editing attacks by designing invisible armors for texts in the scene. |
Tao Xiang; Hangcheng Liu; Shangwei Guo; Hantao Liu; Tianwei Zhang; |
| 302 | Anomaly Warning: Learning and Memorizing Future Semantic Patterns for Unsupervised Ex-ante Potential Anomaly Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the ex-ante prediction ability of humans, we propose an unsupervised Ex-ante Potential Anomaly Prediction Network (EPAP-Net), which learns to build a semantic pool to memorize the normal semantic patterns of future frames for indirect anomaly prediction. |
Jiaxu Leng; Mingpi Tan; Xinbo Gao; Wen Lu; Zongyi Xu; |
| 303 | Feeling Without Sharing: A Federated Video Emotion Recognition Framework Via Privacy-Agnostic Hybrid Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To mitigate the heterogeneous data, we propose EmoFed, a practical federated learning framework for video-based emotion recognition via multi-group clustering and privacy-agnostic hybrid aggregation. |
Fan Qi; Zixin Zhang; Xianshan Yang; Huaiwen Zhang; Changsheng Xu; |
| 304 | Adaptive Transformer-Based Conditioned Variational Autoencoder for Incomplete Social Event Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: (2) The majority of existing multi-modal methods simply concatenate the coarse-grained image features and text features of the event to get the multi-modal features to classify social events, which ignores the irrelevant multi-modal features and limits their modeling capabilities. To tackle these challenges, in this paper, we propose an Adaptive Transformer-Based Conditioned Variational Autoencoder Network (AT-CVAE) for incomplete social event classification. |
Zhangming Li; Shengsheng Qian; Jie Cao; Quan Fang; Changsheng Xu; |
| 305 | DetFusion: A Detection-driven Infrared and Visible Image Fusion Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For object detection tasks, object-related information in images is often more valuable than focusing on the pixel-level details of images alone. To fill this gap, we propose a detection-driven infrared and visible image fusion network, termed DetFusion, which utilizes object-related information learned in the object detection networks to guide multimodal image fusion. |
Yiming Sun; Bing Cao; Pengfei Zhu; Qinghua Hu; |
| 306 | CubeMLP: An MLP-based Model for Multimodal Sentiment Analysis and Depression Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce CubeMLP, a multimodal feature processing framework based entirely on MLP. |
Hao Sun; Hongyi Wang; Jiaqing Liu; Yen-Wei Chen; Lanfen Lin; |
| 307 | AtHom: Two Divergent Attentions Stimulated By Homomorphic Training in Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To generate semantically consistent high-resolution images, we propose a novel method named AtHom, in which two attention modules are developed to extract the relationships from both independent modality and unified modality. |
Zhenbo Shi; Zhi Chen; Zhenbo Xu; Wei Yang; Liusheng Huang; |
| 308 | Exploring Effective Knowledge Transfer for Few-shot Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these two distinct issues remain unsolved in most existing FSOD methods. In this paper, we propose to overcome these challenges by exploiting rich knowledge the model has learned and effectively transferring them to the novel classes. |
Zhiyuan Zhao; Qingjie Liu; Yunhong Wang; |
| 309 | Chinese Character Recognition with Augmented Character Profile Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Chinese character recognition method named Augmented Character Profile Matching (ACPM), which utilizes a collection of character knowledge from three decomposition levels to recognize Chinese characters. |
Xinyan Zu; Haiyang Yu; Bin Li; Xiangyang Xue; |
| 310 | Paint and Distill: Boosting 3D Object Detection with Semantic Passing Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel semantic passing framework, named SPNet, to boost the performance of existing lidar-based 3D detection models with the guidance of rich context painting, with no extra computation cost during inference. |
Bo Ju; Zhikang Zou; Xiaoqing Ye; Minyue Jiang; Xiao Tan; Errui Ding; Jingdong Wang; |
| 311 | Hybrid Spatial-Temporal Entropy Modelling for Neural Video Compression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, this paper proposes a powerful entropy model which efficiently captures both spatial and temporal dependencies. |
Jiahao Li; Bin Li; Yan Lu; |
| 312 | Learning Hybrid Behavior Patterns for Multimedia Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: And it is the lack of different behavior pattern constraints and multimodal feature reconciliations that results in performance degradation. Towards this end, we propose a Hybrid Clustering Graph Convolutional Network (HCGCN) for multimedia recommendation. |
Zongshen Mu; Yueting Zhuang; Jie Tan; Jun Xiao; Siliang Tang; |
| 313 | MF-Net: A Novel Few-shot Stylized Multilingual Font Generation Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a solution for few-shot multilingual stylized font generation by a fast feed-forward network, Multilingual Font Generation Network (MF-Net), which can transfer previously unseen font styles from a few samples to characters from previously unseen languages. |
Yufan Zhang; Junkai Man; Peng Sun; |
| 314 | DPCNet: Dual Path Multi-Excitation Collaborative Network for Facial Expression Representation Learning in Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current works of facial expression learning in video consume significant computational resources to learn spatial channel feature representations and temporal relationships. To mitigate this issue, we propose a Dual Path multi-excitation Collaborative Network (DPCNet) to learn the critical information for facial expression representation from fewer keyframes in videos. |
Yan Wang; Yixuan Sun; Wei Song; Shuyong Gao; Yiwen Huang; Zhaoyu Chen; Weifeng Ge; Wenqiang Zhang; |
| 315 | MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a Multimodal Pyramid Attentional Network (MM-Pyramid ) for event localization. |
Jiashuo Yu; Ying Cheng; Rui-Wei Zhao; Rui Feng; Yuejie Zhang; |
| 316 | Modality-aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we analyze the modality asynchrony and undifferentiated instances phenomena of the multiple instance learning (MIL) procedure, and further investigate its negative impact on weakly-supervised audio-visual learning. To address these issues, we propose a modality-aware contrastive instance learning with self-distillation (MACIL-SD) strategy. |
Jiashuo Yu; Jinyu Liu; Ying Cheng; Rui Feng; Yuejie Zhang; |
| 317 | Multi-Attention Network for Compressed Video Referring Object Segmentation Highlight: This may hamper its application in real-world, computing-resource-limited scenarios, such as autonomous cars and drones. To alleviate this problem, in this paper, we explore the referring object segmentation task on compressed videos, namely on the original video data flow. |
Weidong Chen; Dexiang Hong; Yuankai Qi; Zhenjun Han; Shuhui Wang; Laiyun Qing; Qingming Huang; Guorong Li; |
| 318 | Bayesian Based Re-parameterization for DNN Model Pruning Highlight: In this paper, we present a novel perspective of re-parametric pruning by Bayesian estimation. |
Xiaotong Lu; Teng Xi; Baopu Li; Gang Zhang; Weisheng Dong; Guangming Shi; |
| 319 | AEDNet: Asynchronous Event Denoising with Spatial-Temporal Correlation Among Irregular Data Highlight: Due to its irregular format and asynchronous readout, DVS data is usually transformed into a regular tensor (e.g., 3D voxel or image) for deep learning methods, which corrupts its asynchronous properties. To maintain asynchrony, we establish an innovative asynchronous event denoising neural network, named AEDNet, which directly consumes the correlation of the irregular signal in the spatial-temporal range without destroying its original structural property. |
Huachen Fang; Jinjian Wu; Leida Li; Junhui Hou; Weisheng Dong; Guangming Shi; |
| 320 | Towards Continual Adaptation in Industrial Anomaly Detection Highlight: Due to the limitation of flexibility and the requirements of realistic industrial scenarios, it is urgent to enhance the continual adaptation ability of AD models. Therefore, this paper proposes a unified framework by incorporating continual learning (CL) to achieve our newly designed task of continual anomaly detection (CAD). |
Wujin Li; Jiawei Zhan; Jinbao Wang; Bizhong Xia; Bin-Bin Gao; Jun Liu; Chengjie Wang; Feng Zheng; |
| 321 | MmBody Benchmark: 3D Body Reconstruction Dataset and Analysis for Millimeter Wave Radar Abstract: Millimeter Wave (mmWave) radar is gaining popularity as it can work in adverse environments like smoke, rain, snow, poor lighting, etc. Prior work has explored the possibility of … |
Anjun Chen; Xiangyu Wang; Shaohao Zhu; Yanxu Li; Jiming Chen; Qi Ye; |
| 322 | Defeating DeepFakes Via Adversarial Visual Reconstruction Highlight: In this work, we propose a proactive framework for combating DeepFakes before the data manipulations. |
Ziwen He; Wei Wang; Weinan Guan; Jing Dong; Tieniu Tan; |
| 323 | AVQA: A Dataset for Audio-Visual Question Answering on Videos Highlight: In this paper, to overcome the limitations of existing datasets, we introduce AVQA, a new audio-visual question answering dataset on videos in real-life scenarios. |
Pinci Yang; Xin Wang; Xuguang Duan; Hong Chen; Runze Hou; Cong Jin; Wenwu Zhu; |
| 324 | Attack Is The Best Defense: Towards Preemptive-Protection Person Re-Identification Highlight: We thus propose a novel preemptive-Protection person Re-IDentification (PRIDE) method. |
Lin Wang; Wanqian Zhang; Dayan Wu; Fei Zhu; Bo Li; |
| 325 | Domain Adaptation for Time-Series Classification to Mitigate Covariate Shift Highlight: This paper proposes a novel supervised DA approach based on two steps. |
Felix Ott; David Rügamer; Lucas Heublein; Bernd Bischl; Christopher Mutschler; |
| 326 | Search-oriented Micro-video Captioning Highlight: Thereafter, we present a flow-based diverse captioning model to generate different captions from consumers’ search demands. |
Liqiang Nie; Leigang Qu; Dai Meng; Min Zhang; Qi Tian; Alberto Del Bimbo; |
| 327 | Cross-Modality Domain Adaptation for Freespace Detection: A Simple Yet Effective Baseline Highlight: We develop a cross-modality domain adaptation framework which exploits both RGB images and surface normal maps generated from depth images. |
Yuanbin Wang; Leyan Zhu; Shaofei Huang; Tianrui Hui; Xiaojie Li; Fei Wang; Si Liu; |
| 328 | Skeleton2Humanoid: Animating Simulated Characters for Physically-plausible Motion In-betweening Highlight: However, modern deep learning based motion synthesis approaches barely consider the physical plausibility of synthesized motions and consequently usually produce unrealistic human motions. To solve this problem, we propose Skeleton2Humanoid, a system that performs physics-oriented motion correction at test time by regularizing synthesized skeleton motions in a physics simulator. |
Yunhao Li; Zhenbo Yu; Yucheng Zhu; Bingbing Ni; Guangtao Zhai; Wei Shen; |
| 329 | SD-GAN: Semantic Decomposition for Face Image Synthesis with Discrete Attribute Highlight: In this work, we propose an innovative framework to tackle challenging facial discrete attribute synthesis via semantic decomposing, dubbed SD-GAN. |
Kangneng Zhou; Xiaobin Zhu; Daiheng Gao; Kai Lee; Xinjie Li; Xu-cheng Yin; |
| 330 | Design What You Desire: Icon Generation from Orthogonal Application and Theme Labels Highlight: In this paper, we focus on a realistic business scenario: automated generation of customizable icons given desired mobile applications and theme styles. |
Yinpeng Chen; Zhiyu Pan; Min Shi; Hao Lu; Zhiguo Cao; Weicai Zhong; |
| 331 | Understanding News Text and Images Connection with Context-enriched Multimodal Transformers Highlight: In this paper, we address the complex challenge of connecting images to news text. |
Cláudio Bartolomeu; Rui Nóbrega; David Semedo; |
| 332 | MaMiCo: Macro-to-Micro Semantic Correspondence for Self-supervised Video Representation Learning Highlight: The temporal and spatial semantic correspondence across different granularities, i.e., video, clip, and frame levels, is typically overlooked. To tackle this issue, we propose a self-supervised Macro-to-Micro Semantic Correspondence (MaMiCo) learning framework, pursuing fine-grained spatiotemporal representations from a macro-to-micro perspective. |
Bo Fang; Wenhao Wu; Chang Liu; Yu Zhou; Dongliang He; Weiping Wang; |
| 333 | DAO: Dynamic Adaptive Offloading for Video Analytics Highlight: While adaptive video streaming for user viewing has been widely studied, none of the existing works can guarantee analytics accuracy at the server in a bandwidth- and content-adaptive way. To fill this gap, this paper presents DAO, a dynamic adaptive offloading framework for video analytics that jointly considers the dynamics of network bandwidth and video content. |
Taslim Murad; Anh Nguyen; Zhisheng Yan; |
| 334 | Weakly-supervised Disentanglement Network for Video Fingerspelling Detection Highlight: In this paper, we propose the Weakly-supervised Disentanglement Network, namely WED, which requires no additional knowledge and better exploits the video-sentence weak supervision. |
Ziqi Jiang; Shengyu Zhang; Siyuan Yao; Wenqiao Zhang; Sihan Zhang; Juncheng Li; Zhou Zhao; Fei Wu; |
| 335 | Dynamic Incomplete Multi-view Imputing and Clustering Highlight: However, 1) their imputation quality and clustering performance depend heavily on the static prior partition, such as predefined zero filling, destroying the diversity of different views; 2) the size of base partitions is too small, which loses advantageous details of base kernels and decreases clustering performance. To address these issues, we propose a novel IMVC method, named Dynamic Incomplete Multi-view Imputing and Clustering (DIMIC). |
Xingfeng Li; Quansen Sun; Zhenwen Ren; Yinghui Sun; |
| 336 | Exploring Negatives in Contrastive Learning for Unpaired Image-to-Image Translation Highlight: Unlike previous CL approaches that use negatives as much as possible, in this paper, we study the negatives from an information-theoretic perspective and introduce a new negative Pruning technology for Unpaired image-to-image Translation (PUT) by sparsifying and ranking the patches. |
Yupei Lin; Sen Zhang; Tianshui Chen; Yongyi Lu; Guangping Li; Yukai Shi; |
| 337 | Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations Highlight: Thus the generated explanations are less faithful to visual-language reasoning. To mitigate these problems, we propose a unified Chunk-aware Alignment and Lexical Constraint based method, dubbed CALeC. |
Qian Yang; Yunxin Li; Baotian Hu; Lin Ma; Yuxin Ding; Min Zhang; |
| 338 | Complementary Graph Representation Learning for Functional Neuroimaging Identification Highlight: In this paper, we propose a new graph convolutional network (GCN) method to capture local and global patterns for conducting dynamic functional connectivity analysis. |
Rongyao Hu; Liang Peng; Jiangzhang Gan; Xiaoshuang Shi; Xiaofeng Zhu; |
| 339 | Energy-Based Domain Generalization for Face Anti-Spoofing Highlight: In this paper, we resort to an energy-based model (EBM) to tackle FAS from a generative perspective. |
Zhekai Du; Jingjing Li; Lin Zuo; Lei Zhu; Ke Lu; |
| 340 | Self-Aligned Concave Curve: Illumination Enhancement for Unsupervised Adaptation Highlight: In this paper, we are the first to propose a learnable illumination enhancement model for high-level vision. |
Wenjing Wang; Zhengbo Xu; Haofeng Huang; Jiaying Liu; |
| 341 | SDRTV-to-HDRTV Via Hierarchical Dynamic Context Feature Mapping Highlight: In this work, we address the task of converting SDR videos to HDR videos (SDRTV-to-HDRTV conversion). |
Gang He; Kepeng Xu; Li Xu; Chang Wu; Ming Sun; Xing Wen; Yu-Wing Tai; |
| 342 | Long-Term Person Re-identification with Dramatic Appearance Change: Algorithm and Benchmark Highlight: Works on cross-appearance Re-ID, including datasets and algorithms, are still few. Therefore, this paper contributes a cross-season appearance-change Re-ID dataset, namely NKUP+, including more than 300 IDs from surveillance videos over 10 months, to support studies of cross-appearance Re-ID. |
Mengmeng Liu; Zhi Ma; Tao Li; Yanfeng Jiang; Kai Wang; |
| 343 | Align and Adapt: A Two-stage Adaptation Framework for Unsupervised Domain Adaptation Highlight: Early advances in domain adaptation focus on invariant representations learning (IRL) methods to align domain distributions. |
Yan Yu; Yuchen Zhai; Yin Zhang; |
| 344 | A Tree-Based Structure-Aware Transformer Decoder for Image-To-Markup Generation Highlight: In this paper, we propose TSDNet, a novel Tree-based Structure-aware Transformer Decoder NETwork to directly generate the tree representation of the target markup in a structure-aware manner. |
Shuhan Zhong; Sizhe Song; Guanyao Li; S.-H. Gary Chan; |
| 345 | Integrating Object-aware and Interaction-aware Knowledge for Weakly Supervised Scene Graph Generation Highlight: Hence, in this paper, we propose to enhance a simple grounding module with both object-aware and interaction-aware knowledge to acquire more reliable pseudo labels. |
Xingchen Li; Long Chen; Wenbo Ma; Yi Yang; Jun Xiao; |
| 346 | Prototype-based Selective Knowledge Distillation for Zero-Shot Sketch Based Image Retrieval Highlight: However, these methods often ignore the mispredictions of the teacher signal, which may make the model vulnerable when disturbed by the wrong output of the teacher network. To tackle the above issues, we propose a novel method termed Prototype-based Selective Knowledge Distillation (PSKD) for ZS-SBIR. |
Kai Wang; Yifan Wang; Xing Xu; Xin Liu; Weihua Ou; Huimin Lu; |
| 347 | EliMRec: Eliminating Single-modal Bias in Multimedia Recommendation Highlight: Owing to a defect in architecture, there is still room for improvement in recent multimedia recommendation. In this paper, we propose EliMRec, a generic and modal-agnostic framework to eliminate single-modal bias in multimedia recommendation. |
Xiaohao Liu; Zhulin Tao; Jiahong Shao; Lifang Yang; Xianglin Huang; |
| 348 | Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation Highlight: We present a simple yet effective self-supervised framework for audio-visual representation learning, to localize the sound source in videos. |
Jinxiang Liu; Chen Ju; Weidi Xie; Ya Zhang; |
| 349 | Self-Supervised Representation Learning for Skeleton-Based Group Activity Recognition Highlight: This paper aims at learning discriminative representation for GAR in a self-supervised manner based on human skeletons. |
Cunling Bian; Wei Feng; Song Wang; |
| 350 | Tracking Game: Self-adaptative Agent Based Multi-object Tracking Highlight: However, previous methods encounter tracking failures in complex scenes, since they lose most of the unique attributes of each target. In this paper, we formulate the MOT problem as Tracking Game and propose a Self-adaptative Agent Tracker (SAT) framework to solve this problem. |
Shuai Wang; Da Yang; Yubin Wu; Yang Liu; Hao Sheng; |
| 351 | Deep Evidential Learning with Noisy Correspondence for Cross-modal Retrieval Highlight: To address the issues, we propose a generalized Deep Evidential Cross-modal Learning framework (DECL), which integrates a novel Cross-modal Evidential Learning paradigm (CEL) and a Robust Dynamic Hinge loss (RDH) with positive and negative learning. |
Yang Qin; Dezhong Peng; Xi Peng; Xu Wang; Peng Hu; |
| 352 | DeepWSD: Projecting Degradations in Perceptual Space to Wasserstein Distance in Deep Feature Space Highlight: Existing deep learning-based full-reference IQA (FR-IQA) models usually predict the image quality in a deterministic way by explicitly comparing the features, gauging how severely distorted an image is by how far the corresponding feature lies from the space of the reference images. Herein, we look at this problem from a different viewpoint and propose to model the quality degradation in perceptual space from a statistical distribution perspective. |
Xingran Liao; Baoliang Chen; Hanwei Zhu; Shiqi Wang; Mingliang Zhou; Sam Kwong; |
| 353 | Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning Highlight: In this paper, we propose a noise-robust cross-lingual cross-modal retrieval method for low-resource languages. |
Yabing Wang; Jianfeng Dong; Tianxiang Liang; Minsong Zhang; Rui Cai; Xun Wang; |
| 354 | Improved Deep Unsupervised Hashing Via Prototypical Learning Highlight: In this research, we develop a simple yet effective approach named deeP UnsupeRvised hashing via Prototypical LEarning. |
Zeyu Ma; Wei Ju; Xiao Luo; Chong Chen; Xian-Sheng Hua; Guangming Lu; |
| 355 | High-Quality 3D Face Reconstruction with Affine Convolutional Networks Highlight: However, in 3D face reconstruction, the spatial misalignment between the input image (e.g. face) and the canonical/UV output makes the feature encoding-decoding process quite challenging. In this paper, to tackle this problem, we propose a new network architecture, namely Affine Convolution Networks, which enables CNN based approaches to handle spatially non-corresponding input and output images while maintaining high-fidelity output quality. |
Zhiqian Lin; Jiangke Lin; Lincheng Li; Yi Yuan; Zhengxia Zou; |
| 356 | TextBlock: Towards Scene Text Spotting Without Fine-grained Detection Highlight: Based on these questions, we propose a new perspective of coarse-grained detection with multi-instance recognition for text spotting. |
Jin Wei; Yuan Zhang; Yu Zhou; Gangyan Zeng; Zhi Qiao; Youhui Guo; Haiying Wu; Hongbin Wang; Weiping Wang; |
| 357 | Learning Intrinsic and Extrinsic Intentions for Cold-start Recommendation with Neural Stochastic Processes Highlight: In this paper, we propose an intention neural process model (INP) for user cold-start recommendation (i.e., users with very few historical interactions), a novel extension of the neural stochastic process family using a general meta-learning strategy with intrinsic and extrinsic intention learning for robust user preference learning. |
Huafeng Liu; Liping Jing; Dahai Yu; Mingjie Zhou; Michael Ng; |
| 358 | Text Style Transfer Based on Multi-factor Disentanglement and Mixture Highlight: In this paper, we propose a framework to disentangle text images into three factors: text content, font, and style features, and then remix the factors of different images to transfer a new style. |
Anna Zhu; Zhanhui Yin; Brian Kenji Iwana; Xinyu Zhou; Shengwu Xiong; |
| 359 | Rethinking Super-Resolution As Text-Guided Details Generation Highlight: In this paper, we propose a new perspective that regards SISR as a semantic image detail enhancement problem to generate semantically reasonable HR images that are faithful to the ground truth. |
Chenxi Ma; Bo Yan; Qing Lin; Weimin Tan; Siming Chen; |
| 360 | TPSNet: Reverse Thinking of Thin Plate Splines for Arbitrary Shape Scene Text Representation Highlight: Thin-Plate-Spline (TPS) transformation has achieved great success in scene text recognition. Inspired by this, we reverse its usage and take TPS as an exquisite representation for arbitrary-shape text. |
Wei Wang; Yu Zhou; Jiahao Lv; Dayan Wu; Guoqing Zhao; Ning Jiang; Weiping Wang; |
| 361 | DVR: Micro-Video Recommendation Optimizing Watch-Time-Gain Under Duration Bias Highlight: In this work, we focus on an essential bias in micro-video recommendation: duration bias. |
Yu Zheng; Chen Gao; Jingtao Ding; Lingling Yi; Depeng Jin; Yong Li; Meng Wang; |
| 362 | Cycle Self-Training for Semi-Supervised Object Detection with Distribution Consistency Reweighting Highlight: To address the coupling problem, we propose a Cycle Self-Training (CST) framework for SSOD, which consists of two teachers, T1 and T2, and two students, S1 and S2. |
Hao Liu; Bin Chen; Bo Wang; Chunpeng Wu; Feng Dai; Peng Wu; |
| 363 | Boosting Single-Frame 3D Object Detection By Simulating Multi-Frame Point Clouds Highlight: To boost a detector for single-frame 3D object detection, we present a new approach that trains it to simulate the features and responses of a detector trained on multi-frame point clouds. |
Wu Zheng; Li Jiang; Fanbin Lu; Yangyang Ye; Chi-Wing Fu; |
| 364 | MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes Highlight: Our method can handle input triangular meshes with arbitrary topologies (2K – 3M triangles). We present a novel training technique to train MESH2IR using energy decay relief and highlight its benefits. |
Anton Ratnarajah; Zhenyu Tang; Rohith Aralikatti; Dinesh Manocha; |
| 365 | Exploring High-quality Target Domain Information for Unsupervised Domain Adaptive Semantic Segmentation Highlight: In this paper, we propose a simple yet effective method that achieves performance competitive with advanced distillation methods. |
Junjie Li; Zilei Wang; Yuan Gao; Xiaoming Hu; |
| 366 | APPTracker: Improving Tracking Multiple Objects in Low-Frame-Rate Videos Highlight: To overcome the local nature of optical-flow-based methods, we propose an online tracking method by extending the CenterTrack architecture with a new head, named APP, to recognize unreliable displacement estimations. |
Tao Zhou; Wenhan Luo; Zhiguo Shi; Jiming Chen; Qi Ye; |
| 367 | Uncertainty-Aware 3D Human Pose Estimation from Monocular Video Highlight: Specifically, we exploit the EDL to measure the depth prediction uncertainty of the network, and decompose the x-y coordinates into individual distributions to model the deviation uncertainty of the inaccurate 2D keypoints. |
Jinlu Zhang; Yujin Chen; Zhigang Tu; |
| 368 | Phase-based Memory Network for Video Dehazing Highlight: In this work, we investigate in the frequency domain, which enables us to capture small motions effectively, and find that the phase component contains more semantic structures yet less haze information than the amplitude component of the hazy image. |
Ye Liu; Liang Wan; Huazhu Fu; Jing Qin; Lei Zhu; |
| 369 | One-step Low-Rank Representation for Clustering Highlight: In this paper, a novel one-step representation-based method, i.e., One-step Low-Rank Representation (OLRR), is proposed to capture multi-subspace structures for clustering. |
Zhiqiang Fu; Yao Zhao; Dongxia Chang; Yiming Wang; Jie Wen; Xingxing Zhang; Guodong Guo; |
| 370 | Towards Counterfactual Image Manipulation Via CLIP Highlight: An intriguing yet challenging problem arises: Can generative models achieve counterfactual editing against their learnt priors? Due to the lack of counterfactual samples in natural datasets, we investigate this problem in a text-driven manner with Contrastive-Language-Image-Pretraining (CLIP), which can offer rich semantic knowledge even for various counterfactual concepts. |
Yingchen Yu; Fangneng Zhan; Rongliang Wu; Jiahui Zhang; Shijian Lu; Miaomiao Cui; Xuansong Xie; Xian-Sheng Hua; Chunyan Miao; |
| 371 | Box-FaceS: A Bidirectional Method for Box-Guided Face Component Editing Highlight: Although existing methods have realized component editing with user-provided geometry guidance, such as masks or sketches, their performance is largely dependent on the user’s painting efforts. To address these issues, we propose Box-FaceS, a bidirectional method that can edit face components by simply translating and zooming the bounding boxes. |
Wenjing Huang; Shikui Tu; Lei Xu; |
| 372 | FastPR: One-stage Semantic Person Retrieval Via Self-supervised Learning Highlight: To solve the problems, we propose FastPR, a one-stage semantic person retrieval method via self-supervised learning, to optimize person localization and semantic retrieval simultaneously. |
Meng Sun; Ju Ren; Xin Wang; Wenwu Zhu; Yaoxue Zhang; |
| 373 | AI Illustrator: Translating Raw Descriptions Into Images By Prompt-based Cross-Modal Generation Highlight: AI illustrator aims to automatically design visually appealing images for books to provoke rich thoughts and emotions. To achieve this goal, we propose a framework for translating raw descriptions with complex semantics into semantically corresponding images. |
Yiyang Ma; Huan Yang; Bei Liu; Jianlong Fu; Jiaying Liu; |
| 374 | Visual Dialog for Spotting The Differences Between Pairs of Similar Images Abstract: Visual dialog has witnessed great progress after introducing various vision-oriented goals into the conversation. Much of previous work focuses on tasks where only one image can … |
Duo Zheng; Fandong Meng; Qingyi Si; Hairun Fan; Zipeng Xu; Jie Zhou; Fangxiang Feng; Xiaojie Wang; |
| 375 | Breaking Isolation: Multimodal Graph Fusion for Multimedia Recommendation By Edge-wise Modulation Highlight: We present a hard modulation and a soft modulation to fully investigate the underlying multimodal dynamics. |
Feiyu Chen; Junjie Wang; Yinwei Wei; Hai-Tao Zheng; Jie Shao; |
| 376 | Class Gradient Projection For Continual Learning Highlight: In this paper, we propose Class Gradient Projection (CGP), which calculates the gradient subspace from individual classes rather than tasks. |
Cheng Chen; Ji Zhang; Jingkuan Song; Lianli Gao; |
| 377 | A3GAN: Attribute-Aware Anonymization Networks for Face De-identification Highlight: In this work, for the first time, we approach face De-ID from the perspective of attribute editing and propose an attribute-aware anonymization network (A3GAN) by formulating face De-ID as a joint task of semantic suppression and controllable attribute injection. |
Liming Zhai; Qing Guo; Xiaofei Xie; Lei Ma; Yi Estelle Wang; Yang Liu; |
| 378 | Pixelwise Adaptive Discretization with Uncertainty Sampling for Depth Completion Highlight: In this paper, we are the first to consider the difference between pixels and propose Pixelwise Adaptive Discretization to generate tailored depth hypotheses for each pixel. |
Rui Peng; Tao Zhang; Bing Li; Yitong Wang; |
| 379 | Towards High-Fidelity Face Normal Estimation Highlight: Collecting such a high-fidelity database is difficult in practice, which prevents current methods from recovering face normals with fine-grained geometric details. To mitigate this issue, we propose a coarse-to-fine framework to estimate the face normal from an in-the-wild image with only a coarse exemplar reference. |
Meng Wang; Chaoyue Wang; Xiaojie Guo; Jiawan Zhang; |
| 380 | MVLayoutNet: 3D Layout Reconstruction with Multi-view Panoramas Highlight: We present MVLayoutNet, a network for holistic 3D reconstruction from multi-view panoramas. |
Zhihua Hu; Bo Duan; Yanfeng Zhang; Mingwei Sun; Jingwei Huang; |
| 381 | Structure-Enhanced Pop Music Generation Via Harmony-Aware Learning Highlight: In this paper, we propose to leverage harmony-aware learning for structure-enhanced pop music generation. |
Xueyao Zhang; Jinchao Zhang; Yao Qiu; Li Wang; Jie Zhou; |
| 382 | C3CMR: Cross-Modality Cross-Instance Contrastive Learning for Cross-Media Retrieval Highlight: Typically, existing approaches for this task mainly learn inter-modal invariance and focus on how to combine pair-level loss and class-level loss, which cannot effectively and adequately learn discriminative features. To address these issues, in this paper, we propose a novel Cross-Modality Cross-Instance Contrastive Learning for Cross-Media Retrieval (C3CMR) method. |
Junsheng Wang; Tiantian Gong; Zhixiong Zeng; Changchang Sun; Yan Yan; |
| 383 | Two-Stage Multi-Scale Resolution-Adaptive Network for Low-Resolution Face Recognition Highlight: In this paper, we propose a novel two-stage multi-scale resolution-adaptive network to learn more robust resolution-invariant representations. |
Haihan Wang; Shangfei Wang; Lin Fang; |
| 384 | Visual Grounding in Remote Sensing Images Highlight: In this paper, we collect a new visual grounding dataset, called RSVG, and design a new method, namely GeoVG. |
Yuxi Sun; Shanshan Feng; Xutao Li; Yunming Ye; Jian Kang; Xu Huang; |
| 385 | Enlarging The Long-time Dependencies Via RL-based Memory Network in Movie Affective Analysis Highlight: Limited by existing sequence models such as LSTM and Transformer, current works generally split movies into separate clips and predict the affective impacts (Valence/Arousal) independently, ignoring the long historical impacts across clips. In this paper, we introduce a novel Reinforcement learning based Memory Net (RMN) for this task, which allows the prediction for the current clip to rely on possibly related historical clips of the movie. |
Jie Zhang; Yin Zhao; Kai Qian; |
| 386 | SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization Highlight: Most FGVC approaches focus on attention mechanisms for mining discriminative regions while neglecting their interdependencies and the composed holistic object structure, which are essential for the model’s discriminative information localization and understanding ability. To address the above limitations, we propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into the transformer, enhancing discriminative representation learning to contain both appearance and structure information. |
Hongbo Sun; Xiangteng He; Yuxin Peng; |
| 387 | Enhancing Image Rescaling Using Dual Latent Variables in Invertible Neural Network Highlight: Normalizing flow models have been used successfully for generative image super-resolution (SR) by approximating the complex distribution of natural images with a simple tractable distribution in latent space through Invertible Neural Networks (INN). |
Min Zhang; Zhihong Pan; Xin Zhou; C.-C. Jay Kuo; |
| 388 | CACOLIT: Cross-domain Adaptive Co-learning for Imbalanced Image-to-Image Translation Highlight: In this work, we propose a new Cross-domain Adaptive Co-learning paradigm, CACOLIT, to alleviate the imbalanced unsupervised I2I training problem. |
Yijun Wang; Tao Liang; Jianxin Lin; |
| 389 | Rethinking Open-World Object Detection in Autonomous Driving Scenarios Highlight: Existing object detection models have been demonstrated to successfully discriminate and localize the predefined object categories under seen or similar situations. |
Zeyu Ma; Yang Yang; Guoqing Wang; Xing Xu; Heng Tao Shen; Mingxing Zhang; |
| 390 | Partially Relevant Video Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To fill the gap, we propose in this paper a novel T2VR subtask termed Partially Relevant Video Retrieval (PRVR). |
Jianfeng Dong; Xianke Chen; Minsong Zhang; Xun Yang; Shujie Chen; Xirong Li; Xun Wang; |
| 391 | Multigranular Visual-Semantic Embedding for Cloth-Changing Person Re-identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, visual-semantic information is also often ignored. To solve these issues, in this work, a novel multigranular visual-semantic embedding algorithm (MVSE) is proposed for cloth-changing person ReID, where visual semantic information and human attributes are embedded into the network, and the generalized features of human appearance can be well learned to effectively solve the problem of cloth-changing. |
Zan Gao; Hongwei Wei; Weili Guan; Weizhi Nie; Meng Liu; Meng Wang; |
| 392 | Model-Guided Multi-Contrast Deep Unfolding Network for MRI Super-resolution Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: 2) most existing SR reconstruction approaches only use a single contrast or use a simple multi-contrast fusion mechanism, neglecting the complex relationships between different contrasts that are critical for SR improvement. To deal with these issues, in this paper, a novel Model-Guided interpretable Deep Unfolding Network (MGDUN) for medical image SR reconstruction is proposed. |
Gang Yang; Li Zhang; Man Zhou; Aiping Liu; Xun Chen; Zhiwei Xiong; Feng Wu; |
| 393 | Cross-Modal Retrieval with Heterogeneous Graph Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose heterogeneous graph embeddings to preserve more abundant cross-modal information. |
Dapeng Chen; Min Wang; Haobin Chen; Lin Wu; Jing Qin; Wei Peng; |
| 394 | Multimodal In-bed Pose and Shape Estimation Under The Blankets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a multimodal approach to uncover the subjects and view bodies at rest without the blankets obscuring the view. |
Yu Yin; Joseph P. Robinson; Yun Fu; |
| 395 | Adma-GAN: Attribute-Driven Memory Augmented GANs for Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, directly utilizing the limited information in one sentence misses some key attribute descriptions, which are the crucial factors to describe an image accurately. To alleviate the above problem, we propose an effective text representation method with the complements of attribute information. |
Xintian Wu; Hanbin Zhao; Liangli Zheng; Shouhong Ding; Xi Li; |
| 396 | Relational Representation Learning in Visually-Rich Documents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In spite of their impressive results, we observe that the widespread relational hints (e.g., relation of key/value fields on receipts) built upon contextual knowledge are not excavated yet. To mitigate this gap, we propose DocReL, a Document Relational Representation Learning framework. |
Xin Li; Yan Zheng; Yiqing Hu; Haoyu Cao; Yunfei Wu; Deqiang Jiang; Yinsong Liu; Bo Ren; |
| 397 | Multi-Scale Coarse-to-Fine Transformer for Frame Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate the limitation, we propose a two-stage flow-free video interpolation architecture. |
Chen Li; Li Song; Xueyi Zou; Jiaming Guo; Youliang Yan; Wenjun Zhang; |
| 398 | Early-Learning Regularized Contrastive Learning for Cross-Modal Retrieval with Noisy Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the solution, we propose to project the multi-modal data to a shared feature space by contrastive learning, in which early learning regularization is employed to prevent the memorization of noisy labels when training the model, and the dynamic weight balance strategy is employed to alleviate clustering drift. |
Tianyuan Xu; Xueliang Liu; Zhen Huang; Dan Guo; Richang Hong; Meng Wang; |
| 399 | Content and Gradient Model-driven Deep Network for Single Image Reflection Removal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a content and gradient-guided deep network (CGDNet) for single image reflection removal, which is a full-interpretable and model-driven network. |
Ya-Nan Zhang; Linlin Shen; Qiufu Li; |
| 400 | Scale-flow: Estimating 3D Motion from Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the scale matching, we propose a unified framework: Scale-flow, which combines scale matching and optical flow estimation. |
Han Ling; Quansen Sun; Zhenwen Ren; Yazhou Liu; Hongyuan Wang; Zichen Wang; |
| 401 | Inferring Speaking Styles from Multi-modal Conversational Context By Multi-scale Relational Graph Convolutional Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, to learn the dependencies in conversations at both global and local scales and to improve the synthesis of speaking styles, we propose a context modeling method which models the dependencies among the multi-modal information in context with multi-scale relational graph convolutional network (MSRGCN). |
Jingbei Li; Yi Meng; Xixin Wu; Zhiyong Wu; Jia Jia; Helen Meng; Qiao Tian; Yuping Wang; Yuxuan Wang; |
| 402 | Synthetic Data Supervised Salient Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although deep salient object detection (SOD) has achieved remarkable progress, deep SOD models are extremely data-hungry, requiring large-scale pixel-wise annotations to deliver such promising results. In this paper, we propose a novel yet effective method for SOD, coined SODGAN, which can generate infinite high-quality image-mask pairs requiring only a few labeled data, and these synthesized pairs can replace the human-labeled DUTS-TR to train any off-the-shelf SOD model. |
Zhenyu Wu; Lin Wang; Wei Wang; Tengfei Shi; Chenglizhao Chen; Aimin Hao; Shuo Li; |
| 403 | PassWalk: Spatial Authentication Leveraging Lateral Shift and Gaze on Mobile Headsets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present PassWalk, a keyboard-less authentication system leveraging multi-modal inputs on mobile headsets. |
Abhishek Kumar; Lik-Hang Lee; Jagmohan Chauhan; Xiang Su; Mohammad A. Hoque; Susanna Pirttikangas; Sasu Tarkoma; Pan Hui; |
| 404 | OCR-Pose: Occlusion-aware Contrastive Representation for Unsupervised 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Occlusion is a significant problem in 3D human pose estimation from the 2D counterpart. On one hand, without explicit annotation, the 3D skeleton is hard to be accurately … |
Junjie Wang; Zhenbo Yu; Zhengyan Tong; Hang Wang; Jinxian Liu; Wenjun Zhang; Xiaoyan Wu; |
| 405 | MimCo: Masked Image Modeling Pre-training with Contrastive Teacher Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel and flexible pre-training framework, named MimCo, which combines MIM and contrastive learning through two-stage pre-training. |
Qiang Zhou; Chaohui Yu; Hao Luo; Zhibin Wang; Hao Li; |
| 406 | Phoneme-Aware Adaptation with Discrepancy Minimization and Dynamically-Classified Vector for Text-independent Speaker Verification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Meanwhile, the utilization of these massive mismatched data and application of these auxiliary tasks may bring many rich features that could be exploited. In this paper, we propose a phoneme-aware adaptation network with discrepancy minimization and dynamically-classified vector for text-independent speaker verification to address the abovementioned challenges. |
Jia Wang; Tianhao Lan; Jie Chen; Chengwen Luo; Chao Wu; Jianqiang Li; |
| 407 | PC2-PU: Patch Correlation and Point Correlation for Effective Point Cloud Upsampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel method PC²-PU, which explores patch-to-patch and point-to-point correlations for more effective and robust point cloud upsampling. |
Chen Long; WenXiao Zhang; Ruihui Li; Hao Wang; Zhen Dong; Bisheng Yang; |
| 408 | Feature and Semantic Views Consensus Hashing for Image Set Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the existing ISC methods face two problems: (1) The high computational cost prohibits these methods from being applied into median or large-scale applications; (2) the consensus information between feature and semantic representation of image set are largely ignored. To overcome these issues, in this paper, we propose a novel ISC method, termed feature and semantic views consensus hashing (FSVCH). |
Yuan Sun; Dezhong Peng; Haixiao Huang; Zhenwen Ren; |
| 409 | Progressive Limb-Aware Virtual Try-On Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In addition, these methods usually mask the limb textures of the input for the clothing-agnostic person representation, which results in inaccurate predictions for human limb regions (i.e., the exposed arm skin), especially when transforming between long-sleeved and short-sleeved garments. To address these problems, we present a progressive virtual try-on framework, named PL-VTON, which performs pixel-level clothing warping based on multiple attributes of clothing and embeds explicit limb-aware features to generate photo-realistic try-on results. |
Xiaoyu Han; Shengping Zhang; Qinglin Liu; Zonglin Li; Chenyang Wang; |
| 410 | Self-Supervised Graph Neural Network for Multi-Source Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately, traditional multi-task self-supervised learning faces two challenges: (1) the pretext task may not strongly relate to the downstream task, thus it could be difficult to learn useful knowledge being shared from the pretext task to the target task; (2) when the same feature extractor is shared between the pretext task and the downstream one and only different prediction heads are used, it is ineffective to enable inter-task information exchange and knowledge sharing. To address these issues, we propose a novel Self-Supervised Graph Neural Network (SSG), where a graph neural network is used as the bridge to enable more effective inter-task information exchange and knowledge sharing. |
Jin Yuan; Feng Hou; Yangzhou Du; Zhongchao Shi; Xin Geng; Jianping Fan; Yong Rui; |
| 411 | Generic Image Manipulation Localization Through The Lens of Multi-scale Spatial Inconsistence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, we argue that these methods struggle with the modeling of spatial inconsistency within multi-scale, resulting in sub-optimal model performance. To overcome this problem, in this paper, we propose a novel end-to-end method to identify the multi-scale spatial inconsistency for image manipulation localization (abbreviated as MSI), where the multi-scale edge-guided attention stream (MEA) and multi-scale context-aware search stream (MCS) are jointly explored in a unified framework; moreover, multi-scale information is efficiently used. |
Zan Gao; Shenghao Chen; Yangyang Guo; Weili Guan; Jie Nie; Anan Liu; |
| 412 | Geometry-Aware Reference Synthesis for Multi-View Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent multi-view multimedia applications struggle between high-resolution (HR) visual experience and storage or bandwidth constraints. Therefore, this paper proposes a Multi-View Image Super-Resolution (MVISR) task. |
Ri Cheng; Yuqi Sun; Bo Yan; Weimin Tan; Chenxi Ma; |
| 413 | Cross-modal Co-occurrence Attributes Alignments for Person Search By Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although great efforts have been made to align images with sentences, the challenge of reporting bias, i.e., attributes are only partially matched across modalities, still incurs large noise and influences the accurate retrieval seriously. To address this challenge, we propose a novel cross-modal matching method named Cross-modal Co-occurrence Attributes Alignments (C2A2), which can better deal with noise and obtain significant improvements in retrieval performance for person search by language. |
Kai Niu; Linjiang Huang; Yan Huang; Peng Wang; Liang Wang; Yanning Zhang; |
| 414 | A Unified Framework Against Topology and Class Imbalance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To solve the complicated imbalance problem, we propose a unified topology-aware AUC optimization (TOPOAUC) framework, which could simultaneously deal with the topology and class imbalance problem in graph learning. |
Junyu Chen; Qianqian Xu; Zhiyong Yang; Xiaochun Cao; Qingming Huang; |
| 415 | Recurrent Meta-Learning Against Generalized Cold-start Problem in CTR Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by this, a generalized definition of the cold-start problem is provided where both new users/ads and recent behavioral data from known users are considered. To attack this problem, we propose a recursive meta-learning model with the user’s behavior sequence prediction as a separate training task. |
Junyu Chen; Qianqian Xu; Zhiyong Yang; Ke Ma; Xiaochun Cao; Qingming Huang; |
| 416 | CyclicShift: A Data Augmentation Method For Enriching Data Patterns Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple yet effective data augmentation strategy, dubbed CyclicShift, to enrich data patterns. |
Hui Lu; Xuan Cheng; Wentao Xia; Pan Deng; MingHui Liu; Tianshu Xie; XiaoMin Wang; Ming Liu; |
| 417 | A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work presents a unified end-to-end retriever-reader framework towards knowledge-based VQA. |
Yangyang Guo; Liqiang Nie; Yongkang Wong; Yibing Liu; Zhiyong Cheng; Mohan Kankanhalli; |
| 418 | Deepfake Video Detection with Spatiotemporal Dropout Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by that, this paper proposes a simple yet effective patch-level approach to facilitate deepfake video detection via spatiotemporal dropout transformer. |
DaiChi Zhang; Fanzhao Lin; Yingying Hua; Pengju Wang; Dan Zeng; Shiming Ge; |
| 419 | Few-shot X-ray Prohibited Item Detection: A Benchmark and Weak-feature Enhancement Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, professional X-ray FSOD evaluation benchmarks and effective models of this scenario have been rarely studied in recent years. Therefore, in this paper, we propose the first X-ray FSOD dataset on the typical industrial X-ray security inspection scenario consisting of 12,333 images and 41,704 instances from 20 categories, which could benchmark and promote FSOD studies in such more challenging scenarios. |
Renshuai Tao; Tianbo Wang; Ziyang Wu; Cong Liu; Aishan Liu; Xianglong Liu; |
| 420 | Towards Accurate Post-Training Quantization for Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find the main reasons lie in (1) the existing calibration metric is inaccurate in measuring the quantization influence for extremely low-bit representation, and (2) the existing quantization paradigm is unfriendly to the power-law distribution of Softmax. Based on these observations, we propose a novel Accurate Post-training Quantization framework for Vision Transformer, namely APQ-ViT. |
Yifu Ding; Haotong Qin; Qinghua Yan; Zhenhua Chai; Junjie Liu; Xiaolin Wei; Xianglong Liu; |
| 421 | Towards Causality Inference for Very Important Person Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we re-built a complex uncontrolled conditions (CUC) dataset to make the VIPLoc closer to the actual situation, containing no, single, and multiple VIPs. |
Xiao Wang; Zheng Wang; Wu Liu; Xin Xu; Qijun Zhao; Shin’ichi Satoh; |
| 422 | Relative Pose Estimation for Multi-Camera Systems from Point Correspondences with Scale Ratio Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper exploits known scale ratios besides the point coordinates, which are also intrinsically provided by scale invariant feature detectors (e.g., SIFT). |
Banglei Guan; Ji Zhao; |
| 423 | DOMFN: A Divergence-Orientated Multi-Modal Fusion Network for Resume Assessment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by practical resume evaluations that consider both the content and layout, we construct the multi-modalities from resumes but face a new challenge that sometimes the performance of multi-modal fusion is even worse than the best uni-modality. In this paper, we experimentally find that this phenomenon is due to the cross-modal divergence. |
Yang Yang; Jingshuai Zhang; Fan Gao; Xiaoru Gao; Hengshu Zhu; |
| 424 | Learning from Different Text-image Pairs: A Relation-enhanced Graph Convolutional Network for Multimodal NER Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we primarily explore two kinds of external matching relations between different (text, image) pairs, i.e., inter-modal relations and intra-modal relations. |
Fei Zhao; Chunhui Li; Zhen Wu; Shangyu Xing; Xinyu Dai; |
| 425 | Marior: Margin Removal and Iterative Content Rectification for Document Dewarping in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To the best of our knowledge, there is still no complete and effective pipeline for rectifying document images in the wild. To address this issue, we propose a novel approach called Marior (Margin Removal and Iterative Content Rectification). |
Jiaxin Zhang; Canjie Luo; Lianwen Jin; Fengjun Guo; Kai Ding; |
| 426 | SongDriver: Real-time Music Accompaniment Generation Without Logical Latency Nor Exposure Bias Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose SongDriver, a real-time music accompaniment generation system without logical latency nor exposure bias. |
Zihao Wang; Kejun Zhang; Yuxing Wang; Chen Zhang; Qihao Liang; Pengfei Yu; Yongsheng Feng; Wenbo Liu; Yikai Wang; Yuntao Bao; Yiheng Yang; |
| 427 | Dynamic Prototype Mask for Occluded Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To escape from the extra pre-trained networks and achieve an automatic alignment in an end-to-end trainable network, we propose a novel Dynamic Prototype Mask (DPM) based on two self-evident prior knowledge. |
Lei Tan; Pingyang Dai; Rongrong Ji; Yongjian Wu; |
| 428 | Not All Pixels Are Matched: Dense Contrastive Learning for Cross-Modality Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a dense contrastive learning framework (DCLNet), which performs pixel-to-pixel dense alignment acting on the intermediate representations, rather than the final deep feature. |
Hanzhe Sun; Jun Liu; Zhizhong Zhang; Chengjie Wang; Yanyun Qu; Yuan Xie; Lizhuang Ma; |
| 429 | Self-supervised Multi-view Stereo Via Inter and Intra Network Pseudo Depth Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a self-supervised dual network MVS framework with inter and intra network pseudo depth labels for more powerful supervision guidance. |
Ke Qiu; Yawen Lai; Shiyi Liu; Ronggang Wang; |
| 430 | A Multi-view Spectral-Spatial-Temporal Masked Autoencoder for Decoding Emotions with Self-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, daily collected EEG data may be partially damaged since EEG signals are sensitive to noise. In this paper, we propose a Multi-view Spectral-Spatial-Temporal Masked Autoencoder (MV-SSTMA) with self-supervised learning to tackle these challenges towards daily applications. |
Rui Li; Yiting Wang; Wei-Long Zheng; Bao-Liang Lu; |
| 431 | Calibrating Class Weights with Multi-Modal Information for Partial Video Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It utilizes a novel class weight calibration method to alleviate the negative transfer caused by incorrect class weights. |
Xiyu Wang; Yuecong Xu; Jianfei Yang; Kezhi Mao; |
| 432 | Delving Into The Frequency: Temporally Consistent Human Motion Transfer in The Fourier Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nonetheless, there is no work to study the temporal inconsistency of synthetic videos from the aspects of the frequency-domain gap between natural and synthetic videos. Therefore, in this paper, we propose to delve into the frequency space for temporally consistent human motion transfer. |
Guang Yang; Wu Liu; Xinchen Liu; Xiaoyan Gu; Juan Cao; Jintao Li; |
| 433 | Long-term Leap Attention, Short-term Periodic Shift for Video Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The existing works treat the temporal axis as a simple extension of spatial axes, focusing on shortening the spatio-temporal sequence by either generic pooling or local windowing without utilizing temporal redundancy. However, videos naturally contain redundant information between neighboring frames; thus, we could potentially suppress attention on visually similar frames in a dilated manner. Based on this hypothesis, we propose the LAPS, a long-term Leap Attention (LA), short-term Periodic Shift (P-Shift) module for video transformers, with (2TN²) complexity. |
Hao Zhang; Lechao Cheng; Yanbin Hao; Chong-wah Ngo; |
| 434 | Multi-Level Spatiotemporal Network for Video Summarization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a natural phenomenon, video fragments in different shots are richer in semantics than frames. We leverage this as a free latent supervision signal and introduce a novel model named multi-level spatiotemporal network (MLSN). |
Ming Yao; Yu Bai; Wei Du; Xuejun Zhang; Heng Quan; Fuli Cai; Hongwei Kang; |
| 435 | Sophon: Super-Resolution Enhanced 360° Video Streaming with Visual Saliency-aware Prefetch Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present Sophon, a buffer-based and neural-enhanced streaming framework, which exploits the double buffer design, super-resolution technique, and viewport-aware strategy to improve user experience. |
Jianxin Shi; Lingjun Pu; Xinjing Yuan; Qianyun Gong; Jingdong Xu; |
| 436 | Representation Learning Through Multimodal Attention and Time-Sync Comments for Affective Video Content Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Temporal-Aware Multimodal (TAM) method to fully capture the temporal information. |
Jicai Pan; Shangfei Wang; Lin Fang; |
| 437 | Rate-Distortion-Guided Learning Approach with Cross-Projection Information for V-PCC Fast CU Decision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel rate-distortion-guided fast attribute coding unit (CU) partitioning approach with cross-projection information in V-PCC all-intra (AI) coding. |
Hang Yuan; Wei Gao; Ge Li; Zhu Li; |
| 438 | Decoupling Recognition from Detection: Single Shot Self-Reliant Scene Text Spotter Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: 2) The RoI cropping which bridges the detection and recognition brings noise from background and leads to information loss when pooling or interpolating from feature maps. In this work we propose the single shot Self-Reliant Scene Text Spotter (SRSTS), which circumvents these limitations by decoupling recognition from detection. |
Jingjing Wu; Pengyuan Lyu; Guangming Lu; Chengquan Zhang; Kun Yao; Wenjie Pei; |
| 439 | MAFW: A Large-scale, Multi-modal, Compound Affective Database for Dynamic Facial Expression Recognition in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose MAFW, a large-scale multi-modal compound affective database with 10,045 video-audio clips in the wild. |
Yuanyuan Liu; Wei Dai; Chuanxu Feng; Wenbin Wang; Guanghao Yin; Jiabei Zeng; Shiguang Shan; |
| 440 | Factorized and Controllable Neural Re-Rendering of Outdoor Scene for Photo Extrapolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a factorized neural re-rendering model to produce photorealistic novel views from cluttered outdoor Internet photo collections, which enables the applications including controllable scene re-rendering, photo extrapolation and even extrapolated 3D photo generation. |
Boming Zhao; Bangbang Yang; Zhenyang Li; Zuoyue Li; Guofeng Zhang; Jiashu Zhao; Dawei Yin; Zhaopeng Cui; Hujun Bao; |
| 441 | Generative Steganography Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an advanced generative steganography network (GSN) that can generate realistic stego images without using cover images. |
Ping Wei; Sheng Li; Xinpeng Zhang; Ge Luo; Zhenxing Qian; Qing Zhou; |
| 442 | Best of Both Worlds: See and Understand Clearly in The Dark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing techniques usually focus on only one task (e.g., enhancement) and lose sight of the others (e.g., detection), making it difficult to perform all of them well at the same time. To overcome this limitation, we propose a new method that can handle visual quality enhancement and semantic-related tasks (e.g., detection, segmentation) simultaneously in a unified framework. |
Xinwei Xue; Jia He; Long Ma; Yi Wang; Xin Fan; Risheng Liu; |
| 443 | Video-Guided Curriculum Learning for Spoken Video Grounding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a new task, spoken video grounding (SVG), which aims to localize the desired video fragments from spoken language descriptions. |
Yan Xia; Zhou Zhao; Shangwei Ye; Yang Zhao; Haoyuan Li; Yi Ren; |
| 444 | PixelSeg: Pixel-by-Pixel Stochastic Semantic Segmentation for Ambiguous Medical Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a semantic segmentation framework, PixelSeg, for modelling aleatoric uncertainty in segmentation maps and generating multiple plausible hypotheses. |
Wei Zhang; Xiaohong Zhang; Sheng Huang; Yuting Lu; Kun Wang; |
| 445 | A Probabilistic Model for Controlling Diversity and Accuracy of Ambiguous Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel probabilistic segmentation model, called Joint Probabilistic U-net, which successfully achieves flexible control over the two abstract conceptions of diversity and accuracy. |
Wei Zhang; Xiaohong Zhang; Sheng Huang; Yuting Lu; Kun Wang; |
| 446 | XCloth: Extracting Template-free Textured 3D Clothes from A Monocular Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More specifically, we propose to extend PeeledHuman representation to predict the pixel-aligned, layered depth and semantic maps to extract 3D garments. |
Astitva Srivastava; Chandradeep Pokhariya; Sai Sagar Jinka; Avinash Sharma; |
| 447 | Semantics-Driven Generative Replay for Few-Shot Class Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, generative replay can tackle both the stability and plasticity of the models simultaneously by generating a large number of class-conditional samples. Convinced by this fact, we propose a generative modeling-based FSCIL framework using the paradigm of memory-replay in which a novel conditional few-shot generative adversarial network (GAN) is incrementally trained to produce visual features while ensuring the stability-plasticity trade-off through novel loss functions and combating the mode-collapse problem effectively. |
Aishwarya Agarwal; Biplab Banerjee; Fabio Cuzzolin; Subhasis Chaudhuri; |
| 448 | FedMed-ATL: Misaligned Unpaired Cross-Modality Neuroimage Synthesis Via Affine Transform Loss Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel federated self-supervised learning (FedMed) for brain image synthesis. |
Jinbao Wang; Guoyang Xie; Yawen Huang; Yefeng Zheng; Yaochu Jin; Feng Zheng; |
| 449 | Fast Hierarchical Deep Unfolding Network for Image Compressed Sensing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, by unfolding the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA), a novel fast hierarchical DUN, dubbed FHDUN, is proposed for image compressed sensing, in which a well-designed hierarchical unfolding architecture is developed to cooperatively explore richer contextual prior information in multi-scale spaces. |
Wenxue Cui; Shaohui Liu; Debin Zhao; |
| 450 | Delving Globally Into Texture and Structure for Image Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we delve globally into texture and structure information to well capture the semantics for image inpainting. |
Haipeng Liu; Yang Wang; Meng Wang; Yong Rui; |
| 451 | Webly Supervised Image Hashing with Lightweight Semantic Transfer Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Webly Supervised Image Hashing (WSIH) with a well-designed lightweight network. |
Hui Cui; Lei Zhu; Jingjing Li; Zheng Zhang; Weili Guan; |
| 452 | Image-Signal Correlation Network for Textile Fiber Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we construct the first NIR signal-microscope image textile fiber composition dataset (NIRITFC). |
Bo Peng; Liren He; Yining Qiu; Wu Dong; Mingmin Chi; |
| 453 | Generalized Global Ranking-Aware Neural Architecture Ranker for Efficient Image Classifier Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Neural Architecture Search (NAS) is a powerful tool for automating the design of effective image-processing DNNs. Ranking has been advocated to design an efficient performance … |
Bicheng Guo; Tao Chen; Shibo He; Haoyu Liu; Lilin Xu; Peng Ye; Jiming Chen; |
| 454 | Query-driven Generative Network for Document Information Extraction in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast to existing studies mainly tailored for document cases in known templates with predefined layouts and keys under ideal input without OCR errors involved, we aim to build up a more practical DIE paradigm for real-world scenarios where input document images may contain unknown layouts and keys in scenes with problematic OCR results. To achieve this goal, we propose a novel architecture, termed Query-driven Generative Network (QGN), which is equipped with two consecutive modules, i.e., Layout Context-aware Module (LCM) and Structured Generation Module (SGM). |
Haoyu Cao; Xin Li; Jiefeng Ma; Deqiang Jiang; Antai Guo; Yiqing Hu; Hao Liu; Yinsong Liu; Bo Ren; |
| 455 | MONOPOLY: Financial Prediction from MONetary POLicY Conference Videos Using Multimodal Cues Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MPCNet, a competitive baseline architecture that takes advantage of the cross-modal transformer blocks and modality-specific attention fusion to forecast the financial risk and price movement associated with the MPC calls. |
Puneet Mathur; Atula Neerkaje; Malika Chhibber; Ramit Sawhney; Fuming Guo; Franck Dernoncourt; Sanghamitra Dutta; Dinesh Manocha; |
| 456 | Efficient Anchor Learning-based Multi-view Clustering — A Late Fusion Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in the existing methods, the anchor points are usually generated through sampling or linearly combining the samples within the datasets, which could result in enormous time consumption and limited representation capability. To solve this problem, our method learns the view-specific anchor points directly. |
Tiejian Zhang; Xinwang Liu; En Zhu; Sihang Zhou; Zhibin Dong; |
| 457 | Image Understanding By Captioning with Differentiable Architecture Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Designing a proper image captioning encoder-decoder architecture manually is a difficult challenge due to the complexity of recognizing the critical objects of the input images and their relationships to generate caption descriptions. To address this issue, we propose a three-level optimization method that employs differentiable architecture search strategies to seek the most suitable architecture for image captioning automatically. |
Ramtin Hosseini; Pengtao Xie; |
| 458 | Synthesizing Counterfactual Samples for Effective Image-Text Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, the truly informative negative samples are quite sparse in the training data and are hard to obtain from a randomly sampled mini-batch. Motivated by causal inference, we aim to overcome this shortcoming by carefully analyzing the analogy between hard negative mining and optimizing causal effects. |
Hao Wei; Shuhui Wang; Xinzhe Han; Zhe Xue; Bin Ma; Xiaoming Wei; Xiaolin Wei; |
| 459 | MVSPlenOctree: Fast and Generic Reconstruction of Radiance Fields in PlenOctree from Multi-view Stereo Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present MVSPlenOctree, a novel approach that can efficiently reconstruct radiance fields for view synthesis. |
Wenpeng Xing; Jie Chen; |
| 460 | Uncertainty-Aware Semi-Supervised Learning of 3D Face Rigging from Single Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a method to rig 3D faces via Action Units (AUs), viewpoint and light direction, from a single input image. |
Yong Zhao; Haifeng Chen; Hichem Sahli; Ke Lu; Dongmei Jiang; |
| 461 | Cyclical Fusion: Accurate 3D Reconstruction Via Cyclical Monotonicity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the projective correspondences are highly unreliable due to sensor depth and pose uncertainties. To tackle this challenge, we introduce a geometry-driven fusion framework, Cyclical Fusion. |
Duo Chen; Zixin Tang; Yiguang Liu; |
| 462 | Multimodal Hate Speech Detection Via Cross-Domain Knowledge Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A scalable cross-domain knowledge transfer (CDKT) framework is proposed, where the mainstream vision-language transformer could be employed as backbone flexibly. |
Chuanpeng Yang; Fuqing Zhu; Guihua Liu; Jizhong Han; Songlin Hu; |
| 463 | BlumNet: Graph Component Detection for Object Skeleton Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a simple yet efficient framework, BlumNet, for extracting object skeletons in natural images and binary shapes. |
Yulu Zhang; Liang Sang; Marcin Grzegorzek; John See; Cong Yang; |
| 464 | Improving Fusion of Region Features and Grid Features Via Two-Step Interaction for Image-Text Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework, which fuses the region features and grid features through a two-step interaction strategy, thus extracting a more comprehensive image representation for image-text retrieval. |
Dongqing Wu; Huihui Li; Cang Gu; Lei Guo; Hang Liu; |
| 465 | Show Me What I Like: Detecting User-Specific Video Highlights Using Content-Based Multi-Head Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method to detect individualized highlights for users on given target videos based on their preferred highlight clips marked on previous videos they have watched. |
Uttaran Bhattacharya; Gang Wu; Stefano Petrangeli; Viswanathan Swaminathan; Dinesh Manocha; |
| 466 | Effective Video Abnormal Event Detection By Learning A Consistency-Aware High-Level Feature Extractor Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To better exploit high-level semantics for VAD, we propose a novel paradigm that performs VAD by learning a Consistency-Aware high-level Feature Extractor (CAFE). |
Guang Yu; Siqi Wang; Zhiping Cai; Xinwang Liu; Chengkun Wu; |
| 467 | Brain Topography Adaptive Network for Satisfaction Modeling in Interactive Information Access System Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to explore the benefits of using Electroencephalography (EEG) signals for satisfaction modeling in interactive information access system design. |
Ziyi Ye; Xiaohui Xie; Yiqun Liu; Zhihong Wang; Xuesong Chen; Min Zhang; Shaoping Ma; |
| 468 | CLUT-Net: Learning Adaptively Compressed Representations of 3DLUTs for Lightweight Image Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through in-depth analysis of the inherent compressibility of 3DLUT, we propose an effective Compressed representation of 3-dimensional LookUp Table (CLUT) which maintains the powerful mapping capability of 3DLUT but with a significantly reduced parameter amount. |
Fengyi Zhang; Hui Zeng; Tianjun Zhang; Lin Zhang; |
| 469 | Less Is More: Consistent Video Depth Estimation with Masked Frames Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Since videos inherently exist with heavy temporal redundancy, a missing frame could be recovered from neighboring ones. Inspired by this, we propose the frame masking network (FMNet), a spatial-temporal transformer network predicting the depth of masked frames based on their neighboring frames. |
Yiran Wang; Zhiyu Pan; Xingyi Li; Zhiguo Cao; Ke Xian; Jianming Zhang; |
| 470 | Crossmodal Few-shot 3D Point Cloud Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In practice, such data labeling usually requires manual annotation of large-scale points in 3D space, which can be very difficult and laborious. To address this problem, in this paper we introduce a novel crossmodal few-shot learning approach for 3D point cloud semantic segmentation. |
Ziyu Zhao; Zhenyao Wu; Xinyi Wu; Canyu Zhang; Song Wang; |
| 471 | Dynamic Weighted Semantic Correspondence for Few-Shot Image Generative Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce two novel methods to address diversity and fidelity, respectively. |
Xingzhong Hou; Boxiao Liu; Shuai Zhang; Lulin Shi; Zite Jiang; Haihang You; |
| 472 | VigilanceNet: Decouple Intra- and Inter-Modality Learning for Multimodal Vigilance Estimation in RSVP-Based BCI Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, for intra-modality, we introduce an intra-modality representation learning (intra-RL) method to obtain effective representations of each modality by letting each modality independently predict vigilance levels during the multimodal training process. |
Xinyu Cheng; Wei Wei; Changde Du; Shuang Qiu; Sanli Tian; Xiaojun Ma; Huiguang He; |
| 473 | OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, scene segmentation focuses more on the local difference between adjacent shots, while classification needs the global representation of scene segments, which can leave the model dominated by one of the two tasks during training. In this paper, taking an alternate perspective to overcome the above challenges, we unite these two tasks into one by predicting shot links: a link connects two adjacent shots, indicating that they belong to the same scene or category. |
Ye Liu; Lingfeng Qiao; Di Yin; Zhuoxuan Jiang; Xinghua Jiang; Deqiang Jiang; Bo Ren; |
| 474 | Learning Occlusion-aware Coarse-to-Fine Depth Map for Self-supervised Monocular Depth Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Self-supervised monocular depth estimation, aiming to learn scene depths from single images in a self-supervised manner, has received much attention recently. In spite of recent … |
Zhengming Zhou; Qiulei Dong; |
| 475 | Semi-supervised Learning for Multi-label Video Action Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Compared to the single-label scenario, semi-supervised learning in multi-label video action detection is more challenging due to two significant issues: generation of multiple pseudo labels and class-imbalanced data distribution. In this paper, we propose an effective semi-supervised learning method to tackle these challenges. |
Hongcheng Zhang; Xu Zhao; Dongqi Wang; |
| 476 | Grouped Adaptive Loss Weighting for Person Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a Grouped Adaptive Loss Weighting (GALW) method which adjusts the weight of each task automatically and dynamically. |
Yanling Tian; Di Chen; Yunan Liu; Shanshan Zhang; Jian Yang; |
| 477 | Source-Free Active Domain Adaptation Via Energy-Based Locality Preserving Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we handle ADA with only a source-pretrained model and unlabeled target data, proposing a new setting named source-free active domain adaptation. |
Xinyao Li; Zhekai Du; Jingjing Li; Lei Zhu; Ke Lu; |
| 478 | Meta Clustering Learning for Large-scale Unsupervised Person Re-identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we make the first attempt to the large-scale U-ReID and propose a small data for big task paradigm dubbed Meta Clustering Learning (MCL). |
Xin Jin; Tianyu He; Xu Shen; Tongliang Liu; Xinchao Wang; Jianqiang Huang; Zhibo Chen; Xian-Sheng Hua; |
| 479 | ChebyLighter: Optimal Curve Estimation for Low-light Image Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by curve adjustment in photo editing software and Chebyshev approximation, this paper presents a novel model for brightening low-light images. |
Jinwang Pan; Deming Zhai; Yuanchao Bai; Junjun Jiang; Debin Zhao; Xianming Liu; |
| 480 | Photorealistic Style Transfer Via Adaptive Filtering and Channel Separation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is mainly caused by the interference between color and texture during transfer. To address this problem, we propose an end-to-end network via adaptive filtering and channel separation. |
Hong Ding; Fei Luo; Caoqing Jiang; Gang Fu; Zipei Chen; Shenghong Hu; Chunxia Xiao; |
| 481 | HMTN: Hierarchical Multi-scale Transformer Network for 3D Shape Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In general, an effective 3D shape recognition algorithm should take both the multiview local and global visual information into consideration, and explore the inherent properties of generated 3D descriptors to guarantee the performance of feature alignment in the common space. To tackle these issues, we propose a novel Hierarchical Multi-scale Transformer Network (HMTN) for the 3D shape recognition task. |
Yue Zhao; Weizhi Nie; Zan Gao; An-an Liu; |
| 482 | MmLayout: Multi-grained MultiModal Transformer for Document Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we attach more importance to coarse-grained elements containing high-density information and consistent semantics, which are valuable for document understanding. |
Wenjin Wang; Zhengjie Huang; Bin Luo; Qianglong Chen; Qiming Peng; Yinxu Pan; Weichong Yin; Shikun Feng; Yu Sun; Dianhai Yu; Yin Zhang; |
| 483 | Video Instance Lane Detection Via Deep Temporal and Geometry Consistency Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose TGC-Net via temporal and geometry consistency constraints for reliable video instance lane detection. |
Mingqian Wang; Yujun Zhang; Wei Feng; Lei Zhu; Song Wang; |
| 484 | Active Learning for Point Cloud Semantic Segmentation Via Spatial-Structural Diversity Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Active learning methods endeavor to reduce such cost by selecting and labeling only a subset of the point clouds, yet previous attempts ignore the spatial-structural diversity of the selected samples, inducing the model to select clustered candidates with similar shapes in a local area while missing other representative ones in the global environment. In this paper, we propose a new 3D region-based active learning method to tackle this problem. |
Feifei Shao; Yawei Luo; Ping Liu; Jie Chen; Yi Yang; Yulei Lu; Jun Xiao; |
| 485 | Balanced Gradient Penalty Improves Deep Long-Tailed Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents the first investigation of the loss landscape in long-tailed learning. |
Dong Wang; Yicheng Liu; Liangji Fang; Fanhua Shang; Yuanyuan Liu; Hongying Liu; |
| 486 | Point Cloud Completion Via Multi-Scale Edge Convolution and Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, they tend to overlook relations among different local regions, which are valuable during shape inference. To solve these problems, we propose a novel point cloud completion network based on multi-scale edge convolution and attention mechanism, named MEAPCN. |
Rui Cao; Kaiyi Zhang; Yang Chen; Ximing Yang; Cheng Jin; |
| 487 | Geometric Warping Error Aware CNN for DIBR Oriented View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This resembles, to some extent, the image super-resolution problem, but with unfixed fractional pixel locations. To address this problem, we propose a geometric warping error aware CNN (GWEA) framework to enhance the DIBR oriented view synthesis. |
Shuai Li; Kaixin Wang; Yanbo Gao; Xun Cai; Mao Ye; |
| 488 | Dynamically Adjust Word Representations Using Unaligned Multimodal Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel end-to-end network named Cross Hyper-modality Fusion Network (CHFN). |
Jiwei Guo; Jiajia Tang; Weichen Dai; Yu Ding; Wanzeng Kong; |
| 489 | Geometry Aligned Variational Transformer for Image-conditioned Layout Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose an Image-Conditioned Variational Transformer (ICVT) that autoregressively generates various layouts in an image. |
Yunning Cao; Ye Ma; Min Zhou; Chuanbin Liu; Hongtao Xie; Tiezheng Ge; Yuning Jiang; |
| 490 | S-CCR: Super-Complete Comparative Representation for Low-Light Image Quality Inference In-the-wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a new super-complete comparative representation (S-CCR) for the region-level quality inference of low-light images. |
Miaohui Wang; Zhuowei Xu; Yuanhao Gong; Wuyuan Xie; |
| 491 | Multi-Mode Interactive Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current techniques usually fail in many cases, as their interaction styles cannot work on various inherent ambiguities of medical images, such as irregular shapes and fuzzy boundaries. To address this problem, we propose a multi-mode interactive segmentation framework for medical images, where diverse interaction modes can be chosen and allowed to cooperate with each other. |
Zheng Lin; Zhao Zhang; Ling-Hao Han; Shao-Ping Lu; |
| 492 | Multimedia Event Extraction From News With A Unified Contrastive Learning Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new method for multimedia EE by bridging the textual and visual modalities with a unified contrastive learning framework. |
Jian Liu; Yufeng Chen; Jinan Xu; |
| 493 | Cartoon-Flow: A Flow-Based Generative Adversarial Network for Arbitrary-Style Photo Cartoonization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Second, they suffer from content leaks in which the semantic structure of the content is distorted. In this paper, to solve these problems, we propose a novel arbitrary-style photo cartoonization method, Cartoon-Flow. |
Jieun Lee; Hyeonwoo Kim; Jonghwa Shim; Eenjun Hwang; |
| 494 | A Dual-Masked Auto-Encoder for Robust Motion Capture with Spatial-Temporal Skeletal Token Completion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, 2D joint detection is usually incomplete and contains wrong identity assignments due to the limited observation angle, which leads to noisy 3D triangulation results. To overcome this issue, we propose to explore the short-range autoregressive characteristics of skeletal motion using a transformer. |
Junkun Jiang; Jie Chen; Yike Guo; |
| 495 | Structure-Preserving Motion Estimation for Learned Video Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Diving into its essential advantage of strong representation capability with CNNs, however, we find this strategy is suboptimal due to two reasons: (1) Motion estimation based on the decoded (often distorted) frame would damage both the spatial structure of motion information inferred and the corresponding residual for each frame, making it difficult to be spatially encoded on the whole image basis using CNNs; (2) Typically, it would break the consistent nature across frames since the estimated motion information is no longer consistent with the movement in the original video due to the distortion in the decoded video, lowering the overall temporal coding efficiency. To overcome these problems, a novel asymmetric Structure-Preserving Motion Estimation (SPME) method is proposed, with the aim to fully explore the ignored original previous frame at the encoder side while complying with the decoded previous frame at the decoder side. |
Han Gao; Jinzhong Cui; Mao Ye; Shuai Li; Yu Zhao; Xiatian Zhu; |
| 496 | Efficient Hash Code Expansion By Recycling Old Bits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an interesting deep hashing method from a brand new perspective, called Code Expansion oriented Deep Hashing (CEDH). |
Dayan Wu; Qinghang Su; Bo Li; Weiping Wang; |
| 497 | PC-Dance: Posture-controllable Music-driven Dance Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a powerful framework named PC-Dance to perform adaptive posture-controllable music-driven dance synthesis. |
Jibin Gao; Junfu Pu; Honglun Zhang; Ying Shan; Wei-Shi Zheng; |
| 498 | NeRF-SR: High Quality Neural Radiance Fields Using Supersampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present NeRF-SR, a solution for high-resolution (HR) novel view synthesis with mostly low-resolution (LR) inputs. |
Chen Wang; Xian Wu; Yuan-Chen Guo; Song-Hai Zhang; Yu-Wing Tai; Shi-Min Hu; |
| 499 | CAliC: Accurate and Efficient Image-Text Retrieval Via Contrastive Alignment and Visual Contexts Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The unimodal encoders are trained on pure visual data, so the visual features extracted by them are difficult to align with the textual features, and it is also difficult for the multi-modal encoder to understand visual information. Under these circumstances, we propose an accurate and efficient two-stage image-text retrieval model via Contrastive Alignment and visual Contexts modeling (CAliC). |
Hongyu Gao; Chao Zhu; Mengyin Liu; Weibo Gu; Hongfa Wang; Wei Liu; Xu-cheng Yin; |
| 500 | Making The Best of Both Worlds: A Domain-Oriented Transformer for Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Second, the source-supervised classifier is inevitably biased to source data, thus it may underperform in target domain. To alleviate these issues, we propose to simultaneously conduct feature alignment in two individual spaces focusing on different domains, and create for each space a domain-oriented classifier tailored specifically for that domain. |
Wenxuan Ma; Jinming Zhang; Shuang Li; Chi Harold Liu; Yulin Wang; Wei Li; |
This table only includes 500 papers selected based on our selection algorithm. To continue with the full list, please visit Paper Digest: MM-2022 (Full List).
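As background for readers browsing this list: entry 449 (FHDUN) is built by unfolding the classical Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) into a network. The snippet below is a generic textbook FISTA sketch for the sparse recovery problem min_x ½‖Ax − y‖² + λ‖x‖₁, offered purely as an illustration of the algorithm being unfolded; it is not the FHDUN architecture or any code from the paper, and all names here are our own.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista(A, y, lam=0.1, n_iter=300):
    """Generic FISTA for min_x 0.5*||Ax - y||^2 + lam*||x||_1.

    Uses the standard momentum sequence t_k and a gradient step of
    size 1/L, where L = sigma_max(A)^2 is the Lipschitz constant of
    the smooth part's gradient.
    """
    L = np.linalg.norm(A, 2) ** 2          # squared spectral norm
    x = np.zeros(A.shape[1])
    z = x.copy()                           # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ z - y)           # gradient of 0.5*||Az - y||^2
        x_new = soft_threshold(z - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

Deep unfolding methods such as the one in entry 449 replace fixed quantities in these iterations (step sizes, thresholds, or the proximal step itself) with learned network modules, one "layer" per iteration.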