Paper Digest: AAAI 2025 Papers & Highlights
Note: AAAI-2025 accepted more than 3,300 papers; this page includes only 500 of them, selected by paper ID in the proceedings. Interested users can read all ~3,300 AAAI-2025 papers on a separate page, which takes some time to load.
To search for papers presented at AAAI-2025 on a specific topic, use the search by venue (AAAI-2025) service. To summarize the latest research published at AAAI-2025 on a specific topic, use the review by venue (AAAI-2025) service. If you are interested in browsing papers by author, we have a comprehensive list of ~12,000 authors (AAAI-2025). You may also want to explore our “Best Paper” Digest (AAAI), which lists the most influential AAAI papers since 1982.
We’ve developed a service, AAAI-2025 Research Report, that synthesizes the latest findings from AAAI 2025 into comprehensive reports. For instance, we’ve generated a report on Advances in Game Theory: Insights from AAAI 2025 Papers. We encourage interested users to use the service to create tailored reports on other emerging topics.
This curated list is created by the Paper Digest Team. Paper Digest is an AI-powered research platform that provides personalized, comprehensive updates on the latest research in your field. It also helps you read articles, write articles, get answers, conduct literature reviews, and generate research reports.
TABLE 1: Paper Digest: AAAI 2025 Papers & Highlights
# | Paper | Highlight | Author(s) |
---|---|---|---|
1 | Learning Language Structures Through Grounding | Motivated by human language learning, in this presentation, I will introduce a family of machine learning tasks that learns language structures through grounding, where distant supervision from other data sources (i.e., grounds), including but not limited to different modalities (e.g., vision), execution results of programs, and other languages, are used to guide the learning of language structures. | Freda Shi |
2 | Cognitive Bias and Reassignment: Who Can Contribute High Quality LLM Data | This paper examined the cognitive bias between users’ self-assessment and actual abilities, where individuals tend to overestimate their capabilities in certain tasks, leading to a decreased willingness to continue contributing and a consequent waste of human resources. To address this issue, we propose a task reassignment method based on multi-task fine-tuning of small language models (SLMs) to better align user groups with appropriate task types. | Yunfan Gao; Yun Xiong; Zhongyuan Hu; Yiming Zhang; Meng Wang; Haofen Wang |
3 | RAZOR: Sharpening Knowledge By Cutting Bias with Unsupervised Text Rewriting | We propose RAZOR (Rewriting And Zero-bias Optimization Refinement), a novel, unsupervised, and data-focused debiasing approach based on text rewriting for shortcut mitigation. | Shuo Yang; Bardh Prenkaj; Gjergji Kasneci |
4 | Robust Multi-Objective Preference Alignment with Online DPO | This paper introduces the Multi-Objective Online DPO (MO-ODPO) algorithm, designed to robustly and efficiently align model behaviors with multiple, potentially conflicting human preferences. | Raghav Gupta; Ryan Sullivan; Yunxuan Li; Samrat Phatale; Abhinav Rastogi |
5 | Combating Multimodal LLM Hallucination Via Bottom-Up Holistic Reasoning | Inspired by human intuition in handling hallucinations, this paper introduces a novel bottom-up reasoning framework. | Shengqiong Wu; Hao Fei; Liangming Pan; William Yang Wang; Shuicheng Yan; Tat-Seng Chua |
6 | One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models | This limitation poses challenges in practical applications, especially when LLMs are already deployed, as parameter adjustments may affect their original functionality. To address this, we propose a novel method that involves learning scalable and pluggable virtual tokens for RAG. | Yutao Zhu; Zhaoheng Huang; Zhicheng Dou; Ji-Rong Wen |
7 | Uncertainty-aware Knowledge Tracing | Previous research commonly adopts deterministic representation to capture students’ knowledge states, which neglects the uncertainty during student interactions and thus fails to model the true knowledge state in learning process. In light of this, we propose an Uncertainty-Aware Knowledge Tracing model (UKT) which employs stochastic distribution embeddings to represent the uncertainty in student interactions, with a Wasserstein self-attention mechanism designed to capture the transition of state distribution in student learning behaviors. | Weihua Cheng; Hanwen Du; Chunxiao Li; Ersheng Ni; Liangdi Tan; Tianqi Xu; Yongxin Ni |
8 | Persuasion for Social Good: How to Build and Break AI | As AI gets more involved in our daily life, it becomes critical to study how they can persuade humans and how persuasive they are. In this talk, I will cover (1) how to build such persuasive AI systems that can persuade, negotiate, and cooperate with other humans in the game of Diplomacy. | Weiyan Shi |
9 | Multi-Objective Evolution of Heuristic Using Large Language Model | However, existing research focuses on the optimal performance on the target problem as the sole objective, neglecting other criteria such as efficiency and scalability, which are vital in practice. To tackle this challenge, we propose to model the heuristic search as a multi-objective optimization problem and consider introducing additional practical criteria beyond optimal performance. | Shunyu Yao; Fei Liu; Xi Lin; Zhichao Lu; Zhenkun Wang; Qingfu Zhang |
10 | Toward Verifiable Instruction-Following Alignment for Retrieval Augmented Generation | Despite recent advancements in Large Language Models (LLMs), research on assessing and improving instruction-following (IF) alignment within the RAG domain remains limited. To address this issue, we propose VIF-RAG, an automated, scalable, and verifiable synthetic pipeline for instruction-following alignment in RAG systems. | Guanting Dong; Xiaoshuai Song; Yutao Zhu; Runqi Qiao; Zhicheng Dou; Ji-Rong Wen |
11 | Key-Point-Driven Data Synthesis with Its Enhancement on Mathematical Reasoning | Large language models have shown great potential in complex reasoning tasks, yet their performance is often hampered by the scarcity of high-quality and reasoning-focused training datasets. Addressing this challenge, we propose Key-Point-Driven Data Synthesis (KPDDS), a novel data synthesis framework that synthesizes question-answer pairs by leveraging key points and exemplar practices from authentic data sources. | Yiming Huang; Xiao Liu; Yeyun Gong; Zhibin Gou; Yelong Shen; Nan Duan; Weizhu Chen |
12 | Hybrid Decentralized Optimization: Leveraging Both First- and Zeroth-Order Optimizers for Faster Convergence | In this paper, we initiate the study of hybrid decentralized optimization, studying settings where nodes with zeroth-order and first-order optimization capabilities co-exist in a distributed system, and attempt to jointly solve an optimization task over some data distribution. | Shayan Talaei; Matin Ansaripour; Giorgi Nadiradze; Dan Alistarh |
13 | Unleashing The Potential of Large Language Models As Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers | In this paper, we propose a novel perspective to investigate the design of LLM-based prompt optimizers, by drawing an analogy with gradient-based model optimizers. | Xinyu Tang; Xiaolei Wang; Wayne Xin Zhao; Siyuan Lu; Yaliang Li; Ji-Rong Wen |
14 | Exploring Activation Patterns of Parameters in Language Models | To explain the internal representations of LLMs, we utilize a gradient-based metric to assess the activation level of model parameters. | Yudong Wang; Damai Dai; Zhe Yang; Jingyuan Ma; Zhifang Sui |
15 | DiffExp: Efficient Exploration in Reward Fine-tuning for Text-to-Image Diffusion Models | In this work, we introduce DiffExp, a simple yet effective exploration strategy for reward fine-tuning of text-to-image models. | Daewon Chae; June Suk Choi; Jinkyu Kim; Kimin Lee |
16 | End-to-End Autonomous Driving Through V2X Cooperation | In this paper, we introduce UniV2X, a pioneering cooperative autonomous driving framework that seamlessly integrates all key driving modules across diverse views into a unified network. | Haibao Yu; Wenxian Yang; Jiaru Zhong; Zhenwei Yang; Siqi Fan; Ping Luo; Zaiqing Nie |
17 | UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios | Moreover, existing urban benchmarks have been limited to evaluating LMMs with basic region-level urban tasks under singular views, leading to incomplete evaluations of LMMs’ abilities in urban environments. To address these issues, we present UrBench, a comprehensive benchmark designed for evaluating LMMs in complex multi-view urban scenarios. | Baichuan Zhou; Haote Yang; Dairong Chen; Junyan Ye; Tianyi Bai; Jinhua Yu; Songyang Zhang; Dahua Lin; Conghui He; Weijia Li |
18 | Learning to Prompt with Text Only Supervision for Vision-Language Models | In this work, we aim to combine the strengths of both approaches by learning prompts using only text data derived from LLMs. | Muhammad Uzair Khattak; Muhammad Ferjad Naeem; Muzammal Naseer; Luc Van Gool; Federico Tombari |
19 | SRDC: Semantics-based Ransomware Detection and Classification with LLM-assisted Pre-training | In this paper, we propose a Semantics-based Ransomware Detection and family Classification (SRDC) framework that can utilize both internal and external semantics of software. | Ce Zhou; Yilun Liu; Weibin Meng; Shimin Tao; Weinan Tian; Feiyu Yao; Xiaochun Li; Tao Han; Boxing Chen; Hao Yang |
20 | Transforming Healthcare Decision Making Using Artificial Intelligence | My research uses AI to augment and improve decision-making in healthcare, following a synergistic approach that combines novel AI methods with practical, real-world implementation. | Shengpu Tang |
21 | Large Images Are Gaussians: High-Quality Large Image Representation with Levels of 2D Gaussian Splatting | In our work, we present Large Images are Gaussians (LIG), which delves deeper into the application of 2DGS for image representations, addressing the challenge of fitting large images with 2DGS in the situation of numerous Gaussian points, through two distinct modifications: 1) we adopt a variant of representation and optimization strategy, facilitating the fitting of a large number of Gaussian points; 2) we propose a Level-of-Gaussian approach for reconstructing both coarse low-frequency initialization and fine high-frequency details. | Lingting Zhu; Guying Lin; Jinnan Chen; Xinjie Zhang; Zhenchao Jin; Zhao Wang; Lequan Yu |
22 | GaraMoSt: Parallel Multi-Granularity Motion and Structural Modeling for Efficient Multi-Frame Interpolation in DSA Images | However, MoSt-DSA’s focus on real-time performance leads to insufficient suppression of high-frequency noise and incomplete filtering of low-frequency noise in the generated images. To address these issues within the same computational time scale, we propose GaraMoSt. | Ziyang Xu; Huangxuan Zhao; Wenyu Liu; Xinggang Wang |
23 | Point Cloud Mamba: Point Cloud Learning Via State Space Model | To enable Mamba to process 3-D point cloud data more effectively, we propose a novel Consistent Traverse Serialization method to convert point clouds into 1-D point sequences while ensuring that neighboring points in the sequence are also spatially adjacent. | Tao Zhang; Haobo Yuan; Lu Qi; Jiangning Zhang; Qianyu Zhou; Shunping Ji; Shuicheng Yan; Xiangtai Li |
24 | Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering Via Sparse Time-Variant Attribute Modeling | In response, we introduce Efficient Dynamic Gaussian Splatting (EDGS), which represents dynamic scenes via sparse time-variant attribute modeling. | Hanyang Kong; Xingyi Yang; Xinchao Wang |
25 | Calibrating Large Language Models with Sample Consistency | In this work, we derive model confidence from the distribution of multiple randomly sampled generations, using three measures of consistency. | Qing Lyu; Kumar Shridhar; Chaitanya Malaviya; Li Zhang; Yanai Elazar; Niket Tandon; Marianna Apidianaki; Mrinmaya Sachan; Chris Callison-Burch |
26 | Do Transformer Interpretability Methods Transfer to RNNs? | Recent advances in recurrent neural network architectures, such as Mamba and RWKV, have enabled RNNs to match or exceed the performance of equal-size transformers in terms of language modeling perplexity and downstream evaluations, suggesting that future systems may be built on completely new architectures. In this paper, we examine if selected interpretability methods originally designed for transformer language models will transfer to these up-and-coming recurrent architectures. | Gonçalo Paulo; Thomas Marshall; Nora Belrose |
27 | Conformal Prediction for Partial Label Learning | This leads to a large extent of uncertainty of PLL models’ prediction, and it becomes unreliable to trust a PLL model’s performance only by its prediction accuracy. To bridge this gap, we develop a new framework to quantify the uncertainty for PLL models with valid confidence guarantee, which is named as Conformal Prediction for Partial Label Learning (CP-PLL). | Xiuwen Gong; Nitin Bisht; Guandong Xu |
28 | Deconfound Semantic Shift and Incompleteness in Incremental Few-shot Semantic Segmentation | In this paper, we propose a causal framework to discuss the cause of semantic shift and incompleteness in IFSS, and we deconfound the revealed causal effects from two aspects. | Yirui Wu; Yuhang Xia; Hao Li; Lixin Yuan; Junyang Chen; Jun Liu; Tong Lu; Shaohua Wan |
29 | CodeHalu: Investigating Code Hallucinations in LLMs Via Execution-based Verification | This phenomenon of hallucinations in the code domain has not been systematically explored. To advance the community’s understanding and research on this issue, we introduce the concept of code hallucinations and propose a classification method for code hallucination based on execution verification. | Yuchen Tian; Weixiang Yan; Qian Yang; Xuandong Zhao; Qian Chen; Wen Wang; Ziyang Luo; Lei Ma; Dawn Song |
30 | MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios | Based on the MM-CamObj dataset, we propose the CamObj-Llava, an LVLM specifically designed for addressing tasks in camouflaged scenes. | Jiacheng Ruan; Wenzhen Yuan; Zehao Lin; Ning Liao; Zhiyu Li; Feiyu Xiong; Ting Liu; Yuzhuo Fu |
31 | TTE: Two Tokens Are Enough to Improve Parameter-Efficient Tuning | Consequently, we introduce a new framework named TTE (Two Tokens are Enough), which effectively alleviates overfitting in PET through a novel constraint function based on the learnable tokens. | Jiacheng Ruan; Mingye Xie; Jingsheng Gao; Xian Gao; Suncheng Xiang; Ting Liu; Yuzhuo Fu |
32 | UniMuMo: Unified Text, Music, and Motion Generation | We introduce UniMuMo, a unified multimodal model capable of taking arbitrary text, music, and motion data as input conditions to generate outputs across all three modalities. | Han Yang; Kun Su; Yutong Zhang; Jiaben Chen; Kaizhi Qian; Gaowen Liu; Chuang Gan |
33 | 3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection | To this end, we propose a new large-scale anomaly detection dataset called 3CAD, which is derived from real 3C production lines. | Enquan Yang; Peng Xing; Hanyang Sun; Wenbo Guo; Yuanwei Ma; Zechao Li; Dan Zeng |
34 | Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks | This paper introduces SMART, a novel multi-agent framework that leverages external knowledge to enhance the interpretability and factual consistency of LLM-generated responses. | Shengbin Yue; Siyuan Wang; Wei Chen; Xuanjing Huang; Zhongyu Wei |
35 | Differentially Private Prototypes for Imbalanced Transfer Learning | Despite recent improvements, DP-SGD-based approaches for private learning still usually struggle in the high privacy (ε≤1) and low data regimes, and when the private training datasets are imbalanced. To overcome these limitations, we propose Differentially Private Prototype Learning (DPPL) as a new paradigm for private transfer learning. | Dariush Wahdany; Matthew Jagielski; Adam Dziedzic; Franziska Boenisch |
36 | ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention | However, their advantage in terms of actual runtime speed is not significant. To address this issue, we introduce Gated Linear Attention (GLA) for vision, leveraging its superior hardware-awareness and efficiency. | Bencheng Liao; Xinggang Wang; Lianghui Zhu; Qian Zhang; Chang Huang |
37 | RouterRetriever: Routing Over A Mixture of Expert Embedding Models | In this work, we introduce RouterRetriever, a retrieval model that leverages a mixture of domain-specific experts by using a routing mechanism to select the most appropriate expert for each query. | Hyunji Lee; Luca Soldaini; Arman Cohan; Minjoon Seo; Kyle Lo |
38 | TGLsta: Low-resource Textual Graph Learning with Semantic and Topological Awareness Via LLMs | The conventional methods often neglect the fusion of semantic and topological information, resulting in suboptimal model learning. To overcome these challenges, we proposed a novel method of low-resource textual graph node classification based on large language models, i.e., Textual graph learning with semantic and topological awareness (TGLsta), which comprehensively explores the semantic information, near neighborhood information, and the topology information in textual graphs, where these components are the most important information source contained in textual graphs. | Qin Zhang; Xiaowei Li; Ziqi Liu; Xiaochen Fan; Xiaojun Chen; Shirui Pan |
39 | M²RL-Net: Multi-View and Multi-Level Relation Learning Network for Weakly-Supervised Image Forgery Detection | Existing weakly supervised image forgery detection (W-IFD) methods often rely on convolutional neural networks (CNNs) and limited exploration of internal relationships, leading to poor detection and localization performance with only image-level labels. To address these limitations, we introduce a novel Multi-View and Multi-Level Relation Learning Network (M²RL-Net) for W-IFD. | Jiafeng Li; Ying Wen; Lianghua He |
40 | OTIAS: OcTree Implicit Adaptive Sampling for Multispectral and Hyperspectral Image Fusion | However, the previous INR methods neglect channel-wise modeling, while sharing a single kernel across all channels at each position, resulting in a lack of sensitivity to data specificity. To address these issues, we propose the OcTree Implicit Adaptive Sampling (OTIAS) method, which innovatively applies the octree structure to restore data from both horizontal and vertical directions, effectively incorporating spatial and spectral information from hyperspectral data. | Shangqi Deng; Jun Ma; Liang-Jian Deng; Ping Wei |
41 | Audio Entailment: Assessing Deductive Reasoning for Audio Understanding | We introduce the novel task of Audio Entailment to evaluate an ALM’s deductive reasoning ability. | Soham Deshmukh; Shuo Han; Hazim Bukhari; Benjamin Elizalde; Hannes Gamper; Rita Singh; Bhiksha Raj |
42 | A Label-free Heterophily-guided Approach for Unsupervised Graph Fraud Detection | Despite their promising performance, their label reliance limits its application in unsupervised scenarios; Additionally, accurately capturing complex and diverse heterophily patterns without labels poses a further challenge. Therefore, we propose a Heterophily-guided Unsupervised Graph fraud dEtection approach (HUGE) for unsupervised GFD, which contains two essential components: a heterophily estimation module and an alignment-based fraud detection module. | Junjun Pan; Yixin Liu; Xin Zheng; Yizhen Zheng; Alan Wee-Chung Liew; Fuyi Li; Shirui Pan |
43 | Language Ranker: A Metric for Quantifying LLM Performance Across High and Low-Resource Languages | Currently, there is a lack of quantitative methods to evaluate the performance of LLMs in these low-resource languages. To address this gap, we propose the Language Ranker, an intrinsic metric designed to benchmark and rank languages based on LLM performance using internal representations. | Zihao Li; Yucheng Shi; Zirui Liu; Fan Yang; Ali Payani; Ninghao Liu; Mengnan Du |
44 | C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection | However, two critical issues remain unresolved: 1) understanding why CLIP features are effective on deepfake detection through a linear classifier; and 2) exploring the detection potential of CLIP. In this study, we delve into the underlying mechanisms of CLIP’s detection capabilities by decoding its detection features into text and performing word frequency analysis. | Chuangchuang Tan; Renshuai Tao; Huan Liu; Guanghua Gu; Baoyuan Wu; Yao Zhao; Yunchao Wei |
45 | EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation | In our work, we observe that both direct and pivot translations are noisy and achieve less satisfactory performance. We propose EBBS, an ensemble method with a novel bi-level beam search algorithm, where each ensemble component explores its own prediction step by step at the lower level but all components are synchronized by a "soft voting" mechanism at the upper level. | Yuqiao Wen; Behzad Shayegh; Chenyang Huang; Yanshuai Cao; Lili Mou |
46 | DiveR-CT: Diversity-enhanced Red Teaming Large Language Model Assistants with Relaxing Constraints | Additionally, methods that decrease the cosine similarity from historical embeddings with semantic diversity rewards lead to novelty stagnation as history grows. To address these issues, we introduce DiveR-CT, which relaxes conventional constraints on the objective and semantic reward, granting greater freedom for the policy to enhance diversity. | Andrew Zhao; Quentin Xu; Matthieu Lin; Shenzhi Wang; Yong-Jin Liu; Zilong Zheng; Gao Huang |
47 | Cost-Aware Near-Optimal Policy Learning | We introduce two resource-aware algorithms for the contextual bandit setting and prove their soundness. | Joy He-Yueya; Jonathan Lee; Matthew Jörke; Emma Brunskill |
48 | VarDrop: Enhancing Training Efficiency By Reducing Variate Redundancy in Periodic Time Series Forecasting | However, employing self-attention with variate tokens incurs a quadratic computational cost with respect to the number of variates, thus limiting its training efficiency for large-scale applications. To address this issue, we propose VarDrop, a simple yet efficient strategy that reduces the token usage by omitting redundant variate tokens during training. | Junhyeok Kang; Yooju Shin; Jae-Gil Lee |
49 | Learning Disentangled Equivariant Representation for Explicitly Controllable 3D Molecule Generation | We consider the conditional generation of 3D drug-like molecules with explicit control over molecular properties such as drug-like properties (e.g., Quantitative Estimate of Druglikeness or Synthetic Accessibility score) and effectively binding to specific protein sites. To tackle this problem, we propose an E(3)-equivariant Wasserstein autoencoder and factorize the latent space of our generative model into two disentangled aspects: molecular properties and the remaining structural context of 3D molecules. | Haoran Liu; Youzhi Luo; Tianxiao Li; James Caverlee; Martin Renqiang Min |
50 | Adversarial Contrastive Graph Masked AutoEncoder Against Graph Structure and Feature Dual Attacks | The joint defense on graph structure and feature dual attacks remains challenging yet less studied. To fulfill this gap, we propose Adversarial Contrastive Graph Masked AutoEncoder (ACGMAE) to defend against graph structure and feature dual attacks. | Weixuan Shen; Xiaobo Shen; Shirui Pan |
51 | LiDAR-LLM: Exploring The Potential of Large Language Models for 3D LiDAR Understanding | In this paper, we introduce LiDAR-LLM, which takes raw LiDAR data as input and harnesses the remarkable reasoning capabilities of LLMs to gain a comprehensive understanding of outdoor 3D scenes. | Senqiao Yang; Jiaming Liu; Renrui Zhang; Mingjie Pan; Ziyu Guo; Xiaoqi Li; Zehui Chen; Peng Gao; Hongsheng Li; Yandong Guo; Shanghang Zhang |
52 | Contrastive Functional Principal Component Analysis | This limitation becomes critical in scenarios where the foreground dataset, such as a specific treatment group in biomedical applications, contains unique patterns or trends that are not as pronounced in the background dataset. Addressing this gap, we propose Contrastive Functional Principal Component Analysis (CFPCA), a method designed to spotlight low-dimensional structures unique to or enriched in the foreground dataset relative to the background counterpart. | Eric Zhang; Didong Li |
53 | Fine-Tuning Language Models with Collaborative and Semantic Experts | Recent advancements in large language models (LLMs) have broadened their application scope but revealed challenges in balancing capabilities across general knowledge, coding, and mathematics. To address this, we introduce a Collaborative and Semantic Experts (CoE) approach for supervised fine-tuning (SFT), which employs a two-phase training strategy. | Jiaxi Yang; Binyuan Hui; Min Yang; Jian Yang; Lei Zhang; Qiang Qu; Junyang Lin |
54 | SIGMA: Selective Gated Mamba for Sequential Recommendation | Its unidirectional structure may impede the ability to capture contextual information in user-item interactions, while its instability in state estimation may hinder the ability to capture short-term patterns in interaction sequences. To address these issues, we propose a novel framework called Selective Gated Mamba for Sequential Recommendation (SIGMA). | Ziwei Liu; Qidong Liu; Yejing Wang; Wanyu Wang; Pengyue Jia; Maolin Wang; Zitao Liu; Yi Chang; Xiangyu Zhao |
55 | Alleviating Shifted Distribution in Human Preference Alignment Through Meta-Learning | These two issues can be united as a challenge posed by the shifted distribution of the environment. To surmount this challenge, we introduce MetaRM, a novel method leveraging meta-learning to adapt the RM to the shifted environment distribution. | Shihan Dou; Yan Liu; Enyu Zhou; Songyang Gao; Tianlong Li; Limao Xiong; Xin Zhao; Haoxiang Jia; Junjie Ye; Rui Zheng; Tao Gui; Qi Zhang; Xuanjing Huang |
56 | WEPO: Web Element Preference Optimization for LLM-based Web Navigation | This paper introduces a novel approach to LLM-based web navigation tasks, called Web Element Preference Optimization (WEPO). | Jiarun Liu; Jia Hao; Chunhong Zhang; Zheng Hu |
57 | AnalogCoder: Analog Circuit Design Via Training-Free Code Generation | Despite advances made by Large Language Models (LLMs) in digital circuit design, the complexity and scarcity of data in analog circuitry pose significant challenges. To mitigate these issues, we introduce AnalogCoder, the first training-free LLM agent for designing analog circuits through Python code generation. | Yao Lai; Sungyoung Lee; Guojin Chen; Souradip Poddar; Mengkang Hu; David Z. Pan; Ping Luo |
58 | Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Image Generation | However, this method still requires numerous function evaluations in the sampling process. To address these limitations, we introduce a self-corrected flow distillation method that effectively integrates consistency models and adversarial training within the flow-matching framework. | Quan Dao; Hao Phung; Trung Tuan Dao; Dimitris N. Metaxas; Anh Tran |
59 | Motif Guided Graph Transformers with Combinatorial Skeleton Prototype Learning for Skeleton-Based Person Re-Identification | This paper presents a generic Motif guided graph transformer with Combinatorial skeleton prototype learning (MoCos) that exploits structure-specific and gait-related body relations as well as combinatorial features of skeleton graphs to learn effective skeleton representations for person re-ID. | Haocong Rao; Chunyan Miao |
60 | XCOT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning | To bridge the gap among different languages, we propose a cross-lingual instruction fine-tuning framework (xCoT) to transfer knowledge from high-resource languages to low-resource languages. |
Linzheng Chai; Jian Yang; Tao Sun; Hongcheng Guo; Jiaheng Liu; Bing Wang; Xinnian Liang; Jiaqi Bai; Tongliang Li; Qiyao Peng; Zhoujun Li; |
61 | Affordances-Oriented Planning Using Foundation Models for Continuous Vision-Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing LLM-based methods often focus only on solving high-level task planning by selecting nodes in predefined navigation graphs for movements, overlooking low-level control in navigation scenarios. To bridge this gap, we propose AO-Planner, a novel Affordances-Oriented Planner for continuous VLN task. |
Jiaqi Chen; Bingqian Lin; Xinmin Liu; Lin Ma; Xiaodan Liang; Kwan-Yee K. Wong; |
62 | CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce CoCoCo, a text-guided video inpainting diffusion framework. |
Bojia Zi; Shihao Zhao; Xianbiao Qi; Jianan Wang; Yukai Shi; Qianyu Chen; Bin Liang; Rong Xiao; Kam-Fai Wong; Lei Zhang; |
63 | Aligning Instance Brownian Bridge with Texts for Open-Vocabulary Video Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, the separation breaks the instance movement context of videos and requires a lot of inference overhead. To tackle these issues, we propose BridgeText Alignment (BTA) to link frame-level instance representations as a Brownian Bridge. |
Zesen Cheng; Kehan Li; Li Hao; Peng Jin; Xiawu Zheng; Chang Liu; Jie Chen; |
64 | Taylor Series-Inspired Local Structure Fitting Network for Few-shot Point Cloud Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, inspired by Taylor series, we treat the local structure representation of irregular point clouds as a polynomial fitting problem and propose a novel local structure fitting convolution, called TaylorConv. |
Changshuo Wang; Shuting He; Xiang Fang; Meiqing Wu; Siew-Kei Lam; Prayag Tiwari; |
65 | Training Matting Models Without Alpha Labels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that the cooperation between semantics learned from indicated known regions and properly assumed matting rules can help infer alpha values at transition areas. |
Wenze Liu; Zixuan Ye; Hao Lu; Zhiguo Cao; Xiangyu Yue; |
66 | Learning from Noisy Labels Via Self-Taught On-the-Fly Meta Loss Rescaling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we propose unsupervised on-the-fly meta loss rescaling to reweight training samples. |
Michael Heck; Christian Geishauser; Nurul Lubis; Carel van Niekerk; Shutong Feng; Hsien-Chin Lin; Benjamin Matthias Ruppik; Renato Vukovic; Milica Gasic; |
67 | Channel Merging: Preserving Specialization for Merged Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, both approaches have limitations, particularly in terms of performance and storage efficiency when merged experts increase. To address these challenges, we introduce Channel Merging, a novel strategy designed to minimize parameter conflicts while enhancing storage efficiency. |
Mingyang Zhang; Jing Liu; Ganggui Ding; Linlin Ou; Xinyi Yu; Bohan Zhuang; |
68 | Language Prompt for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the progress of using language prompts in driving scenarios is stuck in a bottleneck due to the scarcity of paired prompt-instance data. To address this challenge, we propose the first object-centric language prompt set for driving scenes within 3D, multi-view, and multi-frame space, named NuPrompt. |
Dongming Wu; Wencheng Han; Yingfei Liu; Tiancai Wang; Cheng-Zhong Xu; Xiangyu Zhang; Jianbing Shen; |
69 | GFlow: Recovering 4D World from Monocular Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conventional methods usually rely on the assumptions of multi-view videos, known camera parameters, or static scenes. In this paper, we relax all these constraints and tackle a highly ambitious but practical task: With only one monocular video without camera parameters, we aim to recover the dynamic 3D world alongside the camera poses. |
Shizun Wang; Xingyi Yang; Qiuhong Shen; Zhenxiang Jiang; Xinchao Wang; |
70 | ZeroHAR: Sensor Context Augments Zero-Shot Wearable Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose ZeroHAR that enhances ZSL by not just focusing on activity labels, but by augmenting motion data with sensor context features. |
Ranak Roy Chowdhury; Ritvik Kapila; Ameya Panse; Xiyuan Zhang; Diyan Teng; Rashmi Kulkarni; Dezhi Hong; Rajesh K. Gupta; Jingbo Shang; |
71 | EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recently, state space models (SSMs), such as Mamba, have shown outstanding performance and competitiveness in various tasks such as language modeling and computer vision, while reducing the time complexity of global information extraction to O(N). Inspired by this, this work proposes to explore the potential of visual state space models in light-weight model design and introduce a novel efficient model variant dubbed EfficientVMamba. |
Xiaohuan Pei; Tao Huang; Chang Xu; |
72 | Hierarchical Cross-Modal Alignment for Open-Vocabulary 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing works that leverage VLMs for 3D object detection (3DOD) generally resort to representations that lose the rich scene context required for 3D perception. To address this problem, we propose in this paper a hierarchical framework, named HCMA, to simultaneously learn local object and global scene information for OV-3DOD. |
Youjun Zhao; Jiaying Lin; Rynson W. H. Lau; |
73 | On The Power of Convolution-Augmented Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent architectural recipes, such as state-space models, have bridged the performance gap. Motivated by this, we examine the benefits of Convolution-Augmented Transformer (CAT) for recall, copying, and length generalization tasks. |
Mingchen Li; Xuechen Zhang; Yixiao Huang; Samet Oymak; |
74 | MEATRD: Multimodal Anomalous Tissue Region Detection Enhanced with Spatial Transcriptomics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: MEATRD is trained to reconstruct image patches and gene expression profiles of normal tissue spots (inliers) from their multimodal embeddings, followed by learning a one-class classification AD model based on latent multimodal reconstruction errors. |
Kaichen Xu; Qilong Wu; Yan Lu; Yinan Zheng; Wenlin Li; Xingjie Tang; Jun Wang; Xiaobo Sun; |
75 | Measuring Human and AI Values Based on Generative Psychometrics with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through interdisciplinary efforts, we aim to leverage AI for next-generation psychometrics and psychometrics for value-aligned AI. |
Haoran Ye; Yuhang Xie; Yuanyi Ren; Hanjun Fang; Xin Zhang; Guojie Song; |
76 | ChatBug: A Common Vulnerability of Aligned LLMs Induced By Chat Templates Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate how chat templates affect safety alignment of LLMs. |
Fengqing Jiang; Zhangchen Xu; Luyao Niu; Bill Yuchen Lin; Radha Poovendran; |
77 | GaussianPainter: Painting Point Cloud Into 3D Gaussians with Normal Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present GaussianPainter, the first method to paint a point cloud into 3D Gaussians given a reference image. |
Jingqiu Zhou; Lue Fan; Xuesong Chen; Linjiang Huang; Si Liu; Hongsheng Li; |
78 | CryoDomain: Sequence-free Protein Domain Identification from Low-resolution Cryo-EM Density Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we introduce CryoDomain, an innovative method for identifying protein domains — conserved constituent units of proteins — from low-resolution cryo-EM density maps without requiring prior knowledge of protein sequences. |
Muzhi Dai; Zhuoer Dong; Weining Fu; Kui Xu; Qiangfeng Cliff Zhang; |
79 | Teaching Models to Improve on Tape Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we claim that this skill of LLMs can be significantly enhanced via training. We introduce an RL framework for teaching models to use such rewards, by simulating interaction sessions, and rewarding the model according to its ability to satisfy the constraints. |
Liat Bezalel; Eyal Orgad; Amir Globerson; |
80 | Towards A Theory of AI Personhood Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we outline necessary conditions for AI personhood, focusing on agency, theory-of-mind, and self-awareness. |
Francis Rhys Ward; |
81 | Debate Helps Weak-to-Strong Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Scalable oversight and weak-to-strong generalization are two complementary approaches to tackle this issue. In this paper, we attempt to combine the strengths of these two approaches to further improve alignment. |
Hao Lang; Fei Huang; Yongbin Li; |
82 | LVPTrack: High Performance Domain Adaptive UAV Tracking with Label Aligned Visual Prompt Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite substantial progress, most existing UAV trackers are designed for well-conditioned daytime data, while in challenging weather conditions, e.g., foggy or nighttime environments, the tremendous domain gap leads to significant performance degradation. To address this issue, in this paper, we propose a novel robust UAV tracker termed LVPTrack, which conducts high-quality label-aligned visual prompt tuning to adapt to various challenging weather conditions. |
Hongjing Wu; Siyuan Yao; Feng Huang; Shu Wang; Linchao Zhang; Zhuoran Zheng; Wenqi Ren; |
83 | Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, upon analyzing existing prompt-based approaches for CRE, we identified several critical limitations, such as inaccurate prompt selection, inadequate mechanisms for mitigating forgetting in shared parameters, and suboptimal handling of cross-task and within-task variances. To overcome these challenges, we draw inspiration from the relationship between prefix tuning and mixture of experts, proposing a novel approach that employs a prompt pool for each task, capturing variations within each task while enhancing cross-task variances. |
Minh Le; Tien Ngoc Luu; An Nguyen The; Thanh-Thien Le; Trang Nguyen; Tung Thanh Nguyen; Linh Ngo Van; Thien Huu Nguyen; |
84 | Few-Shot, No Problem: Descriptive Continual Relation Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel retrieval-based solution, starting with a large language model to generate descriptions for each relation. |
Nguyen Xuan Thanh; Anh Duc Le; Quyen Tran; Thanh-Thien Le; Linh Ngo Van; Thien Huu Nguyen; |
85 | Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we find that the CLIP model possesses a rich set of features, encompassing both desired invariant causal features and undesired decision shortcuts. |
Huan Ma; Yan Zhu; Changqing Zhang; Peilin Zhao; Baoyuan Wu; Long-Kai Huang; Qinghua Hu; Bingzhe Wu; |
86 | Representation-driven Option Discovery in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this talk, I will introduce a general framework for option discovery, which uses the agent’s representation to discover useful options. |
Marlos C. Machado; |
87 | ERF: A Benchmark Dataset for Robust Semantic Segmentation Under Extreme Rainfall Conditions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the Extreme RainFall (ERF) dataset for semantic segmentation in both image and video tasks under violent rain conditions. |
Xin Yang; Xin Zhang; Xinchao Wang; |
88 | TG-LLaVA: Text Guided LLaVA Via Learnable Latent Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, we propose Text Guided LLaVA (TG-LLaVA) in this paper, which optimizes VLMs by guiding the vision encoder with text, offering a new and orthogonal optimization direction. |
Dawei Yan; Pengcheng Li; Yang Li; Hao Chen; Qingguo Chen; Weihua Luo; Wei Dong; Qingsen Yan; Haokui Zhang; Chunhua Shen; |
89 | CharacterBench: Benchmarking Character Customization of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, the sparsity of character features in responses makes feature-focused generative evaluation both ineffective and inefficient. To address these issues, we propose CharacterBench, the largest bilingual generative benchmark, with 22,859 human-annotated samples covering 3,956 characters from 25 detailed character categories. |
Jinfeng Zhou; Yongkang Huang; Bosi Wen; Guanqun Bi; Yuxuan Chen; Pei Ke; Zhuang Chen; Xiyao Xiao; Libiao Peng; Kuntian Tang; Rongsheng Zhang; Le Zhang; Tangjie Lv; Zhipeng Hu; Hongning Wang; Minlie Huang; |
90 | Medical Multimodal Model Stealing Attacks Via Adversarial Domain Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Adversarial Domain Alignment (ADA-Steal), the first stealing attack against medical MLLMs. |
Yaling Shen; Zhixiong Zhuang; Kun Yuan; Maria-Irina Nicolae; Nassir Navab; Nicolas Padoy; Mario Fritz; |
91 | Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this talk, I will review the current progress of DRL in real-world robotic applications based on our recent survey paper (with Tang, Abbatematteo, Hu, Chandra, and Martín-Martín), with a particular focus on evaluating the real-world successes achieved with DRL in realizing several key robotic competencies, including locomotion, navigation, stationary manipulation, mobile manipulation, human-robot interaction, and multi-robot interaction. |
Chen Tang; Ben Abbatematteo; Jiaheng Hu; Rohan Chandra; Roberto Martín-Martín; Peter Stone; |
92 | DivGCL: A Graph Contrastive Learning Model for Diverse Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In fact, there exists a challenging dilemma in balancing accuracy and diversity. To address these issues, we propose a new Graph Contrastive Learning (DivGCL) model for diversifying recommendations. |
Wenwen Gong; Yangliao Geng; Dan Zhang; Yifan Zhu; Xiaolong Xu; Haolong Xiang; Amin Beheshti; Xuyun Zhang; Lianyong Qi; |
93 | Structure-Adaptive Multi-View Graph Clustering for Remote Sensing Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although recent advances in graph neural network (GNN)-based MVC have shown remarkable success, the most prevalent approaches have two major limitations: 1) heavily relying on a predefined yet fixed graph, which limits the performance of clustering because the large number of indistinguishable background samples contained in remote sensing data would introduce noise information and increase structure heterogeneity; 2) ignoring the effect of confusing samples on cluster structure compactness, which leads to a fluffy cluster structure and decreases feature discriminability. To address these issues, we propose a Structure-Adaptive Multi-View Graph Clustering method named SAMVGC on remote sensing data, which boosts the structure homogeneity and cluster compactness by adaptively learning the graph and cluster structures, respectively. |
Renxiang Guan; Wenxuan Tu; Siwei Wang; Jiyuan Liu; Dayu Hu; Chang Tang; Yu Feng; Junhong Li; Baili Xiao; Xinwang Liu; |
94 | Speech Watermarking with Discrete Intermediate Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose DiscreteWM, a novel speech watermarking framework that injects watermarks into the discrete intermediate representations of speech. |
Shengpeng Ji; Ziyue Jiang; Jialong Zuo; Minghui Fang; Yifu Chen; Tao Jin; Zhou Zhao; |
95 | Improving Factuality in Large Language Models Via Decoding-Time Hallucinatory and Truthful Comparators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Comparator-driven Decoding-Time (CDT) framework to alleviate the response hallucination. |
Dingkang Yang; Dongling Xiao; Jinjie Wei; Mingcheng Li; Zhaoyu Chen; Ke Li; Lihua Zhang; |
96 | Data Attribution: A Data-Centric Approach for Trustworthy AI Development Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Data attribution is an emerging family of techniques aimed at quantifying the impact of individual training data points on a model trained on them, which has found data-centric applications such as instance-based explanation, unsafe training data detection, and copyright compensation. In this talk, I will comprehensively review our work contributing to the applications, methods, and open-source benchmarks of data attribution, and discuss open challenges in this field. |
Jiaqi Ma; |
97 | Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a novel Self-Supervised Tracking framework, named SSTrack, designed to eliminate the need for box annotations. |
Yaozong Zheng; Bineng Zhong; Qihua Liang; Ning Li; Shuxiang Song; |
98 | Differentiable Information Enhanced Model-Based Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Differentiable Information Enhanced MBRL method, MB-MIX, to address both challenges. |
Xiaoyuan Zhang; Xinyan Cai; Bo Liu; Weidong Huang; Song-Chun Zhu; Siyuan Qi; Yaodong Yang; |
99 | Scaling Diffusion Mamba with Bidirectional SSMs for Efficient 3D Shape Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This complexity becomes a significant hurdle when dealing with high-resolution voxel sizes. To address this challenge, we introduce a novel diffusion architecture tailored for 3D point cloud generation: Diffusion Mamba (DiM-3D). |
Shentong Mo; |
100 | The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose integrating Collaborative Masking and Targets to boost Masked AutoEncoders, an approach we call CMT-MAE. |
Shentong Mo; |
101 | MDFG: Multi-Dimensional Fine-Grained Modeling for Fatigue Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these, this paper explores a refined representation of fatigue in terms of three dimensions: time, type, and level, and proposes a Multi-Dimensional Fine-Grained Modeling for Fatigue Detection (MDFG). |
Mei Wang; Xiaojie Zhu; Ruimin Hu; Dongliang Zhu; Liang Liao; Mang Ye; |
102 | FaceMe: Robust Blind Face Restoration with Personal Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a personalized face restoration method, FaceMe, based on a diffusion model. |
Siyu Liu; Zheng-Peng Duan; Jia OuYang; Jiayi Fu; Hyunhee Park; Zikun Liu; Chun-Le Guo; Chongyi Li; |
103 | Explore In-Context Segmentation Via Latent Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work approaches the problem from a fresh perspective – unlocking the capability of the latent diffusion model (LDM) for in-context segmentation and investigating different design choices. |
Chaoyang Wang; Xiangtai Li; Henghui Ding; Lu Qi; Jiangning Zhang; Yunhai Tong; Chen Change Loy; Shuicheng Yan; |
104 | Label-Free Backdoor Attacks in Vertical Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate a new backdoor attack paradigm in VFL, Label-Free Backdoor Attacks (LFBA), which does not require any additional task label information and is feasible in VFL settings. |
Wei Shen; Wenke Huang; Guancheng Wan; Mang Ye; |
105 | CDTR: Semantic Alignment for Video Moment Retrieval Using Concept Decomposition Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing VMR methods that employ various strategies for cross-modal alignment still face challenges such as limited understanding of fine-grained semantics, semantic overlap, and sparse constraints. To address these limitations, we propose a novel Concept Decomposition Transformer (CDTR) model for VMR. |
Ran Ran; Jiwei Wei; Xiangyi Cai; Xiang Guan; Jie Zou; Yang Yang; Heng Tao Shen; |
106 | Towards Scientific Discovery with Generative AI: Progress, Opportunities, and Challenges Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper examines the current state of AI for scientific discovery, highlighting recent progress in large language models and other AI techniques applied to scientific tasks. |
Chandan K Reddy; Parshin Shojaee; |
107 | Adversarial Attacks on Event-Based Pedestrian Detectors: A Physical Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study is the first to explore physical adversarial attacks on event-driven pedestrian detectors, specifically investigating whether certain clothing patterns worn by pedestrians can cause these detectors to fail, effectively rendering them unable to detect the person. To address this, we developed an end-to-end adversarial framework in the digital domain, framing the design of adversarial clothing textures as a 2D texture optimization problem. |
Guixu Lin; Muyao Niu; Qingtian Zhu; Zhengwei Yin; Zhuoxiao Li; Shengfeng He; Yinqiang Zheng; |
108 | Leveraging Human Input to Enable Robust, Interactive, and Aligned AI Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Ensuring that AI systems do what we, as humans, actually want them to do, is one of the biggest open research challenges in AI alignment and safety. My research seeks to directly address this challenge by enabling AI systems to interact with humans to learn aligned and robust behaviors. |
Daniel S. Brown; |
109 | DreamUHD: Frequency Enhanced Variational Autoencoder for Ultra-High-Definition Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing ultra-high-definition (UHD) image restoration methods often struggle with consistency due to downsampling. We aim to address these challenges by leveraging the powerful latent space representation and reconstruction capabilities of Variational Autoencoders (VAE). |
Yidi Liu; Dong Li; Jie Xiao; Yuanfei Bao; Senyan Xu; Xueyang Fu; |
110 | CL-Attack: Textual Backdoor Attacks Via Cross-Lingual Triggers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, inspired by cross-lingual (CL) prompts of LLMs in real-world scenarios, we propose a higher-dimensional trigger method at the paragraph level, namely CL-Attack. |
Jingyi Zheng; Tianyi Hu; Tianshuo Cong; Xinlei He; |
111 | MVREC: A General Few-shot Defect Classification Model Using Multi-View Region-Context Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Additionally, defect classification heavily relies on contextual information within images, and existing methods fall short of effectively extracting this information. To address these challenges, we propose a general FSDMC framework called MVREC, which offers two primary advantages: (1) MVREC extracts general features for defect instances by incorporating the pre-trained AlphaCLIP model. |
Shuai Lyu; Rongchen Zhang; Zeqi Ma; Fangjian Liao; Dongmei Mo; Waikeung Wong; |
112 | RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, focusing solely on reward reduction can lead to suboptimal attack strategies, particularly in safety-critical scenarios where more precise behavior manipulation is needed. To address these challenges, we propose RAT, a method designed for universal, targeted behavior attacks. |
Fengshuo Bai; Runze Liu; Yali Du; Ying Wen; Yaodong Yang; |
113 | Dynamic Target Distribution Estimation for Source-Free Open-Set Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For this purpose, we introduce the novel Dynamic Target Distribution Estimation (DTDE) method, which effectively performs known classification and unknown separation through self-supervised learning with prototypes. |
Zhiqi Yu; Zhichao Liao; Jingjing Li; Zhi Chen; Lei Zhu; |
114 | Skill Disentanglement in Reproducing Kernel Hilbert Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead of focusing only on maximizing bounds on f-divergence, we combine it with Integral Probability Metrics to maximize the distance between distributions to promote behavioural diversity and enforce disentanglement. |
Vedant Dave; Elmar Rueckert; |
115 | CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce a plug-and-play method named Credibility-aware Attention Modification (CrAM). |
Boyi Deng; Wenjie Wang; Fengbin Zhu; Qifan Wang; Fuli Feng; |
116 | Harnessing Large Language Models for Knowledge Graph Question Answering Via Adaptive Multi-Aspect Retrieval-Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In our study, we introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework. |
Derong Xu; Xinhang Li; Ziheng Zhang; Zhenxi Lin; Zhihong Zhu; Zhi Zheng; Xian Wu; Xiangyu Zhao; Tong Xu; Enhong Chen; |
117 | Yuan: Yielding Unblemished Aesthetics Through A Unified Network for Visual Imperfections Removal in Generated Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These imperfections pose significant challenges for practical applications. To overcome these limitations, we introduce Yuan, a novel framework that autonomously corrects visual imperfections in text-to-image synthesis. |
Zhenyu Yu; Chee Seng Chan; |
118 | NightReID: A Large-Scale Nighttime Person Re-Identification Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given the cost and limited accessibility of infrared cameras, we investigate a critical question: Can RGB cameras be effectively utilized for accurate Re-ID during nighttime? To address this, we introduce NightReID, a large-scale RGB Re-ID dataset collected from a real-world nighttime surveillance system. |
Yuxuan Zhao; Weijian Ruan; He Li; Mang Ye; |
119 | Breaking Information Isolation: Accelerating MRI Via Inter-sequence Mapping and Progressive Masking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new unfolding solution, namely Information-coupled MRI Acceleration (IMA), to address the isolation issue. |
Jianwei Zheng; Xiaomin Yao; Guojiang Shen; Wei Li; Jiawei Jiang; |
120 | UniDet3D: Multi-dataset Indoor 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose UniDet3D, a simple yet effective 3D object detection model, which is trained on a mixture of indoor datasets and is capable of working in various indoor environments. |
Maksim Kolodiazhnyi; Anna Vorontsova; Matvey Skripkin; Danila Rukhovich; Anton Konushin; |
121 | Sparse Transfer Learning Accelerates and Enhances Certified Robustness: A Comprehensive Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce an innovative approach to expedite the verification process for L2-norm certified robustness through sparse transfer learning. |
Zhangheng Li; Tianlong Chen; Linyi Li; Bo Li; Zhangyang Wang; |
122 | PointDGMamba: Domain Generalization of Point Cloud Classification Via Generalized State Space Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel framework, PointDGMamba, that excels in strong generalizability toward unseen domains and has the advantages of global receptive fields and efficient linear complexity. |
Hao Yang; Qianyu Zhou; Haijia Sun; Xiangtai Li; Fengqi Liu; Xuequan Lu; Lizhuang Ma; Shuicheng Yan; |
123 | Language Model Can Listen While Speaking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel model design, namely listening-while-speaking language model (LSLM), an end-to-end system equipped with both listening and speaking channels. |
Ziyang Ma; Yakun Song; Chenpeng Du; Jian Cong; Zhuo Chen; Yuping Wang; Yuxuan Wang; Xie Chen; |
124 | Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on prompting one of the most important tasks in the field of speech processing, i.e., automatic speech recognition (ASR), with speech foundation encoders and large language models (LLM). |
Ziyang Ma; Guanrou Yang; Yifan Yang; Zhifu Gao; Jiaming Wang; Zhihao Du; Fan Yu; Qian Chen; Siqi Zheng; Shiliang Zhang; Xie Chen; |
125 | Unlocking The Power of LSTM for Long Term Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While the recently introduced sLSTM for Natural Language Processing (NLP) introduces exponential gating and memory mixing that are beneficial for long term sequential learning, its potential short memory issue is a barrier to applying sLSTM directly in TSF. To address this, we propose a simple yet efficient algorithm named P-sLSTM, which is built upon sLSTM by incorporating patching and channel independence. |
Yaxuan Kong; Zepu Wang; Yuqi Nie; Tian Zhou; Stefan Zohren; Yuxuan Liang; Peng Sun; Qingsong Wen; |
126 | Two-stream Beats One-stream: Asymmetric Siamese Network for Efficient Visual Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel asymmetric Siamese tracker named AsymTrack for efficient tracking. |
Jiawen Zhu; Huayi Tang; Xin Chen; Xinying Wang; Dong Wang; Huchuan Lu; |
127 | Simulate and Eliminate: Revoke Backdoors for Generative Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Simulate and Eliminate (SANDE) to erase the undesired backdoored mappings for generative LLMs. |
Haoran Li; Yulin Chen; Zihao Zheng; Qi Hu; Chunkit Chan; Heshan Liu; Yangqiu Song; |
128 | Instruct Where The Model Fails: Generative Data Augmentation Via Guided Self-contrastive Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We improve downstream task awareness in generated images by proposing a task-aware fine-tuning strategy that actively detects failures of the downstream task in the target model to fine-tune the generation process between epochs. |
Weijian Ma; Ruoxin Chen; Keyue Zhang; Shuang Wu; Shouhong Ding; |
129 | Aligning Large Language Models for Faithful Integrity Against Opposing Argument Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel framework, named Alignment for Faithful Integrity with Confidence Estimation (AFICE), which aims to align the LLM responses with faithful integrity. |
Yong Zhao; Yang Deng; See-Kiong Ng; Tat-Seng Chua; |
130 | Cluster-guided Contrastive Class-imbalanced Graph Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the other hand, existing class-imbalanced learning methods in vision may overlook the rich graph semantic substructures of the majority classes and excessively emphasize learning from the minority classes. To address these challenges, we propose a simple yet powerful approach called C3GNN that integrates the idea of clustering into contrastive learning to enhance class-imbalanced graph classification. |
Wei Ju; Zhengyang Mao; Siyu Yi; Yifang Qin; Yiyang Gu; Zhiping Xiao; Jianhao Shen; Ziyue Qiao; Ming Zhang; |
131 | IOHunter: Graph Foundation Model to Uncover Online Information Operations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a methodology designed to identify users orchestrating information operations, a.k.a. IO drivers, across various influence campaigns. |
Marco Minici; Luca Luceri; Francesco Fabbri; Emilio Ferrara; |
132 | TimePFN: Effective Multivariate Time Series Forecasting with Synthetic Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel training scheme and a transformer-based architecture, collectively referred to as TimePFN, for multivariate time-series (MTS) forecasting. |
Ege Onur Taga; Muhammed Emrullah Ildiz; Samet Oymak; |
133 | Block-Based Multi-Scale Image Rescaling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional image rescaling methods often fall short because they focus solely on the overall scaling rate, ignoring the varying amounts of information in different parts of the image. To address this limitation, we propose a Block-Based Multi-Scale Image Rescaling Framework (BBMR), tailored for IR tasks involving HR images of 2K resolution and higher. |
Jian Li; Siwang Zhou; |
134 | CLIMB-ReID: A Hybrid CLIP-Mamba Framework for Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, existing methods still struggle to address two key issues simultaneously: efficiently transferring the knowledge learned from CLIP and comprehensively extracting the context information from images or videos. To address these issues, we introduce CLIMB-ReID, a pioneering hybrid framework that synergizes the impressive power of CLIP with the remarkable computational efficiency of Mamba. |
Chenyang Yu; Xuehu Liu; Jiawen Zhu; Yuhao Wang; Pingping Zhang; Huchuan Lu; |
135 | Universal Features Guided Zero-Shot Category-Level Object Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a zero-shot method to achieve category-level 6-DOF object pose estimation, which exploits both 2D and 3D universal features of input RGB-D image to establish semantic similarity-based correspondences and can be extended to unseen categories without additional model fine-tuning. |
Wentian Qu; Chenyu Meng; Heng Li; Jian Cheng; Cuixia Ma; Hongan Wang; Xiao Zhou; Xiaoming Deng; Ping Tan; |
136 | HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new 3D Gaussian Splatting based data augmentation framework for bimanual hand-object interaction, which is capable of augmenting existing datasets into large-scale photorealistic data with various hand-object poses and viewpoints. |
Wentian Qu; Jiahe Li; Jian Cheng; Jian Shi; Chenyu Meng; Cuixia Ma; Hongan Wang; Xiaoming Deng; Yinda Zhang; |
137 | GNS: Solving Plane Geometry Problems By Neural-Symbolic Reasoning with Multi-Modal LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous works simply regarded a plane geometry problem as a multi-modal QA task, which ignored the importance of explicitly parsing geometric elements from problems. To tackle this limitation, we propose to solve plane Geometry problems by Neural-Symbolic reasoning with MLLMs (GNS). |
Maizhen Ning; Zihao Zhou; Qiufeng Wang; Xiaowei Huang; Kaizhu Huang; |
138 | VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although progress has been made in Composed Image Retrieval (CIR), we empirically find that a certain percentage of failed retrieval results are not consistent with their relative captions. |
Chun-Mei Feng; Yang Bai; Tao Luo; Zhen Li; Salman Khan; Wangmeng Zuo; Rick Siow Mong Goh; Yong Liu; |
139 | Relation-Aware Equivariant Graph Networks for Epitope-Unknown Antibody Design and Specificity Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take into account a variety of node features, edge features, and edge relations to include more contextual and geometric information. |
Lirong Wu; Haitao Lin; Yufei Huang; Zhangyang Gao; Cheng Tan; Yunfan Liu; Tailin Wu; Stan Z. Li; |
140 | CG-TGAN: Conditional Generative Adversarial Networks with Graph Neural Networks for Tabular Data Synthesizing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that CG-TGAN outperforms GAN-based models and is comparable to diffusion-based models. |
Seungcheol Lee; Moohong Min; |
141 | FoldToken: Learning Protein Language Via Vector Quantization and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce FoldTokenizer to represent protein sequence-structure as discrete symbols. |
Zhangyang Gao; Cheng Tan; Jue Wang; Yufei Huang; Lirong Wu; Stan Z. Li; |
142 | ChatterBox: Multimodal Referring and Grounding with Chain-of-Questions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we establish a benchmark and a baseline approach for Multimodal referring and grounding with Chain-of-Questions (MCQ), opening up a promising direction for ‘logical’ multimodal dialogues. |
Yunjie Tian; Tianren Ma; Lingxi Xie; Qixiang Ye; |
143 | Empowering Self-Learning of LLMs: Inner Knowledge Explicitation As A Catalyst Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a Self Knowledge Explicitation Learning (SKE-Learn) framework, which equips the LLMs with meta-skills to explicitly extract, verify and utilize inner knowledge for reasoning. |
Shijue Huang; Wanjun Zhong; Deng Cai; Fanqi Wan; Chengyi Wang; Mingxuan Wang; Mu Qiao; Ruifeng Xu; |
144 | Dynamic Spectral Graph Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, both steps are hand-designed, capturing incomplete anomaly information from wavelet-specific features and resulting in inconsistent feature fusion. To address these problems, we propose a dynamic spectral graph anomaly detection framework DSGAD to adaptively capture comprehensive anomaly information and perform consistent feature fusion. |
Jianbo Zheng; Chao Yang; Tairui Zhang; Longbing Cao; Bin Jiang; Xuhui Fan; Xiao-ming Wu; Xianxun Zhu; |
145 | A Diffusion-Based Framework for Occluded Object Movement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To leverage the real-world knowledge embedded in the pre-trained diffusion models, we propose a Diffusion-based framework specifically designed for Occluded Object Movement, named DiffOOM. |
Zheng-Peng Duan; Jiawei Zhang; Siyu Liu; Zheng Lin; Chun-Le Guo; Dongqing Zou; Jimmy Ren; Chongyi Li; |
146 | DiffRetouch: Using Diffusion to Retouch on The Shoulder of Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a diffusion-based method, named DiffRetouch. |
Zheng-Peng Duan; Jiawei Zhang; Zheng Lin; Xin Jin; XunDong Wang; Dongqing Zou; Chun-Le Guo; Chongyi Li; |
147 | Unsupervised Photometric-Consistent Depth Estimation from Endoscopic Monocular Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, our goal is to obtain a strong and reliable supervisory signal for achieving photometric-consistent depth estimation. |
Shijie Li; Weijun Lin; Qingyuan Xiang; Yunbin Tu; Shitan Asu; Zheng Li; |
148 | Boosting Segment Anything Model Towards Open-Vocabulary Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Sambor to seamlessly integrate SAM with the open-vocabulary object detector in an end-to-end framework. |
Xumeng Han; Longhui Wei; Xuehui Yu; Zhiyang Dou; Xin He; Kuiran Wang; Yingfei Sun; Zhenjun Han; Qi Tian; |
149 | Adaptive Experimental Design to Accelerate Scientific Discovery and Engineering Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These approaches have yielded substantial improvements in sample-efficiency, particularly for black-box optimization over high-dimensional combinatorial spaces (e.g., sequences and graphs). This cover letter outlines key methods I have developed and their real-world sustainability applications in areas such as nano-porous materials discovery, hardware design, and additive manufacturing. |
Aryan Deshwal; |
150 | DisCo: Graph-Based Disentangled Contrastive Learning for Cold-Start Cross-Domain Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel graph-based disentangled contrastive learning framework to capture fine-grained user intent and filter out irrelevant collaborative information, thereby avoiding negative transfer. |
Hourun Li; Yifan Wang; Zhiping Xiao; Jia Yang; Changling Zhou; Ming Zhang; Wei Ju; |
151 | U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, in this paper, we explore the untapped potential of KANs in improving backbones for vision tasks. |
Chenxin Li; Xinyu Liu; Wuyang Li; Cheng Wang; Hengyu Liu; Yifan Liu; Zhen Chen; Yixuan Yuan; |
152 | ParZC: Parametric Zero-Cost Proxies for Efficient NAS Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our observations reveal that node-wise zero-cost statistics significantly vary in their contributions to performance, with each node exhibiting a degree of uncertainty. Based on this insight, we introduce a novel framework called Parametric Zero-Cost Proxies (ParZC) to enhance the adaptability of zero-cost proxies through parameterization. |
Peijie Dong; Lujun Li; Zhenheng Tang; Xiang Liu; Zimian Wei; Qiang Wang; Xiaowen Chu; |
153 | Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This often results in coarse handling of multi-modal entity information, overlooking the nuanced, fine-grained semantic details and their complex interactions. To tackle this shortfall, we introduce a novel framework MyGO to tokenize, fuse, and augment the fine-grained multi-modal representations of entities and enhance the MMKGC performance. |
Yichi Zhang; Zhuo Chen; Lingbing Guo; Yajing Xu; Binbin Hu; Ziqi Liu; Wen Zhang; Huajun Chen; |
154 | Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent research has exposed that even aligned LLMs are susceptible to adversarial manipulations known as Jailbreak Attacks. To address this challenge, this paper proposes a method called Token Highlighter to inspect and mitigate the potential jailbreak threats in the user query. |
Xiaomeng Hu; Pin-Yu Chen; Tsung-Yi Ho; |
155 | FigStep: Jailbreaking Large Vision-Language Models Via Typographic Visual Prompts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose FigStep, a straightforward yet effective black-box jailbreak algorithm against LVLMs. |
Yichen Gong; Delong Ran; Jinyuan Liu; Conglei Wang; Tianshuo Cong; Anyu Wang; Sisi Duan; Xiaoyun Wang; |
156 | Chain-of-Instructions: Compositional Instruction Tuning on Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel concept of compositional instructions called chain-of-instructions (CoI), where the output of one instruction becomes an input for the next like a chain. |
Shirley Anugrah Hayati; Taehee Jung; Tristan Bodding-Long; Sudipta Kar; Abhinav Sethy; Joo-Kyung Kim; Dongyeop Kang; |
157 | PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we introduce PhishAgent, a multimodal agent that combines a wide range of tools, integrating both online and offline knowledge bases with Multimodal Large Language Models (MLLMs). |
Tri Cao; Chengyu Huang; Yuexin Li; Wang Huilin; Amy He; Nay Oo; Bryan Hooi; |
158 | CP-Guard: Malicious Agent Detection and Defense in Collaborative Bird’s Eye View Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For example, a malicious agent can send harmful information to the ego CAV to mislead it. To address this critical issue, we propose a novel method, CP-Guard, a tailored defense mechanism for CP that can be deployed by each agent to accurately detect and eliminate malicious agents in its collaboration network. |
Senkang Hu; Yihang Tao; Guowen Xu; Yiqin Deng; Xianhao Chen; Yuguang Fang; Sam Kwong; |
159 | DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Another solution is to learn the deformation of 3D objects with the distillation of video generative models, which, however, tends to produce 3D videos with small and discontinuous motions due to the inappropriate extraction and application of physics priors. In this work, to combine the strengths and offset the shortcomings of the above two solutions, we propose to learn the physical properties of a material field with video diffusion priors, and then utilize a physics-based Material-Point-Method (MPM) simulator to generate 4D content with realistic motions. |
Tianyu Huang; Haoze Zhang; Yihan Zeng; Zhilu Zhang; Hui Li; Wangmeng Zuo; Rynson W. H. Lau; |
160 | Augmenting Math Word Problems Via Iterative Question Composing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the MMIQC dataset, comprising a mixture of processed web data and synthetic question-response pairs, aimed at enhancing the mathematical reasoning capabilities of base language models. |
Haoxiong Liu; Yifan Zhang; Yifan Luo; Andrew C Yao; |
161 | K-ON: Stacking Knowledge on The Head Layer of Large Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This leads to a granularity mismatch between KGs and natural languages. To address this issue, we propose K-ON, which integrates KG knowledge into the LLM by employing multiple head layers for next k-step prediction. |
Lingbing Guo; Yichi Zhang; Zhongpu Bo; Zhuo Chen; Mengshu Sun; Zhiqiang Zhang; Wen Zhang; Huajun Chen; |
162 | What Is A Good Question? Assessing Question Quality Via Meta-Fact Checking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes aligning the complete knowledge underlying questions with educational criteria effectively employed in physics courses, thereby developing novel knowledge-intensive metrics of question quality. |
Bo Zhang; Jianghua Zhu; Chaozhuo Li; Hao Yu; Li Kong; Zhan Wang; Dezhuang Miao; Xiaoming Zhang; Junsheng Zhou; |
163 | TrustUQA: A Trustful Framework for Unified Structured Data Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose TrustUQA, a trustful QA framework that can simultaneously support multiple types of structured data in a unified way. |
Wen Zhang; Long Jin; Yushan Zhu; Jiaoyan Chen; Zhiwei Huang; Junjie Wang; Yin Hua; Lei Liang; Huajun Chen; |
164 | Image Conductor: Precision Control for Interactive Video Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose Image Conductor, a method for precise control of camera transitions and object movements to generate video assets from a single image. |
Yaowei Li; Xintao Wang; Zhaoyang Zhang; Zhouxia Wang; Ziyang Yuan; Liangbin Xie; Ying Shan; Yuexian Zou; |
165 | Where Precision Meets Efficiency: Transformation Diffusion Model for Point Cloud Registration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a transformation diffusion model for point cloud registration to balance precision and efficiency. |
Yongzhe Yuan; Yue Wu; Xiaolong Fan; Maoguo Gong; Qiguang Miao; Wenping Ma; |
166 | VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce VideoElevator, a training-free and plug-and-play method, which elevates the performance of T2V using superior capabilities of T2I. |
Yabo Zhang; Yuxiang Wei; Xianhui Lin; Zheng Hui; Peiran Ren; Xuansong Xie; Wangmeng Zuo; |
167 | Follow-Your-Click: Open-domain Regional Image Animation Via Motion Prompts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a practical framework, named Follow-Your-Click, to achieve image animation with a simple user click (for specifying what to move) and a motion prompt (for specifying how to move). |
Yue Ma; Yingqing He; Hongfa Wang; Andong Wang; Leqi Shen; Chenyang Qi; Jixuan Ying; Chengfei Cai; Zhifeng Li; Heung-Yeung Shum; Wei Liu; Qifeng Chen; |
168 | Graph Coarsening Via Supervised Granular-Ball for Scalable Graph Neural Network Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we employ granular-ball computing to effectively compress graph data. |
Shuyin Xia; Xinjun Ma; Zhiyuan Liu; Cheng Liu; Sen Zhao; Guoyin Wang; |
169 | Attributive Reasoning for Hallucination Diagnosis of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods neither cover the different categories of hallucinations, which are important for hallucination analysis, nor investigate in detail the internal states of LLMs, which indicate how hallucinations occur. Therefore, in our research, we introduce an attribution framework to trace the origins of hallucinations based on the internal signals of LLMs. |
Yuyan Chen; Zehao Li; Shuangjie You; Zhengyu Chen; Jingwen Chang; Yi Zhang; Weinan Dai; Qingpei Guo; Yanghua Xiao; |
170 | Leveraging RGB-D Data with Cross-Modal Context Mining for Glass Surface Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a large-scale RGB-D glass surface detection dataset, RGB-D GSD, for rigorous experiments and future research. |
Jiaying Lin; Yuen-Hei Yeung; Shuquan Ye; Rynson W. H. Lau; |
171 | Few-Shot Audio-Visual Class-Incremental Learning with Temporal Prompting and Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This framework contains two parts: temporal-residual prompting for audio-visual synergy adapter and temporal prompt regularization. |
Yawen Cui; Li Liu; Zitong Yu; Guanjie Huang; Xiaopeng Hong; |
172 | TrojanDec: Data-free Detection of Trojan Inputs in Self-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose TrojanDec, the first data-free method to identify and recover a test input embedded with a trigger. |
Yupei Liu; Yanting Wang; Jinyuan Jia; |
173 | SeeDiff: Off-the-Shelf Seeded Mask Generation from Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we take a closer look at attention mechanisms of Stable Diffusion, from which we draw connections with classical seeded segmentation approaches. |
Joon Hyun Park; Kumju Jo; Sungyong Baik; |
174 | LLMEmb: Large Language Model Can Be A Good Embedding Generator for Sequential Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce LLMEmb, a novel method leveraging LLM to generate item embeddings that enhance SRS performance. |
Qidong Liu; Xian Wu; Wanyu Wang; Yejing Wang; Yuanshao Zhu; Xiangyu Zhao; Feng Tian; Yefeng Zheng; |
175 | Tri-Ergon: Fine-Grained Video-to-Audio Generation with Multi-Modal Conditions and LUFS Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current V2A models often lack fine-grained control over the generated audio, especially in terms of loudness variation and the incorporation of multi-modal conditions. To overcome these limitations, we introduce Tri-Ergon, a diffusion-based V2A model that incorporates textual, auditory, and pixel-level visual prompts to enable detailed and semantically rich audio synthesis. |
Bingliang Li; Fengyu Yang; Yuxin Mao; Qingwen Ye; Hongkai Chen; Yiran Zhong; |
176 | KITS: Inductive Spatio-Temporal Kriging with Increment Training Strategy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The essence of the kriging task is transferability. Recently, several inductive spatio-temporal kriging methods have been proposed based on graph neural networks; they are trained on a graph built on top of observed nodes via pretext tasks such as masking nodes out and reconstructing them. |
Qianxiong Xu; Cheng Long; Ziyue Li; Sijie Ruan; Rui Zhao; Zhishuai Li; |
177 | SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, existing cross-layer feature aggregation methods designed for CNNs or ViTs are not practical in Mamba-based models due to high computational costs. Therefore, this paper aims to introduce an efficient cross-layer feature aggregation mechanism for vision backbone networks. |
Meng Lou; Yunxiang Fu; Yizhou Yu; |
178 | Prompt Tuning In A Compact Attribute Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose PTinCAS to tackle prompt tuning in a compact attribute space, driven by the premise that attributes offer detailed class interpretations and can facilitate transfer across related categories. |
Shiyu Hou; Tianfei Zhou; Shuai Zhang; Ye Yuan; Guoren Wang; |
179 | DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces DME-Driver, a new autonomous driving system that enhances performance and robustness by fully leveraging the two crucial aspects. |
Wencheng Han; Dongqian Guo; Cheng-Zhong Xu; Jianbing Shen; |
180 | ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Built upon it, we create our ICM-Assistant model in the framework of rule-based ICM, making it readily applicable in real practice. |
Mengyang Wu; Yuzhi Zhao; Jialun Cao; Mingjie Xu; Zhongming Jiang; Xuehui Wang; Qinbin Li; Guangneng Hu; Shengchao Qin; Chi-Wing Fu; |
181 | Improving Multimodal Social Media Popularity Prediction Via Selective Retrieval Knowledge Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a retrieval-based framework to enhance the popularity prediction of multimodal UGCs. |
Xovee Xu; Yifan Zhang; Fan Zhou; Jingkuan Song; |
182 | Leveraging First and Zeroth-Order Gradient to Address Imbalanced Black-Box Prompt Tuning Via Minimax Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose black-box prompt tuning with first and zeroth order gradient (BPT-FZG) for handling the imbalanced data. |
Haozhen Zhang; Zhaogeng Liu; Bin Gu; Yi Chang; |
183 | Stream Aligner: Efficient Sentence-Level Alignment Via Distribution Induction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce the Streaming Distribution Induce Aligner (Stream Aligner), a novel alignment paradigm that combines efficiency with enhanced performance in various tasks throughout the generation process. |
Hantao Lou; Jiaming Ji; Kaile Wang; Yaodong Yang; |
184 | Sequence to Sequence Reward Modeling: Improving RLHF By Language Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It means RM fails to provide feedback that accurately aligns with human preference, causing LLMs to explore unexpected generalizations, and failing to achieve alignment objectives. To mitigate this issue, we propose a novel sequence-to-sequence (seq2seq) reward modeling method. |
Jiayi Zhou; Jiaming Ji; Josef Dai; Yaodong Yang; |
185 | Graph Structure Refinement with Energy-based Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, existing generative architectures fail to fit discriminative graph-related tasks. To tackle these issues, we introduce an unsupervised method based on joint generative and discriminative training to learn graph structure and representation, aiming to improve the discriminative performance of generative models. |
Xianlin Zeng; Yufeng Wang; Yuqi Sun; Guodong Guo; Wenrui Ding; Baochang Zhang; |
186 | A Vision for Reinventing Credible Elections with Artificial Intelligence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this blue sky paper, we seek to stimulate the research community to pursue important new as well as existing (unsolved) AI problems in the context of a challenging, often ignored, socio-sensitive application domain. |
Biplav Srivastava; |
187 | Apollo-Forecast: Overcoming Aliasing and Inference Speed Challenges in Language Models for Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing language models for time series forecasting encounter several obstacles, including aliasing distortion and prolonged inference times, primarily due to the limitations of quantization processes and the computational demands of large models. This paper introduces Apollo-Forecast, a novel framework that tackles these challenges with two key innovations: the Anti-Aliasing Quantization Module (AAQM) and the Race Decoding (RD) technique. |
Tianyi Yin; Jingwei Wang; Yunlong Ma; Han Wang; Chenze Wang; Yukai Zhao; Min Liu; Weiming Shen; |
188 | Modeling All Response Surfaces in One for Conditional Search Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these approaches tend to be inefficient as they require a substantial number of observations to guarantee each GP’s performance and cannot capture relationships between hyperparameters across different subspaces. To address these issues, this paper proposes a novel approach to model the response surfaces of all subspaces in one, which can model the relationships between hyperparameters elegantly via a self-attention mechanism. |
Jiaxing Li; Wei Liu; Chao Xue; Yibing Zhan; Xiaoxing Wang; Weifeng Liu; Dacheng Tao; |
189 | Spiking Point Transformer for Point Cloud Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Hybrid Dynamics Integrate-and-Fire Neuron (HD-IF), designed to simulate selective neuron activation and reduce over-reliance on specific artificial neurons. |
Peixi Wu; Bosong Chai; Hebei Li; Menghua Zheng; Yansong Peng; Zeyu Wang; Xuan Nie; Yueyi Zhang; Xiaoyan Sun; |
190 | COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems Against Semantic Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the first robustness certification framework COMMIT to certify the robustness of multi-sensor fusion systems against semantic attacks. |
Zijian Huang; Wenda Chu; Linyi Li; Chejian Xu; Bo Li; |
191 | SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, extracting relevant location characteristics for a given task can be challenging, often requiring expensive data fusion or distillation from massive global imagery datasets. To address this challenge, we introduce Satellite Contrastive Location-Image Pretraining (SatCLIP). |
Konstantin Klemmer; Esther Rolf; Caleb Robinson; Lester Mackey; Marc Rußwurm; |
192 | SoundBrush: Sound As A Brush for Visual Scene Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SoundBrush, a model that uses sound as a brush to edit and manipulate visual scenes. |
Kim Sung-Bin; Kim Jun-Seong; Junseok Ko; Yewon Kim; Tae-Hyun Oh; |
193 | HEP-NAS: Towards Efficient Few-shot Neural Architecture Search Via Hierarchical Edge Partitioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce HEP-NAS, a hierarchy-wise partition algorithm designed to further enhance accuracy. |
Jianfeng Li; Jiawen Zhang; Feng Wang; Lianbo Ma; |
194 | QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce QJL, a new quantization approach that consists of a Johnson-Lindenstrauss (JL) transform followed by sign-bit quantization. |
Amir Zandieh; Majid Daliri; Insu Han; |
195 | Revolutionizing Encrypted Traffic Classification with MH-Net: A Multi-View Heterogeneous Graph Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Traditional byte-based traffic analysis methods are constrained by the rigid granularity of information and fail to fully exploit the diverse correlations between bytes. To address these limitations, this paper introduces MH-Net, a novel approach for classifying network traffic that leverages multi-view heterogeneous traffic graphs to model the intricate relationships between traffic bytes. |
Haozhen Zhang; Haodong Yue; Xi Xiao; Le Yu; Qing Li; Zhen Ling; Ye Zhang; |
196 | Autoregressive Sequence Modeling for 3D Medical Image Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a pioneering method for learning 3D medical image representations through an autoregressive pre-training framework. |
Siwen Wang; Churan Wang; Fei Gao; Lixian Su; Fandong Zhang; Yizhou Wang; Yizhou Yu; |
197 | IMAGDressing-v1: Customizable Virtual Dressing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing virtual try-on (VTON) methods provide only limited user control over garment attributes and generally overlook essential factors such as face, pose, and scene context. |
Fei Shen; Xin Jiang; Xin He; Hu Ye; Cong Wang; Xiaoyu Du; Zechao Li; Jinhui Tang; |
198 | MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current graph-based methods for scene generation are constrained to text-based inputs and exhibit insufficient adaptability to flexible user inputs, hindering the ability to precisely control object geometry. To address this issue, we propose MMGDreamer, a dual-branch diffusion model for scene generation that incorporates a novel Mixed-Modality Graph, visual enhancement module, and relation predictor. |
Zhifei Yang; Keyang Lu; Chao Zhang; Jiaxing Qi; Hanqi Jiang; Ruifei Ma; Shenglin Yin; Yifan Xu; Mingzhe Xing; Zhen Xiao; Jieyi Long; Xiangde Liu; Guangyao Zhai; |
199 | ERL-MPP: Evolutionary Reinforcement Learning with Multi-head Puzzle Perception for Solving Large-scale Jigsaw Puzzles of Eroded Gaps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While most existing models focus on solving either small-scale puzzles or puzzles with no gap between fragments, solving large-scale puzzles with gaps presents distinctive challenges in both image understanding and combinatorial optimization. To tackle these challenges, we propose a framework of Evolutionary Reinforcement Learning with Multi-head Puzzle Perception (ERL-MPP) to derive a better set of swapping actions for solving the puzzles. |
Xingke Song; Xiaoying Yang; Chenglin Yao; Jianfeng Ren; Ruibin Bai; Xin Chen; Xudong Jiang; |
200 | DECIDER: Difference-aware Contrastive Diffusion Model with Adversarial Perturbations for Image Change Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a Difference-aware Contrastive Diffusion Model with Adversarial Perturbations (DECIDER) for ICC due to the excellent performance of diffusion models in image/text generation. |
Guojin Zhong; Jinhong Hu; Jiajun Chen; Jin Yuan; Wenbo Pan; |
201 | Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through extensive experiments, we demonstrate that while downsampling HR images leads to vision information loss, leveraging complementary modalities, e.g., text, can effectively compensate for this loss. Building upon this insight, we propose Divide, Conquer and Combine, a novel training-free framework for enhancing MLLM perception of HR images. |
Wenbin Wang; Liang Ding; Minyan Zeng; Xiabin Zhou; Li Shen; Yong Luo; Wei Yu; Dacheng Tao; |
202 | ResMaster: Mastering High-Resolution Image Generation Via Structural and Fine-Grained Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce ResMaster, a novel, training-free method that empowers resolution-limited diffusion models to generate high-quality images beyond resolution restrictions. |
Shuwei Shi; Wenbo Li; Yuechen Zhang; Jingwen He; Biao Gong; Yinqiang Zheng; |
203 | MCGAN: Enhancing GAN Training with Regression-Based Generator Loss Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the main bottleneck of existing approaches is the lack of supervision on the generator training, which often results in undamped oscillation and unsatisfactory performance. To address this issue, we propose an algorithm called Monte Carlo GAN (MCGAN). |
Baoren Xiao; Hao Ni; Weixin Yang; |
204 | Structured Packing in LLM Training Improves Long Context Utilization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent advancements in long-context language modeling have attracted significant attention, yet their practical applications often suffer from suboptimal context utilization. To efficiently address this issue, we introduce the Structured Packing for Long Context, SPLiCe, a method that uses retrieval to collate mutually relevant documents into long training samples. |
Konrad Staniszewski; Szymon Tworkowski; Sebastian Jaszczur; Yu Zhao; Henryk Michalewski; Łukasz Kuciński; Piotr Miłoś; |
205 | Axioms for AI Alignment from Human Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The problem of learning a reward function is one of preference aggregation that, we argue, largely falls within the scope of social choice theory. From this perspective, we can evaluate different aggregation methods via established axioms, examining whether these methods meet or fail well-known standards. |
Evi Micha; |
206 | Foundations of Multi-Agent Learning in Dynamic Environments: Where Reinforcement Learning Meets Strategic Decision-Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: My work has established both sample and computational complexities of learning in Stochastic Games, the most fundamental model of MARL, and advocated a unique Economics perspective of independent learning in Stochastic Games. |
Kaiqing Zhang; |
207 | Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Self-Perception Tuning (SPT) method, aiming to enhance SAM’s perception capability for anomaly segmentation. |
Hui-Yue Yang; Hui Chen; Ao Wang; Kai Chen; Zijia Lin; Yongliang Tang; Pengcheng Gao; Yuming Quan; Jungong Han; Guiguang Ding; |
208 | Scaffold-BPE: Enhancing Byte Pair Encoding for Large Language Models with Simple and Effective Scaffold Token Removal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address that issue, we propose Scaffold-BPE, which incorporates a dynamic scaffold token removal mechanism by parameter-free, computation-light, and easy-to-implement modifications to the original BPE method. |
Haoran Lian; Yizhe Xiong; Jianwei Niu; Shasha Mo; Zhenpeng Su; Zijia Lin; Hui Chen; Jungong Han; Guiguang Ding; |
209 | An Information-theoretic Multi-task Representation Learning Framework for Natural Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a new principled multi-task representation learning framework (InfoMTL) to extract noise-invariant sufficient representations for all tasks. |
Dou Hu; Lingwei Wei; Wei Zhou; Songlin Hu; |
210 | AIM: Additional Image Guided Generation of Transferable Adversarial Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on generative approaches for targeted transferable attacks. |
Teng Li; Xingjun Ma; Yu-Gang Jiang; |
211 | Retrieval-Augmented Visual Question Answering Via Built-in Autoregressive Search Engines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose ReAuSE, an alternative to the previous RAG model for the knowledge-based VQA task, which seamlessly integrates knowledge retriever into the generative multi-modal large language model, serving as a built-in search engine. |
Xinwei Long; Zhiyuan Ma; Ermo Hua; Kaiyan Zhang; Biqing Qi; Bowen Zhou; |
212 | Smoothness Really Matters: A Simple Yet Effective Approach for Unsupervised Graph Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Given the sensitivity of GNNs to local structural features, even slight discrepancies between source and target graphs could lead to significant shifts in node embeddings, thereby reducing the effectiveness of knowledge transfer. To address this issue, we introduce a novel approach for UGDA called Target-Domain Structural Smoothing (TDSS). |
Wei Chen; Guo Ye; Yakun Wang; Zhao Zhang; Libang Zhang; Daixin Wang; Zhiqiang Zhang; Fuzhen Zhuang; |
213 | Amplifier: Bringing Attention to Neglected Low-Energy Components in Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an energy amplification technique to address the issue that existing models easily overlook low-energy components in time series forecasting. |
Jingru Fei; Kun Yi; Wei Fan; Qi Zhang; Zhendong Niu; |
214 | FCOM: A Federated Collaborative Online Monitoring Framework Via Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most online learning algorithms are designed for 1) a centralized setting that requires data sharing across processes for accurate predictions or 2) a homogeneity assumption that estimates a single global model from decentralized data. To overcome these limitations and enable online learning in a heterogeneous population under a decentralized setting, we propose a federated collaborative online monitoring method. |
Tanapol Kosolwattana; Huazheng Wang; Raed Al Kontar; Ying Lin; |
215 | Multi-Modal Latent Variables for Cross-Individual Primary Visual Cortex Modeling and Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current computational models face two critical limitations, namely the challenge of cross-modal integration between partial neural recordings and complex visual stimuli, and the inherent variability in neural characteristics across individuals, including differences in neuron populations and firing patterns. To address these challenges, we present a multi-modal identifiable variational autoencoder (miVAE) that employs a two-level disentanglement strategy to map neural activity and visual stimuli into a unified latent space. |
Yu Zhu; Bo Lei; Chunfeng Song; Wanli Ouyang; Shan Yu; Tiejun Huang; |
216 | When Should We Prefer State-to-Visual DAgger Over Visual Reinforcement Learning? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We evaluate both methods across 16 tasks from three benchmarks, focusing on their asymptotic performance, sample efficiency, and computational costs. |
Tongzhou Mu; Zhaoyang Li; Stanisław Wiktor Strzelecki; Xiu Yuan; Yunchao Yao; Litian Liang; Hao Su; |
217 | MPTSNet: Integrating Multiscale Periodic Local Patterns and Global Dependencies for Multivariate Time Series Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately, current deep learning-based methods often neglect the simultaneous construction of local features and global dependencies at different time scales, lacking sufficient feature extraction capabilities to achieve satisfactory classification accuracy. To address these challenges, we propose a novel Multiscale Periodic Time Series Network (MPTSNet), which integrates multiscale local patterns and global correlations to fully exploit the inherent information in time series. |
Yang Mu; Muhammad Shahzad; Xiao Xiang Zhu; |
218 | DuMo: Dual Encoder Modulation Network for Precise Concept Erasure Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose our Dual encoder Modulation network (DuMo), which achieves precise erasure of inappropriate target concepts with minimum impairment to non-target concepts. |
Feng Han; Kai Chen; Chao Gong; Zhipeng Wei; Jingjing Chen; Yu-Gang Jiang; |
219 | One for Dozens: Adaptive REcommendation for All Domains with Counterfactual Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, minor domains often suffer from data sparsity, leading to inadequate training in classical methods. To address these issues, we propose Adaptive REcommendation for All Domains with counterfactual augmentation (AREAD). |
Huishi Luo; Yiwen Chen; Yiqing Wu; Fuzhen Zhuang; Deqing Wang; |
220 | Multi-Focus Image Fusion Via Explicit Defocus Blur Modelling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel framework that integrates explicit defocus blur modelling into the MFIF process, improving both interpretability and performance. |
Yuhui Quan; Xi Wan; Zitao Tang; Jinxiu Liang; Hui Ji; |
221 | Multisensory Machine Intelligence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While we humans perceive the world by looking, listening, touching, smelling, and tasting, traditional forms of machine intelligence mostly focus on a single sensory modality, particularly vision. Therefore, my research, which I call multisensory machine intelligence, aims to empower machines to emulate and enhance human capabilities in seeing, hearing, and feeling, ultimately enabling them to comprehensively perceive, understand, and interact with the multisensory world. |
Ruohan Gao; |
222 | UniDemoiré: Towards Universal Image Demoiréing with Data Generation and Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a universal image demoiréing solution, UniDemoiré, which has superior generalization capability. |
Zemin Yang; Yujing Sun; Xidong Peng; Siu Ming Yiu; Yuexin Ma; |
223 | VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper develops a Versatile and Honest vision language Model (VHM) for remote sensing image analysis. |
Chao Pang; Xingxing Weng; Jiang Wu; Jiayu Li; Yi Liu; Jiaxing Sun; Weijia Li; Shuai Wang; Litong Feng; Gui-Song Xia; Conghui He; |
224 | Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the challenging task of Generalized Referring Expression Comprehension (GREC). |
Yaxian Wang; Henghui Ding; Shuting He; Xudong Jiang; Bifan Wei; Jun Liu; |
225 | HS-FPN: High Frequency and Spatial Perception FPN for Tiny Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, FPN lacks spatial perception ability. To address these issues, we propose a novel High Frequency and Spatial Perception Feature Pyramid Network (HS-FPN) with two innovative modules. |
Zican Shi; Jing Hu; Jie Ren; Hengkang Ye; Xuyang Yuan; Yan Ouyang; Jia He; Bo Ji; Junyu Guo; |
226 | Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce DETRIS, a parameter-efficient tuning framework designed to enhance low-rank visual feature propagation by establishing dense interconnections between each layer and all preceding layers, which enables effective cross-modal feature interaction and adaptation to misaligned encoders. |
Jiaqi Huang; Zunnan Xu; Ting Liu; Yong Liu; Haonan Han; Kehong Yuan; Xiu Li; |
227 | Hierarchically Controlled Deformable 3D Gaussians for Talking Head Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel framework for audio-driven talking head synthesis, namely Hierarchically Controlled Deformable 3D Gaussians (HiCoDe), which achieves state-of-the-art performance with significantly reduced computational costs. |
Zhenhua Wu; Linxuan Jiang; Xiang Li; Chaowei Fang; Yipeng Qin; Guanbin Li; |
228 | Noisy Label Calibration for Multi-View Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, noisy labels are ubiquitous in multi-view data due to imperfect annotations. To deal with this problem, we propose a novel noisy label calibration method (NLC) for multi-view classification to resist the negative impact of noisy labels. |
Shilin Xu; Yuan Sun; Xingfeng Li; Siyuan Duan; Zhenwen Ren; Zheng Liu; Dezhong Peng; |
229 | ProtCLIP: Function-Informed Protein Multi-Modal Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these works were still unable to replicate the extraordinary success of language-supervised visual foundation models due to the ineffective usage of aligned protein-text paired data and the lack of an effective function-informed pre-training paradigm. To address these issues, this paper curates a large-scale protein-text paired dataset called ProtAnno with a property-driven sampling strategy, and introduces a novel function-informed protein pre-training paradigm. |
Hanjing Zhou; Mingze Yin; Wei Wu; Mingyang Li; Kun Fu; Jintai Chen; Jian Wu; Zheng Wang; |
230 | InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a text-guided approach for generating emotionally expressive 2D avatars, offering fine-grained control, improved interactivity, and generalizability to the resulting video. |
Yuchi Wang; Junliang Guo; Jianhong Bai; Runyi Yu; Tianyu He; Xu Tan; Xu Sun; Jiang Bian; |
231 | DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further boost our model performance, we employ synthetic data and utilize image-depth pairs generated by a discriminative model on an in-the-wild image dataset. |
Ming Gui; Johannes Schusterbauer; Ulrich Prestel; Pingchuan Ma; Dmytro Kotovenko; Olga Grebenkova; Stefan Andreas Baumann; Vincent Tao Hu; Björn Ommer; |
232 | Heterogeneous Prompt-Guided Entity Inferring and Distilling for Scene-Text Aware Cross-Modal Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a heterogeneous prompt-guided entity inferring and distilling (HOPID) network to explore the natural connection of scene text in images and captions and learn a property-centric scene text representation. |
Zhiqian Zhao; Liang Li; Jiehua Zhang; Yaoqi Sun; Xichun Sheng; Haibing Yin; Shaowei Jiang; |
233 | Explore What LLM Does Not Know in Complex Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, in this paper, we propose a novel Question Answering with Knowledge Evaluation (KEQA) framework to promote the effectiveness and efficiency of RAG in QA. |
Xin Lin; Zhenya Huang; Zhiqiang Zhang; Jun Zhou; Enhong Chen; |
234 | LiteSearch: Efficient Tree Search with Dynamic Exploration Budget for Math Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they often require more than 10 times the computational resources of greedy decoding due to wasteful search strategies, making them difficult to deploy in practical applications. This study introduces a novel guided tree search algorithm with a goal-directed heuristic function and node-level exploration budget (maximum number of children) calculation to tackle this issue. |
Ante Wang; Linfeng Song; Ye Tian; Baolin Peng; Dian Yu; Haitao Mi; Jinsong Su; Dong Yu; |
235 | AD4CD: Causal-Guided Anomaly Detection for Enhancing Cognitive Diagnosis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a framework named Anomaly Detection for Cognitive Diagnosis (AD4CD) to enhance the ability of Learning-to-Detect-Anomalous. |
Haiping Ma; Yue Yao; Changqian Wang; Siyu Song; Yong Yang; |
236 | Dis²Booth: Learning Image Distribution with Disentangled Features for Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on personalizing a diffusion model to generate varied data usually containing multiple subjects, which has a more diverse and complex data distribution. |
Guanqi Ding; Chengyu Yang; Shuhui Wang; Xincheng Li; Jinzhe Zhang; Xin Jin; Qingming Huang; |
237 | EventSum: A Large-Scale Event-Centric Summarization Dataset for Chinese Multi-News Documents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is challenging because the key information of the event is often scattered across multiple documents, involving complex event knowledge understanding and reasoning, which is under-explored in previous work. Therefore, we propose the Event-Centric Multi-Document Summarization task, which aims to generate concise and comprehensive summaries of a given event based on multiple related news documents. |
Mengna Zhu; Kaisheng Zeng; Mao Wang; Kaiming Xiao; Lei Hou; Hongbin Huang; Juanzi Li; |
238 | Domain Generalized Medical Landmark Detection Via Robust Boundary-Aware Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These challenges substantially impede the broader application of deep learning-based medical landmark detection. To mitigate these issues, we propose a novel domain-generalized medical landmark detection framework that relies solely on single-center data for training. |
Haifan Gong; Yu Lu; Xiang Wan; Haofeng Li; |
239 | Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Herein, we introduce Visual Tokens Withdrawal (VTW), a plug-and-play module to boost MLLMs for rapid inference. |
Zhihang Lin; Mingbao Lin; Luxi Lin; Rongrong Ji; |
240 | SafetyPrompts: A Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our contributions are based on SafetyPrompts.com, a living catalogue of open datasets for LLM safety, which we plan to update continuously as the field of LLM safety develops. |
Paul Röttger; Fabio Pernisi; Bertie Vidgen; Dirk Hovy; |
241 | Enhance Modality Robustness in Text-Centric Multimodal Alignment with Adversarial Prompting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study evaluates the quality and robustness of multimodal representations in the face of noise imperfections, dynamic input order permutations, and missing modalities, revealing that current text-centric alignment methods can compromise downstream robustness. To address this issue, we propose a new text-centric adversarial training approach that significantly enhances robustness compared to traditional robust training methods and pre-trained multimodal foundation models. |
Yun-Da Tsai; Ting-Yu Yen; Keng-Te Liao; Shou-De Lin; |
242 | SegFace: Face Segmentation of Long-Tail Classes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous works have largely overlooked the problem of poor segmentation performance of long-tail classes. To address this issue, we propose SegFace, a simple and efficient approach that uses a lightweight transformer-based model which utilizes learnable class-specific tokens. |
Kartik Narayan; Vibashan Vs; Vishal M. Patel; |
243 | Denoising Diffusion Variational Inference: Diffusion Models As Expressive Variational Posteriors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose denoising diffusion variational inference (DDVI), a black-box variational inference algorithm for latent variable models which relies on diffusion models as flexible approximate posteriors. |
Wasu Top Piriyakulkij; Yingheng Wang; Volodymyr Kuleshov; |
244 | VisRec: A Semi-Supervised Approach to Visibility Data Reconstruction in Radio Astronomy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods rely on a substantial amount of labeled training data, which requires significant labeling effort from radio astronomers. Addressing this challenge, we propose VisRec, a model-agnostic semi-supervised learning approach to visibility data reconstruction in radio astronomy. |
Ruoqi Wang; Haitao Wang; Qiong Luo; Feng Wang; Hejun Wu; |
245 | Generative Video Diffusion for Unseen Novel Semantic Video Moment Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a more generalisable approach by assuming only text sentences describing new semantics are available in model training without having seen any videos from a target domain. |
Dezhao Luo; Shaogang Gong; Jiabo Huang; Hailin Jin; Yang Liu; |
246 | Step-Calibrated Diffusion for Biomedical Optical Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we present Restorative Step-Calibrated Diffusion (RSCD): an unpaired diffusion-based image restoration method that uses a step calibrator model to dynamically determine the number of steps required to complete the reverse diffusion process for image restoration. |
Yiwei Lyu; Sung Jik Cha; Cheng Jiang; Asadur Zaman Chowdury; Xinhai Hou; Edward S. Harake; Akhil Kondepudi; Christian Freudiger; Honglak Lee; Todd C. Hollon; |
247 | Causal Prompting: Debiasing Large Language Model Prompting Based on Front-Door Adjustment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional debiasing methods primarily focus on the model training stage, including approaches based on data augmentation and reweighting, yet they struggle with the complex biases inherent in LLMs. To address such limitations, the causal relationship behind the prompting methods is uncovered using a structural causal model, and a novel causal prompting method based on front-door adjustment is proposed to effectively mitigate LLM biases. |
Congzhi Zhang; Linhai Zhang; Jialong Wu; Yulan He; Deyu Zhou; |
248 | Distribution-Driven Dense Retrieval: Modeling Many-to-One Query-Document Relationship Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these multiple vector-based approaches encounter challenges such as Increased Storage, Vector Collapse, and Search Efficiency. To address these issues, we introduce the Distribution-Driven Dense Retrieval framework (DDR). |
Junfeng Kang; Rui Li; Qi Liu; Zhenya Huang; Zheng Zhang; Yanjiang Chen; Linbo Zhu; Yu Su; |
249 | VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present VersaGen, a generative AI agent that enables versatile visual control in T2I synthesis. |
Zhipeng Chen; Lan Yang; Yonggang Qi; Honggang Zhang; Kaiyue Pang; Ke Li; Yi-Zhe Song; |
250 | Beyond Pixel and Object: Part Feature As Reference for Few-Shot Video Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we analyze the deficiencies inherent in the use of object prototypes and pixel features as references in previous methods. |
Naisong Luo; Guoxin Xiong; Tianzhu Zhang; |
251 | Diffusion Prior Interpolation for Flexibility Real-World Face Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a masking strategy with strong and weak constraints and iterative refinement for real-world FSR, termed Diffusion Prior Interpolation (DPI). |
Jiarui Yang; Tao Dai; Yufei Zhu; Naiqi Li; Jinmin Li; Shu-Tao Xia; |
252 | Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, this paper aims to advance the development of open-vocabulary object detection in remote sensing community. |
Jiancheng Pan; Yanxing Liu; Yuqian Fu; Muyuan Ma; Jiahao Li; Danda Pani Paudel; Luc Van Gool; Xiaomeng Huang; |
253 | Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Directly incorporating uncertainty quantification into KG-LLM frameworks presents a challenge due to their more complex architectures and the intricate interactions between the knowledge graph and language model components. To address this crucial gap, we propose a new trustworthy KG-LLM framework, UAG (Uncertainty Aware Knowledge-Graph Reasoning), which incorporates uncertainty quantification into the KG-LLM framework. |
Bo Ni; Yu Wang; Lu Cheng; Erik Blasch; Tyler Derr; |
254 | Enhancing Audiovisual Speech Recognition Through Bifocal Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose using a preference optimization strategy to improve speech recognition accuracy for real-world videos. |
Yihan Wu; Yichen Lu; Yifan Peng; Xihua Wang; Ruihua Song; Shinji Watanabe; |
255 | Divide-Solve-Combine: An Interpretable and Accurate Prompting Framework for Zero-shot Multi-Intent Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While significant advancements have been witnessed, the existing prompting approaches still face two major issues: lacking explicit reasoning and lacking interpretability. Therefore, in this paper, we introduce a Divide-Solve-Combine Prompting (DSCP) to address the above issues. |
Libo Qin; Qiguang Chen; Jingxuan Zhou; Jin Wang; Hao Fei; Wanxiang Che; Min Li; |
256 | Elevating Flow-Guided Video Inpainting with Reference Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose a robust and practical VI framework that leverages a large generative model for reference generation in combination with an advanced pixel propagation algorithm. |
Suhwan Cho; Seoung Wug Oh; Sangyoun Lee; Joon-Young Lee; |
257 | Robust and Consistent Online Video Instance Segmentation Via Instance Mask Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the leading methods in the tracking-by-detection paradigm often result in temporally inconsistent predictions at both instance-level and pixel-level that lead to visually unsatisfactory outcomes. To address these challenges, we propose RoCoVIS, a simple yet effective approach that integrates segmentation and tracking to provide consistent online VIS. |
Miran Heo; Seoung Wug Oh; Seon Joo Kim; Joon-Young Lee; |
258 | Reducing AUV Energy Consumption Through Dynamic Sensor Directions Switching Via Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods focus on reducing energy consumption on AUV computing and movement, neglecting sensing energy consumption, and few attempts have been made to balance the AUV energy and sensing ability with a flexible sensing system. Along these lines, we consider both AUV energy consumption and flexible sensing abilities, and propose a deep reinforcement learning-based method to Reduce Energy Consumption by AUV Sensing system (RECS). |
Jiawei Liu; Yuanbo Xu; Shanshan Song; Lu Jiang; |
259 | AdaDiff: Adaptive Step Selection for Fast Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce AdaDiff, a lightweight framework designed to learn instance-specific step usage policies, which are then used by the diffusion model for generation. |
Hui Zhang; Zuxuan Wu; Zhen Xing; Jie Shao; Yu-Gang Jiang; |
260 | GeoMamba: Towards Multi-granular POI Recommendation with Geographical State Space Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, while the approximation operator HiPPO sets the foundation of linear SSMs, we introduce a novel GaPPO operator that extends the model’s state space into graph-represented geographical domains. |
Yifang Qin; Jiaxuan Xie; Zhiping Xiao; Ming Zhang; |
261 | Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel method called Gode, which accounts for the dual-level structure inherent in molecules. |
Pengcheng Jiang; Cao Xiao; Tianfan Fu; Parminder Bhatia; Taha Kass-Hout; Jimeng Sun; Jiawei Han; |
262 | Correcting Large Language Model Behavior Via Influence Function Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methodologies, either curation of new data for continual alignment or manual correction of outdated data for re-alignment, demand costly human resources. To address this, we propose a novel approach, LLM BehAvior Correction with INfluence FunCtion REcall and Post-Training (LANCET), which needs no human involvement. |
Han Zhang; Zhuo Zhang; Yi Zhang; Yuanzhao Zhai; Hanyang Peng; Yu Lei; Yue Yu; Hui Wang; Bin Liang; Lin Gui; Ruifeng Xu; |
263 | MUSE: Mamba Is Efficient Multi-scale Learner for Text-video Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose MUSE, a multi-scale mamba with linear computational complexity for efficient cross-resolution modeling. |
Haoran Tang; Meng Cao; Jinfa Huang; Ruyang Liu; Peng Jin; Ge Li; Xiaodan Liang; |
264 | Small Language Model Makes An Effective Long Text Extractor Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these methods struggle with the accurate generation of longer spans and often incur significant time costs for effective finetuning. To address these challenges, this paper introduces a lightweight span-based NER method called SeNER, which incorporates a bidirectional arrow attention mechanism coupled with LogN-Scaling on the [CLS] token to embed long texts effectively, and comprises a novel bidirectional sliding-window plus-shaped attention (BiSPA) mechanism to reduce redundant candidate token-pair spans significantly and model interactions between token-pair spans simultaneously. |
Yelin Chen; Fanjin Zhang; Jie Tang; |
265 | VERSE: Verification-based Self-Play for Code Instructions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the direct generation of responses does not ensure functional correctness, a crucial requirement for generating responses to code instructions. To overcome this, we present Verification-Based Self-Play (VERSE), aiming to enhance model proficiency in generating correct responses. |
Hao Jiang; Qi Liu; Rui Li; Yuze Zhao; Yixiao Ma; Shengyu Ye; Junyu Lu; Yu Su; |
266 | Epsilon: Exploring Comprehensive Visual-Semantic Projection for Multi-Label Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Relying only on the local existence of seen classes during the inference stage introduces unavoidable bias. In this paper, we propose a novel and comprehensive visual-semantic framework for MLZSL, dubbed Epsilon, to fully make use of such properties and enable a more accurate and robust visual-semantic projection. |
Ziming Liu; Jingcai Guo; Song Guo; Xiaocheng Lu; |
267 | Unveiling The Threat of Fraud Gangs to Graph Neural Networks: Multi-Target Graph Injection Attacks Against GNN-Based Fraud Detectors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we design attack scenarios where fraud gangs aim to make their fraud nodes misclassified as benign by camouflaging their illicit activities in collusion. |
Jinhyeok Choi; Heehyeon Kim; Joyce Jiyoung Whang; |
268 | Unleashing The Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an Audio-Language-Referenced SAM 2 (AL-Ref-SAM 2) pipeline to explore the training-free paradigm for audio and language-referenced video object segmentation, namely AVS and RVOS tasks. |
Shaofei Huang; Rui Ling; Hongyu Li; Tianrui Hui; Zongheng Tang; Xiaoming Wei; Jizhong Han; Si Liu; |
269 | Query Quantized Neural SLAM Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The underfitting usually results in severe drifts in camera tracking and artifacts in reconstruction. To resolve this issue, we propose query quantized neural SLAM which uses quantized queries to reduce variations of input for much easier and faster overfitting a frame. |
Sijia Jiang; Jing Hua; Zhizhong Han; |
270 | Sensing Surface Patches in Volume Rendering for Inferring Signed Distance Functions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it is still challenging to explicitly impose constraints on surfaces for inferring more geometry details due to the limited ability of sensing surfaces in volume rendering. To resolve this problem, we introduce a method to infer signed distance functions (SDFs) with a better sense of surfaces through volume rendering. |
Sijia Jiang; Tong Wu; Jing Hua; Zhizhong Han; |
271 | Semantic Ambiguity Modeling and Propagation for Fine-Grained Visual Cross View Geo-Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, the model may encounter conflicting dominating gradients during joint training. To address this, we propose to model the semantic ambiguity during the offset regression process by integrating associated uncertainty scores, represented as 2D Gaussian distributions, to mitigate negative transfer effects within the joint tasks. |
Mingtao Feng; Fenghao Tian; Jianqiao Luo; Zijie Wu; Weisheng Dong; Yaonan Wang; Ajmal Saeed Mian; |
272 | RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce RTP-LX, a human-transcreated and human-annotated corpus of toxic prompts and outputs in 28 languages. |
Adrian de Wynter; Ishaan Watts; Tua Wongsangaroonsri; Minghui Zhang; Noura Farra; Nektar Ege Altıntoprak; Lena Baur; Samantha Claudet; Pavel Gajdušek; Qilong Gu; Anna Kaminska; Tomasz Kaminski; Ruby Kuo; Akiko Kyuba; Jongho Lee; Kartik Mathur; Petter Merok; Ivana Milovanović; Nani Paananen; Vesa-Matti Paananen; Anna Pavlenko; Bruno Pereira Vidal; Luciano Ivan Strika; Yueh Tsao; Davide Turcato; Oleksandr Vakhno; Judit Velcsov; Anna Vickers; Stéphanie F. Visser; Herdyan Widarmanto; Andrey Zaikin; Si-Qing Chen; |
273 | Fair Graph U-Net: A Fair Graph Learning Framework Integrating Group and Individual Awareness Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Learning high-level representations for graphs is crucial for tasks like node classification, where graph pooling aggregates node features to provide a holistic view that enhances … |
Zichong Wang; Zhibo Chu; Thang Viet Doan; Shaowei Wang; Yongkai Wu; Vasile Palade; Wenbin Zhang; |
274 | MOS: Model Surgery for Pre-Trained Model-Based Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate parameter-level forgetting, we present an adapter merging approach to learn task-specific adapters, which aims to bridge the gap between different components while reserving task-specific information. |
Hai-Long Sun; Da-Wei Zhou; Hanbin Zhao; Le Gan; De-Chuan Zhan; Han-Jia Ye; |
275 | CasFT: Future Trend Modeling for Information Popularity Prediction with Dynamic Cues-Driven Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, how to transfer the preceding-term dynamics learned from the observed diffusion process into future-term trends remains an unexplored challenge. Against this background, we propose CasFT, which leverages observed information Cascades and dynamic cues extracted via neural ODEs as conditions to guide the generation of Future popularity-increasing Trends through a diffusion model. |
Xin Jing; Yichen Jing; Yuhuan Lu; Bangchao Deng; Xueqin Chen; Dingqi Yang; |
276 | Improving Cancer Gene Prediction By Enhancing Common Information Between The PPI Network and Gene Functional Association Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While this common information may contain more accurate gene association information, existing methods often overlook this potential. To address this gap, we introduce DISFusion, which integrates multi-omics cancer data, PPI networks, and gene functional associations to identify cancer genes. |
Chao Deng; Hongdong Li; Jianxin Wang; |
277 | Enhancing Multimodal Large Language Models Complex Reason Via Similarity Computation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To efficiently utilize the image information, we propose a new image token reduction method, Simignore, which aims to improve the complex reasoning ability of LVLMs by computing the similarity between image and text embeddings and ignoring image tokens that are irrelevant and unimportant to the text. |
Xiaofeng Zhang; Fanshuo Zeng; Yihao Quan; Zheng Hui; Jiawei Yao; |
278 | MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce MARS, a novel framework for T2I generation that incorporates a specially designed Semantic Vision-Language Integration Expert (SemVIE). |
Wanggui He; Siming Fu; Mushui Liu; Xierui Wang; Wenyi Xiao; Fangxun Shu; Yi Wang; Lei Zhang; Zhelun Yu; Haoyuan Li; Ziwei Huang; Leilei Gan; Hao Jiang; |
279 | Through The Dual-Prism: A Spectral Perspective on Graph Data Augmentation for Graph Classifications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the evolution of augmentation methods, issues like graph property distortions and restricted structural changes persist. This leads to the question: Is it possible to develop more property-conserving and structure-sensitive augmentation methods? |
Yutong Xia; Runpeng Yu; Yuxuan Liang; Xavier Bresson; Xinchao Wang; Roger Zimmermann; |
280 | Scaling Trends for Data Poisoning in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate three threat models—malicious fine-tuning, imperfect data curation, and intentional data contamination—across 24 frontier LLMs ranging from 1.5 to 72 billion parameters. |
Dillon Bowen; Brendan Murphy; Will Cai; David Khachaturov; Adam Gleave; Kellin Pelrine; |
281 | DiffuseHigh: Training-Free Progressive High-Resolution Image Synthesis Through Structure Guidance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we probe the generative ability of diffusion models at higher resolution beyond their original capability and propose a novel progressive approach that fully utilizes generated low-resolution images to guide the generation of higher-resolution images. |
Younghyun Kim; Geunmin Hwang; Junyu Zhang; Eunbyung Park; |
282 | Bridging Knowledge Gap Between Image Inpainting and Large-Area Visible Watermark Removal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing deep neural network (DNN)-based models still struggle with large-area watermarks and are overly dependent on the quality of watermark mask prediction. To overcome these challenges, we introduce a novel feature adapting framework that leverages the representation modeling capacity of a pre-trained image inpainting model. |
Yicheng Leng; Chaowei Fang; Junye Chen; Yixiang Fang; Sheng Li; Guanbin Li; |
283 | Eve: Efficient Multimodal Vision Language Models with Elastic Visual Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: There are several efficient VLM efforts, but they often sacrifice linguistic capabilities to enhance multimodal abilities, or require extensive training. To address this quandary, we introduce the innovative framework of Efficient Vision Language Models with Elastic Visual Experts (Eve). |
Miao Rang; Zhenni Bi; Chuanjian Liu; Yehui Tang; Kai Han; Yunhe Wang; |
284 | Breaking The Resource Monopoly from Industries: Sustainable and Reliable LLM Serving By Recycling Outdated and Resource-Constrained GPUs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this talk, I will traverse my series of contributions with promising new directions, particularly emphasizing modularized LLM architecture (Part 1), in-storage sustainable computing (Part 2), and reliable serving against software and hardware attacks (Part 3). |
Tianlong Chen; |
285 | Generalized Implicit Neural Representations for Dynamic Molecular Surface Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study introduces MoE-DSR, an enhanced version of dynamic surface representations (DSR) that effectively integrates the mixture-of-experts (MoE) strategy. |
Fang Wu; Bozhen Hu; Stan Z. Li; |
286 | Temporal Coherent Object Flow for Multi-Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a section-based multi-object tracking approach that integrates a temporal coherent Object Flow Tracker (OFTrack), capable of achieving simultaneous multi-frame tracking by treating multiple consecutive frames as the basic processing unit, denoted as a “section”. |
Zikai Song; Run Luo; Lintao Ma; Ying Tang; Yi-Ping Phoebe Chen; Junqing Yu; Wei Yang; |
287 | Enhancing Contrastive Learning Inspired By The Philosophy of “The Blind Men and The Elephant” Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the story of the blind men and the elephant, we introduce JointCrop and JointBlur. |
Yudong Zhang; Ruobing Xie; Jiansheng Chen; Xingwu Sun; Zhanhui Kang; Yu Wang; |
288 | Drawing Informative Gradients from Sources: A One-stage Transfer Learning Framework for Cross-city Spatiotemporal Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This strategy, while effective, faces challenges as pre-trained models risk absorbing noise and harmful information due to data distribution disparities, potentially undermining the accuracy of forecasts for target cities. To address this issue, we propose a one-stage STF framework named Target-Skewed Joint Training (TSJT). |
Yudong Zhang; Xu Wang; Xuan Yu; Zhaoyang Sun; Kai Wang; Yang Wang; |
289 | Multi-Attribute Multi-Grained Adaptation of Pre-Trained Language Models for Text Understanding from Bayesian Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study revisits the assumption that non-IID information enhances PLMs to achieve performance improvements from a Bayesian perspective, which unearths and integrates non-IID and IID features. |
You Zhang; Jin Wang; Liang-Chih Yu; Dan Xu; Xuejie Zhang; |
290 | Query-efficient Attack for Black-box Image Inpainting Forensics Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To identify potential flaws, we propose a novel black-box anti-forensics framework to attack inpainting forensics methods, which employs reinforcement learning to generate a query-efficient countermeasure, named RLGC. |
Xianbo Mo; Shunquan Tan; Bin Li; Jiwu Huang; |
291 | Revisiting Multimodal Emotion Recognition in Conversation from The Perspective of Graph Spectrum Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this paper revisits the task of MERC from the perspective of the graph spectrum and proposes a Graph-Spectrum-based Multimodal Consistency and Complementary collaborative learning framework GS-MCC. |
Wei Ai; Fuchen Zhang; Yuntao Shou; Tao Meng; Haowen Chen; Keqin Li; |
292 | Towards Scalable and Deep Graph Neural Networks Via Noise Masking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Second, pre-processing overhead takes up most of the end-to-end processing time, especially for large-scale graphs. To address these limitations, we present random walk with noise masking (RMask), a plug-and-play module compatible with the existing model-simplification works. |
Yuxuan Liang; Wentao Zhang; Zeang Sheng; Ling Yang; Quanqing Xu; Jiawei Jiang; Yunhai Tong; Bin Cui; |
293 | Unlocking The Potential of Black-box Pre-trained GNNs for Graph Few-shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the critical problem: Leveraging black-box pre-trained GNNs for graph few-shot learning. |
Qiannan Zhang; Shichao Pei; Yuan Fang; Xiangliang Zhang; |
294 | Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a multimodal large language model (MLLM) framework for FAS, termed Interpretable Face Anti-Spoofing (I-FAS), which transforms the FAS task into an interpretable visual question answering (VQA) paradigm. |
Guosheng Zhang; Keyao Wang; Haixiao Yue; Ajian Liu; Gang Zhang; Kun Yao; Errui Ding; Jingdong Wang; |
295 | A Unified Degradation-Robust Approach to SSL and UDA for 3D Medical Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional methods tend to underperform when these issues occur simultaneously, as they are typically designed for specific tasks. To address this, we propose a unified framework that effectively handles limited annotations and domain shifts while also managing both clean and degraded images during inference. |
Suruchi Kumari; Pravendra Singh; |
296 | Efficient 3D Recognition with Event-driven Spike Sparse Convolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Combining SVC and SSC, we design an efficient 3D SNN backbone (E-3DSNN), which is friendly with neuromorphic hardware. |
Xuerui Qiu; Man Yao; Jieyuan Zhang; Yuhong Chou; Ning Qiao; Shibo Zhou; Bo Xu; Guoqi Li; |
297 | Learn2Aggregate: Supervised Generation of Chvatal-Gomory Cuts Using Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Learn2Aggregate, a machine learning (ML) framework for optimizing the generation of Chvatal-Gomory (CG) cuts in mixed integer linear programming (MILP). |
Arnaud Deza; Elias B. Khalil; Zhenan Fan; Zirui Zhou; Yong Zhang; |
298 | Genomics Data Lossless Compression with (S, K)-Mer Encoding and Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these existing compressors suffer from inferior compression ratios or throughput, and adaptive compressors also face model cold-start problems. To address these issues, we propose DeepGeCo, a novel genomics data lossless adaptive compression framework with (s,k)-mer encoding and deep neural networks, involving three compression modes (MINI for static, PLUS for adaptive, ULTRA for semi-adaptive) for flexible requirements of compression ratios or throughput. |
Hui Sun; Liping Yi; Huidong Ma; Yongxia Sun; Yingfeng Zheng; Wenwen Cui; Meng Yan; Gang Wang; Xiaoguang Liu; |
299 | Relaxed Class-consensus Consistency for Semi-supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the bottlenecks of class-independent consistency inherent in previous methods and offer a fresh perspective of cooperative game theory to explicitly encourage class-consensus alignment (i.e., class-consensus consistency between the teacher (weak augmented view) and student network (strong augmented view)). |
Huayu Mai; Rui Sun; Feng Wu; |
300 | Multi-StyleGS: Stylized Gaussian Splatting with Multiple Styles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While 3D Gaussian Splatting (GS) has emerged as a promising and efficient method for realistic 3D scene modeling, there remains a challenge in adapting it to stylize 3D GS to match with multiple styles through automatic local style transfer or manual designation, while maintaining memory efficiency for stylization training. In this paper, we introduce a novel 3D GS stylization solution termed Multi-StyleGS to tackle these challenges. |
Yangkai Lin; Jiabao Lei; Kui Jia; |
301 | Optimize Incompatible Parameters Through Compatibility-aware Knowledge Integration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the former focuses on efficiency rather than performance, while the latter requires several times more computing and storage resources to support inference. In this paper, we set the goal to explicitly improve these incompatible parameters by leveraging the complementary strengths of different models, thereby directly enhancing the models without any additional parameters. |
Zheqi Lv; Keming Ye; Zishu Wei; Qi Tian; Shengyu Zhang; Wenqiao Zhang; Wenjie Wang; Kun Kuang; Tat-Seng Chua; Fei Wu; |
302 | CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further improve performance, we also introduce a gated mixture-of-experts module to specialize in handling diverse patch patterns and reduce mutual interference between them in multi-class training. Our method achieves competitive performance on the MVTec AD and VisA datasets, demonstrating its effectiveness. |
Xiaolei Wang; Xiaoyang Wang; Huihui Bai; Eng Gee Lim; Jimin Xiao; |
303 | CoSDA: Enhancing The Robustness of Inversion-based Generative Image Watermarking Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, our results show that inversion error will significantly affect the overall robustness. Therefore, in this paper, we delve into the inversion error aspect and propose CoSDA, a compensation sampling and drift alignment-based approach. |
Han Fang; Kejiang Chen; Zijin Yang; Bosen Cui; Weiming Zhang; Ee-Chien Chang; |
304 | JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This departure from the typical workflows of professional composers hinders the ability to refine details in specific tracks. To address this gap, we propose JEN-1 Composer, a unified framework designed to efficiently model marginal, conditional, and joint distributions over multi-track music using a single model. |
Yao Yao; Peike Li; Boyu Chen; Alex Wang; |
305 | Memory Efficient Matting with Adaptive Token Routing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, their application to high-resolution images remains challenging due to the quadratic complexity of global self-attention. To address this issue, we propose MEMatte, a memory-efficient matting framework for processing high-resolution images. |
Yiheng Lin; Yihan Hu; Chenyi Zhang; Ting Liu; Xiaochao Qu; Luoqi Liu; Yao Zhao; Yunchao Wei; |
306 | pFedGPA: Diffusion-based Generative Parameter Aggregation for Personalized Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While personalized FL approaches can mitigate the heterogeneous data issue to some extent, the limitation of linear aggregation remains unresolved. To alleviate this issue, we investigate the generative approach of diffusion model and propose a novel generative parameter aggregation framework for personalized FL, pFedGPA. |
Jiahao Lai; Jiaqi Li; Jian Xu; Yanru Wu; Boshi Tang; Siqi Chen; Yongfeng Huang; Wenbo Ding; Yang Li; |
307 | Beyond Graph Convolution: Multimodal Recommendation with Topology-aware MLPs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, in this paper, we investigate bypassing GCNs when modeling multimodal item-item relationship. |
Junjie Huang; Jiarui Qin; Yong Yu; Weinan Zhang; |
308 | Foreground-Covering Prototype Generation and Matching for SAM-Aided Few-Shot Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Foreground-Covering Prototype Generation and Matching to resolve Few-Shot Segmentation (FSS), which aims to segment target regions in unlabeled query images based on labeled support images. |
Suho Park; SuBeen Lee; Hyun Seok Seong; Jaejoon Yoo; Jae-Pil Heo; |
309 | Supportive Negatives Spectral Augmentation for Source-Free Cross-Domain Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, cross-domain distribution shift makes it difficult for the adapted model to provide accurate decisions on several hard instances and negatively affects model generalization. To overcome this limitation, a novel method ‘supportive negatives spectral augmentation’ (SNSA) is presented in this work. |
Kexin Zheng; Haifeng Xia; Siyu Xia; Ming Shao; Zhengming Ding; |
310 | GURecon: Learning Detailed 3D Geometric Uncertainties for Neural Surface Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel framework, i.e., GURecon, which establishes a geometric uncertainty field for the neural surface based on geometric consistency. |
Zesong Yang; Ru Zhang; Jiale Shi; Zixiang Ai; Boming Zhao; Hujun Bao; Luwei Yang; Zhaopeng Cui; |
311 | OLiDM: Object-aware LiDAR Diffusion Models for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To cater to the needs of object-aware tasks in 3D perception, we introduce OLiDM, a novel framework capable of generating controllable and high-fidelity LiDAR data at both the object and scene levels. |
Tianyi Yan; Junbo Yin; Xianpeng Lang; Ruigang Yang; Cheng-Zhong Xu; Jianbing Shen; |
312 | Efficient Traffic Prediction Through Spatio-Temporal Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we introduce a spatio-temporal knowledge distillation framework that helps student MLPs capture graph-structured global spatio-temporal patterns while alleviating the over-smoothing effect with adaptive knowledge distillation. |
Qianru Zhang; Xinyi Gao; Haixin Wang; Siu Ming Yiu; Hongzhi Yin; |
313 | SCANS: Mitigating The Exaggerated Safety for LLMs Via Safety-Conscious Activation Steering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Safety-Conscious Activation Steering (SCANS) method to mitigate the exaggerated safety concerns in aligned LLMs. |
Zouying Cao; Yifei Yang; Hai Zhao; |
314 | Improving Complex Reasoning Over Knowledge Graph with Logic-Aware Curriculum Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a complex reasoning schema over KG upon large language models (LLMs), containing a curriculum-based logical-aware instruction tuning framework, named LACT. |
Tianle Xia; Liang Ding; Guojia Wan; Yibing Zhan; Bo Du; Dacheng Tao; |
315 | Polarization Guided Mask-Free Shadow Removal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Pol-ShaRe, the first Polarization-guided image Shadow Removal solution, to remove shadow in a mask-free manner with fewer artifacts. |
Chu Zhou; Chao Xu; Boxin Shi; |
316 | SpikeGS: Reconstruct 3D Scene Captured By A Fast-Moving Bio-Inspired Camera Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This limitation severely constrains the practical application of 3DGS and may compromise the feasibility of real-time reconstruction. To mitigate these challenges, we propose Spike Gaussian Splatting (SpikeGS), the first framework that integrates the Bayer-pattern spike streams into the 3DGS pipeline to reconstruct 3D scenes captured by a fast-moving high temporal color spike camera in one second. |
Yijia Guo; Liwen Hu; Yuanxi Bai; Jiawei Yao; Lei Ma; Tiejun Huang; |
317 | Generalized Class Discovery in Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the imbalanced distributions, we propose an instance-wise temperature assignment (ITA) method for contrastive learning and class-wise reliability criteria for pseudo-labels. |
Cuong Manh Hoang; Yeejin Lee; Byeongkeun Kang; |
318 | CDE-Learning: Camera Deviation Elimination Learning for Unsupervised Person Re-identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, features from the same camera are prone to generating false positives due to the identical camera properties, which induce camera deviations on pseudo-label assignment. To address this problem, this paper proposes a novel camera-unbiased method named Camera Deviation Elimination Learning (CDE-Learning). |
Jinjia Peng; Songyu Zhang; Huibing Wang; |
319 | Approximating Metric Magnitude of Point Sets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the magnitude computation problem, and show efficient ways of approximating it. |
Rayna Andreeva; James Ward; Primoz Skraba; Jie Gao; Rik Sarkar; |
320 | BeyondGender: A Multifaceted Bilingual Dataset for Practical Sexism Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Sexism affects both women and men, yet research often overlooks misandry and suffers from overly broad annotations that limit AI applications. To address this, we introduce BeyondGender, a dataset meticulously annotated according to the latest definitions of misogyny and misandry. |
Xuan Luo; Li Yang; Han Zhang; Geng Tu; Qianlong Wang; Keyang Ding; Chuang Fan; Jing Li; Ruifeng Xu; |
321 | Out of Length Text Recognition with Sub-String Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It triggers the requirement to build long text recognition models from readily available short (i.e., word-level) text datasets, which has been less studied previously. In this paper, we term this task Out of Length (OOL) text recognition. |
Yongkun Du; Zhineng Chen; Caiyan Jia; Xieping Gao; Yu-Gang Jiang; |
322 | JEN-1 DreamStyler: Customized Musical Concept Learning Via Pivotal Parameters Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method for customized text-to-music generation, which can capture the concept from a two-minute reference music and generate a new piece of music conforming to the concept. |
Boyu Chen; Peike Li; Yao Yao; Alex Wang; |
323 | FaceA-Net: Facial Attribute-Driven ID Preserving Image Generation Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the issue of insufficient ID fidelity, we introduce a simple yet effective test-time fine-tuning approach. |
Jiayu Wang; Yue Yu; Jingjing Chen; Qi Dai; Yu-Gang Jiang; |
324 | CUQDS: Conformal Uncertainty Quantification Under Distribution Shift for Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the majority of existing trajectory prediction models have neither considered reducing the uncertainty as one objective during the training stage nor provided reliable uncertainty quantification during inference stage, especially under potential distribution shift. Therefore, in this paper, we propose the Conformal Uncertainty Quantification under Distribution Shift framework, CUQDS, to quantify the uncertainty of the predicted trajectories of existing trajectory prediction models under potential data distribution shift, while improving the prediction accuracy of the models and reducing the estimated uncertainty during the training stage. |
Huiqun Huang; Sihong He; Fei Miao; |
325 | Multi-scale Activation, Refinement, and Aggregation: Exploring Diverse Cues for Fine-Grained Bird Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework for FGBR, namely Multi-scale Diverse Cues Modeling (MDCM), which explores diverse cues at different scales across various stages of a multi-scale Vision Transformer (MS-ViT) in an “Activation-Selection-Aggregation” paradigm. |
Zhicheng Zhang; Hao Tang; Jinhui Tang; |
326 | Epistemic EFX Allocations Exist for Monotone Valuations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the fundamental problem of fairly dividing a set of indivisible items among agents with (general) monotone valuations. |
Hannaneh Akrami; Nidhi Rathi; |
327 | Achieving Maximin Share and EFX/EF1 Guarantees Simultaneously Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we identify a novel way to simultaneously achieve MMS guarantees with EFX/EF1 notions of fairness, while beating the best known approximation factors by Chaudhury et al. and Amanatidis et al. |
Hannaneh Akrami; Nidhi Rathi; |
328 | Long-Term EEG Partitioning for Seizure Onset Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a two-stage framework, SODor, that explicitly models seizure onset through a novel task formulation of subsequence clustering. |
Zheng Chen; Yasuko Matsubara; Yasushi Sakurai; Jimeng Sun; |
329 | HyperMixer: Specializable Hypergraph Channel Mixing for Long-term Multivariate Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose HyperMixer, a novel specializable hypergraph channel mixing plugin which introduces versatile hypergraph structures to capture group channel interactions and time-varying patterns for long-term multivariate time series forecasting. |
Changyuan Tian; Zhicong Lu; Zequn Zhang; Heming Yang; Wei Cao; Zhi Guo; Xian Sun; Li Jin; |
330 | dyAb: Flow Matching for Flexible Antibody Design with AlphaFold-driven Pre-binding Antigen Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing computational methods in antibody design often overlook crucial conformational changes that antigens undergo during the binding process, significantly impacting the reliability of the resulting antibodies. To bridge this gap, we introduce dyAb, a flexible framework that incorporates AlphaFold2-driven predictions to model pre-binding antigen structures and specifically addresses the dynamic nature of antigen conformation changes. |
Cheng Tan; Yijie Zhang; Zhangyang Gao; Yufei Huang; Haitao Lin; Lirong Wu; Fandi Wu; Mathieu Blanchette; Stan Z. Li; |
331 | EMControl: Adding Conditional Control to Text-to-Image Diffusion Models Via Expectation-Maximization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This adjustment is quantified by the pixel-level measure, where the latent is decoded back into a pixel image, and the forward operator translates the noisy image into the guidance domain for comparison with the guidance image. To enhance the fidelity of condition correction, we propose a learnable latent forward operator, focusing on latent-space consistency with the expectation that this latent-space consistency approximates the pixel-level fidelity measure. |
He Wang; Longquan Dai; Jinhui Tang; |
332 | Structural Pruning Via Spatial-aware Information Redundancy for Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Within this framework, we introduce a spatial-aware redundancy metric based on feature maps, thus endowing the pruning process with location sensitivity to better adapt to pruning segmentation networks. |
Dongyue Wu; Zilin Guo; Li Yu; Nong Sang; Changxin Gao; |
333 | Geometry-Aware 3D Salient Object Detection Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a geometry-aware 3D salient object detection network that explicitly clusters points into superpoints to enhance the geometric boundaries of objects, thereby segmenting complete objects with clear boundaries. |
Chen Wang; Liyuan Zhang; Le Hui; Qi Liu; Yuchao Dai; |
334 | Enhancing Decision-Making for LLM Agents Via Step-Level Q-Value Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose leveraging a task-relevant Q-value model to guide action selection. |
Yuanzhao Zhai; Tingkai Yang; Kele Xu; Dawei Feng; Cheng Yang; Bo Ding; Huaimin Wang; |
335 | Adaptive Decision Boundary for Few-Shot Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current strategies primarily focus on preventing catastrophic forgetting, considering only the relationship between novel and base classes, without paying attention to the specific decision spaces of each class. To address this challenge, we propose a plug-and-play Adaptive Decision Boundary Strategy (ADBS), which is compatible with most FSCIL methods. |
Linhao Li; Yongzhang Tan; Siyuan Yang; Hao Cheng; Yongfeng Dong; Liang Yang; |
336 | DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose DriveDreamer-2, which incorporates a Large Language Model (LLM) to facilitate the creation of user-defined driving videos. |
Guosheng Zhao; Xiaofeng Wang; Zheng Zhu; Xinze Chen; Guan Huang; Xiaoyi Bao; Xingang Wang; |
337 | Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this research, we introduce a novel data-model co-design perspective: to promote superior weight sparsity by learning important model topology and adequate input data in a synergetic manner. |
Can Jin; Tianjin Huang; Yihua Zhang; Mykola Pechenizkiy; Sijia Liu; Shiwei Liu; Tianlong Chen; |
338 | Intelligent OPC Engineer Assistant for Semiconductor Manufacturing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Intelligent OPC Engineer Assistant, an AI/LLM-powered methodology designed to solve the core manufacturing-aware optimization problem known as Optical Proximity Correction (OPC). |
Guojin Chen; Haoyu Yang; Bei Yu; Haoxing Ren; |
339 | Retention Score: Quantifying Jailbreak Risks for Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The objective of this paper is to assess the resilience of VLMs against jailbreak attacks that can compromise model safety compliance and result in harmful outputs. |
Zaitang LI; Pin-Yu Chen; Tsung-Yi Ho; |
340 | Beyond Text: Fine-Grained Multi-Modal Fact Verification with Hypergraph Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework for multi-modal fact-checking, named Hypergraph Transformer-based Multi-modal Fact-Checking (HGTMFC). |
Hui Pang; Chaozhuo Li; Litian Zhang; Senzhang Wang; Xi Zhang; |
341 | pFedES: Generalized Proxy Feature Extractor Sharing for Model Heterogeneous Personalized Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing MHPFL approaches either rely on the availability of a public dataset with special characteristics to facilitate knowledge transfer, incur high computational and communication costs, or face potential model leakage risks. To address these limitations, we propose a model-heterogeneous personalized Federated learning approach based on generalized proxy feature Extractor Sharing (pFedES) for supervised image classification tasks. |
Liping Yi; Han Yu; Chao Ren; Gang Wang; Xiaoguang Liu; Xiaoxiao Li; |
342 | MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, current multi-modal aggregation methods have obvious limitations in dealing with long sequences from different modalities. To address the above issues, we introduce a novel framework called MambaPro for multi-modal object ReID. |
Yuhao Wang; Xuehu Liu; Tianyu Yan; Yang Liu; Aihua Zheng; Pingping Zhang; Huchuan Lu; |
343 | Speed Master: Quick or Slow Play to Attack Speaker Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our paper presents a novel attack methodology named Speed Master, which undermines deep neural networks by manipulating the speed of speech samples. |
Zhe Ye; Wenjie Zhang; Ying Ren; Xiangui Kang; Diqun Yan; Bin Ma; Shiqi Wang; |
344 | Textualize Visual Prompt for Image Editing Via Diffusion Bridge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a framework based on any single text-to-image model without relying on an explicit image-to-image model, thus enhancing generalizability and scalability. |
Pengcheng Xu; Qingnan Fan; Fei Kou; Shuai Qin; Hong Gu; Ruoyu Zhao; Charles Ling; Boyu Wang; |
345 | Efficient Self-Supervised Video Hashing with Selective State Spaces Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce S5VH, a Mamba-based video hashing model with an improved self-supervised learning paradigm. |
Jinpeng Wang; Niu Lian; Jun Li; Yuting Wang; Yan Feng; Bin Chen; Yongbing Zhang; Shu-Tao Xia; |
346 | Text-Guided Nonverbal Enhancement Based on Modality-Invariant and -Specific Representations for Video Speaking Style Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, treating each modality equally leads to a suboptimal result for these methods because text is inherently more aligned with conversation understanding than nonverbal modalities. To address this issue, we propose a text-guided nonverbal enhancement method, TNvE, which is composed of two core modules: 1) a text-guided nonverbal representation selection module that employs cross-modal attention based on modality-invariant representations, picking out critical nonverbal information via textual guidance; and 2) a modality-invariant and -specific representation decoupling module that incorporates modality-specific representations and decouples them from modality-invariant representations, enabling a more comprehensive understanding of multimodal data. |
Beibei Zhang; Tongwei Ren; Gangshan Wu; |
347 | Is Poisoning A Real Threat to DPO? Maybe More So Than You Think Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The increased practical use of these RLHF methods warrants an analysis of their vulnerabilities. In this work, we investigate the vulnerabilities of DPO to poisoning attacks under different scenarios and compare the effectiveness of preference poisoning, a first of its kind. |
Pankayaraj Pathmanathan; Souradip Chakraborty; Xiangyu Liu; Yongyuan Liang; Furong Huang; |
348 | Decoupling Appearance Variations with 3D Consistent Features in Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose DAVIGS, a method that decouples appearance variations in a plug-and-play and efficient manner. |
Jiaqi Lin; Zhihao Li; Binxiao Huang; Xiao Tang; Jianzhuang Liu; Shiyong Liu; Xiaofei Wu; Fenglong Song; Wenming Yang; |
349 | Efficiently Enhancing Long-term Series Forecasting Via Ultra-long Lookback Windows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods struggle to effectively utilize long lookback windows due to overfitting, computational resource constraints, or information extraction challenges, thereby limiting them to using limited lookback windows for predicting long-term future series. To address these issues, this paper introduces the Input Refinement and Prediction Auxiliary (IRPA) framework, a lightweight model consisting of four linear layers designed to extract key information from ultra-long lookback windows to enhance limited lookback windows and assist prediction processes. |
Suxin Tong; Jingling Yuan; |
350 | ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these methods lack robustness to minor input variations due to the discrete mapping between data and parameters. To overcome this challenge, we propose ELDER, a novel approach to create a continuous association between data and adapters. |
Jiaang Li; Quan Wang; Zhongnan Wang; Yongdong Zhang; Zhendong Mao; |
351 | RaDIO: Real-Time Hallucination Detection with Contextual Index Optimized Query Formulation for Dynamic Retrieval Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current dynamic RAG methods fall short in both aspects: identifying the optimal moment to activate the retrieval module and crafting the appropriate query once retrieval is triggered. To overcome these limitations, we introduce an approach, namely, RaDIO, Real-Time Hallucination Detection with Contextual Index Optimized query formulation for dynamic RAG. |
Jia Zhu; Hanghui Guo; Weijie Shi; Zhangze Chen; Pasquale De Meo; |
352 | Compression-Aware Computing for Scalable and Sustainable AI Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Modern AI models demand substantial computational power, often relying on specialized hardware such as GPUs. To address this, the talk introduces compression-aware computing, a framework enabling AI models to recognize and adapt to their compressed states while preserving performance. |
Zhaozhuo Xu; |
353 | Self-supervised Trusted Contrastive Multi-view Clustering with Uncertainty Refined Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing contrastive MVC methods still ignore the reliability of clustering results and the impact of false negative pairs, which limits the application of methods in critical security areas. To solve the above challenges, we propose a Self-supervised Trusted Contrastive Multi-view Clustering with Uncertainty Refined (STCMC-UR) method, which integrates clustering results and uncertainty learning to guide the self-supervised contrastive learning (CL). |
Shizhe Hu; Binyan Tian; Weibo Liu; Yangdong Ye; |
354 | Multi-aspect Self-guided Deep Information Bottleneck for Multi-modal Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel multi-aspect self-guided deep information bottleneck (MSDIB) method for multi-modal clustering, which can effectively employ different aspects of guiding information for learning cluster-friendly information among modals. |
Shizhe Hu; Jiahao Fan; Guoliang Zou; Yangdong Ye; |
355 | Bites of Tomorrow: Personalized Recommendations for A Healthier and Greener Plate Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to employ recommendation systems to actively nudge users toward more sustainable choices. |
Jiazheng Jing; Yinan Zhang; Chunyan Miao; |
356 | Learning Personalized Decision Support Policies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we posit that personalizing access to decision support tools can be an effective mechanism for instantiating the appropriate use of AI assistance. |
Umang Bhatt; Valerie Chen; Katherine M. Collins; Parameswaran Kamalaruban; Emma Kallina; Adrian Weller; Ameet Talwalkar; |
357 | Harnessing Language Model for Cross-Heterogeneity Graph Knowledge Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the problem, we propose a novel Language Model-enhanced Cross-Heterogeneity learning model, namely LMCH. |
Jinyu Yang; Ruijia Wang; Cheng Yang; Bo Yan; Qimin Zhou; Yang Juan; Chuan Shi; |
358 | SELF-[IN]CORRECT: LLMs Struggle with Discriminating Self-Generated Responses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our resulting experimental analysis of several open-source and industrial LLMs, we observe that models are not reliably better at discriminating among previously-generated alternatives than generating initial responses. |
Dongwei Jiang; Jingyu Zhang; Orion Weller; Nathaniel Weir; Benjamin Van Durme; Daniel Khashabi; |
359 | Utilize The Flow Before Stepping Into The Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore two primary causes of over-refusal: Static conflict occurs when similar samples within the LLM’s feature space receive differing supervision signals (original vs. modified “I don’t know”). |
Runchuan Zhu; Zhipeng Ma; Jiang Wu; Junyuan Gao; Jiaqi Wang; Dahua Lin; Conghui He; |
360 | G2LDetect: A Global-to-Local Approach for Hallucination Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a global-to-local approach for hallucination detection (G2LDetect), which considers the global information of the text before identifying local details. |
Xiaoxia Cheng; Zeqi Tan; Zhe Zheng; Weiming Lu; |
361 | V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: V2Xum-LLM, specifically V2Xum-LLaMA in this study, is the first framework that unifies different video summarization tasks into one large language model’s (LLM) text decoder and achieves task-controllable video summarization with temporal prompts and task instructions. |
Hang Hua; Yunlong Tang; Chenliang Xu; Jiebo Luo; |
362 | Semi-Supervised Clustering Framework for Fine-grained Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we design a Semi-Supervised Clustering framework for Scene Graph Generation (SSC-SGG) that uses the sparse labeled data to guide the generation of effective pseudo-labels from unlabeled object pairs, thus enriching the labeled sample space, especially for low-frequency interaction samples. |
Jiarui Yang; Chuan Wang; Jun Zhang; Shuyi Wu; Jinjing Zhao; Zeming Liu; Liang Yang; |
363 | PVTree: Realistic and Controllable Palm Vein Generation for Recognition Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods, however, often produce unrealistic palm vein patterns or struggle with controlling identity and style attributes. To address these issues, we propose a novel palm vein generation framework named PVTree. |
Sheng Shang; Chenglong Zhao; Ruixin Zhang; Jianlong Jin; Jingyun Zhang; Rizen Guo; Shouhong Ding; Yunsheng Wu; Yang Zhao; Wei Jia; |
364 | FreeCap: Hybrid Calibration-Free Motion Capture in Open Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel hybrid calibration-free method FreeCap to accurately capture global multi-person motions in open environments. |
Aoru Xue; Yiming Ren; Zining Song; Mao Ye; Xinge Zhu; Yuexin Ma; |
365 | Are Expressive Models Truly Necessary for Offline RL? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Sequential modeling, however, requires capturing accurate dynamics across long horizons in trajectory data to ensure reasonable policy performance. To meet this requirement, leveraging large, expressive models has become a popular choice in recent literature, which, however, comes at the cost of significantly increased computation and inference latency. |
Guan Wang; Haoyi Niu; Jianxiong Li; Li Jiang; Jianming Hu; Xianyuan Zhan; |
366 | Realistic Noise Synthesis with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current noise synthesis techniques struggle to accurately model complex noise distributions. We propose a novel Realistic Noise Synthesis Diffusor (RNSD) method using diffusion models to address these challenges. |
Qi Wu; Mingyan Han; Ting Jiang; Chengzhi Jiang; Jinting Luo; Man Jiang; Haoqiang Fan; Shuaicheng Liu; |
367 | A Comprehensive Evaluation on Event Reasoning of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a novel benchmark EV2 for EValuation of EVent reasoning. |
Zhengwei Tao; Zhi Jin; Yifan Zhang; Xiancai Chen; Haiyan Zhao; Jia Li; Bin Liang; Chongyang Tao; Qun Liu; Kam-Fai Wong; |
368 | Semantic Convergence: Harmonizing Recommender Systems Via Two-Stage Alignment and Behavioral Semantic Tokenization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our study, we propose a novel framework that harmoniously merges traditional recommendation models with the prowess of LLMs. |
Guanghan Li; Xun Zhang; Yufei Zhang; Yifan Yin; Guojun Yin; Wei Lin; |
369 | In-Context Policy Adaptation Via Cross-Domain Skill Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present an in-context policy adaptation (ICPAD) framework designed for long-horizon multi-task environments, exploring diffusion-based skill learning techniques in cross-domain settings. |
Minjong Yoo; Woo Kyung Kim; Honguk Woo; |
370 | Incremental Nyström-based Multiple Kernel Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, in scenarios where approximate kernel matrices emerge over time, these methods require storing historical kernel information and recalculating, resulting in inefficient resource utilization. To address these issues, we propose a novel MKC algorithm, termed Incremental Nyström-based Multiple Kernel Clustering (INMKC). |
Yu Feng; Weixuan Liang; Xinhang Wan; Jiyuan Liu; Suyuan Liu; Qian Qu; Renxiang Guan; Huiying Xu; Xinwang Liu; |
371 | Collaborative Similarity Fusion and Consistency Recovery for Incomplete Multi-view Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a collaborative Similarity Fusion and Consistency Recovery (SFCR) method, which resolves the incomplete multi-view clustering problem by learning a unified similarity graph and recovering missing samples with consistent structures. |
Bingbing Jiang; Chenglong Zhang; Xinyan Liang; Peng Zhou; Jie Yang; Xingyu Wu; Junyi Guan; Weiping Ding; Weiguo Sheng; |
372 | A New Formula for Sticker Retrieval: Reply with Stickers in Multi-Modal and Multi-Session Conversation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the created dataset, we propose a novel Intent-Guided Sticker Retrieval (IGSR) framework that retrieves stickers for multi-modal and multi-session conversation history drawing support from intent learning. |
Bingbing Wang; Yiming Du; Bin Liang; Zhixin Bai; Min Yang; Baojun Wang; Kam-Fai Wong; Ruifeng Xu; |
373 | Infer Human’s Intentions Before Following Natural Language Instructions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new framework, Follow Instructions with Social and Embodied Reasoning (FISER), aiming for better natural language instruction following in collaborative embodied tasks. |
Yanming Wan; Yue Wu; Yiping Wang; Jiayuan Mao; Natasha Jaques; |
374 | Fully-Scalable Massively Parallel Algorithm for K-center with Outliers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider the k-center problem with outliers (the (k, z)-center problem) in the context of Massively Parallel Computation (MPC). |
Di Wu; Qilong Feng; Junyu Huang; Jinhui Xu; Ziyun Huang; Jianxin Wang; |
375 | Multi-Perspective Consolidation Enhanced Cognitive Diagnosis Via Conditional Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite their success, these models continue to face the ill-posed problem because of the information loss caused by under-expressive interaction function and incomplete observations. In this paper, we address these challenges by proposing a novel cognitive diagnosis model, DMC-CDM, based on the theoretical premise that cognitive states can be captured with minimal information loss by maximizing the mutual information between observed and potential observations. |
Guanhao Zhao; Zhenya Huang; Cheng Cheng; Yan Zhuang; Qingyang Mao; Xin Li; Shijin Wang; Enhong Chen; |
376 | TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce TimeCAP, a time-series processing framework that creatively employs Large Language Models (LLMs) as contextualizers of time series data, extending their typical usage as predictors. |
Geon Lee; Wenchao Yu; Kijung Shin; Wei Cheng; Haifeng Chen; |
377 | TechSinger: Technique Controllable Multilingual Singing Voice Synthesis Via Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce TechSinger, an advanced system for controllable singing voice synthesis that supports five languages and seven vocal techniques. |
Wenxiang Guo; Yu Zhang; Changhao Pan; Rongjie Huang; Li Tang; Ruiqi Li; Zhiqing Hong; Yongqi Wang; Zhou Zhao; |
378 | CoDTS: Enhancing Sparsely Supervised Collaborative Perception with A Dual Teacher-Student Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods fail to achieve an optimal confidence threshold that harmonizes the quality and quantity of pseudo labels. To address this issue, we propose an end-to-end Collaborative perception Dual Teacher-Student framework (CoDTS), which employs adaptive complementary learning to produce both high-quality and high-quantity pseudo labels. |
Yushan Han; Hui Zhang; Honglei Zhang; Jing Wang; Yidong Li; |
379 | Sequential Preference Optimization: Multi-Dimensional Preference Alignment with Implicit Reward Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current methods either ignore the multi-dimensionality of human preferences (e.g. helpfulness and harmlessness) or struggle with the complexity of managing multiple reward models. To address these issues, we propose Sequential Preference Optimization (SPO), a method that sequentially fine-tunes LLMs to align with multiple dimensions of human preferences. |
Xingzhou Lou; Junge Zhang; Jian Xie; Lifeng Liu; Dong Yan; Kaiqi Huang; |
380 | Global Graph Propagation with Hierarchical Information Transfer for Incomplete Contrastive Multi-view Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although the existing methods have made great progress, there are still some problems: 1) most methods cannot effectively mine the information hidden in the missing data; 2) most methods typically divide representation learning and clustering into two separate stages, but this may affect the clustering performance as the clustering results directly depend on the learned representation. To address these problems, we propose a novel incomplete multi-view clustering method with hierarchical information transfer. |
Guoqing Chao; Kaixin Xu; Xijiong Xie; Yongyong Chen; |
381 | Beyond Spatial Domain: Cross-domain Promoted Fourier Convolution Helps Single Image Dehazing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Joint Spatial and Fourier Convolutional Network (JSFC-Net), which leverages Fourier transformation to simultaneously address the two aforementioned problems with low computational overhead. |
Xiaozhe Zhang; Haidong Ding; Fengying Xie; Linpeng Pan; Yue Zi; Ke Wang; Haopeng Zhang; |
382 | MoLE: Decoding By Mixture of Layer Experts Alleviates Hallucination in Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze LVLMs’ layer-wise decoding and identify that hallucinations can arise during the reasoning and factual information injection process. |
Tian Liang; Yuetian Du; Jing Huang; Ming Kong; Luyuan Chen; Yadong Li; Siye Chen; Qiang Zhu; |
383 | Label-Efficient Data Augmentation with Video Diffusion Models for Guidewire Segmentation in Cardiac Fluoroscopy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although deep learning methods have demonstrated high accuracy and robustness in wire segmentation, they require substantial annotated datasets for generalizability, underscoring the need for extensive labeled data to enhance model performance. To address this challenge, we propose the Segmentation-guided Frame-consistency Video Diffusion Model (SF-VD) to generate large collections of labeled fluoroscopy videos, augmenting the training data for wire segmentation networks. |
Shaoyan Pan; Yikang Liu; Lin Zhao; Eric Z. Chen; Xiao Chen; Terrence Chen; Shanhui Sun; |
384 | Beyond Federated Prototype Learning: Learnable Semantic Anchors with Hyperspherical Contrast for Domain-Skewed Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the two drawbacks, we go beyond the conventional paradigm of federated prototype learning, and propose learnable semantic anchors with hyperspherical contrast (FedLSA) for domain-skewed data. |
Lele Fu; Sheng Huang; Yanyi Lai; Tianchi Liao; Chuanfu Zhang; Chuan Chen; |
385 | Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although LLMs excel at text generation, they face challenges in taking on the role of policy-makers through traditional training methods, limiting their exploration in simultaneous generation. To overcome these limitations, we propose a novel LLM-driven Simultaneous Generation (LSG) framework, which allows the off-the-shelf LLM to decide the generation timing and produce output concurrently. |
Shoutao Guo; Shaolei Zhang; Zhengrui Ma; Yang Feng; |
386 | LLM+AL: Bridging Large Language Models and Action Languages for Complex Reasoning About Actions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large Language Models (LLMs) have made significant strides in various intelligent tasks but still struggle with complex action reasoning tasks that require systematic search. To address this limitation, we propose a method that bridges the natural language understanding capabilities of LLMs with the symbolic reasoning strengths of action languages. |
Adam Ishay; Joohyung Lee; |
387 | FastLGS: Speeding Up Language Embedded Gaussians with Feature Grid Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present FastLGS, an approach that supports real-time open-vocabulary query within 3D Gaussian Splatting (3DGS) under high resolution. |
Yuzhou Ji; He Zhu; Junshu Tang; Wuyi Liu; Zhizhong Zhang; Xin Tan; Yuan Xie; |
388 | Fusing Pruned and Backdoored Models: Optimal Transport-based Data-free Backdoor Mitigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we first demonstrate our findings that the NWCs of random unlearning are positively correlated with those of poison unlearning. Based on this observation, we propose a random-unlearning NWC pruning technique to eliminate the backdoor effect and obtain a backdoor-free pruned model. |
Weilin Lin; Li Liu; Jianze Li; Hui Xiong; |
389 | Deep Non-Rigid Structure-from-Motion Revisited: Canonicalization and Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an easy-to-implement per-sequence canonicalization method as opposed to the previous per-dataset canonicalization approaches. With this in mind, we propose a sequence modeling method that combines temporal information and subspace constraint. |
Hui Deng; Jiawei Shi; Zhen Qin; Yiran Zhong; Yuchao Dai; |
390 | Unveiling The Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our holistic analysis provides valuable insights into each perspective. |
Xinlu Zhang; Zhiyu Zoey Chen; Xi Ye; Xianjun Yang; Lichang Chen; William Yang Wang; Linda Ruth Petzold; |
391 | BERT-Based Code Learning for Exception Localization and Type Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a model called CodeHunter for exception localization and type prediction. |
Chongyu Zhang; Qiping Tao; Liangyu Chen; Min Zhang; |
392 | Prediction-Feedback DETR for Temporal Action Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, our findings reveal that cross-attention exhibits patterns distinct from predictions, indicating a short-cut phenomenon. To resolve this, we propose a new framework, Prediction-Feedback DETR (Pred-DETR), which utilizes predictions to restore the collapse and align the cross- and self-attention with predictions. |
Jihwan Kim; Miso Lee; Cheol-Ho Cho; Jihyun Lee; Jae-Pil Heo; |
393 | From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Low activation density enables efficient model inference on sparsity-aware hardware. Building upon this insight, in this work, we propose a novel density loss that encourages higher activation sparsity (equivalently, lower activation density) in the pre-trained models. |
Bharat Runwal; Tejaswini Pedapati; Pin-Yu Chen; |
394 | Towards Effective, Efficient and Unsupervised Social Event Detection in The Hyperbolic Space Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In response to the challenges, this work introduces an unsupervised framework, HyperSED (Hyperbolic SED). |
Xiaoyan Yu; Yifan Wei; Shuaishuai Zhou; Zhiwei Yang; Li Sun; Hao Peng; Liehuang Zhu; Philip S. Yu; |
395 | FatesGS: Fast and Accurate Sparse-View Surface Reconstruction Using Gaussian Splatting with Depth-Feature Consistency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present an innovative sparse-view reconstruction framework that leverages intra-view depth and multi-view feature consistency to achieve remarkably accurate surface reconstruction. |
Han Huang; Yulun Wu; Chao Deng; Ge Gao; Ming Gu; Yu-Shen Liu; |
396 | Explicit and Implicit Examinee-Question Relation Exploiting for Efficient Computerized Adaptive Testing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, most of these existing question selectors are based on greedy strategies, which potentially overlook promising questions. To bridge the above two types of approaches, we propose a novel framework named Relation Exploiting-based CAT (RECAT) by exploring and exploiting the implicit and explicit examinee-question relation. |
Changqian Wang; Shangshang Yang; Siyu Song; Ziwen Wang; Haiping Ma; Xingyi Zhang; Bo Jin; |
397 | Enhancing Chain of Thought Prompting in Large Language Models Via Reasoning Patterns Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose leveraging reasoning patterns to enhance CoT prompting effectiveness. |
Yufeng Zhang; Xuepeng Wang; Lingxiang Wu; Jinqiao Wang; |
398 | Enhanced Density Peak Clustering for High-Dimensional Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, existing methods typically adopt a one-step label assignment strategy, making them prone to cascading errors when initial misassignments occur. To address these challenges, we propose an Enhanced Density Peak Clustering (EDPC) method, which creatively incorporates multilayer perceptron (MLP)-based dimensionality reduction and a hierarchical label assignment strategy to significantly improve clustering performance in high-dimensional scenarios. |
Zhongli Wang; Jie Yang; Junyi Guan; Chenglong Zhang; Xinyan Liang; Bingbing Jiang; Weiguo Sheng; |
399 | Monitoring Primitive Interactions During The Training of DNNs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Beyond the explanation of a static DNN, in this paper, we hope to show that the seemingly complex learning dynamics of a DNN can be faithfully represented as the change of a few primitive interaction patterns encoded by the DNN. |
Jie Ren; Xinhao Zheng; Jiyu Liu; Andrew Lizarraga; Ying Nian Wu; Liang Lin; Quanshi Zhang; |
400 | Unsupervised Translation of Emergent Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study employs unsupervised neural machine translation (UNMT) techniques to decipher ECs formed during referential games with varying task complexities, influenced by the semantic diversity of the environment. |
Ido Levy; Orr Paradise; Boaz Carmeli; Ron Meir; Shafi Goldwasser; Yonatan Belinkov; |
401 | Beyond Human Data: Aligning Multimodal Large Language Models By Iterative Self-Evolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Some methods even require extra models or ground truth answers to construct preference data. To overcome these limitations, we propose a novel multimodal self-evolution framework that empowers the model to autonomously generate high-quality questions and answers using only unannotated images. |
Wentao Tan; Qiong Cao; Yibing Zhan; Chao Xue; Changxing Ding; |
402 | AdaO2B: Adaptive Online to Batch Conversion for Out-of-Distribution Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing approaches employ fixed conversion mechanisms that are unable to adapt to novel testing distributions, hindering the testing accuracy of the batch learner. To address these issues, we propose AdaO2B, an adaptive online to batch conversion approach under the bandit setting. |
Xiao Zhang; Sunhao Dai; Jun Xu; Yong Liu; Zhenhua Dong; |
403 | Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Additionally, existing datasets for speaker adaptation have limited vocabulary sizes and pose variations, which restrict the validation of previous speaker-adaptive methods in real-world scenarios. To address these issues, we propose a novel speaker-adaptive lip reading method that adapts a pre-trained model to target speakers at both vision and language levels. |
Jeong Hun Yeo; Chae Won Kim; Hyunjun Kim; Hyeongseop Rha; Seunghee Han; Wen-Huang Cheng; Yong Man Ro; |
404 | Pioneer: Physics-informed Riemannian Graph ODE for Entropy-increasing Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel physics-informed Riemannian graph ODE for a wide range of entropy-increasing dynamic systems (termed as Pioneer). |
Li Sun; Ziheng Zhang; Zixi Wang; Yujie Wang; Qiqi Wan; Hao Li; Hao Peng; Philip S. Yu; |
405 | FairTP: A Prolonged Fairness Framework for Traffic Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To fill this research gap, we investigate prolonged fair traffic prediction, introducing two novel fairness metrics, i.e., region-based static fairness and sensor-based dynamic fairness, tailored to fairness fluctuations over time and across areas. |
Jiangnan Xia; Yu Yang; Jiaxing Shen; Senzhang Wang; Jiannong Cao; |
406 | Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This deficiency hinders LLMs from learning the alignment between time, audio-visual events, and text tokens, thus impairing their ability to localize audio-visual events in videos temporally. To address this gap, we introduce PU-VALOR, a comprehensive audio-visual dataset comprising over 114,081 pseudo-untrimmed videos with detailed temporal annotations. |
Yunlong Tang; Daiki Shimada; Jing Bi; Mingqian Feng; Hang Hua; Chenliang Xu; |
407 | CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose CaRDiff (Caption, Rank, and generate with Diffusion), a framework that imitates the process by integrating multimodal large language model (MLLM), a grounding module, and a diffusion model, to enhance video saliency prediction. |
Yunlong Tang; Gen Zhan; Li Yang; Yiting Liao; Chenliang Xu; |
408 | Balancing Privacy and Performance: A Many-in-One Approach for Image Anonymization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this approach highlights an inherent trade-off: minimal modification offers insufficient privacy protection, while excessive modification significantly degrades task performance. In this paper, we propose a novel Recombining for Obfuscation (FRO) approach to address this trade-off. |
Xuemei Jia; Jiawei Du; Hui Wei; Ruinian Xue; Zheng Wang; Hongyuan Zhu; Jun Chen; |
409 | MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce MAGIC, a novel multi-agent method that automates the creation of the self-correction guideline. |
Arian Askari; Christian Poelitz; Xinye Tang; |
410 | Unsupervised Domain Adaptive Person Search Via Dual Self-Calibration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Dual Self-Calibration (DSCA) framework for UDA person search that effectively eliminates the interference of noisy pseudo-labels by considering both the image-level and instance-level features perspectives. |
Linfeng Qi; Huibing Wang; Jiqing Zhang; Jinjia Peng; Yang Wang; |
411 | Optimizing Quantized Diffusion Models Via Distillation with Cross-Timestep Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze cross-timestep error propagation in quantized DMs, revealing that previous methods focusing only on reducing noise estimation discrepancies are insufficient. |
Yanxi Li; Chengbin Du; |
412 | ScamNet: Toward Explainable Large Language Model-Based Fraudulent Shopping Website Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study explores the potential of Large Language Models (LLMs) in identifying fraudulent shopping websites, revealing that current LLMs underperform compared to existing machine learning models. To address this, we propose ScamNet, a fine-tuned LLM for explainable fraudulent shopping website detection. |
Marzieh Bitaab; Alireza Karimi; Zhuoer Lyu; Ahmadreza Mosallanezhad; Adam Oest; Ruoyu Wang; Tiffany Bao; Yan Shoshitaishvili; Adam Doupé; |
413 | Dehaze-RetinexGAN: Real-World Image Dehazing Via Retinex-based Generative Adversarial Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Dehaze-RetinexGAN, a lightweight Retinex-based Generative Adversarial Network for real-world image Dehazing using unpaired data. |
Xinran Wang; Guang Yang; Tian Ye; Yun Liu; |
414 | Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present FreeGS, an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without the need for 2D labels. |
Wenbo Zhang; Lu Zhang; Ping Hu; Liqian Ma; Yunzhi Zhuge; Huchuan Lu; |
415 | LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Leveraging the capability of large language models to comprehend and reason about textual content presents a promising avenue for advancing recommendation systems. To achieve this, we propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge. |
Jian Jia; Yipei Wang; Yan Li; Honggang Chen; Xuehan Bai; Zhaocheng Liu; Jian Liang; Quan Chen; Han Li; Peng Jiang; Kun Gai; |
416 | Graph Contrastive Learning with Joint Spectral Augmentation of Attribute and Topology Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nonetheless, these strategies tend to augment the two types of graph information separately, ignoring their correlation, resulting in limited representation ability. To overcome this drawback, this paper proposes a novel GCL framework with a Joint spectrAl augMentation, named GCL-JAM. |
Liang Yang; Zhenna Li; Jiaming Zhuo; Jing Liu; Ziyi Ma; Chuan Wang; Zhen Wang; Xiaochun Cao; |
417 | Image-to-video Adaptation with Outlier Modeling and Robust Self-learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We proposed a new metric based on label propagation consistency to select samples for training a better video-level model. |
Junbao Zhuo; Shuhui Wang; Zhenghan Chen; Li Shen; Qingming Huang; Huimin Ma; |
418 | PA3Fed: Period-Aware Adaptive Aggregation for Improved Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, these approaches generally assume that every period of training has an equal impact on the final model’s performance. To address these issues, this paper introduces a novel method, PA3Fed, which conducts period-aware adaptive aggregation for improved federated learning. |
Chengxiang Huang; Bingyan Liu; |
419 | Encoder of Thoughts: Enhancing Planning Ability in Language Agents Through Structural Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the increasingly complex reasoning structures designed to enhance the planning ability of language agents often exceed the processing and comprehension capabilities of LLMs, thereby limiting their effectiveness. To address these challenges, we introduce the Encoder of Thoughts (EoT), a novel reasoning structure modeling method based on graph neural networks. |
Yuxiang Zhang; Jitao Sang; |
420 | Open-Set Heterogeneous Domain Adaptation: Theoretical Analysis and Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Guided by this framework, we propose a new DA method called Representation Learning for OSHeDA (RL-OSHeDA). |
Thai-Hoang Pham; Yuanlong Wang; Changchang Yin; Xueru Zhang; Ping Zhang; |
421 | Multi-Shape Matching with Cycle Consistency Basis Via Functional Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Multi-shape matching is a central problem in various applications of computer vision and graphics, where cycle consistency constraints play a pivotal role. For this issue, we propose a novel and efficient approach that models multi-shapes as directed graphs for two-stage optimization, i.e., optimizing pairwise correspondence accuracy using landmarks, and refining matching consistency through cycle consistency basis. |
Yifan Xia; Tianwei Ye; Huabing Zhou; Zhongyuan Wang; Jiayi Ma; |
422 | Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential Via Self-Attention Redirection Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, when employed for object removal tasks, they still encounter issues such as generating random artifacts and the incapacity to repaint foreground object areas with appropriate content after removal. To tackle these problems, we propose Attentive Eraser, a tuning-free method to empower pre-trained diffusion models for stable and effective object removal. |
Wenhao Sun; Xue-Mei Dong; Benlei Cui; Jingqun Tang; |
423 | SVasP: Self-Versatility Adversarial Style Perturbation for Cross-Domain Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the serious dilemma of gradient instability and local optimization problem occurs in those style-based CD-FSL methods. This paper addresses these issues and proposes a novel crop-global style perturbation method, called Self-Versatility Adversarial Style Perturbation (SVasP), which enhances the gradient stability and escapes from poor sharp minima jointly. |
Wenqian Li; Pengfei Fang; Hui Xue; |
424 | Disentangle Nighttime Lens Flares: Self-supervised Generation-based Lens Flare Removal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To model this interdependent flares’ relationship, our Nighttime Lens Flare Formation model is the first attempt to learn the intrinsic physical relationship between flares on the imaging plane. Building on this physical model, we introduce a solution to this joint flare removal task named Self-supervised Generation-based Lens Flare Removal Network (SGLFR-Net), which is self-supervised without pre-training. |
Yuwen He; Wei Wang; Wanyu Wu; Kui Jiang; |
425 | CodecNeRF: Toward Fast Encoding and Decoding, Compact, and High-quality Novel-view Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present CodecNeRF, a neural codec for NeRF representations, consisting of an encoder and decoder architecture that can generate a NeRF representation in a single forward pass. |
Gyeongjin Kang; Younggeun Lee; Seungjun Oh; Eunbyung Park; |
426 | Sequence Matters: Harnessing Video Models in 3D Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we perform a comprehensive study of 3D super-resolution by leveraging video super-resolution (VSR) models. |
Hyun-kyu Ko; Dongheok Park; Youngin Park; Byeonghyeon Lee; Juhee Han; Eunbyung Park; |
427 | Automated, Interpretable, and Scalable Scientific Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: My work focuses on: 1) automating scientific reasoning with language models, 2) improving geometric interpretation, 3) developing foundation models for multiphysics. |
Wuyang Chen; |
428 | Fast Incomplete Multi-view Clustering with Adaptive Similarity Completion and Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: (3) They generally apply post-processing on learned anchor graph to seek latent embeddings, making them not globally-optimal. To address these issues, this paper proposes a novel fast IMVC approach with Adaptive Similarity Completion and Reconstruction (ASCR), which unifies anchor learning, anchor-sample similarity construction and completion, and latent multi-view embedding learning in a joint framework. |
Deng Xu; Chao Zhang; Cong Guo; Chunlin Chen; Huaxiong Li; |
429 | Optimal Control Operator Perspective and A Neural Adaptive Spectral Method Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel instance-solution control operator perspective, which solves OCPs in a one-shot manner without direct dependence on the explicit expression of dynamics or iterative optimization processes. |
Mingquan Feng; Zhijie Chen; Yixin Huang; Yizhou Liu; Junchi Yan; |
430 | Partial Label Causal Representation Learning for Instance-Dependent Supervision and Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the learning of causal representations within an instance-dependent PLL framework, introducing a new approach that uncovers identifiable latent representations. |
Yizhi Wang; Weijia Zhang; Min-Ling Zhang; |
431 | Sequence Accumulation and Beyond: Infinite Context Length on Single GPU and Large Clusters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Sequence Accumulation (SA) which leverages the common recurrence feature of linear sequence modeling methods to manage infinite context length even on a single GPU. |
Weigao Sun; Yongtuo Liu; Xiaqiang Tang; Xiaoyu Mo; |
432 | ISPDiffuser: Learning RAW-to-sRGB Mappings with Texture-Aware Diffusion Models and Histogram-Guided Color Consistency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present ISPDiffuser, a diffusion-based decoupled framework that separates the RAW-to-sRGB mapping into detail reconstruction in grayscale space and color consistency mapping from grayscale to sRGB. |
Yang Ren; Hai Jiang; Menglong Yang; Wei Li; Shuaicheng Liu; |
433 | ConSense: Continually Sensing Human Activity with WiFi Via Growing and Picking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose ConSense, a lightweight and fast-adapted exemplar-free class incremental learning framework for WiFi-based HAR. |
Rong Li; Tao Deng; Siwei Feng; Mingjie Sun; Juncheng Jia; |
434 | OTLRM: Orthogonal Learning-based Low-Rank Metric for Multi-Dimensional Inverse Problems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it’s quite complicated to introduce SVT into deep neural network due to the numerical instability problem in solving the derivatives of the eigenvectors. In this paper, we introduce a novel data-driven generative low-rank t-SVD model based on the learnable orthogonal transform, which can be naturally solved under its representation. |
Xiangming Wang; Haijin Zeng; Jiaoyang Chen; Sheng Liu; Yongyong Chen; Guoqing Chao; |
435 | The Partially Observable Off-Switch Game Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, in many of the settings where the shutdown problem is most concerning, AIs might have vast amounts of private information. To capture these differences in knowledge, we introduce the Partially Observable Off-Switch Game (POSG), a game-theoretic model of the shutdown problem with asymmetric information. |
Andrew Garber; Rohan Subramani; Linus Luu; Mark Bedaywi; Stuart Russell; Scott Emmons; |
436 | BSDB-Net: Band-Split Dual-Branch Network with Selective State Spaces Mechanism for Monaural Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, to further improve the performance of SE, many modules are stacked onto SE, resulting in increased model complexity that limits the application of SE. To address these problems, we propose a dual-path network based on compressed frequency using Mamba. |
Cunhang Fan; Enrui Liu; Andong Li; Jianhua Tao; Jian Zhou; Jiahao Li; Chengshi Zheng; Zhao Lv; |
437 | JAQ: Joint Efficient Architecture Design and Low-Bit Quantization with Hardware-Software Co-Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the JAQ Framework, which jointly optimizes the three critical dimensions. |
Mingzi Wang; Yuan Meng; Chen Tang; Weixiang Zhang; Yijian Qin; Yang Yao; Yingxin Li; Tongtong Feng; Xin Wang; Xun Guan; Zhi Wang; Wenwu Zhu; |
438 | Pilot: Building The Federated Multimodal Instruction Tuning Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore a novel federated multimodal instruction tuning task (FedMIT), which is significant for collaboratively fine-tuning MLLMs on different types of multimodal instruction data on distributed devices. |
Baochen Xiong; Xiaoshan Yang; Yaguang Song; Yaowei Wang; Changsheng Xu; |
439 | Multimodal Promptable Token Merging for Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current methods often rely on token-to-token distances or similarity metrics to evaluate token importance, which is inadequate in the context of modern promptable designs and frameworks that are gaining prominence. To address this limitation, we introduce a novel and effective merging strategy called “Multimodal Promptable Token Merging” (MPTM). |
Cheng-Yao Hong; Tyng-Luh Liu; |
440 | Adaptive Sampling to Reduce Epistemic Uncertainty Using Prediction Interval-Generation Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents an adaptive sampling approach designed to reduce epistemic uncertainty in predictive models. |
Giorgio Morales; John W. Sheppard; |
441 | Identity-Text Video Corpus Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the existing VCG setting primarily focuses on matching textual descriptions with videos and ignores the distinct visual identities in the videos, thus resulting in inaccurate understanding of video content and deteriorated retrieval performance. To address this limitation, we introduce a novel task, Identity-Text Video Corpus Grounding (ITVCG), which simultaneously utilizes textual descriptions and visual identities as queries. |
Bin Huang; Xin Wang; Hong Chen; Houlun Chen; Yaofei Wu; Wenwu Zhu; |
442 | SymmCompletion: High-Fidelity and High-Consistency Point Cloud Completion with Symmetry Guidance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although existing methods can form satisfactory point clouds in global completeness, they often lose the original geometry details and face the problem of geometric inconsistency between existing point clouds and reconstructed missing parts. To tackle this problem, we introduce SymmCompletion, a highly effective completion method based on symmetry guidance. |
Hongyu Yan; Zijun Li; Kunming Luo; Li Lu; Ping Tan; |
443 | Matching While Perceiving: Enhance Image Feature Matching with Applicable Semantic Amalgamation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recognizing that human eyes consider not only similar local geometric features but also high-level semantic information of scene objects when matching images, this paper introduces SemaGlue. |
Shihua Zhang; Zhenjie Zhu; Zizhuo Li; Tao Lu; Jiayi Ma; |
444 | SceneX: Procedural Controllable Large-Scale Scene Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a large-scale scene generation framework, SceneX, which can automatically produce high-quality procedural models according to designers’ textual descriptions. |
Mengqi Zhou; Yuxi Wang; Jun Hou; Shougao Zhang; Yiwei Li; Chuanchen Luo; Junran Peng; Zhaoxiang Zhang; |
445 | BrainMAP: Learning Multiple Activation Pathways in Brain Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As such, conventional GNNs struggle to learn from these pathways due to the long-range dependencies of multiple pathways. To address these challenges, we introduce a novel framework BrainMAP to learn multiple pathways in brain networks. |
Song Wang; Zhenyu Lei; Zhen Tan; Jiaqi Ding; Xinyu Zhao; Yushun Dong; Guorong Wu; Tianlong Chen; Chen Chen; Aiying Zhang; Jundong Li; |
446 | SemStereo: Semantic-Constrained Stereo Matching Network for Remote Sensing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the connections between the two tasks and propose a new network that imposes semantic constraints on the stereo matching task, both implicitly and explicitly. |
Chen Chen; Liangjin Zhao; Yuanchun He; Yingxuan Long; Kaiqiang Chen; Zhirui Wang; Yanfeng Hu; Xian Sun; |
447 | Structural Entropy Guided Probabilistic Coding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel structural entropy-guided probabilistic coding model, named SEPC. |
Xiang Huang; Hao Peng; Li Sun; Hui Lin; Chunyang Liu; Jiang Cao; Philip S. Yu; |
448 | Pseudo Informative Episode Construction for Few-Shot Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods only select part of the base classes to construct the pseudo novel classes in the feature space of the base classes, which cannot mimic the real novel classes of the testing scenario. To deal with this problem, we propose a new Pseudo Informative Episode Construction (PIEC) framework. |
Chaofan Chen; Xiaoshan Yang; Changsheng Xu; |
449 | DeMo: Deep Motion Field Consensus with Learnable Kernels for Two-view Correspondence Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose DeMo, a novel and cutting-edge network for outlier rejection, which possesses the capacity to fully capture global motion consensus clues by way of consensus interpolation over the entire high-dimensional motion field generated by putative correspondences. |
Yifan Lu; Jiajun Le; Zizhuo Li; Yixuan Yuan; Jiayi Ma; |
450 | GMAP: Generalized Manipulation of Articulated Objects in Robotic Using Pre-trained Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although recent research has emphasized understanding articulated shapes and affordance proposals, existing methods only address isolated aspects, failing to develop comprehensive strategies for robotic perception and manipulation of articulated objects. To bridge this gap, we propose GMAP, which systematically integrates the entire process from command to perception and manipulation. |
Hongliang Zeng; Ping Zhang; Fang Li; QinPeng Yi; Tingyu Ye; Jiahua Wang; |
451 | Distances Between Top-Truncated Elections of Different Sizes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We extend it to the case of elections of different sizes, where the votes can be top-truncated. |
Piotr Faliszewski; Jitka Mertlová; Pierre Nunn; Stanisław Szufa; Tomasz Wąs; |
452 | MFL-Owner: Ownership Protection for Multi-modal Federated Learning Via Orthogonal Transform Watermark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enhance the practicality of the watermark and prevent conflicts among multiple clients during tracing, we propose a trigger dataset selection method based on out-of-distribution data combined with Gaussian noise perturbation. |
Keke Gai; Dongjue Wang; Jing Yu; Mohan Wang; Liehuang Zhu; Qi Wu; |
453 | IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, most methods have weak abilities to manage relationships between primary words and their contexts, causing confusion and reduced accuracy in identifying the correct target region. To address these challenges, we propose IteRPrimE (Iterative Grad-CAM Refinement and Primary word Emphasis), which leverages a saliency heatmap through Grad-CAM from a Vision-Language Pre-trained (VLP) model for image-text matching. |
Yuji Wang; Jingchen Ni; Yong Liu; Chun Yuan; Yansong Tang; |
454 | FlowMamba: Learning Point Cloud Scene Flow with Global Motion Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel global-aware scene flow estimation network with global motion propagation, named FlowMamba. |
Min Lin; Gangwei Xu; Yun Wang; Xianqi Wang; Xin Yang; |
455 | SongEditor: Adapting Zero-Shot Song Generation Language Model As A Multi-Task Editor Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present SongEditor, the first song editing paradigm that introduces the editing capabilities into language-modeling song generation approaches, facilitating both segment-wise and track-wise modifications. |
Chenyu Yang; Shuai Wang; Hangting Chen; Jianwei Yu; Wei Tan; Rongzhi Gu; Yaoxun Xu; Yizhi Zhou; Haina Zhu; Haizhou Li; |
456 | Exploring More from Multiple Gait Modalities for Human Identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: From the perspectives of fine vs. coarse-grained shape and whole vs. pixel-wise motion modeling, this work presents an in-depth investigation of three popular gait representations, i.e., silhouette, human parsing, and optical flow, with various fusion evaluations, and experimentally exposes their similarities and differences. |
Dongyang Jin; Chao Fan; Weihua Chen; Shiqi Yu; |
457 | Dual Conditioned Motion Diffusion for Pose-Based Video Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We address pose-based video anomaly detection and introduce a novel framework called Dual Conditioned Motion Diffusion (DCMD), which enjoys the advantages of both approaches. |
Hongsong Wang; Andi Xu; Pinle Ding; Jie Gui; |
458 | DualCP: Rehearsal-Free Domain-Incremental Learning Via Dual-Level Concept Prototype Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To construct DualCP, we propose a Concept Prototype Generator (CPG) that generates both coarse-grained and fine-grained prototypes for each class. |
Qiang Wang; Yuhang He; Songlin Dong; Xiang Song; Jizhou Han; Haoyu Luo; Yihong Gong; |
459 | Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through theoretical analysis and empirical evaluation, we demonstrate that incorporating watermarks into LLMs significantly reduces the likelihood of generating copyrighted content, thereby addressing a critical concern in the deployment of LLMs. |
Michael-Andrei Panaitescu-Liess; Zora Che; Bang An; Yuancheng Xu; Pankayaraj Pathmanathan; Souradip Chakraborty; Sicheng Zhu; Tom Goldstein; Furong Huang; |
460 | Mixture of Knowledge Minigraph Agents for Literature Review Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel framework, collaborative knowledge minigraph agents (CKMAs), to automate scholarly literature reviews. |
Zhi Zhang; Yan Liu; Sheng-hua Zhong; Gong Chen; Yu Yang; Jiannong Cao; |
461 | Learning Structured World Models From and For Physical Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The core idea behind my research is to introduce novel representations and integrate structural priors into learning systems to model dynamics at different levels of abstraction. |
Yunzhu Li; |
462 | Mixed-Curvature Multi-Modal Knowledge Graph Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing multi-modal KGC methods mainly focus on modality-level fusion, neglecting the importance of modeling the complex structures, such as hierarchical and circular patterns. To address this, we propose a Mixed-Curvature multi-modal Knowledge Graph Completion method (MCKGC) that embeds the information into three single-curvature spaces, including hyperbolic space, hyperspherical space, and Euclidean space, and incorporates multi-modal information into a mixed space. |
Yuxiao Gao; Fuwei Zhang; Zhao Zhang; Xiaoshuang Min; Fuzhen Zhuang; |
463 | CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite these successes, current benchmarks still follow a traditional paradigm with multi-modal input and text-modal output, which leads to significant drawbacks such as missing visual operations and vague expressions. Motivated by this, we introduce a novel Chain of Multi-modal Thought (CoMT) benchmark to address these limitations. |
Zihui Cheng; Qiguang Chen; Jin Zhang; Hao Fei; Xiaocheng Feng; Wanxiang Che; Min Li; Libo Qin; |
464 | SpotActor: Training-Free Layout-Controlled Consistent Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For these issues, we pioneer a novel task, Layout-to-Consistent-Image (L2CI) generation, which produces consistent and compositional images in accordance with the given layout conditions and text prompts. To accomplish this challenging task, we present a new formalization of dual energy guidance with optimization in a dual semantic-latent space and thus propose a training-free pipeline, SpotActor, which features a layout-conditioned optimizing stage and a consistent sampling stage. |
Jiahao Wang; Caixia Yan; Weizhan Zhang; Haonan Lin; Mengmeng Wang; Guang Dai; Tieliang Gong; Hao Sun; Jingdong Wang; |
465 | DFDNet: Disentangling and Filtering Dynamics for Enhanced Video Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle those problems, this paper proposes the Disentangling and Filtering Dynamics Network (DFDNet). |
Lianqiang Gan; Junyu Lai; Jingze Ju; Lianli Gao; Yi Bin; |
466 | Federated Foundation Models on Heterogeneous Time Series Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, due to significant statistical heterogeneity across domains, this cross-domain fusing approach does not work as effectively as it does for fusing texts and images. To tackle this challenge, this paper proposes a novel federated learning approach, namely FFTS, to address the heterogeneity in training time series foundation models. |
Shengchao Chen; Guodong Long; Jing Jiang; Chengqi Zhang; |
467 | Federated Graph Anomaly Detection Through Contrastive Learning with Global Negative Pairs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One possible reason for this issue is that, when graph data are distributed across multiple clients, federated graph learning may struggle to fully exploit the potential of the dispersed data, leading to suboptimal performance. Building on this insight, we propose FedCLGN, a federated graph anomaly detection framework that leverages contrastive self-supervised learning. |
Nannan Wu; Yazheng Zhao; Hongdou Dong; Keao Xi; Wei Yu; Wenjun Wang; |
468 | Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, guided by the shallow-to-deep principle, we propose a query-centric audio-visual cognition (QUAG) network to construct a reliable multi-modal representation for moment retrieval, segmentation and step-captioning. |
Yunbin Tu; Liang Li; Li Su; Qingming Huang; |
469 | Capability Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore how to route the best-performing LLM for each instruction to achieve better overall performance. |
Yi-Kai Zhang; De-Chuan Zhan; Han-Jia Ye; |
470 | SGTC: Semantic-Guided Triplet Co-training for Sparsely Annotated Semi-Supervised Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Even worse, most of the existing approaches pay much attention to image-level information and ignore semantic features, resulting in the inability to perceive weak boundaries. To address these issues, we propose a novel Semantic-Guided Triplet Co-training (SGTC) framework, which achieves high-end medical image segmentation by only annotating three orthogonal slices of a few volumetric samples, significantly alleviating the burden of radiologists. |
Ke Yan; Qing Cai; Fan Zhang; Ziyan Cao; Zhi Liu; |
471 | CODE: Confident Ordinary Differential Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Confident Ordinary Differential Editing (CODE), a novel approach for image synthesis that effectively handles OoD guidance images. |
Bastien Van Delft; Tommaso Martorella; Alexandre Alahi; |
472 | ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, they often suffer from limitations when generating images with resolutions outside of their trained domain. To overcome this limitation, we present the resolution adapter (ResAdapter), a domain-consistent adapter designed for diffusion models to generate images with unrestricted resolutions and aspect ratios. |
Jiaxiang Cheng; Pan Xie; Xin Xia; Jiashi Li; Jie Wu; Yuxi Ren; Huixia Li; Xuefeng Xiao; Shilei Wen; Lean Fu; |
473 | LEGEND: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we take the first step to propose an effective and cost-efficient framework to promote the margin-enhanced preference dataset development. |
Duanyu Feng; Bowen Qin; Chen Huang; Youcheng Huang; Zheng Zhang; Wenqiang Lei; |
474 | Unleashing The Potential of Model Bias for Generalized Category Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The primary challenges stem from model bias induced by pre-training on only known categories and the lack of precise supervision for novel ones, leading to category bias towards known categories and category confusion among different novel categories, which hinders models’ ability to identify novel categories effectively. To address these challenges, we propose a novel framework named Self-Debiasing Calibration (SDC). |
Wenbin An; Haonan Lin; Jiahao Nie; Feng Tian; Wenkai Shi; Yaqiang Wu; Qianying Wang; Ping Chen; |
475 | Sum of Squares Circuits Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recently, squared PCs encoding subtractive mixtures via negative parameters have emerged as tractable models that can be exponentially more expressive than monotonic PCs, i.e., PCs with positive parameters only. In this paper we provide a more precise theoretical characterization of the expressiveness relationships among these models. |
Lorenzo Loconte; Stefan Mengel; Antonio Vergari; |
476 | SMamba: Sparse Mamba for Event-based Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To achieve better trade-off between accuracy and efficiency, we propose Sparse Mamba (SMamba), which performs adaptive sparsification to reduce computational effort while maintaining global modeling capability. |
Nan Yang; Yang Wang; Zhanwen Liu; Meng Li; Yisheng An; Xiangmo Zhao; |
477 | TTA-FedDG: Leveraging Test-Time Adaptation to Address Federated Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new framework TTA-FedDG to address the FedDG problem, which leverages test-time adaptation (TTA) to adapt across different domains, thereby enhancing the generalization of the model. |
Haoyuan Liang; Xinyu Zhang; Shilei Cao; Guowen Li; Juepeng Zheng; |
478 | BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the design of this model faces the challenge of incompatible requirements for feature fusion and alignment. To address this challenge, this paper proposes an unaligned medical image fusion method called Bidirectional Stepwise Feature Alignment and Fusion (BSFA-F) strategy. |
Huafeng Li; Dayong Su; Qing Cai; Yafei Zhang; |
479 | Federated Unlearning with Gradient Descent and Conflict Mitigation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, when conducting post-training to recover the model utility, the model is prone to move back and revert what has already been unlearned. To address these issues, we propose Federated Unlearning with Orthogonal Steepest Descent (FedOSD). |
Zibin Pan; Zhichao Wang; Chi Li; Kaiyan Zheng; Boqi Wang; Xiaoying Tang; Junhua Zhao; |
480 | Tuning-Free Accountable Intervention for LLM Deployment – A Metacognitive Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Drawing inspiration from human cognition, we propose an innovative metacognitive approach CLEAR, to equip LLMs with capabilities for self-aware error identification and correction. |
Zhen Tan; Jie Peng; Song Wang; Lijie Hu; Tianlong Chen; Huan Liu; |
481 | Open-World Multimodal Understanding and Generation with Efficiently Finetuned Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this talk, I will answer two questions: Q1: How can we efficiently train or fine-tune foundation models? |
Long Chen; |
482 | First-Order Federated Bilevel Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a computationally and memory-efficient FBO algorithm named MemFBO. |
Yifan Yang; Peiyao Xiao; Shiqian Ma; Kaiyi Ji; |
483 | Dual-View Interaction-Aware Lane Change Prediction for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, current interaction-aware approaches for autonomous driving fail to explicitly model future interactions between vehicles, leading to unreasonable prediction results that can cause collisions between vehicles. To address the above issues, we propose to incorporate the concept of perceived safety into future interaction modeling and design a dual-view interaction-aware lane change prediction model. |
Yuhuan Lu; Zhen Zhang; Rufan Bai; Han Liu; Wei Wang; |
484 | Aligning and Prompting Anything for Zero-Shot Generalized Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This limits CLIP's ability to align textual features with pixel-level visual features and impairs anomaly segmentation performance. Therefore, for precise visual-text alignment, in this paper we propose a novel fine-grained text prompt generation strategy. |
Jitao Ma; Weiying Xie; Hangyu Ye; Daixun Li; Leyuan Fang; |
485 | Multi-Branch Self-Drafting for LLM Inference Acceleration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose an innovative draft generation and maintenance approach that leverages the capabilities of LLM itself. |
Zipeng Gao; Qingrong Xia; Tong Xu; Xinyu Duan; Zhi Zheng; Zhefeng Wang; Enhong Chen; |
486 | Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Applied to MiniCPM-Llama3-V 2.5, a state-of-the-art MLLM, our tuning-free approach achieves performance comparable to tuning-based methods, with notable success in text localization. |
Hai-Ming Xu; Qi Chen; Lei Wang; Lingqiao Liu; |
487 | Scalable Acceleration for Classification-Based Derivative-Free Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the framework of sequential classification-based derivative-free optimization algorithms. |
Tianyi Han; Jingya Li; Zhipeng Guo; Yuan Jin; |
488 | Towards Efficient and Intelligent Laser Weeding: Method and Dataset for Weed Stem Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We integrate the detection of crop and weed with the localization of weed stem into one end-to-end system. |
Dingning Liu; Jinzhe Li; Haoyang Su; Bei Cui; Zhihui Wang; Qingbo Yuan; Wanli Ouyang; Nanqing Dong; |
489 | MPQ-DM: Mixed Precision Quantization for Extremely Low Bit Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents MPQ-DM, a Mixed-Precision Quantization method for Diffusion Models. |
Weilun Feng; Haotong Qin; Chuanguang Yang; Zhulin An; Libo Huang; Boyu Diao; Fei Wang; Renshuai Tao; Yongjun Xu; Michele Magno; |
490 | FedPop: Federated Population-based Hyperparameter Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While some approaches have been proposed for HP-Tuning in FL, they are limited to the HPs for client local updates. In this work, we propose a novel HP-tuning algorithm, called Federated Population-based Hyperparameter Tuning (FedPop), to address this vital yet challenging problem. |
Haokun Chen; Denis Krompaß; Jindong Gu; Volker Tresp; |
491 | Balanced Adaptive Subspace Collaboration for Mixed Pareto-Lexicographic Multi-Objective Problems with Priority Levels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conversely, strictly adhering to Priority Levels (PLs) during optimization can easily result in premature convergence within some PLs. To address this issue, we suggest an effective Balanced Adaptive Subspace Collaboration (BASC) method in this paper. |
Wenjing Hong; |
492 | Better Understandings and Configurations in MaxSAT Stochastic Local Search Solvers Via Anytime Performance Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper demonstrates that Empirical Cumulative Distribution Functions can be used to compare MaxSAT stochastic local search solvers’ anytime performance across multiple problem instances and various time budgets. |
Furong Ye; Chuan Luo; Shaowei Cai; |
493 | ScholarGEC: Enhancing Controllability of Large Language Model for Chinese Academic Grammatical Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Considering the two aforementioned factors, we propose a new error correction framework for Chinese academic GEC tasks using LLMs, named ScholarGEC. |
Zixiao Kong; Xianquan Wang; Shuanghong Shen; Keyu Zhu; Huibo Xu; Yu Su; |
494 | Braess’s Paradox of Generative AI Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we present an analog to Braess’s paradox in which all users would be better off without GenAI. |
Boaz Taitler; Omer Ben-Porat; |
495 | Mixture of Online and Offline Experts for Non-Stationary Time Series Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose the Mixture of Online and Offline Experts (MOOE). |
Zhilin Zhao; Longbing Cao; Yuanyu Wan; |
496 | SyncNoise: Geometrically Consistent Noise Prediction for Instruction-based 3D Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SyncNoise, a novel geometry-guided multi-view consistent noise editing approach for high-fidelity 3D scene editing. |
Ruihuang Li; Liyi Chen; Zhengqiang Zhang; Varun Jampani; Vishal M. Patel; Lei Zhang; |
497 | CSformer: Combining Channel Independence and Mixing for Robust Multivariate Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this challenge, we propose a strategy of channel independence followed by mixing. Based on this strategy, we introduce CSformer, a novel framework featuring a two-stage multiheaded self-attention mechanism. |
Haoxin Wang; Yipeng Mo; Kunlan Xiang; Nan Yin; Honghe Dai; Bixiong Li; Songhai Fan; Site Mo; |
498 | When Open-Vocabulary Visual Question Answering Meets Causal Adapter: Benchmark and Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing VQA benchmarks predominantly adhere to a closed-set paradigm, limiting their ability to address arbitrary, unseen answers, and thus falling short in real-world scenarios. To address this limitation, we introduce the Open-Vocabulary Visual Question Answering (OVVQA) benchmark, specifically designed to evaluate models under open-world conditions by assessing their performance on both base classes (seen, common answers) and novel classes (unseen, rare answers). |
Feifei Zhang; Zhaoyi Zhang; Xi Zhang; Changsheng Xu; |
499 | Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods simply leverage the output from the last layer of GNN for anomaly estimation while neglecting the essential information contained in the intermediate GNN layers. To address such limitations, in this paper, we propose a Graph Mixture of Experts (Graph-MoE) network for multivariate time series anomaly detection, which incorporates the mixture of experts (MoE) module to adaptively represent and integrate hierarchical multi-layer graph information into entity representations. |
Xiaoyu Huang; Weidong Chen; Bo Hu; Zhendong Mao; |
500 | Discrete Curvature Graph Information Bottleneck Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: From both graph geometry and information theory perspectives, we propose the novel Discrete Curvature Graph Information Bottleneck (CurvGIB) framework to optimize the information transport structure and learn better node representations simultaneously. |
Xingcheng Fu; Jian Wang; Yisen Gao; Qingyun Sun; Haonan Yuan; Jianxin Li; Xianxian Li; |
This table only includes 500 papers selected based on paper id in proceedings. To continue with the full list (~3,300 papers), please visit Paper Digest: AAAI-2025 (Full List).