Paper Digest: ICML 2025 Papers & Highlights
Note: ICML-2025 accepted more than 3,300 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested users can read All 3,300 ICML-2025 papers on a separate page.
To search for papers presented at ICML-2025 on a specific topic, please make use of the search by venue (ICML-2025) service. To summarize the latest research published at ICML-2025 on a specific topic, you can utilize the review by venue (ICML-2025) service. If you are interested in browsing papers by author, we have a comprehensive list of ~13,000 authors (ICML-2025). Additionally, you may want to explore our “Best Paper” Digest (ICML), which lists the most influential ICML papers since 2004.
We’ve developed a service, ICML-2025 Research, that synthesizes the latest findings from ICML 2025 into comprehensive reports. For instance, we’ve generated a report on Advances in Flow Matching: Insights from ICML 2025 Papers. We encourage interested users to utilize our service to create tailored reports on other emerging topics.
This curated list is created by the Paper Digest Team. Experience the cutting-edge capabilities of Paper Digest, an innovative AI-powered research platform that delivers personalized and comprehensive updates on the latest research in your field. It also empowers you to read articles, write articles, get answers, conduct literature reviews and generate research reports.
Experience the full potential of our services today!
TABLE 1: Paper Digest: ICML 2025 Papers & Highlights
# | Paper | Author(s) |
---|---|---|
1 | RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning. Highlight: We propose an end-to-end reinforcement learning method for teaching models to leverage execution feedback in the realm of code synthesis, where state-of-the-art LLMs struggle to improve code iteratively compared to independent sampling. | Jonas Gehring; Kunhao Zheng; Jade Copet; Vegard Mella; Taco Cohen; Gabriel Synnaeve; |
2 | Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping. Highlight: **Our insight is that in addition to systems optimization, one can also redesign the model architecture to decouple communication from computation.** While Ladder Residual can allow communication-computation decoupling in conventional parallelism patterns, we focus on Tensor Parallelism in this paper, which is particularly bottlenecked by its heavy communication. | Muru Zhang; Mayank Mishra; Zhongzhu Zhou; William Brandon; Jue WANG; Yoon Kim; Jonathan Ragan-Kelley; Shuaiwen Leon Song; Ben Athiwaratkun; Tri Dao; |
3 | Understanding The Skill Gap in Recurrent Language Models: The Role of The Gather-and-Aggregate Mechanism. Highlight: In this work, we examine how in-context retrieval operates in Transformer- and SSM-based language models and find that both rely on a Gather-and-Aggregate (G&A) mechanism: a Gather Head extracts relevant information from context, which an Aggregate Head integrates into a representation. | Aviv Bick; Eric P. Xing; Albert Gu; |
4 | Understanding and Improving Length Generalization in Recurrent Models. Highlight: Thanks to their recurrent nature, in principle they can process arbitrarily long sequences, but their performance sometimes drops considerably beyond their training context lengths—i.e. they fail to length generalize. In this work, we provide comprehensive empirical and theoretical analysis to support the *unexplored states hypothesis*, which posits that models fail to length generalize when during training they are only exposed to a limited subset of the distribution of all *attainable* states (i.e. states that would be attained if the recurrence was applied to long sequences). | Ricardo Buitrago; Albert Gu; |
5 | LLMs Can See and Hear Without Any Training. Highlight: We present MILS: Multimodal Iterative LLM Solver, a surprisingly simple, training-free approach, to imbue multimodal capabilities into your favorite LLM. | Kumar Ashutosh; Yossi Gandelsman; Xinlei Chen; Ishan Misra; Rohit Girdhar; |
6 | VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models. Highlight: During inference, we introduce **Inner-Guidance**, a mechanism that steers the generation toward coherent motion by leveraging the model’s own evolving motion prediction as a dynamic guidance signal. | Hila Chefer; Uriel Singer; Amit Zohar; Yuval Kirstain; Adam Polyak; Yaniv Taigman; Lior Wolf; Shelly Sheynin; |
7 | Stay-Positive: A Case for Ignoring Real Image Features in Fake Image Detection. Highlight: We argue that an image should be classified as fake if and only if it contains artifacts introduced by the generative model. Based on this premise, we propose Stay-Positive, an algorithm designed to constrain the detector’s focus to generative artifacts while disregarding those associated with real data. | Anirudh Sundara Rajan; Yong Jae Lee; |
8 | SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models. Highlight: We introduce SelfCite, a novel self-supervised approach that aligns LLMs to generate high-quality, fine-grained, sentence-level citations for the statements in their generated responses. | Yung-Sung Chuang; Benjamin Cohen-Wang; Zejiang Shen; Zhaofeng Wu; Hu Xu; Xi Victoria Lin; James R. Glass; Shang-Wen Li; Wen-tau Yih; |
9 | ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation. Highlight: This lack of context-awareness can lead to suboptimal performance, as the same action may hold different meanings depending on its surrounding context. To address this issue, we propose ActionPiece to explicitly incorporate context when tokenizing action sequences. | Yupeng Hou; Jianmo Ni; Zhankui He; Noveen Sachdeva; Wang-Cheng Kang; Ed H. Chi; Julian McAuley; Derek Zhiyuan Cheng; |
10 | From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline. Highlight: However, manual curation of high-quality, human-aligned benchmarks is expensive and time-consuming. To address this, we introduce BenchBuilder, an automated pipeline that leverages LLMs to curate high-quality, open-ended prompts from large, crowd-sourced datasets, enabling continuous benchmark updates without a human in the loop. | Tianle Li; Wei-Lin Chiang; Evan Frick; Lisa Dunlap; Tianhao Wu; Banghua Zhu; Joseph E. Gonzalez; Ion Stoica; |
11 | Prompt-to-Leaderboard: Prompt-Adaptive LLM Evaluations. Highlight: This averaging obscures user- and prompt-specific variations in model performance. To address this, we propose Prompt-to-Leaderboard (P2L), a method that produces leaderboards specific to a prompt or set of prompts. | Evan Frick; Connor Chen; Joseph Tennyson; Tianle Li; Wei-Lin Chiang; Anastasios Nikolas Angelopoulos; Ion Stoica; |
12 | HashAttention: Semantic Sparsity for Faster Inference. Highlight: This paper introduces HashAttention, framing pivotal token identification as a recommendation problem. | Aditya Desai; Shuo Yang; Alejandro Cuadron; Matei Zaharia; Joseph E. Gonzalez; Ion Stoica; |
13 | From Thousands to Billions: 3D Visual Language Grounding Via Render-Supervised Distillation from 2D VLMs. Highlight: 3D vision-language grounding faces a fundamental data bottleneck: while 2D models train on billions of images, 3D models have access to only thousands of labeled scenes, a six-order-of-magnitude gap that severely limits performance. We introduce ***LIFT-GS***, a practical distillation technique that overcomes this limitation by using differentiable rendering to bridge 3D and 2D supervision. | Ang Cao; Sergio Arnaud; Oleksandr Maksymets; Jianing Yang; Ayush Jain; Ada Martin; Vincent-Pierre Berges; Paul McVay; Ruslan Partsey; Aravind Rajeswaran; Franziska Meier; Justin Johnson; Jeong Joon Park; Alexander Sax; |
14 | AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses. Highlight: We introduce AutoAdvExBench, a benchmark to evaluate if large language models (LLMs) can autonomously exploit defenses to adversarial examples. | Nicholas Carlini; Edoardo Debenedetti; Javier Rando; Milad Nasr; Florian Tramèr; |
15 | MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking. Highlight: We propose a training method which avoids agents learning undesired multi-step plans that receive high reward (multi-step reward hacks) even if humans are not able to detect that the behavior is undesired. | Sebastian Farquhar; Vikrant Varma; David Lindner; David Elson; Caleb Biddulph; Ian Goodfellow; Rohin Shah; |
16 | Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation. Highlight: We introduce FlatVI, a novel training framework that regularises the latent manifold of discrete-likelihood variational autoencoders towards Euclidean geometry, specifically tailored for modelling single-cell count data. | Alessandro Palma; Sergei Rybakov; Leon Hetzel; Stephan Günnemann; Fabian J Theis; |
17 | Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models. Highlight: In this work, we describe a system that uses vision-language models in a hierarchical structure, first reasoning over complex prompts and user feedback to deduce the most appropriate next step to fulfill the task, and then performing that step with low-level actions. | Lucy Xiaoyang Shi; brian ichter; Michael Robert Equi; Liyiming Ke; Karl Pertsch; Quan Vuong; James Tanner; Anna Walling; Haohuan Wang; Niccolo Fusai; Adrian Li-Bell; Danny Driess; Lachy Groom; Sergey Levine; Chelsea Finn; |
18 | PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models By Watching Stuff Drop. Highlight: This work studies the process of post-training these models for accurate world modeling through the lens of the simple, yet fundamental, physics task of modeling object freefall. | Chenyu Li; Oscar Michel; Xichen Pan; Sainan Liu; Mike Roberts; Saining Xie; |
19 | Latent Diffusion Planning for Imitation Learning. Highlight: However, these methods often rely on learning from large amounts of expert demonstrations. To address these shortcomings, we propose Latent Diffusion Planning (LDP), a modular approach consisting of a planner which can leverage action-free demonstrations, and an inverse dynamics model which can leverage suboptimal data, that both operate over a learned latent space. | Amber Xie; Oleh Rybkin; Dorsa Sadigh; Chelsea Finn; |
20 | SCISSOR: Mitigating Semantic Bias Through Cluster-Aware Siamese Networks for Robust Classification. Highlight: While the literature attributes shortcuts to biases in superficial features, we show that imbalances in the semantic distribution of sample embeddings induce spurious semantic correlations, compromising model robustness. To address this issue, we propose SCISSOR (Semantic Cluster Intervention for Suppressing ShORtcut), a Siamese network-based debiasing approach that remaps the semantic space by discouraging latent clusters exploited as shortcuts. | Shuo Yang; Bardh Prenkaj; Gjergji Kasneci; |
21 | Graph Inverse Style Transfer for Counterfactual Explainability. Highlight: Unlike prior approaches that rely on forward perturbation mechanisms, we introduce Graph Inverse Style Transfer (GIST), the first framework to re-imagine graph counterfactual generation as a backtracking process, leveraging spectral style transfer. | Bardh Prenkaj; Efstratios Zaradoukas; Gjergji Kasneci; |
22 | The Berkeley Function Calling Leaderboard (BFCL): From Tool Use to Agentic Evaluation of Large Language Models. Highlight: We present the Berkeley Function Calling Leaderboard (BFCL), a comprehensive benchmark designed to evaluate function calling capabilities in a wide range of real-world settings. We construct the benchmark using a combination of expert-curated and user-contributed functions and associated prompts. | Shishir G Patil; Huanzhi Mao; Fanjia Yan; Charlie Cheng-Jie Ji; Vishnu Suresh; Ion Stoica; Joseph E. Gonzalez; |
23 | Independence Tests for Language Models. Highlight: In the constrained setting, we make assumptions about model architecture and training and propose statistical tests that yield exact p-values with respect to the null hypothesis that the models are trained from independent random initializations. | Sally Zhu; Ahmed M Ahmed; Rohith Kuditipudi; Percy Liang; |
24 | Auditing Prompt Caching in Language Model APIs. Highlight: To this end, we develop and conduct statistical audits to detect prompt caching in real-world LLM API providers. (A minimal sketch of such a timing-based audit appears after this table.) | Chenchen Gu; Xiang Lisa Li; Rohith Kuditipudi; Percy Liang; Tatsunori Hashimoto; |
25 | Grokking in The Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers. Highlight: Recent advances in grokking have demonstrated that neural networks can transition from memorizing to perfectly generalizing once they detect underlying logical patterns — yet these studies have primarily used small, synthetic tasks. In this paper, for the first time, we extend grokking to real-world factual data and address the challenge of dataset sparsity by augmenting existing knowledge graphs with carefully designed synthetic data to raise the ratio $\phi_r$ of inferred facts to atomic facts above the threshold required for grokking. | Roman Abramov; Felix Steinbauer; Gjergji Kasneci; |
26 | When Bad Data Leads to Good Models. Highlight: In this paper, we re-examine the notion of quality from the perspective of pre- and post-training co-design. | Kenneth Li; Yida Chen; Fernanda Viégas; Martin Wattenberg; |
27 | Detecting Strategic Deception with Linear Probes. Highlight: Monitoring outputs alone is insufficient, since the AI might produce seemingly benign outputs while its internal reasoning is misaligned. We thus evaluate if linear probes can robustly detect deception by monitoring model activations. | Nicholas Goldowsky-Dill; Bilal Chughtai; Stefan Heimersheim; Marius Hobbhahn; |
28 | Self-Bootstrapping for Versatile Test-Time Adaptation. Highlight: In this paper, we seek to develop a versatile test-time adaptation (TTA) objective for a variety of tasks — classification and regression across image-, object-, and pixel-level predictions. | Shuaicheng Niu; Guohao Chen; Peilin Zhao; Tianyi Wang; Pengcheng Wu; Zhiqi Shen; |
29 | Compositional Scene Understanding Through Inverse Generative Modeling. Highlight: We formulate scene understanding as an inverse generative modeling problem, where we seek to find conditional parameters of a visual generative model to best fit a given natural image. | Yanbo Wang; Justin Dauwels; Yilun Du; |
30 | In-Context Fine-Tuning for Time-Series Foundation Models. Highlight: Motivated by the recent success of time-series foundation models for zero-shot forecasting, we present a methodology for _in-context fine-tuning_ of a time-series foundation model. | Matthew Faw; Rajat Sen; Yichen Zhou; Abhimanyu Das; |
31 | Communicating Activations Between Language Model Agents. Highlight: While natural language has been the dominant medium for inter-LM communication, it is not obvious this should be the standard: not only does natural language communication incur high inference costs that scale quickly with the number of both agents and messages, but also the decoding process abstracts away too much rich information that could be otherwise accessed from the internal activations. In this work, we propose a simple technique whereby LMs communicate via *activations*; concretely, we pause an LM $B$’s computation at an intermediate layer, combine its current activation with another LM $A$’s intermediate activation via some function $f$, then pass $f$’s output into the next layer of $B$ and continue the forward pass till decoding is complete. (A toy sketch of this activation-passing scheme appears after this table.) | Vignav Ramesh; Kenneth Li; |
32 | ZebraLogic: On The Scaling Limits of LLMs for Logical Reasoning. Highlight: We investigate the logical reasoning capabilities of Large Language Models (LLMs) and their scalability across complex deductive tasks. | Bill Yuchen Lin; Ronan Le Bras; Kyle Richardson; Ashish Sabharwal; Radha Poovendran; Peter Clark; Yejin Choi; |
33 | Organize The Web: Constructing Domains Enhances Pre-Training Data Curation. Highlight: In this paper, we unpack monolithic web corpora by developing taxonomies of their contents and organizing them into domains. Using these two complementary notions of domains, we automatically annotate pre-training data by distilling annotations from a large language model into efficient classifiers. | Alexander Wettig; Kyle Lo; Sewon Min; Hannaneh Hajishirzi; Danqi Chen; Luca Soldaini; |
34 | Tackling View-Dependent Semantics in 3D Language Gaussian Splatting. Highlight: However, most of them simply project 2D semantic features onto 3D Gaussians and overlook a fundamental gap between 2D and 3D understanding: a 3D object may exhibit various semantics from different viewpoints—a phenomenon we term **view-dependent semantics**. To address this challenge, we propose **LaGa** (**La**nguage **Ga**ussians), which establishes cross-view semantic connections by decomposing the 3D scene into objects. | Jiazhong Cen; Xudong Zhou; Jiemin Fang; Changsong Wen; Lingxi Xie; XIAOPENG ZHANG; Wei Shen; Qi Tian; |
35 | Diffusion Adversarial Post-Training for One-Step Video Generation. Highlight: In this work, we propose Adversarial Post-Training (APT) against real data following diffusion pre-training for one-step video generation. | Shanchuan Lin; Xin Xia; Yuxi Ren; Ceyuan Yang; Xuefeng Xiao; Lu Jiang; |
36 | Probing Visual Language Priors in VLMs. Highlight: Vision-Language Models (VLMs) may over-rely on visual language priors from their training data rather than true visual reasoning. To investigate this, we introduce ViLP, a benchmark featuring deliberately out-of-distribution images synthesized via image generation models and out-of-distribution Q&A pairs. | Tiange Luo; Ang Cao; Gunhee Lee; Justin Johnson; Honglak Lee; |
37 | Training A Generally Curious Agent. Highlight: In this paper, we present **Paprika**, a fine-tuning approach that enables language models to develop general decision-making capabilities that are not confined to particular environments. | Fahim Tajwar; Yiding Jiang; Abitha Thankaraj; Sumaita Sadia Rahman; J Zico Kolter; Jeff Schneider; Russ Salakhutdinov; |
38 | Supercharging Graph Transformers with Advective Diffusion. Highlight: For non-Euclidean data, e.g., graphs, that particularly involves topological structures, one important aspect neglected by prior studies is how machine learning models generalize under topological shifts. This paper proposes AdvDIFFormer, a physics-inspired graph Transformer model designed to address this challenge. | Qitian Wu; Chenxiao Yang; Kaipeng Zeng; Michael M. Bronstein; |
39 | Metadata Conditioning Accelerates Language Model Pre-training. Highlight: The vast diversity of styles, domains, and quality levels present in language model pre-training corpora is essential in developing general model capabilities, but efficiently learning and deploying the correct behaviors exemplified in each of these heterogeneous data sources is challenging. To address this, we propose a new method, termed Metadata Conditioning then Cooldown (MeCo), to incorporate additional learning cues during pre-training. | Tianyu Gao; Alexander Wettig; Luxi He; Yihe Dong; Sadhika Malladi; Danqi Chen; |
40 | Model Swarms: Collaborative Search to Adapt LLM Experts Via Swarm Intelligence. Highlight: We propose Model Swarms, a collaborative search algorithm to adapt LLMs via swarm intelligence, the collective behavior guiding individual systems. | Shangbin Feng; Zifeng Wang; Yike Wang; Sayna Ebrahimi; Hamid Palangi; Lesly Miculicich; Achin Kulshrestha; Nathalie Rauschmayr; Yejin Choi; Yulia Tsvetkov; Chen-Yu Lee; Tomas Pfister; |
41 | Predictive Data Selection: The Data That Predicts Is The Data That Teaches. Highlight: In this work, we aim to directly estimate the contribution of data during pretraining and select pretraining data in an efficient manner. | KaShun SHUM; Yuzhen Huang; Hongjian Zou; dingqi; YiXuan Liao; Xiaoxin Chen; Qian Liu; Junxian He; |
42 | High-Fidelity Simultaneous Speech-To-Speech Translation. Highlight: We introduce Hibiki, a decoder-only model for simultaneous speech translation. | Tom Labiausse; Laurent Mazaré; Edouard Grave; Alexandre Défossez; Neil Zeghidour; |
43 | CollabLLM: From Passive Responders to Active Collaborators. Highlight: As a result, they often respond passively to ambiguous or open-ended user requests, failing to help users reach their ultimate intents and leading to inefficient conversations. To address these limitations, we introduce CollabLLM, a novel and general training framework that enhances multiturn human-LLM collaboration. | Shirley Wu; Michel Galley; Baolin Peng; Hao Cheng; Gavin Li; Yao Dou; Weixin Cai; James Zou; Jure Leskovec; Jianfeng Gao; |
44 | Demystifying Long Chain-of-Thought Reasoning. Highlight: In this study, we systematically investigate the underlying *mechanics of long CoT reasoning*—examining the factors that enable models to generate extended reasoning trajectories. | Shiming Yang; Yuxuan Tong; Xinyao Niu; Graham Neubig; Xiang Yue; |
45 | CoMemo: LVLMs Need Image Context with Image Memory. Highlight: However, inherited LLM architectural designs introduce suboptimal characteristics for multimodal processing. First, LVLMs exhibit a bimodal distribution in attention allocation, leading to the progressive neglect of middle visual content as context expands. Second, conventional positional encoding schemes fail to preserve vital 2D structural relationships when processing dynamic high-resolution images. To address these limitations, we propose **CoMemo** – a dual-path architecture that combines a **Co**ntext image path with an image **Memo**ry path for visual processing, effectively alleviating visual information neglect. | Shi Liu; Weijie Su; Xizhou Zhu; Wenhai Wang; Jifeng Dai; |
46 | Hardware and Software Platform Inference. Highlight: That way, a client pays a premium for access to a capable model on more expensive hardware, yet ends up being served by a (potentially less capable) cheaper model on cheaper hardware. In this paper we introduce ***hardware and software platform inference (HSPI)*** — a method for identifying the underlying GPU architecture and software stack of a (black-box) machine learning model solely based on its input-output behavior. | Cheng Zhang; Hanna Foerster; Robert D. Mullins; Yiren Zhao; Ilia Shumailov; |
47 | Subobject-level Image Tokenization. Highlight: Inspired by subword tokenization, we introduce subobject-level adaptive token segmentation and explore several approaches, including superpixel, SAM, and a proposed Efficient and PanOptiC (EPOC) image tokenizer. | Delong Chen; Samuel Cahyawijaya; Jianfeng Liu; Baoyuan Wang; Pascale Fung; |
48 | Imagine While Reasoning in Space: Multimodal Visualization-of-Thought. Highlight: Nonetheless, human cognition extends beyond language alone, enabling the remarkable capability to think in both words and images. Inspired by this mechanism, we propose a new reasoning paradigm, Multimodal Visualization-of-Thought (MVoT). | Chengzu Li; Wenshan Wu; Huanyu Zhang; Yan Xia; Shaoguang Mao; Li Dong; Ivan Vulić; Furu Wei; |
49 | Improving LLM Safety Alignment with Dual-Objective Optimization. Highlight: Direct preference optimization (DPO), a widely deployed alignment method, exhibits limitations in both experimental and theoretical contexts as its loss function proves suboptimal for refusal learning. Through gradient-based analysis, we identify these shortcomings and propose an improved safety alignment that disentangles DPO objectives into two components: (1) robust refusal training, which encourages refusal even when partial unsafe generations are produced, and (2) targeted unlearning of harmful knowledge. | Xuandong Zhao; Will Cai; Tianneng Shi; David Huang; Licong Lin; Song Mei; Dawn Song; |
50 | Gaussian Mixture Flow Matching Models. Highlight: However, they underperform in few-step sampling due to discretization error and tend to produce over-saturated colors under classifier-free guidance (CFG). To address these limitations, we propose a novel Gaussian mixture flow matching (GMFlow) model: instead of predicting the mean, GMFlow predicts dynamic Gaussian mixture (GM) parameters to capture a multi-modal flow velocity distribution, which can be learned with a KL divergence loss. | Hansheng Chen; Kai Zhang; Hao Tan; Zexiang Xu; Fujun Luan; Leonidas Guibas; Gordon Wetzstein; Sai Bi; |
51 | Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems. Highlight: In this paper, we propose and formulate a new research area: automated failure attribution for LLM multi-agent systems. | Shaokun Zhang; Ming Yin; Jieyu Zhang; Jiale Liu; Zhiguang Han; Jingyang Zhang; Beibin Li; Chi Wang; Huazheng Wang; Yiran Chen; Qingyun Wu; |
52 | Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards. Highlight: These platforms are widely trusted as a fair and accurate measure of LLM capabilities. In this paper, we show that if bot protection and other defenses are not implemented, these voting-based benchmarks are potentially vulnerable to adversarial manipulation. | Yangsibo Huang; Milad Nasr; Anastasios Nikolas Angelopoulos; Nicholas Carlini; Wei-Lin Chiang; Christopher A. Choquette-Choo; Daphne Ippolito; Matthew Jagielski; Katherine Lee; Ken Liu; Ion Stoica; Florian Tramèr; Chiyuan Zhang; |
53 | Weak-to-Strong Jailbreaking on Large Language Models. Highlight: In this paper, we propose the **weak-to-strong** jailbreaking attack, an efficient inference time attack for aligned LLMs to produce harmful text. | Xuandong Zhao; Xianjun Yang; Tianyu Pang; Chao Du; Lei Li; Yu-Xiang Wang; William Yang Wang; |
54 | Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models. Highlight: In this work, we introduce graph-constrained reasoning (GCR), a novel framework that bridges structured knowledge in KGs with unstructured reasoning in LLMs. | Linhao Luo; Zicheng Zhao; Gholamreza Haffari; Yuan-Fang Li; Chen Gong; Shirui Pan; |
55 | Fishers for Free? Approximating The Fisher Information Matrix By Recycling The Squared Gradient Accumulator. Highlight: At the same time, adaptive gradient methods like the ubiquitous Adam optimizer compute a moving average of the squared gradient over the course of training. This paper therefore explores whether an approximation of the Fisher can be obtained for free by recycling the squared gradient accumulator that has already been computed over the course of training. (A short sketch after this table shows where this accumulator lives in a standard Adam optimizer.) | Yu Xin Li; Felix Dangel; Derek Tam; Colin Raffel; |
56 | Learnings from Scaling Visual Tokenizers for Reconstruction and Generation. Highlight: This work explores scaling in auto-encoders for reconstruction and generation by replacing the convolutional backbone with an enhanced Vision Transformer for Tokenization (ViTok). | Philippe Hansen-Estruch; David Yan; Ching-Yao Chuang; Orr Zohar; Jialiang Wang; Tingbo Hou; Tao Xu; Sriram Vishwanath; Peter Vajda; Xinlei Chen; |
57 | Galileo: Learning Global & Local Features of Many Remote Sensing Modalities. Highlight: We present a novel self-supervised learning algorithm that extracts multi-scale features across a flexible set of input modalities through masked modeling. | Gabriel Tseng; Anthony Fuller; Marlena Reil; Henry Herzog; Patrick Beukema; Favyen Bastani; James R Green; Evan Shelhamer; Hannah Kerner; David Rolnick; |
58 | VinePPO: Refining Credit Assignment in RL Training of LLMs. Highlight: This motivates our key question: Can improved credit assignment enhance RL training for LLMs? To address this, we propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates. | Amirhossein Kazemnejad; Milad Aghajohari; Eva Portelance; Alessandro Sordoni; Siva Reddy; Aaron Courville; Nicolas Le Roux; |
59 | Improving The Diffusability of Autoencoders. Highlight: In this work, we perform a spectral analysis of modern autoencoders and identify inordinate high-frequency components in their latent spaces, which are especially pronounced in the autoencoders with a large bottleneck channel size. | Ivan Skorokhodov; Sharath Girish; Benran Hu; Willi Menapace; Yanyu Li; Rameen Abdal; Sergey Tulyakov; Aliaksandr Siarohin; |
60 | Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development. Highlight: In response, we present a new sandbox suite tailored for integrated data-model co-development. | Daoyuan Chen; Haibin Wang; Yilun Huang; Ce Ge; Yaliang Li; Bolin Ding; Jingren Zhou; |
61 | Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective. Highlight: Nevertheless, discovering an effective pruning strategy is non-trivial, as existing attribution methods and prompt compression algorithms fail to deliver robust results, let alone human intuition. To this end, we propose PromptQuine, a self-discovering prompt optimization framework that uses evolutionary search to automatically find a pruning strategy by itself using only low-data regimes. | Jianyu Wang; Zhiqiang Hu; Lidong Bing; |
62 | The Logical Implication Steering Method for Conditional Interventions on Transformer Generation. Highlight: Studies also show that model generation behavior can be steered toward a given concept by adding the concept’s vector to the corresponding activations. We show how to leverage these properties to build a form of logical implication into models, enabling transparent and interpretable adjustments that induce a chosen generation behavior in response to the presence of any given concept. | Damjan Kalajdzievski; |
63 | AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs. Highlight: In this paper, we present a novel method that uses another LLM, called **AdvPrompter**, to generate human-readable adversarial prompts in seconds. | Anselm Paulus; Arman Zharmagambetov; Chuan Guo; Brandon Amos; Yuandong Tian; |
64 | Unnatural Languages Are Not Bugs But Features for LLMs. Highlight: Large Language Models (LLMs) have been observed to process non-human-readable text sequences, such as jailbreak prompts, often viewed as a bug for aligned LLMs. In this work, we present a systematic investigation challenging this perception, demonstrating that unnatural languages – strings that appear incomprehensible to humans but maintain semantic meanings for LLMs – contain latent features usable by models. | Keyu Duan; Yiran Zhao; Zhili Feng; Jinjie Ni; Tianyu Pang; Qian Liu; Tianle Cai; Longxu Dou; Kenji Kawaguchi; Anirudh Goyal; J Zico Kolter; Michael Qizhe Shieh; |
65 | Investigating Non-Transitivity in LLM-as-a-Judge. Highlight: However, the validity of this assumption remains largely unexplored. In this study, we investigate the presence of non-transitivity within the AlpacaEval framework and analyze its effects on model rankings. | Yi Xu; Laura Ruis; Tim Rocktäschel; Robert Kirk; |
66 | MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents. Highlight: We present MELON (Masked re-Execution and TooL comparisON), a novel IPI defense. | Kaijie Zhu; Xianjun Yang; Jindong Wang; Wenbo Guo; William Yang Wang; |
67 | Point Cloud Dataset Distillation. Highlight: This study introduces dataset distillation (DD) tailored for 3D data, particularly point clouds. | Deyu Bo; Xinchao Wang; |
68 | Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models. Highlight: In this work, we aim to improve Key & Value compression by exploiting two observations: 1) the inherent dependencies between keys and values across different layers, and 2) the existence of high-compression methods for internal network states (e.g. attention Keys & Values). | Alina Shutova; Vladimir Malinovskii; Vage Egiazarian; Denis Kuznedelev; Denis Mazur; Surkov Nikita; Ivan Ermakov; Dan Alistarh; |
69 | Scalable Equilibrium Sampling with Sequential Boltzmann Generators. Highlight: In this paper, we extend the Boltzmann generator framework with two key contributions, denoting our framework Sequential Boltzmann Generators (SBG). | Charlie B. Tan; Joey Bose; Chen Lin; Leon Klein; Michael M. Bronstein; Alexander Tong; |
70 | Scaling Laws for Pre-training Agents and World Models. Highlight: This has been demonstrated in domains from robotics to video games, when generative learning objectives on offline datasets (pre-training) are used to model an agent’s behavior (imitation learning) or their environment (world modeling). This paper characterizes the role of scale in these tasks more precisely. | Tim Pearce; Tabish Rashid; David Bignell; Raluca Georgescu; Sam Devlin; Katja Hofmann; |
71 | WorldSimBench: Towards Video Generation Models As World Simulators. Highlight: In this work, we classify the functionalities of predictive models into a hierarchy and take the first step in evaluating World Simulators by proposing a dual evaluation framework called WorldSimBench. In the Explicit Perceptual Evaluation, we introduce the HF-Embodied Dataset, a video assessment dataset based on fine-grained human feedback, which we use to train a Human Preference Evaluator that aligns with human perception and explicitly assesses the visual fidelity of World Simulators. | Yiran Qin; Zhelun Shi; Jiwen Yu; Xijun Wang; Enshen Zhou; Lijun Li; Zhenfei Yin; Xihui Liu; Lu Sheng; Jing Shao; LEI BAI; Ruimao Zhang; |
72 | All-atom Diffusion Transformers: Unified Generative Modelling of Molecules and Materials. Highlight: We introduce the All-atom Diffusion Transformer (ADiT), a unified latent diffusion framework for jointly generating both periodic materials and non-periodic molecular systems using the same model: (1) An autoencoder maps a unified, all-atom representation of molecules and materials to a shared latent embedding space; and (2) A diffusion model is trained to generate new latent embeddings that the autoencoder can decode to sample new molecules or materials. | Chaitanya K. Joshi; Xiang Fu; Yi-Lun Liao; Vahe Gharakhanyan; Benjamin Kurt Miller; Anuroop Sriram; Zachary Ward Ulissi; |
73 | SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability. Highlight: We introduce SAEBench, a comprehensive evaluation suite that measures SAE performance across eight diverse metrics, spanning interpretability, feature disentanglement and practical applications like unlearning. | Adam Karvonen; Can Rager; Johnny Lin; Curt Tigges; Joseph Isaac Bloom; David Chanin; Yeu-Tong Lau; Eoin Farrell; Callum Stuart McDougall; Kola Ayonrinde; Demian Till; Matthew Wearden; Arthur Conmy; Samuel Marks; Neel Nanda; |
74 | An Architecture Search Framework for Inference-Time Techniques. Highlight: However, best practices for developing systems that combine these techniques remain underdeveloped due to our limited understanding of the utility of each technique across models and tasks, the interactions between them, and the massive search space for combining them. To address these challenges, we introduce Archon, a modular and automated framework for optimizing the process of selecting and combining inference-time techniques and LLMs. | Jon Saad-Falcon; Adrian Gamarra Lafuente; Shlok Natarajan; Nahum Maru; Hristo Todorov; Etash Kumar Guha; E. Kelly Buchanan; Mayee F Chen; Neel Guha; Christopher Re; Azalia Mirhoseini; |
75 | How Do Large Language Monkeys Get Their Power (Laws)? Highlight: In this work, we identify an apparent puzzle: a simple mathematical calculation predicts that on each problem, the failure rate should fall exponentially with the number of attempts. | Rylan Schaeffer; Joshua Kazdan; John Hughes; Jordan Juravsky; Sara Price; Aengus Lynch; Erik Jones; Robert Kirk; Azalia Mirhoseini; Sanmi Koyejo; |
76 | Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? Highlight: While many factors are certainly responsible, this paper shines a light on a significant factor that makes predicting scaling behavior on widely used multiple-choice question answering benchmarks challenging and illuminates a path towards making such downstream evaluations predictable with scale. | Rylan Schaeffer; Hailey Schoelkopf; Brando Miranda; Gabriel Mukobi; Varun Madan; Adam Ibrahim; Herbie Bradley; Stella Biderman; Sanmi Koyejo; |
77 | Any4: Learned 4-bit Numeric Representation for LLMs. Highlight: We present any4, a learned 4-bit weight quantization solution for large language models (LLMs) providing arbitrary numeric representations without requiring pre-processing of weights or activations. | Mostafa Elhoushi; Jeff Johnson; |
78 | xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference. Highlight: In this work, we introduce xLSTM 7B, a 7-billion-parameter LLM that combines xLSTM’s architectural benefits with targeted optimizations for fast and efficient inference. | Maximilian Beck; Korbinian Pöppel; Phillip Lippe; Richard Kurle; Patrick M Blies; Günter Klambauer; Sebastian Böck; Sepp Hochreiter; |
79 | MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design. Highlight: We explore quantization for MoE models and highlight two key insights: 1) linear blocks exhibit varying quantization sensitivity, and 2) divergent expert activation frequencies create heterogeneous computational characteristics. Based on these observations, we introduce MxMoE, a mixed-precision optimization framework for MoE models that considers both algorithmic and system perspectives. | Haojie Duanmu; Xiuhong Li; Zhihang Yuan; Size Zheng; Jiangfei Duan; Xingcheng Zhang; Dahua Lin; |
80 | ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference. Highlight: We present ShadowKV, a high-throughput long-context LLM inference system that stores the low-rank key cache and offloads the value cache to reduce the memory footprint for larger batch sizes and longer sequences. | Hanshi Sun; Li-Wen Chang; Wenlei Bao; Size Zheng; Ningxin Zheng; Xin Liu; Harry Dong; Yuejie Chi; Beidi Chen; |
81 | Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability. Highlight: In this work, we introduce the concept of critical tokens — elements within reasoning trajectories that significantly influence incorrect outcomes. | Zicheng Lin; Tian Liang; Jiahao Xu; Qiuzhi Liu; Xing Wang; Ruilin Luo; Chufan Shi; Siheng Li; Yujiu Yang; Zhaopeng Tu; |
82 | TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization. Highlight: Using the obtained reward and Bradley-Terry model, this work establishes a framework of computable loss functions with token-level reward guidance for DPO, and proposes a practical reward guidance based on the induced DPO reward. | Mingkang Zhu; Xi Chen; Zhongdao Wang; Bei Yu; Hengshuang Zhao; Jiaya Jia; |
83 | VideoRoPE: What Makes for Good Video Rotary Position Embedding? Highlight: As part of our analysis, we introduce a challenging V-NIAH-D (Visual Needle-In-A-Haystack with Distractors) task, which adds periodic distractors into V-NIAH. | Xilin Wei; Xiaoran Liu; Yuhang Zang; Xiaoyi Dong; Pan Zhang; Yuhang Cao; Jian Tong; Haodong Duan; Qipeng Guo; Jiaqi Wang; Xipeng Qiu; Dahua Lin; |
84 | Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge. Highlight: Consequently, previous approaches often (1) constrain reasoning traces to hand-designed components, such as a list of criteria, reference answers, or verification questions and (2) structure them such that planning is intertwined with the reasoning for evaluation. In this work, we propose EvalPlanner, a preference optimization algorithm for Thinking-LLM-as-a-Judge that first generates an unconstrained evaluation plan, followed by its execution, and then the final judgment. | Swarnadeep Saha; Xian Li; Marjan Ghazvininejad; Jason E Weston; Tianlu Wang; |
85 | Hidden No More: Attacking and Defending Private Third-Party LLM Inference. Highlight: In this work, we introduce a novel reconstruction technique that can recover original prompts from hidden states with nearly perfect accuracy across multiple state-of-the-art LLMs in the increasingly important open-weights setting. | Rahul Krishna Thomas; Louai Zahran; Erica Choi; Akilesh Potti; Micah Goldblum; Arka Pal; |
86 | Habitizing Diffusion Planning for Efficient and Effective Decision Making. Highlight: Here, we introduce **Habi**, a general framework that transforms powerful but slow diffusion planning models into fast decision-making models, mimicking the cognitive process in the brain whereby costly goal-directed behavior gradually transitions to efficient habitual behavior with repetitive practice. | Haofei Lu; Yifei Shen; Dongsheng Li; Junliang Xing; Dongqi Han; |
87 | Free Process Rewards Without Process Labels. Highlight: Both theoretically and empirically, we show that an implicit PRM can be obtained at no additional cost, by simply training an ORM on the cheaper response-level labels. | Lifan Yuan; Wendi Li; Huayu Chen; Ganqu Cui; Ning Ding; Kaiyan Zhang; Bowen Zhou; Zhiyuan Liu; Hao Peng; |
88 | Identifying and Understanding Cross-Class Features in Adversarial Training. Highlight: In this paper, we present a novel perspective on studying AT through the lens of class-wise feature attribution. | Zeming Wei; Steven Y. Guo; Yisen Wang; |
89 | Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts. Highlight: In this work, we provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models. | Marta Skreta; Tara Akhound-Sadegh; Viktor Ohanesian; Roberto Bondesan; Alan Aspuru-Guzik; Arnaud Doucet; Rob Brekelmans; Alexander Tong; Kirill Neklyudov; |
90 | VIP: Vision Instructed Pre-training for Robotic Manipulation. Highlight: However, we reveal that current robotic data cannot train policies to understand text instructions effectively, and vision is much more comprehensible. Therefore, we introduce utilizing vision instruction to specify targets. | Zhuoling Li; LiangLiang Ren; Jinrong Yang; Yong Zhao; Xiaoyang Wu; Zhenhua Xu; Xiang Bai; Hengshuang Zhao; |
91 | LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence. Highlight: In this work, we combine their advantages while avoiding the drawbacks by conducting the proposed referee RL on our developed large auto-regressive model (LARM). | Zhuoling Li; Xiaogang Xu; Zhenhua Xu; Ser-Nam Lim; Hengshuang Zhao; |
92 | Ultra-Resolution Adaptation with Ease. Highlight: However, training models for high-resolution image generation remains challenging, particularly when training data and computational resources are limited. In this paper, we explore this practical problem from two key perspectives: data and parameter efficiency, and propose a set of key guidelines for ultra-resolution adaptation termed URAE. | Ruonan Yu; Songhua Liu; Zhenxiong Tan; Xinchao Wang; |
93 | Bring Reason to Vision: Understanding Perception and Reasoning Through Model Merging. Highlight: In this work, we explore composing perception and reasoning through model merging, which connects the parameters of different models. | Shiqi Chen; Jinghan Zhang; Tongyao Zhu; Wei Liu; Siyang Gao; Miao Xiong; Manling Li; Junxian He; |
94 | CodeIO: Condensing Reasoning Patterns Via Code Input-Output Prediction. Highlight: While prior research predominantly focuses on enhancing narrow skills like math or code generation, improving performance on many other reasoning tasks remains challenging due to sparse and fragmented training data. To address this issue, we propose CodeI/O, a novel approach that systematically condenses diverse reasoning patterns inherently embedded in contextually-grounded codes, through transforming the original code into a code input-output prediction format. | Junlong Li; Daya Guo; Dejian Yang; Runxin Xu; Yu Wu; Junxian He; |
95 | GSM-∞: How Do Your LLMs Behave Over Infinitely Increasing Reasoning Complexity and Context Length? Highlight: Inspired by the abstraction of GSM-8K problems as computational graphs, and the ability to introduce noise by adding unnecessary nodes and edges, we develop a grade-school math problem generator capable of producing arithmetic problems with infinite difficulty and context length under fine-grained control. | Yang Zhou; Hongyi Liu; Zhuoming Chen; Yuandong Tian; Beidi Chen; |
96 | History-Guided Video Diffusion. Highlight: However, we find two key challenges to guiding with variable-length history: architectures that only support fixed-size conditioning, and the empirical observation that CFG-style history dropout performs poorly. To address this, we propose the Diffusion Forcing Transformer (DFoT), a video diffusion architecture and theoretically grounded training objective that jointly enable conditioning on a flexible number of history frames. | Kiwhan Song; Boyuan Chen; Max Simchowitz; Yilun Du; Russ Tedrake; Vincent Sitzmann; |
97 | General Agents Need World Models. Highlight: Are world models a necessary ingredient for flexible, goal-directed behaviour, or is model-free learning sufficient? We provide a formal answer to this question, showing that any agent capable of generalizing to multi-step goal-directed tasks must have learned a predictive model of its environment. | Jonathan Richens; Tom Everitt; David Abel; |
98 | LOCATE 3D: Real-World Object Localization Via Self-Supervised Learning in 3D. Highlight: We present LOCATE 3D, a model for localizing objects in 3D scenes from referring expressions like “the small coffee table between the sofa and the lamp”. | Paul McVay; Sergio Arnaud; Ada Martin; Arjun Majumdar; Krishna Murthy Jatavallabhula; Phillip Thomas; Ruslan Partsey; Daniel Dugas; Abha Gejji; Alexander Sax; Vincent-Pierre Berges; Mikael Henaff; Ayush Jain; Ang Cao; Ishita Prasad; Mrinal Kalakrishnan; Michael Rabbat; Nicolas Ballas; Mido Assran; Oleksandr Maksymets; Aravind Rajeswaran; Franziska Meier; |
99 | DataDecide: How to Predict Best Pretraining Data with Small Experiments. Highlight: Because large language models are expensive to pretrain on different datasets, using smaller-scale experiments to decide on data is crucial for reducing costs. Which benchmarks and methods of making decisions from observed performance at small scale most accurately predict the datasets that yield the best large models? To empower open exploration of this question, we release models, data, and evaluations in DataDecide, the most extensive open suite of models over differences in data and scale. | Ian Magnusson; Nguyen Tai; Ben Bogin; David Heineman; Jena D. Hwang; Luca Soldaini; Akshita Bhagia; Jiacheng Liu; Dirk Groeneveld; Oyvind Tafjord; Noah A. Smith; Pang Wei Koh; Jesse Dodge; |
100 | Diving Into Self-Evolving Training for Multimodal Reasoning. Highlight: Moreover, delving deeper into training dynamics, we uncover the roots of saturation and propose a new automatic balancing mechanism to mitigate this limitation. Building on these insights, we propose M-STaR (**M**ultimodal **S**elf-evolving **T**r**a**ining for **R**easoning), a framework that achieves consistent performance gains across models of varying sizes and diverse benchmarks. | Wei Liu; Junlong Li; Xiwen Zhang; Fan Zhou; Yu Cheng; Junxian He; |
101 | Scaling Sparse Feature Circuits For Studying In-Context Learning. Highlight: In this work, we demonstrate their effectiveness by using SAEs to deepen our understanding of the mechanism behind in-context learning (ICL). | Dmitrii Kharlapenko; Stepan Shabalin; Arthur Conmy; Neel Nanda; |
102 | Are Sparse Autoencoders Useful? A Case Study in Sparse Probing. Highlight: One alternative source of evidence would be demonstrating that SAEs improve performance on downstream tasks beyond existing baselines. We test this by applying SAEs to the real-world task of LLM activation probing in four regimes: data scarcity, class imbalance, label noise, and covariate shift. | Subhash Kantamneni; Joshua Engels; Senthooran Rajamanoharan; Max Tegmark; Neel Nanda; |
103 | LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a benchmark to pressure-test today’s frontier models’ multimodal decision-making capabilities in the very long-context regime (up to one million tokens) and investigate whether these models can learn from large numbers of expert demonstrations in their context. |
Anian Ruoss; Fabio Pardo; Harris Chan; Bonnie Li; Volodymyr Mnih; Tim Genewein; |
104 | I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents ThinkDiff, a novel alignment paradigm that empowers text-to-image diffusion models with multimodal in-context understanding and reasoning capabilities by integrating the strengths of vision-language models (VLMs). |
Zhenxing Mi; Kuan-Chieh Wang; Guocheng Qian; Hanrong Ye; Runtao Liu; Sergey Tulyakov; Kfir Aberman; Dan Xu; |
105 | EasyRef: Omni-Generalized Group Image Reference for Diffusion Models Via Multimodal LLM Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces EasyRef, a plug-and-play adaptation method that empowers diffusion models to condition on consistent visual elements (e.g., style and human facial identity) across multiple reference images under instruction controls. |
Zhuofan Zong; Dongzhi Jiang; Bingqi Ma; Guanglu Song; Hao Shao; Dazhong Shen; Yu Liu; Hongsheng Li; |
106 | ZipAR: Parallel Autoregressive Image Generation Through Spatial Locality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ZipAR, a training-free, plug-and-play parallel decoding framework for accelerating autoregressive (AR) visual generation. |
Yefei He; Feng Chen; Yuanyu He; Shaoxuan He; Hong Zhou; Kaipeng Zhang; Bohan Zhuang; |
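To make the spatial-locality intuition above concrete, here is a toy wavefront scheduler. It assumes a token depends on all earlier tokens in its own row plus a fixed-size window of the row above; this is only a loose abstraction, and ZipAR’s actual dependency analysis is in the paper:

```python
# Toy wavefront schedule for parallel AR image decoding (illustrative only).
def wavefront_schedule(height, width, window):
    """Assign each grid token (i, j) a decoding step, assuming row i may
    proceed once row i-1 is `window` tokens ahead of it."""
    return {(i, j): i * window + j
            for i in range(height) for j in range(width)}

sched = wavefront_schedule(height=4, width=8, window=2)
print("parallel steps:", max(sched.values()) + 1)   # 14
print("sequential steps:", 4 * 8)                   # 32
```

Tokens that share a step can be decoded in the same forward pass, which is where the speedup comes from.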
107 | Optimizing Adaptive Attacks Against Watermarks for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We formulate watermark robustness as an objective function and use preference-based optimization to tune *adaptive* attacks against the specific watermarking method. |
Abdulrahman Diaa; Toluwani Aremu; Nils Lukas; |
108 | Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, state-of-the-art unlearning methods face a critical vulnerability: they are susceptible to “relearning” the removed information from a small number of forget data points, known as relearning attacks. In this paper, we systematically investigate how to make unlearned models robust against such attacks. |
Chongyu Fan; Jinghan Jia; Yihua Zhang; Anil Ramakrishna; Mingyi Hong; Sijia Liu; |
109 | ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we introduce ALMTokenizer, a novel low-bitrate and semantically rich audio codec tokenizer for audio language models. |
Dongchao Yang; Songxiang Liu; Haohan Guo; Jiankun Zhao; Yuanyuan Wang; Helin Wang; Zeqian Ju; Xubo Liu; Xueyuan Chen; Xu Tan; Xixin Wu; Helen M. Meng; |
110 | Learning Multi-Level Features with Matryoshka Sparse Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, choosing the size of the SAE dictionary (i.e. number of learned concepts) creates a tension: as dictionary size increases to capture more relevant concepts, sparsity incentivizes features to be split or absorbed into more specific features, leaving high-level features missing or warped. We introduce Matryoshka SAEs, a novel variant that addresses these issues by simultaneously training multiple nested dictionaries of increasing size, forcing the smaller dictionaries to independently reconstruct the inputs without using the larger dictionaries. |
Bart Bussmann; Noa Nabeshima; Adam Karvonen; Neel Nanda; |
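A minimal sketch of the nested-dictionary objective as the highlight describes it: each prefix of the latent dictionary must reconstruct the input on its own. The sizes, ReLU encoder, and L1 penalty below are illustrative assumptions, not the authors’ implementation:

```python
import torch
import torch.nn.functional as F

d_model, dict_sizes = 64, [256, 1024, 4096]   # nested dictionary sizes
enc = torch.nn.Linear(d_model, dict_sizes[-1])
dec = torch.nn.Linear(dict_sizes[-1], d_model, bias=False)

def matryoshka_loss(x):
    z = F.relu(enc(x))                        # sparse latent activations
    loss = 0.0
    for m in dict_sizes:                      # every prefix reconstructs alone
        z_m = torch.zeros_like(z)
        z_m[:, :m] = z[:, :m]                 # keep only the first m latents
        loss = loss + F.mse_loss(dec(z_m), x)
    return loss + 1e-3 * z.abs().mean()       # assumed L1 sparsity penalty

print(matryoshka_loss(torch.randn(8, d_model)).item())
```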
111 | Mastering Multiple-Expert Routing: Realizable $H$-Consistency and Strong Guarantees for Learning to Defer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces novel surrogate loss functions and efficient algorithms with strong theoretical learning guarantees. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
112 | Principled Algorithms for Optimizing Generalized Metrics in Binary Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce principled algorithms for optimizing generalized metrics, supported by $H$-consistency and finite-sample generalization bounds. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
113 | AdaWorld: Learning Adaptable World Models with Latent Actions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This limitation can hinder their applicability across broader domains. To overcome this limitation, we propose AdaWorld, an innovative world model learning approach that enables efficient adaptation. |
Shenyuan Gao; Siyuan Zhou; Yilun Du; Jun Zhang; Chuang Gan; |
114 | HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the promise, these native models are resource-intensive and often exhibit performance gaps compared to their compositional counterparts. To alleviate this issue, we propose a simple yet efficient method to construct a baseline for the native and end-to-end large multi-modal model in a single transformer. |
Rui Yang; Lin Song; Yicheng Xiao; Runhui Huang; Yixiao Ge; Ying Shan; Hengshuang Zhao; |
115 | Flex3D: Feed-Forward 3D Generation with Flexible Reconstruction Model and Input View Curation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these approaches are often constrained by a small and fixed number of input views, limiting their ability to capture diverse viewpoints and, even worse, leading to suboptimal generation results if the synthesized views are of poor quality. To address these limitations, we propose Flex3D, a novel two-stage framework capable of leveraging an arbitrary number of high-quality input views. |
Junlin Han; Jianyuan Wang; Andrea Vedaldi; Philip Torr; Filippos Kokkinos; |
116 | The Jailbreak Tax: How Useful Are Your Jailbreak Outputs? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we ask whether the model outputs produced by existing jailbreaks are actually *useful*. Overall, our work proposes jailbreak utility as a new important metric in AI safety, and introduces benchmarks to evaluate existing and future jailbreaks. |
Kristina Nikolić; Luze Sun; Jie Zhang; Florian Tramèr; |
117 | AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a novel weakly-supervised representational method (Rank-1 Representation Finetuning; ReFT-r1), which is competitive on both tasks while providing the interpretability advantages that prompting lacks. |
Zhengxuan Wu; Aryaman Arora; Atticus Geiger; Zheng Wang; Jing Huang; Dan Jurafsky; Christopher D Manning; Christopher Potts; |
118 | EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present *EnIGMA*, an LM agent for autonomously solving Capture The Flag (CTF) challenges. |
Talor Abramovich; Meet Udeshi; Minghao Shao; Kilian Lieret; Haoran Xi; Kimberly Milner; Sofija Jancheska; John Yang; Carlos E Jimenez; Farshad Khorrami; Prashanth Krishnamurthy; Brendan Dolan-Gavitt; Muhammad Shafique; Karthik R Narasimhan; Ramesh Karri; Ofir Press; |
119 | T1: Advancing Language Model Reasoning Through Reinforcement Learning and Inference Scaling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present T1 to scale RL by encouraging exploration and to understand inference scaling. |
Zhenyu Hou; Xin Lv; Rui Lu; Jiajie Zhang; Yujiang Li; Zijun Yao; Juanzi Li; Jie Tang; Yuxiao Dong; |
120 | Roll The Dice & Look Before You Leap: Going Beyond The Creative Limits of Next-token Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We design a suite of minimal algorithmic tasks that are a loose abstraction of _open-ended_ real-world tasks. This allows us to cleanly and controllably quantify the creative limits of the present-day language model. |
Vaishnavh Nagarajan; Chen Henry Wu; Charles Ding; Aditi Raghunathan; |
121 | GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation Via Multi-Step Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike human evaluation, existing automated evaluation metrics lack high-level semantic understanding and reasoning capabilities for video, thus making them infeasible and unexplainable. To fill this gap, we curate **GRADEO-Instruct**, a multi-dimensional T2V evaluation instruction tuning dataset, including 3.3k videos from over 10 existing video generation models and multi-step reasoning assessments converted by 16k human annotations. We then introduce **GRADEO**, one of the first specifically designed video evaluation models, which **grades** AI-generated **videos** for explainable scores and assessments through multi-step reasoning. |
Zhun Mou; Bin Xia; Zhengchao Huang; Wenming Yang; Jiaya Jia; |
122 | General Framework for Online-to-nonconvex Conversion: Schedule-free SGD Is Also Effective for Nonconvex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work investigates the effectiveness of schedule-free methods, developed by A. Defazio et al. (NeurIPS 2024), in nonconvex optimization settings, inspired by their remarkable empirical success in training neural networks. |
Kwangjun Ahn; Gagik Magakyan; Ashok Cutkosky; |
123 | From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Identifying these features would not only shed light on how pLMs work, but potentially uncover novel protein biology: studying the model to study the biology. Motivated by this, we train sparse autoencoders (SAEs) on the residual stream of a pLM, ESM-2. |
Etowah Adams; Liam Bai; Minji Lee; Yiyang Yu; Mohammed AlQuraishi; |
124 | Diffusion Models Are Secretly Exchangeable: Parallelizing DDPMs Via Auto Speculation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we utilize the connection between DDPMs and Stochastic Localization to prove that, under an appropriate reparametrization, the increments of DDPM satisfy an exchangeability property. |
Hengyuan Hu; Aniket Das; Dorsa Sadigh; Nima Anari; |
125 | ReferSplat: Referring Segmentation in 3D Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these challenges, we propose ReferSplat, a framework that explicitly models 3D Gaussian points with natural language expressions in a spatially aware paradigm. To support research in this area, we construct the first R3DGS dataset, Ref-LERF. |
Shuting He; Guangquan Jie; Changshuo Wang; Yun Zhou; Shuming Hu; Guanbin Li; Henghui Ding; |
126 | Flow Q-Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present flow Q-learning (FQL), a simple and performant offline reinforcement learning (RL) method that leverages an expressive flow-matching policy to model arbitrarily complex action distributions in data. |
Seohong Park; Qiyang Li; Sergey Levine; |
127 | Streamline Without Sacrifice – Squeeze Out Computation Redundancy in LMM Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on our findings, we propose ProxyV, a novel approach that utilizes proxy vision tokens to alleviate the computational burden on original vision tokens. |
Penghao Wu; Lewei Lu; Ziwei Liu; |
128 | Understanding The Logic of Direct Preference Alignment Through Logic Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While this has motivated the development of many new variants of the original DPO loss, understanding the differences between these recent proposals, as well as developing new DPA loss functions, remains difficult given the lack of a technical and conceptual framework for reasoning about the underlying semantics of these algorithms. In this paper, we attempt to remedy this by formalizing DPA losses in terms of discrete reasoning problems. |
Kyle Richardson; Vivek Srikumar; Ashish Sabharwal; |
129 | LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for LLMs, together with an open-source research framework for getting started on multi-turn RL with offline value-based and online policy-based RL methods. |
Marwa Abdulhai; Isadora White; Charlie Victor Snell; Charles Sun; Joey Hong; Yuexiang Zhai; Kelvin Xu; Sergey Levine; |
130 | BOOD: Boundary-based Out-Of-Distribution Data Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel framework called Boundary-based Out-Of-Distribution data generation (BOOD), which synthesizes high-quality OOD features and generates human-compatible outlier images using diffusion models. |
Qilin Liao; Shuo Yang; Bo Zhao; Ping Luo; Hengshuang Zhao; |
131 | CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building a benchmark for real-world vulnerabilities involves both specialized expertise to reproduce exploits and a systematic approach to evaluating unpredictable attacks. To address this challenge, we introduce CVE-Bench, a real-world cybersecurity benchmark based on critical-severity Common Vulnerabilities and Exposures. |
Yuxuan Zhu; Antony Kellermann; Dylan Bowman; Philip Li; Akul Gupta; Adarsh Danda; Richard Fang; Conner Jensen; Eric Ihli; Jason Benn; Jet Geronimo; Avi Dhir; Sudhit Rao; Kaicheng Yu; Twm Stone; Daniel Kang; |
132 | ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers Under Domain Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce ExPLoRA, a highly effective technique to improve transfer learning of pre-trained vision transformers (ViTs) under domain shifts. |
Samar Khanna; Medhanie Irgau; David B. Lobell; Stefano Ermon; |
133 | How Compositional Generalization and Creativity Improve As Diffusion Models Are Trained Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We discuss connections between the hierarchical clustering mechanism we introduce here and the renormalization group in physics. |
Alessandro Favero; Antonio Sclocchi; Francesco Cagnetta; Pascal Frossard; Matthieu Wyart; |
134 | Automatically Interpreting Millions of Features in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we build an open-source automated pipeline to generate and evaluate natural language interpretations for SAE latents using LLMs. |
Gonçalo Santos Paulo; Alex Troy Mallen; Caden Juang; Nora Belrose; |
135 | On The Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To simulate faulty agents, we propose two approaches—AutoTransform and AutoInject—which introduce mistakes into the agents’ responses. |
Jen-tse Huang; Jiaxu Zhou; Tailin Jin; Xuhui Zhou; Zixi Chen; Wenxuan Wang; Youliang Yuan; Michael Lyu; Maarten Sap; |
136 | What Do Learning Dynamics Reveal About Generalization in LLM Mathematical Reasoning? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to teach models genuine reasoning abilities rather than superficial pattern matching, our work aims to better understand how the learning dynamics of LLM finetuning shapes downstream generalization. |
Katie Kang; Amrith Setlur; Dibya Ghosh; Jacob Steinhardt; Claire Tomlin; Sergey Levine; Aviral Kumar; |
137 | Emergent Misalignment: Narrow Finetuning Can Produce Broadly Misaligned LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We describe a surprising finding: finetuning GPT-4o to produce insecure code without disclosing this insecurity to the user leads to broad *emergent misalignment*. |
Jan Betley; Daniel Chee Hian Tan; Niels Warncke; Anna Sztyber-Betley; Xuchan Bao; Martín Soto; Nathan Labenz; Owain Evans; |
138 | Deep Reinforcement Learning from Hierarchical Preference Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we propose HERON, a hierarchical reward design framework for scenarios where: (I) the feedback signals naturally present a hierarchy; (II) the reward is sparse, but less important surrogate feedback is available to help policy learning. |
Alexander Bukharin; Yixiao Li; Pengcheng He; Tuo Zhao; |
139 | Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a self-improvement approach where models iteratively generate and learn from their own solutions, progressively tackling harder problems while maintaining a standard transformer architecture. |
Nayoung Lee; Ziyang Cai; Avi Schwarzschild; Kangwook Lee; Dimitris Papailiopoulos; |
140 | ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce ConceptAttention, a novel method that leverages the expressive power of DiT attention layers to generate high-quality saliency maps that precisely locate textual concepts within images. |
Alec Helbling; Tuna Han Salih Meral; Benjamin Hoover; Pinar Yanardag; Duen Horng Chau; |
141 | Inductive Moment Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Diffusion models and Flow Matching generate high-quality samples but are slow at inference, and distilling them into few-step models often leads to instability and extensive tuning. To resolve these trade-offs, we propose Moment Matching Self-Distillation (MMSD), a new class of generative models for one- or few-step sampling with a single-stage training procedure. |
Linqi Zhou; Stefano Ermon; Jiaming Song; |
142 | Generalists Vs. Specialists: Evaluating LLMs on Highly-Constrained Biophysical Sequence Optimization Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we propose LLOME (Language Model Optimization with Margin Expectation), a bilevel optimization routine for online black-box optimization. |
Angelica Chen; Samuel Don Stanton; Frances Ding; Robert G Alberstein; Andrew Martin Watkins; Richard Bonneau; Vladimir Gligorijevic; Kyunghyun Cho; Nathan C. Frey; |
143 | Text-to-LoRA: Instant Transformer Adaption Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Fine-tuning techniques enable practitioners to adapt foundation models for many new applications but require expensive and lengthy training while being notably sensitive to hyperparameter choices. To overcome these limitations, we introduce Text-to-LoRA (T2L), a model capable of adapting large language models (LLMs) on the fly solely based on a natural language description of the target task. |
Rujikorn Charakorn; Edoardo Cetin; Yujin Tang; Robert Tjarko Lange; |
144 | Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Protriever, an end-to-end differentiable framework that learns to retrieve relevant homologs while simultaneously training for the target task. |
Ruben Weitzman; Peter Mørch Groth; Lood Van Niekerk; Aoi Otani; Yarin Gal; Debora Susan Marks; Pascal Notin; |
145 | LoRA-Gen: Specializing Large Language Model Via Online LoRA Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the LoRA-Gen framework, which utilizes a large cloud-side model to generate LoRA parameters for edge-side models based on task descriptions. |
Yicheng Xiao; Lin Song; Rui Yang; Cheng Cheng; Yixiao Ge; Xiu Li; Ying Shan; |
146 | GenMol: A Drug Discovery Generalist with Discrete Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present *Generalist Molecular generative model* (GenMol), a versatile framework that uses only a *single* discrete diffusion model to handle diverse drug discovery scenarios. |
Seul Lee; Karsten Kreis; Srimukh Prasad Veccham; Meng Liu; Danny Reidenbach; Yuxing Peng; Saee Gopal Paliwal; Weili Nie; Arash Vahdat; |
147 | Taming Rectified Flow for Inversion and Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite their robust generative capabilities, these models often struggle with inversion inaccuracies, which could further limit their effectiveness in downstream tasks such as image and video editing. To address this issue, we propose RF-Solver, a novel training-free sampler that effectively enhances inversion precision by mitigating the errors in the ODE-solving process of rectified flow. |
Jiangshan Wang; Junfu Pu; Zhongang Qi; Jiayi Guo; Yue Ma; Nisha Huang; Yuxin Chen; Xiu Li; Ying Shan; |
148 | On Mitigating Affinity Bias Through Bandits with Evolving Biased Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on affinity bias, the component of unconscious bias which leads us to prefer people who are similar to us, despite no deliberate intention of favoritism. |
Matthew Faw; Constantine Caramanis; Jessica Hoffmann; |
149 | Contextual Linear Bandits with Delay As Payoff Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While this captures many real-world applications, the simple multi-armed bandit setting limits the practicality of their results. In this paper, we address this limitation by studying the delay-as-payoff model for contextual linear bandits. |
Mengxiao Zhang; Yingfei Wang; Haipeng Luo; |
150 | Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks Under $\mu$ Parametrization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the training dynamics of infinitely wide, $L$-layer neural networks using the tensor program (TP) framework. |
Zixiang Chen; Greg Yang; Qingyue Zhao; Quanquan Gu; |
151 | On Explaining Equivariant Graph Networks Via Improved Relevance Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current XAI techniques either struggle to adapt to equivariant GNNs or fail to effectively handle positional data and evaluate the significance of geometric features adequately. To address these challenges, we introduce a novel method, known as EquiGX, which uses the Deep Taylor decomposition framework to extend the layer-wise relevance propagation rules tailored for spherical equivariant GNNs. |
Hongyi Ling; Haiyang Yu; Zhimeng Jiang; Na Zou; Shuiwang Ji; |
152 | RAGGED: Towards Informed Design of Scalable and Stable RAG Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce RAGGED, a framework for systematically evaluating RAG systems across diverse retriever-reader configurations, retrieval depths, and datasets. |
Jennifer Hsia; Afreen Shaikh; Zora Zhiruo Wang; Graham Neubig; |
153 | Double Machine Learning for Causal Inference Under Shared-State Interference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In these settings, units are affected by certain *shared states*, like prices, algorithmic recommendations or social signals. We formalize this structure, calling it shared-state interference, and argue that our formulation captures many relevant applied settings. |
Chris Hays; Manish Raghavan; |
154 | (How) Do Language Models Track State? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study state tracking in LMs trained or fine-tuned to compose permutations (i.e., to compute the order of a set of objects after a sequence of swaps). |
Belinda Z. Li; Zifan Carl Guo; Jacob Andreas; |
155 | Do We Need to Verify Step By Step? Rethinking Process Supervision from A Theoretical Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conventional wisdom suggests that outcome supervision is fundamentally more challenging due to the trajectory-level coverage problem, leading to significant investment in collecting fine-grained process supervision data. In this paper, we provide a possible theoretical resolution to this debate. |
Zeyu Jia; Alexander Rakhlin; Tengyang Xie; |
156 | Impossible Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work aims to answer two questions: 1) Can today’s video generation models effectively follow prompts to create impossible video content? |
Zechen Bai; Hai Ci; Mike Zheng Shou; |
157 | Sparsing Law: Towards Large Language Models with Greater Activation Sparsity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address three underexplored research questions: (1) How can activation sparsity be measured more accurately? |
Yuqi Luo; Chenyang Song; Xu Han; Yingfa Chen; Chaojun Xiao; Xiaojun Meng; Liqun Deng; Jiansheng Wei; Zhiyuan Liu; Maosong Sun; |
158 | XAttention: Block Sparse Attention with Antidiagonal Scoring Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce XAttention, a plug-and-play framework that dramatically accelerates long-context inference in Transformer models using sparse attention. |
Ruyi Xu; Guangxuan Xiao; Haofeng Huang; Junxian Guo; Song Han; |
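Reading “antidiagonal scoring” literally, one plausible sketch rates each attention block by the sum of its antidiagonal entries and keeps only the highest-scoring blocks; this is our assumption from the title, and the paper’s exact estimator and thresholding may differ:

```python
import numpy as np

def antidiagonal_scores(attn, block):
    """attn: (L, L) attention scores; return per-block antidiagonal sums."""
    n = attn.shape[0] // block
    scores = np.zeros((n, n))
    for bi in range(n):
        for bj in range(n):
            blk = attn[bi*block:(bi+1)*block, bj*block:(bj+1)*block]
            scores[bi, bj] = np.fliplr(blk).trace()  # antidiagonal sum
    return scores

attn = np.random.rand(64, 64)
scores = antidiagonal_scores(attn, block=16)
keep = scores >= np.quantile(scores, 0.5)  # retain the densest half of blocks
print(keep.sum(), "of", keep.size, "blocks kept")
```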
159 | Everything Everywhere All at Once: LLMs Can In-Context Learn Multiple Tasks in Superposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we explore a surprising phenomenon related to ICL: LLMs can perform multiple, computationally distinct ICL tasks simultaneously, during a single inference call, a capability we term task superposition. |
Zheyang Xiong; Ziyang Cai; John Cooper; Albert Ge; Vasilis Papageorgiou; Zack Sifakis; Angeliki Giannou; Ziqian Lin; Liu Yang; Saurabh Agarwal; Grigorios Chrysos; Samet Oymak; Kangwook Lee; Dimitris Papailiopoulos; |
160 | TimeFilter: Patch-Specific Spatial-Temporal Graph Filtration for Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, coarse-grained clustering struggles to capture complex, time-varying interactions effectively. To address these challenges, we propose TimeFilter, a GNN-based framework for adaptive and fine-grained dependency modeling. |
Yifan Hu; Guibin Zhang; Peiyuan Liu; Disen Lan; Naiqi Li; Dawei Cheng; Tao Dai; Shu-Tao Xia; Shirui Pan; |
161 | QuEST: Stable Training of LLMs with 1-Bit Weights and Activations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While post-training compression methods are very popular, the question of obtaining even more accurate compressed models by *directly training* over such representations, i.e., *Quantization-Aware Training (QAT)*, is still open: for example, a recent study put the optimal bit-width at which models can be trained using QAT, while staying accuracy-competitive with standard FP16/BF16 precision, at 8-bit weights and activations. We advance this state-of-the-art via a new method called QuEST, for which we demonstrate optimality at 4 bits and stable convergence as low as 1-bit weights and activations. |
Andrei Panferov; Jiale Chen; Soroush Tabesh; Mahdi Nikdan; Dan Alistarh; |
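For context, the standard QAT building block is a fake-quantize forward pass paired with a straight-through estimator in the backward pass; the sketch below shows only that generic pattern, not QuEST’s specific quantization scheme:

```python
import torch

class FakeQuant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, bits):
        # Symmetric per-tensor quantization to `bits` bits.
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max() / qmax
        return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None              # straight-through gradient

w = torch.randn(16, 16, requires_grad=True)
loss = FakeQuant.apply(w, 4).pow(2).sum()  # train through 4-bit weights
loss.backward()
print(w.grad.abs().mean().item())
```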
162 | Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning Via Autoregressive Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we pose a new research problem: *Can we internalize the searching capabilities to fundamentally enhance the reasoning abilities of a single LLM?* |
Maohao Shen; Guangtao Zeng; Zhenting Qi; Zhang-Wei Hong; Zhenfang Chen; Wei Lu; Gregory W. Wornell; Subhro Das; David Daniel Cox; Chuang Gan; |
163 | A Hitchhiker’s Guide to Scaling Law Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We collect (and release) a large-scale dataset containing losses and downstream evaluations for 485 previously published pretrained models. We use these to estimate more than 1000 scaling laws, then derive a set of best practices for estimating scaling laws in new model families. |
Leshem Choshen; Yang Zhang; Jacob Andreas; |
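A common recipe in this space is fitting a saturating power law $L(N) = c + aN^{-\alpha}$ to (model size, loss) pairs; the sketch below shows that generic procedure on made-up numbers, not the paper’s dataset or recommended protocol:

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(N, a, alpha, c):
    # Saturating power law: loss decays toward irreducible floor c.
    return c + a * N ** (-alpha)

N = np.array([1e7, 3e7, 1e8, 3e8, 1e9])    # parameter counts (made up)
L = np.array([4.1, 3.7, 3.3, 3.05, 2.85])  # observed losses (made up)
(a, alpha, c), _ = curve_fit(scaling_law, N, L,
                             p0=[300.0, 0.3, 2.0], maxfev=20000)
print(f"L(N) = {c:.2f} + {a:.1f} * N^(-{alpha:.3f})")
print("extrapolated loss at 1e10 params:", scaling_law(1e10, a, alpha, c))
```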
164 | Conformal Anomaly Detection in Event Sequences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose CADES (Conformal Anomaly Detection in Event Sequences), a novel test procedure based on conformal inference for the studied task with finite-sample FPR control. |
Shuai Zhang; Chuan Zhou; Yang Liu; Peng Zhang; Xixun Lin; Shirui Pan; |
165 | Thinking LLMs: General Instruction Following with Thought Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thinking is important for complex questions that require reasoning and planning — but can be applied to *any* task. We propose a training method for equipping existing LLMs with such thinking abilities for general instruction following without use of additional human data. |
Tianhao Wu; Janice Lan; Weizhe Yuan; Jiantao Jiao; Jason E Weston; Sainbayar Sukhbaatar; |
166 | RUN: Reversible Unfolding Network for Concealed Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods often employ reversible strategies to concentrate on uncertain regions but only focus on the mask level, overlooking the value of the RGB domain. To address this, we propose a Reversible Unfolding Network (RUN) in this paper. |
Chunming He; Rihan Zhang; Fengyang Xiao; Chengyu Fang; Longxiang Tang; Yulun Zhang; Linghe Kong; Deng-Ping Fan; Kai Li; Sina Farsiu; |
167 | LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, processing long videos remains a significant challenge constrained by the LLM’s context size. To address this limitation, we propose **LongVU**, a spatiotemporal adaptive compression mechanism that reduces the number of video tokens while preserving visual details of long videos. |
Xiaoqian Shen; Yunyang Xiong; Changsheng Zhao; Lemeng Wu; Jun Chen; Chenchen Zhu; Zechun Liu; Fanyi Xiao; Balakrishnan Varadarajan; Florian Bordes; Zhuang Liu; Hu Xu; Hyunwoo J. Kim; Bilge Soran; Raghuraman Krishnamoorthi; Mohamed Elhoseiny; Vikas Chandra; |
168 | On The Duality Between Gradient Transformations and Adapters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study memory-efficient optimization of neural networks (in particular language models) with *linear gradient transformations*, where the gradients are linearly mapped to a lower dimensional space than the full parameter space, thus saving memory required for gradient accumulation and optimizer state persistence. |
Lucas Torroba Hennigen; Hunter Lang; Han Guo; Yoon Kim; |
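A toy rendering of the idea, under the assumption of a fixed random projection: gradients and optimizer state live in a $k$-dimensional space and updates are mapped back to the full $d$-dimensional parameter space. This illustrates the general setup, not the paper’s algorithm or its adapter duality:

```python
import torch

d, k = 4096, 64
P = torch.randn(d, k) / d ** 0.5          # fixed random projection (assumed)
w = torch.randn(d, requires_grad=True)
state = torch.zeros(k)                    # optimizer state: k floats, not d

for _ in range(3):
    loss = (w ** 2).sum()                 # stand-in objective
    (g,) = torch.autograd.grad(loss, w)
    g_low = P.T @ g                       # accumulate gradient in k dims
    state = 0.9 * state + g_low           # momentum in the low-dim space
    with torch.no_grad():
        w -= 1e-3 * (P @ state)           # map the update back to d dims
print(w.norm().item())
```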
169 | The Surprising Effectiveness of Test-Time Training for Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate the effectiveness of test-time training (TTT), temporarily updating model parameters during inference using a loss derived from input data, as a mechanism for improving LMs’ reasoning and few-shot learning capabilities. |
Ekin Akyürek; Mehul Damani; Adam Zweiger; Linlu Qiu; Han Guo; Jyothish Pari; Yoon Kim; Jacob Andreas; |
170 | Causal-PIK: Causality-based Physical Reasoning with A Physics-Informed Kernel Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These tasks require agents to iteratively improve their actions after actively exploring causes and effects in the environment. For these types of tasks, we propose Causal-PIK, a method that leverages Bayesian optimization to reason about causal interactions via a Physics-Informed Kernel to help guide efficient search for the best next action. |
Carlota Parés Morlans; Michelle Yi; Claire Chen; Sarah A Wu; Rika Antonova; Tobias Gerstenberg; Jeannette Bohg; |
171 | CommVQ: Commutative Vector Quantization for KV Cache Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large Language Models (LLMs) are increasingly used in applications requiring long context lengths, but the key-value (KV) cache often becomes a memory bottleneck on GPUs as context grows. To address this, we propose Commutative Vector Quantization (CommVQ) to significantly reduce memory usage for long-context LLM inference. |
Junyan Li; Yang Zhang; Muhammad Yusuf Hassan; Talha Chafekar; Tianle Cai; Zhile Ren; Pengsheng Guo; Foroozan Karimzadeh; Colorado Reed; Chong Wang; Chuang Gan; |
172 | Vision-Language Models Create Cross-Modal Task Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that VLMs align conceptually equivalent inputs into a shared task vector, which is invariant to modality (text, image) and format (examples, instruction), and may simplify VLM processing. |
Grace Luo; Trevor Darrell; Amir Bar; |
173 | Info-Coevolution: An Efficient Framework for Data Model Coevolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Info-Coevolution, a novel framework that efficiently enables models and data to coevolve through online selective annotation with no bias. |
Ziheng Qin; Hailun Xu; Wei Chee Yew; Qi Jia; Yang Luo; Kanchan Sarkar; Danhui Guan; Kai Wang; Yang You; |
174 | Reinforce LLM Reasoning Through Multi-Agent Reflection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches often suffer from restricted feedback spaces and lack of coordinated training of different parties, leading to suboptimal performance. To address this, we model this multi-turn refinement process as a Markov Decision Process and introduce DPSDP (**D**irect **P**olicy **S**earch by **D**ynamic **P**rogramming), a reinforcement learning algorithm that trains an actor-critic LLM system to iteratively refine answers via direct preference learning on self-generated data. |
Yurun Yuan; Tengyang Xie; |
175 | Test-Time Graph Neural Dataset Search With Generative Projection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the test-time adaptation challenge in graph neural networks (GNNs), focusing on overcoming the limitations in flexibility and generalization inherent in existing data-centric approaches. To this end, we propose a novel research problem, test-time graph neural dataset search, which seeks to learn a parameterized test-time graph distribution to enhance the inference performance of unseen test graphs on well-trained GNNs. |
Xin Zheng; Wei Huang; Chuan Zhou; Ming Li; Shirui Pan; |
176 | Domain2Vec: Vectorizing Datasets to Find The Optimal Data Mixture Without Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce *Domain2Vec*, a novel approach that decomposes any dataset into a linear combination of several *meta-domains*, a new concept designed to capture the key underlying features of datasets. |
Mozhi Zhang; Howe Tissue; Lu Wang; Xipeng Qiu; |
177 | Improving Soft Unification with Knowledge Graph Embedding Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose several strategies to integrate the strengths of NTPs and KGEs, and demonstrate substantial improvements in both accuracy and computational efficiency. |
Xuanming Cui; Chionh Wei Peng; Adriel Kuek; Ser-Nam Lim; |
178 | On The Robustness of Reward Models for Language Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite its effectiveness, reward models (RMs) trained with the Bradley-Terry (BT) model loss as one-way classifiers are prone to over-optimization, losing generalizability to unseen inputs. In this paper, we study the cause of over-optimization and its downstream effects on the RLHF procedure, highlighting the importance of robustness in RMs. |
Jiwoo Hong; Noah Lee; Eunki Kim; Guijin Son; Woojin Chung; Aman Gupta; Shao Tang; James Thorne; |
179 | SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents SANA-1.5, a linear Diffusion Transformer for efficient scaling in text-to-image generation. |
Enze Xie; Junsong Chen; Yuyang Zhao; Jincheng YU; Ligeng Zhu; Yujun Lin; Zhekai Zhang; Muyang Li; Junyu Chen; Han Cai; Bingchen Liu; Daquan Zhou; Song Han; |
180 | Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel framework for inference-time reward optimization with diffusion models. |
Masatoshi Uehara; Xingyu Su; Yulai Zhao; Xiner Li; Aviv Regev; Shuiwang Ji; Sergey Levine; Tommaso Biancalani; |
181 | Agent-as-a-Judge: Evaluate Agents with Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These approaches either focus exclusively on final outcomes—ignoring the step-by-step nature of the thinking done by agentic systems—or require excessive manual labour. To address this, we introduce the **Agent-as-a-Judge** framework, wherein agentic systems are used to evaluate agentic systems. |
Mingchen Zhuge; Changsheng Zhao; Dylan R. Ashley; Wenyi Wang; Dmitrii Khizbullin; Yunyang Xiong; Zechun Liu; Ernie Chang; Raghuraman Krishnamoorthi; Yuandong Tian; Yangyang Shi; Vikas Chandra; Jürgen Schmidhuber; |
182 | OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose OTTER, a novel VLA architecture that leverages these existing alignments through explicit, text-aware visual feature extraction. |
Huang Huang; Fangchen Liu; Letian Fu; Tingfan Wu; Mustafa Mukadam; Jitendra Malik; Ken Goldberg; Pieter Abbeel; |
183 | Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we formally prove a lower bound on errors of per-feature SANNs, whereas group-based SANNs can achieve zero error and thus high performance. |
Weiqiu You; Helen Qu; Marco Gatti; Bhuvnesh Jain; Eric Wong; |
184 | EvoPress: Accurate Dynamic Model Compression Via Evolutionary Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We begin from the motivating observation that this independence assumption does not generally hold for LLM compression: pruning a model further may even significantly recover performance. To address this, we propose EvoPress, a novel evolutionary framework for dynamic LLM compression. |
Oliver Sieberling; Denis Kuznedelev; Eldar Kurtic; Dan Alistarh; |
185 | SafeArena: Evaluating The Safety of Autonomous Web Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To systematically assess their susceptibility to harmful tasks, we introduce the Agent Risk Assessment framework that categorizes agent behavior across four risk levels. |
Ada Defne Tur; Nicholas Meade; Xing Han Lù; Alejandra Zambrano; Arkil Patel; Esin DURMUS; Spandana Gella; Karolina Stanczak; Siva Reddy; |
186 | Low-Rank Adapting Models for Sparse Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent works have improved SAEs using language model gradients, but these techniques require many expensive backward passes during training and still cause a significant increase in cross entropy loss when SAE reconstructions are inserted into the model. In this work, we improve on these limitations by taking a fundamentally different approach: we use low-rank adaptation (LoRA) to finetune the *language model itself* around a previously trained SAE. |
Matthew Chen; Joshua Engels; Max Tegmark; |
187 | QuanONet: Quantum Neural Operator with Application to Differential Equation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Learning-based methods, including neural operators, have emerged as a promising paradigm. We explore their quantum counterpart and propose QuanONet, a quantum neural operator, a direction that has not been well studied in the literature compared with its counterparts in other machine learning areas. |
Ruocheng Wang; Zhuo Xia; Ge Yan; Junchi Yan; |
188 | Mitigating Object Hallucination in Large Vision-Language Models Via Image-Grounded Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these approaches require either costly training or fine-tuning, or API access to proprietary LLMs for post-generation correction. In response to these limitations, we propose Mitigating hallucinAtion via image-gRounded guIdaNcE (MARINE), a framework that is both training-free and API-free. |
Linxi Zhao; Yihe Deng; Weitong Zhang; Quanquan Gu; |
189 | Does Generation Require Memorization? Creative Diffusion Models Using Ambient Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We first provide theoretical evidence that memorization in diffusion models is only necessary for denoising problems at low noise scales (usually used in generating high-frequency details). Using this theoretical insight, we propose a simple, principled method to train the diffusion models using noisy data at large noise scales. |
Kulin Shah; Alkis Kalavasis; Adam Klivans; Giannis Daras; |
190 | Idiosyncrasies in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we unveil and study idiosyncrasies in Large Language Models (LLMs) — unique patterns in their outputs that can be used to distinguish the models. |
Mingjie Sun; Yida Yin; Zhiqiu Xu; J Zico Kolter; Zhuang Liu; |
191 | BiMark: Unbiased Multilayer Watermarking for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve these goals, the key challenge lies in balancing the trade-off between text quality preservation and message embedding capacity. To address this challenge, we propose BiMark, a novel watermarking framework that achieves these requirements through three key innovations: (1) a bit-flip unbiased reweighting mechanism enabling model-agnostic detection, (2) a multilayer architecture enhancing detectability without compromising generation quality, and (3) an information encoding approach supporting multi-bit watermarking. |
Xiaoyan Feng; He Zhang; Yanjun Zhang; Leo Yu Zhang; Shirui Pan; |
192 | Understanding High-Dimensional Bayesian Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that maximum likelihood estimation (MLE) of GP length scales suffices for state-of-the-art performance. Based on this, we propose a simple variant of MLE called MSR that leverages these findings to achieve state-of-the-art performance on a comprehensive set of real-world applications. |
Leonard Papenmeier; Matthias Poloczek; Luigi Nardi; |
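The MLE baseline the paper highlights can be reproduced with off-the-shelf tools: fit per-dimension GP length scales by maximizing the log marginal likelihood, which scikit-learn’s GP regressor does by default (the proposed MSR variant adds more on top and is not shown):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 10))                 # 10-dim toy problem
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=50)

kernel = RBF(length_scale=np.ones(10))         # one length scale per dim
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-2, normalize_y=True)
gp.fit(X, y)                                   # fit() maximizes the
print(gp.kernel_.length_scale[:3])             # log marginal likelihood
```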
193 | Loss Functions and Operators Generated By F-Divergences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to construct new convex loss functions based on $f$-divergences. |
Vincent Roulet; Tianlin Liu; Nino Vieillard; Michael Eli Sander; Mathieu Blondel; |
194 | What If We Recaption Billions of Web Images with LLaMA-3? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, large-scale investigations in this area remain predominantly closed-source. Our paper aims to bridge this community effort, leveraging the powerful and $\textit{open-sourced}$ LLaMA-3, a GPT-4 level LLM. |
Xianhang Li; Haoqin Tu; Mude Hui; Zeyu Wang; Bingchen Zhao; Junfei Xiao; Sucheng Ren; Jieru Mei; Qing Liu; Huangjie Zheng; Yuyin Zhou; Cihang Xie; |
195 | DOLPHIN: A Programmable Framework for Scalable Neurosymbolic Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Neurosymbolic learning enables the integration of symbolic reasoning with deep learning but faces significant challenges in scaling to complex symbolic programs, large datasets, or both. We introduce DOLPHIN, a framework that tackles these challenges by supporting neurosymbolic programs in Python, executing complex symbolic reasoning on the CPU while vectorizing probabilistic computations and gradient propagation on the GPU. |
Aaditya Naik; Jason Liu; Claire Wang; Amish Sethi; Saikat Dutta; Mayur Naik; Eric Wong; |
196 | Lexico: Extreme KV Cache Compression Via Sparse Coding Over Universal Dictionaries Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Lexico, a novel KV cache compression method that leverages sparse coding with a universal dictionary. |
Junhyuck Kim; Jongho Park; Jaewoong Cho; Dimitris Papailiopoulos; |
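One simple way to sparse-code a cached key or value vector over a fixed dictionary is matching pursuit; the sketch below illustrates that general idea only, while Lexico’s dictionary construction and solver are specified in the paper:

```python
import torch
import torch.nn.functional as F

def matching_pursuit(x, D, k):
    """Approximate x with a k-sparse combination of the columns of D."""
    residual, coeffs = x.clone(), torch.zeros(D.shape[1])
    for _ in range(k):
        scores = D.T @ residual            # correlation with each atom
        j = scores.abs().argmax()
        coeffs[j] += scores[j]
        residual -= scores[j] * D[:, j]
    return coeffs, residual

d, m = 128, 1024
D = F.normalize(torch.randn(d, m), dim=0)      # unit-norm dictionary atoms
key = torch.randn(d)
coeffs, res = matching_pursuit(key, D, k=8)
print("nonzeros:", (coeffs != 0).sum().item(), "| residual:", res.norm().item())
```

Storing eight index-value pairs instead of 128 floats is where the compression would come from.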
197 | Beyond Atoms: Enhancing Molecular Pretrained Representations with 3D Space Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We first present a simple yet insightful observation: naively adding randomly sampled virtual points beyond atoms can surprisingly enhance MPR performance. In light of this, we propose a principled framework that incorporates the entire 3D space spanned by molecules. |
Shuqi Lu; Xiaohong Ji; Bohang Zhang; Lin Yao; Siyuan Liu; Zhifeng Gao; Linfeng Zhang; Guolin Ke; |
198 | PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we introduce PPDiff, a diffusion model to jointly design the sequence and structure of binders for arbitrary protein targets in a non-autoregressive manner. |
Zhenqiao Song; Tianxiao Li; Lei Li; Martin Renqiang Min; |
199 | EditLord: Learning Code Transformation Rules for Code Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce EditLord, a code editing framework that makes the code transformation steps explicit. |
Weichen Li; Albert Jan; Baishakhi Ray; Junfeng Yang; Chengzhi Mao; Kexin Pei; |
200 | LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present LightningDrag, which achieves high-quality drag-based editing in about one second on general images. |
Yujun Shi; Jun Hao Liew; Hanshu Yan; Vincent Y. F. Tan; Jiashi Feng; |
201 | Highly Compressed Tokenizer Can Generate Without Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the expressivity of the 1D tokenizer’s latent space, we construct an image generation pipeline leveraging gradient-based test-time optimization of tokens with plug-and-play loss functions such as reconstruction or CLIP similarity. |
Lukas Lao Beyer; Tianhong Li; Xinlei Chen; Sertac Karaman; Kaiming He; |
202 | Is Noise Conditioning Necessary for Denoising Generative Models? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a mathematical analysis of the error introduced by removing noise conditioning and demonstrate that our analysis aligns with empirical observations. |
Qiao Sun; Zhicheng Jiang; Hanhong Zhao; Kaiming He; |
203 | MetaAgent: Automatically Constructing Multi-Agent Systems Based on Finite State Machines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose **MetaAgent**, a **finite state machine**-based framework that can automatically generate a multi-agent system. |
Yaolun Zhang; Xiaogeng Liu; Chaowei Xiao; |
204 | Do Bayesian Neural Networks Actually Behave Like Bayesian Models? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We empirically investigate how well popular approximate inference algorithms for Bayesian Neural Networks (BNNs) respect the theoretical properties of Bayesian belief updating. |
Gábor Pituk; Vik Shirvaikar; Tom Rainforth; |
205 | Rethinking Aleatoric and Epistemic Uncertainty Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify incoherence in existing discussions of these ideas and suggest this stems from the aleatoric-epistemic view being insufficiently expressive to capture all the distinct quantities that researchers are interested in. To address this we present a decision-theoretic perspective that relates rigorous notions of uncertainty, predictive performance and statistical dispersion in data. |
Freddie Bickford Smith; Jannik Kossen; Eleanor Trollope; Mark van der Wilk; Adam Foster; Tom Rainforth; |
206 | Step-DAD: Semi-Amortized Policy-Based Bayesian Experimental Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a semi-amortized, policy-based, approach to Bayesian experimental design (BED) called Stepwise Deep Adaptive Design (Step-DAD). |
Marcel Hedman; Desi R. Ivanova; Cong Guan; Tom Rainforth; |
207 | Agent Workflow Memory Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, humans can flexibly solve complex tasks by learning reusable task workflows from past experiences and using them to guide future actions. To build agents that can similarly benefit from this process, we introduce Agent Workflow Memory (AWM), a method for inducing commonly reused routines, i.e., workflows, and selectively providing workflows to the agent to guide subsequent generations. |
Zora Zhiruo Wang; Jiayuan Mao; Daniel Fried; Graham Neubig; |
208 | ReFocus: Visual Editing As A Chain of Thought for Structured Image Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce ReFocus, a simple yet effective framework that equips multimodal LLMs with the ability to generate “visual thoughts” by performing visual editing on the input image through code, shifting and refining their visual focuses. |
Xingyu Fu; Minqian Liu; Zhengyuan Yang; John Richard Corring; Yijuan Lu; Jianwei Yang; Dan Roth; Dinei Florencio; Cha Zhang; |
209 | Optimizing Test-Time Compute Via Meta Reinforcement Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While current methods mostly do so via fine-tuning on search traces or running RL against the 0/1 outcome reward, do these approaches efficiently utilize test-time compute? Would these approaches continue to scale as the budget improves? In this paper, we try to answer these questions. |
Yuxiao Qu; Matthew Y. R. Yang; Amrith Setlur; Lewis Tunstall; Edward Emanuel Beeching; Ruslan Salakhutdinov; Aviral Kumar; |
210 | AAAR-1.0: Assessing AI’s Potential to Assist Research Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we introduce AAAR-1.0, a benchmark dataset designed to evaluate LLM performance in three fundamental, expertise-intensive research tasks: (i) EquationInference, assessing the correctness of equations based on the contextual information in paper submissions; (ii) ExperimentDesign, designing experiments to validate research ideas and solutions; and (iii) PaperWeakness, identifying weaknesses in paper submissions. |
Renze Lou; Hanzi Xu; Sijia Wang; Jiangshu Du; Ryo Kamoi; Xiaoxin Lu; Jian Xie; Yuxuan Sun; Yusen Zhang; Jihyun Janice Ahn; Hongchao Fang; Zhuoyang Zou; Wenchao Ma; Xi Li; Kai Zhang; Congying Xia; Lifu Huang; Wenpeng Yin; |
211 | Differentiable Structure Learning with Ancestral Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we identify two key issues: the non-equivalence of relaxed characterizations for representing path existence and order violations among paths during optimization. In response, we propose a binary-masked characterization method and an order-guided optimization strategy, tailored to address these challenges. |
Taiyu Ban; Changxin Rong; Xiangyu Wang; Lyuzhou Chen; Xin Wang; Derui Lyu; Qinrui Zhu; Huanhuan Chen; |
212 | MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce **MME-CoT**, a specialized benchmark evaluating the CoT reasoning performance of LMMs, spanning six domains: math, science, OCR, logic, space-time, and general scenes. |
Dongzhi Jiang; Renrui Zhang; Ziyu Guo; Yanwei Li; Yu Qi; Xinyan Chen; Liuhui Wang; Jianhan Jin; Claire Guo; Shen Yan; Bo Zhang; Chaoyou Fu; Peng Gao; Hongsheng Li; |
213 | Learn from Downstream and Be Yourself in Multimodal Large Language Models Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To balance the trade-off between generalization and specialization, we propose measuring the parameter importance for both pre-trained and fine-tuning distributions, based on frozen pre-trained weight magnitude and accumulated fine-tuning gradient values. |
Wenke Huang; Jian Liang; Zekun Shi; Didi Zhu; Guancheng Wan; He Li; Bo Du; Dacheng Tao; Mang Ye; |
214 | Discovering Physics Laws of Dynamical Systems Via Invariant Function Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For example, we demonstrate the discovery of the ideal pendulum’s natural motion $\alpha^2 \sin(\theta_t)$ by observing pendulum dynamics in different environments, such as the damped environment $\alpha^2 \sin(\theta_t) - \rho \omega_t$ and the powered environment $\alpha^2 \sin(\theta_t) + \rho \frac{\omega_t}{\left|\omega_t\right|}$. Here, we formulate this problem as an *invariant function learning* task and propose a new method, known as **D**isentanglement of **I**nvariant **F**unctions (DIF), that is grounded in causal analysis. |
Shurui Gui; Xiner Li; Shuiwang Ji; |
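The three environments above share the invariant term $\alpha^2 \sin(\theta_t)$ and differ only in an environment-specific force. A minimal simulation sketch under that reading, useful for generating the kind of multi-environment trajectories the method learns from (the Euler integrator, step size, and parameter values are illustrative, not from the paper):

```python
import math

# Shared invariant dynamics alpha^2 * sin(theta_t), plus an environment-
# specific term: zero (ideal), -rho*omega (damped), +rho*omega/|omega| (powered).
def step(theta, omega, env, alpha=1.0, rho=0.1, dt=0.01):
    invariant = alpha ** 2 * math.sin(theta)
    extra = {
        "ideal": 0.0,
        "damped": -rho * omega,
        "powered": rho * omega / abs(omega) if omega else 0.0,
    }[env]
    return theta + dt * omega, omega + dt * (invariant + extra)

theta, omega = 0.5, 0.0
for _ in range(1000):
    theta, omega = step(theta, omega, env="damped")
print(round(theta, 3), round(omega, 3))
```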
215 | Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present joint scaling laws for dense and MoE models, incorporating key factors such as the number of active parameters, dataset size, and the number of experts. |
Jan Ludziejewski; Maciej Pióro; Jakub Krajewski; Maciej Stefaniak; Michał Krutul; Jan Małaśnicki; Marek Cygan; Piotr Sankowski; Kamil Adamczewski; Piotr Miłoś; Sebastian Jaszczur; |
216 | How Far Is Video Generation from World Model: A Physical Law Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we evaluate across three key scenarios: in-distribution, out-of-distribution, and combinatorial generalization. |
Bingyi Kang; Yang Yue; Rui Lu; Zhijie Lin; Yang Zhao; Kaixin Wang; Gao Huang; Jiashi Feng; |
217 | KernelBench: Can LLMs Write Efficient GPU Kernels? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new evaluation metric $\text{fast}_p$, which measures the percentage of generated kernels that are functionally correct and offer a speedup greater than an adjustable threshold $p$ over baseline. |
Anne Ouyang; Simon Guo; Simran Arora; Alex L Zhang; William Hu; Christopher Re; Azalia Mirhoseini; |
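The $\text{fast}_p$ metric above reduces to a filtered fraction over per-kernel results. A minimal sketch of the computation (the result fields and helper name are illustrative, not KernelBench's actual API):

```python
# fast_p: fraction of generated kernels that are functionally correct AND
# at least p-times faster than the reference implementation.
def fast_p(results, p=1.0):
    """results: list of dicts with 'correct' (bool) and
    'speedup' (float, baseline_time / kernel_time)."""
    if not results:
        return 0.0
    hits = sum(1 for r in results if r["correct"] and r["speedup"] > p)
    return hits / len(results)

results = [
    {"correct": True,  "speedup": 1.8},  # correct and 1.8x faster
    {"correct": True,  "speedup": 0.7},  # correct but slower than baseline
    {"correct": False, "speedup": 2.5},  # fast but wrong: never counts
]
print(fast_p(results, p=1.0))  # 0.333...; only the first kernel qualifies
```

At $p = 0$ the metric degenerates to plain functional correctness, while larger $p$ demands genuine speedups.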
218 | Unifying Specialized Visual Encoders for Video Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose MERV, a Multi-Encoder Video Representation, which utilizes multiple encoders for a comprehensive video representation. |
Jihoon Chung; Tyler Zhu; Max Gonzalez Saez-Diez; Juan Carlos Niebles; Honglu Zhou; Olga Russakovsky; |
219 | SHARP-Distill: A 68× Faster Recommender System with Hypergraph Neural Networks and Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes SHARP-Distill (\textbf{S}peedy \textbf{H}ypergraph \textbf{A}nd \textbf{R}eview-based \textbf{P}ersonalised \textbf{Distill}ation), a novel knowledge distillation approach based on the teacher-student framework that combines Hypergraph Neural Networks (HGNNs) with language models to enhance recommendation quality while significantly improving inference time. |
Saman Forouzandeh; Parham Moradi; Mahdi Jalili; |
220 | Balancing The Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel theoretical framework for analyzing generalization in imbalanced classification. |
Corinna Cortes; Anqi Mao; Mehryar Mohri; Yutao Zhong; |
221 | David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Diff-Instruct* (DI*), a data-efficient post-training approach for one-step text-to-image generative models that improves their alignment with human preferences without requiring image data. |
Weijian Luo; colin zhang; Debing Zhang; Zhengyang Geng; |
222 | LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce LLM-SRBench, a comprehensive benchmark with 239 challenging problems across four scientific domains specifically designed to evaluate LLM-based scientific equation discovery methods while preventing trivial memorization. |
Parshin Shojaee; Ngoc-Hieu Nguyen; Kazem Meidani; Amir Barati Farimani; Khoa D Doan; Chandan K. Reddy; |
223 | Instruction-Following Pruning for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we move beyond the traditional static pruning approach of determining a fixed pruning mask for a model, and propose a dynamic approach to structured pruning. |
Bairu Hou; Qibin Chen; Jianyu Wang; Guoli Yin; Chong Wang; Nan Du; Ruoming Pang; Shiyu Chang; Tao Lei; |
224 | Overtrained Language Models Are Harder to Fine-Tune Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large language models are pre-trained on ever-growing token budgets under the assumption that better pre-training performance translates to improved downstream models. In this work, we challenge this assumption and show that extended pre-training can make models harder to fine-tune, leading to degraded final performance. |
Jacob Mitchell Springer; Sachin Goyal; Kaiyue Wen; Tanishq Kumar; Xiang Yue; Sadhika Malladi; Graham Neubig; Aditi Raghunathan; |
225 | FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, VAR encounters two primary challenges: (1) its complex and rigid scale design limits generalization in next scale prediction, and (2) the generator’s dependence on a discrete tokenizer with the same complex scale structure restricts modularity and flexibility in updating the tokenizer. To address these limitations, we introduce FlowAR, a general next scale prediction method featuring a streamlined scale design, where each subsequent scale is simply double the previous one. |
Sucheng Ren; Qihang Yu; Ju He; Xiaohui Shen; Alan Yuille; Liang-Chieh Chen; |
226 | BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, real-world applications demand streaming inference. To address these challenges, we propose a flow matching based streaming binaural speech synthesis framework called BinauralFlow. |
Susan Liang; Dejan Markovic; Israel D. Gebru; Steven Krenn; Todd Keebler; Jacob Sandakly; Frank Yu; Samuel Hassel; Chenliang Xu; Alexander Richard; |
227 | Outsourced Diffusion Sampling: Efficient Posterior Inference in Latent Spaces of Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In such a model (*e.g.*, a VAE, GAN, or continuous-time flow-based model), sampling of the target variable $\mathbf{x} \sim p_\theta(\mathbf{x})$ is straightforward, but sampling from a posterior distribution of the form $p(\mathbf{x}\mid\mathbf{y}) \propto p_\theta(\mathbf{x})r(\mathbf{x},\mathbf{y})$, where $r$ is a constraint function depending on an auxiliary variable $\mathbf{y}$, is generally intractable. We propose to amortize the cost of sampling from such posterior distributions with diffusion models that sample a distribution in the noise space ($\mathbf{z}$). |
Siddarth Venkatraman; Mohsin Hasan; Minsu Kim; Luca Scimeca; Marcin Sendera; Yoshua Bengio; Glen Berseth; Nikolay Malkin; |
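To see why moving the problem to noise space helps: with a deterministic decoder $\mathbf{x} = g_\theta(\mathbf{z})$, the intractable posterior over $\mathbf{x}$ becomes a reweighted Gaussian prior over $\mathbf{z}$, i.e. $p(\mathbf{z}\mid\mathbf{y}) \propto \mathcal{N}(\mathbf{z}; 0, I)\, r(g_\theta(\mathbf{z}), \mathbf{y})$. A toy self-normalized importance-sampling illustration of that target (the paper amortizes this with a learned diffusion sampler; `g` and `r` below are stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)

g = lambda z: np.tanh(3.0 * z)                 # toy decoder x = g(z)
r = lambda x, y: np.exp(-50.0 * (x - y) ** 2)  # toy constraint r(x, y)

# Noise-space target: p(z | y) proportional to N(z; 0, 1) * r(g(z), y).
# Use the prior as proposal and self-normalize the weights.
y = 0.8
z = rng.standard_normal(100_000)
w = r(g(z), y)
w /= w.sum()
posterior_mean_x = float(np.sum(w * g(z)))
print(round(posterior_mean_x, 3))  # close to y: r pulls x = g(z) toward y
```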
228 | On Path to Multimodal Generalist: General-Level and General-Bench Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this project, we introduce an evaluation framework to delineate the capabilities and behaviors of current multimodal generalists. To evaluate the comprehensive abilities of various generalists, we present a massive multimodal benchmark, **General-Bench**, which encompasses a broader spectrum of skills, modalities, formats, and capabilities, including over 700 tasks and 325,800 instances. |
Hao Fei; Yuan Zhou; Juncheng Li; Xiangtai Li; Qingshan Xu; Bobo Li; Shengqiong Wu; Yaoting Wang; Junbao Zhou; Jiahao Meng; Qingyu Shi; Zhiyuan Zhou; Liangtao Shi; Minghe Gao; Daoan Zhang; Zhiqi Ge; Siliang Tang; Kaihang Pan; Yaobo Ye; Haobo Yuan; Tao Zhang; Weiming Wu; Tianjie Ju; Zixiang Meng; Shilin Xu; Liyu Jia; Wentao Hu; Meng Luo; Jiebo Luo; Tat-Seng Chua; Shuicheng YAN; Hanwang Zhang; |
229 | LLMs Can Reason Faster Only If We Let Them Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Aimed at bridging the solution length gap between CoT and AoT, this paper introduces AoT-O3, which combines supervised finetuning on AoT-style plans with a reinforcement learning (RL) framework designed to reduce solution length. |
Bilgehan Sel; Lifu Huang; Naren Ramakrishnan; Ruoxi Jia; Ming Jin; |
230 | Sparse Video-Gen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a training-free framework termed Sparse VideoGen (SVG) that leverages the inherent sparsity in 3D full attention to boost inference efficiency. |
Haocheng Xi; Shuo Yang; Yilong Zhao; Chenfeng Xu; Muyang Li; Xiuyu Li; Yujun Lin; Han Cai; Jintao Zhang; Dacheng Li; Jianfei Chen; Ion Stoica; Kurt Keutzer; Song Han; |
231 | CPCF: A Cross-Prompt Contrastive Framework for Referring Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these models often suffer from suboptimal performance due to incorrect responses tailored to misleading areas adjacent to or similar to the target region. This work introduces CPCF, a novel framework to address this issue and achieve superior results. |
Lanyun Zhu; Deyi Ji; Tianrun Chen; Haiyang Wu; De Wen Soh; Jun Liu; |
232 | Faster and Stronger: When ANN-SNN Conversion Meets Parallel Spiking Calculation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel parallel conversion learning framework, which establishes a mathematical mapping relationship between each time-step of the parallel spiking neurons and the cumulative spike firing rate. |
Zecheng Hao; Qichao Ma; Kang Chen; Yi Zhang; Zhaofei Yu; Tiejun Huang; |
233 | Synthesizing Software Engineering Data in A Test-Driven Manner Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce **SWE-Flow**, a novel data synthesis framework grounded in Test-Driven Development (TDD). To facilitate further research, we release all code, datasets, models, and Docker images at [Github](https://github.com/Hambaobao/SWE-Flow). |
Lei Zhang; Jiaxi Yang; Min Yang; Jian Yang; Mouxiang Chen; Jiajun Zhang; Zeyu Cui; Binyuan Hui; Junyang Lin; |
234 | Blink of An Eye: A Simple Theory for Feature Localization in Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This phenomenon is not unique to autoregressive models: in diffusion models, key features of the final output are decided in narrow “critical windows” of the generation process. In this work we develop a simple, unifying theory to explain this phenomenon. |
Marvin Li; Aayush Karan; Sitan Chen; |
235 | Automatically Identify and Rectify: Robust Deep Contrastive Multi-view Clustering in Noisy Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, noise is pervasive in real-world scenarios, leading to a significant degradation in performance. To tackle this problem, we propose a novel multi-view clustering framework for the automatic identification and rectification of noisy data, termed AIRMVC. |
Xihong Yang; Siwei Wang; Fangdi Wang; Jiaqi Jin; Suyuan Liu; Yue Liu; En Zhu; Xinwang Liu; Yueming Jin; |
236 | EAGLES: Towards Effective, Efficient, and Economical Federated Graph Learning Via Unified Sparsification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Federated Graph Learning (FGL) has gained significant attention as a privacy-preserving approach to collaborative learning, but the computational demands increase substantially as datasets grow and Graph Neural Network (GNN) layers deepen. To address these challenges, we propose $\textbf{EAGLES}$, a unified sparsification framework. |
Zitong Shi; Guancheng Wan; Wenke Huang; Guibin Zhang; He Li; Carl Yang; Mang Ye; |
237 | Safety-Polarized and Prioritized Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enable the application of our algorithm to large-scale experiments, we introduce two key techniques: \emph{safety polarization} and \emph{safety prioritized experience replay}. |
Ke Fan; Jinpeng Zhang; Xuefeng Zhang; Yunze Wu; Jingyu Cao; Yuan Zhou; Jianzhu Ma; |
238 | On Teacher Hacking in Language Model Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate whether a similar phenomenon, that we call teacher hacking, can occur during knowledge distillation. |
Daniil Tiapkin; Daniele Calandriello; Johan Ferret; Sarah Perrin; Nino Vieillard; Alexandre Rame; Mathieu Blondel; |
239 | Joint Learning of Energy-based Models and Their Partition Function Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel min-min formulation for approximately learning probabilistic EBMs in combinatorially-large discrete spaces, such as sets or permutations. |
Michael Eli Sander; Vincent Roulet; Tianlin Liu; Mathieu Blondel; |
240 | Behavioral Exploration: Learning to Explore Via In-Context Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Taking inspiration from recent progress on both in-context learning and large-scale behavioral cloning, in this work we propose behavioral exploration: training agents to internalize what it means to explore and adapt in-context over the space of “expert” behaviors. |
Andrew Wagenmaker; Zhiyuan Zhou; Sergey Levine; |
241 | Mastering Massive Multi-Task Reinforcement Learning Via Mixture-of-Expert Decision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we first revisit the key impact of task numbers on current MTRL methods, and further reveal that naively expanding the parameters proves insufficient to counteract the performance degradation as the number of tasks escalates. Building upon these insights, we propose M3DT, a novel mixture-of-experts (MoE) framework that tackles task scalability by further unlocking the model’s parameter scalability. |
Yilun Kong; Guozheng Ma; Qi Zhao; Haoyu Wang; Li Shen; Xueqian Wang; Dacheng Tao; |
242 | LangTime: A Language-Guided Unified Model for Time Series Forecasting with Proximal Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve autoregressive forecasting, we introduce TimePPO, a reinforcement learning-based fine-tuning algorithm. |
Wenzhe Niu; Zongxia Xie; Yanru Sun; Wei He; Man Xu; Chao Hao; |
243 | Dendritic Localized Learning: Toward Biologically Plausible Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although various alternative learning approaches have been proposed to address these issues, most either fail to satisfy all three criteria simultaneously or yield suboptimal results. Inspired by the dynamics and plasticity of pyramidal neurons, we propose Dendritic Localized Learning (DLL), a novel learning algorithm designed to overcome these challenges. |
Changze Lv; Jingwen Xu; Yiyang Lu; Xiaohua Wang; Zhenghua Wang; Zhibo Xu; Di Yu; Xin Du; Xiaoqing Zheng; Xuanjing Huang; |
244 | TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, the capabilities of LoRA-tuned diffusion models are limited, since the same LoRA is used for different timesteps of the diffusion process. To tackle this problem, we introduce a general and concise TimeStep Master (TSM) paradigm with two key fine-tuning stages. |
Shaobin Zhuang; Yiwei Guo; Yanbo Ding; Kunchang Li; Xinyuan Chen; Yaohui Wang; Fangyikang Wang; Ying Zhang; Chen Li; Yali Wang; |
245 | Bifurcate Then Alienate: Incomplete Multi-view Clustering Via Coupled Distribution Learning with Linear Overhead Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite remarkable advances, existing incomplete multi-view clustering (IMC) methods typically leverage either perspective-shared or perspective-specific determinants to encode cluster representations. To address this limitation, we introduce a BACDL algorithm designed to explicitly capture both concurrently, thereby exploiting heterogeneous data more effectively. |
Shengju Yu; Yiu-ming Cheung; Siwei Wang; Xinwang Liu; En Zhu; |
246 | MCU: An Evaluation Framework for Open-Ended Game Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, evaluating such open-ended agents remains difficult, with current benchmarks facing scalability limitations. To address this, we introduce \textit{Minecraft Universe} (MCU), a comprehensive evaluation framework set within the open-world video game Minecraft. |
Xinyue Zheng; Haowei Lin; Kaichen He; Zihao Wang; QIANG FU; Haobo Fu; Zilong Zheng; Yitao Liang; |
247 | Monte Carlo Tree Diffusion for System 2 Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of MCTS. |
Jaesik Yoon; Hyeonseo Cho; Doojin Baek; Yoshua Bengio; Sungjin Ahn; |
248 | Self-Consistency Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we extend the self-consistency concept to help train models. |
Archiki Prasad; Weizhe Yuan; Richard Yuanzhe Pang; Jing Xu; Maryam Fazel-Zarandi; Mohit Bansal; Sainbayar Sukhbaatar; Jason E Weston; Jane Yu; |
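The highlight leaves the recipe implicit; one natural way to turn self-consistency into a training signal is to mine preference pairs by answer-frequency voting over sampled solutions, then feed them to a preference-optimization loss. A hypothetical sketch of that pair-mining step (an illustrative reading, not necessarily the paper's exact procedure):

```python
from collections import Counter

def mine_preference_pair(samples):
    """samples: list of (solution_text, final_answer) drawn from the model
    for one prompt. Returns (chosen, rejected) by answer-frequency voting,
    or None when the samples all agree."""
    counts = Counter(answer for _, answer in samples)
    if len(counts) < 2:
        return None  # full agreement: no preference signal here
    majority = counts.most_common()[0][0]
    minority = counts.most_common()[-1][0]
    chosen = next(t for t, a in samples if a == majority)
    rejected = next(t for t, a in samples if a == minority)
    return chosen, rejected

samples = [("sol1", "42"), ("sol2", "42"), ("sol3", "41"), ("sol4", "42")]
print(mine_preference_pair(samples))  # ('sol1', 'sol3')
```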
249 | Temporal Difference Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Temporal Difference Flows (TD-Flow), which leverages the structure of a novel Bellman equation on probability paths alongside flow-matching techniques to learn accurate GHMs at over 5x the horizon length of prior methods. |
Jesse Farebrother; Matteo Pirotta; Andrea Tirinzoni; Remi Munos; Alessandro Lazaric; Ahmed Touati; |
250 | Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Audio Flamingo 2 (AF2), an Audio-Language Model (ALM) with advanced audio understanding and reasoning capabilities. |
Sreyan Ghosh; Zhifeng Kong; Sonal Kumar; S Sakshi; Jaehyeon Kim; Wei Ping; Rafael Valle; Dinesh Manocha; Bryan Catanzaro; |
251 | UniMoMo: Unified Generative Modeling of 3D Molecules for De Novo Binder Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce **Uni**fied generative **Mo**deling of 3D **Mo**lecules (UniMoMo), the first framework capable of designing binders of multiple molecular domains using a single model. |
Xiangzhe Kong; Zishen Zhang; Ziting Zhang; Rui Jiao; Jianzhu Ma; Wenbing Huang; Kai Liu; Yang Liu; |
252 | Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To rigorously assess the quality of dictionaries learned by SAEs, we introduce two new benchmarks that test (i) plausibility, whether dictionaries recover “true” classification directions, and (ii) identifiability, whether dictionaries disentangle synthetic concept mixtures. |
Thomas Fel; Ekdeep Singh Lubana; Jacob S. Prince; Matthew Kowal; Victor Boutin; Isabel Papadimitriou; Binxu Wang; Martin Wattenberg; Demba E. Ba; Talia Konkle; |
253 | DyPolySeg: Taylor Series-Inspired Dynamic Polynomial Fitting Network for Few-shot Point Cloud Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, existing methods using DGCNN as the backbone have limited geometric structure modeling capabilities and struggle to bridge the categorical information gap between query and support sets. To address these challenges, we propose DyPolySeg, a pre-training-free Dynamic Polynomial fitting network for few-shot point cloud semantic segmentation. |
Changshuo Wang; Xiang Fang; Prayag Tiwari; |
254 | MIB: A Mechanistic Interpretability Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In pursuit of lasting evaluation standards, we propose MIB, a Mechanistic Interpretability Benchmark, with two tracks spanning four tasks and five models. |
Aaron Mueller; Atticus Geiger; Sarah Wiegreffe; Dana Arad; Iván Arcuschin; Adam Belfki; Yik Siu Chan; Jaden Fried Fiotto-Kaufman; Tal Haklay; Michael Hanna; Jing Huang; Rohan Gupta; Yaniv Nikankin; Hadas Orgad; Nikhil Prakash; Anja Reusch; Aruna Sankaranarayanan; Shun Shao; Alessandro Stolfo; Martin Tutek; Amir Zur; David Bau; Yonatan Belinkov; |
255 | PRIME: Deep Imbalanced Regression with Proxies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing work has focused on classification, leaving imbalanced regression underexplored despite its importance in many applications. To address this gap, we propose PRIME, a framework that leverages learnable proxies to construct a balanced and well-ordered feature space for imbalanced regression. |
Jongin Lim; Sucheol Lee; Daeho Um; Sung-Un Park; Jinwoo Shin; |
256 | WMAdapter: Adding WaterMark Control to Latent Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose WMAdapter, a diffusion model watermark plugin that embeds user-specified watermark information seamlessly during the diffusion generation process. |
Hai Ci; Yiren Song; Pei Yang; Jinheng Xie; Mike Zheng Shou; |
257 | Certified Unlearning for Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel method for certified machine unlearning, leveraging the connection between unlearning and privacy amplification by stochastic post-processing. |
Anastasia Koloskova; Youssef Allouah; Animesh Jha; Rachid Guerraoui; Sanmi Koyejo; |
258 | Adversaries Can Misuse Combinations of Safe Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Developers try to evaluate whether an AI system can accomplish malicious tasks before releasing it; for example, they might test whether a model enables cyberoffense, user manipulation, or bioterrorism. In this work, we show that individually testing models for such misuse is inadequate; adversaries can misuse combinations of models even when each individual model is safe. |
Erik Jones; Anca Dragan; Jacob Steinhardt; |
259 | Understanding Synthetic Context Extension Via Retrieval Heads Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate fine-tuning on synthetic data for three long-context tasks that require retrieval and reasoning. |
Xinyu Zhao; Fangcong Yin; Greg Durrett; |
260 | Adjoint Sampling: Highly Scalable Diffusion Samplers Via Adjoint Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Adjoint Sampling, a highly scalable and efficient algorithm for learning diffusion processes that sample from unnormalized densities, or energy functions. |
Aaron J Havens; Benjamin Kurt Miller; Bing Yan; Carles Domingo-Enrich; Anuroop Sriram; Daniel S. Levine; Brandon M Wood; Bin Hu; Brandon Amos; Brian Karrer; Xiang Fu; Guan-Horng Liu; Ricky T. Q. Chen; |
261 | Spatial Reasoning with Denoising Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Spatial Reasoning Models (SRMs), a framework to perform reasoning over sets of continuous variables via denoising generative models. To measure this, we introduce a set of benchmark tasks that test the quality of complex reasoning in generative models and can quantify hallucination. |
Christopher Wewer; Bartlomiej Pogodzinski; Bernt Schiele; Jan Eric Lenssen; |
262 | Contrastive Localized Language-Image Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To support large-scale pre-training, we design a visually-enriched and spatially-localized captioning framework to effectively generate region-text labels. |
Hong-You Chen; Zhengfeng Lai; Haotian Zhang; Xinze Wang; Marcin Eichner; Keen You; Meng Cao; Bowen Zhang; Yinfei Yang; Zhe Gan; |
263 | Implicit Degree Bias in The Link Prediction Task Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a degree-corrected link prediction benchmark that offers a more reasonable assessment and better aligns with the performance on the recommendation task. |
Rachith Aiyappa; Xin Wang; Munjung Kim; Ozgur Can Seckin; Yong-Yeol Ahn; Sadamori Kojaku; |
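The bias in the title arises because uniformly sampled negative pairs almost never join two high-degree nodes, so degree alone separates positives from negatives. A degree-corrected benchmark can instead draw negatives with endpoint probability proportional to degree, matching the degree profile of the positive edges. A minimal sampler sketch under that reading (an assumption about the exact correction, not the paper's released code):

```python
import numpy as np

rng = np.random.default_rng(0)

def degree_corrected_negatives(edges, num_nodes, k):
    """Sample k non-edges whose endpoints are drawn proportionally to node
    degree, so negatives share the degree profile of positive edges."""
    deg = np.zeros(num_nodes)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    prob = deg / deg.sum()
    edge_set = {frozenset(e) for e in edges}
    negatives = []
    while len(negatives) < k:
        u, v = rng.choice(num_nodes, size=2, p=prob)
        if u != v and frozenset((u, v)) not in edge_set:
            negatives.append((int(u), int(v)))
    return negatives

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 4)]
print(degree_corrected_negatives(edges, num_nodes=5, k=3))
```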
264 | Extractive Structures Learned in Pretraining Enable Generalization on Finetuned Facts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, little is known about the mechanisms that enable this generalization or how they are learned during pretraining. We introduce extractive structures as a framework for describing how components in LMs (e.g., MLPs or attention heads) coordinate to enable this generalization. |
Jiahai Feng; Stuart Russell; Jacob Steinhardt; |
265 | Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Alongside PhyGenBench, we propose a novel evaluation framework called PhyGenEval.We will release the data and codes at https://github.com/OpenGVLab/PhyGenBench |
Fanqing Meng; Jiaqi Liao; Xinyu Tan; Quanfeng Lu; Wenqi Shao; Kaipeng Zhang; Yu Cheng; Dianqi Li; Ping Luo; |
266 | The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Beyond empirical analysis, we further provide a theoretical foundation by proving that, under mild conditions, the increased energy loss reduces the upper bound of contextual relevance in LLMs, which is a critical aspect of reward hacking as the reduced contextual relevance typically indicates overfitting to reward model-favored patterns in RL. To address this issue, we propose an *Energy loss-aware PPO algorithm (EPPO)* which penalizes the increase in energy loss in the LLM’s final layer during reward calculation to prevent excessive energy loss, thereby mitigating reward hacking. |
Yuchun Miao; Sen Zhang; Liang Ding; Yuqi Zhang; Lefei Zhang; Dacheng Tao; |
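Mechanically, EPPO is reward shaping. A schematic of how such a penalty could enter the PPO reward, assuming energy loss is summarized as a drop in activation magnitude across the final layer; both the measurement and the coefficient below are placeholders, not the paper's definitions:

```python
def eppo_shaped_reward(task_reward, final_in_norm, final_out_norm, eta=0.1):
    """Penalize energy loss in the final layer during reward calculation.
    'Energy loss' is a stand-in here: the drop in activation norm across
    the final layer; the paper's exact measurement may differ."""
    energy_loss = max(0.0, final_in_norm - final_out_norm)
    return task_reward - eta * energy_loss

# A response whose final layer loses more "energy" gets a lower shaped
# reward, discouraging the reward-hacking pattern the paper identifies.
print(eppo_shaped_reward(1.0, final_in_norm=10.0, final_out_norm=9.2))  # 0.92
print(eppo_shaped_reward(1.0, final_in_norm=10.0, final_out_norm=6.0))  # 0.6
```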
267 | COKE: Core Kernel for More Efficient Approximation of Kernel Weights in Multiple Kernel Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a core kernel construction method based on singular value decomposition and prove that it satisfies the definition of the core kernel for three mainstream MKC algorithms. |
Weixuan Liang; Xinwang Liu; KE LIANG; Jiyuan Liu; En Zhu; |
268 | Contrastive Private Data Synthesis Via Weighted Multi-PLM Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods relying on pre-trained models for data synthesis often struggle in data-deficient scenarios, suffering from limited sample size, inevitable generation noise, and existing pre-trained model bias. To address these challenges, we propose a novel contr**A**stive private data **S**ynthesis via **W**eighted multiple **P**re-trained generative models framework, named **WASP**. |
Tianyuan Zou; Yang Liu; Peng Li; Yufei Xiong; Jianqing Zhang; Jingjing Liu; Xiaozhou Ye; Ye Ouyang; Ya-Qin Zhang; |
269 | Teaching Language Models to Critique Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study LLM critics for code generation and propose $\texttt{CTRL}$, a framework for $\texttt{C}$ritic $\texttt{T}$raining via $\texttt{R}$einforcement $\texttt{L}$earning, which trains a critic model to generate feedback that maximizes correction performance for a fixed generator model without human supervision. |
Zhihui Xie; Jie chen; Liyu Chen; Weichao Mao; Jingjing Xu; Lingpeng Kong; |
270 | Physics Aware Neural Networks for Unsupervised Binding Energy Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the problem, we propose an efficient, unsupervised protein-ligand binding energy prediction model via the conservation of energy (CEBind), which follows the physical laws. |
Ke Liu; Hao Chen; Chunhua Shen; |
271 | Be Confident: Uncovering Overfitting in MLLM Multi-Task Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Noise Resilient Confidence Alignment to address the challenge of open-response overfitting during multi-task fine-tuning. |
Wenke Huang; Jian Liang; Guancheng Wan; Didi Zhu; He Li; Jiawei Shao; Mang Ye; Bo Du; Dacheng Tao; |
272 | Simplicity Bias and Optimization Threshold in Two-Layer ReLU Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It has instead been empirically observed that the trained models go from global minima to spurious local minima of the training loss as the number of training samples becomes larger than some level we call optimization threshold. This paper explores theoretically this phenomenon in the context of two-layer ReLU networks. |
Etienne Boursier; Nicolas Flammarion; |
273 | Efficient Online Reinforcement Learning for Diffusion Policy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce two tractable reweighted loss functions to solve two commonly used policy optimization problems, policy mirror descent and max-entropy policy, resulting in two practical algorithms named Diffusion Policy Mirror Descent (DPMD) and Soft Diffusion Actor-Critic (SDAC). |
Haitong Ma; Tianyi Chen; Kai Wang; Na Li; Bo Dai; |
274 | Inducing, Detecting and Characterising Neural Modules: A Pipeline for Functional Interpretability in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we instead propose an approach to interpretability at the level of functional modularity. |
Anna Soligo; Pietro Ferraro; David Boyle; |
275 | Generalized Interpolating Discrete Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Leveraging a novel diffusion ELBO, we achieve compute-matched state-of-the-art performance in diffusion language modeling. Exploiting GIDD’s flexibility, we explore a hybrid approach combining masking and uniform noise, leading to improved sample quality and unlocking the ability for the model to correct its own mistakes, an area where autoregressive models notoriously have struggled. |
Dimitri von Rütte; Janis Fluri; Yuhui Ding; Antonio Orvieto; Bernhard Schölkopf; Thomas Hofmann; |
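The hybrid corruption the highlight mentions interpolates between masked and uniform diffusion. A minimal forward-corruption sketch under one natural reading (the mixture weight and flat corruption probability stand in for GIDD's actual interpolation schedule):

```python
import random

MASK = "[MASK]"

def corrupt(tokens, vocab, t, mask_weight=0.8, seed=0):
    """Corrupt each token independently with probability t. A corrupted
    token becomes [MASK] with prob mask_weight, otherwise a uniformly
    random vocabulary token. mask_weight=1.0 recovers masked diffusion,
    mask_weight=0.0 uniform-noise diffusion."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if rng.random() < t:
            out.append(MASK if rng.random() < mask_weight else rng.choice(vocab))
        else:
            out.append(tok)
    return out

vocab = ["a", "b", "c", "d"]
print(corrupt(["a", "b", "c", "d", "a", "b"], vocab, t=0.5))
```

Mixing in uniform noise is what lets a trained model revisit already-denoised tokens and correct them: unlike pure masking, any visible token may be wrong.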
276 | Nemotron-CORTEXA: Enhancing LLM Agents for Software Engineering Tasks Via Improved Localization and Solution Diversity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Enhancing LLMs’ performance in these scenarios requires careful consideration of the contextual information provided to the model, optimizing how the model leverages that information, and identifying tools that enable more effective navigation of the development environment. To address these challenges, we introduce Nemotron-CORTEXA, an agentic system built on a predefined scaffold that enhances LLMs’ ability to navigate and reason efficiently in complex software engineering contexts. |
Atefeh Sohrabizadeh; Jialin Song; Mingjie Liu; Rajarshi Roy; Chankyu Lee; Jonathan Raiman; Bryan Catanzaro; |
277 | Scaling Test-Time Compute Without Verification or RL Is Suboptimal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we prove that finetuning LLMs with verifier-based (VB) methods based on RL or search is far superior to verifier-free (VF) approaches based on distilling or cloning search traces, given a fixed amount of compute/data budget. |
Amrith Setlur; Nived Rajaraman; Sergey Levine; Aviral Kumar; |
278 | Value-Based Deep RL Scales Predictably Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show predictability of value-based off-policy deep RL. |
Oleh Rybkin; Michal Nauman; Preston Fu; Charlie Victor Snell; Pieter Abbeel; Sergey Levine; Aviral Kumar; |
279 | Reward-Guided Speculative Decoding for Efficient LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs). |
Baohao Liao; Yuhui Xu; Hanze Dong; Junnan Li; Christof Monz; Silvio Savarese; Doyen Sahoo; Caiming Xiong; |
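The name suggests the control flow: a cheap draft model proposes steps, a reward model scores them, and the expensive target model is consulted only when the draft's reward falls below threshold. A schematic with stub models (the thresholding rule is an illustrative reading, not the paper's exact acceptance criterion):

```python
def rsd_generate(prompt, draft_step, target_step, reward, steps=6, tau=0.5):
    """Reward-guided speculative decoding, schematically: accept the draft
    model's step when the reward model rates it highly, otherwise
    regenerate that step with the target model."""
    text = prompt
    for _ in range(steps):
        candidate = draft_step(text)
        if reward(text, candidate) >= tau:
            text += candidate            # cheap path: draft accepted
        else:
            text += target_step(text)    # expensive path: target model
    return text

# Toy stand-ins for the three models.
draft_step = lambda ctx: " d"
target_step = lambda ctx: " T"
reward = lambda ctx, cand: 0.9 if len(ctx) % 4 < 2 else 0.1
print(rsd_generate("x", draft_step, target_step, reward))  # mixes d and T
```

Unlike standard speculative decoding, acceptance here is biased toward high-reward continuations rather than exact distribution matching, which is where the efficiency on reasoning traces comes from.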
280 | Training Flexible Models of Genetic Variant Effects from Functional Annotations Using Accelerated Linear Algebra Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we leverage modern fast linear algebra techniques to develop DeepWAS (Deep genome Wide Association Studies), a method to train large and flexible neural network predictive models to optimize likelihood. |
Alan Nawzad Amin; Andres Potapczynski; Andrew Gordon Wilson; |
281 | Customizing The Inductive Biases of Softmax Attention Using Structured Matrices Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, attention uses the same scoring function for all input pairs, without imposing a distance-dependent compute bias for neighboring tokens in the sequence. In this work, we address these shortcomings by proposing new scoring functions based on computationally efficient structured matrices with high ranks, including Block Tensor-Train (BTT) and Multi-Level Low Rank (MLR) matrices. |
Yilun Kuang; Noah Amsel; Sanae Lotfi; Shikai Qiu; Andres Potapczynski; Andrew Gordon Wilson; |
282 | DyCodeEval: Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a dynamic data generation method and conduct extensive empirical studies on two seed datasets involving 18 Code LLMs. |
Simin Chen; Pranav Pusarla; Baishakhi Ray; |
283 | Target Concrete Score Matching: A Holistic Framework for Discrete Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present Target Concrete Score Matching (TCSM), a novel and versatile objective for training and fine-tuning discrete diffusion models. |
Ruixiang ZHANG; Shuangfei Zhai; Yizhe Zhang; James Thornton; Zijing Ou; Joshua M. Susskind; Navdeep Jaitly; |
284 | DMOSpeech: Direct Metric Optimization Via Distilled Diffusion Model in Zero-Shot Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, existing TTS approaches are limited by non-differentiable components or iterative sampling that prevent true end-to-end optimization with perceptual metrics. We introduce DMOSpeech, a distilled diffusion-based TTS model that uniquely achieves both faster inference and superior performance compared to its teacher model. |
Yinghao Aaron Li; Rithesh Kumar; Zeyu Jin; |
285 | Learning Representations of Instruments for Partial Identification of Treatment Effects Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we leverage arbitrary (potentially high-dimensional) instruments to estimate bounds on the conditional average treatment effect (CATE). |
Jonas Schweisthal; Dennis Frauen; Maresa Schröder; Konstantin Hess; Niki Kilbertus; Stefan Feuerriegel; |
286 | Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Can such techniques be used to predict how models will behave on out-of-distribution examples? In this work, we provide a positive answer to this question. |
Jing Huang; Junyi Tao; Thomas Icard; Diyi Yang; Christopher Potts; |
287 | UnHiPPO: Uncertainty-aware Initialization for State Space Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We extend the HiPPO theory with measurement noise and derive an uncertainty-aware initialization for state space model dynamics. |
Marten Lienen; Abdullah Saydemir; Stephan Günnemann; |
288 | Pre-training Auto-regressive Robotic Models with 4D Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce ARM4R, an **A**uto-regressive **R**obotic **M**odel that leverages low-level **4**D **R**epresentations learned from human video data to yield a better pre-trained robotic model. |
Dantong Niu; Yuvan Sharma; Haoru Xue; Giscard Biamby; Junyi Zhang; Ziteng Ji; Trevor Darrell; Roei Herzig; |
289 | VTGaussian-SLAM: RGBD SLAM for Large Scale Scenes with Splatting View-Tied 3D Gaussians Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods cannot scale up to extremely large scenes, due to inefficient tracking and mapping strategies that must optimize all 3D Gaussians in limited GPU memory throughout training to maintain geometry and color consistency with previous RGBD observations. To resolve this issue, we propose novel tracking and mapping strategies to work with a novel 3D representation, dubbed view-tied 3D Gaussians, for RGBD SLAM systems. |
Pengchong Hu; Zhizhong Han; |
290 | LLMs on The Line: Data Determines Loss-to-Loss Scaling Laws Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate which factors most strongly influence loss-to-loss scaling. |
Prasanna Mayilvahanan; Thaddäus Wiedemer; Sayak Mallick; Matthias Bethge; Wieland Brendel; |
291 | RATE: Causal Explainability of Reward Models with Imperfect Counterfactuals Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we develop Rewrite-based Attribute Treatment Estimator (RATE) as an effective method for measuring the sensitivity of a reward model to high-level attributes of responses, such as sentiment, helpfulness, or complexity. |
David Reber; Sean M Richardson; Todd Nief; Cristina Garbacea; Victor Veitch; |
292 | Towards Trustworthy Federated Learning with Untrusted Participants Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In a setting where malicious participants may collude with an untrusted server, we propose CafCor, an algorithm that integrates robust gradient aggregation with correlated noise injection, using shared randomness between participants. |
Youssef Allouah; Rachid Guerraoui; John Stephan; |
293 | Effective and Efficient Masked Image Generation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Building upon this insight, we carefully explore the design space of training and sampling, identifying key factors that contribute to both performance and efficiency. Based on the improvements observed during this exploration, we develop our model, referred to as \textbf{eMIGM}. |
Zebin You; Jingyang Ou; Xiaolu Zhang; Jun Hu; JUN ZHOU; Chongxuan Li; |
294 | SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, while uniform-precision quantization is computationally efficient, it often compromises model performance. To address this, we propose SliM-LLM, a salience-driven mixed-precision quantization framework that allocates bit-widths group-wise with high accuracy. |
Wei Huang; Haotong Qin; Yangdong Liu; Yawei Li; Qinshuo Liu; Xianglong Liu; Luca Benini; Michele Magno; Shiming Zhang; XIAOJUAN QI; |
295 | Emergent Response Planning in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we argue that large language models (LLMs), though trained to predict only the next token, exhibit emergent planning behaviors: $\textbf{their hidden representations encode future outputs beyond the next token}$. |
Zhichen Dong; Zhanhui Zhou; Zhixuan Liu; Chao Yang; Chaochao Lu; |
296 | Control and Realism: Best of Both Worlds in Layout-to-Image Without Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing works have demonstrated that pre-trained Text-to-Image diffusion models can achieve this goal without training on any specific data; however, they often face challenges with imprecise localization and unrealistic artifacts. Focusing on these drawbacks, we propose a novel training-free method, WinWinLay. |
Bonan Li; Yinhan Hu; Songhua Liu; Xinchao Wang; |
297 | Designing Cyclic Peptides Via Harmonic SDE with Atom-Bond Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These challenges include the scarcity of 3D structural data on target proteins and associated cyclic peptide ligands, the geometric constraints that cyclization imposes, and the involvement of non-canonical amino acids in cyclization. To address the above challenges, we introduce CpSDE, which consists of two key components: AtomSDE, a generative structure prediction model based on harmonic SDE, and ResRouter, a residue type predictor. |
Xiangxin Zhou; Mingyu Li; Yi Xiao; Jiahan Li; Dongyu Xue; Zaixiang Zheng; Jianzhu Ma; Quanquan Gu; |
298 | AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent offline methods like DPO and SimPO bypass reinforcement learning’s complexity, they face critical limitations: DPO relies on static reference models that degrade with policy updates, and SimPO assumes a uniform target reward margin that ignores instance-wise preference strength. We propose AlphaDPO, an adaptive preference optimization framework that dynamically reparameterizes the reference distribution to address these issues. |
Junkang Wu; Xue Wang; Zhengyi Yang; Jiancan Wu; Jinyang Gao; Bolin Ding; Xiang Wang; Xiangnan He; |
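For contrast with the uniform SimPO margin criticized above, here is a schematic preference loss whose margin adapts per instance; the specific adaptation rule below (scaling by the observed reward gap) is a placeholder, not AlphaDPO's actual reparameterization of the reference distribution:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def adaptive_margin_loss(logp_w, logp_l, beta=2.0, gamma0=0.5, alpha=0.5):
    """SimPO-style objective -log sigma(beta * (logp_w - logp_l) - gamma),
    but with an instance-dependent margin gamma instead of a uniform one.
    The adaptation rule here is illustrative only."""
    gap = logp_w - logp_l
    gamma = gamma0 + alpha * abs(gap)  # placeholder instance-wise margin
    return -math.log(sigmoid(beta * gap - gamma))

print(round(adaptive_margin_loss(-1.0, -1.5), 3))  # clear preference: 0.576
print(round(adaptive_margin_loss(-1.0, -1.1), 3))  # weak preference: 0.883
```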
299 | 3D Question Answering Via Only 2D Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore how to harness their potential to address 3D scene understanding tasks, using 3D question answering (3D-QA) as a representative example. |
FENGYUN WANG; Sicheng Yu; Jiawei Wu; Jinhui Tang; Hanwang Zhang; Qianru Sun; |
300 | Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To more comprehensively explore the space of heuristics, this paper proposes to use Monte Carlo Tree Search (MCTS) for LLM-based heuristic evolution. |
Zhi Zheng; Zhuoliang Xie; Zhenkun Wang; Bryan Hooi; |
301 | Inverse Bridge Matching Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address it, we propose a novel distillation technique based on the inverse bridge matching formulation and derive the tractable objective to solve it in practice. |
Nikita Gushchin; David Li; Daniil Selikhanovych; Evgeny Burnaev; Dmitry Baranchuk; Alexander Korotin; |
302 | Stealix: Model Stealing Via Prompt Evolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To assess the risks posed by open-source pre-trained models, we propose a more realistic threat model that eliminates the need for prompt design skills or knowledge of class names. |
Zhixiong Zhuang; Hui-Po Wang; Maria-Irina Nicolae; Mario Fritz; |
303 | DINO-WM: World Models on Pre-trained Visual Features Enable Zero-shot Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present DINO World Model (DINO-WM), a new method to model visual dynamics without reconstructing the visual world. |
Gaoyue Zhou; Hengkai Pan; Yann LeCun; Lerrel Pinto; |
304 | MARS: Unleashing The Power of Variance Reduction for Training Large Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, to unleash the power of variance reduction for efficient training of large models, we propose a unified optimization framework, MARS (**M**ake v**A**riance **R**eduction **S**hine), which reconciles preconditioned gradient methods with variance reduction via a scaled stochastic recursive momentum technique. |
Huizhuo Yuan; Yifeng Liu; Shuang Wu; zhou Xun; Quanquan Gu; |
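The core ingredient is a STORM-style scaled recursive momentum: evaluate the same minibatch at both the current and previous iterates so the gradient-difference term cancels noise. A minimal sketch on a noisy quadratic (gradient clipping and the preconditioned AdamW-style update that MARS wraps around this estimator are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_grad(x):  # gradient of 0.5 * ||x||^2 plus minibatch noise
    return x + 0.1 * rng.standard_normal(x.shape)

x_prev = x = np.full(3, 5.0)
m = np.zeros(3)
beta, gamma, lr = 0.9, 0.025, 0.1
for _ in range(200):
    state = rng.bit_generator.state       # reuse the SAME noise sample ...
    g_curr = noisy_grad(x)
    rng.bit_generator.state = state       # ... when re-evaluating at x_prev
    g_prev = noisy_grad(x_prev)
    # Scaled recursive momentum: plain momentum corrected by the same-batch
    # gradient difference between consecutive iterates.
    c = g_curr + gamma * (beta / (1.0 - beta)) * (g_curr - g_prev)
    m = beta * m + (1.0 - beta) * c
    x_prev, x = x, x - lr * m
print(np.round(x, 3))  # near the optimum at the origin
```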
305 | Mixture of Lookup Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Their large parameter size still limits deployment, and offloading, which loads experts into VRAM only when needed, significantly increases inference latency. To address this, we propose Mixture of Lookup Experts (MoLE), a new MoE architecture that is efficient in both communication and VRAM usage. |
Shibo Jie; Yehui Tang; Kai Han; Yitong Li; Duyu Tang; Zhi-Hong Deng; Yunhe Wang; |
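The intuition behind "lookup" experts: if experts consume the embedding-layer output rather than intermediate hidden states, each expert's response to every vocabulary item can be precomputed offline into a table, so inference pays no expert FLOPs and the tables can live outside VRAM. A toy sketch of that reading (linear experts and a dense router, purely illustrative of the precomputation, not the released architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, num_experts = 10, 4, 3

embed = rng.standard_normal((vocab, dim))
W = rng.standard_normal((num_experts, dim, dim))  # one linear "expert" each

# Offline: evaluate every expert on every embedding once -> lookup tables.
tables = np.einsum("eoh,vh->evo", W, embed)       # (experts, vocab, dim)

def mole_layer(token_ids, router_weights):
    """Inference-time mixture: no expert computation, only table lookups
    combined with router weights of shape (num_tokens, num_experts)."""
    gathered = tables[:, token_ids, :]            # (experts, tokens, dim)
    return np.einsum("te,etd->td", router_weights, gathered)

ids = np.array([1, 7, 3])
router = rng.dirichlet(np.ones(num_experts), size=len(ids))
print(mole_layer(ids, router).shape)  # (3, 4): one mixed vector per token
```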
306 | Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SOAR, a method that learns program synthesis by integrating language models into a self-improving evolutionary loop. |
Julien Pourcel; Cédric Colas; Pierre-Yves Oudeyer; |
307 | REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: If low attack success under such an objective is taken as a measure of robustness, the true robustness might be grossly overestimated. To alleviate these flaws, we propose an adaptive and semantic optimization problem over the population of responses. |
Simon Geisler; Tom Wollschläger; M. H. I. Abdalla; Vincent Cohen-Addad; Johannes Gasteiger; Stephan Günnemann; |
308 | $\mathcal{V}ista\mathcal{DPO}$: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large Video Models (LVMs) built upon Large Language Models (LLMs) have shown promise in video understanding but often suffer from misalignment with human intuition and video hallucination issues. To address these challenges, we introduce **VistaDPO**, a novel framework for Video Hierarchical Spatial-Temporal Direct Preference Optimization. |
Haojian Huang; Haodong Chen; Shengqiong Wu; Meng Luo; Jinlan Fu; Xinya Du; Hanwang Zhang; Hao Fei; |
309 | Maximum Update Parametrization and Zero-Shot Hyperparameter Transfer for Fourier Neural Operators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, scaling them to handle more complex PDEs requires increasing the number of Fourier modes, which significantly expands the number of model parameters and makes hyperparameter tuning computationally impractical. To address this, we introduce $\mu$**Transfer-FNO**, a zero-shot hyperparameter transfer technique that enables optimal configurations, tuned on smaller FNOs, to be directly applied to billion-parameter FNOs _without_ additional tuning. |
Shanda Li; Shinjae Yoo; Yiming Yang; |
310 | Long-Term TalkingFace Generation Via Motion-Prior Conditional Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these, we introduce the \textbf{M}otion-priors \textbf{C}onditional \textbf{D}iffusion \textbf{M}odel (\textbf{MCDM}), which utilizes both archived and current clip motion priors to enhance motion prediction and ensure temporal consistency. We also introduce the TalkingFace-Wild dataset, a multilingual collection of over 200 hours of footage across 10 languages. |
Fei Shen; Cong Wang; Junyao Gao; Qin Guo; Jisheng Dang; Jinhui Tang; Tat-Seng Chua; |
311 | Policy Regularization on Globally Accessible States in Cross-Dynamics Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, as the environment dynamics change, certain expert states may become inaccessible, rendering their distributions less valuable for imitation. To address this, we propose a novel framework that integrates reward maximization with IfO, employing F-distance regularized policy optimization. |
Zhenghai Xue; Lang Feng; Jiacheng Xu; Kang Kang; Xiang Wen; Bo An; Shuicheng YAN; |
312 | Hierarchical Graph Tokenization for Molecule-Language Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that neglecting the hierarchical information in tokenization will lead to subpar molecule-language alignment and severe hallucination. To address this limitation, we propose HIerarchical GrapH Tokenization (HIGHT). |
Yongqiang Chen; Quanming Yao; Juzheng Zhang; James Cheng; Yatao Bian; |
313 | FlipAttack: Jailbreak LLMs Via Flipping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a simple yet effective jailbreak attack named FlipAttack against black-box LLMs. |
Yue Liu; Xiaoxin He; Miao Xiong; Jinlan Fu; Shumin Deng; YINGWEI MA; Jiaheng Zhang; Bryan Hooi; |
314 | Massive Values in Self-Attention Modules Are The Key to Contextual Knowledge Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Large language models (LLMs) have achieved remarkable success in contextual knowledge understanding. In this paper, we show for the first time that these concentrated massive values consistently emerge in specific regions of attention queries (Q) and keys (K) while not having such patterns in values (V) in various modern transformer-based LLMs. |
Mingyu Jin; Kai Mei; Wujiang Xu; Mingjie Sun; Ruixiang Tang; Mengnan Du; Zirui Liu; Yongfeng Zhang; |
315 | High-Dimensional Prediction for Sequential Decision Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We give an efficient algorithm for producing multi-dimensional forecasts in an online adversarial environment that have low bias subject to any polynomial number of conditioning events, that can depend both on external context and on our predictions themselves. |
Georgy Noarov; Ramya Ramalingam; Aaron Roth; Stephan Xie; |
316 | TimeBase: The Power of Minimalism in Efficient Long-term Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce TimeBase, an ultra-lightweight network to harness the power of minimalism in LTSF. |
Qihe Huang; Zhengyang Zhou; Kuo Yang; Zhongchao Yi; Xu Wang; Yang Wang; |
317 | Test-Time Preference Optimization: On-the-Fly Alignment Via Iterative Textual Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the recent efforts on test-time scaling, we make the first attempt to propose Test-time Preference Optimization (TPO), a framework that aligns LLM outputs with human preferences during inference, eliminating the need to update model parameters. |
Yafu Li; Xuyang Hu; Xiaoye Qu; Linjie Li; Yu Cheng; |
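Since TPO never updates parameters, the whole alignment loop runs at decode time. A schematic of one plausible such loop with stub calls (`generate`, `score`, and `critique` are hypothetical stand-ins, not the paper's API):

```python
import random

def tpo(prompt, generate, score, critique, iters=3, k=4):
    """Test-time preference optimization, schematically: sample candidates,
    rank them with a reward signal, turn the best/worst contrast into a
    textual critique, and condition the next round on it. No weight updates."""
    feedback, best = "", None
    for _ in range(iters):
        candidates = [generate(prompt, feedback) for _ in range(k)]
        ranked = sorted(candidates, key=score, reverse=True)
        best, worst = ranked[0], ranked[-1]
        feedback = critique(best, worst)  # the "textual gradient"
    return best

# Toy stand-ins so the loop runs end to end.
random.seed(0)
generate = lambda p, f: p + f + "!" * random.randint(0, 3)
score = len
critique = lambda best, worst: " refine"
print(tpo("draft", generate, score, critique))
```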
318 | CodeSteer: Symbolic-Augmented Language Models Via Code/Text Guidance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce CodeSteer, an effective method for guiding LLM code/text generation. |
Yongchao Chen; Yilun Hao; Yueying Liu; Yang Zhang; Chuchu Fan; |
319 | MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In response, we propose MMedPO, a novel multimodal medical preference optimization approach that considers the clinical relevance of preference samples to enhance Med-LVLM alignment. |
Kangyu Zhu; Peng Xia; Yun Li; Hongtu Zhu; Sheng Wang; Huaxiu Yao; |
320 | On The Power of Context-Enhanced Learning in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formalize a new concept for LLMs, **context-enhanced learning**. |
Xingyu Zhu; Abhishek Panigrahi; Sanjeev Arora; |
321 | Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: But they underperform on multi-step visual reasoning—even compared to LLMs on the same tasks presented in text form—giving rise to perceptions of *modality imbalance* or *brittleness*. Towards a systematic study of such issues, we introduce a synthetic framework for assessing the ability of VLMs to perform algorithmic visual reasoning, comprising three tasks: Table Readout, Grid Navigation, and Visual Analogy. |
Simon Park; Abhishek Panigrahi; Yun Cheng; Dingli Yu; Anirudh Goyal; Sanjeev Arora; |
322 | The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we mathematically show that a sufficiently low expected test error of the reward model guarantees low worst-case regret, but that for any *fixed* expected test error, there exist realistic data distributions that allow for error-regret mismatch to occur. |
Lukas Fluri; Leon Lang; Alessandro Abate; Patrick Forré; David Krueger; Joar Max Viktor Skalse; |
323 | Do Multiple Instance Learning Models Transfer? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe a substantial performance boost with finetuning pretrained models over training from randomly initialized weights, even with domain differences between pretraining and target tasks. |
Daniel Shao; Richard J. Chen; Andrew H. Song; Joel Runevic; Ming Y. Lu; Tong Ding; Faisal Mahmood; |
324 | Sundial: A Family of Highly Capable Time Series Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Sundial, a family of native, flexible, and scalable time series foundation models. |
Yong Liu; Guo Qin; Zhiyuan Shi; Zhi Chen; Caiyin Yang; Xiangdong Huang; Jianmin Wang; Mingsheng Long; |
325 | An All-Atom Generative Model for Designing Protein Complexes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite these developments, the study and modeling of multi-chain proteins remain largely uncharted, though they are vital for understanding biological functions. Recognizing the importance of these interactions, we introduce APM (all-Atom Protein generative Model), a model specifically designed for modeling multi-chain proteins. |
Ruizhe Chen; Dongyu Xue; Xiangxin Zhou; Zaixiang Zheng; Xiangxiang Zeng; Quanquan Gu; |
326 | SToFM: A Multi-scale Foundation Model for Spatial Transcriptomics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This process requires integrating macro-scale tissue morphology, micro-scale cellular microenvironment, and gene-scale gene expression profile. To address this challenge, we propose **SToFM**, a multi-scale **S**patial **T**ranscript**o**mics **F**oundation **M**odel. |
Suyuan Zhao; Yizhen Luo; Ganbo Yang; Yan Zhong; Hao Zhou; Zaiqing Nie; |
327 | Unifying 2D and 3D Vision-Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel language-conditioned mask decoder shared across 2D and 3D modalities to ground objects effectively in both RGB and RGB-D images, outperforming box-based approaches. |
Ayush Jain; Alexander Swerdlow; Yuzhou Wang; Sergio Arnaud; Ada Martin; Alexander Sax; Franziska Meier; Katerina Fragkiadaki; |
328 | Hi-Patch: Hierarchical Patch GNN for Irregular Multivariate Time Series Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing time series multi-scale analysis methods treat all variables in the same manner, making them unsuitable for Irregular Multivariate Time Series (IMTS), where variables have distinct origin scales/sampling rates. To fill this gap, we propose Hi-Patch, a hierarchical patch graph network. |
Yicheng Luo; Bowen Zhang; Zhen Liu; Qianli Ma; |
329 | AssistanceZero: Scalably Solving Assistance Games Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the first scalable approach to solving assistance games and apply it to a new, challenging Minecraft-based assistance game with over $10^{400}$ possible goals. |
Cassidy Laidlaw; Eli Bronstein; Timothy Guo; Dylan Feng; Lukas Berglund; Justin Svegliato; Stuart Russell; Anca Dragan; |
330 | Decision Mixer: Integrating Long-term and Local Dependencies Via Dynamic Token Selection for Decision-Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Decision Mixer (DM), which addresses the conflict between features of different scales in the modeling process from the perspective of dynamic integration. |
Hongling Zheng; Li Shen; Yong Luo; Deheng Ye; Bo Du; Jialie Shen; Dacheng Tao; |
331 | Falcon: Fast Visuomotor Policies Via Partial Denoising Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing acceleration techniques either require retraining or degrade performance under low sampling steps. Here we propose Falcon, which mitigates this speed-performance trade-off and achieves further acceleration. |
Haojun Chen; Minghao Liu; Chengdong Ma; Xiaojian Ma; Zailin Ma; Huimin Wu; Yuanpei Chen; Yifan Zhong; Mingzhi Wang; Qing Li; Yaodong Yang; |
332 | On The Clean Generalization and Robust Overfitting in Adversarial Training from Two Theoretical Views: Representation Complexity and Training Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the CGRO phenomenon in adversarial training from two views: representation complexity and training dynamics. |
Binghui Li; Yuanzhi Li; |
333 | Nonparametric Modern Hopfield Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a nonparametric interpretation for deep learning compatible modern Hopfield models and utilize this new perspective to debut efficient variants. |
Jerry Yao-Chieh Hu; Bo-Yu Chen; Dennis Wu; Feng Ruan; Han Liu; |
334 | MoH: Multi-Head Attention As Mixture-of-Head Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we upgrade the multi-head attention mechanism, the core of the Transformer model, to reduce computational costs while maintaining or surpassing the previous accuracy level. |
Peng Jin; Bo Zhu; Li Yuan; Shuicheng Yan; |
335 | From Feature Interaction to Feature Generation: A Generative Paradigm of CTR Prediction Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike sequential recommendation, which naturally fits a generative next-item prediction paradigm, CTR models are hard to formulate into this paradigm without an explicit feature order. Therefore, we propose a novel Supervised Feature Generation framework for CTR models, shifting from the discriminative feature interaction paradigm to the generative feature generation paradigm. |
Mingjia Yin; Junwei Pan; Hao Wang; Ximei Wang; Shangyu Zhang; Jie Jiang; Defu Lian; Enhong Chen; |
336 | FOCoOp: Enhancing Out-of-Distribution Robustness in Federated Prompt Learning for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The inherent in-distribution (ID) data heterogeneity among different clients makes it more challenging to maintain this trade-off. To fill this gap, we introduce a Federated OOD-aware Context Optimization (FOCoOp) framework, which captures diverse distributions among clients using ID global prompts, local prompts, and OOD prompts. |
Xinting Liao; Weiming Liu; Jiaming Qian; Pengyang Zhou; Jiahe Xu; Wenjie Wang; Chaochao Chen; Xiaolin Zheng; Tat-Seng Chua; |
337 | Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an algorithm, namely robust contextual dueling bandits ($\texttt{RCDB}$), which is based on uncertainty-weighted maximum likelihood estimation. |
Qiwei Di; Jiafan He; Quanquan Gu; |
338 | Self-Play $Q$-Learners Can Provably Collude in The Iterated Prisoner’s Dilemma Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A growing body of computational studies shows that simple machine learning agents converge to cooperative behaviors in social dilemmas, such as collusive price-setting in oligopoly markets, raising questions about what drives this outcome. In this work, we provide theoretical foundations for this phenomenon in the context of self-play multi-agent Q-learners in the iterated prisoner’s dilemma. |
Quentin Bertrand; Juan Agustin Duque; Emilio Calvano; Gauthier Gidel; |
339 | ShieldAgent: Shielding Agents Via Verifiable Safety Policy Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More critically, existing guardrails for LLMs are not applicable due to the complex and dynamic nature of agents. To tackle these challenges, we propose ShieldAgent, the first guardrail agent designed to enforce explicit safety policy compliance for the action trajectory of other protected agents through logical reasoning. |
Zhaorun Chen; Mintong Kang; Bo Li; |
340 | Learning Parametric Distributions from Samples and Preferences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Leveraging the hard constraints revealed by deterministic preferences, we propose an estimator achieving an estimation error scaling of $\mathcal{O}(1/n)$—a significant improvement over the $\Theta(1/\sqrt{n})$ rate attainable with samples alone. |
Marc Jourdan; Gizem Yüce; Nicolas Flammarion; |
341 | Observation Interference in Partially Observable Assistance Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study partially observable assistance games (POAGs), a model of the human-AI value alignment problem which allows the human and the AI assistant to have partial observations. |
Scott Emmons; Caspar Oesterheld; Vincent Conitzer; Stuart Russell; |
342 | Wasserstein Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Wasserstein Policy Optimization (WPO), an actor-critic algorithm for reinforcement learning in continuous action spaces. |
David Pfau; Ian Davies; Diana L Borsa; João Guilherme Madeira Araújo; Brendan Daniel Tracey; Hado van Hasselt; |
343 | Eliciting Language Model Behaviors with Investigator Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of behavioral elicitation, where the goal is to search for prompts that induce specific target behaviors (e.g., hallucinations, harmful responses) from a target language model. |
Xiang Lisa Li; Neil Chowdhury; Daniel D. Johnson; Tatsunori Hashimoto; Percy Liang; Sarah Schwettmann; Jacob Steinhardt; |
344 | G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce **G-Sim**, a hybrid framework that automates simulator construction by synergizing LLM-driven structural design with rigorous empirical calibration. |
Samuel Holt; Max Ruiz Luyten; Antonin Berthon; Mihaela van der Schaar; |
345 | Safe-EF: Error Feedback for Non-smooth Constrained Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We advance our understanding of EF in the canonical non-smooth convex setting by establishing new lower complexity bounds for first-order algorithms with contractive compression. Next, we propose Safe-EF, a novel algorithm that matches our lower bound (up to a constant) while enforcing safety constraints essential for practical applications. |
Rustem Islamov; Yarden As; Ilyas Fatkhullin; |
346 | Federated Disentangled Tuning with Textual Prior Decoupling and Visual Dynamic Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: 2) Visual Feature Diversity: The diversity of visual features makes it challenging to leverage naive image features directly for image-text alignment in downstream tasks. In this work, we propose Federated Disentangled Tuning with Textual Prior Decoupling and Visual Dynamic Adaptation (FedDDA) to overcome the above limitations. |
Yihao Yang; Wenke Huang; Guancheng Wan; Bin Yang; Mang Ye; |
347 | InfAlign: Inference-aware Language Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a framework for inference-aware alignment (InfAlign), which aims to optimize *inference-time win rate* of the aligned policy against the base model. |
Ananth Balashankar; Ziteng Sun; Jonathan Berant; Jacob Eisenstein; Michael Collins; Adrian Hutter; Jong Lee; Chirag Nagpal; Flavien Prost; Aradhana Sinha; Ananda Theertha Suresh; Ahmad Beirami; |
348 | Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we start from a new perspective to investigate the reason behind failed generalization in AIGI detection, which we name the asymmetry phenomenon: a naively trained detector tends to overfit to the limited and monotonous fake patterns, causing the feature space to become highly constrained and low-ranked, which we prove seriously limits expressivity and generalization. |
Zhiyuan Yan; Jiangming Wang; Peng Jin; Ke-Yue Zhang; Chengchun Liu; Shen Chen; Taiping Yao; Shouhong Ding; Baoyuan Wu; Li Yuan; |
349 | MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present MimicMotion, a framework for generating high-quality human videos of arbitrary length using motion guidance. |
Yuang Zhang; Jiaxi Gu; Li-Wen Wang; Han Wang; Junqi Cheng; Yuefeng Zhu; Fangyuan Zou; |
350 | Fast Video Generation with Sliding Tile Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Diffusion Transformers (DiTs) with 3D full attention power state-of-the-art video generation, but suffer from prohibitive compute cost — when generating just a 5-second 720P video, attention alone takes 800 out of 950 seconds of total inference time. This paper introduces sliding tile attention (STA) to address this challenge. |
Peiyuan Zhang; Yongqi Chen; Runlong Su; Hangliang Ding; Ion Stoica; Zhengzhong Liu; Hao Zhang; |
351 | MAGELLAN: Metacognitive Predictions of Learning Progress Guide Autotelic LLM Agents in Large Goal Spaces Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce MAGELLAN, a metacognitive framework that lets LLM agents learn to predict their competence and learning progress online. |
Loris Gaven; Thomas Carta; Clément Romac; Cédric Colas; Sylvain Lamprier; Olivier Sigaud; Pierre-Yves Oudeyer; |
352 | DCTdiff: Intriguing Properties of Image Generative Modeling in The DCT Space Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper explores image modeling from the frequency space and introduces DCTdiff, an end-to-end diffusion generative paradigm that efficiently models images in the discrete cosine transform (DCT) space. |
Mang Ning; Mingxiao Li; Jianlin Su; Haozhe Jia; Lanmiao Liu; Martin Benes; Wenshuo Chen; Albert Ali Salah; Itir Onal Ertugrul; |
353 | How to Synthesize Text Data Without Model Collapse? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on two questions: what is the impact of synthetic data on language model training, and how to synthesize data without model collapse? |
Xuekai Zhu; Daixuan Cheng; Hengli Li; Kaiyan Zhang; Ermo Hua; Xingtai Lv; Ning Ding; Zhouhan Lin; Zilong Zheng; Bowen Zhou; |
354 | Unisolver: PDE-Conditional Transformers Towards Universal Neural PDE Solvers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Unisolver, a novel Transformer model trained on diverse data and conditioned on diverse PDEs, aiming towards a universal neural PDE solver capable of solving a wide scope of PDEs. |
Hang Zhou; Yuezhou Ma; Haixu Wu; Haowen Wang; Mingsheng Long; |
355 | SPMC: Self-Purifying Federated Backdoor Defense Via Margin Contribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These attacks exploit FL’s decentralized nature, while existing defenses, based on isolated behaviors and fixed rules, can be bypassed by adaptive attackers. To address these limitations, we propose **SPMC**, a marginal collaboration defense mechanism that leverages intrinsic consistency across clients to estimate inter-client marginal contributions. This allows the system to dynamically reduce the influence of clients whose behavior deviates from the collaborative norm, thus maintaining robustness even as the number of attackers changes. |
Wenwen He; Wenke Huang; Bin Yang; Shukan Liu; Mang Ye; |
356 | STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We design the Self-play Theorem Prover (STP) that simultaneously takes on two roles, conjecturer and prover, each providing training signals to the other. |
Kefan Dong; Tengyu Ma; |
357 | Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent studies have attempted to incorporate camera control into the generation process, but their results are often limited to simple trajectories or lack the ability to generate consistent videos from multiple distinct camera paths for the same scene. To address these limitations, we introduce Cavia, a novel framework for camera-controllable, multi-view video generation, capable of converting an input image into multiple spatiotemporally consistent videos. |
Dejia Xu; Yifan Jiang; Chen Huang; Liangchen Song; Thorsten Gernoth; Liangliang Cao; Zhangyang Wang; Hao Tang; |
358 | Deep Electromagnetic Structure Design Under Limited Evaluation Budgets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing methods commonly employ high-quality predictors or generators to alleviate evaluations, they are often data-intensive and struggle with real-world scale and budget constraints. To address this, we propose a novel method called Progressive Quadtree-based Search (PQS). |
Shijian Zheng; Fangxiao Jin; Shuhai Zhang; Quan Xue; Mingkui Tan; |
359 | Self-Supervised Transformers As Iterative Solution Improvers for Constraint Satisfaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a Transformer-based framework for Constraint Satisfaction Problems (CSPs). |
Yudong Xu; Wenhao Li; Scott Sanner; Elias Boutros Khalil; |
360 | Do Vision-Language Models Really Understand Visual Language? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, recent studies seem to suggest that Large Vision-Language Models (LVLMs) can even tackle complex reasoning tasks involving diagrams. In this paper, we investigate this phenomenon by developing a comprehensive test suite to evaluate the diagram comprehension capability of LVLMs. |
Yifan Hou; Buse Giledereli; Yilei Tu; Mrinmaya Sachan; |
361 | The Role of Sparsity for Length Generalization in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new theoretical framework to study length generalization for the next-token prediction task, as performed by decoder-only transformers. |
Noah Golowich; Samy Jelassi; David Brandfonbrener; Sham M. Kakade; Eran Malach; |
362 | Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that entropy regularization mitigates neural collapse (NC) to improve generalization, while a fixed Simplex ETF projector enforces NC for better detection. Based on these insights, we propose a method to control NC at different DNN layers. |
Md Yousuf Harun; Jhair Gallardo; Christopher Kanan; |
363 | MASS: Mathematical Data Selection Via Skill Graphs for Pretraining Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce MASS, a Mathematical data Selection framework using the Skill graph for pretraining LLMs in the mathematical reasoning domain. |
Jiazheng Li; Lu Yu; Qing Cui; Zhiqiang Zhang; Jun Zhou; Yanfang Ye; Chuxu Zhang; |
364 | DAMA: Data- and Model-aware Alignment of Multi-modal LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Data- and Model-aware DPO (DAMA) to dynamically adjust the optimization process from two key aspects: (1) a data-aware strategy that incorporates data hardness, and (2) a model-aware strategy that integrates real-time model responses. |
Jinda Lu; Junkang Wu; Jinghan Li; Xiaojun Jia; Shuo Wang; YiFan Zhang; Junfeng Fang; Xiang Wang; Xiangnan He; |
365 | Teaching Transformers Causal Reasoning Through Axiomatic Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since interventional data is costly to generate, we study to what extent an agent can learn causal reasoning from passive data. |
Aniket Vashishtha; Abhinav Kumar; Atharva Pandey; Abbavaram Gowtham Reddy; Kabir Ahuja; Vineeth N. Balasubramanian; Amit Sharma; |
366 | Test-Time Training Provably Improves Transformers As In-context Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we provide a comprehensive theoretical characterization of linear transformers when the update rule is a single gradient step. Our theory (i) delineates the role of alignment between pretraining distribution and target task, (ii) demystifies how TTT can alleviate distribution shift, and (iii) quantifies the sample complexity of TTT including how it can significantly reduce the eventual sample size required for in-context learning. |
Halil Alperen Gozeten; Muhammed Emrullah Ildiz; Xuechen Zhang; Mahdi Soltanolkotabi; Marco Mondelli; Samet Oymak; |
367 | MMInference: Accelerating Pre-filling for Long-Context Visual Language Models Via Modality-Aware Permutation Sparse Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the quadratic attention complexity during the pre-filling phase remains a significant obstacle to real-world deployment. To overcome this limitation, we introduce MMInference (Multimodality Million tokens Inference), a dynamic sparse attention method that accelerates the prefilling stage for long-context multi-modal inputs. |
Yucheng Li; Huiqiang Jiang; Chengruidong Zhang; Qianhui Wu; Xufang Luo; Surin Ahn; Amir H. Abdi; Dongsheng Li; Jianfeng Gao; Yuqing Yang; Lili Qiu; |
368 | Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce *preference embedding*, an approach that embeds responses into a latent space to capture intricate preference structures efficiently, achieving linear query complexity. |
Yifan Zhang; Ge Zhang; Yue Wu; Kangping Xu; Quanquan Gu; |
369 | $S^2$FGL: Spatial Spectral Federated Graph Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the challenges, we propose a global knowledge repository to mitigate label signal disruption and a frequency alignment to address spectral client drifts. |
Zihan Tan; Suyuan Huang; Guancheng Wan; Wenke Huang; He Li; Mang Ye; |
370 | Measuring Diversity in Synthetic Datasets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce DCScore, a novel method for measuring synthetic dataset diversity from a classification perspective. |
Yuchang Zhu; Huizhe Zhang; Bingzhe Wu; Jintang Li; Zibin Zheng; Peilin Zhao; Liang Chen; Yatao Bian; |
371 | Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce Orient Anything, the first foundation model for zero-shot object orientation estimation. |
Zehan Wang; Ziang Zhang; Tianyu Pang; Chao Du; Hengshuang Zhao; Zhou Zhao; |
372 | Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study how unlabeled offline trajectory data can be leveraged to learn efficient exploration strategies. |
Max Wilcoxson; Qiyang Li; Kevin Frans; Sergey Levine; |
373 | Bootstrapping Self-Improvement of Language Model Programs for Zero-Shot Schema Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose Matchmaker, a compositional language model program for schema matching, comprising candidate generation, refinement, and confidence scoring. |
Nabeel Seedat; Mihaela van der Schaar; |
374 | RISE: Radius of Influence Based Subgraph Extraction for 3D Molecular Graph Explanation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While existing methods have primarily focused on explaining molecular substructures in 2D GNNs, the transition to 3D GNNs introduces unique challenges, such as handling the implicit dense edge structures created by a cutoff radius. To tackle this, we introduce a novel explanation method specifically designed for 3D GNNs, which localizes the explanation to the immediate neighborhood of each node within the 3D space. |
Jingxiang Qu; Wenhan Gao; Jiaxing Zhang; Xufeng Liu; Hua Wei; Haibin Ling; Yi Liu; |
375 | Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we identify two distinct goals of loss reweighting, namely, Saturation and Importance—the former indicates that those insufficiently optimized data should be emphasized, while the latter stresses some critical data that are most influential for loss minimization. |
Puning Yang; Qizhou Wang; Zhuo Huang; Tongliang Liu; Chengqi Zhang; Bo Han; |
376 | BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is well known that for small models, generating multiple responses and selecting the best can enhance quality while remaining cheaper than a single large-model response. We leverage this idea to propose BEST-Route, a novel routing framework that chooses a model and the number of responses to sample from it based on query difficulty and the quality thresholds. |
Dujian Ding; Ankur Mallick; Shaokun Zhang; Chi Wang; Daniel Madrigal; Mirian Del Carmen Hipolito Garcia; Menglin Xia; Laks V. S. Lakshmanan; Qingyun Wu; Victor Rühle; |
377 | TopoTune: A Framework for Generalized Combinatorial Complex Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, TDL lacks the principled and standardized frameworks that underpin GNN development, restricting its accessibility and applicability. To address this issue, we introduce Generalized CCNNs (GCCNs), a simple yet powerful family of TDL models that can be used to systematically transform any (graph) neural network into its TDL counterpart. |
Mathilde Papillon; Guillermo Bernardez; Claudio Battiloro; Nina Miolane; |
378 | Privacy Amplification By Structured Subsampling for Deep Differentially Private Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we observe in this work that the formal guarantees of DP-SGD are incompatible with time series specific tasks like forecasting, since they rely on the *privacy amplification* attained by training on small, unstructured batches sampled from an unstructured dataset. |
Jan Schuchardt; Mina Dalirrooyfard; Jed Guzelkabaagac; Anderson Schneider; Yuriy Nevmyvaka; Stephan Günnemann; |
379 | Merge-Friendly Post-Training Quantization for Multi-Target Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we analyze the impact of quantization on model merging through the lens of error barriers. |
Juncheol Shin; Minsang Seok; Seonggon Kim; Eunhyeok Park; |
380 | The Best of Both Worlds: Bridging Quality and Diversity in Data Selection with Bipartite Graph Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current data selection methods often prioritize one aspect over the other, resulting in suboptimal training outcomes. To address this, we formulate data selection as a set cover problem and present GraphFilter, a novel approach that balances both quality and diversity in data selection. |
Minghao Wu; Thuy-Trang Vu; Lizhen Qu; Gholamreza Haffari; |
381 | Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Aguvis, a unified vision-based framework for autonomous GUI agents that directly operates on screen images, standardizes cross-platform interactions and incorporates structured reasoning via inner monologue. |
Yiheng Xu; Zekun Wang; Junli Wang; Dunjie Lu; Tianbao Xie; Amrita Saha; Doyen Sahoo; Tao Yu; Caiming Xiong; |
382 | A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the mixture of LoRAs (MoE-LoRA) still exhibits low robustness during tuning and inference. Inspired by Riemannian Preconditioners, which train LoRA as a sub-space projector, we propose a new training strategy for MoE-LoRA to stabilize and boost its feature learning via gate-rescaled multi-space projections. |
Mengyang Sun; Yihao Wang; Tao Feng; Dan Zhang; Yifan Zhu; Jie Tang; |
383 | Simultaneous Multi-Robot Motion Planning with Projected Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these challenges, this work proposes **S**imultaneous **M**RMP **D**iffusion (SMD), a novel approach integrating constrained optimization into the diffusion sampling process to produce collision-free, kinematically feasible trajectories. Additionally, the paper introduces a comprehensive MRMP benchmark to evaluate trajectory planning algorithms across scenarios with varying robot densities, obstacle complexities, and motion constraints. |
Jinhao Liang; Jacob K Christopher; Sven Koenig; Ferdinando Fioretto; |
384 | Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that *sparse coding* offers a compelling alternative for achieving adaptive representation with minimal overhead and higher fidelity. |
Tiansheng Wen; Yifei Wang; Zequn Zeng; Zhong Peng; Yudi Su; Xinyang Liu; Bo Chen; Hongwei Liu; Stefanie Jegelka; Chenyu You; |
385 | ETTA: Elucidating The Design Space of Text-to-Audio Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our contributions include: 1) AF-Synthetic, a large dataset of high quality synthetic captions obtained from an audio understanding model; 2) a systematic comparison of different architectural, training, and inference design choices for TTA models; 3) an analysis of sampling methods and their Pareto curves with respect to generation quality and inference speed. |
Sang-gil Lee; Zhifeng Kong; Arushi Goel; Sungwon Kim; Rafael Valle; Bryan Catanzaro; |
386 | Learning In-context $n$-grams with Transformers: Sub-$n$-grams Are Near-Stationary Points Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this article, we explore the loss landscape of next-token prediction with transformers. |
Aditya Varre; Gizem Yüce; Nicolas Flammarion; |
387 | Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By taking an information-theoretic perspective, we propose the first theoretical framework that enables the quantification of the maximum risk of MLLMs under distribution shifts. |
Changdae Oh; Zhen Fang; Shawn Im; Xuefeng Du; Yixuan Li; |
388 | Universal Length Generalization with Turing Programs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on prior scratchpad and Chain-of-Thought (CoT) techniques, we propose *Turing Programs*, a novel CoT strategy that decomposes an algorithmic task into steps mimicking the computation of a Turing Machine. |
Kaiying Hou; David Brandfonbrener; Sham M. Kakade; Samy Jelassi; Eran Malach; |
389 | Analytical Construction on Geometric Architectures: Transitioning from Static to Temporal Link Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, real-world systems often evolve dynamically, introducing significant challenges in modeling their temporal changes. To overcome this limitation, we propose a unified cross-geometric learning framework for dynamic systems, which synergistically integrates Euclidean and hyperbolic spaces, aligning embedding spaces with structural properties through fine-grained substructure modeling. |
Yadong Sun; Xiaofeng Cao; Ivor Tsang; Heng Tao Shen; |
390 | Federated Incomplete Multi-view Clustering with Globally Fused Graph Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, the missing-data problem in the federated multi-view clustering task is less explored. To address these problems, we propose a novel Federated Incomplete Multi-view Clustering method with globally Fused Graph guidance (FIMCFG). |
Guoqing Chao; Zhenghao Zhang; Lei Meng; Jie Wen; Dianhui Chu; |
391 | Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model Is Secretly A GAN Discriminator Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While likelihood-based generative models, particularly diffusion and autoregressive models, have achieved remarkable fidelity in visual generation, the maximum likelihood estimation (MLE) objective, which minimizes the forward KL divergence, inherently suffers from a mode-covering tendency that limits the generation quality under limited model capacity. In this work, we propose Direct Discriminative Optimization (DDO) as a unified framework that integrates likelihood-based generative training and GAN-type discrimination to bypass this fundamental constraint by exploiting reverse KL and self-generated negative signals. |
Kaiwen Zheng; Yongxin Chen; Huayu Chen; Guande He; Ming-Yu Liu; Jun Zhu; Qinsheng Zhang; |
392 | OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce OWLS, an open-access, reproducible suite of multilingual speech recognition and translation models spanning 0.25B to 18B parameters, with the 18B version being the largest speech model, to the best of our knowledge. |
William Chen; Jinchuan Tian; Yifan Peng; Brian Yan; Chao-Han Huck Yang; Shinji Watanabe; |
393 | Improving Out-of-Distribution Detection Via Dynamic Covariance Calibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that the influence of ill-distributed samples can be corrected by dynamically adjusting the prior geometry in response to new data. |
Kaiyu Guo; Zijian Wang; Tan Pan; Brian C. Lovell; Mahsa Baktashmotlagh; |
394 | Field Matching: An Electrostatic Paradigm to Generate and Transfer Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Electrostatic Field Matching (EFM), a novel method that is suitable for both generative modelling and distribution transfer tasks. |
Alexander Kolesov; S. I. Manukhov; Vladimir Vladimirovich Palyulin; Alexander Korotin; |
395 | Categorical Schrödinger Bridge Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a theoretical and algorithmic foundation for solving SB in discrete spaces using the recently introduced Iterative Markovian Fitting (IMF) procedure. |
Grigoriy Ksenofontov; Alexander Korotin; |
396 | Quantifying Memory Utilization with Effective State-Size Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As the space of causal sequence modeling architectures continues to grow, the need to develop a general framework for their analysis becomes increasingly important. With this aim, we draw insights from classical signal processing and control theory, to develop a quantitative measure of *memory utilization*: the internal mechanisms through which a model stores past information to produce future outputs. |
Rom Parnichkun; Neehal Tumma; Armin W Thomas; Alessandro Moro; Qi An; Taiji Suzuki; Atsushi Yamashita; Michael Poli; Stefano Massaroli; |
397 | FlexTok: Resampling Images Into 1D Token Sequences of Flexible Length Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce FlexTok, a tokenizer that projects 2D images into variable-length, ordered 1D token sequences. |
Roman Bachmann; Jesse Allardice; David Mizrahi; Enrico Fini; Oğuzhan Fatih Kar; Elmira Amirloo; Alaaeldin El-Nouby; Amir Zamir; Afshin Dehghan; |
398 | GMAIL: Generative Modality Alignment for Generated Image Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework for discriminative use of generated images, coined *GMAIL*, that explicitly treats generated images as a separate modality from real images. |
Shentong Mo; Sukmin Yun; |
399 | Geometry Informed Tokenization of Molecules for Language Model Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although tokenization methods exist for molecular graphs, those for 3D geometries are largely unexplored. Here, we attempt to bridge this gap by proposing a novel method that converts molecular geometries into SE(3)-invariant 1D discrete sequences. |
Xiner Li; Limei Wang; Youzhi Luo; Carl Edwards; Shurui Gui; Yuchao Lin; Heng Ji; Shuiwang Ji; |
400 | Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose **Ca2-VDM**, an efficient autoregressive VDM with **Ca**usal generation and **Ca**che sharing. |
Kaifeng Gao; Jiaxin Shi; Hanwang Zhang; Chunping Wang; Jun Xiao; Long Chen; |
401 | AutoEval Done Right: Using Synthetic Data for Model Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We suggest efficient and statistically principled algorithms for this purpose that improve sample efficiency while remaining unbiased. |
Pierre Boyeau; Anastasios Nikolas Angelopoulos; Tianle Li; Nir Yosef; Jitendra Malik; Michael I. Jordan; |
402 | SUICA: Learning Super-high Dimensional Sparse Implicit Neural Representations for Spatial Transcriptomics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we model ST in a continuous and compact manner with the proposed tool, SUICA, which leverages the strong approximation capability of Implicit Neural Representations (INRs) to enhance both spatial density and gene expression. |
Qingtian Zhu; Yumin Zheng; Yuling Sang; Yifan Zhan; Ziyan Zhu; Jun Ding; Yinqiang Zheng; |
403 | Catch Your Emotion: Sharpening Emotion Perception in Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on improving the ability of MLLMs to capture emotions during the inference phase. |
Yiyang Fang; Jian Liang; Wenke Huang; He Li; Kehua Su; Mang Ye; |
404 | CAN: Leveraging Clients As Navigators for Generative Replay in Federated Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the potential benefits that come from emphasizing the role of clients throughout the process. |
Xuankun Rong; Jianshu Zhang; Kun He; Mang Ye; |
405 | Splitting with Importance-aware Updating for Heterogeneous Federated Learning with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key insight is decomposing client updates into consensus and divergence components, enabling the model to maintain core capabilities while adapting to domain-specific knowledge. We propose a novel federated learning framework called **FedICU** (Splitting with **I**mportan**C**e-aware **U**pdating for Heterogeneous **Fed**erated Learning with Large Language Models), which introduces an aggregation mechanism that dynamically balances these components based on their contribution to global model performance, while implementing an importance-aware parameter updating strategy to prevent catastrophic forgetting and domain overfitting. |
Yangxu Liao; Wenke Huang; Guancheng Wan; Jian Liang; Bin Yang; Mang Ye; |
406 | FedPHA: Federated Prompt Learning for Heterogeneous Client Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose **Fed**erated **P**rompt Learning for **H**eterogeneous Client **A**daptation (FedPHA), a novel framework that combines a fixed-length global prompt for efficient aggregation with local prompts of varying lengths to capture client-specific data characteristics. |
Chengying Fang; Wenke Huang; Guancheng Wan; Yihao Yang; Mang Ye; |
407 | GHOST: Generalizable One-Shot Federated Graph Learning with Proxy-Based Topology Knowledge Retention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these issues, we introduce **GHOST**, an innovative one-shot FGL framework. In GHOST, we establish a proxy model for each client to leverage diverse local knowledge and integrate it to train the global model. |
Jiaru Qian; Guancheng Wan; Wenke Huang; Guibin Zhang; Yuxin Wu; Bo Du; Mang Ye; |
408 | Great Models Think Alike and This Undermines AI Oversight Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study how model similarity affects both aspects of AI oversight by proposing *Chance Adjusted Probabilistic Agreement (CAPA)*, a metric for LM similarity based on overlap in model mistakes. |
Shashwat Goel; Joschka Strüber; Ilze Amanda Auzina; Karuna K Chandra; Ponnurangam Kumaraguru; Douwe Kiela; Ameya Prabhu; Matthias Bethge; Jonas Geiping; |
409 | Enhancing Foundation Models for Time Series Forecasting Via Wavelet-based Tokenization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Tokenization is a crucial consideration in this effort: what is an effective discrete vocabulary for a real-valued sequential input? To address this question, we develop WaveToken, a wavelet-based tokenizer that allows models to learn complex representations directly in the space of time-localized frequencies. |
Luca Masserano; Abdul Fatir Ansari; Boran Han; Xiyuan Zhang; Christos Faloutsos; Michael W. Mahoney; Andrew Gordon Wilson; Youngsuk Park; Syama Sundar Rangapuram; Danielle C. Maddix; Bernie Wang; |
410 | Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in The Turnstile Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we give the first sublinear-space differentially private algorithms for the fundamental problem of counting distinct elements in the turnstile streaming model. |
Rachel Cummings; Alessandro Epasto; Jieming Mao; Tamalika Mukherjee; Tingting Ou; Peilin Zhong; |
411 | SNS-Bench: Defining, Building, and Assessing Capabilities of Large Language Models in Social Networking Services Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce SNS-Bench, specially constructed to assess the abilities of large language models across different Social Networking Services, covering a wide range of SNS-related information. |
Hongcheng Guo; Wangyue; Shaosheng Cao; Fei Zhao; Boyang Wang; Lei Li; Liang Chen; Xinze Lyu; Zhe Xu; Yao Hu; Zhoujun Li; |
412 | Improving Model Alignment Through Collective Intelligence of Open-Source Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Constructing such datasets is often expensive and hard to scale, and may face potential limitations on diversity and generalization. To address these challenges, we introduce Mixture of Agents Alignment (MoAA), which leverages the collective strengths of various language models to provide high-quality data for model alignment. |
Junlin Wang; Roy Xie; Shang Zhu; Jue WANG; Ben Athiwaratkun; Bhuwan Dhingra; Shuaiwen Leon Song; Ce Zhang; James Zou; |
413 | Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce ERank, Top-k Ranking Correlation, and Energy to systematize analysis, and present the Agent Compression Benchmark (ACBench), the first comprehensive benchmark for evaluating how compression impacts LLMs’ agentic abilities. |
Peijie Dong; Zhenheng Tang; Xiang Liu; Lujun Li; Xiaowen Chu; Bo Li; |
414 | LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this limitation, we introduce a new task called interactive person re-identification (Inter-ReID). To facilitate the study of this new task, we construct a dialogue dataset that incorporates multiple types of questions by decomposing fine-grained attributes of individuals. |
Yiding Lu; Mouxing Yang; Dezhong Peng; Peng Hu; Yijie Lin; Xi Peng; |
415 | Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In the spirit of advancing neural PDE solvers to real industrial applications, we present Transolver++, a highly parallel and efficient neural solver that can accurately solve PDEs on million-scale geometries. |
Huakun Luo; Haixu Wu; Hang Zhou; Lanxiang Xing; Yichen Di; Jianmin Wang; Mingsheng Long; |
416 | Trajectory World Models for Heterogeneous Environments Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore pre-training world models for heterogeneous environments by addressing key transfer barriers in both data diversity and model flexibility. |
Shaofeng Yin; Jialong Wu; Siqiao Huang; Xingjian Su; Xu He; Jianye HAO; Mingsheng Long; |
417 | Partially Observable Reinforcement Learning with Memory Traces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce *memory traces*. |
Onno Eberhard; Michael Muehlebach; Claire Vernade; |
418 | You Always Recognize Me (YARM): Robust Texture Synthesis Against Multi-View Corruption Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the use of warning colors and camouflage in the real world, we propose designing a robust appearance that can enhance model recognition of low-quality image data. |
Weihang Ran; Wei Yuan; Yinqiang Zheng; |
419 | ProofAug: Efficient Neural Theorem Proving Via Fine-grained Proof Structure Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, for proof synthesis with LLMs, previous work applies automation tools either only when explicitly invoked by the model or at a single granularity level, failing to fully exploit their power. To solve this issue, we propose ProofAug, a procedure that equips LLMs with automation methods at various granularities through fine-grained structure analysis of model-generated proof proposals. |
Haoxiong Liu; Jiacheng Sun; Zhenguo Li; Andrew C Yao; |
420 | Visual Generation Without Guidance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to build visual models that are free from guided sampling. |
Huayu Chen; Kai Jiang; Kaiwen Zheng; Jianfei Chen; Hang Su; Jun Zhu; |
421 | RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to improve conditional DDPMs for signal restoration by leveraging a more informative prior that is jointly learned with the diffusion model. |
Ching-Hua Lee; Chouchang Yang; Jaejin Cho; Yashas Malur Saidutta; Rakshith Sharma Srinivasa; Yilin Shen; Hongxia Jin; |
422 | Network Sparsity Unlocks The Scaling Potential of Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead of pursuing more complex modifications, we show that introducing static network sparsity alone can unlock further scaling potential beyond their dense counterparts with state-of-the-art architectures. |
Guozheng Ma; Lu Li; Zilin Wang; Li Shen; Pierre-Luc Bacon; Dacheng Tao; |
423 | Interpreting The Repeated Token Phenomenon in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This unexplained failure mode represents a *vulnerability*, allowing even end users to diverge models away from their intended behavior. We aim to explain the causes for this phenomenon and link it to the concept of attention sinks, an emergent LLM behavior crucial for fluency, in which the initial token receives disproportionately high attention scores. |
Itay Yona; Ilia Shumailov; Jamie Hayes; Yossi Gandelsman; |
424 | Sanity Checking Causal Representation Learning on A Simple Real-World System Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We evaluate methods for causal representation learning (CRL) on a simple, real-world system where these methods are expected to work. |
Juan L. Gamella; Simon Bing; Jakob Runge; |
425 | Kernel-based Unsupervised Embedding Alignment for Enhanced Visual Representation in Vision-language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel kernel-based method to align CLIP’s visual representation with that of DINOv2, ensuring that the resulting embeddings maintain compatibility with text embeddings while enhancing perceptual capabilities. |
Shizhan Gong; Yankai Jiang; Qi Dou; Farzan Farnia; |
426 | GTR: A General, Multi-View, and Dynamic Framework for Trajectory Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose GTR, a general, multi-view, and dynamic Trajectory Representation framework built on a pre-train and fine-tune architecture. |
Xiangheng Wang; Ziquan Fang; Chenglong Huang; Danlei Hu; Lu Chen; Yunjun Gao; |
427 | Learning to (Learn at Test Time): RNNs with Expressive Hidden States Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a practical framework for instantiating sequence modeling layers with linear complexity and expressive hidden states. |
Yu Sun; Xinhao Li; Karan Dalal; Jiarui Xu; Arjun Vikram; Genghan Zhang; Yann Dubois; Xinlei Chen; Xiaolong Wang; Sanmi Koyejo; Tatsunori Hashimoto; Carlos Guestrin; |
428 | Avoiding Spurious Sharpness Minimization Broadens Applicability of SAM Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the discrepancy across domains and find that in the NLP setting, SAM is dominated by regularization of the logit statistics — instead of improving the geometry of the function itself. We use this observation to develop an alternative algorithm we call Functional SAM, which regularizes curvature only through modification of the statistics of the overall function implemented by the neural network, and avoids spurious minimization through logit manipulation. |
Sidak Pal Singh; Hossein Mobahi; Atish Agarwala; Yann Dauphin; |
429 | Mind Your Step (by Step): Chain-of-Thought Can Reduce Performance on Tasks Where Thinking Makes Humans Worse Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we seek to identify the characteristics of tasks where CoT reduces performance by drawing inspiration from cognitive psychology, focusing on six representative tasks from the psychological literature where deliberation hurts performance in humans. |
Ryan Liu; Jiayi Geng; Addison J. Wu; Ilia Sucholutsky; Tania Lombrozo; Thomas L. Griffiths; |
430 | AnyEdit: Edit Any Knowledge Encoded in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These limitations arise from their reliance on editing a single token’s hidden state, a limitation we term the “efficacy barrier”. To solve this, we propose **AnyEdit**, a new autoregressive editing paradigm. |
Houcheng Jiang; Junfeng Fang; Ningyu Zhang; Mingyang Wan; Guojun Ma; Xiang Wang; Xiangnan He; Tat-Seng Chua; |
431 | Average Sensitivity of Hierarchical $k$-Median Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the hierarchical $k$-median clustering problem, which bridges hierarchical and centroid-based clustering while offering theoretical appeal, practical utility, and improved interpretability. |
Shijie Li; Weiqiang He; Ruobing Bai; Pan Peng; |
432 | The Power of Random Features and The Limits of Distribution-Free Gradient Descent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the relationship between gradient-based optimization of parametric models (e.g., neural networks) and optimization of linear combinations of random features. |
Ari Karchmer; Eran Malach; |
433 | Robust Conformal Outlier Detection Under Contaminated Reference Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This conservativeness, however, typically results in a loss of power. To alleviate this limitation, we propose a novel, active data-cleaning framework that leverages a limited labeling budget and an outlier detection model to selectively annotate data points in the contaminated reference set that are suspected as outliers. |
Meshi Bashari; Matteo Sesia; Yaniv Romano; |
434 | Layer By Layer: Uncovering Hidden Representations in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, our analysis shows that intermediate layers can encode even richer representations, often improving performance on a wide range of downstream tasks. To explain and quantify these hidden-layer properties, we propose a unified framework of representation quality metrics based on information theory, geometry, and invariance to input perturbations. |
Oscar Skean; Md Rifat Arefin; Dan Zhao; Niket Nikul Patel; Jalal Naghiyev; Yann LeCun; Ravid Shwartz-Ziv; |
435 | Theoretical Guarantees on The Best-of-n Alignment Policy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A commonly used analytical expression in the literature claims that the KL divergence between the best-of-$n$ policy and the reference policy is equal to $\log(n) - (n-1)/n$. We disprove the validity of this claim, and show that it is an upper bound on the actual KL divergence. |
Ahmad Beirami; Alekh Agarwal; Jonathan Berant; Alexander Nicholas D’Amour; Jacob Eisenstein; Chirag Nagpal; Ananda Theertha Suresh; |
436 | Train for The Worst, Plan for The Best: Understanding Token Ordering in Masked Diffusions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we closely examine these two competing effects. On the training front, we theoretically and empirically demonstrate that MDMs indeed train on computationally intractable subproblems compared to their autoregressive counterparts. On the inference front, we show that a suitable strategy for adaptively choosing the token decoding order significantly enhances the capabilities of MDMs, allowing them to sidestep hard subproblems. |
Jaeyeon Kim; Kulin Shah; Vasilis Kontonis; Sham M. Kakade; Sitan Chen; |
437 | In-Context Learning and Occam’s Razor Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In particular, we show that the next-token prediction loss used to train in-context learners is directly equivalent to a data compression technique called prequential coding, and that minimizing this loss amounts to jointly minimizing both the training error and the complexity of the model that was implicitly learned from context. Our theory and the empirical experiments we use to support it not only provide a normative account of in-context learning, but also elucidate the shortcomings of current in-context learning methods, suggesting ways in which they can be improved. |
Eric Elmoznino; Tom Marty; Tejas Kasetty; Leo Gagnon; Sarthak Mittal; Mahan Fathi; Dhanya Sridhar; Guillaume Lajoie; |
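For readers unfamiliar with prequential coding, a one-line restatement of the equivalence described above, in assumed notation (a sketch, not the paper's exact statement):

```latex
\[
L_{\mathrm{preq}}(x_{1:T}) \;=\; \sum_{t=1}^{T} -\log p_\theta\!\left(x_t \mid x_{<t}\right),
\]
```

so the cumulative next-token log-loss is exactly the code length needed to transmit the sequence with the model learned so far, tying low training loss to low implicit model complexity.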
438 | Towards A Formal Theory of Representational Compositionality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, while we have strong intuitions about what compositionality is, we lack satisfying formal definitions for it. Here, we propose such a definition called representational compositionality that is conceptually simple, quantitative, and grounded in algorithmic information theory. |
Eric Elmoznino; Thomas Jiralerspong; Yoshua Bengio; Guillaume Lajoie; |
439 | How to Evaluate and Mitigate IP Infringement in Visual Generative AI? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In detail, we develop a revised generation paradigm that can identify potentially infringing generated content and prevent IP infringement by utilizing guidance techniques during the diffusion process. |
Zhenting Wang; Chen Chen; Vikash Sehwag; Minzhou Pan; Lingjuan Lyu; |
440 | Securing Equal Share: A Principled Approach for Learning Multiplayer Symmetric Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper examines multiplayer symmetric constant-sum games with more than two players in a competitive setting, such as Mahjong, Poker, and various board and video games. |
Jiawei Ge; Yuanhao Wang; Wenzhe Li; Chi Jin; |
441 | Large Continual Instruction Assistant Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a general continual instruction tuning framework to address the challenge. |
Jingyang Qiao; Zhizhong Zhang; Xin Tan; Yanyun Qu; Shouhong Ding; Yuan Xie; |
442 | The Importance of Being Lazy: Scaling Limits of Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we perform a systematic study on the impact of model scale and the degree of feature learning in continual learning. |
Jacopo Graldi; Alessandro Breccia; Giulia Lanzillotta; Thomas Hofmann; Lorenzo Noci; |
443 | Wyckoff Transformer: Generation of Symmetric Crystals Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this is often inadequately addressed by existing generative models, making the consistent generation of stable and symmetrically valid crystal structures a significant challenge. We introduce WyFormer, a generative model that directly tackles this by formally conditioning on space group symmetry. |
Nikita Kazeev; Wei Nong; Ignat Romanov; Ruiming Zhu; Andrey E Ustyuzhanin; Shuya Yamazaki; Kedar Hippalgaonkar; |
444 | Neurosymbolic World Models for Sequential Decision Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Structured World Modeling for Policy Optimization (SWMPO), a framework for unsupervised learning of neurosymbolic Finite State Machines (FSM) that capture environmental structure for policy optimization. |
Leonardo Hernandez Cano; Maxine Perroni-Scharf; Neil Dhir; Arun Ramamurthy; Armando Solar-Lezama; |
445 | Geometry-Informed Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce geometry-informed neural networks (GINNs) — a framework for training shape-generative neural fields without data by leveraging user-specified design requirements in the form of objectives and constraints. |
Arturs Berzins; Andreas Radler; Eric Volkmann; Sebastian Sanokowski; Sepp Hochreiter; Johannes Brandstetter; |
446 | Stochastic Forward–Backward Deconvolution: Training Diffusion Models with Finite Noisy Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, through the lens of deconvolution theory, we show that although it is theoretically feasible to learn the data distribution from noisy samples, the practical challenge of collecting sufficient samples makes successful learning nearly unattainable. To overcome this limitation, we propose to pretrain the model with a small fraction of clean data to guide the deconvolution process. |
Haoye Lu; Qifan Wu; Yaoliang Yu; |
447 | Aligning with Logic: Measuring, Evaluating and Improving Logical Preference Consistency in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we examine \textit{logical preference consistency} as a foundational requirement for building more dependable LLM systems, ensuring stable and coherent decision-making while minimizing erratic or contradictory outputs. |
Yinhong Liu; Zhijiang Guo; Tianya Liang; Ehsan Shareghi; Ivan Vulić; Nigel Collier; |
448 | Think Smarter Not Harder: Adaptive Reasoning with Inference Aware Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a way to allow models to be aware of inference budgets by formulating it as utility maximization with respect to an inference budget constraint, hence naming our algorithm Inference Budget-Constrained Policy Optimization (IBPO). |
Zishun Yu; Tengyu Xu; Di Jin; Karthik Abinav Sankararaman; Yun He; Wenxuan Zhou; Zhouhao Zeng; Eryk Helenowski; Chen Zhu; Sinong Wang; Hao Ma; Han Fang; |
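In outline, the IBPO formulation above is a constrained policy optimization problem; the symbols below are illustrative assumptions rather than the paper's notation:

```latex
\[
\max_{\pi}\;\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}\big[\, U(x, y) \,\big]
\quad \text{s.t.} \quad
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}\big[\, c(y) \,\big] \;\le\; B,
\]
```

where $c(y)$ might count generated reasoning tokens and $B$ is the per-query inference budget.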
449 | Testing The Limits of Fine-Tuning for Improving Visual Cognition in Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In an effort to improve visual cognition and align models with human behavior, we introduce visual stimuli and human judgments on visual cognition tasks, allowing us to systematically evaluate performance across cognitive domains under a consistent environment. |
Luca M. Schulze Buschoff; Konstantinos Voudouris; Elif Akata; Matthias Bethge; Joshua B. Tenenbaum; Eric Schulz; |
450 | CLOVER: Cross-Layer Orthogonal Vectors Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Decoder-only models generate tokens autoregressively by caching key/value vectors, but as the cache grows, inference becomes memory-bound. To address this challenge, we introduce CLOVER (Cross-Layer Orthogonal Vectors) pruning, a novel approach that treats pairs of components of the attention mechanism as low-rank decompositions. |
Fanxu Meng; Pingzhi Tang; Fan Jiang; Muhan Zhang; |
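A minimal sketch of the low-rank view named above, assuming the query/key pair is the component pair being treated as a decomposition; the SVD-based re-orthogonalization and truncation here are illustrative, not the paper's exact algorithm:

```python
import torch

def orthogonal_qk_prune(W_q: torch.Tensor, W_k: torch.Tensor, rank: int):
    """View (W_q, W_k) as a low-rank factorization of W_q @ W_k.T,
    re-orthogonalize via SVD, and keep the top-`rank` directions.
    Illustrative sketch only."""
    U, S, Vh = torch.linalg.svd(W_q @ W_k.T, full_matrices=False)
    r = min(rank, S.numel())
    scale = S[:r].sqrt()
    W_q_new = U[:, :r] * scale   # (d_model, r)
    W_k_new = Vh[:r].T * scale   # (d_model, r)
    return W_q_new, W_k_new      # W_q_new @ W_k_new.T ≈ rank-r W_q @ W_k.T

W_q, W_k = torch.randn(512, 64), torch.randn(512, 64)
Wq2, Wk2 = orthogonal_qk_prune(W_q, W_k, rank=32)
```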
451 | Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By compressing the spatial size of images, this approach can effectively shorten the token sequence and reduce the computational cost of ViT-like plain architectures. In this work, we aim to thoroughly examine the information loss caused by this patchification-based compressive encoding paradigm and how it affects visual understanding. |
Feng Wang; Yaodong Yu; Wei Shao; Yuyin Zhou; Alan Yuille; Cihang Xie; |
452 | Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Under this constraint, most existing methods for mitigating forgetting are inapplicable. To address this challenge, we propose a *sample weighting scheme for the fine-tuning data* solely based on the pre-trained model’s losses. |
Sunny Sanyal; Hayden Prairie; Rudrajit Das; Ali Kavis; Sujay Sanghavi; |
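A minimal sketch of the recipe above, assuming a softmax-over-negative-loss weighting; the exact functional form in the paper may differ:

```python
import torch

def easy_sample_weights(pretrained_losses: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Turn the *pre-trained* model's per-sample losses into normalized
    fine-tuning weights: lower loss ("easier") -> larger weight.
    The softmax form and temperature are illustrative assumptions."""
    return torch.softmax(-pretrained_losses / temperature, dim=0)

# Usage: reweight the per-sample fine-tuning loss within a batch.
pre_losses = torch.tensor([0.2, 1.5, 0.4, 3.0])   # from the frozen pre-trained model
weights = easy_sample_weights(pre_losses)
ft_losses = torch.tensor([0.9, 1.1, 0.7, 1.3])    # current fine-tuning losses
batch_loss = (weights * ft_losses).sum()
```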
453 | Cradle: Empowering Foundation Agents Towards General Computer Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Cradle, a modular and flexible LMM-powered framework, as a preliminary attempt towards GCC. |
Weihao Tan; Wentao Zhang; Xinrun Xu; Haochong Xia; Ziluo Ding; Boyu Li; Bohan Zhou; Junpeng Yue; Jiechuan Jiang; Yewen Li; Ruyi An; Molei Qin; Chuqiao Zong; Longtao Zheng; YuJie Wu; Xiaoqiang Chai; Yifei Bi; Tianbao Xie; Pengjie Gu; Xiyun Li; Ceyao Zhang; Long Tian; Chaojie Wang; Xinrun Wang; Börje F. Karlsson; Bo An; Shuicheng YAN; Zongqing Lu; |
454 | BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model incorporating latent thinking processes and evaluation signals. |
Han Zhong; Yutong Yin; Shenao Zhang; Xiaojun Xu; Yuanxin Liu; Yifei Zuo; Zhihan Liu; Boyi Liu; Sirui Zheng; Hongyi Guo; Liwei Wang; Mingyi Hong; Zhaoran Wang; |
455 | DPO Meets PPO: Reinforced Token Optimization for RLHF Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the great successes of PPO in the alignment of state-of-the-art closed-source large language models (LLMs), its open-source implementation is still largely sub-optimal, as widely reported by numerous research studies. To address these issues, we introduce a framework that models RLHF problems as a Markov decision process (MDP), enabling the capture of fine-grained token-wise information. |
Han Zhong; Zikang Shan; Guhao Feng; Wei Xiong; Xinle Cheng; Li Zhao; Di He; Jiang Bian; Liwei Wang; |
456 | Representations Shape Weak-to-Strong Generalization: Theoretical Insights and Empirical Predictions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate W2SG through a theoretical lens and show that it can be characterized using kernels derived from the principal components of weak and strong models’ internal representations. |
Yihao Xue; Jiping Li; Baharan Mirzasoleiman; |
457 | Griffin: Towards A Graph-Centric Relational Database Foundation Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Griffin, the first attempt at a foundation model designed specifically for Relational Databases (RDBs). |
Yanbo Wang; Xiyuan Wang; Quan Gan; Minjie Wang; Qibin Yang; David Wipf; Muhan Zhang; |
458 | LLM Data Selection and Utilization Via Dynamic Bi-level Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new Data Weighting Model (DWM) that adjusts the weight of selected data within each batch, achieving dynamic data utilization during LLM training. |
Yang Yu; Kai Han; Hang Zhou; Yehui Tang; Kaiqi Huang; Yunhe Wang; Dacheng Tao; |
459 | Mahalanobis++: Improving OOD Detection Via Feature Normalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While post-hoc methods based on the Mahalanobis distance applied to pre-logit features are among the most effective for ImageNet-scale OOD detection, their performance varies significantly across models. We connect this inconsistency to strong variations in feature norms, indicating severe violations of the Gaussian assumption underlying the Mahalanobis distance estimation. |
Maximilian Müller; Matthias Hein; |
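A minimal sketch of the recipe the title suggests: $\ell_2$-normalize pre-logit features before fitting the class-conditional Gaussian, then score with the standard Mahalanobis distance. The details here (shared covariance, min-over-classes score) follow the common baseline and are assumptions:

```python
import numpy as np

def normalized_mahalanobis_scores(train_feats, train_labels, test_feats, eps=1e-6):
    """OOD score = min over classes of Mahalanobis distance on
    l2-normalized features; higher score = more likely OOD."""
    def l2(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

    ftr, fte = l2(train_feats), l2(test_feats)
    classes = np.unique(train_labels)
    means = np.stack([ftr[train_labels == c].mean(axis=0) for c in classes])
    centered = ftr - means[np.searchsorted(classes, train_labels)]
    cov = centered.T @ centered / len(ftr) + eps * np.eye(ftr.shape[1])
    prec = np.linalg.inv(cov)
    diffs = fte[:, None, :] - means[None, :, :]           # (n_test, n_class, d)
    d2 = np.einsum("ncd,de,nce->nc", diffs, prec, diffs)  # squared distances
    return d2.min(axis=1)
```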
460 | An Augmentation-Aware Theory for Self-Supervised Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fill in the blank, we for the first time propose an augmentation-aware error bound for self-supervised contrastive learning, showing that the supervised risk is bounded not only by the unsupervised risk, but also explicitly by a trade-off induced by data augmentation. |
Jingyi Cui; Hongwei Wen; Yisen Wang; |
461 | Design Considerations in Offline Preference-based RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study how the different design choices made in methods such as DPO, IPO, SLiC and many variants influence the quality of the learned policy, from a theoretical perspective. |
Alekh Agarwal; Christoph Dann; Teodor Vanislavov Marinov; |
462 | The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Analysis of Orthogonal Safety Directions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we discover that safety-aligned behavior is jointly controlled by multi-dimensional directions. |
Wenbo Pan; Zhichao Liu; Qiguang Chen; Xiangyang Zhou; Yu Haining; Xiaohua Jia; |
463 | Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present an automated pipeline that leverages the rich resources of the Art of Problem Solving (AoPS) forum, which predominantly features Olympiad-level problems and community-driven solutions. |
Sadegh Mahdavi; Muchen Li; Kaiwen Liu; Christos Thrampoulidis; Leonid Sigal; Renjie Liao; |
464 | QuEst: Enhancing Estimates of Quantile-Based Distributional Measures Using Model Predictions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present QuEst, a principled framework to merge observed and imputed data to deliver point estimates and rigorous confidence intervals for a wide family of quantile-based distributional measures. |
Zhun Deng; Thomas P Zollo; Benjamin Eyre; Amogh Inamdar; David Madras; Richard Zemel; |
465 | Sort Before You Prune: Improved Worst-Case Guarantees of The DiskANN Family of Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our work, we improve and generalize this analysis as follows: (1) we introduce **sorted** $\alpha$-reachable graphs, and use this notion to obtain a stronger approximation factor of $\frac{\alpha}{\alpha-1}$ for the DiskANN algorithm on Euclidean metrics; (2) we present the **first** worst-case theoretical analysis for the popular **beam-search** algorithm, which is used in practice to search these graphs for $k > 1$ candidate nearest neighbors. |
Siddharth Gollapudi; Ravishankar Krishnaswamy; Kirankumar Shiragur; Harsh Wardhan; |
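To make the improved factor concrete (simple arithmetic on the bound quoted above):

```latex
\[
\frac{\alpha}{\alpha - 1}\bigg|_{\alpha = 2} = 2,
\qquad
\frac{\alpha}{\alpha - 1}\bigg|_{\alpha = 1.5} = 3,
\]
```

so increasing the pruning parameter $\alpha$ drives the approximation factor toward $1$, typically at the cost of keeping more edges in the graph.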
466 | SAE-V: Interpreting Multimodal Models for Enhanced Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, extending SAEs to multimodal settings presents new challenges due to modality fusion and the difficulty of isolating cross-modal representations. To address these challenges, we introduce SAE-V, a mechanistic interpretability framework that extends the SAE paradigm to MLLMs. |
Hantao Lou; Changye Li; Jiaming Ji; Yaodong Yang; |
467 | ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose, for the first time, the dual-learning hypothesis, which posits that LLMs simultaneously learn both the task-relevant latent concepts and backdoor latent concepts within poisoned demonstrations, jointly influencing the probability of model outputs. |
Zhiyao Ren; Siyuan Liang; Aishan Liu; Dacheng Tao; |
468 | Understanding Mode Connectivity Via Parameter Space Symmetry Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new approach to exploring the connectedness of minima using parameter space symmetry. |
Bo Zhao; Nima Dehmamy; Robin Walters; Rose Yu; |
469 | TUMTraf VideoQA: Dataset and Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present TUMTraf VideoQA, a novel dataset and benchmark designed for spatio-temporal video understanding in complex roadside traffic scenarios. |
Xingcheng Zhou; Konstantinos Larintzakis; Hao Guo; Walter Zimmer; Mingyu Liu; Hu Cao; Jiajie Zhang; Venkatnarayanan Lakshminarasimhan; Leah Strand; Alois Knoll; |
470 | Mixture of Experts Made Intrinsically Interpretable Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead of relying on post-hoc methods, we present \textbf{MoE-X}, a mixture-of-experts (MoE) language model designed to be \emph{intrinsically} interpretable. |
Xingyi Yang; Constantin Venhoff; Ashkan Khakzar; Christian Schroeder de Witt; Puneet K. Dokania; Adel Bibi; Philip Torr; |
471 | Normalizing Flows Are Capable Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we demonstrate that NFs are more powerful than previously believed. |
Shuangfei Zhai; Ruixiang ZHANG; Preetum Nakkiran; David Berthelot; Jiatao Gu; Huangjie Zheng; Tianrong Chen; Miguel Ángel Bautista; Navdeep Jaitly; Joshua M. Susskind; |
472 | LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new KV cache optimization paradigm called LaCache, a training-free method for efficient and accurate generative inference of LLMs. |
Dachuan Shi; Yonggan Fu; Xiangchi Yuan; Zhongzhi Yu; Haoran You; Sixu Li; Xin Dong; Jan Kautz; Pavlo Molchanov; Yingyan Celine Lin; |
473 | From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to A Test Set Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we design a method which specializes a prompted Autorater to a given test set, by leveraging historical ratings on the test set to construct in-context learning (ICL) examples. |
Mara Finkelstein; Daniel Deutsch; Parker Riley; Juraj Juraska; Geza Kovacs; Markus Freitag; |
474 | CoSER: Coordinating LLM-Based Persona Simulation of Established Roles Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present CoSER, a collection of a high-quality dataset, open models, and an evaluation protocol towards effective RPLAs of established characters. |
Xintao Wang; Heng Wang; Yifei Zhang; Xinfeng Yuan; Rui Xu; Jen-tse Huang; Siyu Yuan; Haoran Guo; Jiangjie Chen; Shuchang Zhou; Wei Wang; Yanghua Xiao; |
475 | Test-Time Learning for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Test-Time Learning (TTL) paradigm for LLMs, namely TLM, which dynamically adapts LLMs to target domains using only unlabeled test data during testing. We introduce the AdaptEval benchmark for TTL and demonstrate through experiments that TLM improves performance by at least 20% compared to original LLMs on domain knowledge adaptation. |
Jinwu Hu; Zitian Zhang; Guohao Chen; Xutao Wen; Chao Shuai; Wei Luo; Bin Xiao; Yuanqing Li; Mingkui Tan; |
476 | KGMark: A Diffusion Watermark for Knowledge Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a novel clustering-based alignment method to adapt the watermark to spatial variations. |
Hongrui Peng; Haolang Lu; Yuanlong Yu; WeiYe Fu; Kun Wang; Guoshun Nan; |
477 | Modular Duality in Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: An old idea in optimization theory says that since the gradient is a dual vector it may not be subtracted from the weights without first being mapped to the primal space where the weights reside. We take this idea seriously in this paper and construct such a duality map for general neural networks. |
Jeremy Bernstein; Laker Newhouse; |
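A minimal sketch of the idea above for a single linear layer; the particular duality map used below (replacing the gradient with its nearest semi-orthogonal matrix via SVD) is one instance from this line of work and should be read as an assumption, not the paper's full construction:

```python
import torch

def dualize_linear_grad(grad: torch.Tensor) -> torch.Tensor:
    """Map a weight gradient (a dual vector) into the primal space before
    the update; here the map orthogonalizes the gradient, returning
    U @ Vh from its SVD. Illustrative choice of duality map."""
    U, _, Vh = torch.linalg.svd(grad, full_matrices=False)
    return U @ Vh

# Usage in a training step (sketch):
W = torch.randn(256, 128, requires_grad=True)
loss = (W @ torch.randn(128, 32)).square().mean()
loss.backward()
with torch.no_grad():
    W -= 0.01 * dualize_linear_grad(W.grad)
```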
478 | Elucidating The Design Space of Multimodal Protein Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we systematically elucidate the design space of multimodal PLMs to overcome their limitations. |
Cheng-Yen Hsieh; Xinyou Wang; Daiheng Zhang; Dongyu Xue; Fei Ye; Shujian Huang; Zaixiang Zheng; Quanquan Gu; |
479 | Visual Autoregressive Modeling for Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Building upon the tremendous success of autoregressive models in the language domain, we propose \textbf{VARSR}, a novel visual autoregressive modeling framework for image super-resolution (ISR) based on next-scale prediction. Furthermore, we collect large-scale data and design a training process to obtain robust generative priors. |
Yunpeng Qu; Kun Yuan; Jinhua Hao; Kai Zhao; Qizhi Xie; Ming Sun; Chao Zhou; |
480 | Concept-Centric Token Interpretation for Vector-Quantized Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Concept-Oriented Token Explanation (CORTEX), a novel approach for interpreting VQGMs by identifying concept-specific token combinations. |
Tianze Yang; Yucheng Shi; Mengnan Du; Xuansheng Wu; Qiaoyu Tan; Jin Sun; Ninghao Liu; |
481 | DreamDPO: Aligning Text-to-3D Generation with Human Preferences Via Direct Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods often struggle to align generated content with human preferences, limiting their applicability and flexibility. To address these limitations, in this paper, we propose DreamDPO, an optimization-based framework that integrates human preferences into the 3D generation process, through direct preference optimization. |
Zhenglin Zhou; Xiaobo Xia; Fan Ma; Hehe Fan; Yi Yang; Tat-Seng Chua; |
482 | Domain-Adapted Diffusion Model for PROTAC Linker Design Through The Lens of Density Ratio in Chemical Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Direct fine-tuning on limited PROTAC datasets often results in overfitting and poor generalization. In this work, we propose DAD-PROTAC, a domain-adapted diffusion model for PROTAC linker design, which addresses this distribution mismatch in chemical space through density ratio estimation to bridge the gap between small-molecule and PROTAC domains. |
Zixing Song; Ziqiao Meng; José Miguel Hernández-Lobato; |
483 | Discovering Latent Causal Graphs from Spatiotemporal Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present SPACY (SPAtiotemporal Causal discoverY), a novel framework based on variational inference, designed to model latent time series and their causal relationships from spatiotemporal data. |
Kun Wang; Sumanth Varambally; Duncan Watson-Parris; Yian Ma; Rose Yu; |
484 | Learning Vision and Language Concepts for Controllable Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we establish fundamental conditions for learning atomic multimodal concepts and their underlying interactions, with identifiability guarantees. |
Shaoan Xie; Lingjing Kong; Yujia Zheng; Zeyu Tang; Eric P. Xing; Guangyi Chen; Kun Zhang; |
485 | AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the current community suffers from a lack of large-scale datasets with intensive, descriptive emotion annotations, as well as a multimodal-centric framework to maximize the potential of MLLMs for emotion understanding. To address this, we establish a new benchmark for MLLM-based emotion understanding with a novel dataset (MER-Caption) and a new model (AffectGPT). |
Zheng Lian; Haoyu Chen; Lan Chen; Haiyang Sun; Licai Sun; Yong Ren; Zebang Cheng; Bin Liu; Rui Liu; Xiaojiang Peng; Jiangyan Yi; Jianhua Tao; |
486 | OV-MER: Towards Open-Vocabulary Multimodal Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paradigm shift aims to enable models to predict emotions beyond a fixed label space, accommodating a flexible set of categories to better reflect the nuanced spectrum of human emotions. To achieve this, we propose a novel paradigm: *Open-Vocabulary MER (OV-MER)*, which enables emotion prediction without being confined to predefined spaces. |
Zheng Lian; Haiyang Sun; Licai Sun; Haoyu Chen; Lan Chen; Hao Gu; Zhuofan Wen; Shun Chen; Zhang Siyuan; Hailiang Yao; Bin Liu; Rui Liu; Shan Liang; Ya Li; Jiangyan Yi; Jianhua Tao; |
487 | BiMaCoSR: Binary One-Step Diffusion Model Leveraging Flexible Matrix Compression for Real Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nonetheless, it remains impossible to deploy DMs on resource-limited edge devices. To address this problem, we propose BiMaCoSR, which combines binarization and one-step distillation to obtain extreme compression and acceleration. |
Kai Liu; Kaicheng Yang; Zheng Chen; Zhiteng Li; Yong Guo; Wenbo Li; Linghe Kong; Yulun Zhang; |
488 | AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Asymmetric Reduction and Restoration (**AsymRnR**), **a training-free and model-agnostic method to accelerate video DiTs**. |
Wenhao Sun; Rong-Cheng Tu; Jingyi Liao; Zhao Jin; Dacheng Tao; |
489 | Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a practical estimator that decomposes the causal effect into confounded and unconfounded contributions for each intervention variable. |
Armin Kekić; Sergio Hernan Garrido Mejia; Bernhard Schölkopf; |
490 | SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose SpeCache, which takes full advantage of the large and easily expandable CPU memory to offload the complete KV cache, and dynamically fetches KV pairs back in each decoding step based on their importance measured by low-precision KV cache copy in VRAM. |
Shibo Jie; Yehui Tang; Kai Han; Zhi-Hong Deng; Jing Han; |
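A minimal sketch of the mechanism as described above: a low-precision key copy stays in VRAM for cheap importance scoring, the full-precision KV cache lives in CPU memory, and only the top-scoring pairs are fetched back each decoding step. Function names and the dot-product scoring heuristic are assumptions:

```python
import torch

def fetch_important_kv(query, keys_lowprec, kv_cache_cpu, top_k=64):
    """Score cached positions against the low-precision key copy,
    then fetch only the top-k full-precision KV pairs from CPU memory."""
    scores = keys_lowprec @ query                       # cheap VRAM-side scores, (num_cached,)
    idx = scores.topk(min(top_k, scores.numel())).indices
    k = kv_cache_cpu["k"][idx.cpu()].to(query.device)
    v = kv_cache_cpu["v"][idx.cpu()].to(query.device)
    return k, v

# Usage per decoding step (CPU stand-ins for the VRAM/CPU split):
d = 64
query = torch.randn(d)
keys_lowprec = torch.randn(1024, d).half().float()      # simulated low-precision copy
cache = {"k": torch.randn(1024, d), "v": torch.randn(1024, d)}
k, v = fetch_important_kv(query, keys_lowprec, cache)
```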
491 | Retrieval-Augmented Perception: High-resolution Image Perception Meets Visual RAG Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we propose Retrieval-Augmented Perception (RAP), a training-free framework that retrieves and fuses relevant image crops while preserving spatial context using the proposed Spatial-Awareness Layout. |
Wenbin Wang; Yongcheng Jing; Liang Ding; Yingjie Wang; Li Shen; Yong Luo; Bo Du; Dacheng Tao; |
492 | Fast and Low-Cost Genomic Foundation Models Via Outlier Removal Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the challenge of scarce computational resources in genomic modeling, we introduce GERM, a genomic foundation model optimized for accessibility and adaptability. |
Haozheng Luo; Chenghao Qiu; Maojiang Su; Zhihan Zhou; Zoe Mehta; Guo Ye; Jerry Yao-Chieh Hu; Han Liu; |
493 | Multinoulli Extension: A Lossless Yet Effective Probabilistic Framework for Subset Selection Over Partition Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the existing distorted local-search methods are often hindered by their prohibitive query complexities and the rigid requirement for prior knowledge of difficult-to-obtain structural parameters. To overcome these limitations, we introduce a novel algorithm titled **Multinoulli-SCG**, which is not only parameter-free but also achieves the same approximation guarantees as the distorted local-search methods with significantly fewer function evaluations. |
Qixin Zhang; Wei Huang; Can Jin; Puning Zhao; Yao Shu; Li Shen; Dacheng Tao; |
494 | TAROT: Targeted Data Selection Via Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose TAROT, a targeted data selection framework grounded in Optimal Transport theory. |
Lan Feng; Fan Nie; Yuejiang Liu; Alexandre Alahi; |
495 | SAN: Hypothesizing Long-Term Synaptic Development and Neural Engram Mechanism in Scalable Model’s Parameter-Efficient Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Drawing insights from Neural Engrams (NE) in Biological Neural Networks (BNNs), we establish a connection between the low-rank property observed during PEFT’s parameter space shifting and neurobiological mechanisms. |
Gaole Dai; Chun-Kai Fan; Yiming Tang; Zhi Zhang; Yuan Zhang; Yulu Gan; Qizhe Zhang; Cheng-Ching Tseng; Shanghang Zhang; Tiejun Huang; |
496 | Reward-Augmented Data Enhances Direct Preference Alignment of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an effective yet simple data relabeling method that conditions the preference pairs on quality scores to construct a reward-augmented dataset. |
Shenao Zhang; Zhihan Liu; Boyi Liu; Yufeng Zhang; Yingxiang Yang; Yongfei Liu; Liyu Chen; Tao Sun; Zhaoran Wang; |
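A minimal sketch of the score-conditioned relabeling described above; the prompt template and field names are illustrative assumptions:

```python
def reward_augment(pairs):
    """Turn (prompt, chosen, rejected, scores) preference pairs into
    score-conditioned examples, so both responses become usable
    training targets at their respective quality levels."""
    out = []
    for p in pairs:
        for resp, score in ((p["chosen"], p["score_chosen"]),
                            (p["rejected"], p["score_rejected"])):
            out.append({"prompt": f"[quality={score}] {p['prompt']}",
                        "response": resp})
    return out

# Usage
pairs = [{"prompt": "Summarize the paper.", "chosen": "…", "rejected": "…",
          "score_chosen": 9, "score_rejected": 3}]
augmented = reward_augment(pairs)
```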
497 | MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we simplify the process of building an MAS by reframing it as a generative language task, where the input is a user query and the output is a corresponding MAS. |
Rui Ye; Shuo Tang; Rui Ge; Yaxin Du; Zhenfei Yin; Siheng Chen; Jing Shao; |
498 | Complex Wavelet Mutual Information Loss: A Multi-Scale Loss Function for Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite these advancements, most loss functions are still primarily pixel-wise, while regional and boundary-focused loss functions often incur high computational costs or are restricted to small-scale regions. To address this limitation, we propose the complex wavelet mutual information (CWMI) loss, a novel loss function that leverages mutual information from subband images decomposed by a complex steerable pyramid. |
Renhao Lu; |
499 | Learning to Match Unpaired Data with Minimum Entropy Coupling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel method to solve the continuous MEC problem, using well-known generative diffusion models that learn to approximate and minimize the joint entropy through a cooperative scheme, while satisfying a relaxed version of the marginal constraints. |
Mustapha BOUNOUA; Giulio Franzese; Pietro Michiardi; |
500 | Unlocking The Power of SAM 2 for Few-Shot Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately, the foreground (FG) objects in different frames of SAM 2’s video data always share the same identity, while those in FSS have different identities, i.e., the matching step is incompatible. Therefore, we design a Pseudo Prompt Generator to encode pseudo query memory, matching with query features in a compatible way. |
Qianxiong Xu; Lanyun Zhu; Xuanyi Liu; Guosheng Lin; Cheng Long; Ziyue Li; Rui Zhao; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~3,300 papers), please visit Paper Digest: ICML-2025 (Full List).