Paper Digest: ICLR 2025 Papers & Highlights
Note: ICLR-2025 accepted more than 3,700 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested users can read all 3,700 ICLR-2025 papers on a separate page, which takes quite some time to load.
To search for papers presented at ICLR-2025 on a specific topic, use the search by venue (ICLR-2025) service. To summarize the latest research published at ICLR-2025 on a specific topic, use the review by venue (ICLR-2025) service. If you are interested in browsing papers by author, we have a comprehensive list of ~15,000 authors (ICLR-2025). Additionally, you may want to explore our “Best Paper” Digest (ICLR), which lists the most influential ICLR papers since 2018.
This list is created by the Paper Digest Team. Experience the cutting-edge capabilities of Paper Digest, an innovative AI-powered research platform that empowers you to read articles, write articles, get answers, conduct literature reviews and generate research reports.
Try us today and unlock the full potential of our services for free!
TABLE 1: Paper Digest: ICLR 2025 Papers & Highlights
No. | Paper | Author(s) |
---|---|---|
1 | Scaling In-the-Wild Training for Diffusion-based Illumination Harmonization and Editing By Imposing Consistent Light Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Without appropriate constraints, directly training the latest large image models with complex, varied, or in-the-wild data is likely to produce a structure-guided random image generator, rather than achieving the intended goal of precise illumination manipulation. We propose Imposing Consistent Light (IC-Light) transport during training, rooted in the physical principle that the linear blending of an object’s appearances under different illumination conditions is consistent with its appearance under mixed illumination. |
Lvmin Zhang; Anyi Rao; Maneesh Agrawala; |
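The consistency principle cited above is concrete enough to sketch: the appearance rendered under a linear blend of two illuminations should equal the same blend of the appearances rendered under each illumination separately. Below is a minimal, hypothetical Python/NumPy sketch of such a consistency penalty (an illustration under that assumption, not the authors' IC-Light implementation; the `render` function and all names are stand-ins).

```python
import numpy as np

def consistency_penalty(render, obj, light_a, light_b, alpha=0.5):
    """Hypothetical light-transport consistency loss.

    `render(obj, light)` returns an HxWx3 appearance. The linear-blending
    principle says: rendering under the mixed light should equal the blend
    of the two separate renders.
    """
    mixed_light = alpha * light_a + (1 - alpha) * light_b
    blended_appearance = alpha * render(obj, light_a) + (1 - alpha) * render(obj, light_b)
    return np.mean((render(obj, mixed_light) - blended_appearance) ** 2)

# Toy example: a diffuse "renderer" that is linear in the light, so the penalty is ~0.
rng = np.random.default_rng(0)
albedo = rng.random((8, 8, 3))
render = lambda alb, light: alb * light          # appearance = albedo * per-pixel light
light_a, light_b = rng.random((8, 8, 3)), rng.random((8, 8, 3))
print(consistency_penalty(render, albedo, light_a, light_b))  # ~0 for a linear renderer
```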
2 | What Makes Large Language Models Reason in (Multi-Turn) Code Generation? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We thus investigate the effects of a wide range of prompting strategies with a focus on automatic re-prompting over multiple turns and computational requirements. |
Kunhao Zheng; Juliette Decugis; Jonas Gehring; Taco Cohen; benjamin negrevergne; Gabriel Synnaeve; |
3 | PolyPythias: Stability and Outliers Across Fifty Language Model Pre-Training Runs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the PolyPythias, a set of 45 new training runs for the Pythia model suite: 9 new seeds across 5 model sizes, from 14M to 410M parameters, resulting in about 7k new checkpoints that we release. |
Oskar van der Wal; Pietro Lesci; Max Müller-Eberstein; Naomi Saphra; Hailey Schoelkopf; Willem Zuidema; Stella Biderman; |
4 | Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce DeCapBench along with a novel metric, DCScore, specifically designed for detailed captioning tasks. |
Qinghao Ye; Xianhan Zeng; Fu Li; Chunyuan Li; Haoqi Fan; |
5 | Transfusion: Predict The Next Token and Diffuse Images with One Multi-Modal Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Transfusion, a recipe for training a multi-modal model over discrete and continuous data. |
Chunting Zhou; LILI YU; Arun Babu; Kushal Tirumala; Michihiro Yasunaga; Leonid Shamis; Jacob Kahn; Xuezhe Ma; Luke Zettlemoyer; Omer Levy; |
6 | SAM 2: Segment Anything in Images and Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos. |
Nikhila Ravi; Valentin Gabeur; Yuan-Ting Hu; Ronghang Hu; Chaitanya Ryali; Tengyu Ma; Haitham Khedr; Roman Rädle; Chloe Rolland; Laura Gustafson; Eric Mintun; Junting Pan; Kalyan Vasudev Alwala; Nicolas Carion; Chao-Yuan Wu; Ross Girshick; Piotr Dollar; Christoph Feichtenhofer; |
7 | OLMoE: Open Mixture-of-Experts Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). |
Niklas Muennighoff; Luca Soldaini; Dirk Groeneveld; Kyle Lo; Jacob Morrison; Sewon Min; Weijia Shi; Evan Pete Walsh; Oyvind Tafjord; Nathan Lambert; Yuling Gu; Shane Arora; Akshita Bhagia; Dustin Schwenk; David Wadden; Alexander Wettig; Binyuan Hui; Tim Dettmers; Douwe Kiela; Ali Farhadi; Noah A. Smith; Pang Wei Koh; Amanpreet Singh; Hannaneh Hajishirzi; |
8 | (Mis)Fitting Scaling Laws: A Survey of Scaling Law Fitting Techniques in Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, we survey over 50 papers that study scaling trends: while 45 of these papers quantify these trends using a power law, most under-report crucial details needed to reproduce their findings. To mitigate this, we propose a checklist for authors to consider while contributing to scaling law research. |
Margaret Li; Sneha Kudugunta; Luke Zettlemoyer; |
9 | Generative Representational Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT) whereby a large language model is trained to handle both generative and embedding tasks by distinguishing between them through instructions. |
Niklas Muennighoff; Hongjin SU; Liang Wang; Nan Yang; Furu Wei; Tao Yu; Amanpreet Singh; Douwe Kiela; |
10 | Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a complementary approach towards self-improvement where finetuning is applied to a multiagent society of language models. |
Vighnesh Subramaniam; Yilun Du; Joshua B. Tenenbaum; Antonio Torralba; Shuang Li; Igor Mordatch; |
11 | Matryoshka Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the concept of Matryoshka Dolls, we propose Matryoshka Multimodal Models, which learn to represent visual content as nested sets of visual tokens that capture information across multiple coarse-to-fine granularities. |
Mu Cai; Jianwei Yang; Jianfeng Gao; Yong Jae Lee; |
12 | Aligned Datasets Improve Detection of Latent Diffusion-Generated Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Finally, to demonstrate the effectiveness of dataset alignment, we build a detector using images that are not natural objects, and present promising results. Overall, our work identifies the subtle but significant issues that arise when training a fake image detector and proposes a simple and inexpensive solution to address these problems. |
Anirudh Sundara Rajan; Utkarsh Ojha; Jedidiah Schloesser; Yong Jae Lee; |
13 | EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an Auto-regressive Auto-encoder (ArAE) model capable of generating high-quality 3D meshes with up to 4,000 faces at a spatial resolution of $512^3$. |
Jiaxiang Tang; Zhaoshuo Li; Zekun Hao; Xian Liu; Gang Zeng; Ming-Yu Liu; Qinsheng Zhang; |
14 | BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To assess how well LLMs can solve challenging and practical tasks via programs, we introduce BigCodeBench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained tasks. |
Terry Yue Zhuo; Vu Minh Chien; Jenny Chim; Han Hu; Wenhao Yu; Ratnadira Widyasari; Imam Nur Bani Yusuf; Haolan Zhan; Junda He; Indraneil Paul; Simon Brunner; Chen GONG; James Hoang; Armel Randy Zebaze; Xiaoheng Hong; Wen-Ding Li; Jean Kaddour; Ming Xu; Zhihan Zhang; Prateek Yadav; Naman Jain; Alex Gu; Zhoujun Cheng; Jiawei Liu; Qian Liu; Zijian Wang; Binyuan Hui; Niklas Muennighoff; David Lo; Daniel Fried; Xiaoning Du; Harm de Vries; Leandro Von Werra; |
15 | MMTEB: Massive Multilingual Text Embedding Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To circumvent this limitation and to provide a more comprehensive evaluation, we introduce the Massive Multilingual Text Embedding Benchmark (MMTEB) — a large-scale community-driven initiative expanding MTEB to over 500 quality-controlled evaluation tasks across 1,000+ languages. For instance, we introduce a new zero-shot English benchmark that maintains a similar ordering at a fraction of the cost. |
Kenneth Enevoldsen; Isaac Chung; Imene Kerboua; Márton Kardos; Ashwin Mathur; David Stap; Jay Gala; Wissam Siblini; Dominik Krzemiński; Genta Indra Winata; Saba Sturua; Saiteja Utpala; Mathieu Ciancone; Marion Schaeffer; Diganta Misra; Shreeya Dhakal; Jonathan Rystrøm; Roman Solomatin; Ömer Veysel Çağatan; Akash Kundu; Martin Bernstorff; Shitao Xiao; Akshita Sukhlecha; Bhavish Pahwa; Rafał Poświata; Kranthi Kiran GV; Shawon Ashraf; Daniel Auras; Björn Plüster; Jan Philipp Harries; Loïc Magne; Isabelle Mohr; Dawei Zhu; Hippolyte Gisserot-Boukhlef; Tom Aarsen; Jan Kostkan; Konrad Wojtasik; Taemin Lee; Marek Suppa; Crystina Zhang; Roberta Rocca; Mohammed Hamdy; Andrianos Michail; John Yang; Manuel Faysse; Aleksei Vatolin; Nandan Thakur; Manan Dey; Dipam Vasani; Pranjal A Chitale; Simone Tedeschi; Nguyen Tai; Artem Snegirev; Mariya Hendriksen; Michael Günther; Mengzhou Xia; Weijia Shi; Xing Han Lù; Jordan Clive; Gayatri K; Maksimova Anna; Silvan Wehrli; Maria Tikhonova; Henil Shalin Panchal; Aleksandr Abramov; Malte Ostendorff; Zheng Liu; Simon Clematide; Lester James Validad Miranda; Alena Fenogenova; Guangyu Song; Ruqiya Bin Safi; Wen-Ding Li; Alessia Borghini; Federico Cassano; Lasse Hansen; Sara Hooker; Chenghao Xiao; Vaibhav Adlakha; Orion Weller; Siva Reddy; Niklas Muennighoff; |
16 | RouteLLM: Learning to Route LLMs from Preference Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Powerful models offer better results but are expensive, while smaller models are more cost-effective but less capable. To address this trade-off, we introduce a training framework for learning efficient router models that dynamically select between a stronger and weaker LLM during inference. |
Isaac Ong; Amjad Almahairi; Vincent Wu; Wei-Lin Chiang; Tianhao Wu; Joseph E. Gonzalez; M Waleed Kadous; Ion Stoica; |
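The routing idea in the highlight above can be illustrated in a few lines: score a query, answer it with the cheaper model when the score clears a threshold, and escalate otherwise. The sketch below is hypothetical (the scoring function, models, and threshold are stand-ins, not the RouteLLM implementation).

```python
def route(query, router_score, strong_llm, weak_llm, threshold=0.5):
    """Hypothetical cost/quality router (a sketch, not RouteLLM's trained router).

    `router_score(query)` estimates the probability that the weak model suffices;
    queries below the threshold are escalated to the stronger model.
    """
    if router_score(query) >= threshold:
        return weak_llm(query)
    return strong_llm(query)

# Toy usage with stand-in callables (all names here are illustrative).
score = lambda q: 0.9 if len(q.split()) < 10 else 0.2   # pretend short queries are easy
weak = lambda q: f"[weak model] {q}"
strong = lambda q: f"[strong model] {q}"
print(route("What is 2 + 2?", score, strong, weak))
print(route("Prove that the sum of two odd integers is even, step by step.", score, strong, weak))
```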
17 | Proteina: Scaling Flow-based Protein Structure Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To meaningfully quantify performance, we introduce a new set of metrics that directly measure the distributional similarity of generated proteins with reference sets, complementing existing metrics. |
Tomas Geffner; Kieran Didi; Zuobai Zhang; Danny Reidenbach; Zhonglin Cao; Jason Yim; Mario Geiger; Christian Dallago; Emine Kucukbenli; Arash Vahdat; Karsten Kreis; |
18 | ProtComposer: Compositional Protein Structure Generation with 3D Ellipsoids Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop ProtComposer to generate protein structures conditioned on spatial protein layouts that are specified via a set of 3D ellipsoids capturing substructure shapes and semantics. |
Hannes Stark; Bowen Jing; Tomas Geffner; Jason Yim; Tommi Jaakkola; Arash Vahdat; Karsten Kreis; |
19 | Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent studies have shown that the denoising process in (generative) diffusion models can induce meaningful (discriminative) representations inside the model, though the quality of these representations still lags behind those learned through recent self-supervised learning methods. We argue that one main bottleneck in training large-scale diffusion models for generation lies in effectively learning these representations. |
Sihyun Yu; Sangkyung Kwak; Huiwon Jang; Jongheon Jeong; Jonathan Huang; Jinwoo Shin; Saining Xie; |
20 | DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we identify that only a fraction of attention heads, a.k.a. Retrieval Heads, are critical for processing long contexts and require full attention across all tokens. |
Guangxuan Xiao; Jiaming Tang; Jingwei Zuo; junxian guo; Shang Yang; Haotian Tang; Yao Fu; Song Han; |
21 | LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose LiveCodeBench, a comprehensive and contamination-free evaluation of LLMs for code, which collects new problems over time from contests across three competition platforms, Leetcode, Atcoder, and Codeforces. |
Naman Jain; King Han; Alex Gu; Wen-Ding Li; Fanjia Yan; Tianjun Zhang; Sida Wang; Armando Solar-Lezama; Koushik Sen; Ion Stoica; |
22 | JudgeBench: A Benchmark for Evaluating LLM-Based Judges Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing benchmarks primarily focus on a judge’s alignment with human preferences, but often fail to account for more challenging tasks where crowdsourced human preference is a poor indicator of factual and logical correctness. To address this, we propose a novel evaluation framework to objectively evaluate LLM-based judges. |
Sijun Tan; Siyuan Zhuang; Kyle Montgomery; William Yuan Tang; Alejandro Cuadron; Chenguang Wang; Raluca Popa; Ion Stoica; |
23 | BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Real-world tasks require handling intricate interactions, advanced spatial reasoning, long-term planning, and continuous exploration of new strategies—areas in which we lack effective methodologies for comprehensively evaluating these capabilities. To address this gap, we introduce BALROG, a novel benchmark designed to assess the agentic capabilities of LLMs and VLMs through a diverse set of challenging games. |
Davide Paglieri; Bartłomiej Cupiał; Samuel Coward; Ulyana Piterbarg; Maciej Wolczyk; Akbir Khan; Eduardo Pignatelli; Łukasz Kuciński; Lerrel Pinto; Rob Fergus; Jakob Nicolaus Foerster; Jack Parker-Holder; Tim Rocktäschel; |
24 | How to Evaluate Reward Models for RLHF Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new benchmark for reward models that quantifies their ability to produce strong language models through RLHF (Reinforcement Learning from Human Feedback). |
Evan Frick; Tianle Li; Connor Chen; Wei-Lin Chiang; Anastasios Nikolas Angelopoulos; Jiantao Jiao; Banghua Zhu; Joseph E. Gonzalez; Ion Stoica; |
25 | MAVIS: Mathematical Visual Instruction Tuning with An Automatic Data Engine Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This draws forth an urgent demand for an effective training paradigm and a large-scale, comprehensive dataset with detailed CoT rationales, which is challenging to collect and costly to annotate manually. To tackle this issue, we propose MAVIS, a MAthematical VISual instruction tuning pipeline for MLLMs, featuring an automatic data engine to efficiently create mathematical visual datasets. |
Renrui Zhang; Xinyu Wei; Dongzhi Jiang; Ziyu Guo; Yichi Zhang; Chengzhuo Tong; Jiaming Liu; Aojun Zhou; Shanghang Zhang; Peng Gao; Hongsheng Li; |
26 | Self-Boosting Large Language Models with Synthetic Preference Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SynPO, a self-boosting paradigm that leverages synthetic preference data for model alignment. |
Qingxiu Dong; Li Dong; Xingxing Zhang; Zhifang Sui; Furu Wei; |
27 | First-Person Fairness in Chatbots Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The open-ended nature and diverse use-cases of chatbots necessitate novel methods for bias assessment. This paper addresses these challenges by introducing a scalable counterfactual approach to evaluate first-person fairness, meaning fairness toward chatbot users based on demographic characteristics. |
Tyna Eloundou; Alex Beutel; David G. Robinson; Keren Gu; Anna-Luisa Brakman; Pamela Mishkin; Meghan Shah; Johannes Heidecke; Lilian Weng; Adam Tauman Kalai; |
28 | Explore Theory of Mind: Program-guided Adversarial Data Generation for Theory of Mind Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce ExploreToM, the first framework to allow large-scale generation of diverse and challenging theory of mind data for robust training and evaluation. |
Melanie Sclar; Jane Dwivedi-Yu; Maryam Fazel-Zarandi; Yulia Tsvetkov; Yonatan Bisk; Yejin Choi; Asli Celikyilmaz; |
29 | VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce $\textbf{VibeCheck}$, a system for automatically comparing a pair of LLMs by discovering identifying traits of a model (vibes) that are well-defined, differentiating, and user-aligned. |
Lisa Dunlap; Krishna Mandal; Trevor Darrell; Jacob Steinhardt; Joseph E. Gonzalez; |
30 | Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data. |
Alessandro Palma; Till Richter; Hanyi Zhang; Manuel Lubetzki; Alexander Tong; Andrea Dittadi; Fabian J Theis; |
31 | Tamper-Resistant Safeguards for Open-Weight LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop a method, called TAR, for building tamper-resistant safeguards into open-weight LLMs such that adversaries cannot remove the safeguards even after hundreds of steps of fine-tuning. |
Rishub Tamirisa; Bhrugu Bharathi; Long Phan; Andy Zhou; Alice Gatti; Tarun Suresh; Maxwell Lin; Justin Wang; Rowan Wang; Ron Arel; Andy Zou; Dawn Song; Bo Li; Dan Hendrycks; Mantas Mazeika; |
32 | Failures to Find Transferable Image Jailbreaks Between Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on a popular class of vision-language models (VLMs) that generate text outputs conditioned on visual and textual inputs. |
Rylan Schaeffer; Dan Valentine; Luke Bailey; James Chua; Cristobal Eyzaguirre; Zane Durante; Joe Benton; Brando Miranda; Henry Sleight; Tony Tong Wang; John Hughes; Rajashree Agrawal; Mrinank Sharma; Scott Emmons; Sanmi Koyejo; Ethan Perez; |
33 | JudgeLM: Fine-tuned Large Language Models Are Scalable Judges Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this problem, we propose to fine-tune LLMs as scalable judges (JudgeLM) to evaluate LLMs efficiently and effectively in open-ended benchmarks. We first propose a comprehensive, large-scale, high-quality dataset containing task seeds, LLM-generated answers, and GPT-4-generated judgments for fine-tuning high-performance judges, as well as a new benchmark for evaluating the judges. |
Lianghui Zhu; Xinggang Wang; Xinlong Wang; |
34 | Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Studying our agent baselines closely, we identify open problems in building and evaluating research agents, including failures of LLM self-evaluation and their lack of diversity in generation. |
Chenglei Si; Diyi Yang; Tatsunori Hashimoto; |
35 | On Scaling Up 3D Gaussian Splatting Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Grendel, a distributed system designed to partition 3DGS parameters and parallelize computation across multiple GPUs. |
Hexu Zhao; Haoyang Weng; Daohan Lu; Ang Li; Jinyang Li; Aurojit Panda; Saining Xie; |
36 | Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Toward that end, we introduce Cybench, a framework for specifying cybersecurity tasks and evaluating agents on those tasks. |
Andy K Zhang; Neil Perry; Riya Dulepet; Joey Ji; Celeste Menders; Justin W Lin; Eliot Jones; Gashon Hussein; Samantha Liu; Donovan Julian Jasper; Pura Peetathawatchai; Ari Glenn; Vikram Sivashankar; Daniel Zamoshchin; Leo Glikbarg; Derek Askaryar; Haoxiang Yang; Aolin Zhang; Rishi Alluri; Nathan Tran; Rinnara Sangpisit; Kenny O Oseleononmen; Dan Boneh; Daniel E. Ho; Percy Liang; |
37 | Bidirectional Decoding: Improving Action Chunking Via Guided Test-Time Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that action chunking allows the learner to better capture the temporal dependencies in demonstrations but at the cost of reduced reactivity to unexpected states. To address this tradeoff, we propose Bidirectional Decoding (BID), a test-time inference algorithm that bridges action chunking with closed-loop adaptation. |
Yuejiang Liu; Jubayer Ibn Hamid; Annie Xie; Yoonho Lee; Max Du; Chelsea Finn; |
38 | DiscoveryBench: Towards Data-Driven Discovery with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets? To evaluate this question, we present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. |
Bodhisattwa Prasad Majumder; Harshit Surana; Dhruv Agarwal; Bhavana Dalvi Mishra; Abhijeetsingh Meena; Aryan Prakhar; Tirth Vora; Tushar Khot; Ashish Sabharwal; Peter Clark; |
39 | Safety Alignment Should Be Made More Than Just A Few Tokens Deep Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We collectively refer to this issue as shallow safety alignment. In this paper, we present case studies to explain why shallow safety alignment can exist and show how this issue universally contributes to multiple recently discovered vulnerabilities in LLMs, including the susceptibility to adversarial suffix attacks, prefilling attacks, decoding parameter attacks, and fine-tuning attacks. |
Xiangyu Qi; Ashwinee Panda; Kaifeng Lyu; Xiao Ma; Subhrajit Roy; Ahmad Beirami; Prateek Mittal; Peter Henderson; |
40 | On Evaluating The Durability of Safeguards for Open-Weight LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Several recent studies have proposed methods to produce durable LLM safeguards for open-weight LLMs that can withstand adversarial modifications of the model’s weights via fine-tuning. This holds the promise of raising adversaries’ costs even under strong threat models where adversaries can directly fine-tune parameters. However, we caution against over-reliance on such methods in their current state. Through several case studies, we demonstrate that even the evaluation of these defenses is exceedingly difficult and can easily mislead audiences into thinking that safeguards are more durable than they really are. |
Xiangyu Qi; Boyi Wei; Nicholas Carlini; Yangsibo Huang; Tinghao Xie; Luxi He; Matthew Jagielski; Milad Nasr; Prateek Mittal; Peter Henderson; |
41 | Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Agent-to-Sim (ATS), a framework for learning interactive behavior models of 3D agents from casual longitudinal video collections. |
Gengshan Yang; Andrea Bajcsy; Shunsuke Saito; Angjoo Kanazawa; |
42 | From Models to Microtheories: Distilling A Model’s Topical Knowledge for Grounded Question-Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent reasoning methods (e.g., chain-of-thought) help users understand how language models (LMs) answer a single question, but they do little to reveal the LM’s overall understanding, or “theory,” about the question’s topic, making it still hard to trust the model. Our goal is to materialize such theories – here called microtheories (a linguistic analog of logical microtheories) – as a set of sentences encapsulating an LM’s core knowledge about a topic. |
Nathaniel Weir; Bhavana Dalvi Mishra; Orion Weller; Oyvind Tafjord; Sam Hornstein; Alexander Sabol; Peter Jansen; Benjamin Van Durme; Peter Clark; |
43 | T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce sampling Trajectory Stitching (T-Stitch), a simple yet efficient technique to improve the sampling efficiency with little or no generation degradation. |
Zizheng Pan; Bohan Zhuang; De-An Huang; Weili Nie; Zhiding Yu; Chaowei Xiao; Jianfei Cai; Anima Anandkumar; |
44 | LongVILA: Scaling Long-Context Visual Language Models for Long Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce LongVILA, a full-stack solution for long-context visual-language models by co-designing the algorithm and system. |
Yukang Chen; Fuzhao Xue; Dacheng Li; Qinghao Hu; Ligeng Zhu; Xiuyu Li; Yunhao Fang; Haotian Tang; Shang Yang; Zhijian Liu; Yihui He; Hongxu Yin; Pavlo Molchanov; Jan Kautz; Linxi Fan; Yuke Zhu; Yao Lu; Song Han; |
45 | Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Scaling up autoregressive models in vision has not proven as beneficial as in large language models. In this work, we investigate this scaling problem in the context of text-to-image generation, focusing on two critical factors: whether models use discrete or continuous tokens, and whether tokens are generated in a random or fixed raster order using BERT- or GPT-like transformer architectures. |
Lijie Fan; Tianhong Li; Siyang Qin; Yuanzhen Li; Chen Sun; Michael Rubinstein; Deqing Sun; Kaiming He; Yonglong Tian; |
46 | Real2Code: Reconstruct Articulated Objects Via Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Real2Code, a novel approach to reconstructing articulated objects via code generation. |
Zhao Mandi; Yijia Weng; Dominik Bauer; Shuran Song; |
47 | Sequential Controlled Langevin Diffusions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a principled framework for combining SMC with diffusion-based samplers by viewing both methods in continuous time and considering measures on path space. |
Junhua Chen; Lorenz Richter; Julius Berner; Denis Blessing; Gerhard Neumann; Anima Anandkumar; |
48 | The KoLMogorov Test: Compression By Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the *KoLMogorov-Test* (KT), a compression-as-intelligence test for code generation LLMs. |
Ori Yoran; Kunhao Zheng; Fabian Gloeckle; Jonas Gehring; Gabriel Synnaeve; Taco Cohen; |
49 | AutoBencher: Towards Declarative Benchmark Construction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present AutoBencher, a declarative framework for automatic benchmark construction, and use it to scalably discover novel insights and vulnerabilities of existing language models. |
Xiang Lisa Li; Farzaan Kaiyom; Evan Zheran Liu; Yifan Mai; Percy Liang; Tatsunori Hashimoto; |
50 | Robust Representation Consistency Model Via Contrastive Denoising Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While these methods excel at small perturbation radii, they struggle with larger perturbations and incur a significant computational overhead during inference compared to classical methods. To address this, we reformulate the generative modeling task along the diffusion trajectories in pixel space as a discriminative task in the latent space. |
Jiachen Lei; Julius Berner; Jiongxiao Wang; Zhongzhu Chen; Chaowei Xiao; Zhongjie Ba; Kui Ren; Jun Zhu; Anima Anandkumar; |
51 | LiveBench: A Challenging, Contamination-Limited LLM Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a new benchmark for LLMs designed to be resistant to both test set contamination and the pitfalls of LLM judging and human crowdsourcing. |
Colin White; Samuel Dooley; Manley Roberts; Arka Pal; Benjamin Feuer; Siddhartha Jain; Ravid Shwartz-Ziv; Neel Jain; Khalid Saifullah; Sreemanti Dey; Shubh-Agrawal; Sandeep Singh Sandha; Siddartha Venkat Naidu; Chinmay Hegde; Yann LeCun; Tom Goldstein; Willie Neiswanger; Micah Goldblum; |
52 | Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces PANGEA, a multilingual multimodal LLM trained on PANGEAINS, a diverse 6M instruction dataset spanning 39 languages. |
Xiang Yue; Yueqi Song; Akari Asai; Seungone Kim; Jean de Dieu Nyandwi; Simran Khanuja; Anjali Kantharuban; Lintang Sutawika; Sathyanarayanan Ramamoorthy; Graham Neubig; |
53 | Towards Semantic Equivalence of Tokenization in Multimodal LLM Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods aggressively fragment visual input, corrupting the visual semantic integrity. To address this, this paper proposes a novel dynamic Semantic-Equivalent Vision Tokenizer (SeTok), which groups visual features into semantic units via a dynamic clustering algorithm, flexibly determining the number of tokens based on image complexity. |
Shengqiong Wu; Hao Fei; Xiangtai Li; Jiayi Ji; Hanwang Zhang; Tat-Seng Chua; Shuicheng YAN; |
54 | ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Diffusion models break down the challenging task of generating data from high-dimensional distributions into a series of easier denoising steps. Inspired by this paradigm, we propose a novel approach that extends the diffusion framework into modality space, decomposing the complex task of RGB image generation into simpler, interpretable stages. |
Eslam Mohamed BAKR; Liangbing Zhao; Vincent Tao Hu; Matthieu Cord; Patrick Perez; Mohamed Elhoseiny; |
55 | Diffusion State-Guided Projected Gradient for Inverse Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enhance the performance and robustness of diffusion models in solving inverse problems, we propose Diffusion State-Guided Projected Gradient (DiffStateGrad), which projects the measurement gradient onto a subspace that is a low-rank approximation of an intermediate state of the diffusion process. |
Rayhan Zirvi; Bahareh Tolooshams; Anima Anandkumar; |
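One way to read the projection step described above, as a rough sketch only (not the authors' code): take a low-rank basis of an intermediate diffusion state via SVD and project the measurement gradient onto it. All shapes and names below are illustrative assumptions.

```python
import numpy as np

def project_onto_lowrank(grad, state, rank):
    """Project a gradient image onto the top-r row space of an intermediate state."""
    U, S, Vt = np.linalg.svd(state, full_matrices=False)
    basis = Vt[:rank]                    # top-r right singular vectors of the state
    return grad @ basis.T @ basis        # keep only the component inside that subspace

rng = np.random.default_rng(0)
state = rng.normal(size=(64, 64))        # stand-in intermediate diffusion state (one channel)
grad = rng.normal(size=(64, 64))         # stand-in measurement-guidance gradient
print(project_onto_lowrank(grad, state, rank=8).shape)
```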
56 | LeanAgent: Lifelong Learning for Formal Theorem Proving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present LeanAgent, a novel lifelong learning framework for formal theorem proving that continuously generalizes to and improves on ever-expanding mathematical knowledge without forgetting previously learned knowledge. |
Adarsh Kumarappan; Mo Tiwari; Peiyang Song; Robert Joseph George; Chaowei Xiao; Anima Anandkumar; |
57 | Improving Pretraining Data Using Perplexity Correlations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, progress in understanding pretraining data has been slow due to the costly pretraining runs required for data selection experiments. We present a framework that avoids these costs and selects high-quality pretraining data without any LLM training of our own. |
Tristan Thrush; Christopher Potts; Tatsunori Hashimoto; |
58 | Language Models Scale Reliably with Over-training and on Downstream Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address both shortcomings, we create a testbed of 104 models with 0.011B to 6.9B parameters trained with various numbers of tokens on three data distributions. |
Samir Yitzhak Gadre; Georgios Smyrnis; Vaishaal Shankar; Suchin Gururangan; Mitchell Wortsman; Rulin Shao; Jean Mercat; Alex Fang; Jeffrey Li; Sedrick Keh; Rui Xin; Marianna Nezhurina; Igor Vasiljevic; Luca Soldaini; Jenia Jitsev; Alex Dimakis; Gabriel Ilharco; Pang Wei Koh; Shuran Song; Thomas Kollar; Yair Carmon; Achal Dave; Reinhard Heckel; Niklas Muennighoff; Ludwig Schmidt; |
59 | Synthetic Continued Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This poses a challenge when adapting a pretrained model to a small corpus of domain-specific documents, where each fact may appear rarely or only once. We propose to bridge this gap with synthetic continued pretraining: using the small domain-specific corpus to synthesize a large corpus more amenable to learning, and then performing continued pretraining on the synthesized corpus. |
Zitong Yang; Neil Band; Shuangping Li; Emmanuel Candes; Tatsunori Hashimoto; |
60 | Locality Alignment Improves Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We hypothesize that this is due to VLMs adopting pre-trained vision backbones, specifically vision transformers (ViTs) trained with image-level supervision and minimal inductive biases. Such models may fail to encode the class contents at each position in the image, and our goal is to resolve this with a vision backbone that effectively captures both local and global image semantics. |
Ian Connick Covert; Tony Sun; James Zou; Tatsunori Hashimoto; |
61 | AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite diverse strategies (e.g., cipher, low-resource language, persuasions, and so on) that have been proposed and shown success, these strategies are still manually designed, limiting their scope and effectiveness as a red-teaming tool. In this paper, we propose AutoDAN-Turbo, a black-box jailbreak method that can automatically discover as many jailbreak strategies as possible from scratch, without any human intervention or predefined scopes (e.g., specified candidate strategies), and use them for red-teaming. |
Xiaogeng Liu; Peiran Li; G. Edward Suh; Yevgeniy Vorobeychik; Zhuoqing Mao; Somesh Jha; Patrick McDaniel; Huan Sun; Bo Li; Chaowei Xiao; |
62 | Descent with Misaligned Gradients and Applications to Hidden Convexity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of minimizing a convex objective given access to an oracle that outputs misaligned stochastic gradients, where the expected value of the output is guaranteed to be correlated with, but not necessarily equal to the true gradient of the objective. |
Aditya Bhaskara; Ashok Cutkosky; Ravi Kumar; Manish Purohit; |
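A toy numerical illustration of the oracle model described above (a sketch of the setting, not the paper's algorithm): plain gradient descent on a convex quadratic still makes progress when each oracle output is only positively correlated with the true gradient rather than equal to it.

```python
import numpy as np

rng = np.random.default_rng(0)
d, steps, lr = 10, 2000, 0.01
A = np.diag(rng.uniform(0.5, 2.0, d))                # convex quadratic f(x) = 0.5 x^T A x
f = lambda x: 0.5 * x @ A @ x

def misaligned_oracle(x):
    """Expected output is correlated with, but not equal to, the true gradient A @ x."""
    return 0.3 * (A @ x) + rng.normal(scale=0.1, size=d)

x = rng.normal(size=d)
print("initial objective:", f(x))
for _ in range(steps):
    x -= lr * misaligned_oracle(x)
print("final objective:  ", f(x))                    # much closer to the minimum at 0
```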
63 | MUSE: Machine Unlearning Six-Way Evaluation for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The evaluation of the efficacy of these algorithms has traditionally been narrow in scope, failing to precisely quantify the success and practicality of the algorithm from the perspectives of both the model deployers and the data owners. We address this issue by proposing MUSE, a comprehensive machine unlearning evaluation benchmark that enumerates six diverse desirable properties for unlearned models: (1) no verbatim memorization, (2) no knowledge memorization, (3) no privacy leakage, (4) utility preservation on data not intended for removal, (5) scalability with respect to the size of removal requests, and (6) sustainability over sequential unlearning requests. |
Weijia Shi; Jaechan Lee; Yangsibo Huang; Sadhika Malladi; Jieyu Zhao; Ari Holtzman; Daogao Liu; Luke Zettlemoyer; Noah A. Smith; Chiyuan Zhang; |
64 | MPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce the versatile multi-modal large language model, mPLUG-Owl3, which enhances the capability for long image-sequence understanding in scenarios that incorporate retrieved image-text knowledge, multimodal in-context examples, and lengthy videos. |
Jiabo Ye; Haiyang Xu; Haowei Liu; Anwen Hu; Ming Yan; Qi Qian; Ji Zhang; Fei Huang; Jingren Zhou; |
65 | SpinQuant: LLM Quantization with Learned Rotations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we identify a collection of applicable rotation parameterizations that lead to identical outputs in full-precision Transformer architectures while enhancing quantization accuracy. |
Zechun Liu; Changsheng Zhao; Igor Fedorov; Bilge Soran; Dhruv Choudhary; Raghuraman Krishnamoorthi; Vikas Chandra; Yuandong Tian; Tijmen Blankevoort; |
66 | UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce UniCon, a novel architecture designed to enhance control and efficiency in training adapters for large-scale diffusion models. |
Fanghua Yu; Jinjin Gu; Jinfan Hu; Zheyuan Li; Chao Dong; |
67 | Training Language Models to Self-Correct Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current methods for training self-correction typically depend on either multiple models, a more advanced model, or additional forms of supervision. To address these shortcomings, we develop a multi-turn online reinforcement learning (RL) approach, SCoRe, that significantly improves an LLM’s self-correction ability using entirely self-generated data. |
Aviral Kumar; Vincent Zhuang; Rishabh Agarwal; Yi Su; John D Co-Reyes; Avi Singh; Kate Baumli; Shariq Iqbal; Colton Bishop; Rebecca Roelofs; Lei M Zhang; Kay McKinney; Disha Shrivastava; Cosmin Paduraru; George Tucker; Doina Precup; Feryal Behbahani; Aleksandra Faust; |
68 | 3D-SPATIAL MULTIMODAL MEMORY Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present 3D Spatial MultiModal Memory (M3), a multimodal memory system designed to retain information about medium-sized static scenes through video sources for visual perception. |
Xueyan Zou; Yuchen Song; Ri-Zhao Qiu; Xuanbin Peng; Jianglong Ye; Sifei Liu; Xiaolong Wang; |
69 | CREMA: Generalizable and Efficient Video-Language Reasoning Via Multimodal Modular Fusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite impressive advancements in recent multimodal reasoning approaches, they are still limited in flexibility and efficiency, as these models typically process only a few fixed modality inputs and require updates to numerous parameters. This paper tackles these critical challenges and proposes CREMA, a generalizable, highly efficient, and modular modality-fusion framework that can incorporate many new modalities to enhance video reasoning. |
Shoubin Yu; Jaehong Yoon; Mohit Bansal; |
70 | WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce WildBench, an automated evaluation framework designed to benchmark large language models (LLMs) using challenging, real-world user queries. |
Bill Yuchen Lin; Yuntian Deng; Khyathi Chandu; Abhilasha Ravichander; Valentina Pyatkin; Nouha Dziri; Ronan Le Bras; Yejin Choi; |
71 | Grounding Video Models to Actions Through Goal Conditioned Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate how to directly ground video models to continuous actions through self-exploration in the embodied environment — using generated video states as visual goals for exploration. |
Yunhao Luo; Yilun Du; |
72 | HELMET: How to Evaluate Long-context Models Effectively and Thoroughly Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce HELMET (How to Evaluate Long-context Models Effectively and Thoroughly), a comprehensive benchmark encompassing seven diverse, application-centric categories. |
Howard Yen; Tianyu Gao; Minmin Hou; Ke Ding; Daniel Fleischer; Peter Izsak; Moshe Wasserblat; Danqi Chen; |
73 | Scalable Extraction of Training Data from Aligned, Production Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work highlights the limitations of existing safeguards to prevent training data leakage in production language models. |
Milad Nasr; Javier Rando; Nicholas Carlini; Jonathan Hayase; Matthew Jagielski; A. Feder Cooper; Daphne Ippolito; Christopher A. Choquette-Choo; Florian Tramèr; Katherine Lee; |
74 | SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce SlowFast-VGen, a novel dual-speed learning system for action-driven long video generation. To facilitate the slow learning of an approximate world model, we collect a large-scale dataset of 200k videos with language action annotations, covering a wide range of scenarios. |
Yining Hong; Beide Liu; Maxine Wu; Yuanhao Zhai; Kai-Wei Chang; Linjie Li; Kevin Lin; Chung-Ching Lin; Jianfeng Wang; Zhengyuan Yang; Ying Nian Wu; Lijuan Wang; |
75 | Mix-CPT: A Domain Adaptation Framework Via Decoupling Knowledge Learning and Format Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this method may lead to inefficient knowledge memorization due to a lack of awareness of knowledge utilization during continual pre-training, and it demands that LLMs simultaneously learn knowledge utilization and format alignment under divergent training objectives during fine-tuning. To enhance the domain adaptation of LLMs, we revise this process and propose a new domain adaptation framework including domain knowledge learning and general format alignment, called \emph{Mix-CPT}. |
Jinhao Jiang; Junyi Li; Xin Zhao; Yang Song; Tao Zhang; Ji-Rong Wen; |
76 | SeCom: On Memory Construction and Retrieval for Personalized Conversational Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the impact of different memory granularities and present two key findings: (1) Both turn-level and session-level memory units are suboptimal, affecting not only the quality of final responses, but also the accuracy of the retrieval process. |
Zhuoshi Pan; Qianhui Wu; Huiqiang Jiang; Xufang Luo; Hao Cheng; Dongsheng Li; Yuqing Yang; Chin-Yew Lin; H. Vicky Zhao; Lili Qiu; Jianfeng Gao; |
77 | Magpie: Alignment Data Synthesis from Scratch By Prompting Aligned LLMs with Nothing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a self-synthesis method for generating large-scale alignment data named Magpie. |
Zhangchen Xu; Fengqing Jiang; Luyao Niu; Yuntian Deng; Radha Poovendran; Yejin Choi; Bill Yuchen Lin; |
78 | Selective Attention Improves Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Selective Attention, a simple parameter-free change to the standard attention mechanism which reduces attention to unneeded elements. |
Yaniv Leviathan; Matan Kalman; Yossi Matias; |
79 | AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present AndroidWorld, a fully functional Android environment that provides reward signals for 116 programmatic tasks across 20 real-world Android apps. |
Christopher Rawles; Sarah Clinckemaillie; Yifan Chang; Jonathan Waltz; Gabrielle Lau; Marybeth Fair; Alice Li; William E Bishop; Wei Li; Folawiyo Campbell-Ajala; Daniel Kenji Toyama; Robert James Berry; Divya Tyamagundlu; Timothy P Lillicrap; Oriana Riva; |
80 | Provable Uncertainty Decomposition Via Higher-Order Calibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We give a principled method for decomposing the predictive uncertainty of a model into aleatoric and epistemic components with explicit semantics relating them to the real-world data distribution. |
Gustaf Ahdritz; Aravind Gollakota; Parikshit Gopalan; Charlotte Peale; Udi Wieder; |
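For context on the aleatoric/epistemic split mentioned above, here is the classical entropy-based decomposition for an ensemble of predictors. This is only an illustration of the concept, not the paper's higher-order-calibration method, which provides different, provable semantics; all numbers are made up.

```python
import numpy as np

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12), axis=-1)

def decompose(member_probs):
    """member_probs: (n_members, n_classes) predictions for a single input."""
    total = entropy(member_probs.mean(axis=0))        # entropy of the averaged prediction
    aleatoric = entropy(member_probs).mean()          # average per-member entropy
    return total, aleatoric, total - aleatoric        # epistemic = mutual information

members = np.array([[0.70, 0.20, 0.10],
                    [0.60, 0.30, 0.10],
                    [0.80, 0.10, 0.10],
                    [0.50, 0.40, 0.10],
                    [0.65, 0.25, 0.10]])
total, aleatoric, epistemic = decompose(members)
print(f"total={total:.3f}  aleatoric={aleatoric:.3f}  epistemic={epistemic:.3f}")
```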
81 | LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the Large View Synthesis Model (LVSM), a novel transformer-based approach for scalable and generalizable novel view synthesis from sparse-view inputs. |
Haian Jin; Hanwen Jiang; Hao Tan; Kai Zhang; Sai Bi; Tianyuan Zhang; Fujun Luan; Noah Snavely; Zexiang Xu; |
82 | Advancing LLM Reasoning Generalists with Preference Trees Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce EURUS, a suite of large language models (LLMs) optimized for reasoning. |
Lifan Yuan; Ganqu Cui; Hanbin Wang; Ning Ding; Xingyao Wang; Boji Shan; Zeyuan Liu; Jia Deng; Huimin Chen; Ruobing Xie; Yankai Lin; Zhenghao Liu; Bowen Zhou; Hao Peng; Zhiyuan Liu; Maosong Sun; |
83 | Scaling and Evaluating Sparse Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose using k-sparse autoencoders [Makhzani and Frey, 2013] to directly control sparsity, simplifying tuning and improving the reconstruction-sparsity frontier. |
Leo Gao; Tom Dupre la Tour; Henk Tillman; Gabriel Goh; Rajan Troll; Alec Radford; Ilya Sutskever; Jan Leike; Jeffrey Wu; |
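The k-sparse mechanism named in the highlight is simple to state in code: keep only the k largest latent pre-activations per example and zero out the rest before decoding. A minimal PyTorch sketch follows (illustrative; layer choices and hyperparameters are assumptions, not the released implementation).

```python
import torch
import torch.nn as nn

class KSparseAutoencoder(nn.Module):
    """Autoencoder whose latent keeps only the top-k activations per example."""

    def __init__(self, d_in: int, d_hidden: int, k: int):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_in)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)
        topk = torch.topk(z, self.k, dim=-1)                     # k largest activations
        mask = torch.zeros_like(z).scatter_(-1, topk.indices, 1.0)
        return self.decoder(z * mask)                            # decode the sparse code

x = torch.randn(4, 128)
model = KSparseAutoencoder(d_in=128, d_hidden=512, k=32)
loss = torch.mean((model(x) - x) ** 2)                           # plain reconstruction loss
print(loss.item())
```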
84 | Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce methods for discovering and applying **sparse feature circuits**. |
Samuel Marks; Can Rager; Eric J Michaud; Yonatan Belinkov; David Bau; Aaron Mueller; |
85 | TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A significant remaining issue lies in the major differences between teacher and student models, namely the substantial capacity gap, mode averaging, and mode collapse, which pose barriers during distillation. To address these issues, we introduce $\textit{Temporally Adaptive Interpolated Distillation (TAID)}$, a novel knowledge distillation approach that dynamically interpolates student and teacher distributions through an adaptive intermediate distribution, gradually shifting from the student’s initial distribution towards the teacher’s distribution. |
Makoto Shing; Kou Misaki; Han Bao; Sho Yokoi; Takuya Akiba; |
86 | VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the potential of building universal multimodal embeddings capable of handling a broad range of downstream tasks. |
Ziyan Jiang; Rui Meng; Xinyi Yang; Semih Yavuz; Yingbo Zhou; Wenhu Chen; |
87 | Controlling Space and Time with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present 4DiM, a cascaded diffusion model for 4D novel view synthesis (NVS), supporting generation with arbitrary camera trajectories and timestamps, in natural scenes, conditioned on one or more images. |
Daniel Watson; Saurabh Saxena; Lala Li; Andrea Tagliasacchi; David J. Fleet; |
88 | Harnessing Webpage UIs for Text-Rich Visual Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Text-rich visual understanding—the ability to interpret both textual content and visual elements within a scene—is crucial for multimodal large language models (MLLMs) to effectively interact with structured environments. We propose leveraging webpage UIs as a naturally structured and diverse data source to enhance MLLMs’ capabilities in this area. |
Junpeng Liu; Tianyue Ou; Yifan Song; Yuxiao Qu; Wai Lam; Chenyan Xiong; Wenhu Chen; Graham Neubig; Xiang Yue; |
89 | Subtask-Aware Visual Reward Learning from Segmented Demonstrations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces REDS: REward learning from Demonstration with Segmentations, a novel reward learning framework that leverages action-free videos with minimal supervision. |
Changyeon Kim; Minho Heo; Doohyun Lee; Honglak Lee; Jinwoo Shin; Joseph J Lim; Kimin Lee; |
90 | X-Gen: Ego-centric Video Prediction By Watching Exo-centric Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the cross-view video prediction task, where given an exo-centric video, the first frame of the corresponding ego-centric video, and textual instructions, the goal is to generate future frames of the ego-centric video. |
Jilan Xu; Yifei Huang; Baoqi Pei; Junlin Hou; Qingqiu Li; Guo Chen; Yuejie Zhang; Rui Feng; Weidi Xie; |
91 | Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Therefore, continuing with the current architectures will present a computational roadblock. To address this gap, we propose Mixture-of-Denoising Experts (MoDE) as a novel policy for Imitation Learning. |
Moritz Reuss; Jyothish Pari; Pulkit Agrawal; Rudolf Lioutikov; |
92 | Wayward Concepts In Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conduct a large-scale analysis on three state-of-the-art models in text-to-image generation, open-set object detection, and zero-shot classification, and find that prompts optimized to represent new visual concepts are akin to an adversarial attack on the text encoder. |
Brandon Trabucco; Max A Gurinas; Kyle Doherty; Russ Salakhutdinov; |
93 | CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, our analysis of representation distributions reveals that multimodal DPO struggles to align image and text representations and to distinguish between hallucinated and non-hallucinated descriptions. To address these challenges, we propose Cross-modal Hierarchical Direct Preference Optimization (CHiP). |
Jinlan Fu; huangfushenzhen; Hao Fei; Xiaoyu Shen; Bryan Hooi; Xipeng Qiu; See-Kiong Ng; |
94 | MVTokenFlow: High-quality 4D Content Generation Using Multiview Token Flow Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present MVTokenFlow for high-quality 4D content creation from monocular videos. |
Hanzhuo Huang; Yuan Liu; Ge Zheng; Jiepeng Wang; Zhiyang Dou; Sibei Yang; |
95 | OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce OmniCorpus, a 10 billion-scale image-text interleaved dataset. |
Qingyun Li; Zhe Chen; Weiyun Wang; Wenhai Wang; Shenglong Ye; Zhenjiang Jin; Guanzhou Chen; Yinan He; Zhangwei Gao; Erfei Cui; Jiashuo Yu; Hao Tian; Jiasheng Zhou; Chao Xu; Bin Wang; Xingjian Wei; Wei Li; Wenjian Zhang; Bo Zhang; Pinlong Cai; Licheng Wen; Xiangchao Yan; Pei Chu; Yi Wang; Min Dou; Changyao Tian; Xizhou Zhu; Lewei Lu; Yushi Chen; Junjun He; Tong Lu; Yali Wang; Limin Wang; Dahua Lin; Yu Qiao; Botian Shi; Conghui He; Jifeng Dai; |
96 | Look Before You Leap: Universal Emergent Mechanism for Retrieval in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To study how LMs solve retrieval tasks in diverse situations, we introduce ORION, a collection of structured retrieval tasks spanning six domains, from text understanding to coding. |
Alexandre Variengien; Eric Winsor; |
97 | DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore methodologies to leverage proof assistant feedback to augment the capabilities of large language models in constructing formal proofs. |
Huajian Xin; Z.Z. Ren; Junxiao Song; Zhihong Shao; Wanjia Zhao; Haocheng Wang; Bo Liu; Liyue Zhang; Xuan Lu; Qiushi Du; Wenjun Gao; Haowei Zhang; Qihao Zhu; Dejian Yang; Zhibin Gou; Z.F. Wu; Fuli Luo; Chong Ruan; |
98 | Walk The Talk? Measuring The Faithfulness of Large Language Model Explanations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new approach for measuring the faithfulness of LLM explanations. |
Katie Matton; Robert Ness; John Guttag; Emre Kiciman; |
99 | LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In other words, their output limitation is due to the scarcity of long-output examples in existing SFT datasets. To address this, we introduce AgentWrite, an agent-based pipeline that decomposes ultra-long generation tasks into subtasks, enabling off-the-shelf LLMs to generate coherent outputs exceeding 20,000 words. |
Yushi Bai; Jiajie Zhang; Xin Lv; Linzhi Zheng; Siqi Zhu; Lei Hou; Yuxiao Dong; Jie Tang; Juanzi Li; |
100 | Vision Language Models Are In-Context Value Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, learning such a progress estimator, or temporal value function, across different tasks and domains requires both a large amount of diverse data and methods that can scale and generalize. To address these challenges, we present Generative Value Learning (GVL), a universal value function estimator that leverages the world knowledge embedded in vision-language models (VLMs) to predict task progress. |
Yecheng Jason Ma; Joey Hejna; Chuyuan Fu; Dhruv Shah; Jacky Liang; Zhuo Xu; Sean Kirmani; Peng Xu; Danny Driess; Ted Xiao; Osbert Bastani; Dinesh Jayaraman; Wenhao Yu; Tingnan Zhang; Dorsa Sadigh; Fei Xia; |
101 | Can Large Language Models Understand Symbolic Graphics Programs? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Utilizing symbolic graphics programs, we propose a domain well-suited to test multiple spatial-semantic reasoning skills of LLMs. |
Zeju Qiu; Weiyang Liu; Haiwen Feng; Zhen Liu; Tim Z. Xiao; Katherine M. Collins; Joshua B. Tenenbaum; Adrian Weller; Michael J. Black; Bernhard Schölkopf; |
102 | Preference Optimization for Reasoning with Pseudo Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we introduce a novel approach to generate pseudo feedback for reasoning tasks by framing the labeling of solutions to reasoning problems as an evaluation against associated *test cases*. |
Fangkai Jiao; Geyang Guo; Xingxing Zhang; Nancy F. Chen; Shafiq Joty; Furu Wei; |
103 | Inverse Attention Agents for Multi-Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Agents trained using conventional methods tend to excel only within the confines of their training cohorts; their performance drops significantly when confronting unfamiliar agents. To address this shortcoming, we introduce Inverse Attention Agents that adopt concepts from the Theory of Mind (ToM) implemented algorithmically using an attention mechanism trained in an end-to-end manner. |
Qian Long; Ruoyan Li; Minglu Zhao; Tao Gao; Demetri Terzopoulos; |
104 | Latent Action Pretraining from Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a method to learn from internet-scale videos that do not have robot action labels. |
Seonghyeon Ye; Joel Jang; Byeongguk Jeon; Se June Joo; Jianwei Yang; Baolin Peng; Ajay Mandlekar; Reuben Tan; Yu-Wei Chao; Bill Yuchen Lin; Lars Liden; Kimin Lee; Jianfeng Gao; Luke Zettlemoyer; Dieter Fox; Minjoon Seo; |
105 | Generative Verifiers: Reward Modeling As Next-Token Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While LLM-based verifiers are typically trained as discriminative classifiers to score solutions, they do not utilize the text generation capabilities of pretrained LLMs. To overcome this limitation, we instead propose training verifiers using the ubiquitous next-token prediction objective, jointly on verification and solution generation. |
Lunjun Zhang; Arian Hosseini; Hritik Bansal; Mehran Kazemi; Aviral Kumar; Rishabh Agarwal; |
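A minimal sketch of the next-token-prediction scoring idea behind generative verifiers, assuming a Hugging Face causal LM; the prompt template and the use of the " Yes" token probability are illustrative choices, not the paper's exact recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def generative_verifier_score(model, tokenizer, question, solution):
    """Score a candidate solution as the probability the LM assigns to ' Yes'
    after a verification question, i.e. reuse the next-token prediction head
    as the verifier instead of a separate discriminative score head."""
    prompt = (f"Question: {question}\nProposed solution: {solution}\n"
              f"Is the solution correct? Answer Yes or No:")
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        next_token_logits = model(ids).logits[0, -1]
    yes_id = tokenizer(" Yes", add_special_tokens=False).input_ids[0]
    no_id = tokenizer(" No", add_special_tokens=False).input_ids[0]
    probs = torch.softmax(next_token_logits[[yes_id, no_id]], dim=-1)
    return probs[0].item()

# Example with a small stand-in model (the paper trains much larger verifiers):
# tok = AutoTokenizer.from_pretrained("gpt2")
# lm = AutoModelForCausalLM.from_pretrained("gpt2")
# print(generative_verifier_score(lm, tok, "What is 2 + 2?", "4"))
```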
106 | ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To generate high-quality, dynamic, and temporally consistent long videos, this paper presents ARLON, a novel framework that boosts diffusion Transformers with autoregressive (**AR**) models for long (**LON**) video generation, by integrating the coarse spatial and long-range temporal information provided by the AR model to guide the DiT model effectively. |
Zongyi Li; Shujie HU; Shujie LIU; Long Zhou; Jeongsoo Choi; Lingwei Meng; Xun Guo; Jinyu Li; Hefei Ling; Furu Wei; |
107 | Oscillatory State-Space Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Linear Oscillatory State-Space models (LinOSS) for efficiently learning on long sequences. |
T. Konstantin Rusch; Daniela Rus; |
108 | AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce AttriBoT, a series of novel techniques for efficiently computing an approximation of the LOO error for context attribution. |
Fengyuan Liu; Nikhil Kandpal; Colin Raffel; |
109 | Differential Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce Diff Transformer, which amplifies attention to the relevant context while canceling noise. |
Tianzhu Ye; Li Dong; Yuqing Xia; Yutao Sun; Yi Zhu; Gao Huang; Furu Wei; |
110 | Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we expose a critical yet underexplored vulnerability in the deployment of unlearning systems: the assumption that the data requested for removal is always part of the original training set. |
Yangsibo Huang; Daogao Liu; Lynn Chua; Badih Ghazi; Pritish Kamath; Ravi Kumar; Pasin Manurangsi; Milad Nasr; Amer Sinha; Chiyuan Zhang; |
111 | Param$\Delta$ for Direct Mixing: Post-Train Large Language Model At Zero Cost Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Param$\Delta$, an innovative approach that streamlines the post-training process by transferring knowledge and capability from an existing post-trained model to a newly upgraded base model without additional training. |
Sheng Cao; Mingrui Wu; Karthik Prasad; Yuandong Tian; Zechun Liu; |
112 | MEGA-Bench: Scaling Multimodal Evaluation to Over 500 Real-World Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present MEGA-Bench, an evaluation suite that scales multimodal evaluation to over 500 real-world tasks, to address the highly heterogeneous daily use cases of end users. |
Jiacheng Chen; Tianhao Liang; Sherman Siu; Zhengqing Wang; Kai Wang; Yubo Wang; Yuansheng Ni; Ziyan Jiang; Wang Zhu; Bohan Lyu; Dongfu Jiang; Xuan He; Yuan Liu; Hexiang Hu; Xiang Yue; Wenhu Chen; |
113 | OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present OmniEdit, which is an omnipotent editor to handle seven different image editing tasks with any aspect ratio seamlessly. |
Cong Wei; Zheyang Xiong; Weiming Ren; Xeron Du; Ge Zhang; Wenhu Chen; |
114 | Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, the Best-of-N (BoN) inference strategy, where an LLM generates multiple responses and a verifier selects the best, has shown strong empirical performance. Motivated by this, we develop a novel inference-aware fine-tuning paradigm, which encompasses the BoN-aware inference framework as a special case. |
Yinlam Chow; Guy Tennenholtz; Izzeddin Gur; Vincent Zhuang; Bo Dai; Aviral Kumar; Rishabh Agarwal; Sridhar Thiagarajan; Craig Boutilier; Aleksandra Faust; |
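For reference, the Best-of-N inference strategy that this fine-tuning paradigm targets is simple enough to state in a few lines. The sketch below shows plain BoN inference only, not the paper's BoN-aware training objective; `generate` and `verify` are assumed user-supplied callables.

```python
def best_of_n(generate, verify, prompt, n=8):
    """Plain Best-of-N inference: sample n candidate responses and return the
    one the verifier scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda response: verify(prompt, response))
```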
115 | On The Self-verification Limitations of Large Language Models on Reasoning and Planning Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we set out to systematically investigate the effectiveness of iterative prompting in the context of reasoning and planning. |
Kaya Stechly; Karthik Valmeekam; Subbarao Kambhampati; |
116 | Diffusion Policy Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Diffusion Policy Policy Optimization, DPPO, an algorithmic framework including best practices for fine-tuning diffusion-based policies (e.g. Diffusion Policy) in continuous control and robot learning tasks using the policy gradient (PG) method from reinforcement learning (RL). |
Allen Z. Ren; Justin Lidard; Lars Lien Ankile; Anthony Simeonov; Pulkit Agrawal; Anirudha Majumdar; Benjamin Burchfiel; Hongkai Dai; Max Simchowitz; |
117 | Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Add-it, a training-free approach that extends diffusion models’ attention mechanisms to incorporate information from three key sources: the scene image, the text prompt, and the generated image itself. |
Yoad Tewel; Rinon Gal; Dvir Samuel; Yuval Atzmon; Lior Wolf; Gal Chechik; |
118 | Fantastic Copyrighted Beasts and How (Not) to Generate Them Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, little research has systematically examined these problems: (1) Can users easily prompt models to generate copyrighted characters, even if it is unintentional? (2) How effective are the existing mitigation strategies? To address these questions, we introduce a novel evaluation framework with metrics that assess both the generated image's similarity to copyrighted characters and its consistency with user intent, grounded in a set of popular copyrighted characters from diverse studios and regions. |
Luxi He; Yangsibo Huang; Weijia Shi; Tinghao Xie; Haotian Liu; Yue Wang; Luke Zettlemoyer; Chiyuan Zhang; Danqi Chen; Peter Henderson; |
119 | Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce AutoIF, the first scalable and reliable method for automatically generating instruction-following training data. |
Guanting Dong; Keming Lu; Chengpeng Li; Tingyu Xia; Bowen Yu; Chang Zhou; Jingren Zhou; |
120 | Energy-Based Diffusion Language Models for Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Energy-based Diffusion Language Model (EDLM), an energy-based model operating at the full sequence level for each diffusion step, introduced to improve the underlying approximation used by diffusion models. |
Minkai Xu; Tomas Geffner; Karsten Kreis; Weili Nie; Yilun Xu; Jure Leskovec; Stefano Ermon; Arash Vahdat; |
121 | From Exploration to Mastery: Enabling LLMs to Master Tools Via Self-Driven Interactions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel framework, DRAFT, aimed at Dynamically Refining tool documentation through the Analysis of Feedback and Trials emanating from LLMs’ interactions with external tools. |
Changle Qu; Sunhao Dai; Xiaochi Wei; Hengyi Cai; Shuaiqiang Wang; Dawei Yin; Jun Xu; Ji-Rong Wen; |
122 | U-Nets As Belief Propagation: Efficient Classification, Denoising, and Diffusion in Generative Hierarchical Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel interpretation of the U-Net architecture by studying certain generative hierarchical models, which are tree-structured graphical models extensively utilized in both language and image domains. |
Song Mei; |
123 | RB-Modulation: Training-Free Stylization Using Reference-Based Modulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. |
Litu Rout; Yujia Chen; Nataniel Ruiz; Abhishek Kumar; Constantine Caramanis; Sanjay Shakkottai; Wen-Sheng Chu; |
124 | Semantic Image Inversion and Editing Using Rectified Stochastic Differential Equations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose RF inversion using dynamic optimal control derived via a linear quadratic regulator, and prove that the resulting vector field is equivalent to a rectified stochastic differential equation. |
Litu Rout; Yujia Chen; Nataniel Ruiz; Constantine Caramanis; Sanjay Shakkottai; Wen-Sheng Chu; |
125 | τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose τ-bench, a benchmark with two domains (retail and airline) emulating dynamic conversations between a user (simulated by language models) and a customer service agent provided with domain-specific API tools and policy guidelines. |
Shunyu Yao; Noah Shinn; Pedram Razavi; Karthik R Narasimhan; |
126 | SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This limited coverage motivates our inquiry into how existing systems might perform on unrepresented software engineering domains (e.g., front-end, game development, DevOps), which use different programming languages and paradigms. Therefore, we propose SWE-bench Multimodal (SWE-bench M), to evaluate systems on their ability to fix bugs in visual, user-facing JavaScript software. |
John Yang; Carlos E Jimenez; Alex L Zhang; Kilian Lieret; Joyce Yang; Xindi Wu; Ori Press; Niklas Muennighoff; Gabriel Synnaeve; Karthik R Narasimhan; Diyi Yang; Sida Wang; Ofir Press; |
127 | To Code or Not To Code? Exploring Impact of Code in Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we systematically investigate the impact of code data on general performance. |
Viraat Aryabumi; Yixuan Su; Raymond Ma; Adrien Morisot; Ivan Zhang; Acyr Locatelli; Marzieh Fadaee; Ahmet Üstün; Sara Hooker; |
128 | OpenHands: An Open Platform for AI Software Developers As Generalist Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce OpenHands, a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to a human developer: by writing code, interacting with a command line, and browsing the web. |
Xingyao Wang; Boxuan Li; Yufan Song; Frank F. Xu; Xiangru Tang; Mingchen Zhuge; Jiayi Pan; Yueqi Song; Bowen Li; Jaskirat Singh; Hoang H. Tran; Fuqiang Li; Ren Ma; Mingzhang Zheng; Bill Qian; Yanjun Shao; Niklas Muennighoff; Yizhe Zhang; Binyuan Hui; Junyang Lin; Robert Brennan; Hao Peng; Heng Ji; Graham Neubig; |
129 | Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a model merging methodology that addresses the difficulty of fine-tuning Large Language Models (LLMs) for target tasks in non-English languages, where task-specific data is often unavailable. |
Lucas Bandarkar; Benjamin Muller; Pritish Yuvraj; Rui Hou; Nayan Singhal; Hongjiang Lv; Bing Liu; |
130 | Hierarchical World Models As Visual Whole-Body Humanoid Controllers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore highly data-driven approaches to visual whole-body humanoid control based on reinforcement learning, without any simplifying assumptions, reward design, or skill primitives. |
Nicklas Hansen; Jyothir S V; Vlad Sobal; Yann LeCun; Xiaolong Wang; Hao Su; |
131 | MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent multi-image LVLMs have begun to address this need. However, their evaluation has not kept pace with their development. To fill this gap, we introduce the Multimodal Multi-image Understanding (MMIU) benchmark, a comprehensive evaluation suite designed to assess LVLMs across a wide range of multi-image tasks. |
Fanqing Meng; Jin Wang; Chuanhao Li; Quanfeng Lu; Hao Tian; Tianshuo Yang; Jiaqi Liao; Xizhou Zhu; Jifeng Dai; Yu Qiao; Ping Luo; Kaipeng Zhang; Wenqi Shao; |
132 | Bilinear MLPs Enable Weight-based Mechanistic Interpretability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we analyze bilinear MLPs, a type of Gated Linear Unit (GLU) without any element-wise nonlinearity that nevertheless achieves competitive performance. |
Michael T Pearce; Thomas Dooms; Alice Rigg; Jose Oramas; Lee Sharkey; |
133 | Vector-ICL: In-context Learning with Continuous Vector Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Large language models (LLMs) have shown remarkable in-context learning (ICL) capabilities on textual data. We explore whether these capabilities can be extended to continuous vectors from diverse domains, obtained from black-box pretrained encoders. |
Yufan Zhuang; Chandan Singh; Liyuan Liu; Jingbo Shang; Jianfeng Gao; |
134 | STAR: Synthesis of Tailored Architectures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new approach for the synthesis of tailored architectures (STAR). |
Armin W Thomas; Rom Parnichkun; Alexander Amini; Stefano Massaroli; Michael Poli; |
135 | Scaling Speech-Text Pre-training with Synthetic Interleaved Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel approach to scaling speech-text pre-training by leveraging large-scale synthetic interleaved data derived from text corpora, eliminating the need for parallel speech-text datasets. |
Aohan Zeng; Zhengxiao Du; Mingdao Liu; Lei Zhang; shengmin jiang; Yuxiao Dong; Jie Tang; |
136 | Diverse Preference Learning for Capabilities and Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This causes the model to overweight majority opinions and sacrifice diversity in exchange for optimal reward. To address this, we propose Soft Preference Learning, which decouples the entropy and cross-entropy terms in the KL penalty — allowing for fine-grained control over LLM generation diversity. |
Stewart Slocum; Asher Parker-Sartori; Dylan Hadfield-Menell; |
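The decoupling described in the entry above rests on the standard identity that the KL penalty splits into a cross-entropy term minus an entropy term. A hedged sketch follows; the coefficients alpha and beta are illustrative, not the paper's notation.

```latex
% KL splits into cross-entropy minus entropy:
%   KL(pi || pi_ref) = H(pi, pi_ref) - H(pi)
% so weighting the two terms separately (alpha, beta illustrative) gives
% fine-grained control over generation diversity:
\mathrm{KL}(\pi \,\|\, \pi_{\mathrm{ref}}) = H(\pi, \pi_{\mathrm{ref}}) - H(\pi),
\qquad
\mathcal{L}_{\text{decoupled}} = \alpha\, H(\pi, \pi_{\mathrm{ref}}) - \beta\, H(\pi).
```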
137 | Show-o: One Single Transformer to Unify Multimodal Understanding and Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and generation. |
Jinheng Xie; Weijia Mao; Zechen Bai; David Junhao Zhang; Weihao Wang; Kevin Qinghong Lin; Yuchao Gu; Zhijie Chen; Zhenheng Yang; Mike Zheng Shou; |
138 | Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hallucinations in large language models are a widespread problem, yet the mechanisms behind whether models will hallucinate are poorly understood, limiting our ability to solve this problem. Using sparse autoencoders as an interpretability tool, we discover that a key part of these mechanisms is entity recognition, where the model detects if an entity is one it can recall facts about. |
Javier Ferrando; Oscar Balcells Obeso; Senthooran Rajamanoharan; Neel Nanda; |
139 | Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Samba, a simple hybrid architecture that layer-wise combines Mamba, a selective State Space Model (SSM), with Sliding Window Attention (SWA). |
Liliang Ren; Yang Liu; Yadong Lu; yelong shen; Chen Liang; Weizhu Chen; |
140 | LongPO: Long Context Self-Evolution of Large Language Models Through Short-to-Long Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This alignment process remains challenging due to the impracticality of human annotation for extended contexts and the difficulty in balancing short- and long-context performance. To address these challenges, we introduce LongPO, which enables short-context LLMs to self-evolve to excel on long-context tasks by internally transferring short-context capabilities. |
Guanzheng Chen; Xin Li; Michael Shieh; Lidong Bing; |
141 | Internet of Agents: Weaving A Web of Heterogeneous Agents for Collaborative Intelligence Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, these frameworks often rely on hard-coded communication pipelines, limiting their adaptability to dynamic task requirements. Inspired by the concept of the Internet, we propose the Internet of Agents (IoA), a novel framework that addresses these limitations by providing a flexible and scalable platform for LLM-based multi-agent collaboration. |
Weize Chen; Ziming You; Ran Li; yitong guan; Chen Qian; Chenyang Zhao; Cheng Yang; Ruobing Xie; Zhiyuan Liu; Maosong Sun; |
142 | AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses this challenge using a zero-shot approach with a pre-trained diffusion model. Despite this potential, achieving our goals is difficult due to the diffusion model’s lack of understanding of "where" and "how" objects interact with the human body. |
Yukang Cao; Liang Pan; Kai Han; Kwan-Yee K. Wong; Ziwei Liu; |
143 | Can In-context Learning Really Generalize to Out-of-distribution Tasks? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the mechanism of in-context learning (ICL) on out-of-distribution (OOD) tasks that were not encountered during training. |
Qixun Wang; Yifei Wang; Xianghua Ying; Yisen Wang; |
144 | LLaVA-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce LLaVA-Interleave, which simultaneously tackles Multi-image, Multi-frame (video), Multi-view (3D), and Multi-patch (single-image) scenarios in LMMs. |
Feng Li; Renrui Zhang; Hao Zhang; Yuanhan Zhang; Bo Li; Wei Li; Zejun MA; Chunyuan Li; |
145 | Dissecting Adversarial Robustness of Multimodal LM Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To systematically examine the robustness of agents, we propose the Agent Robustness Evaluation (ARE) framework. |
Chen Henry Wu; Rishi Rajesh Shah; Jing Yu Koh; Russ Salakhutdinov; Daniel Fried; Aditi Raghunathan; |
146 | On Linear Representations and Pretraining Data Frequency in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the connection between pretraining data frequency and models’ linear representations of factual relations (e.g., mapping France to Paris in a capital prediction task). |
Jack Merullo; Noah A. Smith; Sarah Wiegreffe; Yanai Elazar; |
147 | Towards General-Purpose Model-Free Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. |
Scott Fujimoto; Pierluca D’Oro; Amy Zhang; Yuandong Tian; Michael Rabbat; |
148 | HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, there is still significant potential for improvement in current text-to-image inpainting models, particularly in better aligning the inpainted area with user prompts. Therefore, we introduce *HD-Painter*, a **training-free** approach that **accurately follows prompts**. |
Hayk Manukyan; Andranik Sargsyan; Barsegh Atanyan; Zhangyang Wang; Shant Navasardyan; Humphrey Shi; |
149 | Regulatory DNA Sequence Design with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current CRE design methods are limited by two major drawbacks: (1) they typically rely on iterative optimization strategies that modify existing sequences and are prone to local optima, and (2) they lack the guidance of biological prior knowledge in sequence optimization. In this paper, we address these limitations by proposing a generative approach that leverages reinforcement learning (RL) to fine-tune a pre-trained autoregressive (AR) model. |
Zhao Yang; Bing Su; Chuan Cao; Ji-Rong Wen; |
150 | Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, our imprecise understanding of ground-truth features in realistic scenarios makes it difficult to measure the success of SAEs. To address this challenge, we propose to evaluate SAEs on specific tasks by comparing them to supervised feature dictionaries computed with knowledge of the concepts relevant to the task. |
Aleksandar Makelov; Georg Lange; Neel Nanda; |
151 | Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More broadly, we present 12 results on how (1) training duration, (2) model architecture, (3) quantization, (4) sparsity constraints such as MoE, and (5) data signal-to-noise ratio affect a model’s knowledge storage capacity. |
Zeyuan Allen-Zhu; Yuanzhi Li; |
152 | GRNAde: Geometric Deep Learning for 3D RNA Inverse Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce gRNAde, a geometric RNA design pipeline operating on 3D RNA backbones to design sequences that explicitly account for structure and dynamics. |
Chaitanya K. Joshi; Arian Rokkum Jamasb; Ramon Viñas Torné; Charles Harris; Simon V Mathis; Alex Morehead; Rishabh Anand; Pietro Lio; |
153 | Physics of Language Models: Part 3.2, Knowledge Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, their performance in inverse knowledge search is virtually 0%, regardless of the prompts. Our primary contribution is a *controlled, synthetic experiment* that confirms these weaknesses are *inherent* to language models: they cannot efficiently manipulate knowledge from pre-training data, even when such knowledge is perfectly stored in the models, despite adequate training and sufficient model size. |
Zeyuan Allen-Zhu; Yuanzhi Li; |
154 | Diffusion Models Are Real-Time Game Engines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present GameNGen, the first game engine powered entirely by a neural model that also enables real-time interaction with a complex environment over long trajectories at high quality. |
Dani Valevski; Yaniv Leviathan; Moab Arar; Shlomi Fruchter; |
155 | Dynamic Multimodal Evaluation with Flexible Complexity By Vision-Language Bootstrapping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these benchmarks are static in nature and overlap with the pre-training data, resulting in fixed complexity constraints and data contamination issues. This raises concerns about the validity of the evaluation. To address these two challenges, we introduce a dynamic multimodal evaluation protocol called Vision-Language Bootstrapping (VLB). |
Yue Yang; Shuibo Zhang; Kaipeng Zhang; Yi Bin; Yu Wang; Ping Luo; Wenqi Shao; |
156 | Investigating The Pre-Training Dynamics of In-Context Learning: Task Recognition Vs. Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Further analysis of common pre-training factors (i.e., model size, dataset size, and data curriculum) demonstrates possible ways to regulate the competition. Based on these insights, we propose a simple yet effective method to better integrate these two abilities for ICL at inference time. |
Xiaolei Wang; Xinyu Tang; Junyi Li; Xin Zhao; Ji-Rong Wen; |
157 | Does Refusal Training in LLMs Generalize to The Past Tense? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We reveal a curious generalization gap in the current refusal training approaches: simply reformulating a harmful request in the past tense (e.g., *How to make a Molotov cocktail?* to *How did people make a Molotov cocktail?*) is often sufficient to jailbreak many state-of-the-art LLMs. |
Maksym Andriushchenko; Nicolas Flammarion; |
158 | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this way, we achieve a 100% attack success rate—according to GPT-4 as a judge—on Vicuna-13B, Mistral-7B, Phi-3-Mini, Nemotron-4-340B, Llama-2-Chat-7B/13B/70B, Llama-3-Instruct-8B, Gemma-7B, GPT-3.5, GPT-4o, and R2D2 from HarmBench that was adversarially trained against the GCG attack. |
Maksym Andriushchenko; Francesco Croce; Nicolas Flammarion; |
159 | LLMs Know More Than They Show: On The Intrinsic Representation of LLM Hallucinations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that the internal representations of LLMs encode much more information about truthfulness than previously recognized. |
Hadas Orgad; Michael Toker; Zorik Gekhman; Roi Reichart; Idan Szpektor; Hadas Kotek; Yonatan Belinkov; |
160 | Don’t Flatten, Tokenize! Unlocking The Key to SoftMoE’s Efficacy in Deep RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While soft mixtures of experts (SoftMoEs) have recently shown promise in mitigating this issue for online RL, the reasons behind their effectiveness remain largely unknown. In this work we provide an in-depth analysis identifying the key factors driving this performance gain. |
Ghada Sokar; Johan Samir Obando Ceron; Aaron Courville; Hugo Larochelle; Pablo Samuel Castro; |
161 | The Last Iterate Advantage: Empirical Auditing and Principled Heuristic Analysis of Differentially Private SGD Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple heuristic privacy analysis of noisy clipped stochastic gradient descent (DP-SGD) in the setting where only the last iterate is released and the intermediate iterates remain hidden. |
Milad Nasr; Thomas Steinke; Borja Balle; Christopher A. Choquette-Choo; Arun Ganesh; Matthew Jagielski; Jamie Hayes; Abhradeep Guha Thakurta; Adam Smith; Andreas Terzis; |
162 | Neuron Based Personality Trait Induction in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large language models (LLMs) have become increasingly proficient at simulating various personality traits, an important capability for supporting related applications (e.g., role-playing). To further improve this capacity, in this paper, we present a neuron based approach for personality trait induction in LLMs, with three major technical contributions. |
Jia Deng; Tianyi Tang; Yanbin Yin; Wenhao yang; Xin Zhao; Ji-Rong Wen; |
163 | Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explain the process of information retrieval with a causal graph and discover that PLM-based retrievers learn perplexity features for relevance estimation, causing source bias by ranking the documents with low perplexity higher. |
Haoyu Wang; Sunhao Dai; Haiyuan Zhao; Liang Pang; Xiao Zhang; Gang Wang; Zhenhua Dong; Jun Xu; Ji-Rong Wen; |
164 | Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We take an initial step towards exploring this security issue in a specific but realistic multi-objective alignment case, where there may be some alignment targets conflicting with each other (e.g., helpfulness vs. harmlessness). We aim to explore whether, in such cases, strong models might deliberately make mistakes in areas known to them but unknown to weak models within one alignment dimension, in exchange for a higher reward in another dimension. |
Wenkai Yang; Shiqi Shen; Guangyao Shen; Wei Yao; Yong Liu; Gong Zhi; Yankai Lin; Ji-Rong Wen; |
165 | ACE: All-round Creator and Editor Following Instructions Via Diffusion Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose ACE, an All-round Creator and Editor, which achieves performance comparable to that of expert models across a wide range of visual generation tasks. |
Zhen Han; Zeyinzi Jiang; Yulin Pan; Jingfeng Zhang; Chaojie Mao; Chen-Wei Xie; Yu Liu; Jingren Zhou; |
166 | Uncertainty and Influence Aware Reward Model Refinement for Reinforcement Learning from Human Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, reusing the policy interaction samples becomes a possible way to further refine the reward model. To tackle these challenges, we introduce a novel method, **U**ncertainty-**G**radient based **D**ata **A**ugmentation (**UGDA** for short), to enhance reward modeling by leveraging policy samples to maintain on-distribution performance. |
Zexu Sun; Yiju Guo; Yankai Lin; Xu Chen; Qi Qi; Xing Tang; xiuqiang He; Ji-Rong Wen; |
167 | Measuring Non-Adversarial Reproduction of Training Data in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate an intermediate regime of memorization that we call non-adversarial reproduction, where we quantify the overlap between model responses and pretraining data when responding to natural and benign prompts. |
Michael Aerni; Javier Rando; Edoardo Debenedetti; Nicholas Carlini; Daphne Ippolito; Florian Tramèr; |
168 | Rotated Runtime Smooth: Training-Free Activation Smoother for Accurate INT4 Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Rotated Runtime Smooth (**RRS**), a plug-and-play activation smoother for quantization, consisting of Runtime Smooth and the Rotation operation. |
Ke Yi; Zengke Liu; jianwei zhang; Chengyuan Li; Tong Zhang; Junyang Lin; Jingren Zhou; |
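A minimal sketch of the runtime-smoothing half of the idea above, assuming a linear layer y = x Wᵀ. The per-channel scaling and its folding into the weight are standard smoothing moves; the rotation step for spike outliers is omitted here, and the function and variable names are illustrative.

```python
import torch

def runtime_smooth(activations, weight, eps=1e-5):
    """Divide each activation channel by its runtime maximum and fold the same
    scales into the weight, so (x / s) @ (W * s).T == x @ W.T in full precision
    while channel-wise outliers are flattened before INT4 quantization."""
    scale = activations.abs().amax(dim=0).clamp(min=eps)  # per-input-channel max, shape (in,)
    smoothed_activations = activations / scale
    folded_weight = weight * scale                        # weight shape (out, in); broadcast over in
    return smoothed_activations, folded_weight
```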
169 | NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of the representations and computations learned by very large neural networks. |
Jaden Fried Fiotto-Kaufman; Alexander Russell Loftus; Eric Todd; Jannik Brinkmann; Koyena Pal; Dmitrii Troitskii; Michael Ripa; Adam Belfki; Can Rager; Caden Juang; Aaron Mueller; Samuel Marks; Arnab Sen Sharma; Francesca Lucchetti; Nikhil Prakash; Carla E. Brodley; Arjun Guha; Jonathan Bell; Byron C Wallace; David Bau; |
170 | Cut Your Losses in Large-Vocabulary Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Cut Cross-Entropy (CCE), a method that computes the cross-entropy loss without materializing the logits for all tokens into global memory. |
Erik Wijmans; Brody Huval; Alexander Hertzberg; Vladlen Koltun; Philipp Kraehenbuehl; |
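The memory-saving idea behind the entry above can be illustrated with a chunked log-sum-exp that never holds the full (tokens x vocabulary) logit matrix at once. This is only a PyTorch sketch of the principle, not the fused kernel the paper implements.

```python
import torch

def chunked_cross_entropy(hidden, classifier_weight, targets, chunk_size=8192):
    """hidden: (N, d), classifier_weight: (V, d), targets: (N,).
    Cross-entropy = logsumexp(logits) - logit_of_target, where the logsumexp
    is accumulated over vocabulary chunks so only an (N, chunk_size) block of
    logits exists in memory at any time."""
    target_logits = (hidden * classifier_weight[targets]).sum(dim=-1)    # (N,)
    lse = torch.full_like(target_logits, float("-inf"))
    for start in range(0, classifier_weight.size(0), chunk_size):
        block = hidden @ classifier_weight[start:start + chunk_size].T   # (N, chunk)
        lse = torch.logaddexp(lse, torch.logsumexp(block, dim=-1))
    return (lse - target_logits).mean()
```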
171 | Exploring The Design Space of Visual Context Representation in Video MLLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the design space for visual context representation, and aim to improve the performance of video MLLMs by finding more effective representation schemes. |
Yifan Du; Yuqi Huo; Kun Zhou; Zijia Zhao; Haoyu Lu; Han Huang; Xin Zhao; Bingning Wang; weipeng chen; Ji-Rong Wen; |
172 | VideoPhy: Evaluating Physical Commonsense for Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present VideoPhy, a benchmark designed to assess whether the generated videos follow physical commonsense for real-world activities (e.g. marbles will roll down when placed on a slanted surface). |
Hritik Bansal; Zongyu Lin; Tianyi Xie; Zeshun Zong; Michal Yarom; Yonatan Bitton; Chenfanfu Jiang; Yizhou Sun; Kai-Wei Chang; Aditya Grover; |
173 | Neural Spacetimes for DAG Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a class of trainable deep learning-based geometries called Neural SpaceTimes (NSTs), which can universally represent nodes in weighted Directed Acyclic Graphs (DAGs) as events in a spacetime manifold. |
Haitz Sáez de Ocáriz Borde; Anastasis Kratsios; Marc T. Law; Xiaowen Dong; Michael M. Bronstein; |
174 | Aligning Language Models with Demonstrated Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that it is instead possible to align an LLM to a specific setting by leveraging a very small number (<10) of demonstrations as feedback. |
Omar Shaikh; Michelle S. Lam; Joey Hejna; Yijia Shao; Hyundong Justin Cho; Michael S. Bernstein; Diyi Yang; |
175 | The Superposition of Diffusion Models Using The Itô Density Estimator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we cast the problem of combining multiple pre-trained diffusion models at the generation stage under a novel proposed framework termed superposition. |
Marta Skreta; Lazar Atanackovic; Joey Bose; Alexander Tong; Kirill Neklyudov; |
176 | SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Stable Video 4D (SV4D) — a latent video diffusion model for multi-frame and multi-view consistent dynamic 3D content generation. |
Yiming Xie; Chun-Han Yao; Vikram Voleti; Huaizu Jiang; Varun Jampani; |
177 | Unbounded: A Generative Infinite Game of Character Life Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the concept of a generative infinite game, a video game that transcends the traditional boundaries of finite, hard-coded systems by using generative models. |
Jialu Li; Yuanzhen Li; Neal Wadhwa; Yael Pritch; David E. Jacobs; Michael Rubinstein; Mohit Bansal; Nataniel Ruiz; |
178 | Data Mixing Laws: Optimizing Data Mixtures By Predicting Language Modeling Performance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While existing endeavors rely on heuristics or qualitative strategies to tune the proportions, we discover the quantitative predictability of model performance regarding the mixture proportions in function forms, which we refer to as the data mixing laws. |
Jiasheng Ye; Peiju Liu; Tianxiang Sun; Jun Zhan; Yunhua Zhou; Xipeng Qiu; |
179 | NextBestPath: Efficient 3D Mapping of Unseen Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, existing indoor datasets are insufficient due to limited geometric complexity and inaccurate ground truth meshes. To overcome these limitations, we introduce AiMDoom, a novel dataset with a map generator for the Doom video game, enabling better benchmarking of active 3D mapping in diverse indoor environments. |
Shiyao Li; Antoine Guedon; Clémentin Boittiaux; Shizhe Chen; Vincent Lepetit; |
180 | MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a novel evaluation paradigm for Large Language Models (LLMs) that compels them to transition from a traditional question-answering role, akin to a student, to a solution-scoring role, akin to a teacher. To prove our point, we applied our paradigm to the GSM8K dataset and developed the MR-GSM8K benchmark. |
Zhongshen Zeng; Pengguang Chen; Shu Liu; Haiyun Jiang; Jiaya Jia; |
181 | Sparse Autoencoders Reveal Selective Remapping of Visual Concepts During Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work provides a concrete framework to train and use SAEs for Vision Transformers and provides insights into explaining adaptation mechanisms. |
Hyesu Lim; Jinho Choi; Jaegul Choo; Steffen Schneider; |
182 | ConcreTizer: Model Inversion Attack Via Occupancy Classification and Dispersion Control for 3D Point Cloud Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our analysis reveals two unique challenges: the inherent sparsity of 3D point clouds and the ambiguity between empty and non-empty voxels after voxelization, which are further exacerbated by the dispersion of non-empty voxels across feature extractor layers. To address these challenges, we introduce ConcreTizer, a simple yet effective model inversion attack designed specifically for voxel-based 3D point cloud data. |
Youngseok Kim; Sunwook Hwang; Hyung-Sin Kim; Saewoong Bahk; |
183 | Smaller, Weaker, Yet Better: Training LLM Reasoners Via Compute-Optimal Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Training on high-quality synthetic data from strong language models (LMs) is a common strategy to improve the reasoning performance of LMs. In this work, we revisit whether this strategy is compute-optimal under a fixed inference budget (e.g., FLOPs). |
Hritik Bansal; Arian Hosseini; Rishabh Agarwal; Vinh Q. Tran; Mehran Kazemi; |
184 | SegLLM: Multi-round Reasoning Segmentation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present SegLLM, a novel multi-round interactive reasoning segmentation model that enhances LLM-based segmentation by exploiting conversational memory of both visual and textual outputs. |
XuDong Wang; Shaolun Zhang; Shufan Li; Kehan Li; Konstantinos Kallidromitis; Yusuke Kato; Kazuki Kozuka; Trevor Darrell; |
185 | 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to manipulate multi-entity 3D motions in video generation. To address the lack of suitable training data, we construct a 360-Motion Dataset, which first correlates collected 3D human and animal assets with GPT-generated trajectories and then captures their motion with 12 evenly surrounding cameras on diverse 3D UE platforms. |
Xiao FU; Xian Liu; Xintao Wang; Sida Peng; Menghan Xia; Xiaoyu Shi; Ziyang Yuan; Pengfei Wan; Di ZHANG; Dahua Lin; |
186 | Not All Language Model Features Are One-Dimensionally Linear Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We begin by developing a rigorous definition of irreducible multi-dimensional features based on whether they can be decomposed into either independent or non-co-occurring lower-dimensional features. Motivated by these definitions, we design a scalable method that uses sparse autoencoders to automatically find multi-dimensional features in GPT-2 and Mistral 7B. |
Joshua Engels; Eric J Michaud; Isaac Liao; Wes Gurnee; Max Tegmark; |
187 | VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose to tame video transformers for 3D camera control using a ControlNet-like conditioning mechanism that incorporates spatiotemporal camera embeddings based on Plucker coordinates. |
Sherwin Bahmani; Ivan Skorokhodov; Aliaksandr Siarohin; Willi Menapace; Guocheng Qian; Michael Vasilkovsky; Hsin-Ying Lee; Chaoyang Wang; Jiaxu Zou; Andrea Tagliasacchi; David B. Lindell; Sergey Tulyakov; |
188 | Trajectory Attention for Fine-grained Video Motion Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces trajectory attention, a novel approach that performs attention along available pixel trajectories for fine-grained camera motion control. |
Zeqi Xiao; Wenqi Ouyang; Yifan Zhou; Shuai Yang; Lei Yang; Jianlou Si; Xingang Pan; |
189 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With the goal of creating a high-quality finetuning (SFT) dataset for math reasoning, we conduct careful ablation experiments on data synthesis using the recently released Llama3.1 family of models. Based on these insights, we create the OpenMathInstruct-2 dataset, which consists of 14M question-solution pairs (≈ 600K unique questions), making it nearly eight times larger than the previous largest open-source math reasoning dataset. Finally, to accelerate the open-source efforts, we release the code, the finetuned models, and the OpenMathInstruct-2 dataset under a commercially permissive license. |
Shubham Toshniwal; Wei Du; Ivan Moshkov; Branislav Kisacanin; Alexan Ayrapetyan; Igor Gitman; |
190 | How Does Vision-Language Adaptation Impact The Safety of Vision Language Models? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, our findings demonstrate that the objectives of VL adaptation and safety tuning are divergent, which often results in their simultaneous application being suboptimal. To address this, we suggest the weight merging approach as an optimal solution effectively reducing safety degradation while maintaining helpfulness. |
Seongyun Lee; Geewook Kim; Jiyeon Kim; Hyunji Lee; Hoyeon Chang; Sue Hyun Park; Minjoon Seo; |
191 | APE: Faster and Longer Context-Augmented Generation Via Adaptive Parallel Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enable effective and efficient CAG, we propose Adaptive Parallel Encoding (**APE**), which brings shared prefix, attention temperature, and scaling factor to align the distribution of parallel encoding with sequential encoding. |
Xinyu Yang; Tianqi Chen; Beidi Chen; |
192 | iFormer: Integrating ConvNet and Transformer for Mobile Application Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a new family of mobile hybrid vision networks, called iFormer, with a focus on optimizing latency and accuracy on mobile applications. |
Chuanyang Zheng; |
193 | MagicPIG: LSH Sampling for Efficient LLM Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To make the sampling-based approximation practical in LLM generation, we propose MagicPIG, a heterogeneous system based on Locality Sensitive Hashing (LSH). |
Zhuoming Chen; Ranajoy Sadhukhan; Zihao Ye; Yang Zhou; Jianyu Zhang; Niklas Nolte; Yuandong Tian; Matthijs Douze; Leon Bottou; Zhihao Jia; Beidi Chen; |
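To give a flavor of the LSH step in the entry above, here is a SimHash-style candidate-selection sketch in NumPy. It only shows how hash collisions pick keys for a query; the importance-sampling estimate of attention and the CPU/GPU system design that MagicPIG actually builds are omitted, and parameters and names are illustrative.

```python
import numpy as np

def lsh_candidate_keys(query, keys, num_planes=16, num_tables=4, seed=0):
    """Hash the query and all keys with random hyperplanes (SimHash) in several
    tables and return indices of keys whose signature matches the query's in
    at least one table."""
    rng = np.random.default_rng(seed)
    d = query.shape[-1]
    selected = set()
    for _ in range(num_tables):
        planes = rng.standard_normal((d, num_planes))
        query_sig = query @ planes > 0               # (num_planes,)
        key_sigs = keys @ planes > 0                 # (num_keys, num_planes)
        hits = np.where((key_sigs == query_sig).all(axis=-1))[0]
        selected.update(hits.tolist())
    return sorted(selected)
```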
194 | Data Selection Via Optimal Control for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We formulate data selection as a generalized Optimal Control problem, which can be solved theoretically by Pontryagin’s Maximum Principle (PMP), yielding a set of necessary conditions that characterize the relationship between optimal data selection and LM training dynamics. Based on these theoretical results, we introduce **P**MP-based **D**ata **S**election (**PDS**), a framework that approximates optimal data selection by solving the PMP conditions. |
Yuxian Gu; Li Dong; Hongning Wang; Yaru Hao; Qingxiu Dong; Furu Wei; Minlie Huang; |
195 | MiniPLM: Knowledge Distillation for Pre-training Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose **MiniPLM**, a KD framework for pre-training LMs by refining the training data distribution with the teacher LM’s knowledge. |
Yuxian Gu; Hao Zhou; Fandong Meng; Jie Zhou; Minlie Huang; |
196 | EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, web tasks, such as booking flights, usually involve users’ personally identifiable information (PII), which may be exposed to potential privacy risks if web agents accidentally interact with compromised websites—a scenario that remains largely unexplored in the literature. In this work, we narrow this gap by conducting the first study on the privacy risks of generalist web agents in adversarial environments. |
Zeyi Liao; Lingbo Mo; Chejian Xu; Mintong Kang; Jiawei Zhang; Chaowei Xiao; Yuan Tian; Bo Li; Huan Sun; |
197 | Speculative Knowledge Distillation: Bridging The Teacher-Student Gap Through Interleaved Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conversely, on-policy KD, which uses student-generated samples for training, can suffer from low-quality training examples with which teacher models are not familiar, resulting in inaccurate teacher feedback. To address these limitations, we introduce Speculative Knowledge Distillation (SKD), a novel approach that leverages cooperation between student and teacher models to generate high-quality training data on-the-fly while aligning with the student’s inference-time distribution. |
Wenda Xu; Rujun Han; Zifeng Wang; Long Le; Dhruv Madeka; Lei Li; William Yang Wang; Rishabh Agarwal; Chen-Yu Lee; Tomas Pfister; |
198 | Iterative Label Refinement Matters More Than Preference Optimization Under Weak Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We find that in the presence of unreliable supervision, SFT still retains some effectiveness, but DPO (a common RLHF algorithm) fails to improve the model beyond SFT. To address this, we propose *iterative label refinement* (ILR) as an alternative to RLHF. |
Yaowen Ye; Cassidy Laidlaw; Jacob Steinhardt; |
199 | Bundle Neural Network for Message Diffusion on Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite being a strong inductive bias, the local message passing mechanism faces challenges such as over-smoothing, over-squashing, and limited expressivity. To address these issues, we introduce Bundle Neural Networks (BuNNs), a novel graph neural network architecture that operates via *message diffusion* on *flat vector bundles* — geometrically inspired structures that assign to each node a vector space and an orthogonal map. |
Jacob Bamberger; Federico Barbero; Xiaowen Dong; Michael M. Bronstein; |
200 | IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations. |
Zhibing Li; Tong Wu; Jing Tan; Mengchen Zhang; Jiaqi Wang; Dahua Lin; |
201 | Learning to Contextualize Web Pages for Enhanced Decision Making By LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce LCoW, a framework for Learning language models to Contextualize complex Web pages into a more comprehensible form, thereby enhancing decision making by LLM agents. |
Dongjun Lee; Juyong Lee; Kyuyoung Kim; Jihoon Tack; Jinwoo Shin; Yee Whye Teh; Kimin Lee; |
202 | DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, the dense BEV representation adopted by existing methods brings computational challenges for long-range perception and long-term temporal fusion. To address these challenges, we present DriveTransformer, a simplified E2E-AD framework for ease of scaling up, characterized by three key features: Task Parallelism (all agent, map, and planning queries directly interact with each other at each block), Sparse Representation (task queries directly interact with raw sensor features), and Streaming Processing (task queries are stored and passed as history information). |
Xiaosong Jia; Junqi You; Zhiyuan Zhang; Junchi Yan; |
203 | B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the critical factors underlying the mechanism of these self-improving methods remain poorly understood, such as under what conditions self-improvement is effective and what the bottlenecks are in the current iterations. In this work, we identify and propose methods to monitor two pivotal factors in this iterative process: (1) the model’s ability to explore and generate high-quality responses among multiple candidates (exploration); and (2) the reliability of external rewards in selecting the best responses from the generated outputs (exploitation). |
Weihao Zeng; Yuzhen Huang; Lulu Zhao; Yijun Wang; Zifei Shan; Junxian He; |
204 | Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel, controllable, and scalable captioning pipeline designed to generate diverse caption formats tailored to various multimodal models. |
Zhengfeng Lai; Vasileios Saveris; Chen Chen; Hong-You Chen; Haotian Zhang; Bowen Zhang; Wenze Hu; Juan Lao Tebar; Zhe Gan; Peter Grasch; Meng Cao; Yinfei Yang; |
205 | Simple, Good, Fast: Self-Supervised World Models Free of Baggage Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces SGF, a Simple, Good, and Fast world model that uses self-supervised representation learning, captures short-time dependencies through frame and action stacking, and enhances robustness against model errors through data augmentation. |
Jan Robine; Marc Höftmann; Stefan Harmeling; |
206 | Permute-and-Flip: An Optimally Stable and Watermarkable Decoder for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new decoding method called Permute-and-Flip (PF) decoder. |
Xuandong Zhao; Lei Li; Yu-Xiang Wang; |
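The permute-and-flip mechanism named in the entry above can be sketched in a few lines. The form below (acceptance probabilities relative to the maximum logit, with a temperature) is an assumed, illustrative variant, not the paper's exact decoder or its watermarking scheme.

```python
import math
import random

def permute_and_flip_sample(logits, temperature=1.0):
    """Visit tokens in a random order and accept token i with probability
    exp((logit_i - max_logit) / temperature). The argmax token is accepted
    with probability 1, so a single pass always returns an index."""
    max_logit = max(logits)
    order = list(range(len(logits)))
    random.shuffle(order)
    for i in order:
        if random.random() < math.exp((logits[i] - max_logit) / temperature):
            return i
```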
207 | BOND: Aligning LLMs with Best-of-N Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, a surprisingly simple and strong inference-time strategy is Best-of-N sampling that selects the best generation among N candidates. In this paper, we propose Best-of-N Distillation (BOND), a novel RLHF algorithm that seeks to emulate Best-of-N but without its significant computational overhead at inference time. |
Pier Giuseppe Sessa; Robert Dadashi-Tazehozi; Leonard Hussenot; Johan Ferret; Nino Vieillard; Alexandre Rame; Bobak Shahriari; Sarah Perrin; Abram L. Friesen; Geoffrey Cideron; Sertan Girgin; Piotr Stanczyk; Andrea Michi; Danila Sinopalnikov; Sabela Ramos Garea; Amélie Héliou; Aliaksei Severyn; Matthew Hoffman; Nikola Momchev; Olivier Bachem; |
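To make the Best-of-N baseline that BOND distills concrete, here is a minimal sketch in Python; the `generate` and `reward_model` callables are hypothetical stand-ins, and the sketch only illustrates the inference-time strategy, not the BOND training algorithm itself.

```python
import random

def best_of_n(prompt, generate, reward_model, n=16):
    """Sample n candidate completions and return the highest-reward one."""
    candidates = [generate(prompt) for _ in range(n)]
    scores = [reward_model(prompt, c) for c in candidates]
    best_idx = max(range(n), key=lambda i: scores[i])
    return candidates[best_idx], scores[best_idx]

# Toy stand-ins for the generator and reward model, purely for illustration.
toy_generate = lambda p: p + " answer-" + str(random.randint(0, 99))
toy_reward = lambda p, c: len(c)          # placeholder reward
best, score = best_of_n("Q: 2+2?", toy_generate, toy_reward, n=8)
print(best, score)
```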
208 | Not-So-Optimal Transport Flows for 3D Point Cloud Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One of the key properties of point clouds is their permutation invariance, i.e., changing the order of points in a point cloud does not change the shape they represent. In this paper, we analyze the recently proposed equivariant OT flows that learn permutation invariant generative models for point-based molecular data and we show that these models scale poorly on large point clouds. |
Ka-Hei Hui; Chao Liu; Xiaohui Zeng; Chi-Wing Fu; Arash Vahdat; |
209 | Sparse Autoencoders Do Not Find Canonical Units of Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To train meta-SAEs we introduce BatchTopK SAEs, an improved variant of the popular TopK SAE method, that only enforces a fixed average sparsity. |
Patrick Leask; Bart Bussmann; Michael T Pearce; Joseph Isaac Bloom; Curt Tigges; Noura Al Moubayed; Lee Sharkey; Neel Nanda; |
210 | Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose FiFA, a novel automated data filtering algorithm designed to enhance the fine-tuning of diffusion models using human feedback datasets with direct preference optimization (DPO). |
Yongjin Yang; Sihyeon Kim; Hojung Jung; Sangmin Bae; SangMook Kim; Se-Young Yun; Kimin Lee; |
211 | AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To facilitate research on LLM agent misuse, we propose a new benchmark called AgentHarm. |
Maksym Andriushchenko; Alexandra Souly; Mateusz Dziemian; Derek Duenas; Maxwell Lin; Justin Wang; Dan Hendrycks; Andy Zou; J Zico Kolter; Matt Fredrikson; Yarin Gal; Xander Davies; |
212 | Machine Unlearning Fails to Remove Data Poisoning Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to precisely characterize unlearning efficacy, we introduce new evaluation metrics for unlearning based on data poisoning. |
Martin Pawelczyk; Jimmy Z. Di; Yiwei Lu; Gautam Kamath; Ayush Sekhari; Seth Neel; |
213 | Denoising Autoregressive Transformers for Scalable Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose DART, a transformer-based model that unifies autoregressive (AR) and diffusion within a non-Markovian framework. |
Jiatao Gu; Yuyang Wang; Yizhe Zhang; Qihang Zhang; Dinghuai Zhang; Navdeep Jaitly; Joshua M. Susskind; Shuangfei Zhai; |
214 | CoTFormer: A Chain of Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: At the same time—regardless of the model size—task-specific techniques continue to play a pivotal role in achieving optimal downstream performance. One of these techniques, called Chain-of-Thought (CoT), is particularly interesting since, as we point out in this work, it resembles employing a deeper transformer through re-applying the model multiple times. |
Amirkeivan Mohtashami; Matteo Pagliardini; Martin Jaggi; |
215 | Biologically Plausible Brain Graph Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel Biologically Plausible Brain Graph Transformer (BioBGT) that encodes the small-world architecture inherent in brain graphs. |
Ciyuan Peng; Yuelong Huang; Qichao Dong; Shuo Yu; Feng Xia; Chengqi Zhang; Yaochu Jin; |
216 | Consistency Checks for Language Model Forecasters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new, general consistency metric based on *arbitrage*: for example, if a forecasting AI illogically predicts that both the Democratic and Republican parties have 60% probability of winning the 2024 US presidential election, an arbitrageur could trade against the forecaster’s predictions and make a profit. We then build a standard, proper-scoring-rule forecasting benchmark, and show that our (instantaneous) consistency metrics correlate strongly with LLM forecasters’ ground truth Brier scores (which are only known in the future). We also release a consistency benchmark that resolves in 2028, providing a long-term evaluation tool for forecasting. |
Daniel Paleka; Abhimanyu Pallavi Sudhir; Alejandro Alvarez; Vineeth Bhat; Adam Shen; Evan Wang; Florian Tramèr; |
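The arbitrage example in the highlight can be made concrete with a small sketch: probabilities assigned to mutually exclusive, exhaustive outcomes should sum to one, and the gap is roughly the profit an arbitrageur could lock in. This only illustrates the coherence idea, not the paper's general metric.

```python
def arbitrage_violation(probs):
    """Probabilities for mutually exclusive, exhaustive outcomes should sum to 1;
    the absolute gap is a simple incoherence measure (0 = consistent)."""
    return abs(sum(probs) - 1.0)

def guaranteed_profit(probs, stake=1.0):
    """If outcome 'prices' equal the forecasted probabilities, betting against an
    incoherent forecast (buy all if under-priced, sell all if over-priced)
    locks in roughly this profit per unit stake."""
    return stake * abs(sum(probs) - 1.0)

print(arbitrage_violation([0.6, 0.6]))   # 60% + 60% for two exclusive outcomes
print(guaranteed_profit([0.6, 0.6]))
```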
217 | Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In response, several protection tools against style mimicry have been developed that incorporate small adversarial perturbations into artworks published online. In this work, we evaluate the effectiveness of popular protections—with millions of downloads—and show they only provide a false sense of security. |
Robert Hönig; Javier Rando; Nicholas Carlini; Florian Tramèr; |
218 | Adversarial Search Engine Optimization for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce _Preference Manipulation Attacks_, a new class of attacks that manipulate an LLM’s selections to favor the attacker. |
Fredrik Nestaas; Edoardo Debenedetti; Florian Tramèr; |
219 | MagicDec: Breaking The Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We leverage a draft model with a sparse KV cache to address the KV bottleneck, which scales with both sequence length and batch size. |
Ranajoy Sadhukhan; Jian Chen; Zhuoming Chen; Vashisth Tiwari; Ruihang Lai; Jinyuan Shi; Ian En-Hsu Yen; Avner May; Tianqi Chen; Beidi Chen; |
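For context, the sketch below shows the standard speculative-sampling accept/reject loop that approaches like MagicDec build on; `draft_dist` and `target_dist` are hypothetical callables returning next-token distributions over the vocabulary, and the paper's actual contribution (a draft model with a sparse KV cache) is not reproduced here.

```python
import numpy as np

def speculative_step(prefix, draft_dist, target_dist, gamma=4, rng=np.random.default_rng()):
    # 1) The draft model proposes gamma tokens autoregressively.
    drafted, q_list, ctx = [], [], list(prefix)
    for _ in range(gamma):
        q = draft_dist(ctx)
        tok = int(rng.choice(len(q), p=q))
        drafted.append(tok); q_list.append(q); ctx.append(tok)
    # 2) The target model verifies each drafted token (one parallel pass in practice).
    accepted = []
    for i, tok in enumerate(drafted):
        p = target_dist(list(prefix) + accepted)
        if rng.random() < min(1.0, p[tok] / max(q_list[i][tok], 1e-12)):
            accepted.append(tok)
        else:
            # Rejected: resample from the residual distribution max(p - q, 0).
            residual = np.maximum(p - q_list[i], 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(len(residual), p=residual)))
            return accepted
    # 3) All drafted tokens accepted: sample one bonus token from the target model.
    p = target_dist(list(prefix) + accepted)
    accepted.append(int(rng.choice(len(p), p=p)))
    return accepted
```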
220 | A Little Goes A Long Way: Efficient Long Context Training and Inference with Partial Contexts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper argues that integrating length extension with a GPU-friendly KV cache reduction architecture not only reduces training overhead during length extension, but also achieves better long-context performance. |
Suyu Ge; Xihui Lin; Yunan Zhang; Jiawei Han; Hao Peng; |
221 | CR-CTC: Consistency Regularization on CTC for Improved Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose the Consistency-Regularized CTC (CR-CTC), which enforces consistency between two CTC distributions obtained from different augmented views of the input speech mel-spectrogram. |
Zengwei Yao; Wei Kang; Xiaoyu Yang; Fangjun Kuang; Liyong Guo; Han Zhu; Zengrui Jin; Zhaoqing Li; Long Lin; Daniel Povey; |
222 | Model Merging with SVD to Tie The Knots Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We hypothesize that improving this alignment is key to obtaining better LoRA model merges, and propose KnOTS to address this problem. |
George Stoica; Pratik Ramesh; Boglarka Ecsedi; Leshem Choshen; Judy Hoffman; |
223 | WizardMath: Empowering Mathematical Reasoning for Large Language Models Via Reinforced Evol-Instruct Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present WizardMath, which enhances the mathematical reasoning abilities of LLMs by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. |
Haipeng Luo; Qingfeng Sun; Can Xu; Pu Zhao; Jian-Guang Lou; Chongyang Tao; Xiubo Geng; Qingwei Lin; Shifeng Chen; Yansong Tang; Dongmei Zhang; |
224 | System 1.x: Learning to Balance Fast and Slow Planning with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose the System-1.x Planner, a framework for controllable planning with language models that is capable of generating hybrid plans and balancing between the two planning modes based on the difficulty of the problem at hand. |
Swarnadeep Saha; Archiki Prasad; Justin Chen; Peter Hase; Elias Stengel-Eskin; Mohit Bansal; |
225 | MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This research aims to comprehensively explore building a multimodal foundation model for egocentric video understanding. |
Hanrong Ye; Haotian Zhang; Erik Daxberger; Lin Chen; Zongyu Lin; Yanghao Li; Bowen Zhang; Haoxuan You; Dan Xu; Zhe Gan; Jiasen Lu; Yinfei Yang; |
226 | Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these studies often rely on real-world data that LLMs may have encountered during pre-training or employ anonymization techniques that can inadvertently introduce factual inconsistencies. In this work, we address these limitations by introducing novel synthetic datasets specifically designed to assess LLM temporal reasoning abilities in various scenarios. |
Bahare Fatemi; Mehran Kazemi; Anton Tsitsulin; Karishma Malkan; Jinyeong Yim; John Palowitch; Sungyong Seo; Jonathan Halcrow; Bryan Perozzi; |
227 | Group Ligands Docking to Protein Pockets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the biochemical observation that ligands binding to the same target protein tend to adopt similar poses, we propose GroupBind, a novel molecular docking framework that simultaneously considers multiple ligands docking to a protein. |
Jiaqi Guan; Jiahan Li; Xiangxin Zhou; Xingang Peng; Sheng Wang; Yunan Luo; Jian Peng; Jianzhu Ma; |
228 | ReSi: A Comprehensive Benchmark for Representational Similarity Measures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents the first comprehensive benchmark for evaluating representational similarity measures based on well-defined groundings of similarity. |
Max Klabunde; Tassilo Wald; Tobias Schumacher; Klaus Maier-Hein; Markus Strohmaier; Florian Lemmerich; |
229 | Improving Instruction-Following in Language Models Through Activation Steering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The ability to follow instructions is crucial for numerous real-world applications of language models. In pursuit of deeper insights and more powerful capabilities, we derive instruction-specific vector representations from language models and use them to steer models accordingly. |
Alessandro Stolfo; Vidhisha Balachandran; Safoora Yousefi; Eric Horvitz; Besmira Nushi; |
230 | MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce MIA-Bench, a benchmark designed to assess MLLMs’ ability to strictly adhere to complex instructions. Additionally, we create extra training data and explore supervised fine-tuning and direct preference optimization to enhance the models’ ability to strictly follow instructions without compromising performance on other tasks. |
Yusu Qian; Hanrong Ye; Jean-Philippe Fauconnier; Peter Grasch; Yinfei Yang; Zhe Gan; |
231 | OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent T2V methods have focused on vision transformers, using a simple cross attention module for video generation, which falls short of making full use of semantic information from text tokens. To address these issues, we introduce OpenVid-1M, a precise high-quality dataset with expressive captions. |
Kepan Nan; Rui Xie; Penghao Zhou; Tiehan Fan; Zhenheng Yang; Zhijie Chen; Xiang Li; Jian Yang; Ying Tai; |
232 | OmnixR: Evaluating Omni-modality Language Models on Reasoning Across Modalities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce \textbf{OmnixR}, an evaluation suite designed to benchmark state-of-the-art Omni-modality Language Models (OLMs), such as GPT-4o and Gemini. |
Lichang Chen; Hexiang Hu; Mingda Zhang; Yiwen Chen; Zifeng Wang; YANDONG LI; Pranav Shyam; Tianyi Zhou; Heng Huang; Ming-Hsuan Yang; Boqing Gong; |
233 | LICO: Large Language Models for In-Context Molecular Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce LICO, a general-purpose model that extends arbitrary base LLMs for black-box optimization, with a particular application to the molecular domain. |
Tung Nguyen; Aditya Grover; |
234 | AI As Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models Via Systematic Attribution of Machine Text Against Web Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present CREATIVITY INDEX as the first step to quantify the linguistic creativity of a text by reconstructing it from existing text snippets on the web. |
Ximing Lu; Melanie Sclar; Skyler Hallinan; Niloofar Mireshghallah; Jiacheng Liu; Seungju Han; Allyson Ettinger; Liwei Jiang; Khyathi Chandu; Nouha Dziri; Yejin Choi; |
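A toy sketch of the underlying idea: measure how much of a text can be covered by verbatim n-grams found in existing text, with low coverage read as a crude creativity signal. The real CREATIVITY INDEX matches snippets against web-scale corpora; the tiny in-memory corpus below is purely illustrative.

```python
def ngram_set(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def coverage(text, corpus_texts, n=5):
    """Fraction of positions in `text` covered by some length-n span that also
    occurs verbatim in the corpus; 1 - coverage is a rough 'creativity' proxy."""
    toks = text.lower().split()
    corpus_ngrams = set()
    for doc in corpus_texts:
        corpus_ngrams |= ngram_set(doc.lower().split(), n)
    covered = [False] * len(toks)
    for i in range(len(toks) - n + 1):
        if tuple(toks[i:i + n]) in corpus_ngrams:
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / max(len(toks), 1)

corpus = ["the quick brown fox jumps over the lazy dog"]
print(coverage("the quick brown fox jumps over a sleeping cat", corpus, n=5))
```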
235 | Mitigate The Gap: Improving Cross-Modal Alignment in CLIP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose AlignCLIP, in order to improve the alignment between text and image embeddings, and thereby reduce the modality gap. |
Sedigheh Eslami; Gerard de Melo; |
236 | Transformers Struggle to Learn to Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It is unknown whether this inability is due to a lack of data, insufficient model parameters, or fundamental limitations of the transformer architecture. In this work, we use the foundational graph connectivity problem as a testbed to generate effectively limitless high-coverage data to train small transformers and test whether they can learn to perform search. |
Abulhair Saparov; Srushti Ajay Pawar; Shreyas Pimpalgaonkar; Nitish Joshi; Richard Yuanzhe Pang; Vishakh Padmakumar; Mehran Kazemi; Najoung Kim; He He; |
237 | OGBench: Benchmarking Offline Goal-Conditioned RL Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose OGBench, a new, high-quality benchmark for algorithms research in offline goal-conditioned RL. |
Seohong Park; Kevin Frans; Benjamin Eysenbach; Sergey Levine; |
238 | DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose DiffusionGuard, a robust and effective defense method against unauthorized edits by diffusion-based image editing models, even in challenging setups. Finally, we introduce a comprehensive benchmark designed to evaluate the effectiveness and robustness of methods in protecting against privacy threats in realistic scenarios. |
June Suk Choi; Kyungmin Lee; Jongheon Jeong; Saining Xie; Jinwoo Shin; Kimin Lee; |
239 | Scaling Autonomous Agents Via Automatic Reward Modeling And Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address LLM agents’ limitations, we propose a framework that can automatically learn a reward model from the environment without human annotations. |
Zhenfang Chen; Delin Chen; Rui Sun; Wenjun Liu; Chuang Gan; |
240 | The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Modern language models can process inputs across diverse languages and modalities. We hypothesize that models acquire this capability through learning a _shared representation space_ across heterogeneous data types (e.g., different languages and modalities), which places semantically similar inputs near one another, even if they are from different modalities/languages. |
Zhaofeng Wu; Xinyan Velocity Yu; Dani Yogatama; Jiasen Lu; Yoon Kim; |
241 | ControlAR: Controllable Image Generation with Autoregressive Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce ControlAR, an efficient and effective framework for integrating spatial controls into autoregressive image generation models. |
Zongming Li; Tianheng Cheng; Shoufa Chen; Peize Sun; Haocheng Shen; Longjin Ran; Xiaoxin Chen; Wenyu Liu; Xinggang Wang; |
242 | The Geometry of Categorical and Hierarchical Concepts in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show how to extend the formalization of the linear representation hypothesis to represent features (e.g., is_animal) as _vectors_. |
Kiho Park; Yo Joong Choe; Yibo Jiang; Victor Veitch; |
243 | Learning to Engineer Protein Flexibility Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our contributions are three-fold. First, we perform a comprehensive comparison of methods for evaluating protein flexibility and identify relevant data for learning. Second, we overcome the data scarcity issue by leveraging a pre-trained protein language model. We design and train flexibility predictors utilizing either only sequential or both sequential and structural information on the input. Third, we introduce a method for fine-tuning a protein inverse folding model to make it steerable toward desired flexibility at specified regions. |
Petr Kouba; Joan Planas-Iglesias; Jiri Damborsky; Jiri Sedlar; Stanislav Mazurenko; Josef Sivic; |
244 | What’s The Move? Hybrid Imitation Learning Via Salient Points Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce **SPHINX**: **S**alient **P**oint-based **H**ybrid **I**mitatio**N** and e**X**ecution, a flexible IL policy that leverages multimodal observations (point clouds and wrist images), along with a hybrid action space of low-frequency, sparse waypoints and high-frequency, dense end effector movements. |
Priya Sundaresan; Hengyuan Hu; Quan Vuong; Jeannette Bohg; Dorsa Sadigh; |
245 | KAN: Kolmogorov–Arnold Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). |
Ziming Liu; Yixuan Wang; Sachin Vaidya; Fabian Ruehle; James Halverson; Marin Soljacic; Thomas Y. Hou; Max Tegmark; |
246 | Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, there has been active research aimed at improving reasoning accuracy, particularly by using pretrained language models to “self-correct” their mistakes via multi-round prompting. In this paper, we follow this line of work but focus on understanding the usefulness of incorporating “error-correction” data directly into the pretraining stage. |
Tian Ye; Zicheng Xu; Yuanzhi Li; Zeyuan Allen-Zhu; |
247 | Physics of Language Models: Part 2.1, Grade-School Math and The Hidden Reasoning Process Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent advances in language models have demonstrated their capability to solve mathematical reasoning problems, achieving near-perfect accuracy on grade-school level math benchmarks like GSM8K. In this paper, we formally study how language models solve these problems. |
Tian Ye; Zicheng Xu; Yuanzhi Li; Zeyuan Allen-Zhu; |
248 | Fine-Grained Verifiers: Preference Modeling As Next-token Prediction in Vision-Language Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose FiSAO (Fine-Grained Self-Alignment Optimization), a novel self-alignment method that utilizes the model’s own visual encoder as a fine-grained verifier to improve vision-language alignment without the need for additional data. |
Chenhang Cui; An Zhang; Yiyang Zhou; Zhaorun Chen; Gelei Deng; Huaxiu Yao; Tat-Seng Chua; |
249 | Presto! Distilling Steps and Layers for Accelerating Music Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To reduce steps, we develop a new score-based distribution matching distillation (DMD) method for the EDM-family of diffusion models, the first GAN-based distillation method for TTM. |
Zachary Novack; Ge Zhu; Jonah Casebeer; Julian McAuley; Taylor Berg-Kirkpatrick; Nicholas J. Bryan; |
250 | KBLaM: Knowledge Base Augmented Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Knowledge Base augmented Language Model (KBLAM), a new method for augmenting Large Language Models (LLMs) with external knowledge. |
Xi Wang; Taketomo Isazawa; Liana Mikaelyan; James Hensman; |
251 | DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present DailyDilemmas, a dataset of 1,360 moral dilemmas encountered in everyday life. |
Yu Ying Chiu; Liwei Jiang; Yejin Choi; |
252 | SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current large language model (LLM)-based software agents often follow linear, sequential processes that prevent backtracking and exploration of alternative solutions, limiting their ability to rethink their strategies when initial approaches prove ineffective. To address these challenges, we propose SWE-Search, a multi-agent framework that integrates Monte Carlo Tree Search (MCTS) with a self-improvement mechanism to enhance software agents’ performance on repository-level software tasks. |
Antonis Antoniades; Albert Örwall; Kexun Zhang; Yuxi Xie; Anirudh Goyal; William Yang Wang; |
253 | A Sanity Check for AI-generated Image Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we conduct a sanity check on whether the task of AI-generated image detection has been solved. To start with, we present the Chameleon dataset, consisting of AI-generated images that are genuinely challenging for human perception. |
Shilin Yan; Ouxiang Li; Jiayin Cai; Yanbin Hao; Xiaolong Jiang; Yao Hu; Weidi Xie; |
254 | Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHR Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work highlights the potential for using long-context architectures to model EHR data, and offers a case study on how to identify and quantify new challenges in modeling sequential data motivated by domains outside of natural language. |
Michael Wornow; Suhana Bedi; Miguel Angel Fuentes Hernandez; Ethan Steinberg; Jason Alan Fries; Christopher Re; Sanmi Koyejo; Nigam Shah; |
255 | Decision Tree Induction Through LLMs Via Semantically-Aware Evolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current tree induction methods often face limitations such as suboptimal solutions from greedy methods or prohibitive computational costs and limited applicability of exact optimization approaches. To address these challenges, we propose an evolutionary optimization method for decision tree induction based on genetic programming (GP). |
Tennison Liu; Nicolas Huynh; Mihaela van der Schaar; |
256 | Benchmarking Vision Language Model Unlearning Via Fictitious Facial Identity Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, with the increasing integration of visual data, privacy concerns in Vision Language Models (VLMs) remain underexplored. To address this, we introduce Facial Identity Unlearning Benchmark (FIUBench), a novel VLM unlearning benchmark designed to robustly evaluate the effectiveness of unlearning algorithms under the Right to be Forgotten setting. |
Yingzi Ma; Jiongxiao Wang; Fei Wang; Siyuan Ma; Jiazhao Li; Jinsheng Pan; Xiujun Li; Furong Huang; Lichao Sun; Bo Li; Yejin Choi; Muhao Chen; Chaowei Xiao; |
257 | Tell Me About Yourself: LLMs Are Aware of Their Learned Behaviors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study *behavioral self-awareness*, which we define as an LLM’s capability to articulate its behavioral policies without relying on in-context examples. |
Jan Betley; Xuchan Bao; Martín Soto; Anna Sztyber-Betley; James Chua; Owain Evans; |
258 | Looking Inward: Language Models Can Learn About Themselves By Introspection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate LLMs predicting properties of their own behavior in hypothetical situations. |
Felix Jedidja Binder; James Chua; Tomek Korbak; Henry Sleight; John Hughes; Robert Long; Ethan Perez; Miles Turpin; Owain Evans; |
259 | Anyprefer: An Agentic Framework for Preference Data Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent methods often adopt a self-rewarding approach, where the target model generates and annotates its own preference data, but this can lead to inaccuracies since the reward model shares weights with the target model, thereby amplifying inherent biases. To address these issues, we propose Anyprefer, a framework designed to synthesize high-quality preference data for aligning the target model. |
Yiyang Zhou; Zhaoyang Wang; Tianle Wang; Shangyu Xing; Peng Xia; Bo Li; Kaiyuan Zheng; Zijian Zhang; Zhaorun Chen; Wenhao Zheng; Xuchao Zhang; Chetan Bansal; Weitong Zhang; Ying Wei; Mohit Bansal; Huaxiu Yao; |
260 | Image and Video Tokenization with Binary Spherical Quantization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new transformer-based image and video tokenizer with Binary Spherical Quantization (BSQ). |
Yue Zhao; Yuanjun Xiong; Philipp Kraehenbuehl; |
261 | A Decade’s Battle on Dataset Bias: Are We There Yet? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We revisit the “dataset classification” experiment suggested by Torralba & Efros (2011) a decade ago, in the new era with large-scale, diverse, and hopefully less biased datasets as well as more capable neural network architectures. |
Zhuang Liu; Kaiming He; |
262 | Rational Decision-Making Agent with Learning Internal Utility Judgment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For genuine autonomous decision-making for LLM-based agents, it is imperative to develop rationality from their posterior experiences to judge the utility of each decision independently. In this work, we propose RaDAgent (Rational Decision-Making Agent), which fosters the development of its rationality through an iterative framework involving Experience Exploration and Utility Learning. |
Yining Ye; Xin Cong; Shizuo Tian; Yujia Qin; Chong Liu; Yankai Lin; Zhiyuan Liu; Maosong Sun; |
263 | Interactive Speculative Planning: Enhance Agent Efficiency Through Co-design of System and User Interface Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given that inefficiency in service provision can undermine the value of automation for users, this paper presents a human-centered efficient agent planning method – Interactive Speculative Planning – aiming at enhancing the efficiency of agent planning through both system design and user interaction. |
Wenyue Hua; Mengting Wan; JAGANNATH SHASHANK SUBRAMANYA SAI VADREVU; Ryan Nadel; Yongfeng Zhang; Chi Wang; |
264 | GROOT-2: Weakly Supervised Multimodal Instruction Following Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While augmenting the dataset with instruction labels can mitigate this issue, acquiring such high-quality annotations at scale is impractical. To address this issue, we frame the problem as a semi-supervised learning task and introduce GROOT-2, a multimodal instructable agent trained using a novel approach that combines weak supervision with latent variable models. |
Shaofei Cai; Bowei Zhang; Zihao Wang; Haowei Lin; Xiaojian Ma; Anji Liu; Yitao Liang; |
265 | Scaling Large Language Model-based Multi-Agent Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the neural scaling law, in which increasing the number of neurons enhances performance, this study explores whether the continuous addition of collaborative agents can yield similar benefits. |
Chen Qian; Zihao Xie; YiFei Wang; Wei Liu; Kunlun Zhu; Hanchen Xia; Yufan Dang; Zhuoyun Du; Weize Chen; Cheng Yang; Zhiyuan Liu; Maosong Sun; |
266 | COMBO: Compositional World Models for Embodied Multi-Agent Cooperation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the problem of embodied multi-agent cooperation, where decentralized agents must cooperate given only egocentric views of the world. |
Hongxin Zhang; Zeyuan Wang; Qiushi Lyu; Zheyuan Zhang; Sunli Chen; Tianmin Shu; Behzad Dariush; Kwonjoon Lee; Yilun Du; Chuang Gan; |
267 | VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current RAG systems are solely based on text, rendering it impossible to utilize vision information like layout and images that play crucial roles in real-world multi-modality documents. In this paper, we introduce VisRAG, which tackles this issue by establishing a vision-language model (VLM)-based RAG pipeline. |
Shi Yu; Chaoyue Tang; Bokai Xu; Junbo Cui; Junhao Ran; Yukun Yan; Zhenghao Liu; Shuo Wang; Xu Han; Zhiyuan Liu; Maosong Sun; |
268 | Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The key components of rectified flow include: 1) using the linear interpolating diffusion form of flow-matching, 2) employing $\boldsymbol v$-prediction, and 3) performing rectification (a.k.a. reflow). In this paper, we argue that the success of rectification primarily lies in using a pretrained diffusion model to obtain matched pairs of noise and samples, followed by retraining with these matched noise-sample pairs. |
Fu-Yun Wang; Ling Yang; Zhaoyang Huang; Mengdi Wang; Hongsheng Li; |
269 | Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we propose a straightforward but consistently effective approach that involves training a model specifically attuned to negative preferences. |
Fu-Yun Wang; Yunhao Shui; Jingtan Piao; Keqiang Sun; Hongsheng Li; |
270 | HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Distributed alignment search (DAS) leverages supervision from counterfactual data to learn concept features within hidden states, but DAS assumes we can afford to conduct a brute force search over potential feature locations. To address this, we present HyperDAS, a transformer-based hypernetwork architecture that (1) automatically locates the token-positions of the residual stream that a concept is realized in and (2) learns features of those residual stream vectors for the concept. |
Jiuding Sun; Jing Huang; Sidharth Baskaran; Karel D’Oosterlinck; Christopher Potts; Michael Sklar; Atticus Geiger; |
271 | An Undetectable Watermark for Generative Image Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the first undetectable watermarking scheme for generative image models. |
Sam Gunn; Xuandong Zhao; Dawn Song; |
272 | Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the single vector identified for a concept varies with both data and training, making it less robust and weakening its effectiveness in real-world applications. To address this challenge, we propose an approach to approximate the subspace representing a specific concept. |
Haiyan Zhao; Heng Zhao; Bo Shen; Ali Payani; Fan Yang; Mengnan Du; |
273 | GaussianAnything: Interactive Point Cloud Flow Matching for 3D Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing methods show promise, they face challenges in input formats, latent space structures, and output representations. This paper introduces a novel 3D generation framework that addresses these issues, enabling scalable and high-quality 3D generation with an interactive Point Cloud-structured Latent space. |
Yushi LAN; Shangchen Zhou; Zhaoyang Lyu; Fangzhou Hong; Shuai Yang; Bo Dai; Xingang Pan; Chen Change Loy; |
274 | Generative Classifiers Avoid Shortcut Solutions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Discriminative approaches to classification often learn shortcuts that hold in-distribution but fail even under minor distribution shift. This failure mode stems from an overreliance on features that are spuriously correlated with the label. We show that generative classifiers, which use class-conditional generative models, can avoid this issue by modeling all features, both core and spurious, instead of mainly spurious ones. |
Alexander Cong Li; Ananya Kumar; Deepak Pathak; |
275 | Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to take a holistic approach to the construction of discrete generative models based on continuous-time Markov chains, and for the first time, allow the use of arbitrary discrete probability paths, or colloquially, corruption processes. |
Neta Shaul; Itai Gat; Marton Havasi; Daniel Severo; Anuroop Sriram; Peter Holderrieth; Brian Karrer; Yaron Lipman; Ricky T. Q. Chen; |
276 | NV-Embed: Improved Techniques for Training LLMs As Generalist Embedding Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the NV-Embed model, incorporating architectural designs, training procedures, and curated datasets to significantly enhance the performance of LLM as a versatile embedding model, while maintaining its simplicity and reproducibility. |
Chankyu Lee; Rajarshi Roy; Mengyao Xu; Jonathan Raiman; Mohammad Shoeybi; Bryan Catanzaro; Wei Ping; |
277 | Generalization V.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To effectively capture task-specific pretraining data frequency, we propose a novel task-gram language model, which is built by counting the co-occurrence of semantically related $n$-gram pairs from task inputs and outputs in the pretraining corpus. |
Xinyi Wang; Antonis Antoniades; Yanai Elazar; Alfonso Amayuelas; Alon Albalak; Kexun Zhang; William Yang Wang; |
278 | MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present MM1.5, a new family of multimodal large language models (MLLMs) designed to enhance capabilities in text-rich image understanding, visual referring and grounding, and multi-image reasoning. |
Haotian Zhang; Mingfei Gao; Zhe Gan; Philipp Dufter; Nina Wenzel; Forrest Huang; Dhruti Shah; Xianzhi Du; Bowen Zhang; Yanghao Li; Sam Dodge; Keen You; Zhen Yang; Aleksei Timofeev; Mingze Xu; Hong-You Chen; Jean-Philippe Fauconnier; Zhengfeng Lai; Haoxuan You; Zirui Wang; Afshin Dehghan; Peter Grasch; Yinfei Yang; |
279 | T2V-Turbo-v2: Enhancing Video Model Post-Training Through Data, Reward, and Conditional Guidance Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on enhancing a diffusion-based text-to-video (T2V) model during the post-training phase by distilling a highly capable consistency model from a pretrained T2V model. |
Jiachen Li; Qian Long; Jian Zheng; Xiaofeng Gao; Robinson Piramuthu; Wenhu Chen; William Yang Wang; |
280 | Your Weak LLM Is Secretly A Strong Teacher for Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a systematic study to evaluate and understand weak LLM’s ability to generate feedback for alignment. |
Leitian Tao; Yixuan Li; |
281 | Can Watermarks Be Used to Detect LLM IP Infringement For Free? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the potential of LLM watermarks for detecting model infringement.To demonstrate the effectiveness of this approach, we construct a challenging model set containing multiple suspect LLMs on which direct detection methods struggle to yield effective results. |
Zhengyue Zhao; Xiaogeng Liu; Somesh Jha; Patrick McDaniel; Bo Li; Chaowei Xiao; |
282 | Speculative RAG: Enhancing Retrieval Augmented Generation Through Drafting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Speculative RAG – a framework that leverages a larger generalist LM to efficiently verify multiple RAG drafts produced in parallel by a smaller, distilled specialist LM. |
Zilong Wang; Zifeng Wang; Long Le; Steven Zheng; Swaroop Mishra; Vincent Perot; Yuwei Zhang; Anush Mattapalli; Ankur Taly; Jingbo Shang; Chen-Yu Lee; Tomas Pfister; |
283 | CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel method for generating 360° panoramas from text prompts or images. |
Nikolai Kalischek; Michael Oechsle; Fabian Manhardt; Philipp Henzler; Konrad Schindler; Federico Tombari; |
284 | M^3PC: Test-time Model Predictive Control Using Pretrained Masked Trajectory Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this information has not been fully exploited during the inference phase, where the agent needs to generate an optimal policy instead of just reconstructing masked components from unmasked ones. Given that a pretrained trajectory model can act as both a Policy Model and a World Model with appropriate mask patterns, we propose using Model Predictive Control (MPC) at test time to leverage the model’s own predictive capacity to guide its action selection. |
Kehan Wen; Yutong Hu; Yao Mu; Lei Ke; |
285 | NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the evaluation of these models’ long-context abilities remains a challenge due to the limitations of current benchmarks. To address this gap, we introduce NovelQA, a benchmark tailored for evaluating LLMs with complex, extended narratives. |
Cunxiang Wang; Ruoxi Ning; Boqi Pan; Tonghui Wu; Qipeng Guo; Cheng Deng; Guangsheng Bao; Xiangkun Hu; Zheng Zhang; Qian Wang; Yue Zhang; |
286 | AFlow: Automating Agentic Workflow Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this challenge, we reformulate workflow optimization as a search problem over code-represented workflows, where LLM-invoking nodes are connected by edges. We introduce AFLOW, an automated framework that efficiently explores this space using Monte Carlo Tree Search, iteratively refining workflows through code modification, tree-structured experience, and execution feedback. |
Jiayi Zhang; Jinyu Xiang; Zhaoyang Yu; Fengwei Teng; Xiong-Hui Chen; Jiaqi Chen; Mingchen Zhuge; Xin Cheng; Sirui Hong; Jinlin Wang; Bingnan Zheng; Bang Liu; Yuyu Luo; Chenglin Wu; |
287 | SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate design choices for creating a fast, accurate automated safety evaluator. |
Tinghao Xie; Xiangyu Qi; Yi Zeng; Yangsibo Huang; Udari Madhushani Sehwag; Kaixuan Huang; Luxi He; Boyi Wei; Dacheng Li; Ying Sheng; Ruoxi Jia; Bo Li; Kai Li; Danqi Chen; Peter Henderson; Prateek Mittal; |
288 | Physics-Informed Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a framework that unifies generative modeling and partial differential equation fulfillment by introducing a first-principle-based loss term that enforces generated samples to fulfill the underlying physical constraints. |
Jan-Hendrik Bastek; WaiChing Sun; Dennis Kochmann; |
289 | Personalized Visual Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Personalized Visual Instruction Tuning (PVIT), a novel data curation and training framework designed to enable MLLMs to identify target individuals within an image and engage in personalized and coherent dialogues. To evaluate the personalized potential of MLLMs, we present a benchmark called P-Bench, which encompasses various question types with different levels of difficulty. |
Renjie Pi; Jianshu Zhang; Tianyang Han; Jipeng Zhang; Rui Pan; Tong Zhang; |
290 | MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel method for generating mathematical code accompanied with corresponding reasoning steps for continued pretraining. |
Zimu Lu; Aojun Zhou; Ke Wang; Houxing Ren; Weikang Shi; Junting Pan; Mingjie Zhan; Hongsheng Li; |
291 | Privacy Auditing of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our method can be used to provide a privacy audit of $\varepsilon \approx 1$ for a model trained with theoretical $\varepsilon$ of 4. |
Ashwinee Panda; Xinyu Tang; Christopher A. Choquette-Choo; Milad Nasr; Prateek Mittal; |
292 | TIGeR: Unifying Text-to-Image Generation and Retrieval with Large Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we rethink the relationship between text-to-image generation and retrieval, proposing a *unified* framework for both tasks with one single Large Multimodal Model (LMM). |
Leigang Qu; Haochuan Li; Tan Wang; Wenjie Wang; Yongqi Li; Liqiang Nie; Tat-Seng Chua; |
293 | Representational Similarity Via Interpretable Visual Concepts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce an interpretable representational similarity method (RSVC) to compare two networks. |
Neehar Kondapaneni; Oisin Mac Aodha; Pietro Perona; |
294 | MMDT: Decoding The Trustworthiness and Safety of Multimodal Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the first unified platform, MMDT (Multimodal DecodingTrust), designed to provide a comprehensive safety and trustworthiness evaluation for MMFMs. |
Chejian Xu; Jiawei Zhang; Zhaorun Chen; Chulin Xie; Mintong Kang; Yujin Potter; Zhun Wang; Zhuowen Yuan; Alexander Xiong; Zidi Xiong; Chenhui Zhang; Lingzhi Yuan; Yi Zeng; Peiyang Xu; Chengquan Guo; Andy Zhou; Jeffrey Ziwei Tan; Xuandong Zhao; Francesco Pinto; Zhen Xiang; Yu Gai; Zinan Lin; Dan Hendrycks; Bo Li; Dawn Song; |
295 | SC-OmniGS: Self-Calibrating Omnidirectional Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present SC-OmniGS, a novel self-calibrating omnidirectional Gaussian splatting system for fast and accurate omnidirectional radiance field reconstruction using 360-degree images. |
Huajian Huang; Yingshu Chen; Longwei Li; Hui Cheng; Tristan Braud; Yajie Zhao; Sai-Kit Yeung; |
296 | Eliciting Human Preferences with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: But selecting examples or writing prompts can be challenging—especially in tasks that require users to precisely articulate nebulous preferences or reason about complex edge cases. For such tasks, we introduce **Generative Active Task Elicitation (GATE)**, a method for using *LMs themselves* to guide the task specification process. |
Belinda Z. Li; Alex Tamkin; Noah Goodman; Jacob Andreas; |
297 | Mixture of Parrots: Experts Improve Memorization More Than Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that as we increase the number of experts (while fixing the number of active parameters), the memorization performance consistently increases while the reasoning capabilities saturate. |
Samy Jelassi; Clara Mohri; David Brandfonbrener; Alex Gu; Nikhil Vyas; Nikhil Anand; David Alvarez-Melis; Yuanzhi Li; Sham M. Kakade; eran malach; |
298 | Sail Into The Headwind: Alignment Via Robust Rewards and Dynamic Labels Against Reward Hacking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate reward hacking in offline preference optimization, which aims to improve an initial model using a preference dataset. |
Paria Rashidinejad; Yuandong Tian; |
299 | GSM-Symbolic: Understanding The Limitations of Mathematical Reasoning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To overcome the limitations of existing evaluations, we introduce GSM-Symbolic, an improved benchmark created from symbolic templates that allow for the generation of a diverse set of questions. |
Seyed Iman Mirzadeh; Keivan Alizadeh; Hooman Shahrokhi; Oncel Tuzel; Samy Bengio; Mehrdad Farajtabar; |
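The symbolic-template idea can be illustrated with a short sketch: a word-problem template whose names and numbers are resampled, with the ground-truth answer recomputed for each instance. The template and value ranges below are invented for illustration and are not taken from the benchmark.

```python
import random

TEMPLATE = ("{name} has {a} apples and buys {b} more bags with {c} apples each. "
            "How many apples does {name} have now?")

def sample_instance(rng=random.Random(0)):
    """Instantiate the template with fresh names/numbers and recompute the answer."""
    name = rng.choice(["Ava", "Liam", "Noah", "Mia"])
    a, b, c = rng.randint(2, 20), rng.randint(2, 9), rng.randint(2, 12)
    question = TEMPLATE.format(name=name, a=a, b=b, c=c)
    answer = a + b * c                     # ground truth derived symbolically
    return question, answer

for _ in range(3):
    q, ans = sample_instance()
    print(q, "->", ans)
```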
300 | Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A key challenge in TVR is the information asymmetry between video and text: videos are inherently richer in information, while their textual descriptions often capture only fragments of this complexity. This paper introduces a novel, data-centric framework to bridge this gap by enriching textual representations to better match the richness of video content. |
Zechen Bai; Tianjun Xiao; Tong He; Pichao WANG; Zheng Zhang; Thomas Brox; Mike Zheng Shou; |
301 | $\text{D}_{2}\text{O}$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional KV Cache eviction strategies, which discard less critical KV-pairs based on attention scores, often degrade generation quality, leading to issues such as context loss or hallucinations. To address this, we introduce **D**ynamic **D**iscriminative **O**perations ($\mathbf{D_2 O}$), a novel method that optimizes KV cache size dynamically and discriminatively at two levels without fine-tuning, while preserving essential context. |
Zhongwei Wan; Xinjian Wu; Yu Zhang; Yi Xin; Chaofan Tao; Zhihong Zhu; Xin Wang; Siqi Luo; Jing Xiong; Longyue Wang; Mi Zhang; |
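For reference, the sketch below implements the conventional attention-score-based KV eviction that the highlight contrasts with: keep the most-attended KV pairs under a fixed budget and drop the rest. D2O's two-level dynamic and discriminative policy is not reproduced here; all tensors are toy placeholders.

```python
import torch

def evict_kv(keys, values, attn_weights, budget):
    """keys/values: [seq, d]; attn_weights: [num_queries, seq] attention probabilities.
    Returns the `budget` most-attended KV pairs plus the kept positions."""
    scores = attn_weights.sum(dim=0)                       # accumulated attention per position
    keep = torch.topk(scores, k=min(budget, scores.numel())).indices.sort().values
    return keys[keep], values[keep], keep

seq, d = 16, 8
keys, values = torch.randn(seq, d), torch.randn(seq, d)
attn = torch.softmax(torch.randn(4, seq), dim=-1)          # toy attention from 4 queries
k2, v2, kept = evict_kv(keys, values, attn, budget=6)
print(kept.tolist(), k2.shape)
```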
302 | From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs By Finetuning on Synthetic Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent studies have shown that Large Language Models (LLMs) struggle to accurately retrieve information and maintain reasoning capabilities when processing long-context inputs. To address these limitations, we propose a finetuning approach utilizing a carefully designed synthetic dataset comprising numerical key-value retrieval tasks. |
Zheyang Xiong; Vasilis Papageorgiou; Kangwook Lee; Dimitris Papailiopoulos; |
303 | Can Watermarked LLMs Be Identified By Users Via Crafted Prompts? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Finally, we propose that the key to enhancing the imperceptibility of watermarked LLMs is to increase the randomness of watermark key selection. Based on this, we introduce the Water-Bag strategy, which significantly improves watermark imperceptibility by merging multiple watermark keys. |
Aiwei Liu; Sheng Guan; Yiming Liu; Leyi Pan; Yifei Zhang; Liancheng Fang; Lijie Wen; Philip S. Yu; Xuming Hu; |
304 | BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce BiGR, a novel conditional image generation model using compact binary latent codes for generative training, focusing on enhancing both generation and representation capabilities. |
Shaozhe Hao; Xuantong LIU; Xianbiao Qi; Shihao Zhao; Bojia Zi; Rong Xiao; Kai Han; Kwan-Yee K. Wong; |
305 | Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a flexible and probabilistically principled alternative, we propose to use GFlowNet fine-tuning, followed by a secondary smoothing phase, to train the attacker model to generate *diverse* and *effective* attack prompts. |
Seanie Lee; Minsu Kim; Lynn Cherif; David Dobre; Juho Lee; Sung Ju Hwang; Kenji Kawaguchi; Gauthier Gidel; Yoshua Bengio; Nikolay Malkin; Moksh Jain; |
306 | Deconstructing Denoising Diffusion Models for Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we examine the representation learning abilities of Denoising Diffusion Models (DDM) that were originally purposed for image generation. |
Xinlei Chen; Zhuang Liu; Saining Xie; Kaiming He; |
307 | Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates a basic question in reinforcement learning from human feedback (RLHF) from a theoretical perspective: how to efficiently explore in an online manner under preference feedback and general function approximation. We take the initial step towards a theoretical understanding of this problem by proposing a novel algorithm, *Exploratory Preference Optimization* (XPO). |
Tengyang Xie; Dylan J Foster; Akshay Krishnamurthy; Corby Rosset; Ahmed Hassan Awadallah; Alexander Rakhlin; |
308 | Improved Training Technique for Latent Consistency Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we analyze the statistical differences between pixel and latent spaces, discovering that latent data often contains highly impulsive outliers, which significantly degrade the performance of iCT in the latent space. |
Quan Dao; Khanh Doan; Di Liu; Trung Le; Dimitris N. Metaxas; |
309 | ReAttention: Training-Free Infinite Context with Finite Attention Scope Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose \textbf{ReAttention}, a training-free approach enabling LLM based on the self-attention mechanism to support an infinite context with a finite attention scope under sufficient memory resources. |
Xiaoran Liu; Ruixiao Li; Zhigeng Liu; Qipeng Guo; Yuerong Song; Kai Lv; Hang Yan; Linlin Li; Qun Liu; Xipeng Qiu; |
310 | When Prompt Engineering Meets Software Engineering: CNL-P As Natural and Robust “APIs” for Human-AI Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve prompt quality, best practices for prompt engineering (PE) have been developed, including writing guidelines and templates. Building on this, we propose Controlled NL for Prompt (CNL-P), which not only incorporates PE best practices but also draws on key principles from software engineering (SE). |
Zhenchang Xing; Yang Liu; Zhuo Cheng; Qing Huang; Dehai Zhao; Daniel SUN; Chenhua Liu; |
311 | Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Illumination degradation image restoration (IDIR) techniques aim to improve the visibility of degraded images and mitigate the adverse effects of deteriorated illumination. Among these algorithms, diffusion-based models (DM) have shown promising performance but are often burdened by heavy computational demands and pixel misalignment issues when predicting the image-level distribution. To tackle these problems, we propose to leverage DM within a compact latent space to generate concise guidance priors and introduce a novel solution called Reti-Diff for the IDIR task. |
Chunming He; Chengyu Fang; Yulun Zhang; Longxiang Tang; Jinfa Huang; Kai Li; Zhenhua Guo; Xiu Li; Sina Farsiu; |
312 | Beyond Autoregression: Fast LLMs Via Self-Distillation Through Time Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we demonstrate that diffusion language models are capable of generating at least 32 tokens simultaneously, while exceeding the performance of AR models in text quality and on the LAMBADA natural language understanding benchmark. |
Justin Deschenaux; Caglar Gulcehre; |
313 | Mitigating Object Hallucination in MLLMs Via Data-augmented Phrase-level Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address object hallucinations in MLLMs, where information is generated about an object not present in the input image. To fine-tune MLLMs with DPA, we first generate a set of ‘hallucinated’ and ‘correct’ response pairs through generative data augmentation by selectively altering the ground-truth information of the correct responses at a phrase level. |
Pritam Sarkar; Sayna Ebrahimi; Ali Etemad; Ahmad Beirami; Sercan O Arik; Tomas Pfister; |
314 | Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape View Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the theory, we introduce WSD-S, a variant of WSD that reuses previous checkpoints’ decay phases and keeps only one main branch, where we resume from a decayed checkpoint. |
Kaiyue Wen; Zhiyuan Li; Jason S. Wang; David Leo Wright Hall; Percy Liang; Tengyu Ma; |
315 | Robust Conformal Prediction with A Single Binary Certificate Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a robust conformal prediction that produces smaller sets even with significantly lower MC samples (e.g. 150 for CIFAR10). |
Soroush H. Zargarbashi; Aleksandar Bojchevski; |
316 | ADIFF: Explaining Audio Difference Using Natural Language Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address these limitations, we propose ADIFF, which introduces a cross-projection module, position captioning, and a three-step training process to enhance the model’s ability to produce detailed explanations. This paper stands out as the first work to comprehensively study the task of explaining audio differences and to propose a benchmark and baselines for the task. First, we present two new datasets for audio difference explanation derived from the AudioCaps and Clotho audio captioning datasets. |
Soham Deshmukh; Shuo Han; Rita Singh; Bhiksha Raj; |
317 | MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose MovieDreamer, a novel hierarchical framework that integrates the strengths of autoregressive models with diffusion-based rendering to pioneer long-duration video generation with intricate plot progressions and high visual fidelity. |
Canyu Zhao; Mingyu Liu; Wen Wang; Weihua Chen; Fan Wang; Hao Chen; Bo Zhang; Chunhua Shen; |
318 | ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, existing approaches that fuse temporally forward and backward paths in parallel often suffer from off-manifold issues, leading to artifacts or requiring multiple iterative re-noising steps. In this work, we introduce a novel, bidirectional sampling strategy to address these off-manifold issues without requiring extensive re-noising or fine-tuning. |
Serin Yang; Taesung Kwon; Jong Chul Ye; |
319 | RNNs Are Not Transformers (Yet): The Key Bottleneck on In-Context Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We aim to understand whether RNNs can match the performance of Transformers, particularly when enhanced with Chain-of-Thought (CoT) prompting. |
Kaiyue Wen; Xingyu Dang; Kaifeng Lyu; |
320 | Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study how to subvert large language models (LLMs) from following prompt-specified rules. |
Anton Xue; Avishree Khare; Rajeev Alur; Surbhi Goel; Eric Wong; |
321 | HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits. |
Mude Hui; Siwei Yang; Bingchen Zhao; Yichun Shi; Heng Wang; Peng Wang; Cihang Xie; Yuyin Zhou; |
322 | Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we tackle the challenge of developing proactive agents capable of anticipating and initiating tasks without explicit human instructions. |
Yaxi Lu; Shenzhi Yang; Cheng Qian; Guirong Chen; Qinyu Luo; Yesai Wu; Huadong Wang; Xin Cong; Zhong Zhang; Yankai Lin; Weiwen Liu; Yasheng Wang; Zhiyuan Liu; Fangming Liu; Maosong Sun; |
323 | WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this limitation, we present WorkflowLLM, a data-centric framework elaborately designed to enhance the capability of LLMs in workflow orchestration. Specifically, the construction process can be divided into three phases: (1) Data Collection: we collect real-world workflow data from Apple Shortcuts and RoutineHub, transcribing them into Python-style code. |
Shengda Fan; Xin Cong; Yuepeng Fu; Zhong Zhang; Shuyan Zhang; Yuanwei Liu; Yesai Wu; Yankai Lin; Zhiyuan Liu; Maosong Sun; |
324 | Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we ask: how do successful models perform formatted MCQA? |
Sarah Wiegreffe; Oyvind Tafjord; Yonatan Belinkov; Hannaneh Hajishirzi; Ashish Sabharwal; |
325 | Small-to-Large Generalization: Training Data Influences Models Consistently Across Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that small- and large-scale language model predictions (generally) *do* highly correlate across choice of training data. |
Alaa Khaddaj; Logan Engstrom; Aleksander Madry; |
326 | Gated Delta Networks: Improving Mamba2 with Delta Rule Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We observe that these mechanisms are complementary—gating enables rapid memory erasure while the delta rule facilitates targeted updates. Building on this insight, we introduce the gated delta rule and develop a parallel training algorithm optimized for modern hardware. |
Songlin Yang; Jan Kautz; Ali Hatamizadeh; |
327 | What Matters When Repurposing Diffusion Models for General Dense Perception Tasks? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we conduct a thorough investigation into critical factors that affect transfer efficiency and performance when using diffusion priors. |
Guangkai Xu; Yongtao Ge; Mingyu Liu; Chengxiang Fan; Kangyang Xie; Zhiyue Zhao; Hao Chen; Chunhua Shen; |
328 | Benchmarking Agentic Workflow Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. |
Shuofei Qiao; Runnan Fang; Zhisong Qiu; Xiaobin Wang; Ningyu Zhang; Yong Jiang; Pengjun Xie; Fei Huang; Huajun Chen; |
329 | SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096$\times$4096 resolution. |
Enze Xie; Junsong Chen; Junyu Chen; Han Cai; Haotian Tang; Yujun Lin; Zhekai Zhang; Muyang Li; Ligeng Zhu; Yao Lu; Song Han; |
330 | DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present effective algorithms for both approaches, demonstrating our model’s versatility and superior performance in various motion synthesis tasks. |
Kaifeng Zhao; Gen Li; Siyu Tang; |
331 | Lift Your Molecules: Molecular Graph Generation in Latent Euclidean Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new framework for 2D molecular graph generation using 3D molecule generative models. |
Mohamed Amine Ketata; Nicholas Gao; Johanna Sommer; Tom Wollschläger; Stephan Günnemann; |
332 | UniDrive: Towards Universal Driving Perception Across Camera Configurations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present UniDrive, a novel framework for vision-centric autonomous driving to achieve universal perception across camera configurations. To evaluate the effectiveness of our framework, we collect a dataset on CARLA by driving the same routes while only modifying the camera configurations. |
Ye Li; Wenzhao Zheng; Xiaonan Huang; Kurt Keutzer; |
333 | One Step Diffusion Via Shortcut Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Shortcut Models, a family of generative models that use a single network and training phase to produce high-quality samples in a single or multiple sampling steps. |
Kevin Frans; Danijar Hafner; Sergey Levine; Pieter Abbeel; |
334 | Graph Sparsification Via Mixture of Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Mixture-of-Graphs (MoG), leveraging the concept of Mixture-of-Experts (MoE), to dynamically select tailored pruning solutions for each node. |
Guibin Zhang; Xiangguo Sun; Yanwei Yue; Chonghe Jiang; Kun Wang; Tianlong Chen; Shirui Pan; |
335 | LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce LDAdam, a memory-efficient optimizer for training large models, that performs adaptive optimization steps within lower dimensional subspaces, while consistently exploring the full parameter space during training. |
Thomas Robert; Mher Safaryan; Ionut-Vlad Modoranu; Dan Alistarh; |
336 | EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we propose EditRoom, a unified framework capable of executing a variety of layout edits through natural language commands, without requiring manual intervention. |
Kaizhi Zheng; Xiaotong Chen; Xuehai He; Jing Gu; Linjie Li; Zhengyuan Yang; Kevin Lin; Jianfeng Wang; Lijuan Wang; Xin Eric Wang; |
337 | WebRL: Training LLM Web Agents Via Self-Evolving Online Curriculum Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces WebRL, a novel self-evolving online curriculum reinforcement learning framework designed to train high-performance web agents using open LLMs. |
Zehan Qi; Xiao Liu; Iat Long Iong; Hanyu Lai; Xueqiao Sun; Jiadai Sun; Xinyue Yang; Yu Yang; Shuntian Yao; Wei Xu; Jie Tang; Yuxiao Dong; |
338 | Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Without mitigating such gaps, diffusion for perception still struggles on tasks represented by multi-modal understanding (e.g., referring image segmentation). Motivated by these challenges, we analyze and improve the alignment between the generative diffusion process and perception objectives centering around the key observation: how perception quality evolves with the denoising process. |
Ziqi Pang; Xin Xu; Yu-Xiong Wang; |
339 | TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose that the optimal data for DPO has equal expected rewards for each token in winning and losing responses, as there is no difference in token importance. |
Aiwei Liu; Haoping Bai; Zhiyun Lu; Yanchao Sun; Xiang Kong; Xiaoming Simon Wang; Jiulong Shan; Albin Madappally Jose; Xiaojiang Liu; Lijie Wen; Philip S. Yu; Meng Cao; |
340 | World Model on Million-Length Video And Language With Blockwise RingAttention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Enabling long-context understanding remains a key challenge in scaling existing sequence models — a crucial component in developing generally intelligent models that can process and operate over long temporal horizons that potentially consist of millions of tokens. In this paper, we aim to address these challenges by providing a comprehensive exploration of the full development process for producing 1M context language models and video-language models, setting new benchmarks in language retrieval and new capabilities in long video understanding. |
Hao Liu; Wilson Yan; Matei Zaharia; Pieter Abbeel; |
341 | Personalized Representation from Personalized Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we explore a potential connection between these ideas, and formalize the challenge of using personalized synthetic data to learn personalized representations, which encode knowledge about an object of interest and may be flexibly applied to any downstream task relating to the target object. We introduce an evaluation suite for this challenge, including reformulations of two existing datasets and a novel dataset explicitly constructed for this purpose, and propose a contrastive learning approach that makes creative use of image generators. |
Shobhita Sundaram; Julia Chae; Yonglong Tian; Sara Beery; Phillip Isola; |
342 | Towards Neural Scaling Laws for Time Series Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we examine two common TSFM architectures—encoder-only and decoder-only Transformers—and investigate their scaling behavior on both ID and OOD data. |
Qingren Yao; Chao-Han Huck Yang; Renhe Jiang; Yuxuan Liang; Ming Jin; Shirui Pan; |
343 | Better Instruction-Following Through Minimum Bayes Risk Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that MBR decoding with reference-based LLM judges substantially improves over greedy decoding, best-of-N decoding with reference-free judges and MBR decoding with lexical and embedding-based metrics on AlpacaEval and MT-Bench. |
Ian Wu; Patrick Fernandes; Amanda Bertsch; Seungone Kim; Sina Khoshfetrat Pakazad; Graham Neubig; |
344 | AgentStudio: A Toolkit for Building General Virtual Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, current evaluations lack in-depth analyses that decompose fundamental agent capabilities. We introduce AgentStudio, a trinity of environments, tools, and benchmarks to address these issues. |
Longtao Zheng; Zhiyuan Huang; Zhenghai Xue; Xinrun Wang; Bo An; Shuicheng YAN; |
345 | Interaction Asymmetry: A General Principle for Learning Composable Abstractions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose the principle of interaction asymmetry which states: Parts of the same concept have more complex interactions than parts of different concepts. |
Jack Brady; Julius von Kügelgen; Sebastien Lachapelle; Simon Buchholz; Thomas Kipf; Wieland Brendel; |
346 | Online Epsilon Net & Piercing Set for Geometric Concepts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the first deterministic online algorithm with an optimal competitive ratio for intervals in $\mathbb{R}$. |
Sujoy Bhore; Devdan Dey; Satyam Singh; |
347 | Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we cast reward fine-tuning as stochastic optimal control (SOC). |
Carles Domingo-Enrich; Michal Drozdzal; Brian Karrer; Ricky T. Q. Chen; |
348 | Denoising As Adaptation: Noise-Space Domain Adaptation for Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that it is possible to perform domain adaptation via the noise space using diffusion models. |
Kang Liao; Zongsheng Yue; Zhouxia Wang; Chen Change Loy; |
349 | AdaIR: Adaptive All-in-One Image Restoration Via Frequency Mining and Modulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most methods purely operate in the spatial domain and do not delve into the distinct frequency variations inherent to different degradation types. To address this gap, we propose an adaptive all-in-one image restoration network based on frequency mining and modulation. |
Yuning Cui; Syed Waqas Zamir; Salman Khan; Alois Knoll; Mubarak Shah; Fahad Shahbaz Khan; |
350 | BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce $\textbf{BitStack}$, a novel, training-free weight compression approach that enables megabyte-level trade-offs between memory usage and model performance. |
Xinghao Wang; Pengyu Wang; Bo Wang; Dong Zhang; Yunhua Zhou; Xipeng Qiu; |
351 | Limits of Deep Learning: Sequence Modeling Through The Lens of Complexity Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite their successes, deep learning models struggle with tasks requiring complex reasoning and function composition. We present a theoretical and empirical investigation into the limitations of Structured State Space Models (SSMs) and Transformers in such tasks. |
Nikola Zubic; Federico Soldà; Aurelio Sulser; Davide Scaramuzza; |
352 | Neural Phylogeny: Fine-Tuning Relationship Detection Among Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present two approaches for neural phylogeny detection: a learning-free method and a learning-based method. |
Runpeng Yu; Xinchao Wang; |
353 | Reasoning with Latent Thoughts: On The Power of Looped Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we make a stronger claim — many reasoning problems require a large depth but not necessarily many parameters. |
Nikunj Saunshi; Nishanth Dikkala; Zhiyuan Li; Sanjiv Kumar; Sashank J. Reddi; |
354 | Uncertainty-Aware Decoding with Minimum Bayes Risk Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show how Minimum Bayes Risk (MBR) decoding, which selects model generations according to an expected risk, can be generalized into a principled uncertainty-aware decoding method. |
Nico Daheim; Clara Meister; Thomas Möllenhoff; Iryna Gurevych; |
355 | SEMDICE: Off-policy State Entropy Maximization Via Stationary Distribution Correction Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce SEMDICE, a principled off-policy algorithm that computes an SEM policy from an arbitrary off-policy dataset, which optimizes the policy directly within the space of stationary distributions. |
Jongmin Lee; Meiqi Sun; Pieter Abbeel; |
356 | Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a principled approach to provide LLM-based evaluation with a rigorous guarantee of human agreement. |
Jaehun Jung; Faeze Brahman; Yejin Choi; |
357 | MMSearch: Unveiling The Potential of Large Models As Multi-modal Search Engines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we first design a delicate pipeline, MMSearch-Engine, to empower any LMMs with multimodal search capabilities. On top of this, we introduce MMSearch, a comprehensive evaluation benchmark to assess the multimodal search performance of LMMs. |
Dongzhi Jiang; Renrui Zhang; Ziyu Guo; Yanmin Wu; jiayi lei; Pengshuo Qiu; Pan Lu; Zehui Chen; Guanglu Song; Peng Gao; Yu Liu; Chunyuan Li; Hongsheng Li; |
358 | Simple Guidance Mechanisms for Discrete Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, controllable diffusion on discrete data faces challenges given that continuous guidance methods do not directly apply to discrete diffusion. Here, we provide a straightforward derivation of classifier-free and classifier-based guidance for discrete diffusion, as well as a new class of diffusion models that leverage uniform noise and that are more guidable because they can continuously edit their outputs. |
Yair Schiff; Subham Sekhar Sahoo; Hao Phung; Guanghan Wang; Sam Boshar; Hugo Dalla-torre; Bernardo P de Almeida; Alexander M Rush; Thomas PIERROT; Volodymyr Kuleshov; |
359 | Straight to Zero: Why Linearly Decaying The Learning Rate to Zero Works Best for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In a large-scale empirical study, we show that under an optimal peak LR, a simple linear decay-to-zero (D2Z) schedule consistently outperforms other schedules when training at compute-optimal dataset sizes. |
Shane Bergsma; Nolan Simran Dey; Gurpreet Gosal; Gavia Gray; Daria Soboleva; Joel Hestness; |
360 | MIND: Math Informed SyNthetic Dialogues for Pretraining LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel large-scale and diverse Math Informed syNthetic Dialogue (MIND) generation method that improves the mathematical reasoning ability of LLMs. |
Syeda Nahida Akter; Shrimai Prabhumoye; John Kamalu; Sanjeev Satheesh; Eric Nyberg; Mostofa Patwary; Mohammad Shoeybi; Bryan Catanzaro; |
361 | Digi-Q: Learning VLM Q-Value Functions for Training Device-Control Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop an approach, called Digi-Q, to train VLM-based action-value Q-functions which are then used to extract the agent policy. |
Hao Bai; Yifei Zhou; Li Erran Li; Sergey Levine; Aviral Kumar; |
362 | An Effective Theory of Bias Amplification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we contribute a precise analytical theory in the context of ridge regression, both with and without random projections, where the former models feedforward neural networks in a simplified regime. |
Arjun Subramonian; Samuel Bell; Levent Sagun; Elvis Dohmatob; |
363 | Model-agnostic Meta-learners for Estimating Heterogeneous Treatment Effects Over Time Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our paper, we propose several meta-learners that are model-agnostic and thus can be used in combination with arbitrary machine learning models (e.g., transformers) to estimate HTEs over time. |
Dennis Frauen; Konstantin Hess; Stefan Feuerriegel; |
364 | DistillHGNN: A Knowledge Distillation Approach for High-Speed Hypergraph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework to significantly enhance the inference speed and memory efficiency of Hypergraph Neural Networks (HGNNs) while preserving their high accuracy. |
Saman Forouzandeh; Parham Moradi DW; Mahdi Jalili; |
365 | ColPali: Efficient Document Retrieval with Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To benchmark current systems on visually rich document retrieval, we introduce the Visual Document Retrieval Benchmark $\textit{ViDoRe}$, composed of various page-level retrieval tasks spanning multiple domains, languages, and practical settings. We release models, data, code and benchmarks under open licenses at https://hf.co/vidore. |
Manuel Faysse; Hugues Sibille; Tony Wu; Bilel Omrani; Gautier Viaud; CELINE HUDELOT; Pierre Colombo; |
366 | UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose UniWav, an encoder-decoder framework designed to unify pre-training representation learning and generative tasks. |
Alexander H. Liu; Sang-gil Lee; Chao-Han Huck Yang; Yuan Gong; Yu-Chiang Frank Wang; James R. Glass; Rafael Valle; Bryan Catanzaro; |
367 | Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes Fiddler, a resource-efficient inference system for MoE models with limited GPU resources. |
Keisuke Kamahori; Tian Tang; Yile Gu; Kan Zhu; Baris Kasikci; |
368 | Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the Lumina-T2X family — a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a simple and scalable generative framework that can be adapted to various modalities, e.g., transforming noise into images, videos, multi-view 3D objects, or audio clips conditioned on text instructions. |
Peng Gao; Le Zhuo; Dongyang Liu; Ruoyi Du; Xu Luo; Longtian Qiu; Yuhang Zhang; Rongjie Huang; Shijie Geng; Renrui Zhang; Junlin Xie; Wenqi Shao; Zhengkai Jiang; Tianshuo Yang; Weicai Ye; Tong He; Jingwen He; Junjun He; Yu Qiao; Hongsheng Li; |
369 | CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More specifically, inspired by the recent advancements of diffusion model-based inverse problem solvers (DIS), we reformulate text-guidance as an inverse problem with a text-conditioned score matching loss and develop CFG++, a novel approach that tackles the off-manifold challenges inherent in traditional CFG. |
Hyungjin Chung; Jeongsol Kim; Geon Yeong Park; Hyelin Nam; Jong Chul Ye; |
370 | MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing benchmarks suffer from limitations in data scale, scope, and evaluation depth, while current evaluation metrics are often costly or biased, lacking in reliability for practical applications. To address these challenges, we introduce MMIE, a large-scale knowledge-intensive benchmark for evaluating interleaved multimodal comprehension and generation in Large Vision-Language Models (LVLMs). |
Peng Xia; Siwei Han; Shi Qiu; Yiyang Zhou; Zhaoyang Wang; Wenhao Zheng; Zhaorun Chen; Chenhang Cui; Mingyu Ding; Linjie Li; Lijuan Wang; Huaxiu Yao; |
371 | MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs. |
Peng Xia; Kangyu Zhu; Haoran Li; Tianze Wang; Weijia Shi; Sheng Wang; Linjun Zhang; James Zou; Huaxiu Yao; |
372 | LLM-SR: Scientific Equation Discovery Via Programming with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: They also employ limited representations such as expression trees, constraining the search space and expressiveness of equations. To bridge this gap, we introduce LLM-SR, a novel approach that leverages the extensive scientific knowledge and robust code generation capabilities of Large Language Models (LLMs) to discover scientific equations from data. |
Parshin Shojaee; Kazem Meidani; Shashank Gupta; Amir Barati Farimani; Chandan K. Reddy; |
373 | Adaptive Length Image Tokenization Via Recurrent Allocation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This contrasts with human intelligence—and even large language models—which allocate varying representational capacities based on entropy, context and familiarity. Inspired by this, we propose an approach to learn variable-length token representations for 2D images. |
Shivam Duggal; Phillip Isola; Antonio Torralba; William T. Freeman; |
374 | AlphaEdit: Null-Space Constrained Model Editing for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While effective, current studies have demonstrated that this perturbation inevitably disrupts the originally preserved knowledge within LLMs, especially in sequential editing scenarios. To address this, we introduce AlphaEdit, a novel solution that projects the perturbation onto the null space of the preserved knowledge before applying it to the parameters. |
Junfeng Fang; Houcheng Jiang; Kun Wang; Yunshan Ma; Jie Shi; Xiang Wang; Xiangnan He; Tat-Seng Chua; |
375 | Inference Scaling for Long-Context Retrieval Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate inference scaling for retrieval augmented generation (RAG), exploring the combination of multiple strategies beyond simply increasing the quantity of knowledge, including in-context learning and iterative prompting. |
Zhenrui Yue; Honglei Zhuang; Aijun Bai; Kai Hui; Rolf Jagerman; Hansi Zeng; Zhen Qin; Dong Wang; Xuanhui Wang; Michael Bendersky; |
376 | Implicit Search Via Discrete Diffusion: A Study on Chess Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose DiffuSearch, a model that does \textit{implicit search} by looking into the future world via discrete diffusion modeling. |
Jiacheng Ye; Zhenyu Wu; Jiahui Gao; Zhiyong Wu; Xin Jiang; Zhenguo Li; Lingpeng Kong; |
377 | Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Autoregressive language models, despite their impressive capabilities, struggle with complex reasoning and long-term planning tasks. We introduce discrete diffusion models as a novel solution to these challenges. |
Jiacheng Ye; Jiahui Gao; Shansan Gong; Lin Zheng; Xin Jiang; Zhenguo Li; Lingpeng Kong; |
378 | Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate two mainstream architectures for language modeling, namely Transformers and Mambas, to explore the extent of their mechanistic similarity. |
Junxuan Wang; Xuyang Ge; Wentao Shu; Qiong Tang; Yunhua Zhou; Zhengfu He; Xipeng Qiu; |
379 | Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, directly increasing image resolution leads to expensive computational cost for MLLMs. In this paper, we reveal that a combination of low- and high-resolution visual features can efficiently mitigate this shortcoming. |
Gen Luo; Yiyi Zhou; Yuxin Zhang; Xiawu Zheng; Xiaoshuai Sun; Rongrong Ji; |
380 | Repetition Improves Language Model Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Bidirectional models are considered essential for strong text embeddings. Recent approaches to adapt autoregressive language models (LMs) into strong text embedding models have largely had the requirement to modify the LM architecture to be bidirectional. We challenge this premise by introducing “echo embeddings”, which convert autoregressive LMs into high-quality text embedding models \emph{without} changing the architecture or requiring fine-tuning. |
Jacob Mitchell Springer; Suhas Kotha; Daniel Fried; Graham Neubig; Aditi Raghunathan; |
381 | Data Unlearning in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: General-purpose machine unlearning techniques were found to be either unstable or unable to unlearn data. We therefore propose a family of new loss functions called Subtracted Importance Sampled Scores (SISS) that utilize importance sampling and are the first method to unlearn data with theoretical guarantees. |
Silas Alberti; Kenan Hasanaliyev; Manav Shah; Stefano Ermon; |
382 | HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we design dual structure-aware adapters to adaptively fit task-related homogeneous and heterogeneous structural information. |
Yujie Mo; Runpeng Yu; Xiaofeng Zhu; Xinchao Wang; |
383 | Data Pruning By Information Maximization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present InfoMax, a novel data pruning method, also known as coreset selection, designed to maximize the information content of selected samples while minimizing redundancy. |
Haoru Tan; Sitong Wu; Wei Huang; Shizhen Zhao; XIAOJUAN QI; |
384 | Holistically Evaluating The Environmental Impact of Creating Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we estimate the real-world environmental impact of developing a series of language models, ranging from 20 million to 13 billion active parameters, trained on up to 5.6 trillion tokens each. |
Jacob Morrison; Clara Na; Jared Fernandez; Tim Dettmers; Emma Strubell; Jesse Dodge; |
385 | LLMs Can Plan Only If We Tell Them Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates whether LLMs can independently generate long-horizon plans that rival human baselines. |
Bilgehan Sel; Ruoxi Jia; Ming Jin; |
386 | No Location Left Behind: Measuring and Improving The Fairness of Implicit Representations for Earth Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods disproportionately prioritize global average performance, whereas practitioners require fine-grained insights to understand biases and variations in these models. To bridge this gap, we introduce FAIR-Earth: a first-of-its-kind dataset explicitly crafted to challenge and examine inequities in Earth representations. |
Daniel Cai; Randall Balestriero; |
387 | Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing LLMs often respond by presupposing a single interpretation of such ambiguous requests, frustrating users who intended a different interpretation. We speculate this is caused by current preference data labeling practice, where LLM responses are evaluated only on their prior contexts. To address this, we assign preference labels by simulating their expected outcomes in future turns. |
Michael JQ Zhang; W. Bradley Knox; Eunsol Choi; |
388 | COAT: Compressing Optimizer States and Activations for Memory-Efficient FP8 Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces COAT (**C**ompressing **O**ptimizer States and **A**ctivations for FP8 **T**raining), a novel FP8 training framework designed to significantly reduce memory footprint when training large models. |
Haocheng Xi; Han Cai; Ligeng Zhu; Yao Lu; Kurt Keutzer; Jianfei Chen; Song Han; |
389 | Protein Language Model Fitness Is A Matter of Preference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We aim to predict the circumstances in which pLMs can successfully perform zero-shot fitness estimation. |
Cade W Gordon; Amy X. Lu; Pieter Abbeel; |
390 | Interpreting The Second-Order Effects of Neurons in CLIP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, each effect can be approximated by a single direction in the text-image space of CLIP. We describe neurons by decomposing these directions into sparse sets of text representations. |
Yossi Gandelsman; Alexei A Efros; Jacob Steinhardt; |
391 | Breach By A Thousand Leaks: Unsafe Information Leakage in ‘Safe’ AI Responses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We assert that robustness is fundamentally insufficient for ensuring safety goals due to inferential threats from dual-intent queries, with current defenses and evaluations failing to account for these risks. To quantify these risks, we introduce a new safety evaluation framework based on $\textit{impermissible information leakage}$ of model outputs and demonstrate how our proposed question-decomposition attack can extract dangerous knowledge from a censored LLM more effectively than traditional jailbreaking. |
David Glukhov; Ziwen Han; Ilia Shumailov; Vardan Papyan; Nicolas Papernot; |
392 | AutoG: Towards Automatic Graph Construction from Tabular Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our research aims to address this gap by formalizing the graph construction problem and proposing an effective solution. First, we introduce a set of datasets to formalize and evaluate graph construction methods. |
Zhikai Chen; Han Xie; Jian Zhang; Xiang song; Jiliang Tang; Huzefa Rangwala; George Karypis; |
393 | Do As We Do, Not As You Think: The Conformity of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In particular, we introduce BenchForm, a new conformity-oriented benchmark, featuring reasoning-intensive tasks and five distinct interaction protocols designed to probe LLMs’ behavior in collaborative scenarios. |
Zhiyuan Weng; Guikun Chen; Wenguan Wang; |
394 | Learning Clustering-based Prototypes for Compositional Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop ClusPro, a robust clustering-based prototype mining framework for CZSL that defines the conceptual boundaries of primitives through a set of diversified prototypes. |
Hongyu Qu; Jianan Wei; Xiangbo Shu; Wenguan Wang; |
395 | GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We believe our work provides valuable insights for future research in dynamic GUI content understanding. To this end, this paper introduces a new dataset, termed GUI-World, which features meticulously crafted Human-MLLM annotations, extensively covering six GUI scenarios and eight types of GUI-oriented questions in three formats. |
Dongping Chen; Yue Huang; Siyuan Wu; Jingyu Tang; Huichi Zhou; Qihui Zhang; Zhigang He; Yilin Bai; Chujie Gao; Liuyi Chen; Yiqiang Li; Chenlong Wang; Yue Yu; Tianshuo Zhou; Zhen Li; Yi Gui; Yao Wan; Pan Zhou; Jianfeng Gao; Lichao Sun; |
396 | Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the “Diffuse Risk Management” problem, aiming to balance the average-case safety and usefulness in the deployment of untrusted models over a large sequence of tasks. |
Jiaxin Wen; Vivek Hebbar; Caleb Larson; Aryan Bhatt; Ansh Radhakrishnan; Mrinank Sharma; Henry Sleight; Shi Feng; He He; Ethan Perez; Buck Shlegeris; Akbir Khan; |
397 | Framer: Interactive Frame Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Framer for interactive frame interpolation, which targets producing smoothly transitioning frames between two images as per user creativity. |
Wen Wang; Qiuyu Wang; Kecheng Zheng; Hao OUYANG; Zhekai Chen; Biao Gong; Hao Chen; Yujun Shen; Chunhua Shen; |
398 | DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce DynamicCity, a novel 4D occupancy generation framework capable of generating large-scale, high-quality dynamic 4D scenes with semantics. |
Hengwei Bian; Lingdong Kong; Haozhe Xie; Liang Pan; Yu Qiao; Ziwei Liu; |
399 | McEval: Massively Multilingual Code Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further facilitate the research of code LLMs, we propose a massively multilingual code benchmark covering 40 programming languages (McEval) with 16K test samples, which substantially pushes the limits of code LLMs in multilingual scenarios. |
Linzheng Chai; Shukai Liu; Jian Yang; Yuwei Yin; JinKe; Jiaheng Liu; Tao Sun; Ge Zhang; Changyu Ren; Hongcheng Guo; Noah Wang; Boyang Wang; Xianjie Wu; Bing Wang; Tongliang Li; Liqun Yang; Sufeng Duan; Zhaoxiang Zhang; Zhoujun Li; |
400 | G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Utilizing the Geo170k dataset, we introduce G-LLaVA, a model that demonstrates exceptional performance in solving geometric problems. |
Jiahui Gao; Renjie Pi; Jipeng Zhang; Jiacheng Ye; Wanjun Zhong; Yufei Wang; Lanqing HONG; Jianhua Han; Hang Xu; Zhenguo Li; Lingpeng Kong; |
401 | Language Models Are Advanced Anonymizers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With ever-increasing model capabilities, existing text anonymization methods are currently lagging behind regulatory requirements and adversarial threats. In this work, we take two steps to bridge this gap: First, we present a new setting for evaluating anonymization in the face of adversarial LLM inferences, allowing for a natural measurement of anonymization performance while remedying some of the shortcomings of previous metrics. Then, within this setting, we develop a novel LLM-based adversarial anonymization framework leveraging the strong inferential capabilities of LLMs to inform our anonymization procedure. |
Robin Staab; Mark Vero; Mislav Balunovic; Martin Vechev; |
402 | Accelerating Neural Network Training: An Analysis of The AlgoPerf Competition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents the inaugural AlgoPerf competition’s results, which drew 18 diverse submissions from 10 teams. |
Priya Kasimbeg; Frank Schneider; Runa Eschenhagen; Juhan Bae; Chandramouli Shama Sastry; Mark Saroufim; BOYUAN FENG; Less Wright; Edward Z. Yang; Zachary Nado; Sourabh Medapati; Philipp Hennig; Michael Rabbat; George E. Dahl; |
403 | Integrating Protein Dynamics Into Structure-Based Drug Design Via Full-Atom Stochastic Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We curate a dataset of apo and multiple holo states of protein-ligand complexes, simulated by molecular dynamics, and propose a full-atom flow model (and a stochastic version), named DynamicFlow, that learns to transform apo pockets and noisy ligands into holo pockets and corresponding 3D ligand molecules. |
Xiangxin Zhou; Yi Xiao; Haowei Lin; Xinheng He; Jiaqi Guan; Yang Wang; Qiang Liu; Feng Zhou; Liang Wang; Jianzhu Ma; |
404 | Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Ferret-UI 2, a multimodal large language model (MLLM) designed for universal UI understanding across a wide range of platforms, including iPhone, Android, iPad, Webpage, and AppleTV. |
Zhangheng LI; Keen You; Haotian Zhang; Di Feng; Harsh Agrawal; Xiujun Li; Mohana Prasad Sathya Moorthy; Jeffrey Nichols; Yinfei Yang; Zhe Gan; |
405 | PivotMesh: Generic 3D Mesh Generation Via Pivot Vertices Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a generic and scalable mesh generation framework PivotMesh, which makes an initial attempt to extend the native mesh generation to large-scale datasets. |
Haohan Weng; Yikai Wang; Tong Zhang; C. L. Philip Chen; Jun Zhu; |
406 | MQuAKE-Remastered: Multi-Hop Knowledge Editing Can Only Be Advanced with Reliable Evaluations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we reveal that **up to 33% or 76% of MQuAKE’s questions and ground truth labels are, in fact, corrupted in various fashions due to some unintentional clerical or procedural oversights**. |
Shaochen Zhong; Yifan Lu; Lize Shao; Bhargav Bhushanam; Xiaocong Du; Yixin Wan; Yucheng Shi; Daochen Zha; Yiwei Wang; Ninghao Liu; Kaixiong Zhou; Shuai Xu; Kai-Wei Chang; Louis Feng; Vipin Chaudhary; Xia Hu; |
407 | Automated Design of Agentic Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Given that programming languages are Turing Complete, this approach theoretically enables the learning of any possible agentic system: including novel prompts, tool use, workflows, and combinations thereof. We present a simple yet effective algorithm named Meta Agent Search to demonstrate this idea, where a meta agent iteratively programs interesting new agents based on an ever-growing archive of previous discoveries. |
Shengran Hu; Cong Lu; Jeff Clune; |
408 | Language Model Alignment in Multilingual Trolley Problems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems and highlighting the importance of incorporating diverse perspectives in AI ethics. Building on the Moral Machine experiment, which captures over 40 million human judgments across 200+ countries, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP. |
Zhijing Jin; Max Kleiman-Weiner; Giorgio Piatti; Sydney Levine; Jiarui Liu; Fernando Gonzalez Adauto; Francesco Ortu; András Strausz; Mrinmaya Sachan; Rada Mihalcea; Yejin Choi; Bernhard Schölkopf; |
409 | SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Such practice can introduce content variations irrelevant to whether the instruction is precisely followed (e.g., different expressions about the same semantic), interfering with the goal of teaching models to recognize the key differences that lead to improved instruction following. In light of this, we introduce SPaR, a self-play framework integrating tree-search self-refinement to yield valid and comparable preference pairs free from distractions. |
Jiale Cheng; Xiao Liu; Cunxiang Wang; Xiaotao Gu; Yida Lu; Dan Zhang; Yuxiao Dong; Jie Tang; Hongning Wang; Minlie Huang; |
410 | Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, inspired by the observation that the text-to-image generation process is the inverse of image-conditioned response generation in LVLMs, we explore the potential of leveraging text-to-image generative models to assist in mitigating hallucinations in LVLMs. |
Ce Zhang; Zifu Wan; Zhehan Kan; Martin Q. Ma; Simon Stepputtis; Deva Ramanan; Russ Salakhutdinov; Louis-Philippe Morency; Katia P. Sycara; Yaqi Xie; |
411 | Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through extensive evaluations of eleven representative watermarking methods against prevalent editing techniques, we demonstrate that most methods fail to detect watermarks after such edits. To address this limitation, we propose VINE, a watermarking method that significantly enhances robustness against various image editing techniques while maintaining high image quality. |
Shilin Lu; Zihan Zhou; Jiayou Lu; Yuanzhi Zhu; Adams Wai-Kin Kong; |
412 | Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that retaining offline data is unnecessary as long as we use a properly-designed online RL approach for fine-tuning offline RL initializations. |
Zhiyuan Zhou; Andy Peng; Qiyang Li; Sergey Levine; Aviral Kumar; |
413 | AuroraCap: Efficient, Performant Video Detailed Captioning and A New Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose AuroraCap, a video captioner based on a large multimodal model. |
Wenhao Chai; Enxin Song; Yilun Du; Chenlin Meng; Vashisht Madhavan; Omer Bar-Tal; Jenq-Neng Hwang; Saining Xie; Christopher D Manning; |
414 | CertainlyUncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and further explore finer categories within. |
Khyathi Chandu; Linjie Li; Anas Awadalla; Ximing Lu; Jae Sung Park; Jack Hessel; Lijuan Wang; Yejin Choi; |
415 | VILA-U: A Unified Foundation Model Integrating Visual Understanding and Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, VILA-U employs a single autoregressive next-token prediction framework for both tasks, eliminating the need for additional components like diffusion models. |
Yecheng Wu; Zhuoyang Zhang; Junyu Chen; Haotian Tang; Dacheng Li; Yunhao Fang; Ligeng Zhu; Enze Xie; Hongxu Yin; Li Yi; Song Han; Yao Lu; |
416 | Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing approaches treat predictive learning as a conditional generation problem, but often fail to fully exploit the temporal dynamics inherent in the data, leading to challenges in generating temporally coherent sequences. To address this, we introduce Dynamical Diffusion (DyDiff), a theoretically sound framework that incorporates temporally aware forward and reverse processes. |
Xingzhuo Guo; Yu Zhang; Baixu Chen; Haoran Xu; Jianmin Wang; Mingsheng Long; |
417 | MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering. |
Jun Shern Chan; Neil Chowdhury; Oliver Jaffe; James Aung; Dane Sherburn; Evan Mays; Giulio Starace; Kevin Liu; Leon Maksin; Tejal Patwardhan; Aleksander Madry; Lilian Weng; |
418 | Understanding Factual Recall in Transformers Via Associative Memories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our work, we show that shallow transformers can use a combination of associative memories to obtain such near optimal storage capacity. |
Eshaan Nichani; Jason D. Lee; Alberto Bietti; |
419 | TFG-Flow: Training-free Guidance in Multimodal Generative Flow Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Another emerging trend is the growing use of the simple and general flow matching framework in building generative foundation models, where guided generation remains under-explored. To address this, we introduce TFG-Flow, a novel training-free guidance method for multimodal generative flow. |
Haowei Lin; Shanda Li; Haotian Ye; Yiming Yang; Stefano Ermon; Yitao Liang; Jianzhu Ma; |
420 | EgoSim: Egocentric Exploration in Virtual Worlds with Multi-modal Conditioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent advancements in video diffusion models have established a strong foundation for developing world models with practical applications. The next challenge lies in exploring how an agent can leverage these foundation models to understand, interact with, and plan within observed environments. This requires adding more controllability to the model, transforming it into a versatile game engine capable of dynamic manipulation and control. To address this, we investigated three key conditioning factors: camera, context frame, and text, identifying limitations in current model designs. |
Wei Yu; Songheng Yin; Steve Easterbrook; Animesh Garg; |
421 | Q-SFT: Q-Learning for Language Models Via Supervised Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This setting requires effectively leveraging pretraining, scaling to large architectures with billions of parameters, and training on large datasets, all of which represent major challenges for current value-based RL methods. In this work, we propose a novel offline RL algorithm that addresses these drawbacks, casting Q-learning as a modified supervised fine-tuning (SFT) problem where the probabilities of tokens directly translate to Q-values. |
Joey Hong; Anca Dragan; Sergey Levine; |
422 | Knowledge Entropy Decay During Language Model Pretraining Hinders New Knowledge Acquisition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate how a model’s tendency to broadly integrate its parametric knowledge evolves throughout pretraining, and how this behavior affects overall performance, particularly in terms of knowledge acquisition and forgetting. |
Jiyeon Kim; Hyunji Lee; Hyowon Cho; Joel Jang; Hyeonbin Hwang; Seungpil Won; Youbin Ahn; Dohaeng Lee; Minjoon Seo; |
423 | Stabilized Neural Prediction of Potential Outcomes in Continuous Time Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to estimate CAPOs in continuous time. |
Konstantin Hess; Stefan Feuerriegel; |
424 | PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a versatile image-to-image visual assistant, PixWizard, designed for image generation, manipulation, and translation based on free-form language instructions. |
Weifeng Lin; Xinyu Wei; Renrui Zhang; Le Zhuo; Shitian Zhao; Siyuan Huang; Junlin Xie; Peng Gao; Hongsheng Li; |
425 | Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the Draw-and-Understand framework, exploring how to integrate visual prompting understanding capabilities into Multimodal Large Language Models (MLLMs). |
Weifeng Lin; Xinyu Wei; Ruichuan An; Peng Gao; Bocheng Zou; Yulin Luo; Siyuan Huang; Shanghang Zhang; Hongsheng Li; |
426 | Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a unified approach to online and offline RLHF — value-incentivized preference optimization (VPO) — which regularizes the maximum-likelihood estimate of the reward function with the corresponding value function, modulated by a sign to indicate whether the optimism or pessimism is chosen. |
Shicong Cen; Jincheng Mei; Katayoon Goshvadi; Hanjun Dai; Tong Yang; Sherry Yang; Dale Schuurmans; Yuejie Chi; Bo Dai; |
427 | Uncovering Gaps in How Humans and LLMs Interpret Subjective Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we uncover instances of *misalignment* between LLMs’ actual operational semantics and what humans expect. |
Erik Jones; Arjun Patrawala; Jacob Steinhardt; |
428 | CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present CogVideoX, a large-scale text-to-video generation model based on diffusion transformer, which can generate 10-second continuous videos that align seamlessly with text prompts, with a frame rate of 16 fps and resolution of 768 x 1360 pixels. |
Zhuoyi Yang; Jiayan Teng; Wendi Zheng; Ming Ding; Shiyu Huang; Jiazheng Xu; Yuanming Yang; Wenyi Hong; Xiaohan Zhang; Guanyu Feng; Da Yin; Yuxuan Zhang; Weihan Wang; Yean Cheng; Bin Xu; Xiaotao Gu; Yuxiao Dong; Jie Tang; |
429 | Grounding Multimodal Large Language Model in GUI World Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present an effective GUI grounding framework, which includes an automated data collection engine that gathers extensive GUI screenshots and annotations to ensure broad generalization. |
Weixian Lei; Difei Gao; Mike Zheng Shou; |
430 | 3DitScene: Editing Any Scene Via Language-guided Disentangled Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose 3DitScene, a novel and unified scene editing framework leveraging language-guided disentangled Gaussian Splatting that enables seamless editing from 2D to 3D, allowing precise control over scene composition and individual objects. |
Qihang Zhang; Yinghao Xu; Chaoyang Wang; Hsin-Ying Lee; Gordon Wetzstein; Bolei Zhou; Ceyuan Yang; |
431 | Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To solve the problem, we propose Oryx, a unified multimodal architecture for the spatial-temporal understanding of images, videos, and multi-view 3D scenes. |
Zuyan Liu; Yuhao Dong; Ziwei Liu; Winston Hu; Jiwen Lu; Yongming Rao; |
432 | PuzzleFusion++: Auto-agglomerative 3D Fracture Assembly By Denoise and Verify Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a novel “auto-agglomerative” 3D fracture assembly method, PuzzleFusion++, resembling how humans solve challenging spatial puzzles. |
Zhengqing Wang; Jiacheng Chen; Yasutaka Furukawa; |
433 | Energy-based Backdoor Defense Against Federated Graph Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our work, we propose an effective Federated Graph Backdoor Defense using Topological Graph Energy (FedTGE). |
Guancheng Wan; Zitong Shi; Wenke Huang; Guibin Zhang; Dacheng Tao; Mang Ye; |
434 | Monitoring Latent World States in Language Models with Propositional Probes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We hypothesize that LMs faithfully represent their input contexts in a latent world model, and we seek to extract these latent world states as logical propositions. |
Jiahai Feng; Stuart Russell; Jacob Steinhardt; |
435 | MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs), enabling a broader search scenario, termed universal multimodal retrieval, where multiple modalities and diverse retrieval tasks are accommodated. |
Sheng-Chieh Lin; Chankyu Lee; Mohammad Shoeybi; Jimmy Lin; Bryan Catanzaro; Wei Ping; |
436 | Probabilistic Language-Image Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Probabilistic Language-Image Pre-training (ProLIP), the first probabilistic VLM pre-trained on a billion-scale image-text dataset using only probabilistic objectives, achieving a strong zero-shot capability (e.g., 74.6% ImageNet zero-shot accuracy with ViT-B/16). |
Sanghyuk Chun; Wonjae Kim; Song Park; Sangdoo Yun; |
437 | Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study inference scaling laws (aka test-time scaling laws) and compute-optimal inference, focusing on the trade-offs between model sizes and generating additional tokens with different inference strategies. |
Yangzhen Wu; Zhiqing Sun; Shanda Li; Sean Welleck; Yiming Yang; |
438 | Jailbreaking As A Reward Misspecification Problem Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The widespread adoption of large language models (LLMs) has raised concerns about their safety and reliability, particularly regarding their vulnerability to adversarial attacks. In this paper, we propose a new perspective that attributes this vulnerability to reward misspecification during the alignment process. |
Zhihui Xie; Jiahui Gao; Lei Li; Zhenguo Li; Qi Liu; Lingpeng Kong; |
439 | Scaling Diffusion Language Models Via Adaptation from Autoregressive Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We demonstrate connections between AR and diffusion modeling objectives and introduce a simple continual pre-training approach for training diffusion models. |
Shansan Gong; Shivam Agarwal; Yizhe Zhang; Jiacheng Ye; Lin Zheng; Mukai Li; Chenxin An; Peilin Zhao; Wei Bi; Jiawei Han; Hao Peng; Lingpeng Kong; |
440 | InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current studies primarily focus on natural image restoration, leaving the performance of these algorithms in scientific inverse problems largely unexplored. To address this gap, we introduce InverseBench, a framework that evaluates diffusion models across five distinct scientific inverse problems. |
Hongkai Zheng; Wenda Chu; Bingliang Zhang; Zihui Wu; Austin Wang; Berthy Feng; Caifeng Zou; Yu Sun; Nikola Borislavov Kovachki; Zachary E Ross; Katherine Bouman; Yisong Yue; |
441 | Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, it is essential to explore the environment and collect data to achieve beyond-expert performance. To overcome these challenges, we propose a novel imitation learning algorithm called Imitation Learning with Double Exploration (ILDE), which implements exploration in two aspects: (1) optimistic policy optimization via an exploration bonus that rewards state-action pairs with high uncertainty to potentially improve the convergence to the expert policy, and (2) curiosity-driven exploration of the states that deviate from the demonstration trajectories to potentially yield beyond-expert performance. |
Heyang Zhao; Xingrui Yu; David Mark Bossens; Ivor Tsang; Quanquan Gu; |
442 | CREAM: Consistency Regularized Self-Rewarding Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We then introduce the regularization to this generalized framework to mitigate the overconfident preference labeling in the self-rewarding process. Based on this theoretical insight, we propose a Consistency Regularized sElf-rewarding lAnguage Model (CREAM) that leverages the consistency of rewards across different iterations to regularize the self-rewarding training, helping the model to learn from more reliable preference data. |
Zhaoyang Wang; Weilei He; Zhiyuan Liang; Xuchao Zhang; Chetan Bansal; Ying Wei; Weitong Zhang; Huaxiu Yao; |
443 | Quantifying Generalization Complexity for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While large language models (LLMs) have shown exceptional capabilities in understanding complex queries and performing sophisticated tasks, their generalization abilities are often deeply entangled with memorization, necessitating more precise evaluation. To address this challenge, we introduce Scylla, a dynamic evaluation framework that quantitatively measures the generalization abilities of LLMs. |
Zhenting Qi; Hongyin Luo; Xuliang Huang; Zhuokai Zhao; Yibo Jiang; Xiangjun Fan; Himabindu Lakkaraju; James R. Glass; |
444 | Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1 achieves 94.8% on MATH dataset), indicating their inadequacy for truly challenging these models. To bridge this gap, we propose a comprehensive and challenging benchmark specifically designed to assess LLMs’ mathematical reasoning at the Olympiad level. |
Bofei Gao; Feifan Song; Zhe Yang; Zefan Cai; Yibo Miao; Qingxiu Dong; Lei Li; Chenghao Ma; Liang Chen; Runxin Xu; Zhengyang Tang; Benyou Wang; Daoguang Zan; Shanghaoran Quan; Ge Zhang; Lei Sha; Yichang Zhang; Xuancheng Ren; Tianyu Liu; Baobao Chang; |
445 | OS-ATLAS: Foundation Action Model for Generalist GUI Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Practitioners are often reluctant to use open-source VLMs due to their significant performance lag compared to their closed-source counterparts, particularly in GUI grounding and Out-Of-Distribution (OOD) scenarios. To facilitate future research in this area, we developed OS-Atlas—a foundational GUI action model that excels at GUI grounding and OOD agentic tasks through innovations in both data and modeling. |
Zhiyong Wu; Zhenyu Wu; Fangzhi Xu; Yian Wang; Qiushi Sun; Chengyou Jia; Kanzhi Cheng; Zichen Ding; Liheng Chen; Paul Pu Liang; Yu Qiao; |
446 | Forewarned Is Forearmed: Harnessing LLMs for Data Synthesis Via Failure-induced Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel approach, ReverseGen, designed to automatically generate effective training samples that expose the weaknesses of LLMs. |
Qintong Li; Jiahui Gao; Sheng Wang; Renjie Pi; Xueliang Zhao; Chuan Wu; Xin Jiang; Zhenguo Li; Lingpeng Kong; |
447 | Perplexed By Perplexity: Perplexity-Based Data Pruning With Small Reference Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models. |
Zachary Ankner; Cody Blakeney; Kartik Sreenivasan; Max Marion; Matthew L Leavitt; Mansheej Paul; |
448 | Scaling LLM Test-Time Compute Optimally Can Be More Effective Than Scaling Parameters for Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Enabling LLMs to improve their outputs by using more test-time compute is a critical step towards building self-improving agents that can operate on open-ended natural language. In this paper, we scale up inference-time computation in LLMs, with a focus on answering: if an LLM is allowed to use a fixed but non-trivial amount of inference-time compute, how much can it improve its performance on a challenging prompt? |
Charlie Victor Snell; Jaehoon Lee; Kelvin Xu; Aviral Kumar; |
449 | Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key insight is that, to be effective, the process reward for a step should measure *progress*: a change in the likelihood of producing a correct response in the future, before and after taking the step, as measured under a *prover* policy distinct from the base policy. |
Amrith Setlur; Chirag Nagpal; Adam Fisch; Xinyang Geng; Jacob Eisenstein; Rishabh Agarwal; Alekh Agarwal; Jonathan Berant; Aviral Kumar; |
450 | Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We believe this issue arises from the insufficient precision of knowledge localization. To address this, we propose a Fine-grained Neuron-level Knowledge Editing (FiNE) method that enhances editing locality without affecting overall success rates. |
Haowen Pan; Xiaozhi Wang; Yixin Cao; Zenglin Shi; Xun Yang; Juanzi Li; Meng Wang; |
451 | DOTS: Learning to Reason Dynamically in LLMs Via Optimal Reasoning Trajectories Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose DOTS, an approach enabling LLMs to reason Dynamically via Optimal reasoning Trajectories Search, tailored to the specific characteristics of each question and the inherent capability of the task-solving LLM. |
Murong Yue; Wenlin Yao; Haitao Mi; Dian Yu; Ziyu Yao; Dong Yu; |
452 | Bayesian Optimization of Antibodies Informed By A Generative Model of Evolving Sequences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Clone-informed Bayesian Optimization (CloneBO), a Bayesian optimization procedure that efficiently optimizes antibodies in the lab by teaching a generative model how our immune system optimizes antibodies. |
Alan Nawzad Amin; Nate Gruver; Yilun Kuang; Yucen Lily Li; Hunter Elliott; Calvin McCarter; Aniruddh Raghu; Peyton Greenside; Andrew Gordon Wilson; |
453 | Compute-Optimal LLMs Provably Generalize Better with Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel, fully empirical Freedman-type martingale concentration inequality that tightens existing bounds by accounting for the variance of the loss function. |
Marc Anton Finzi; Sanyam Kapoor; Diego Granziol; Anming Gu; Christopher De Sa; J Zico Kolter; Andrew Gordon Wilson; |
454 | Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, ensuring the safety of LLMs during fine-tuning remains a critical concern, and mitigating the potential conflicts in safety and helpfulness is costly in RLHF. To address this issue, we propose a supervised learning framework called Bi-Factorial Preference Optimization (BFPO), which re-parameterizes a joint RLHF objective of both safety and helpfulness into a single supervised learning objective. |
Wenxuan Zhang; Philip Torr; Mohamed Elhoseiny; Adel Bibi; |
455 | Efficient Diversity-Preserving Diffusion Alignment Via Gradient-Informed GFlowNets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing post-training methods for reward finetuning of diffusion models typically suffer from lack of diversity in generated samples, lack of prior preservation, and/or slow convergence in finetuning. Inspired by recent successes in generative flow networks (GFlowNets), a class of probabilistic models that sample with the unnormalized density of a reward function, we propose a novel GFlowNet method dubbed Nabla-GFlowNet (abbreviated as ∇-GFlowNet), the first GFlowNet method that leverages the rich signal in reward gradients, together with an objective called ∇-DB plus its variant residual ∇-DB designed for prior-preserving diffusion finetuning. |
Zhen Liu; Tim Z. Xiao; Weiyang Liu; Yoshua Bengio; Dinghuai Zhang; |
456 | EmbodiedSAM: Online Segment Any 3D Thing in Real Time Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to leverage Segment Anything Model (SAM) for real-time 3D instance segmentation in an online setting. |
Xiuwei Xu; Huangxing Chen; Linqing Zhao; Ziwei Wang; Jie Zhou; Jiwen Lu; |
457 | HAMSTER: Hierarchical Action Models for Open-World Robot Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we posit that *hierarchical* vision-language-action (VLA) models can be more effective in utilizing off-domain data than standard monolithic VLA models that directly finetune vision-language models (VLMs) to predict actions. |
Yi Li; Yuquan Deng; Jesse Zhang; Joel Jang; Marius Memmel; Caelan Reed Garrett; Fabio Ramos; Dieter Fox; Anqi Li; Abhishek Gupta; Ankit Goyal; |
458 | On The Role of Attention Heads in Large Language Model Safety Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing research tends to overlook the safety impact of multi-head attention mechanisms, despite their crucial role in various model functionalities. Hence, in this paper, we aim to explore the connection between standard attention mechanisms and safety capability to fill this gap in the safety-related mechanistic interpretability. |
Zhenhong Zhou; Haiyang Yu; Xinghua Zhang; Rongwu Xu; Fei Huang; Kun Wang; Yang Liu; Junfeng Fang; Yongbin Li; |
459 | Fine-Tuning Discrete Diffusion Models Via Reward Optimization with Applications to DNA and Protein Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We then formulate the reward maximization problem within discrete diffusion models, analogous to reinforcement learning (RL), while minimizing the KL divergence against pre-trained diffusion models to preserve naturalness. To solve this RL problem, we propose a novel algorithm that enables direct backpropagation of rewards through entire trajectories generated by diffusion models, by making the originally non-differentiable trajectories differentiable using the Gumbel-Softmax trick. |
Chenyu Wang; Masatoshi Uehara; Yichun He; Amy Wang; Avantika Lal; Tommi Jaakkola; Sergey Levine; Aviv Regev; Hanchen; Tommaso Biancalani; |
460 | From Attention to Activation: Unraveling The Enigmas of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that popular large language models, such as Llama, attend maximally to the first token in 98% of attention heads, a behaviour we attribute to the softmax function. To mitigate this issue, we propose a reformulation of softmax to softmax-1. |
Prannay Kaul; Chengcheng Ma; Ismail Elezi; Jiankang Deng; |
461 | PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to address the challenge of hallucinations in Multimodal Large Language Models (MLLMs), particularly for dense image captioning tasks. |
Cong Chen; Mingyu Liu; Chenchen Jing; Yizhou Zhou; Fengyun Rao; Hao Chen; Bo Zhang; Chunhua Shen; |
462 | Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We focus on two research questions: (1) Can LLMs generate reliable preferences among wrong options? And if so, (2) Would alignment with such wrong-over-wrong preferences be helpful? We employ methods based on self-consistency, token probabilities, and LLM-as-a-judge to elicit wrong-over-wrong preferences, and fine-tune language models with preference optimization approaches using these synthesized preferences. |
Jihan Yao; Wenxuan Ding; Shangbin Feng; Lucy Lu Wang; Yulia Tsvetkov; |
463 | No Preference Left Behind: Group Distributional Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These methods often skew toward dominant preferences, overlooking the diversity of opinions, especially when conflicting preferences arise. To address this issue, we propose Group Distributional Preference Optimization (GDPO), a novel framework that aligns language models with the distribution of preferences within a group by incorporating the concept of beliefs that shape individual preferences. |
Binwei Yao; Zefan Cai; Yun-Shiuan Chuang; Shanglin Yang; Ming Jiang; Diyi Yang; Junjie Hu; |
464 | Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we revisit layer tying as a form of parameter sharing in Transformers, and introduce novel methods for converting existing LLMs into smaller Recursive Transformers that share parameters across layers, with minimal loss of performance. |
Sangmin Bae; Adam Fisch; Hrayr Harutyunyan; Ziwei Ji; Seungyeon Kim; Tal Schuster; |
465 | Fast and Accurate Blind Flexible Docking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing docking methods often face limitations: they either overlook crucial structural changes by assuming protein rigidity or suffer from low computational efficiency due to their reliance on generative models for structure sampling. To address these challenges, we propose FABFlex, a fast and accurate regression-based multi-task learning model designed for realistic blind flexible docking scenarios, where proteins exhibit flexibility and binding pocket sites are unknown (blind). |
Zizhuo Zhang; Lijun Wu; Kaiyuan Gao; Jiangchao Yao; Tao Qin; Bo Han; |
466 | A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work tackles the information loss bottleneck of vector-quantization (VQ) autoregressive image generation by introducing a novel model architecture called the 2-Dimensional Autoregression (DnD) Transformer. |
Liang Chen; Sinan Tan; Zefan Cai; Weichu Xie; Haozhe Zhao; Yichi Zhang; Junyang Lin; Jinze Bai; Tianyu Liu; Baobao Chang; |
467 | Emergence of A High-Dimensional Abstraction Phase in Language Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, much remains to be known about this mapping, including how its geometric properties relate to its function. We take a high-level geometric approach to its analysis, observing, across five pre-trained transformer-based LMs and three input datasets, a distinct phase characterized by high intrinsic dimensionality. |
Emily Cheng; Diego Doimo; Corentin Kervadec; Iuri Macocco; Lei Yu; Alessandro Laio; Marco Baroni; |
468 | Kolmogorov-Arnold Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce the Kolmogorov–Arnold Transformer (KAT), a novel architecture that replaces MLP layers with Kolmogorov-Arnold Network (KAN) layers to enhance the expressiveness and performance of the model. |
Xingyi Yang; Xinchao Wang; |
469 | Differentially Private Learners for Heterogeneous Treatment Effects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to estimate the conditional average treatment effect (CATE) from observational data under differential privacy. |
Maresa Schröder; Valentyn Melnychuk; Stefan Feuerriegel; |
470 | Scaling Laws for Downstream Task Performance in Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the scaling behavior in a transfer learning setting, where LLMs are finetuned for machine translation tasks. |
Berivan Isik; Natalia Ponomareva; Hussein Hazimeh; Dimitris Paparas; Sergei Vassilvitskii; Sanmi Koyejo; |
471 | Diffusion Feedback Helps CLIP See Better Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The main reason could be that the image-text pairs used to train CLIP are inherently biased, due to the lack of distinctiveness in the text and diversity in the images. In this work, we present a simple post-training approach for CLIP models, which largely overcomes its visual shortcomings via a self-supervised diffusion process. |
Wenxuan Wang; Quan Sun; Fan Zhang; Yepeng Tang; Jing Liu; Xinlong Wang; |
472 | Persistent Pre-training Poisoning of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work evaluates for the first time whether language models can also be *compromised during pre-training*, with a focus on the persistence of pre-training attacks after models are fine-tuned as helpful and harmless chatbots (i.e., after SFT and DPO). |
Yiming Zhang; Javier Rando; Ivan Evtimov; Jianfeng Chi; Eric Michael Smith; Nicholas Carlini; Florian Tramèr; Daphne Ippolito; |
473 | Human-Aligned Chess With A Bit of Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these systems are *not human-aligned*; they are unable to match the skill levels of all human partners or model human-like behaviors beyond piece movement. In this paper, we introduce Allie, a chess-playing AI designed to bridge the gap between artificial and human intelligence in this classic game. |
Yiming Zhang; Athul Paul Jacob; Vivian Lai; Daniel Fried; Daphne Ippolito; |
474 | Can Reinforcement Learning Solve Asymmetric Combinatorial-Continuous Zero-Sum Games? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we define and study a new practical class of asymmetric games called two-player Asymmetric Combinatorial-Continuous zEro-Sum (ACCES) games, featuring a combinatorial action space for one player and an infinite compact space for the other. |
Yuheng Li; Wang Panpan; Haipeng Chen; |
475 | TopoNets: High Performing Vision and Language Models with Brain-like Topography Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present *TopoLoss*, a new loss function that promotes spatially organized topographic representations in AI models without significantly sacrificing task performance. |
Mayukh Deb; Mainak Deb; Apurva Ratan Murty; |
476 | Planning in Natural Language Improves LLM Search for Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We empirically demonstrate that this lack of diversity can be mitigated by searching over candidate plans for solving a problem in natural language. Based on this insight, we propose PlanSearch, a novel search algorithm which shows strong results across HumanEval+, MBPP+, and LiveCodeBench (a contamination-free benchmark for competitive coding). |
Evan Z Wang; Federico Cassano; Catherine Wu; Yunfeng Bai; William Song; Vaskar Nath; Ziwen Han; Sean M. Hendryx; Summer Yue; Hugh Zhang; |
477 | ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing code generation benchmarks fail to capture the diverse feedback encountered in multi-turn interactions, limiting our ability to evaluate LLMs in these contexts. To address this gap, we present a set of novel benchmarks that explicitly model the quality of feedback provided to code generation LLMs. |
Hojae Han; seung-won hwang; Rajhans Samdani; Yuxiong He; |
478 | Track-On: Transformer-based Online Point Tracking with Memory Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider the problem of long-term point tracking, which requires consistent identification of points across multiple frames in a video, despite changes in appearance, lighting, perspective, and occlusions. |
Görkay Aydemir; Xiongyi Cai; Weidi Xie; Fatma Guney; |
479 | FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present *FasterCache*, a novel training-free strategy designed to accelerate the inference of video diffusion models with high-quality generation. |
Zhengyao Lv; Chenyang Si; Junhao Song; Zhenyu Yang; Yu Qiao; Ziwei Liu; Kwan-Yee K. Wong; |
480 | Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Meissonic, which elevates non-autoregressive text-to-image Masked Image Modeling (MIM) to a level comparable with state-of-the-art diffusion models like SDXL. |
Jinbin Bai; Tian Ye; Wei Chow; Enxin Song; Qing-Guo Chen; Xiangtai Li; Zhen Dong; Lei Zhu; Shuicheng YAN; |
481 | Revisiting Convolution Architecture in The Realm of DNA Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a simple but well-designed CNN-based method, termed ConvNova. |
Yu Bo; Weian Mao; Yanjun Shao; Weiqiang Bai; Peng Ye; Xinzhu Ma; Junbo Zhao; Hao Chen; Chunhua Shen; |
482 | Autoregressive Pretraining with Mamba in Vision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper shows that Mamba’s visual capability can be significantly enhanced through autoregressive pretraining, a direction not previously explored. |
Sucheng Ren; Xianhang Li; Haoqin Tu; Feng Wang; Fangxun Shu; Lei Zhang; Jieru Mei; Linjie Yang; Peng Wang; Heng Wang; Alan Yuille; Cihang Xie; |
483 | In Search of Forgotten Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In search of the forgotten domain generalization, we create large-scale datasets subsampled from LAION—LAION-Natural and LAION-Rendition—that are strictly OOD to corresponding ImageNet and DomainNet test sets in terms of style. |
Prasanna Mayilvahanan; Roland S. Zimmermann; Thaddäus Wiedemer; Evgenia Rusak; Attila Juhos; Matthias Bethge; Wieland Brendel; |
484 | MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Masked Image Modeling Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MIM (Masked Image Modeling)-Refiner, a contrastive learning boost for pre-trained MIM models. |
Benedikt Alkin; Lukas Miklautz; Sepp Hochreiter; Johannes Brandstetter; |
485 | Vision-LSTM: XLSTM As Generic Vision Backbone Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Vision-LSTM (ViL), an adaption of the xLSTM building blocks to computer vision. |
Benedikt Alkin; Maximilian Beck; Korbinian Pöppel; Sepp Hochreiter; Johannes Brandstetter; |
486 | Is In-Context Learning Sufficient for Instruction Following in LLMs? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that, while effective, ICL alignment with URIAL still underperforms compared to instruction fine-tuning on established benchmarks such as MT-Bench and AlpacaEval 2.0 (LC), especially with more capable base LLMs. |
Hao Zhao; Maksym Andriushchenko; Francesco Croce; Nicolas Flammarion; |
487 | INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we construct an evaluation suite of 197,243 QA pairs from local exam sources to measure the capabilities of multilingual LLMs in a variety of regional contexts. |
Angelika Romanou; Negar Foroutan; Anna Sotnikova; Sree Harsha Nelaturu; Shivalika Singh; Rishabh Maheshwary; Micol Altomare; Zeming Chen; Mohamed A. Haggag; Snegha A; Alfonso Amayuelas; Azril Hafizi Amirudin; Danylo Boiko; Michael Chang; Jenny Chim; Gal Cohen; Aditya Kumar Dalmia; Abraham Diress; Sharad Duwal; Daniil Dzenhaliou; Daniel Fernando Erazo Florez; Fabian Farestam; Joseph Marvin Imperial; Shayekh Bin Islam; Perttu Isotalo; Maral Jabbarishiviari; Börje F. Karlsson; Eldar Khalilov; Christopher Klamm; Fajri Koto; Dominik Krzemiński; Gabriel Adriano de Melo; Syrielle Montariol; Yiyang Nan; Joel Niklaus; Jekaterina Novikova; Johan Samir Obando Ceron; Debjit Paul; Esther Ploeger; Jebish Purbey; Swati Rajwal; Selvan Sunitha Ravi; Sara Rydell; Roshan Santhosh; Drishti Sharma; Marjana Prifti Skenduli; Arshia Soltani Moakhar; Bardia soltani moakhar; Ayush Kumar Tarun; Azmine Toushik Wasi; Thenuka Ovin Weerasinghe; Serhan Yilmaz; Mike Zhang; Imanol Schlag; Marzieh Fadaee; Sara Hooker; Antoine Bosselut; |
488 | Adaptive Gradient Clipping for Robust Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing static clipping strategies yield inconsistent results: enhancing robustness against some attacks while being ineffective or even detrimental against others. To address this limitation, we propose a principled adaptive clipping strategy, Adaptive Robust Clipping (ARC), which dynamically adjusts clipping thresholds based on the input gradients. |
Youssef Allouah; Rachid Guerraoui; Nirupam Gupta; Ahmed Jellouli; Geovani Rizk; John Stephan; |
489 | HART: Efficient Visual Generation with Hybrid Autoregressive Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Hybrid Autoregressive Transformer (HART), the first autoregressive (AR) visual generation model capable of directly generating 1024×1024 images, rivaling diffusion models in image generation quality. |
Haotian Tang; Yecheng Wu; Shang Yang; Enze Xie; Junsong Chen; Junyu Chen; Zhuoyang Zhang; Han Cai; Yao Lu; Song Han; |
490 | 3DGS-Drag: Dragging Gaussians for Intuitive Point-Based 3D Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce 3DGS-Drag, a point-based 3D editing framework that provides efficient, intuitive drag manipulation of real 3D scenes. |
Jiahua Dong; Yu-Xiong Wang; |
491 | LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel hierarchical autoencoder that maps 3D models into a highly compressed latent space. |
Biao Zhang; Peter Wonka; |
492 | Measuring and Enhancing Trustworthiness of LLMs in RAG Through Grounded Attributions and Learning to Refuse Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Consequently, we propose Trust-Align, a method to align LLMs for improved Trust-Score performance. |
Maojia Song; Shang Hong Sim; Rishabh Bhardwaj; Hai Leong Chieu; Navonil Majumder; Soujanya Poria; |
493 | Mixture Compressor for Mixture-of-Experts LLMs Gains More Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by these issues, we investigate the MoE-LLMs and make two key observations: a) different experts exhibit varying behaviors on activation reconstruction error, routing scores, and activated frequencies, highlighting their differing importance, and b) not all tokens are equally important; only a small subset is critical. Building on these insights, we propose MC, a training-free Mixture-Compressor for MoE-LLMs, which leverages the significance of both experts and tokens to achieve extreme compression. |
Wei Huang; Yue Liao; Jianhui Liu; Ruifei He; Haoru Tan; Shiming Zhang; Hongsheng Li; Si Liu; XIAOJUAN QI; |
494 | NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While prior efforts focus on 3D diffusion models for their benefits in modeling continuous 3D conformers, they overlook the advantages of 1D SELFIES-based Language Models (LMs), which can generate 100% valid molecules and leverage the billion-scale 1D molecule datasets. To combine these advantages for 3D molecule generation, we propose a foundation model — NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation. |
Zhiyuan Liu; Yanchen Luo; Han Huang; Enzhi Zhang; Sihang Li; Junfeng Fang; Yaorui Shi; Xiang Wang; Kenji Kawaguchi; Tat-Seng Chua; |
495 | SVDQuant: Absorbing Outliers By Low-Rank Component for 4-Bit Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to accelerate diffusion models by quantizing their weights and activations to 4 bits. |
Muyang Li; Yujun Lin; Zhekai Zhang; Tianle Cai; Xiuyu Li; Junxian Guo; Enze Xie; Chenlin Meng; Jun-Yan Zhu; Song Han; |
496 | InstaRevive: One-Step Image Enhancement Via Dynamic Score Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we propose InstaRevive, a straightforward yet powerful image enhancement framework that employs score-based diffusion distillation to harness potent generative capability and minimize the sampling steps. |
Yixuan Zhu; Haolin Wang; Ao Li; Wenliang Zhao; Yansong Tang; Jingxuan Niu; Lei Chen; Jie Zhou; Jiwen Lu; |
497 | Diffusing to The Top: Boost Graph Neural Networks with Minimal Hyperparameter Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work introduces a graph-conditioned latent diffusion framework (GNN-Diff) to generate high-performing GNNs based on the model checkpoints of sub-optimal hyperparameters selected by a light-tuning coarse search. |
Lequan Lin; Dai Shi; Andi Han; Zhiyong Wang; Junbin Gao; |
498 | CodePlan: Unlocking Reasoning Potential in Large Language Models By Scaling Code-form Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the limitation, we introduce CodePlan, a scalable paradigm that empowers LLMs to generate and follow code-form plans—pseudocode that outlines high-level, structured reasoning processes. To train CodePlan, we construct a large-scale dataset of 2M examples that integrate code-form plans with standard prompt-response pairs from existing corpora. |
Jiaxin Wen; Jian Guan; Hongning Wang; Wei Wu; Minlie Huang; |
499 | Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Utilizing Distributionally Robust Optimization (DRO), we enhance DPO’s resilience to these types of noise. Our theoretical insights reveal that DPO inherently embeds DRO principles, conferring robustness to pointwise noise, with the regularization coefficient β playing a critical role in its noise resistance. |
Junkang Wu; Yuexiang Xie; Zhengyi Yang; Jiancan Wu; Jiawei Chen; Jinyang Gao; Bolin Ding; Xiang Wang; Xiangnan He; |
500 | Non-myopic Generation of Language Models for Reasoning and Planning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper revisits LLM reasoning from an optimal control perspective, proposing a novel method, Predictive-Decoding, that leverages Model Predictive Control to enhance planning accuracy. |
Chang Ma; Haiteng Zhao; Junlei Zhang; Junxian He; Lingpeng Kong; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~3,700 papers), please visit Paper Digest: ICLR-2025 (Full List).