Paper Digest: ACL 2025 Papers & Highlights
Note: ACL-2025 (long, short and industry tracks) accepted more than 1,800 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested users can choose to read all ~1,800 ACL-2025 papers on a separate page, which takes quite some time to load.
To search for papers presented at ACL-2025 on a specific topic, please use the search by venue (ACL-2025) service. To summarize the latest research published at ACL-2025 on a specific topic, you can use the review by venue (ACL-2025) service. If you are interested in browsing papers by author, we have a comprehensive list of ~8,500 authors (ACL-2025). Using this year’s data, our system also generates a report on recent natural language processing topics. Additionally, you may want to explore our “Best Paper” Digest (ACL), which lists the most influential ACL papers since 1981.
We’ve developed a service, ACL-2025 Research, that synthesizes the latest findings from ACL 2025 into comprehensive reports. We encourage interested users to use it to create tailored reports on other emerging topics.
This curated list is created by the Paper Digest Team. Paper Digest is an AI-powered research platform that delivers personalized, comprehensive daily digests of the latest research in your field. It also helps you read articles, write articles, get answers, conduct literature reviews, and generate research reports.
Experience the full potential of our services today!
TABLE 1: Paper Digest: ACL 2025 Papers & Highlights
| Paper | Author(s) | |
|---|---|---|
| 1 | Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce ModernBERT, bringing modern model optimizations to encoder-only models and representing a major Pareto improvement over older encoders. |
Benjamin Warner; Antoine Chaffin; Benjamin Clavié; Orion Weller; Oskar Hallström; Said Taghadouini; Alexis Gallagher; Raja Biswas; Faisal Ladhak; Tom Aarsen; Griffin Thomas Adams; Jeremy Howard; Iacopo Poli; |
| 2 | Demons in The Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Under this strict constraint, even tokens from a domain-specific sequence (e. g. , code) are uniformly routed to all experts, thereby inhibiting expert specialization. In this work, we propose calculating LBL using a global-batch to loose this constraint. |
Zihan Qiu; Zeyu Huang; Bo Zheng; Kaiyue Wen; Zekun Wang; Rui Men; Ivan Titov; Dayiheng Liu; Jingren Zhou; Junyang Lin; |
| 3 | ProcessBench: Identifying Process Errors in Mathematical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce ProcessBench for measuring the ability to identify erroneous steps in mathematical reasoning. |
Chujie Zheng; Zhenru Zhang; Beichen Zhang; Runji Lin; Keming Lu; Bowen Yu; Dayiheng Liu; Jingren Zhou; Junyang Lin; |
| 4 | InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a simple and scalable training method called InSerter, which stands for Interleaved Speech-Text Representation Pre-training. |
Dingdong Wang; Jin Xu; Ruihang Chu; Zhifang Guo; Xiong Wang; Jincenzi Wu; Dongchao Yang; Shengpeng Ji; Junyang Lin; |
| 5 | Qwen2.5-xCoder: Multi-Agent Collaboration for Multilingual Code Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge the gap among different programming languages, we introduce a novel multi-agent collaboration framework to enhance multilingual instruction tuning for code LLMs, where multiple language-specific intelligent agent components with generation memory work together to transfer knowledge from one language to another efficiently and effectively. |
Jian Yang; Wei Zhang; Yibo Miao; Shanghaoran Quan; Zhenhe Wu; Qiyao Peng; Liqun Yang; Tianyu Liu; Zeyu Cui; Binyuan Hui; Junyang Lin; |
| 6 | Analyzing and Mitigating Inconsistency in Discrete Speech Tokens for Neural Codec Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we quantitatively analyze the DRI phenomenon within popular audio tokenizers such as EnCodec. |
Wenrui Liu; Zhifang Guo; Jin Xu; Yuanjun Lv; Yunfei Chu; Zemin Liu; Junyang Lin; |
| 7 | BIG-Bench Extra Hard Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: State-of-the-art models achieve near-perfect scores on many tasks in BBH, thus diminishing its utility. To address this limitation, we introduce BIG-Bench Extra Hard (BBEH), a new benchmark designed to push the boundaries of LLM reasoning evaluation. |
Mehran Kazemi; Bahare Fatemi; Hritik Bansal; John Palowitch; Chrysovalantis Anastasiou; Sanket Vaibhav Mehta; Lalit K Jain; Virginia Aglietti; Disha Jindal; Peter Chen; Nishanth Dikkala; Gladys Tyen; Xin Liu; Uri Shalit; Silvia Chiappa; Kate Olszewska; Yi Tay; Vinh Q. Tran; Quoc V Le; Orhan Firat; |
| 8 | Towards Effective Extraction and Evaluation of Factual Claims Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the lack of a standardized evaluation framework impedes assessment and comparison of claim extraction methods. To address this gap, we propose a framework for evaluating claim extraction in the context of fact-checking along with automated, scalable, and replicable methods for applying this framework, including novel approaches for measuring coverage and decontextualization. |
Dasha Metropolitansky; Jonathan Larson; |
| 9 | MPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this comes at the cost of generating thousands of visual tokens for a single document image, leading to excessive GPU memory and slower inference times, particularly in multi-page document comprehension. In this work, to address these challenges, we propose a High-resolution DocCompressor module to compress each high-resolution document image into 324 tokens, guided by low-resolution global visual features. |
Anwen Hu; Haiyang Xu; Liang Zhang; Jiabo Ye; Ming Yan; Ji Zhang; Qin Jin; Fei Huang; Jingren Zhou; |
| 10 | Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, translation introduces language bias and carries over cultural and regional assumptions from the original questions – often testing knowledge irrelevant to the target audience. In this work, we highlight the extent and impact of these biases and present a multilingual evaluation framework that aims to mitigate them through improved translations and annotation practices. |
Shivalika Singh; Angelika Romanou; Clémentine Fourrier; David Ifeoluwa Adelani; Jian Gang Ngui; Daniel Vila-Suero; Peerat Limkonchotiwat; Kelly Marchisio; Wei Qi Leong; Yosephine Susanto; Raymond Ng; Shayne Longpre; Sebastian Ruder; Wei-Yin Ko; Antoine Bosselut; Alice Oh; Andre Martins; Leshem Choshen; Daphne Ippolito; Enzo Ferrante; Marzieh Fadaee; Beyza Ermis; Sara Hooker; |
| 11 | Improve Vision Language Model Chain-of-thought Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that training VLM on short answers leads to poor generalization on reasoning tasks that require more detailed explanations. |
Ruohong Zhang; Bowen Zhang; Yanghao Li; Haotian Zhang; Zhiqing Sun; Zhe Gan; Yinfei Yang; Ruoming Pang; Yiming Yang; |
| 12 | Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present NSA, a Natively trained Sparse Attention mechanism that integrates algorithmic innovations with hardware-aligned optimizations to achieve efficient long-context modeling. |
Jingyang Yuan; Huazuo Gao; Damai Dai; Junyu Luo; Liang Zhao; Zhengyan Zhang; Zhenda Xie; Yuxing Wei; Lean Wang; Zhiping Xiao; Yuqing Wang; Chong Ruan; Ming Zhang; Wenfeng Liang; Wangding Zeng; |
| 13 | ACECODER: Acing Coder RL Via Automated Test-Case Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most progress in recent coder models has been driven by supervised fine-tuning (SFT), while the potential of reinforcement learning (RL) remains largely unexplored, primarily due to the lack of reliable reward data/model in the code domain. In this paper, we address this challenge by leveraging automated large-scale test-case synthesis to enhance code model training. |
Huaye Zeng; Dongfu Jiang; Haozhe Wang; Ping Nie; Xiaotong Chen; Wenhu Chen; |
| 14 | KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the overall superiority of the Decoder architecture, the gradually increasing Key-Value (KV) cache during inference has emerged as a primary efficiency bottleneck, both in aspects of memory consumption and data transfer bandwidth limitations. To address these challenges, we propose a paradigm called KV-Latent. |
Shi Luohe; Zuchao Li; Lefei Zhang; Baoyuan Qi; Liu Guoming; Hai Zhao; |
| 15 | Evaluating Language Models As Synthetic Data Generators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While prior works have focused on developing effective data generation methods, they lack systematic comparison of different LMs as data generators in a unified setting. To address this gap, we propose AgoraBench, a benchmark that provides standardized settings and metrics to evaluate LMs’ data generation abilities. |
Seungone Kim; Juyoung Suk; Xiang Yue; Vijay Viswanathan; Seongyun Lee; Yizhong Wang; Kiril Gashteovski; Carolin Lawrence; Sean Welleck; Graham Neubig; |
| 16 | MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces MMMU-Pro, a robust version of the Massive Multi-discipline Multimodal Understanding and Reasoning (MMMU) benchmark. |
Xiang Yue; Tianyu Zheng; Yuansheng Ni; Yubo Wang; Kai Zhang; Shengbang Tong; Yuxuan Sun; Botao Yu; Ge Zhang; Huan Sun; Yu Su; Wenhu Chen; Graham Neubig; |
| 17 | How to Train Long-Context Language Models (Effectively) Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study continued training and supervised fine-tuning (SFT) of a language model (LM) to make effective use of long-context information. |
Tianyu Gao; Alexander Wettig; Howard Yen; Danqi Chen; |
| 18 | Model Extrapolation Expedites Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the observation that alignment training typically involves only small parameter changes without injecting new knowledge into models, we propose a straightforward method called ExPO (model extrapolation) to expedite LLMs’ alignment with human preferences. |
Chujie Zheng; Ziqi Wang; Heng Ji; Minlie Huang; Nanyun Peng; |
| 19 | RPO: Retrieval Preference Optimization for Robust Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce the **R**etrieval **P**reference **O**ptimization (RPO), a lightweight and effective alignment method to adaptively leverage multi-source knowledge based on retrieval relevance. |
Shi-Qi Yan; Quan Liu; Zhen-Hua Ling; |
| 20 | CoT-Valve: Length-Compressible Chain-of-Thought Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new tuning and inference strategy named CoT-Valve, designed to allow models to generate reasoning chains of varying lengths. |
Xinyin Ma; Guangnian Wan; Runpeng Yu; Gongfan Fang; Xinchao Wang; |
| 21 | HalluLens: LLM Hallucination Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a comprehensive hallucination benchmark HalluLens, incorporating both extrinsic and intrinsic evaluation tasks, built upon a clear taxonomy of hallucination. |
Yejin Bang; Ziwei Ji; Alan Schelten; Anthony Hartshorn; Tara Fowler; Cheng Zhang; Nicola Cancedda; Pascale Fung; |
| 22 | LocAgent: Graph-Guided LLM Agents for Code Localization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce LocAgent, a framework that addresses code localization through a graph-guided agent. |
Zhaoling Chen; Robert Tang; Gangda Deng; Fang Wu; Jialong Wu; Zhiwei Jiang; Viktor Prasanna; Arman Cohan; Xingyao Wang; |
| 23 | Cramming 1568 Tokens Into A Single Vector and Back Again: Exploring The Limits of Embedding Space Capacity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore the limits of compression by replacing the encoder with a per-sample optimization procedure. |
Yuri Kuratov; Mikhail Arkhipov; Aydar Bulatov; Mikhail Burtsev; |
| 24 | Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Dynamic Block-Sparse Attention, an optimized method for retrieval-based many-shot in-context learning. |
Emily Xiao; Chin-Jou Li; Yilin Zhang; Graham Neubig; Amanda Bertsch; |
| 25 | MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These datasets target simplistic tasks, and only provide phrase-level answers without any intermediate rationales. To address these challenges, we introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales designed to elicit CoT reasoning. |
Jiawei Guo; Tianyu Zheng; Yizhi Li; Yuelin Bai; Bo Li; Yubo Wang; King Zhu; Graham Neubig; Wenhu Chen; Xiang Yue; |
| 26 | Binary Classifier Optimization for Large Language Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Binary Classifier Optimization (BCO), a technique that effectively aligns LLMs using only binary feedback. |
Seungjae Jung; Gunsoo Han; Daniel Wontae Nam; Kyoung-Woon On; |
| 27 | TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce TheoremExplainAgent, an agentic approach for generating long-form theorem explanation videos (over 5 minutes) using Manim animations. |
Max Ku; Cheuk Hei Chong; Jonathan Leung; Krish Shah; Alvin Yu; Wenhu Chen; |
| 28 | M³GQA: A Multi-Entity Multi-Hop Multi-Setting Graph Question Answering Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to construct diverse data with semantically correct ground-truth reasoning paths, we introduce a novel reasoning-driven four-step data construction method, including tree sampling, reasoning path backtracking, query creation, and multi-stage refinement and filtering. |
Boci Peng; Yongchao Liu; Xiaohe Bo; Jiaxin Guo; Yun Zhu; Xuanbo Fan; Chuntao Hong; Yan Zhang; |
| 29 | LongBench V2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces LongBench v2, a benchmark designed to assess the ability of LLMs to handle long-context problems requiring deep understanding and reasoning across real-world multitasks. |
Yushi Bai; Shangqing Tu; Jiajie Zhang; Hao Peng; Xiaozhi Wang; Xin Lv; Shulin Cao; Jiazheng Xu; Lei Hou; Yuxiao Dong; Jie Tang; Juanzi Li; |
| 30 | F5-TTS: A Fairytaler That Fakes Fluent and Faithful Speech with Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces F5-TTS, a fully non-autoregressive text-to-speech system based on flow matching with Diffusion Transformer (DiT). |
Yushen Chen; Zhikang Niu; Ziyang Ma; Keqi Deng; Chunhui Wang; JianZhao JianZhao; Kai Yu; Xie Chen; |
| 31 | EffiVLM-BENCH: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we systematically evaluate mainstream acceleration techniques for LVLMs, categorized into token and parameter compression. |
Zekun Wang; MingHua Ma; Zexin Wang; Rongchuan Mu; Liping Shan; Ming Liu; Bing Qin; |
| 32 | AgentGym: Evaluating and Training Large Language Model-based Agents Across Diverse Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the community lacks a unified interactive framework that covers diverse environments for comprehensive evaluation of agents, and enables exploration and learning for their self-improvement. To address this, we propose AgentGym, a framework featuring 7 real-world scenarios, 14 environments, and 89 tasks for unified, real-time, and concurrent agent interaction. |
Zhiheng Xi; Yiwen Ding; Wenxiang Chen; Boyang Hong; Honglin Guo; Junzhe Wang; Xin Guo; Dingwen Yang; Chenyang Liao; Wei He; Songyang Gao; Lu Chen; Rui Zheng; Yicheng Zou; Tao Gui; Qi Zhang; Xipeng Qiu; Xuanjing Huang; Zuxuan Wu; Yu-Gang Jiang; |
| 33 | HateDay: Insights from A Global Hate Speech Dataset Representative of A Day on Twitter Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce HateDay, the first global hate speech dataset representative of social media settings, constructed from a random sample of all tweets posted on September 21, 2022 and covering eight languages and four English-speaking countries. |
Manuel Tonneau; Diyi Liu; Niyati Malhotra; Scott A. Hale; Samuel Fraiberger; Victor Orozco-Olvera; Paul Röttger; |
| 34 | RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios by generating high-quality documents, questions, answers, and references through a schema-based pipeline. |
Kunlun Zhu; Yifan Luo; Dingling Xu; Yukun Yan; Zhenghao Liu; Shi Yu; Ruobing Wang; Shuo Wang; Yishan Li; Nan Zhang; Xu Han; Zhiyuan Liu; Maosong Sun; |
| 35 | Extending LLM Context Window with Adaptive Grouped Positional Encoding: A Training-Free Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose **Ada**ptive **Gro**uped **P**ositional **E**ncoding (AdaGroPE), a training-free, plug-and-play method to enhance long-context understanding in existing LLMs. |
Xinhao Xu; Jiaxin Li; Hui Chen; Zijia Lin; Jungong Han; Guiguang Ding; |
| 36 | GUICourse: From General Vision Language Model to Versatile GUI Agent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These limitations hinder their effectiveness as practical GUI agents. To address these challenges, we introduce GUICourse, a series of datasets for training visual-based GUI agents using general VLMs. |
Wentong Chen; Junbo Cui; Jinyi Hu; Yujia Qin; Junjie Fang; Yue Zhao; Chongyi Wang; Jun Liu; Guirong Chen; Yupeng Huo; Yuan Yao; Yankai Lin; Zhiyuan Liu; Maosong Sun; |
| 37 | AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose AGrail, a lifelong agent guardrail to enhance LLM agent safety, which features adaptive safety check generation, effective safety check optimization, and tool compatibility & flexibility. |
Weidi Luo; Shenghong Dai; Xiaogeng Liu; Suman Banerjee; Huan Sun; Muhao Chen; Chaowei Xiao; |
| 38 | People Who Frequently Use ChatGPT for Writing Tasks Are Accurate and Robust Detectors of AI-generated Text Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study how well humans can detect text generated by commercial LLMs (GPT-4o, Claude, o1). |
Jenna Russell; Marzena Karpinska; Mohit Iyyer; |
| 39 | Autoregressive Speech Synthesis Without Vector Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present MELLE, a novel continuous-valued token based language modeling approach for text-to-speech synthesis (TTS). |
Lingwei Meng; Long Zhou; Shujie Liu; Sanyuan Chen; Bing Han; Shujie Hu; Yanqing Liu; Jinyu Li; Sheng Zhao; Xixin Wu; Helen M. Meng; Furu Wei; |
| 40 | Rethinking Semantic Parsing for Large Language Models: Enhancing LLM Performance with Semantic Hints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, our empirical findings reveal that, unlike smaller models, directly adding semantic parsing results into LLMs reduces their performance. To overcome this, we propose SENSE, a novel prompting approach that embeds semantic hints within the prompt. |
Kaikai An; Shuzheng Si; Helan Hu; Haozhe Zhao; Yuchi Wang; Qingyan Guo; Baobao Chang; |
| 41 | Time-MQA: Time Series Multi-Task Question Answering with Context Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing methods and datasets remain focused on a narrow spectrum of tasks, such as forecasting or anomaly detection. To bridge this gap, we introduce Time Series Multi-Task Question Answering (Time-MQA), a unified framework that enables natural language queries across multiple time series tasks – numerical analytical tasks and open-ended question answering with reasoning. |
Yaxuan Kong; Yiyuan Yang; Yoontae Hwang; Wenjie Du; Stefan Zohren; Zhangyang Wang; Ming Jin; Qingsong Wen; |
| 42 | Multilingual Arbitration: Optimizing Data Pools to Accelerate Multilingual Progress Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose multilingual arbitration, which exploits performance variations among multiple models for each language. |
Ayomide Odumakinde; Daniel D’souza; Pat Verga; Beyza Ermis; Sara Hooker; |
| 43 | Disentangling Memory and Reasoning Ability in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel language model inference paradigm that decomposes the complex inference process into two distinct and clear actions: (1) memory recall: which retrieves relevant knowledge in LLM, and (2) reasoning: which performs reasoning steps based on the recalled knowledge. |
Mingyu Jin; Weidi Luo; Sitao Cheng; Xinyi Wang; Wenyue Hua; Ruixiang Tang; William Yang Wang; Yongfeng Zhang; |
| 44 | AntiLeakBench: Preventing Data Contamination By Automatically Constructing Benchmarks with Updated Real-World Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they fail to guarantee contamination-free evaluation as the newly collected data may contain pre-existing knowledge, and their benchmark updates rely on intensive human labor. To address these issues, we in this paper propose AntiLeak-Bench, an automated anti-leakage benchmarking framework. |
Xiaobao Wu; Liangming Pan; Yuxi Xie; Ruiwen Zhou; Shuai Zhao; Yubo Ma; Mingzhe Du; Rui Mao; Anh Tuan Luu; William Yang Wang; |
| 45 | Understanding Common Ground Misalignment in Goal-Oriented Dialog: A Case-Study with Ubuntu Chat Logs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study failures of grounding in the Ubuntu IRC dataset, where participants use text-only communication to resolve technical issues. |
Rupak Sarkar; Neha Srikanth; Taylor Pellegrin; Rachel Rudinger; Claire Bonial; Philip Resnik; |
| 46 | Towards Context-Robust LLMs: A Gated Representation Fine-tuning Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, context-robust LLMs should rely on external context only when lacking internal knowledge, identify contradictions between internal and external knowledge, and disregard unhelpful contexts. To achieve this goal, we introduce Grft, a lightweight and plug-and-play gated representation fine-tuning approach. |
Shenglai Zeng; Pengfei He; Kai Guo; Tianqi Zheng; Hanqing Lu; Yue Xing; Hui Liu; |
| 47 | Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models Via A Multi-Paradigm Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Chain-of-Reasoning (CoR), a novel unified framework that integrates multiple reasoning paradigms — Natural Language Reasoning (NLR), Algorithmic Reasoning (AR), and Symbolic Reasoning (SR) — to enable synergistic collaboration. |
Yiyao Yu; Yuxiang Zhang; Dongdong Zhang; Xiao Liang; Hengyuan Zhang; Xingxing Zhang; Mahmoud Khademi; Hany Hassan Awadalla; Junjie Wang; Yujiu Yang; Furu Wei; |
| 48 | Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce **MCLM**, a multilingual math benchmark featuring competition-level problems in 55 languages. |
Guijin Son; Jiwoo Hong; Hyunwoo Ko; James Thorne; |
| 49 | Deliberate Reasoning in Language Models As Structure-Aware Planning with An Accurate World Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel reasoning framework, referred to as Structure-aware Planning with an Accurate World Model (SWAP), that integrates structured knowledge representation with learned planning. |
Siheng Xiong; Ali Payani; Yuan Yang; Faramarz Fekri; |
| 50 | DeAL: Decoding-time Alignment for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: First, the inability to incorporate multiple, custom rewards and reliance on a model developer�s view of universal and static principles are key limitations. Second, the reliability of such approaches is also questionable (e. g. susceptibility to jailbreaking even after safety training). To address these issues, we propose DeAL, a framework that allows the user to customize reward functions and enables Decoding-time Alignment of LLMs (DeAL). |
James Y. Huang; Sailik Sengupta; Daniele Bonadiman; Yi-An Lai; Arshit Gupta; Nikolaos Pappas; Saab Mansour; Katrin Kirchhoff; Dan Roth; |
| 51 | MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the open-source nature of these benchmarks and the broad sources of training data for LLMs have inevitably led to benchmark contamination, resulting in unreliable evaluation. To alleviate this issue, we propose the contamination-free MCQ benchmark called MMLU-CF, which reassesses LLMs’ understanding of world knowledge by averting both unintentional and malicious data contamination. |
Qihao Zhao; Yangyu Huang; Tengchao Lv; Lei Cui; Qinzheng Sun; Shaoguang Mao; Xin Zhang; Ying Xin; Qiufeng Yin; Scarlett Li; Furu Wei; |
| 52 | Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce **ReDial** (**Re**asoning with **Dial**ect Queries), a benchmark containing 1. |
Fangru Lin; Shaoguang Mao; Emanuele La Malfa; Valentin Hofmann; Adrian de Wynter; Xun Wang; Si-Qing Chen; Michael J. Wooldridge; Janet B. Pierrehumbert; Furu Wei; |
| 53 | OpenWebVoyager: Building Multimodal Web Agents Via Iterative Real-World Exploration, Feedback and Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce an innovative multimodal web agent that can autonomously conduct real-world exploration and improve itself. |
Hongliang He; Wenlin Yao; Kaixin Ma; Wenhao Yu; Hongming Zhang; Tianqing Fang; Zhenzhong Lan; Dong Yu; |
| 54 | ProxAnn: Use-Oriented Evaluations of Topic Models and Document Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We design a scalable human evaluation protocol and a corresponding automated approximation that reflect practitioners� real-world usage of models. |
Alexander Miserlis Hoyle; Lorena Calvo-Bartolomé; Jordan Lee Boyd-Graber; Philip Resnik; |
| 55 | AndroidGen: Building An Android Language Agent Under Data Scarcity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the other hand, existing LLMs exhibit inadequate completion rates and need a robust data filtration strategy. Given these challenges, we develop a framework called AndroidGen to enhance the capabilities of LLM-based agents under data scarcity. |
Hanyu Lai; Junjie Gao; Xiao Liu; Yifan Xu; Shudan Zhang; Yuxiao Dong; Jie Tang; |
| 56 | A Survey of Post-Training Scaling in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a comprehensive survey of post-training scaling, an emergent paradigm aiming to relieve the limitations of traditional pre-training by focusing on the alignment phase, which traditionally accounts for a minor fraction of the total training computation. |
Hanyu Lai; Xiao Liu; Junjie Gao; Jiale Cheng; Zehan Qi; Yifan Xu; Shuntian Yao; Dan Zhang; Jinhua Du; Zhenyu Hou; Xin Lv; Minlie Huang; Yuxiao Dong; Jie Tang; |
| 57 | Agentic Knowledgeable Self-awareness Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we propose KnowSelf, a data-centric approach that applies agents with knowledgeable self-awareness like humans. |
Shuofei Qiao; Zhisong Qiu; Baochang Ren; Xiaobin Wang; Xiangyuan Ru; Ningyu Zhang; Xiang Chen; Yong Jiang; Pengjun Xie; Fei Huang; Huajun Chen; |
| 58 | HelpSteer3: Human-Annotated Feedback and Edit Data to Empower Inference-Time Scaling in Open-Ended General-Domain Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we collect HelpSteer3 data to train dedicated Feedback and Edit Models that are capable of performing inference-time scaling for open-ended general-domain tasks. |
Zhilin Wang; Jiaqi Zeng; Olivier Delalleau; Daniel Egert; Ellie Evans; Hoo-Chang Shin; Felipe Soares; Yi Dong; Oleksii Kuchaiev; |
| 59 | CulturalBench: A Robust, Diverse and Challenging Benchmark for Measuring LMs’ Cultural Knowledge Through Human-AI Red-Teaming. Highlight: We introduce CulturalBench: a set of 1,696 human-written and human-verified questions to assess LMs’ cultural knowledge, covering 45 global regions including underrepresented ones like Bangladesh, Zimbabwe, and Peru. |
Yu Ying Chiu; Liwei Jiang; Bill Yuchen Lin; Chan Young Park; Shuyue Stella Li; Sahithya Ravi; Mehar Bhatia; Maria Antoniak; Yulia Tsvetkov; Vered Shwartz; Yejin Choi; |
| 60 | ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation. Highlight: Although existing open-source MLLMs have achieved success in chart understanding tasks, they still face two major challenges when applied to chart-to-code tasks: (1) Low executability and poor restoration of chart details in the generated code and (2) Lack of large-scale and diverse training data. To address these challenges, we propose ChartCoder, the first dedicated chart-to-code MLLM, which leverages Code LLMs as the language backbone to enhance the executability of the generated code. |
Xuanle Zhao; Xianzhen Luo; Qi Shi; Chi Chen; Shuo Wang; Zhiyuan Liu; Maosong Sun; |
| 61 | LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis. Highlight: In this paper, we introduce LLaMA-Omni 2, a series of speech language models (SpeechLMs) ranging from 0. |
Qingkai Fang; Yan Zhou; Shoutao Guo; Shaolei Zhang; Yang Feng; |
| 62 | Enhancing Multimodal Continual Instruction Tuning with BranchLoRA. Highlight: In this paper, we identify a critical parameter inefficiency in the MoELoRA framework within the MCIT context. |
Duzhen Zhang; Yong Ren; Zhong-Zhi Li; Yahan Yu; Jiahua Dong; Chenxing Li; Zhilong Ji; Jinfeng Bai; |
| 63 | UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench. Highlight: Building on UTGenerator, we propose UTBoost, a comprehensive framework for test case augmentation. |
Boxi Yu; Yuxuan Zhu; Pinjia He; Daniel Kang; |
| 64 | InductionBench: LLMs Fail in The Simplest Complexity Class. Highlight: Such inductive processes lie at the heart of scientific discovery, as they enable researchers to extract general principles from empirical observations. To assess whether LLMs possess this capacity, we introduce InductionBench, a new benchmark designed to evaluate the inductive reasoning ability of LLMs. |
Wenyue Hua; Tyler Wong; Fei Sun; Liangming Pan; Adam Jardine; William Yang Wang; |
| 65 | ShifCon: Enhancing Non-Dominant Language Capabilities with A Shift-based Multilingual Contrastive Framework. Highlight: To further enhance the performance of non-dominant languages, we propose ShifCon, a Shift-based multilingual Contrastive framework that aligns the internal forward process of other languages toward that of the dominant one. |
Hengyuan Zhang; Chenming Shang; Sizhe Wang; Dongdong Zhang; Yiyao Yu; Feng Yao; Renliang Sun; Yujiu Yang; Furu Wei; |
| 66 | Sticking to The Mean: Detecting Sticky Tokens in Text Embedding Models. Highlight: These tokens, when repeatedly inserted into sentences, pull sentence similarity toward a certain value, disrupting the normal distribution of embedding distances and degrading downstream performance. In this paper, we systematically investigate such anomalous tokens, formally defining them and introducing an efficient detection method, Sticky Token Detector (STD), based on sentence and token filtering. |
Kexin Chen; Dongxia Wang; Yi Liu; Haonan Zhang; Wenhai Wang; |
| 67 | Aligning Large Language Models with Implicit Preferences from User-Generated Content. Highlight: In this work, we present PUGC, a novel framework that leverages implicit human Preferences in unlabeled User-Generated Content (UGC) to generate preference data. |
Zhaoxuan Tan; Zheng Li; Tianyi Liu; Haodong Wang; Hyokun Yun; Ming Zeng; Pei Chen; Zhihan Zhang; Yifan Gao; Ruijie Wang; Priyanka Nigam; Bing Yin; Meng Jiang; |
| 68 | Bitnet.cpp: Efficient Edge Inference for Ternary LLMs. Highlight: Despite this, research and practical applications focusing on efficient edge inference for ternary LLMs remain scarce. To bridge this gap, we introduce Bitnet.cpp. |
Jinheng Wang; Hansong Zhou; Ting Song; Shijie Cao; Yan Xia; Ting Cao; Jianyu Wei; Shuming Ma; Hongyu Wang; Furu Wei; |
| 69 | CaLMQA: Exploring Culturally Specific Long-form Question Answering Across 23 Languages. Highlight: We define culturally specific questions as those that refer to concepts unique to one or a few cultures, or have different answers depending on the cultural or regional context. We obtain these questions by crawling naturally-occurring questions from community web forums in high-resource languages, and by hiring native speakers to write questions in under-resourced, rarely-studied languages such as Fijian and Kirundi. |
Shane Arora; Marzena Karpinska; Hung-Ting Chen; Ipsita Bhattacharjee; Mohit Iyyer; Eunsol Choi; |
| 70 | HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model. Highlight: Inspired by human problem-solving strategies, this paper introduces HiAgent, a framework that leverages subgoals as memory chunks to manage the working memory of LLM-based agents hierarchically. |
Mengkang Hu; Tianxing Chen; Qiguang Chen; Yao Mu; Wenqi Shao; Ping Luo; |
| 71 | Why Prompt Design Matters and Works: A Complexity Analysis of Prompt Search Space in LLMs. Highlight: In this paper, we provide a theoretical framework that explains why some prompts succeed while others fail. |
Xiang Zhang; Juntai Cao; Chenyu You; Dujian Ding; |
| 72 | Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks. Highlight: While chain-of-thought and retrieval-augmented generation help break down problems and retrieve knowledge, they still falter on challenging tasks like competitive programming due to frequent reasoning errors and irrelevant retrieval. To address this, we introduce Critic-guided planning with Retrieval-augmentation, CR-Planner, a novel framework that leverages fine-tuned critic models to guide both reasoning and retrieval processes through planning. |
Xingxuan Li; Weiwen Xu; Ruochen Zhao; Fangkai Jiao; Shafiq Joty; Lidong Bing; |
| 73 | Normalized AOPC: Fixing Misleading Faithfulness Metrics for Feature Attributions Explainability. Highlight: Moreover, AOPC scores are difficult to interpret in isolation without knowing the model-specific lower and upper limits. To address these issues, we propose a normalization approach, Normalized AOPC (NAOPC), enabling consistent cross-model evaluations and more meaningful interpretation of individual scores. |
Joakim Edin; Andreas Geert Motzfeldt; Casper L. Christensen; Tuukka Ruotsalo; Lars Maaløe; Maria Maistro; |
| 74 | Speculative Reward Model Boosts Decision Making Ability of LLMs Cost-Effectively. Highlight: To improve LLM decision-making while maintaining efficiency, we propose the Speculative Reward Model (SRM), a plug-and-play framework that seamlessly integrates with existing search strategies. |
Jiawei Gu; Shangsong Liang; |
| 75 | Demystifying Small Language Models for Edge Deployment. Highlight: This work presents the first comprehensive study of over 60 publicly accessible SLMs, such as Microsoft Phi and Google Gemma. |
Zhenyan Lu; Xiang Li; Dongqi Cai; Rongjie Yi; Fangming Liu; Wei Liu; Jian Luan; Xiwen Zhang; Nicholas D. Lane; Mengwei Xu; |
| 76 | Nemotron-CC: Transforming Common Crawl Into A Refined Long-Horizon Pretraining Dataset. Highlight: In this paper, we show how to achieve better trade-offs between accuracy and data quantity by a combination of classifier ensembling, synthetic data rephrasing, and reduced reliance on heuristic filters. |
Dan Su; Kezhi Kong; Ying Lin; Joseph Jennings; Brandon Norick; Markus Kliegl; Mostofa Patwary; Mohammad Shoeybi; Bryan Catanzaro; |
| 77 | Toward Automatic Discovery of A Canine Phonetic Alphabet. Highlight: For the first time, this paper presents an iterative algorithm inspired by human phonetic discovery. The algorithm builds on minimal pairs, which determine phonemes by distinguishing different words in human language, and produces a complete alphabet of distinct canine phoneme-like units. |
Theron S. Wang; Xingyuan Li; Hridayesh Lekhak; Tuan Minh Dang; Mengyue Wu; Kenny Q. Zhu; |
| 78 | Hierarchical Document Refinement for Long-context Retrieval-augmented Generation. Highlight: Real-world RAG applications often encounter long-context input scenarios, where redundant information and noise result in higher inference costs and reduced performance. To address these challenges, we propose LongRefiner, an efficient plug-and-play refiner that leverages the inherent structural characteristics of long documents. |
Jiajie Jin; Xiaoxi Li; Guanting Dong; Yuyao Zhang; Yutao Zhu; Yongkang Wu; Zhonghua Li; Ye Qi; Zhicheng Dou; |
| 79 | Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals’ Subjective Text Perceptions. Highlight: Across all tasks, our results suggest that models learn little meaningful connection between sociodemographics and annotation, raising doubts about the current use of LLMs for simulating sociodemographic variation and behaviour. |
Matthias Orlikowski; Jiaxin Pei; Paul Röttger; Philipp Cimiano; David Jurgens; Dirk Hovy; |
| 80 | BehaviorBox: Automated Discovery of Fine-Grained Performance Differences Between Language Models. Highlight: In this work, we propose a methodology for automated comparison of language models that uses performance-aware contextual embeddings to find fine-grained features of text where one LM outperforms another. |
Lindia Tjuatja; Graham Neubig; |
| 81 | Sightation Counts: Leveraging Sighted User Feedback in Building A BLV-aligned Dataset of Diagram Descriptions. Highlight: In this study, we ask sighted individuals to assess—rather than produce—diagram descriptions generated by vision-language models (VLM) that have been guided with latent supervision via a multi-pass inference. |
Wan Ju Kang; Eunki Kim; Na Min An; Sangryul Kim; Haemin Choi; Ki Hoon Kwak; James Thorne; |
| 82 | Auto-Arena: Automating LLM Evaluations with Agent Peer Battles and Committee Discussions. Highlight: Human evaluations require significant manual effort. Therefore, we propose Auto-Arena, an innovative framework that automates the entire evaluation process using LLM-powered agents. |
Ruochen Zhao; Wenxuan Zhang; Yew Ken Chia; Weiwen Xu; Deli Zhao; Lidong Bing; |
| 83 | LLMs Know Their Vulnerabilities: Uncover Safety Gaps Through Natural Distribution Shifts. Highlight: In this paper, we identify a new safety vulnerability in LLMs: their susceptibility to natural distribution shifts between attack prompts and original toxic prompts, where seemingly benign prompts, semantically related to harmful content, can bypass safety mechanisms. |
Qibing Ren; Hao Li; Dongrui Liu; Zhanxu Xie; Xiaoya Lu; Yu Qiao; Lei Sha; Junchi Yan; Lizhuang Ma; Jing Shao; |
| 84 | Fusing Highly Specialized Language Models for Comprehensive Expertise. Highlight: In this paper, we aim to “play the dealt cards well” and propose to directly fuse models that are already highly specialized. |
Ning Ding; Yulin Chen; Ganqu Cui; Xingtai Lv; Weilin Zhao; Kaiyan Zhang; Ruobing Xie; Bowen Zhou; Zhiyuan Liu; Maosong Sun; |
| 85 | Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models. Highlight: This study addresses the gap by introducing the Generative Psycho-Lexical Approach (GPLA), a scalable, adaptable, and theoretically informed method for constructing value systems. |
Haoran Ye; TianZe Zhang; Yuhang Xie; Liyuan Zhang; Yuanyi Ren; Xin Zhang; Guojie Song; |
| 86 | M-RewardBench: Evaluating Reward Models in Multilingual Settings. Highlight: In this work, we conduct a systematic evaluation of several reward models in multilingual settings. |
Srishti Gureja; Lester James Validad Miranda; Shayekh Bin Islam; Rishabh Maheshwary; Drishti Sharma; Gusti Triandi Winata; Nathan Lambert; Sebastian Ruder; Sara Hooker; Marzieh Fadaee; |
| 87 | Lost in The Context: Insufficient and Distracted Attention to Contexts in Preference Modeling. Highlight: These issues undermine the RM’s effectiveness in modeling human preferences. To address these challenges, we propose AttnRM, a novel optimization framework that enables the RM to concentrate on crucial segments of the context. |
Shihan Dou; Jiayi Chen; Chenhao Huang; Feng Chen; Wei Chengzhi; Huiyuan Zheng; Shichun Liu; Yan Liu; Chenxiao Liu; Chao Xin; Lin Yan; Zongzhang Zhang; Tao Gui; Qi Zhang; Xuanjing Huang; |
| 88 | We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Highlight: Specifically, we decompose composite problems into sub-problems according to the required knowledge concepts and introduce a novel four-dimensional metric to hierarchically assess inherent issues in LMMs’ reasoning process. |
Runqi Qiao; Qiuna Tan; Guanting Dong; MinhuiWu; Chong Sun; Xiaoshuai Song; Jiapeng Wang; Zhuoma GongQue; Shanglin Lei; YiFan Zhang; Zhe Wei; Miaoxuan Zhang; Runfeng Qiao; Xiao Zong; Yida Xu; Peiqing Yang; Zhimin Bao; Muxi Diao; Chen Li; Honggang Zhang; |
| 89 | V-Oracle: Making Progressive Reasoning in Deciphering Oracle Bones for You and Me. Highlight: In this paper, we propose V-Oracle, an innovative framework that utilizes Large Multi-modal Models (LMMs) for interpreting oracle bone script (OBS). |
Runqi Qiao; Qiuna Tan; Guanting Dong; MinhuiWu; Jiapeng Wang; YiFan Zhang; Zhuoma GongQue; Chong Sun; Yida Xu; Yadong Xue; Ye Tian; Zhimin Bao; Lan Yang; Chen Li; Honggang Zhang; |
| 90 | APB: Accelerating Distributed Long-Context Inference By Passing Compressed Context Blocks Across GPUs. Highlight: This hinders scaling the inputs to longer sequences and processing long-context queries in a timely manner. To address this, we introduce APB, an efficient long-context inference framework that leverages multi-host approximate attention to enhance prefill speed by reducing compute and enhancing parallelism simultaneously. |
Yuxiang Huang; Mingye Li; Xu Han; Chaojun Xiao; Weilin Zhao; Sun Ao; Hao Zhou; Jie Zhou; Zhiyuan Liu; Maosong Sun; |
| 91 | LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models Via Restoration Distillation. Highlight: However, this often leads to degraded performance on short-text tasks, while the reasons for this degradation remain insufficiently explored. In this work, we identify two primary factors contributing to this issue: distribution drift in hidden states and attention scores, and catastrophic forgetting during continual pre-training. |
Zican Dong; Junyi Li; Jinhao Jiang; Mingyu Xu; Xin Zhao; Bingning Wang; Weipeng Chen; |
| 92 | CodeDPO: Aligning Code Models with Self Generated and Verified Source Code. Highlight: However, existing training methods like supervised fine-tuning face key limitations: they do not effectively teach models to prioritize correct over incorrect solutions in ambiguous situations, nor do they effectively optimize the runtime efficiency of the generated code. To address these challenges, we propose CodeDPO, a framework that integrates preference learning into code generation to improve two key code preference factors: code correctness and efficiency. |
Kechi Zhang; Ge Li; Yihong Dong; Jingjing Xu; Jun Zhang; Jing Su; Yongfei Liu; Zhi Jin; |
| 93 | MathAgent: Leveraging A Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection. Highlight: Though effective in mathematical problem-solving, MLLMs often struggle with the nuanced task of identifying and categorizing student errors in multimodal mathematical contexts. Therefore, we introduce MathAgent, a novel Mixture-of-Math-Agent framework specifically designed to address these challenges. |
Yibo Yan; Shen Wang; Jiahao Huo; Philip S. Yu; Xuming Hu; Qingsong Wen; |
| 94 | Uni-Retrieval: A Multi-Style Retrieval Framework for STEM’s Education. Highlight: In this paper, we propose a diverse expression retrieval task tailored to educational scenarios, supporting retrieval based on multiple query styles and expressions. |
Yanhao Jia; Xinyi Wu; Li Hao; QinglinZhang; Yuxiao Hu; Shuai Zhao; Wenqi Fan; |
| 95 | Caution for The Environment: Multimodal LLM Agents Are Susceptible to Environmental Distractions. Highlight: This paper investigates the faithfulness of multimodal large language model (MLLM) agents in a graphical user interface (GUI) environment, aiming to address the research question of whether multimodal GUI agents can be distracted by environmental context. |
Xinbei Ma; Yiting Wang; Yao Yao; Tongxin Yuan; Aston Zhang; Zhuosheng Zhang; Hai Zhao; |
| 96 | AgentRM: Enhancing Agent Generalization with Reward Modeling. Highlight: In this work, we find that finetuning a reward model to guide the policy model is more robust than directly finetuning the policy model. |
Yu Xia; Jingru Fan; Weize Chen; Siyu Yan; Xin Cong; Zhong Zhang; Yaxi Lu; Yankai Lin; Zhiyuan Liu; Maosong Sun; |
| 97 | LLM×MapReduce: Simplified Long-Sequence Processing Using Large Language Models. Highlight: We propose a training-free framework that enables large language models (LLMs) to effectively process long texts, using a divide-and-conquer strategy for comprehensive document understanding. |
Zihan Zhou; Chong Li; Xinyi Chen; Shuo Wang; Yu Chao; Zhili Li; Haoyu Wang; Qi Shi; Zhixing Tan; Xu Han; Xiaodong Shi; Zhiyuan Liu; Maosong Sun; |
| 98 | INews: A Multimodal Dataset for Modeling Personalized Affective Responses to News. Highlight: We introduce iNews, a novel large-scale dataset specifically designed to facilitate the modeling of personalized affective responses to news content. |
Tiancheng Hu; Nigel Collier; |
| 99 | Aligning Large Language Models to Follow Instructions and Hallucinate Less Via Effective Data Filtering. Highlight: Training LLMs on data containing unfamiliar knowledge during the instruction tuning stage can encourage hallucinations. To address this challenge, we introduce NOVA, a novel framework designed to identify high-quality data that aligns well with the LLM’s learned knowledge to reduce hallucinations. |
Shuzheng Si; Haozhe Zhao; Gang Chen; Cheng Gao; Yuzhuo Bai; Zhitong Wang; Kaikai An; Kangyang Luo; Chen Qian; Fanchao Qi; Baobao Chang; Maosong Sun; |
| 100 | The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs. Highlight: In this paper, we propose a novel statistical procedure, the Alternative Annotator Test (alt-test), that requires only a modest subset of annotated examples to justify using LLM annotations. |
Nitay Calderon; Roi Reichart; Rotem Dror; |
| 101 | Enhancing Open-Domain Task-Solving Capability of LLMs Via Autonomous Tool Integration from GitHub. Highlight: To this end, we introduce the OpenAct benchmark to evaluate open-domain task-solving capability, which is built on human expert consultation and repositories in GitHub. |
Bohan Lyu; Xin Cong; Heyang Yu; Pan Yang; Cheng Qian; Zihe Wang; Yujia Qin; Yining Ye; Yaxi Lu; Chen Qian; Zhong Zhang; Yukun Yan; Yankai Lin; Zhiyuan Liu; Maosong Sun; |
| 102 | Byte Latent Transformer: Patches Scale Better Than Tokens. Highlight: We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency and robustness. |
Artidoro Pagnoni; Ramakanth Pasunuru; Pedro Rodriguez; John Nguyen; Benjamin Muller; Margaret Li; Chunting Zhou; Lili Yu; Jason E Weston; Luke Zettlemoyer; Gargi Ghosh; Mike Lewis; Ari Holtzman; Srini Iyer; |
| 103 | DEEPER Insight Into Your User: Directed Persona Refinement for Dynamic Persona Modeling. Highlight: However, existing methods—whether regenerating personas or incrementally extending them with new behaviors—often fail to achieve sustained improvements in persona quality or future behavior prediction accuracy. To address this, we propose DEEPER, a novel approach for dynamic persona modeling that enables continual persona optimization. |
Aili Chen; Chengyu Du; Jiangjie Chen; Jinghan Xu; Yikai Zhang; Siyu Yuan; Zulong Chen; Liangyue Li; Yanghua Xiao; |
| 104 | Towards Generating Controllable and Solvable Geometry Problem By Leveraging Symbolic Deduction Engine. Highlight: In this paper, we introduce a novel task for geometry problem generation and propose a new pipeline method: the Symbolic Deduction Engine-based Geometry Problem Generation framework (SDE-GPG). |
Zhuoxuan Jiang; Tianyang Zhang; Peiyan Peng; Jing Chen; Yinong Xun; Haotian Zhang; Lichi Li; Yong Li; Shaohua Zhang; |
| 105 | SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition. Highlight: In this paper, we introduce SongComposer, a pioneering step towards a unified song composition model that can readily create symbolic lyrics and melodies following instructions. |
Shuangrui Ding; Zihan Liu; Xiaoyi Dong; Pan Zhang; Rui Qian; Junhao Huang; Conghui He; Dahua Lin; Jiaqi Wang; |
| 106 | Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement. Highlight: In this paper, we introduce Gödel Agent, a self-evolving framework inspired by the Gödel Machine, enabling agents to recursively improve themselves without relying on predefined routines or fixed optimization algorithms. |
Xunjian Yin; Xinyi Wang; Liangming Pan; Li Lin; Xiaojun Wan; William Yang Wang; |
| 107 | Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework. Highlight: This is rooted in the fact that these systems fail to fully leverage the inherent structure of logical tasks throughout the reasoning processes, including decomposition, search, and resolution. To address this, the paper proposes a logic-complete reasoning framework, Aristotle. |
Jundong Xu; Hao Fei; Meng Luo; Qian Liu; Liangming Pan; William Yang Wang; Preslav Nakov; Mong-Li Lee; Wynne Hsu; |
| 108 | ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control. Highlight: In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker’s voice and enabling arbitrary control and adjustment of speaking style. |
Shengpeng Ji; Qian Chen; Wen Wang; Jialong Zuo; Minghui Fang; Ziyue Jiang; Hai Huang; Zehan Wang; Xize Cheng; Siqi Zheng; Zhou Zhao; |
| 109 | Language-Codec: Bridging Discrete Codec Representations and Speech Language Models. Highlight: Consequently, leveraging the characteristics of speech language models, we propose Language-Codec. |
Shengpeng Ji; Minghui Fang; Jialong Zuo; Ziyue Jiang; Dingdong Wang; Hanting Wang; Hai Huang; Zhou Zhao; |
| 110 | Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models. Highlight: In this paper, we propose Merge Hijacking, the first backdoor attack targeting model merging in LLMs. |
Zenghui Yuan; Yangming Xu; Jiawen Shi; Pan Zhou; Lichao Sun; |
| 111 | Learning to Generate Structured Output with Schema Reinforcement Learning. Highlight: We explore various aspects of JSON generation, such as structure understanding, escaping, and natural language description, to determine how to assess and enable LLMs to generate valid responses. Building upon this, we propose SchemaBench, which features around 40K different JSON schemas for obtaining and assessing models’ abilities to generate valid JSON. |
Yaxi Lu; Haolun Li; Xin Cong; Zhong Zhang; Yesai Wu; Yankai Lin; Zhiyuan Liu; Fangming Liu; Maosong Sun; |
| 112 | In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents. Highlight: In this work, we propose Reflective Memory Management (RMM), a novel mechanism for long-term dialogue agents, integrating forward- and backward-looking reflections: (1) Prospective Reflection, which dynamically summarizes interactions across granularities—utterances, turns, and sessions—into a personalized memory bank for effective future retrieval, and (2) Retrospective Reflection, which iteratively refines the retrieval in an online reinforcement learning (RL) manner based on LLMs’ cited evidence. |
Zhen Tan; Jun Yan; I-Hung Hsu; Rujun Han; Zifeng Wang; Long Le; Yiwen Song; Yanfei Chen; Hamid Palangi; George Lee; Anand Rajan Iyer; Tianlong Chen; Huan Liu; Chen-Yu Lee; Tomas Pfister; |
| 113 | AutoMixer: Checkpoint Artifacts As Automatic Data Mixers. Highlight: In this work, we observe that checkpoint models exhibit emerging capabilities at different points in the training trajectory. |
Ernie Chang; Yang Li; Patrick Huber; Vish Vogeti; David Kant; Yangyang Shi; Vikas Chandra; |
| 114 | TreeRL: LLM Reinforcement Learning with On-Policy Tree Search. Highlight: We propose TreeRL, a reinforcement learning framework that directly incorporates on-policy tree search for RL training. |
Zhenyu Hou; Ziniu Hu; Yujiang Li; Rui Lu; Jie Tang; Yuxiao Dong; |
| 115 | Language Models Resist Alignment: Evidence From Data Compression. Highlight: Does alignment fine-tuning have robust effects on models, or are its impacts merely superficial? In this work, we make the first exploration of this phenomenon from both theoretical and empirical perspectives. |
Jiaming Ji; Kaile Wang; Tianyi Alex Qiu; Boyuan Chen; Jiayi Zhou; Changye Li; Hantao Lou; Josef Dai; Yunhuai Liu; Yaodong Yang; |
| 116 | PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference. Highlight: In this work, we introduce the PKU-SafeRLHF dataset, designed to promote research on safety alignment in large language models (LLMs). |
Jiaming Ji; Donghai Hong; Borong Zhang; Boyuan Chen; Josef Dai; Boren Zheng; Tianyi Alex Qiu; Jiayi Zhou; Kaile Wang; Boxun Li; Sirui Han; Yike Guo; Yaodong Yang; |
| 117 | Fixing Distribution Shifts of LLM Self-Critique Via On-Policy Self-Play Training. Highlight: In this work, we propose an on-policy reinforcement learning framework to synchronize the reasoning and critique capabilities of language models. |
Rong Bao; Donglei Yu; Kai Fan; Minpeng Liao; |
| 118 | Efficiently Identifying Watermarked Segments in Mixed-Source Texts. Highlight: Drawing inspiration from plagiarism detection systems, we propose two novel methods for partial watermark detection. |
Xuandong Zhao; Chenwen Liao; Yu-Xiang Wang; Lei Li; |
| 119 | Revisiting The Test-Time Scaling of O1-like Models: Do They Truly Possess Test-Time Scaling Capabilities? Highlight: We then compare sequential and parallel scaling strategies on QwQ, R1 and LIMO, finding that parallel scaling achieves better coverage and scalability. Based on these insights, we propose “Shortest Majority Vote”, a method that combines parallel scaling strategies with CoT length characteristics, significantly improving models’ test-time scalability compared to conventional majority voting approaches. |
Zhiyuan Zeng; Qinyuan Cheng; Zhangyue Yin; Yunhua Zhou; Xipeng Qiu; |
| 120 | LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters Through Modality Linear Representation-Steering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce Modality Linear Representation-Steering (MoReS), which re-balances intrinsic modalities by steering visual representations through linear transformations in the visual subspace across each model layer. |
Jinhe Bi; Yujun Wang; Haokun Chen; Xun Xiao; Artur Hecker; Volker Tresp; Yunpu Ma; |
| 121 | RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence Within Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Retrieval-augmented generation (RAG) offers an effective solution by incorporating external knowledge, but existing methods still face several limitations: additional deployment costs of separate retrievers, redundant input tokens from retrieved text chunks, and the lack of joint optimization of retrieval and generation. To address these issues, we propose RetroLLM, a unified framework that integrates retrieval and generation into a single, auto-regressive process, enabling LLMs to directly generate fine-grained evidence from the corpus with constrained decoding. |
Xiaoxi Li; Jiajie Jin; Yujia Zhou; Yongkang Wu; Zhonghua Li; Ye Qi; Zhicheng Dou; |
| 122 | PIGuard: Prompt Injection Guardrail Via Mitigating Overdefense for Free Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our results show that state-of-the-art models suffer from over-defense issues, with accuracy dropping close to random guessing levels (60%). To mitigate this, we propose PIGuard, a novel prompt guard model that incorporates a new training strategy, Mitigating Over-defense for Free (MOF), which significantly reduces the bias on trigger words. |
Hao Li; Xiaogeng Liu; Ning Zhang; Chaowei Xiao; |
| 123 | LLM Agents Making Agent Tools Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the growing trend of scientific studies accompanied by public code repositories, we propose ToolMaker, an agentic framework that autonomously transforms papers with code into LLM-compatible tools. |
Georg Wölflein; Dyke Ferber; Daniel Truhn; Ognjen Arandjelovic; Jakob Nikolas Kather; |
| 124 | Cooperative or Competitive? Understanding The Interaction Between Attention Heads From A Game Theory Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further optimize the interactions among attention heads, we propose a training-free Game-theoretic Attention Calibration (GAC) method. |
Xiaoye Qu; Zengqi Yu; Dongrui Liu; Wei Wei; Daizong Liu; Jianfeng Dong; Yu Cheng; |
| 125 | DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce DRAMA, a training framework that leverages LLMs to train smaller generalizable dense retrievers. |
Xueguang Ma; Xi Victoria Lin; Barlas Oguz; Jimmy Lin; Wen-tau Yih; Xilun Chen; |
| 126 | Bias in Language Models: Beyond Trick Tests and Towards RUTEd Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that standard bias metrics have no significant correlation with long-form output metrics. |
Kristian Lum; Jacy Reese Anthis; Kevin Robinson; Chirag Nagpal; Alexander Nicholas D’Amour; |
| 127 | Can Language Models Reason About Individualistic Human Values and Preferences? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve an authentic representation of diversity that respects individuality, we propose individualistic alignment. |
Liwei Jiang; Taylor Sorensen; Sydney Levine; Yejin Choi; |
| 128 | S2R: Teaching LLMs to Self-verify and Self-correct Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce S2R, an efficient framework that enhances LLM reasoning by teaching models to self-verify and self-correct during inference. |
Ruotian Ma; Peisong Wang; Cheng Liu; Xingyan Liu; Jiaqi Chen; Bang Zhang; Xin Zhou; Nan Du; Jia Li; |
| 129 | Refuse Whenever You Feel Unsafe: Improving Safety in LLMs Via Decoupled Refusal Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a novel approach, Decoupled Refusal Training (DeRTa), designed to empower LLMs to refuse compliance to harmful prompts at any response position, significantly enhancing their safety capabilities. |
Youliang Yuan; Wenxiang Jiao; Wenxuan Wang; Jen-tse Huang; Jiahao Xu; Tian Liang; Pinjia He; Zhaopeng Tu; |
| 130 | Can We Retrieve Everything All at Once? ARM: An Alignment-Oriented LLM-based Retrieval Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the *alignment* problem, we introduce an LLM-based retrieval method — ARM, designed to better align questions with the organization of the data collection. |
Peter Baile Chen; Yi Zhang; Mike Cafarella; Dan Roth; |
| 131 | NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose NeuSym-RAG, a hybrid neural symbolic retrieval framework which combines both paradigms in an interactive process. |
Ruisheng Cao; Hanchong Zhang; Tiancheng Huang; Zhangyi Kang; Yuxin Zhang; Liangtai Sun; Hanqi Li; Yuxun Miao; Shuai Fan; Lu Chen; Kai Yu; |
| 132 | 500xCompressor: Generalized Prompt Compression for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current methods face challenges such as low compression ratios and potential training-test overlap during evaluation. To address these issues, we propose 500xCompressor, a method that compresses natural language contexts into a minimum of one special token and demonstrates strong generalization ability. |
Zongqian Li; Yixuan Su; Nigel Collier; |
| 133 | Enhancing Automated Interpretability with Output-Centric Feature Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Using steering evaluations, we reveal that current pipelines provide descriptions that fail to capture the causal effect of the feature on outputs. To fix this, we propose efficient, output-centric methods for automatically generating feature descriptions. |
Yoav Gur-Arieh; Roy Mayan; Chen Agassy; Atticus Geiger; Mor Geva; |
| 134 | ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous work has proposed numerous methods to enhance code generation performance, including integrating feedback from the compiler. Inspired by this, we present ReflectionCoder, a novel approach that effectively leverages reflection sequences constructed by integrating compiler feedback to improve one-off code generation performance. |
Houxing Ren; Mingjie Zhan; Zhongyuan Wu; Aojun Zhou; Junting Pan; Hongsheng Li; |
| 135 | EfficientQAT: Efficient Quantization-Aware Training for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although quantization-aware training (QAT) offers a solution by reducing memory consumption through low-bit representations with minimal accuracy loss, it is impractical due to substantial training resources. To address this, we propose Efficient Quantization-Aware Training (EfficientQAT), a more feasible QAT algorithm. |
Mengzhao Chen; Wenqi Shao; Peng Xu; Jiahao Wang; Peng Gao; Kaipeng Zhang; Ping Luo; |
| 136 | GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus. |
Yifan Yang; Zheshu Song; Jianheng Zhuo; Mingyu Cui; Jinpeng Li; Bo Yang; Yexing Du; Ziyang Ma; Xunying Liu; Ziyuan Wang; Ke Li; Shuai Fan; Kai Yu; Wei-Qiang Zhang; Guoguo Chen; Xie Chen; |
| 137 | Improving Factuality with Explicit Working Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent works have built upon retrieval-augmented generation to improve factuality through iterative prompting, but these methods are limited by the traditional RAG design. To address these challenges, we introduce Ewe (Explicit Working Memory), a novel approach that enhances factuality in long-form text generation by integrating a working memory that receives real-time feedback from external resources. |
Mingda Chen; Yang Li; Karthik Padthe; Rulin Shao; Alicia Yi Sun; Luke Zettlemoyer; Gargi Ghosh; Wen-tau Yih; |
| 138 | Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To mitigate risks in critical domains, we introduce Consistency-based Confidence Calibration (C3), which assesses confidence consistency through question reformulation. |
Shiyu Ni; Keping Bi; Jiafeng Guo; Lulu Yu; Baolong Bi; Xueqi Cheng; |
| 139 | Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most studies focus on identifying circuits for individual tasks without investigating how functionally similar circuits relate to each other. To address this gap, we study the modularity of neural networks by analyzing circuits for highly compositional subtasks within a transformer-based language model. |
Philipp Mondorf; Sondre Wold; Barbara Plank; |
| 140 | KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning Over Knowledge Graph Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to improve the reasoning ability of large language models (LLMs) over knowledge graphs (KGs) to answer complex questions. |
Jinhao Jiang; Kun Zhou; Xin Zhao; Yang Song; Chen Zhu; Hengshu Zhu; Ji-Rong Wen; |
| 141 | Synergistic Weak-Strong Collaboration By Aligning Preferences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Fine-tuning large models for every niche application is often infeasible due to black-box constraints and high computational overhead. To address this, we propose a collaborative framework that pairs a specialized weak model with a general strong model. |
Yizhu Jiao; Xuchao Zhang; Zhaoyang Wang; Yubo Ma; Zhun Deng; Rujia Wang; Chetan Bansal; Saravan Rajmohan; Jiawei Han; Huaxiu Yao; |
| 142 | Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, despite these advancements, a comprehensive benchmark for evaluating the performance of LALMs in open-ended audio dialogue understanding is still lacking. To address this gap, we propose an **A**udio **D**ialogue **U**nderstanding **Bench**mark **(ADU-Bench),** which consists of 4 benchmark datasets. |
Kuofeng Gao; Shu-Tao Xia; Ke Xu; Philip Torr; Jindong Gu; |
| 143 | TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a web-scale dataset for time-continual pretraining of LLMs derived from 114 dumps of Common Crawl (CC) – orders of magnitude larger than previous continual language modeling benchmarks. |
Jeffrey Li; Mohammadreza Armandpour; Seyed Iman Mirzadeh; Sachin Mehta; Vaishaal Shankar; Raviteja Vemulapalli; Samy Bengio; Oncel Tuzel; Mehrdad Farajtabar; Hadi Pouransari; Fartash Faghri; |
| 144 | Magnet: Multi-turn Tool-use Data Synthesis and Distillation Via Graph Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, their performance may be limited in complex, multi-turn interactions involving users and multiple tools. To address this, we propose Magnet, a principled framework for synthesizing high-quality training trajectories to enhance the function calling capability of large language model agents in multi-turn conversations with humans. |
Fan Yin; Zifeng Wang; I-Hung Hsu; Jun Yan; Ke Jiang; Yanfei Chen; Jindong Gu; Long Le; Kai-Wei Chang; Chen-Yu Lee; Hamid Palangi; Tomas Pfister; |
| 145 | MultiAgentBench : Evaluating The Collaboration and Competition of LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce MultiAgentBench, a comprehensive benchmark designed to evaluate LLM-based multi-agent systems across diverse, interactive scenarios. |
Kunlun Zhu; Hongyi Du; Zhaochen Hong; Xiaocheng Yang; Shuyi Guo; Zhe Wang; Zhenhailong Wang; Cheng Qian; Robert Tang; Heng Ji; Jiaxuan You; |
| 146 | Literary Evidence Retrieval Via Long-Context Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: How well do modern long-context language models understand literary fiction? We explore this question via the task of literary evidence retrieval, repurposing the RELiC dataset of Thai et al. (2022) to construct a benchmark where the entire text of a primary source (e.g., The Great Gatsby) is provided to an LLM alongside literary criticism with a missing quotation from that work. |
Katherine Thai; Mohit Iyyer; |
| 147 | AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose AndroidLab as a systematic Android agent framework. |
Yifan Xu; Xiao Liu; Xueqiao Sun; Siyi Cheng; Hao Yu; Hanyu Lai; Shudan Zhang; Dan Zhang; Jie Tang; Yuxiao Dong; |
| 148 | FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce FACT-AUDIT, an agent-driven framework that adaptively and dynamically assesses LLMs’ fact-checking capabilities. |
Hongzhan Lin; Yang Deng; Yuxuan Gu; Wenxuan Zhang; Jing Ma; See-Kiong Ng; Tat-Seng Chua; |
| 149 | Beyond Prompt Engineering: Robust Behavior Control in LLMs Via Steering Target Atoms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Steering Target Atoms (STA), a novel method that isolates and manipulates disentangled knowledge components to enhance safety. |
Mengru Wang; Ziwen Xu; Shengyu Mao; Shumin Deng; Zhaopeng Tu; Huajun Chen; Ningyu Zhang; |
| 150 | MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces MMRC, a Multi-Modal Real-world Conversation benchmark for evaluating six core open-ended abilities of MLLMs: information extraction, multi-turn reasoning, information update, image management, memory recall, and answer refusal. |
Haochen Xue; Feilong Tang; Ming Hu; Yexin Liu; Qidong Huang; Yulong Li; Chengzhi Liu; Zhongxing Xu; Chong Zhang; Chun-Mei Feng; Yutong Xie; Imran Razzak; Zongyuan Ge; Jionglong Su; Junjun He; Yu Qiao; |
| 151 | Mitigating Visual Forgetting Via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe only a ~2-point accuracy drop on MathVista’s test-hard subset, revealing that the model’s textual outputs dominate the subsequent reasoning process. Motivated by this, we propose Take-along Visual Conditioning (TVC), a strategy that shifts image input to critical reasoning stages and compresses redundant visual tokens via dynamic pruning. |
Hai-Long Sun; Zhun Sun; Houwen Peng; Han-Jia Ye; |
| 152 | Don’t Get Lost in The Trees: Streamlining LLM Reasoning By Overcoming Tree Search Exploration Pitfalls Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent advancements in tree search algorithms guided by verifiers have significantly enhanced the reasoning capabilities of large language models (LLMs), but at the cost of increased computational resources. In this work, we identify two key challenges contributing to this inefficiency: over-exploration due to redundant states with semantically equivalent content, and under-exploration caused by high variance in verifier scoring leading to frequent trajectory switching. |
Ante Wang; Linfeng Song; Ye Tian; Dian Yu; Haitao Mi; Xiangyu Duan; Zhaopeng Tu; Jinsong Su; Dong Yu; |
| 153 | Fairness Through Difference Awareness: Measuring Desired Group Discrimination in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, in contrast to most fairness work, we study fairness through the perspective of treating people differently – when it is contextually appropriate to. |
Angelina Wang; Michelle Phan; Daniel E. Ho; Sanmi Koyejo; |
| 154 | MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While fingerprinting techniques have been proposed for verifying model ownership, their resistance to model merging remains unexplored. To address this gap, we propose a novel fingerprinting method, MergePrint, which embeds robust fingerprints capable of surviving model merging. |
Shojiro Yamabe; Futa Kai Waseda; Tsubasa Takahashi; Koki Wataoka; |
| 155 | Mixtures of In-Context Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Mixtures of In-Context Learners (MoICL), a novel approach that uses subsets of demonstrations to train a set of experts via ICL and learns a weighting function to merge their output distributions via gradient-based optimisation. |
Giwon Hong; Emile Van Krieken; Edoardo Ponti; Nikolay Malkin; Pasquale Minervini; |
| 156 | A Silver Bullet or A Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide an empirical investigation of gist-based context compression methods to improve context processing in large language models. |
Chenlong Deng; Zhisong Zhang; Kelong Mao; Shuaiyi Li; Xinting Huang; Dong Yu; Zhicheng Dou; |
| 157 | SYNTHIA: Novel Concept Design with Affordance Composition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce SYNTHIA, a framework for generating novel, functionally coherent designs based on desired affordances. |
Hyeonjeong Ha; Xiaomeng Jin; Jeonghwan Kim; Jiateng Liu; Zhenhailong Wang; Khanh Duy Nguyen; Ansel Blume; Nanyun Peng; Kai-Wei Chang; Heng Ji; |
| 158 | AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing benchmarks primarily focus on basic abilities using nonverbal methods, such as yes-no and multiple-choice questions. In this paper, we address this gap by introducing AlignMMBench, which provides more nuanced evaluations of alignment capabilities and is the first benchmark specifically designed for Chinese visual contexts. |
Yuhang Wu; Wenmeng Yu; Yean Cheng; Yan Wang; Xiaohan Zhang; Jiazheng Xu; Ming Ding; Yuxiao Dong; |
| 159 | RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces RuleArena, a novel and challenging benchmark designed to evaluate the ability of large language models (LLMs) to follow complex, real-world rules in reasoning. |
Ruiwen Zhou; Wenyue Hua; Liangming Pan; Sitao Cheng; Xiaobao Wu; En Yu; William Yang Wang; |
| 160 | Genetic Instruct: Scaling Up Synthetic Generation of Coding Instructions for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Genetic-Instruct, a scalable algorithm for synthesizing large-scale, high-quality coding instructions using evolutionary principles. |
Somshubra Majumdar; Vahid Noroozi; Mehrzad Samadi; Sean Narenthiran; Aleksander Ficek; Wasi Uddin Ahmad; Jocelyn Huang; Jagadeesh Balam; Boris Ginsburg; |
| 161 | Mind The Gap: Static and Interactive Evaluations of Large Audio Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, aligning LAM development with user goals requires a clear understanding of user needs and preferences to establish reliable progress metrics. This study addresses these challenges by introducing an interactive approach to evaluate LAMs and collecting 7,500 LAM interactions from 484 participants. |
Minzhi Li; William Barr Held; Michael J Ryan; Kunat Pipatanakul; Potsawee Manakul; Hao Zhu; Diyi Yang; |
| 162 | Enhancing Input-Label Mapping in In-Context Learning with Contrastive Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, prior research highlights that LLMs often overlook input-label mapping information in ICL, relying more on their pre-trained knowledge. To address this issue, we introduce In-Context Contrastive Decoding (ICCD), a novel method that emphasizes input-label mapping by contrasting the output distributions between positive and negative in-context examples. |
Keqin Peng; Liang Ding; Yuanxin Ouyang; Meng Fang; Yancheng Yuan; Dacheng Tao; |
| 163 | Identifying Reliable Evaluation Metrics for Scientific Text Revision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Evaluating text revision in scientific writing remains a challenge, as traditional metrics such as ROUGE and BERTScore primarily focus on similarity rather than capturing meaningful improvements. In this work, we analyse and identify the limitations of these metrics and explore alternative evaluation methods that better align with human judgments. |
Leane Jourdan; Nicolas Hernandez; Florian Boudin; Richard Dufour; |
| 164 | When The LM Misunderstood The Human Chuckled: Analyzing Garden Path Effects in Humans and Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we try to answer two questions: 1. What makes garden-path sentences hard to understand for humans? 2. Do the same reasons make garden-path sentences hard for LLMs as well? |
Samuel Joseph Amouyal; Aya Meltzer-Asscher; Jonathan Berant; |
| 165 | RAG-Critic: Leveraging Automated Critic-Guided Agentic Workflow for Retrieval Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose RAG-Critic, a novel framework that leverages a critic-guided agentic workflow to improve RAG capabilities autonomously. |
Guanting Dong; Jiajie Jin; Xiaoxi Li; Yutao Zhu; Zhicheng Dou; Ji-Rong Wen; |
| 166 | Progressive Multimodal Reasoning Via Active Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose AR-MCTS, a universal framework designed to progressively improve the reasoning capabilities of MLLMs through Active Retrieval (AR) and Monte Carlo Tree Search (MCTS). |
Guanting Dong; Chenghao Zhang; Mengjie Deng; Yutao Zhu; Zhicheng Dou; Ji-Rong Wen; |
| 167 | SceneGenAgent: Precise Industrial Scene Generation with Coding Agent Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While large language models (LLMs) have shown significant progress in generating general 3D scenes from textual descriptions, generating industrial scenes with LLMs poses a unique challenge due to their demand for precise measurements and positioning, requiring complex planning over spatial arrangement. To address this challenge, we introduce SceneGenAgent, an LLM-based agent for generating industrial scenes through C# code. |
Xiao Xia; Dan Zhang; Zibo Liao; Zhenyu Hou; Tianrui Sun; Jing Li; Ling Fu; Yuxiao Dong; |
| 168 | What Are The Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, our preliminary experiments show that fewer than 35% of samples generated by Qwen-2-72B are multi-hop, and over 40% exhibit poor quality, limiting comprehensive understanding and further research. To address this, we propose the Multi-agent Interactive Multi-hop Generation (MIMG) framework, which integrates a quality verification agent, a single-hop question generation agent, a multiple question sampling strategy, and a multi-hop question merger agent. |
Zhi Chen; Qiguang Chen; Libo Qin; Qipeng Guo; Haijun Lv; Yicheng Zou; Hang Yan; Kai Chen; Dahua Lin; |
| 169 | Recent Advances in Speech Language Models: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this context, Speech Language Models (SpeechLMs)—foundation models designed to understand and generate speech—emerge as a promising solution for end-to-end speech interaction. This survey offers a comprehensive overview of recent approaches to building SpeechLMs, outlining their core architectural components, training methodologies, evaluation strategies, and the challenges and potential directions for future research in this rapidly advancing field. |
Wenqian Cui; Dianzhi Yu; Xiaoqi Jiao; Ziqiao Meng; Guangyan Zhang; Qichao Wang; Steven Y. Guo; Irwin King; |
| 170 | VoxEval: Benchmarking The Knowledge Understanding Capabilities of End-to-End Spoken Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While these models require comprehensive world knowledge for meaningful and reliable human interactions, existing question-answering (QA) benchmarks fall short in evaluating SLMs’ knowledge understanding due to their inability to support end-to-end speech evaluation and account for varied input audio conditions. To address these limitations, we present VoxEval, a novel SpeechQA benchmark that assesses SLMs’ knowledge understanding through pure speech interactions. |
Wenqian Cui; Xiaoqi Jiao; Ziqiao Meng; Irwin King; |
| 171 | Automated Structured Radiology Report Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This variability poses challenges for both generation and evaluation: existing models struggle to produce consistent, clinically meaningful reports, and standard evaluation metrics fail to capture the nuances of radiological interpretation. To address this, we introduce Structured Radiology Report Generation (SRRG), a new task that reformulates free-text radiology reports into a standardized format, ensuring clarity, consistency, and structured clinical reporting. |
Jean-Benoit Delbrouck; Justin Xu; Johannes Moll; Alois Thomas; Zhihong Chen; Sophie Ostmeier; Asfandyar Azhar; Kelvin Zhenghao Li; Andrew Johnston; Christian Bluethgen; Eduardo Pontes Reis; Mohamed S Muneer; Maya Varma; Curtis Langlotz; |
| 172 | Attacking Vision-Language Computer Agents Via Pop-ups Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we demonstrate that VLM agents can be easily attacked by a set of carefully designed adversarial pop-ups, which human users would typically recognize and ignore. |
Yanzhe Zhang; Tao Yu; Diyi Yang; |
| 173 | Inferring Functionality of Attention Heads from Their Parameters Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Prior work on investigating their operation mostly focused on analyzing their behavior during inference for specific circuits or tasks. In this work, we seek a comprehensive mapping of the operations they implement in a model. |
Amit Elhelo; Mor Geva; |
| 174 | EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our results show that current LMs, despite employing working memory and Chain-of-Thought reasoning, achieve only 15% average progress without hints, highlighting their limitations in creativity. To bridge this gap, we propose EscapeAgent, a framework designed to enhance creative reasoning through Foresight (innovative tool use) and Reflection (identifying unsolved tasks). |
Cheng Qian; Peixuan Han; Qinyu Luo; Bingxiang He; Xiusi Chen; Yuji Zhang; Hongyi Du; Jiarui Yao; Xiaocheng Yang; Denghui Zhang; Yunzhu Li; Heng Ji; |
| 175 | LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Meanwhile, extending the context window in LLMs through post-pretraining is highly resource-intensive. To address this, we introduce LongRecipe, an efficient training strategy for extending the context window of LLMs, including impactful token analysis, position index transformation, and training optimization strategies. |
Zhiyuan Hu; Yuliang Liu; Jinman Zhao; Suyuchen Wang; WangYan WangYan; Wei Shen; Qing Gu; Anh Tuan Luu; See-Kiong Ng; Zhiwei Jiang; Bryan Hooi; |
| 176 | LLMs + Persona-Plug = Personalized LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this retrieval-based strategy may break the continuity of the user history and fail to capture the user’s overall styles and patterns, hence leading to sub-optimal performance. To address these challenges, we propose a novel personalized LLM model, PPlug. |
Jiongnan Liu; Yutao Zhu; Shuting Wang; Xiaochi Wei; Erxue Min; Yu Lu; Shuaiqiang Wang; Dawei Yin; Zhicheng Dou; |
| 177 | InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel real-world benchmark, InstructPart, comprising hand-labeled part segmentation annotations and task-oriented instructions to evaluate the performance of current models in understanding and executing part-level tasks within everyday contexts. |
Zifu Wan; Yaqi Xie; Ce Zhang; Zhiqiu Lin; Zihan Wang; Simon Stepputtis; Deva Ramanan; Katia P. Sycara; |
| 178 | Pre-training Distillation for Large Language Models: A Design Space Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extend KD to the pre-training phase of LLMs, named pre-training distillation (PD). |
Hao Peng; Xin Lv; Yushi Bai; Zijun Yao; Jiajie Zhang; Lei Hou; Juanzi Li; |
| 179 | Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose agentic reward modeling, a reward system that combines reward models with verifiable correctness signals from different aspects to provide reliable rewards. |
Hao Peng; Yunjia Qi; Xiaozhi Wang; Zijun Yao; Bin Xu; Lei Hou; Juanzi Li; |
| 180 | VISA: Retrieval Augmented Generation with Visual Source Attribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches in RAG primarily link generated content to document-level references, making it challenging for users to locate evidence among multiple content-rich retrieved documents. To address this challenge, we propose Retrieval-Augmented Generation with Visual Source Attribution (VISA), a novel approach that combines answer generation with visual source attribution. |
Xueguang Ma; Shengyao Zhuang; Bevan Koopman; Guido Zuccon; Wenhu Chen; Jimmy Lin; |
| 181 | Mitigating Selection Bias with Node Pruning and Auxiliary Options Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce two methods: Bias Node Pruning (BNP), which prunes parameters that contribute to selection bias, and Auxiliary Option Injection (AOI), which introduces an additional answer choice to reduce bias in both white-box and black-box settings. |
Hyeong Kyu Choi; Weijie Xu; Chi Xue; Stephanie Eckman; Chandan K. Reddy; |
| 182 | PaSa: An LLM Agent for Comprehensive Academic Paper Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce PaSa, an advanced Paper Search agent powered by large language models. |
Yichen He; Guanhua Huang; Peiyuan Feng; Yuan Lin; Yuchen Zhang; Hang Li; Weinan E; |
| 183 | Dynamic and Generalizable Process Reward Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, static and coarse-grained evaluation criteria struggle to adapt to complex process supervision. To tackle these challenges, we propose Dynamic and Generalizable Process Reward Modeling (DG-PRM), which features a reward tree to capture and store fine-grained, multi-dimensional reward criteria. |
Zhangyue Yin; Qiushi Sun; Zhiyuan Zeng; Qinyuan Cheng; Xipeng Qiu; Xuanjing Huang; |
| 184 | Aligned But Blind: Alignment Increases Implicit Bias By Reducing Awareness of Race Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Not representing race likely fails to activate safety guardrails, leading to unintended biases. Inspired by this insight, we propose a new bias mitigation strategy that works by incentivizing the representation of racial concepts in the early model layers. |
Lihao Sun; Chengzhi Mao; Valentin Hofmann; Xuechunzi Bai; |
| 185 | Sliding Windows Are Not The End: Exploring Full Ranking with Long-Context Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we conduct a comprehensive study of long-context LLMs for ranking tasks in terms of efficiency and effectiveness. |
Wenhan Liu; Xinyu Ma; Yutao Zhu; Ziliang Zhao; Shuaiqiang Wang; Dawei Yin; Zhicheng Dou; |
| 186 | Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current research suggests that multi-task training outperforms single-task as different tasks can benefit each other, but they often overlook the internal relationships within these tasks. To analyze this phenomenon, we attempted to employ **compositional generalization** (CG), which refers to the models’ ability to understand novel combinations by recombining learned elements, as a guiding framework. |
Zhenyang Cai; Junying Chen; Rongsheng Wang; Weihong Wang; Yonglin Deng; Dingjie Song; Yize Chen; Zixu Zhang; Benyou Wang; |
| 187 | HybGRAG: Hybrid Retrieval-Augmented Generation on Textual and Relational Knowledge Bases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, through our empirical analysis, we identify key insights that show why existing methods may struggle with hybrid question answering (HQA) over SKB. |
Meng-Chieh Lee; Qi Zhu; Costas Mavromatis; Zhen Han; Soji Adeshina; Vassilis N. Ioannidis; Huzefa Rangwala; Christos Faloutsos; |
| 188 | AXIS: Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these agents often suffer from high latency and low reliability due to the extensive sequential UI interactions. To address this issue, we propose AXIS, a novel LLM-based agent framework that prioritizes actions through application programming interfaces (APIs) over UI actions. |
Junting Lu; Zhiyang Zhang; Fangkai Yang; Jue Zhang; Lu Wang; Chao Du; Qingwei Lin; Saravan Rajmohan; Dongmei Zhang; Qi Zhang; |
| 189 | RAVEN: Robust Advertisement Video Violation Temporal Grounding Via Reinforcement Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose RAVEN, a novel framework that integrates curriculum reinforcement learning with multimodal large language models (MLLMs) to enhance reasoning and cognitive capabilities for violation detection. |
Deyi Ji; Yuekui Yang; Haiyang Wu; Shaoping Ma; Tianrun Chen; Lanyun Zhu; |
| 190 | Efficient and Accurate Prompt Optimization: The Benefit of Memory in Exemplar-Guided Reflection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an Exemplar-Guided Reflection with Memory mechanism (ERM) to realize more efficient and accurate prompt optimization. |
Cilin Yan; Jingyun Wang; Lin Zhang; Ruihui Zhao; Xiaopu Wu; Kai Xiong; Qingsong Liu; Guoliang Kang; Yangyang Kang; |
| 191 | Navigating Rifts in Human-LLM Grounding: Study and Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, we find that early grounding failures predict later interaction breakdowns. Building on these insights, we introduce Rifts, a benchmark derived from publicly available LLM interaction data containing situations where LLMs fail to initiate grounding. |
Omar Shaikh; Hussein Mozannar; Gagan Bansal; Adam Fourney; Eric Horvitz; |
| 192 | Tracing and Dissecting How LLMs Recall Factual Knowledge for Real World Questions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we introduce a two-dimensional analysis framework—comprising token back-tracing and individual token decoding—to uncover how LLMs conduct factual knowledge recall. |
Yiqun Wang; Chaoqun Wan; Sile Hu; Yonggang Zhang; Xiang Tian; Yaowu Chen; Xu Shen; Jieping Ye; |
| 193 | Browsing Like Human: A Multimodal Web Agent with Experiential Fast-and-Slow Thinking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the human thought pattern to empower agent with more human-like abilities in web navigation. |
Haohao Luo; Jiayi Kuang; Wei Liu; Ying Shen; Jian Luan; Yang Deng; |
| 194 | Synthesizing and Adapting Error Correction Data for Mobile Large Language Model Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we use LLMs to synthesize a high-quality dataset of error correction pairs to evaluate and improve LLMs for mobile applications. |
Yanxiang Zhang; Zheng Xu; Shanshan Wu; Yuanbo Zhang; Daniel Ramage; |
| 195 | Substance Over Style: Evaluating Proactive Conversational Coaching Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, coaching presents unique challenges with initially undefined goals that evolve through multi-turn interactions, subjective evaluation criteria, and mixed-initiative dialogue. In this work, we describe and implement five multi-turn coaching agents that exhibit distinct conversational styles, and evaluate them through a user study, collecting first-person feedback on 155 conversations. |
Vidya Srinivas; Xuhai Xu; Xin Liu; Kumar Ayush; Isaac Galatzer-Levy; Shwetak Patel; Daniel McDuff; Tim Althoff; |
| 196 | Dynamic Scaling of Unit Tests for Code Reward Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our pioneer experiment reveals a positive correlation between the number of unit tests and reward signal quality, with greater benefits observed in more challenging problems. Based on these insights, we propose CodeRM-8B, a lightweight yet effective unit test generator that enables efficient and high-quality unit test scaling. |
Zeyao Ma; Xiaokang Zhang; Jing Zhang; Jifan Yu; Sijia Luo; Jie Tang; |
| 197 | Around The World in 24 Hours: Probing LLM Knowledge of Time and Place Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the first evaluation of the ability of language models to jointly reason over time and space. |
Carolin Holtermann; Paul Röttger; Anne Lauscher; |
| 198 | LongReward: Improving Long-context Large Language Models with AI Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose LongReward, a novel method that utilizes an off-the-shelf LLM to provide rewards for long-context model responses from four human-valued dimensions: helpfulness, logicality, faithfulness, and completeness, each with a carefully designed assessment pipeline. |
Jiajie Zhang; Zhongni Hou; Xin Lv; Shulin Cao; Zhenyu Hou; Yilin Niu; Lei Hou; Yuxiao Dong; Ling Feng; Juanzi Li; |
| 199 | World Modeling Makes A Better Planner: Dual Preference Optimization for Embodied Task Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Dual Preference Optimization (D2PO), a new learning framework that jointly optimizes state prediction and action selection through preference learning, enabling LVLMs to understand environment dynamics for better planning. |
Siyin Wang; Zhaoye Fei; Qinyuan Cheng; Shiduo Zhang; Panpan Cai; Jinlan Fu; Xipeng Qiu; |
| 200 | Diversity-oriented Data Augmentation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we explore data augmentation’s impact on dataset diversity and propose a Diversity-oriented data Augmentation framework (DoAug). |
Zaitian Wang; Jinghan Zhang; Xinhao Zhang; Kunpeng Liu; Pengfei Wang; Yuanchun Zhou; |
| 201 | Scalable Vision Language Model Training Via High Quality Data Curation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce SAIL-VL (ScAlable Vision Language Model TraIning via High QuaLity Data Curation), an open-source vision language model (VLM) series achieving state-of-the-art (SOTA) performance at 2B and 8B parameters. |
Hongyuan Dong; Zijian Kang; Weijie Yin; LiangXiao LiangXiao; ChaoFeng ChaoFeng; Ran Jiao; |
| 202 | Sample-Efficient Human Evaluation of Large Language Models Via Maximum Discrepancy Competition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a sample-efficient human evaluation method for LLMs based on the principle of MAximum Discrepancy (MAD) competition. |
Kehua Feng; Keyan Ding; Tan Hongzhi; Kede Ma; Zhihua Wang; Shuangquan Guo; Cheng Yuzhou; Ge Sun; Guozhou Zheng; Qiang Zhang; Huajun Chen; |
| 203 | DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing LLM-based review systems face significant challenges, including limited domain expertise, hallucinated reasoning, and a lack of structured evaluation. To address these limitations, we introduce DeepReview, a multi-stage framework designed to emulate expert reviewers by incorporating structured analysis, literature retrieval, and evidence-based argumentation. |
Minjun Zhu; Yixuan Weng; Linyi Yang; Yue Zhang; |
| 204 | Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Agentic Reasoning, a framework that enhances large language model (LLM) reasoning by integrating external tool-using agents. |
Junde Wu; Jiayuan Zhu; Yuyuan Liu; Min Xu; Yueming Jin; |
| 205 | KRISTEVA: Close Reading As A Novel Task for Benchmarking Interpretive Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With KRISTEVA, we propose three progressively more difficult sets of tasks to approximate different elements of the close reading process, which we use to test how well LLMs understand and reason about literary works: 1) extracting stylistic features, 2) retrieving relevant contextual information from parametric knowledge, and 3) multi-hop reasoning between style and external contexts. |
Peiqi Sui; Juan Diego Rodriguez; Philippe Laban; J. Dean Murphy; Joseph P. Dexter; Richard Jean So; Samuel Baker; Pramit Chaudhuri; |
| 206 | RefreshKV: Updating Small KV Cache During Long-form Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new inference-time method, RefreshKV, that flexibly alternates between full context attention and attention over a subset of input tokens during generation. |
Fangyuan Xu; Tanya Goyal; Eunsol Choi; |
| 207 | Can Indirect Prompt Injection Attacks Be Detected and Removed? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the feasibility of detecting and removing indirect prompt injection attacks, and we construct a benchmark dataset for evaluation. |
Yulin Chen; Haoran Li; Yuan Sui; Yufei He; Yue Liu; Yangqiu Song; Bryan Hooi; |
| 208 | Defense Against Prompt Injection Attack By Leveraging Attack Techniques Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we invert the intention of prompt injection methods to develop novel defense methods based on previous training-free attack methods, by repeating the attack process but with the original input instruction rather than the injected instruction. |
Yulin Chen; Haoran Li; Zihao Zheng; Dekai Wu; Yangqiu Song; Bryan Hooi; |
| 209 | Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore augmenting standard AI annotator systems with additional tools to improve performance on three challenging response domains: long-form factual, math and code tasks. |
Arduin Findeis; Floris Weers; Guoli Yin; Ke Ye; Ruoming Pang; Tom Gunter; |
| 210 | JuStRank: Benchmarking LLM Judges for System Ranking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous work has focused on instance-based assessment of LLM judges, where a judge is evaluated over a set of responses, or response pairs, while being agnostic to their source systems. We argue that this setting overlooks critical factors affecting system-level ranking, such as a judge’s positive or negative bias towards certain systems. |
Ariel Gera; Odellia Boni; Yotam Perlitz; Roy Bar-Haim; Lilach Eden; Asaf Yehudai; |
| 211 | SDPO: Segment-Level Direct Preference Optimization for Social Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While these methods consider multiple turns across entire sessions, they are often overly coarse-grained, introducing training noise, and lack robust theoretical support. To resolve these limitations, we propose Segment-Level Direct Preference Optimization (SDPO), which dynamically selects key segments within interactions to optimize multi-turn agent behavior. |
Aobo Kong; Wentao Ma; Shiwan Zhao; Yongbin Li; Yuchuan Wu; Ke Wang; Xiaoqian Liu; Qicheng Li; Yong Qin; Fei Huang; |
| 212 | Have We Designed Generalizable Structural Knowledge Promptings? Systematic Evaluation and Rethinking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To provide a thorough evaluation, we introduce a novel multi-granular, multi-level benchmark called SUBARU, consisting of 9 different tasks with varying levels of granularity and difficulty. |
Yichi Zhang; Zhuo Chen; Lingbing Guo; Yajing Xu; Shaokai Chen; Mengshu Sun; Binbin Hu; Zhiqiang Zhang; Lei Liang; Wen Zhang; Huajun Chen; |
| 213 | Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recently, o1-like models have drawn significant attention, where these models produce long Chain-of-Thought (CoT) reasoning steps to improve the reasoning abilities of existing Large Language Models (LLMs). In this paper, to understand the qualities of these long CoTs and measure the critique abilities of existing LLMs on them, we introduce DeltaBench, which includes generated long CoTs from different o1-like models (e.g., QwQ, DeepSeek-R1) on different reasoning tasks (e.g., Math, Code, General Reasoning), to measure the ability to detect errors in long CoT reasoning. |
Yancheng He; Shilong Li; Jiaheng Liu; Weixun Wang; Xingyuan Bu; Ge Zhang; Z.y. Peng; Zhaoxiang Zhang; Zhicheng Zheng; Wenbo Su; Bo Zheng; |
| 214 | Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present Chinese SimpleQA, the first comprehensive Chinese benchmark to evaluate the factuality ability of LLMs to answer short questions; Chinese SimpleQA has five main properties (i.e., Chinese, Diverse, High-quality, Static, Easy-to-evaluate). |
Yancheng He; Shilong Li; Jiaheng Liu; Yingshui Tan; Weixun Wang; Hui Huang; Xingyuan Bu; Hangyu Guo; Chengwei Hu; Boren Zheng; Zhuoran Lin; Dekai Sun; Zhicheng Zheng; Wenbo Su; Bo Zheng; |
| 215 | Enhancing Human Evaluation in Machine Translation with Comparative Judgement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Human evaluation is crucial for assessing rapidly evolving language models but is influenced by annotator proficiency and task design. This study explores the integration of comparative judgment into human annotation for machine translation (MT) and evaluates three annotation setups—point-wise Multidimensional Quality Metrics (MQM), side-by-side (S×S) MQM, and its simplified version S×S relative ranking (RR). |
Yixiao Song; Parker Riley; Daniel Deutsch; Markus Freitag; |
| 216 | Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study Over Open-ended Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to (1) explore whether KGs can make LLMs more trustworthy in an open-ended setting, and (2) conduct a comparative analysis to shed light on method design. |
Yuan Sui; Yufei He; Zifeng Ding; Bryan Hooi; |
| 217 | SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces Self-aware Knowledge Retrieval (SeaKR), a novel adaptive RAG model that extracts self-aware uncertainty of LLMs from their internal states. |
Zijun Yao; Weijian Qi; Liangming Pan; Shulin Cao; Linmei Hu; Liu Weichuan; Lei Hou; Juanzi Li; |
| 218 | BOOKWORLD: From Novels to Interactive Agent Societies for Story Creation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce BookWorld, a comprehensive system for constructing and simulating book-based multi-agent societies. |
Yiting Ran; Xintao Wang; Tian Qiu; Jiaqing Liang; Yanghua Xiao; Deqing Yang; |
| 219 | HALoGEN: Fantastic LLM Hallucinations and Where to Find Them Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we release HALoGEN, a comprehensive hallucination benchmark consisting of: (1) 10,923 prompts for generative models spanning nine domains including programming, scientific attribution, and summarization, and (2) automatic high-precision verifiers for each use case that decompose LLM generations into atomic units, and verify each unit against a high-quality knowledge source. |
Abhilasha Ravichander; Shrusti Ghela; David Wadden; Yejin Choi; |
| 220 | The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel and orthogonal perspective that reframes agent security from preventing harmful actions to ensuring task alignment, requiring every agent action to serve user objectives. |
Feiran Jia; Tong Wu; Xin Qin; Anna Squicciarini; |
| 221 | HumT DumT: Measuring and Controlling Human-like Language in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce DumT, a method using HumT to systematically control and reduce the degree of human-like tone while preserving model performance. |
Myra Cheng; Sunny Yu; Dan Jurafsky; |
| 222 | Can Community Notes Replace Professional Fact-Checkers? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the extent and nature of dependencies between fact-checking and *helpful* community notes remain unclear. To address these questions, we use language models to annotate a large corpus of Twitter/X community notes with attributes such as topic, cited sources, and whether they refute claims tied to broader misinformation narratives. |
Nadav Borenstein; Greta Warren; Desmond Elliott; Isabelle Augenstein; |
| 223 | Why Safeguarded Ships Run Aground? Aligned Large Language Models’ Safety Mechanisms Tend to Be Anchored in The Template Region Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conduct extensive experiments and verify that template-anchored safety alignment is widespread across various aligned LLMs. |
Chak Tou Leong; Qingyu Yin; Jian Wang; Wenjie Li; |
| 224 | SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To empower agents to autonomously explore environments, optimize workflows, and enhance their understanding of actions, we propose SynWorld, a framework that allows agents to synthesize possible scenarios with multi-step action invocation within the action space and perform Monte Carlo Tree Search (MCTS) exploration to effectively refine their action knowledge in the current environment. |
Runnan Fang; Xiaobin Wang; Yuan Liang; Shuofei Qiao; Jialong Wu; Zekun Xi; Ningyu Zhang; Yong Jiang; Pengjun Xie; Fei Huang; Huajun Chen; |
| 225 | Can’t See The Forest for The Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce MMSafeAware, the first comprehensive multimodal safety awareness benchmark designed to evaluate MLLMs across 29 safety scenarios with 1,500 carefully curated image-prompt pairs. |
Wenxuan Wang; Xiaoyuan Liu; Kuiyi Gao; Jen-tse Huang; Youliang Yuan; Pinjia He; Shuai Wang; Zhaopeng Tu; |
| 226 | Modeling Uncertainty in Composed Image Retrieval Via Probabilistic Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While metric learning methods have shown promise, they rely on deterministic point embeddings that fail to capture the inherent uncertainty in the input data, in which user intentions may be imprecisely specified or open to multiple interpretations. We address this challenge by reformulating CIR through our proposed Composed Probabilistic Embedding (CoPE) framework, which represents both queries and targets as Gaussian distributions in latent space rather than fixed points. |
Haomiao Tang; Jinpeng Wang; Yuang Peng; GuangHao Meng; Ruisheng Luo; Bin Chen; Long Chen; Yaowei Wang; Shu-Tao Xia; |
| 227 | Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A central question in multilingual language modeling is whether large language models (LLMs) develop a universal concept representation, disentangled from specific languages. In this paper, we address this question by analyzing latent representations (latents) during a word-translation task in transformer-based LLMs. |
Clément Dumas; Chris Wendler; Veniamin Veselovsky; Giovanni Monea; Robert West; |
| 228 | When People Are Floods: Analyzing Dehumanizing Metaphors in Immigration Discourse with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Grounded in qualitative social science research, we identify seven concepts evoked in immigration discourse (e.g., water or vermin). We propose and evaluate a novel technique that leverages both word-level and document-level signals to measure metaphor with respect to these concepts. |
Julia Mendelsohn; Ceren Budak; |
| 229 | PIC: Unlocking Long-Form Text Generation Capabilities of Large Language Models Via Position ID Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, beyond the focus on “input-long”, the ability to “output-long” is equally significant, yet it remains underexplored. To address this limitation, we propose a simple, efficient, and plug-in approach, Position ID Compression (PIC), to unlock the long-form text generation potential of LLMs. |
Haoran Que; Wenge Rong; |
| 230 | Attention Entropy Is A Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the underlying reasons and potential mitigations are unclear. In this work, we provide a detailed analysis of this issue and identify that unusually high attention entropy can be a key factor. |
Zhisong Zhang; Yan Wang; Xinting Huang; Tianqing Fang; Hongming Zhang; Chenlong Deng; Shuaiyi Li; Dong Yu; |
| 231 | Faster Speculative Decoding Via Effective Draft Decoder with Pruned Candidate Tree Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we found that the confidence scores predicted by the draft model are well-calibrated with the acceptance probability of draft tokens. |
Huanran Zheng; Xiaoling Wang; |
| 232 | Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces Light-R1, an open-source suite for training long reasoning models using a reproducible and cost-effective methodology. |
Liang Wen; Yunke Cai; Fenrui Xiao; Xin He; Qi An; Zhenyu Duan; Yimin Du; Junchen Liu; Tanglifu Tanglifu; Xiaowei Lv; Haosheng Zou; Yongchao Deng; Shousheng Jia; Xiangzheng Zhang; |
| 233 | Enhancing Safe and Controllable Protein Generation Via Knowledge Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These concerns underscore critical biosafety and ethical challenges. To address these issues, we propose a Knowledge-guided Preference Optimization (KPO) framework that integrates prior knowledge via a Protein Safety Knowledge Graph. |
Yuhao Wang; Keyan Ding; Kehua Feng; Zeyuan Wang; Ming Qin; Xiaotong Li; Qiang Zhang; Huajun Chen; |
| 234 | MathFusion: Enhancing Mathematical Problem-solving of LLM Through Instruction Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by human learning processes, where mathematical proficiency develops through systematic exposure to interconnected concepts, we introduce MathFusion, a novel framework that enhances mathematical reasoning through cross-problem instruction synthesis. |
Qizhi Pei; Lijun Wu; Zhuoshi Pan; Yu Li; Honglin Lin; Chenlin Ming; Xin Gao; Conghui He; Rui Yan; |
| 235 | Insight Over Sight: Exploring The Vision-Knowledge Conflicts in Multimodal LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores the problem of commonsense-level vision-knowledge conflict in Multimodal Large Language Models (MLLMs), where visual information contradicts the model’s internal commonsense knowledge. To study this issue, we introduce an automated framework, augmented with human-in-the-loop quality control, to generate inputs designed to simulate and evaluate these conflicts in MLLMs. |
Xiaoyuan Liu; Wenxuan Wang; Youliang Yuan; Jen-tse Huang; Qiuzhi Liu; Pinjia He; Zhaopeng Tu; |
| 236 | When to Speak, When to Abstain: Contrastive Decoding with Abstention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To investigate this challenge, we first present a controlled testbed featuring four distinct knowledge access scenarios, including the aforementioned edge case, revealing that conventional LLM usage exhibits insufficient robustness in handling all instances. Addressing this limitation, we propose Contrastive Decoding with Abstention (CDA), a novel training-free decoding method that allows LLMs to generate responses when relevant knowledge is available and to abstain otherwise. |
Hyuhng Joon Kim; Youna Kim; Sang-goo Lee; Taeuk Kim; |
| 237 | Efficient Pretraining Data Selection for Language Models Via Multi-Actor Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While various methods have been proposed to enhance data efficiency, limited research has addressed the inherent conflicts between these approaches to achieve optimal data selection for LM pretraining. To tackle this problem, we propose a multi-actor collaborative data selection mechanism. |
Tianyi Bai; Ling Yang; Zhen Hao Wong; Fupeng Sun; Xinlin Zhuang; Jiahui Peng; Chi Zhang; Lijun Wu; Qiu Jiantao; Wentao Zhang; Binhang Yuan; Conghui He; |
| 238 | White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on our observations, we propose **Mitigation via Selective Rewrite (MSR)**, a novel bias mitigation strategy that leverages an agency classifier to identify and selectively revise parts of generated texts that demonstrate communal traits. |
Yixin Wan; Kai-Wei Chang; |
| 239 | The Male CEO and The Female Assistant: Evaluation and Mitigation of Gender Biases in Text-To-Image Generation of Dual Subjects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, significant biases remain when generating images with more than one person. To systematically evaluate this, we propose the **Paired Stereotype Test (PST)** framework, which queries T2I models to depict two individuals assigned male-stereotyped and female-stereotyped social identities, respectively (e.g., “a CEO” and “an Assistant”). |
Yixin Wan; Kai-Wei Chang; |
| 240 | Decoding By Contrasting Knowledge: Enhancing Large Language Model Confidence on Edited Facts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel approach termed Decoding by Contrasting Knowledge (DeCK). |
Baolong Bi; Shenghua Liu; Lingrui Mei; Yiwei Wang; Junfeng Fang; Pengliang Ji; Xueqi Cheng; |
| 241 | Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While some prior works have explored this issue in the context of LLMs, it presents a unique challenge for MLLMs due to the entangled nature of knowledge across modalities, making comprehensive unlearning more difficult. To address this challenge, we propose Modality Aware Neuron Unlearning (MANU), a novel unlearning framework for MLLMs designed to selectively clip neurons based on their relative importance to the targeted forget data, curated for different modalities. |
Zheyuan Liu; Guangyao Dou; Xiangchi Yuan; Chunhui Zhang; Zhaoxuan Tan; Meng Jiang; |
| 242 | Disentangling Biased Knowledge from Reasoning in Large Language Models Via Machine Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While these methods have experimentally proven effective, they can still be sub-optimum in fully disentangling biases from reasoning. To address this gap, we propose Selective Disentanglement Unlearning (SDU), a novel unlearning framework that selectively removes biased knowledge while preserving reasoning capabilities. |
Zheyuan Liu; Suraj Maharjan; Fanyou Wu; Rahil Parikh; Belhassen Bayar; Srinivasan H. Sengamedu; Meng Jiang; |
| 243 | M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Multidimensional Multi-Agent Debate (M-MAD), a systematic LLM-based multi-agent framework for advanced LLM-as-a-judge MT evaluation. |
Zhaopeng Feng; Jiayuan Su; Jiamei Zheng; Jiahan Ren; Yan Zhang; Jian Wu; Hongwei Wang; Zuozhu Liu; |
| 244 | From Selection to Generation: A Survey of LLM-based Active Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the increasing importance of high-quality data and efficient model training in the era of LLMs, we present a comprehensive survey on LLM-based Active Learning. |
Yu Xia; Subhojyoti Mukherjee; Zhouhang Xie; Junda Wu; Xintong Li; Ryan Aponte; Hanjia Lyu; Joe Barrow; Hongjie Chen; Franck Dernoncourt; Branislav Kveton; Tong Yu; Ruiyi Zhang; Jiuxiang Gu; Nesreen K. Ahmed; Yu Wang; Xiang Chen; Hanieh Deilamsalehy; Sungchul Kim; Zhengmian Hu; Yue Zhao; Nedim Lipka; Seunghyun Yoon; Ting-Hao Kenneth Huang; Zichao Wang; Puneet Mathur; Soumyabrata Pal; Koyel Mukherjee; Zhehao Zhang; Namyong Park; Thien Huu Nguyen; Jiebo Luo; Ryan A. Rossi; Julian McAuley; |
| 245 | Improving Medical Large Vision-Language Models with Abnormal-Aware Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we collect a Medical Abnormalities Unveiling (MAU) dataset and propose a two-stage training method for UMed-LVLM training. |
Yucheng Zhou; Lingran Song; Jianbing Shen; |
| 246 | Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the gap, we propose ContextualJudgeBench, a judge benchmark with 2,000 challenging response pairs across eight splits inspired by real-world contextual evaluation scenarios. |
Austin Xu; Srijan Bansal; Yifei Ming; Semih Yavuz; Shafiq Joty; |
| 247 | CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the unique challenges of unlearning in CLIP, a prominent multimodal model that aligns visual and textual representations. |
Tianyu Yang; Lisen Dai; Xiangqi Wang; Minhao Cheng; Yapeng Tian; Xiangliang Zhang; |
| 248 | DRPruning: Efficient Large Language Model Pruning Through Distributionally Robust Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Structured pruning reduces model size and speeds up inference but often causes uneven degradation across domains, leading to biased performance. To address this, we propose *DRPruning*, a method that dynamically adjusts the data distribution during training to restore balanced performance across heterogeneous and multi-tasking data. |
Hexuan Deng; Wenxiang Jiao; Xuebo Liu; Jing Li; Min Zhang; Zhaopeng Tu; |
| 249 | On The Mutual Influence of Gender and Occupation in LLM Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We examine LLM representations of gender for first names in various occupational contexts to study how occupations and the gender perception of first names in LLMs influence each other mutually. |
Haozhe An; Connor Baumler; Abhilasha Sancheti; Rachel Rudinger; |
| 250 | Language Model Probabilities Are Not Calibrated in Numeric Contexts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Some statements have one well-defined continuation (e.g., “the Eiffel Tower is in [Paris]”), whereas others have a natural distribution over multiple options (e.g., “the weighted coin flip was [Heads/Tails]”). We argue that language model (LM) outputs should capture these natural distributions. |
Charles Lovering; Michael Krumdick; Viet Dac Lai; Varshini Reddy; Seth Ebner; Nilesh Kumar; Rik Koncel-Kedziorski; Chris Tanner; |
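The calibration claim above can be made concrete with a small check: renormalize an LM's log-probabilities over a fixed answer set and measure how far the result sits from the natural distribution. This is an illustrative sketch, not the paper's evaluation; the function name, the example log-probs, and the choice of total-variation distance are assumptions.

```python
import math

def tv_miscalibration(token_logprobs, target_dist):
    """Renormalize LM log-probs over a fixed answer set, then return the
    total-variation distance to the natural target distribution.
    (Illustrative metric; not necessarily the paper's.)"""
    probs = {t: math.exp(lp) for t, lp in token_logprobs.items()}
    z = sum(probs.values())
    probs = {t: p / z for t, p in probs.items()}
    return 0.5 * sum(abs(probs[t] - target_dist[t]) for t in target_dist)
```

A calibrated model would put about 0.5 on each side of a fair coin flip; a model that strongly prefers "Heads" scores a large distance under this check.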
| 251 | Rethinking Repetition Problems of LLMs in Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we formally define structural repetition and propose an efficient decoding approach called RPG, which stands for Repetition Penalization based on Grammar, to alleviate the repetition problems in code generation for LLMs. |
Yihong Dong; Yuchen Liu; Xue Jiang; Bin Gu; Zhi Jin; Ge Li; |
| 252 | LoGU: Long-form Generation with Uncertainty Expressions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce the task of Long-form Generation with Uncertainty (LoGU). |
Ruihan Yang; Caiqi Zhang; Zhisong Zhang; Xinting Huang; Sen Yang; Nigel Collier; Dong Yu; Deqing Yang; |
| 253 | Computation Mechanism Behind LLM Position Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show how LLMs enforce certain computational mechanisms to allow for the aforementioned tolerance in position perturbations. |
Chi Han; Heng Ji; |
| 254 | BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we tackle the challenge of embedding realistic human personality traits into LLMs. |
Wenkai Li; Jiarui Liu; Andy Liu; Xuhui Zhou; Mona T. Diab; Maarten Sap; |
| 255 | LLMs Can Simulate Standardized Patients Via Agent Coevolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this focus has overlooked the critical need for patient agents to learn a standardized presentation pattern that transforms data into human-like patient responses through unsupervised simulations. To address this gap, we propose EvoPatient, a novel simulated patient framework in which a patient agent and doctor agents simulate the diagnostic process through multi-turn dialogues, simultaneously gathering experience to improve the quality of both questions and answers, ultimately enabling human doctor training. |
Zhuoyun Du; Lujie Zheng; Renjun Hu; Yuyang Xu; Xiawei Li; Ying Sun; Wei Chen; Jian Wu; Haolei Cai; Haochao Ying; |
| 256 | Cuckoo: An IE Free Rider Hatched By Massive Nutrition in LLM’s Nest Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that IE models can act as free riders on LLM resources by reframing next-token prediction into extraction for tokens already present in the context. |
Letian Peng; Zilong Wang; Feng Yao; Jingbo Shang; |
| 257 | SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While recent efforts explore continuous-space reasoning, they often require full-model fine-tuning and suffer from catastrophic forgetting, limiting their applicability to state-of-the-art LLMs that already perform well in zero-shot settings with a proper instruction. To address this challenge, we propose a novel approach for continuous-space reasoning that does not require modifying the LLM. |
Yige Xu; Xu Guo; Zhiwei Zeng; Chunyan Miao; |
| 258 | SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our pilot study shows that the dynamic routing nature of MoE LLMs introduces unique challenges, leading to excessive forgetting, uncontrolled knowledge erasure and substantial utility drops when existing unlearning methods are applied. To address this, we propose a novel Selected-Expert Unlearning Framework (SEUF). |
Haomin Zhuang; Yihua Zhang; Kehan Guo; Jinghan Jia; Gaowen Liu; Sijia Liu; Xiangliang Zhang; |
| 259 | LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce LLaSE-G1, a LLaMA-based language model that incentivizes generalization capabilities for speech enhancement. |
Boyi Kang; Xinfa Zhu; Zihan Zhang; Zhen Ye; Mingshuai Liu; Ziqian Wang; Yike Zhu; Guobin Ma; Jun Chen; Longshuai Xiao; Chao Weng; Wei Xue; Lei Xie; |
| 260 | Personalized Generation In Large Model Era: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conceptualize PGen from a unified perspective, systematically formalizing its key components, core objectives, and abstract workflows. Based on this unified perspective, we propose a multi-level taxonomy, offering an in-depth review of technical advancements, commonly used datasets, and evaluation metrics across multiple modalities, personalized contexts, and tasks. |
Yiyan Xu; Jinghao Zhang; Alireza Salemi; Xinting Hu; Wenjie Wang; Fuli Feng; Hamed Zamani; Xiangnan He; Tat-Seng Chua; |
| 261 | REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce REAL-MM-RAG, an automatically generated benchmark designed to address four key properties essential for real-world retrieval: (i) multi-modal documents, (ii) enhanced difficulty, (iii) Realistic-RAG queries and (iv) accurate labeling. |
Navve Wasserman; Roi Pony; Oshri Naparstek; Adi Raz Goldfarb; Eli Schwartz; Udi Barzelay; Leonid Karlinsky; |
| 262 | SEA: Low-Resource Safety Alignment for Multimodal Large Language Models Via Synthetic Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing low-resource safety alignment methods, including textual alignment, have been found to struggle with the safety risks posed by additional modalities. To address this, we propose Synthetic Embedding augmented safety Alignment (SEA), which optimizes embeddings of the additional modality through gradient updates to expand textual datasets. |
Weikai Lu; Hao Peng; Huiping Zhuang; Cen Chen; Ziqian Zeng; |
| 263 | The Impossibility of Fair LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We analyze a variety of technical fairness frameworks and find inherent challenges in each that make the development of a fair LLM intractable. |
Jacy Reese Anthis; Kristian Lum; Michael Ekstrand; Avi Feller; Chenhao Tan; |
| 264 | Distilling An End-to-End Voice Assistant Without Instruction Training Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work proposes an alternative paradigm for training Speech LLMs without instruction data, using the response of a text-only LLM to transcripts as self-supervision. |
William Barr Held; Yanzhe Zhang; Weiyan Shi; Minzhi Li; Michael J Ryan; Diyi Yang; |
| 265 | SynthesizeMe! Inducing Persona-Guided Prompts for Personalized Reward Models in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce SynthesizeMe, an approach to inducing synthetic user personas from user interactions for personalized reward modeling. |
Michael J Ryan; Omar Shaikh; Aditri Bhagirath; Daniel Frees; William Barr Held; Diyi Yang; |
| 266 | Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate whether student models can acquire the capabilities of teacher models through knowledge distillation while avoiding watermark inheritance. |
Leyi Pan; Aiwei Liu; Shiyu Huang; Yijian Lu; Xuming Hu; Lijie Wen; Irwin King; Philip S. Yu; |
| 267 | Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose directly fine-tuning LLMs to predict response distributions by leveraging unique structural characteristics of survey data. |
Joseph Suh; Erfan Jahanparast; Suhong Moon; Minwoo Kang; Serina Chang; |
| 268 | The AI Gap: How Socioeconomic Status Affects Language Technology Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find systematic differences across SES groups in language technology usage (i.e., frequency, performed tasks), interaction styles, and topics. |
Elisa Bassignana; Amanda Cercas Curry; Dirk Hovy; |
| 269 | Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces a novel methodology for efficiently identifying inherent cross-lingual weaknesses in LLMs. |
Zixiang Xu; Yanbo Wang; Yue Huang; Xiuying Chen; Jieyu Zhao; Meng Jiang; Xiangliang Zhang; |
| 270 | VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing methods have explored text-based slow thinking or rudimentary visual assistance, they fall short of capturing the intricate, interleaved nature of human visual-verbal reasoning processes. To overcome these limitations and inspired by the mechanisms of slow thinking in human cognition, we introduce VisuoThink, a novel framework that seamlessly integrates visuospatial and linguistic domains. |
Yikun Wang; Siyin Wang; Qinyuan Cheng; Zhaoye Fei; Liang Ding; Qipeng Guo; Dacheng Tao; Xipeng Qiu; |
| 271 | Memorizing Is Not Enough: Deep Knowledge Injection Through Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a four-tier knowledge injection framework that systematically defines the levels of knowledge injection: memorization, retrieval, reasoning, and association. |
Ruoxi Xu; Yunjie Ji; Boxi Cao; Yaojie Lu; Hongyu Lin; Xianpei Han; Ben He; Yingfei Sun; Xiangang Li; Le Sun; |
| 272 | Masks Can Be Learned As An Alternative to Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate how to sparsify a pre-trained dense large language model into a mixture-of-experts (MoE) architecture for faster inference. |
Peiyu Liu; Tianwen Wei; Bo Zhu; Xin Zhao; Shuicheng Yan; |
| 273 | Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous approaches to learning from errors synthesize training data by solely extrapolating from isolated bad cases, thereby failing to generalize the extensive patterns inherent within these cases. This paper presents Self-Error-Instruct (SEI), a framework that addresses these model weaknesses and synthesizes more generalized targeted training data. |
Erxin Yu; Jing Li; Ming Liao; Qi Zhu; Boyang Xue; Minghui Xu; Baojun Wang; Lanqing Hong; Fei Mi; Lifeng Shang; |
| 274 | FedEx-LoRA: Exact Aggregation for Federated and Efficient Fine-Tuning of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods rely on traditional federated averaging of LoRA adapters, resulting in inexact updates. To address this, we propose Federated Exact LoRA, or FedEx-LoRA, which adds a residual error term to the pre-trained frozen weight matrix. |
Raghav Singhal; Kaustubh Ponkshe; Praneeth Vepakomma; |
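The inexactness this highlight refers to comes from averaging LoRA factors separately: the mean of the products B_i A_i is not, in general, the product of the mean factors. A toy numerical check of the residual-correction idea, with illustrative shapes and a folding step that is a sketch rather than the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_clients = 8, 2, 3
W = rng.normal(size=(d, d))                       # frozen pre-trained weight (toy)
Bs = [rng.normal(size=(d, r)) for _ in range(n_clients)]
As = [rng.normal(size=(r, d)) for _ in range(n_clients)]

# Naive federated averaging aggregates A and B separately; the implied
# low-rank update B_avg @ A_avg differs from the true average update.
B_avg = sum(Bs) / n_clients
A_avg = sum(As) / n_clients
exact_update = sum(B @ A for B, A in zip(Bs, As)) / n_clients
residual = exact_update - B_avg @ A_avg           # error of the naive aggregate

# Folding the residual into the frozen matrix recovers the exact average:
W_exact = W + exact_update
W_fedex = (W + residual) + B_avg @ A_avg
assert np.allclose(W_exact, W_fedex)
```

The identity holds by construction; the point of the check is that `residual` is nonzero for independent client factors, which is exactly the gap a residual term can close.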
| 275 | CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, the current studies are limited to a single scenario, either cross-lingual or cross-modal, leaving a gap in the exploration of hallucinations in the joint cross-lingual and cross-modal scenarios. Motivated by this, we introduce a novel joint Cross-lingual and Cross-modal Hallucinations benchmark (CCHall) to fill this gap. |
Yongheng Zhang; Xu Liu; Ruoxi Zhou; Qiguang Chen; Hao Fei; Wenpeng Lu; Libo Qin; |
| 276 | VLSBench: Unveiling Visual Leakage in Multimodal Safety Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Besides, we empirically compare textual and multimodal alignment methods on VLSBench and find that textual alignment is effective enough for multimodal safety scenarios with VSIL, while multimodal alignment is preferable for safety scenarios without VSIL. |
Xuhao Hu; Dongrui Liu; Hao Li; Xuanjing Huang; Jing Shao; |
| 277 | HSCR: Hierarchical Self-Contrastive Rewarding for Aligning Medical Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Hierarchical Self-Contrastive Rewarding (HSCR), a novel approach that addresses two critical challenges in Med-VLM alignment: 1) Cost-effective generation of high-quality preference data; 2) Capturing nuanced and context-aware preferences for improved alignment. |
Songtao Jiang; Yan Zhang; Yeying Jin; Zhihang Tang; Yangyang Wu; Yang Feng; Jian Wu; Zuozhu Liu; |
| 278 | Fast or Slow? Integrating Fast Intuition and Deliberate Thinking for Enhancing Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by Dual Process Theory, which distinguishes between instinctive and deliberate cognitive modes in human reasoning, we propose FOCUS, a plug-and-play approach that dynamically adapts to the complexity of questions, combining fast intuitive judgments with deliberate analytical reasoning to enhance the vision-language reasoning capability of the MLLM. |
Songtao Jiang; Chenyi Zhou; Yan Zhang; Yeying Jin; Zuozhu Liu; |
| 279 | Hybrid Preferences: Learning to Route Instances for Human Vs. AI Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce HyPER, a Hybrid Preference routER that defers an annotation to either humans or LMs, achieving better annotation quality while reducing the cost of human-only annotation. |
Lester James Validad Miranda; Yizhong Wang; Yanai Elazar; Sachin Kumar; Valentina Pyatkin; Faeze Brahman; Noah A. Smith; Hannaneh Hajishirzi; Pradeep Dasigi; |
| 280 | ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose ToolCoder, a novel framework that reformulates tool learning as a code generation task. |
Hanxing Ding; Shuchang Tao; Liang Pang; Zihao Wei; Jinyang Gao; Bolin Ding; Huawei Shen; Xueqi Cheng; |
| 281 | Intuitive Fine-Tuning: Towards Simplifying Alignment Into A Single Process Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we interpret SFT and PO with two sub-processes — *Preference Estimation* and *Transition Optimization* — defined at token level within the Markov Decision Process (MDP). |
Ermo Hua; Biqing Qi; Kaiyan Zhang; Kai Tian; Xingtai Lv; Ning Ding; Bowen Zhou; |
| 282 | Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Drawing on insights from linguistics and complexity theory, we hypothesize that effective transfer occurs when two conditions are met: the formal language should capture the dependency structures present in natural language, and it should remain within the computational limitations of the model architecture. |
Michael Y. Hu; Jackson Petty; Chuan Shi; William Merrill; Tal Linzen; |
| 283 | HiddenDetect: Detecting Jailbreak Attacks Against Multimodal Large Language Models Via Monitoring Hidden States Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate whether LVLMs inherently encode safety-relevant signals within their internal activations during inference. |
Yilei Jiang; Xinyan Gao; Tianshuo Peng; Yingshui Tan; Xiaoyong Zhu; Bo Zheng; Xiangyu Yue; |
| 284 | “Give Me BF16 or Give Me Death”? Accuracy-Performance Trade-Offs in LLM Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a comprehensive empirical study of quantized accuracy, evaluating popular quantization formats (FP8, INT8, INT4) across academic benchmarks and real-world tasks, on the entire Llama-3. |
Eldar Kurtic; Alexandre Noll Marques; Shubhra Pandit; Mark Kurtz; Dan Alistarh; |
| 285 | Crowdsource, Crawl, or Generate? Creating SEA-VL, A Multicultural Vision-Language Dataset for Southeast Asia Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite Southeast Asia’s (SEA) extraordinary linguistic and cultural diversity, the region remains significantly underrepresented in vision-language (VL) research, resulting in AI models that inadequately capture SEA cultural nuances. To fill this gap, we present SEA-VL, an open-source initiative dedicated to developing culturally relevant high-quality datasets for SEA languages. |
Samuel Cahyawijaya; Holy Lovenia; Joel Ruben Antony Moniz; Tack Hwa Wong; Mohammad Rifqi Farhansyah; Thant Thiri Maung; Frederikus Hudi; David Anugraha; Muhammad Ravi Shulthan Habibi; Muhammad Reza Qorib; Amit Agarwal; Joseph Marvin Imperial; Hitesh Laxmichand Patel; Vicky Feliren; Bahrul Ilmi Nasution; Manuel Antonio Rufino; Genta Indra Winata; Rian Adam Rajagede; Carlos Rafael Catalan; Mohamed Fazli Mohamed Imam; Priyaranjan Pattnayak; Salsabila Zahirah Pranida; Kevin Pratama; Yeshil Bangera; Adisai Na-Thalang; Patricia Nicole Monderin; Yueqi Song; Christian Simon; Lynnette Hui Xian Ng; Richardy Lobo Sapan; Taki Hasan Rafi; Bin Wang; Supryadi; Kanyakorn Veerakanjana; Piyalitt Ittichaiwong; Matthew Theodore Roque; Karissa Vincentio; Takdanai Kreangphet; Phakphum Artkaew; Kadek Hendrawan Palgunadi; Yanzhi Yu; Rochana Prih Hastuti; William Nixon; Mithil Bangera; Adrian Xuan Wei Lim; Aye Hninn Khine; Hanif Muhammad Zhafran; Teddy Ferdinan; Audra Aurora Izzani; Ayushman Singh; Evan Evan; Jauza Akbar Krito; Michael Anugraha; Fenal Ashokbhai Ilasariya; Haochen Li; John Amadeo Daniswara; Filbert Aurelian Tjiaranata; Eryawan Presma Yulianrifat; Can Udomcharoenchaikit; Fadil Risdian Ansori; Mahardika Krisna Ihsani; Giang Nguyen; Anab Maulana Barik; Dan John Velasco; Rifo Ahmad Genadi; Saptarshi Saha; Chengwei Wei; Isaiah Edri W. Flores; Kenneth Chen Ko Han; Anjela Gail D. Santos; Wan Shen Lim; Kaung Si Phyo; Tim Santos; Meisyarah Dwiastuti; Jiayun Luo; Jan Christian Blaise Cruz; Ming Shan Hee; Ikhlasul Akmal Hanif; M.Alif Al Hakim; Muhammad Rizky Sya’ban; Kun Kerdthaisong; Lester James Validad Miranda; Fajri Koto; Tirana Noor Fatyanosa; Alham Fikri Aji; Jostin Jerico Rosal; Jun Kevin; Robert Wijaya; Onno P. Kampman; Ruochen Zhang; Börje F. Karlsson; Peerat Limkonchotiwat; |
| 286 | Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Browsing Lost Unformed Recollections, a tip-of-the-tongue known-item search and reasoning benchmark for general AI assistants. |
Sky CH-Wang; Darshan Girish Deshpande; Smaranda Muresan; Anand Kannappan; Rebecca Qian; |
| 287 | Culture Is Not Trivia: Sociocultural Theory for Cultural NLP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this leads to a number of recurring limitations: coarse national boundaries fail to capture nuanced differences that lie within them, limited coverage restricts datasets to only a subset of usually highly-represented cultures, and a lack of dynamicity results in static cultural benchmarks that do not change as culture evolves. In this position paper, we argue that these methodological limitations are symptomatic of a theoretical gap. |
Naitian Zhou; David Bamman; Isaac L. Bleaman; |
| 288 | Efficient Long Context Language Model Retrieval with Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a new compression approach tailored for LCLM retrieval, which is trained to maximize the retrieval performance while minimizing the length of the compressed passages. |
Minju Seo; Jinheon Baek; Seongyun Lee; Sung Ju Hwang; |
| 289 | UnSeenTimeQA: Time-Sensitive Question-Answering Beyond LLMs’ Memorization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces UnSeenTimeQA, a novel data contamination-free time-sensitive question-answering (TSQA) benchmark. |
Md Nayem Uddin; Amir Saeidi; Divij Handa; Agastya Seth; Tran Cao Son; Eduardo Blanco; Steven Corman; Chitta Baral; |
| 290 | Design Choices for Extending The Context Length of Visual Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, existing open-source VLMs lack systematic exploration into extending their context length, and commercial models often provide limited details. To tackle this, we aim to establish an effective solution that enhances long context performance of VLMs while preserving their capacities in short context scenarios. |
Mukai Li; Lei Li; Shansan Gong; Qi Liu; |
| 291 | One Missing Piece for Open-Source Reasoning Models: A Dataset to Mitigate Cold-Starting Short CoT LLMs in RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present the Long CoT Collection, a dataset of 100K CoT rationales annotated using existing short CoT LLMs. |
Hyungjoo Chae; Dongjin Kang; Jihyuk Kim; Beong-woo Kwak; Sunghyun Park; Haeju Park; Jinyoung Yeo; Moontae Lee; Kyungjae Lee; |
| 292 | STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key contribution is exploring the interpolation between structured and unstructured pruning, to propose a novel structured-then-unstructured (STUN) approach that outperforms both structured and unstructured pruning, especially for MoEs. |
Jaeseong Lee; Seung-won Hwang; Aurick Qiao; Daniel F Campos; Zhewei Yao; Yuxiong He; |
| 293 | Improving Automatic Evaluation of Large Language Models (LLMs) in Biomedical Relation Extraction Via LLMs-as-the-Judge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our findings reveal that this happens mainly because relations extracted by LLMs do not adhere to any standard format. To address this, we propose structured output formatting for LLM-generated responses that helps LLM-Judges improve their performance by about 15% (on average). |
Md Tahmid Rahman Laskar; Israt Jahan; Elham Dolatabadi; Chun Peng; Enamul Hoque; Jimmy Huang; |
| 294 | Judging The Judges: Can Large Vision-Language Models Fairly Evaluate Chart Comprehension and Reasoning? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we present a comprehensive evaluation of 13 open-source LVLMs as judges for diverse chart comprehension and reasoning tasks. |
Md Tahmid Rahman Laskar; Mohammed Saidul Islam; Ridwan Mahbub; Ahmed Masry; Mizanur Rahman; Amran Bhuiyan; Mir Tafseer Nayeem; Shafiq Joty; Enamul Hoque; Jimmy Huang; |
| 295 | Low-Bit Quantization Favors Undertrained LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This poses a potential challenge for low-bit quantization in the future and highlights the need for awareness of a model’s training level when evaluating low-bit quantization research. To facilitate future research on this problem, we release all the 1500+ quantized checkpoints used in this work at https://huggingface. |
Xu Ouyang; Tao Ge; Thomas Hartvigsen; Zhisong Zhang; Haitao Mi; Dong Yu; |
| 296 | Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, humans can quickly form impressions of a model’s capabilities by observing only a few samples. To mimic this, we propose the Evaluation Agent framework, which employs human-like strategies for efficient, dynamic, multi-round evaluations using only a few samples per round, while offering detailed, user-tailored analyses. |
Fan Zhang; Shulin Tian; Ziqi Huang; Yu Qiao; Ziwei Liu; |
| 297 | A Strategic Coordination Framework of Small LMs Matches Large LMs in Data Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by collaborative human processes (e.g., peer review), we propose GRA, a framework involving multiple small LMs that aggregates specialized roles across them to achieve the iterative refinement and quality control typically provided by a single large LM. |
Xin Gao; Qizhi Pei; Zinan Tang; Yu Li; Honglin Lin; Jiang Wu; Lijun Wu; Conghui He; |
| 298 | TestNUC: Enhancing Test-Time Computing Approaches and Scaling Through Neighboring Unlabeled Data Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces a novel, linearly scaling approach, TestNUC, that improves test-time predictions by leveraging the local consistency of neighboring unlabeled data: it classifies an input instance by considering not only the model’s prediction on that instance but also its predictions on neighboring unlabeled instances. |
Henry Peng Zou; Zhengyao Gu; Yue Zhou; Yankai Chen; Weizhi Zhang; Liancheng Fang; Yibo Wang; Yangning Li; Kay Liu; Philip S. Yu; |
| 299 | Position-aware Automatic Circuit Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This limits their ability to capture cross-positional interactions or mechanisms that vary across positions. To address this gap, we propose two improvements to incorporate positionality into circuits, even on tasks containing variable-length examples. |
Tal Haklay; Hadas Orgad; David Bau; Aaron Mueller; Yonatan Belinkov; |
| 300 | Medical Graph RAG: Evidence-based Medical Large Language Model Via Graph Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MedGraphRAG, a novel graph-based Retrieval-Augmented Generation (RAG) framework designed to enhance LLMs in generating evidence-based medical responses, improving safety and reliability with private medical data. |
Junde Wu; Jiayuan Zhu; Yunli Qi; Jingkun Chen; Min Xu; Filippo Menolascina; Yueming Jin; Vicente Grau; |
| 301 | How to Mitigate Overfitting in Weak-to-strong Generalization? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To mitigate overfitting in weak-to-strong generalization, we propose a two-stage framework that simultaneously improves the quality of supervision signals and the quality of input questions. |
Junhao Shi; Qinyuan Cheng; Zhaoye Fei; Yining Zheng; Qipeng Guo; Xipeng Qiu; |
| 302 | OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With the evolution of multi-modal large language models ((M)LLMs), this dream is closer to reality, as (M)LLM-based agents that automate tasks by operating computers, mobile phones and web browsers within the environments and interfaces (e.g., Graphical User Interface (GUI) and Command Line Interface (CLI)) provided by operating systems (OS) have advanced significantly. This paper presents a comprehensive survey on these advanced agents, designated as OS Agents. |
Xueyu Hu; Tao Xiong; Biao Yi; Zishu Wei; Ruixuan Xiao; Yurun Chen; Jiasheng Ye; Meiling Tao; Xiangxin Zhou; Ziyu Zhao; Yuhuai Li; Shengze Xu; Shenzhi Wang; Xinchen Xu; Shuofei Qiao; Zhaokai Wang; Kun Kuang; Tieyong Zeng; Liang Wang; Jiwei Li; Yuchen Eleanor Jiang; Wangchunshu Zhou; Guoyin Wang; Keting Yin; Zhou Zhao; Hongxia Yang; Fan Wu; Shengyu Zhang; Fei Wu; |
| 303 | Towards Economical Inference: Enabling DeepSeek’s Multi-Head Latent Attention in Any Transformer-based LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes the first data-efficient fine-tuning method for transitioning from MHA to MLA (**MHA2MLA**), which includes two key components: for *partial-RoPE*, we remove RoPE from dimensions of queries and keys that contribute less to the attention scores; for *low-rank approximation*, we introduce joint SVD approximations based on the pre-trained parameters of keys and values. |
Tao Ji; Bin Guo; Yuanbin Wu; Qipeng Guo; Shenlixing Shenlixing; Chenzhan Chenzhan; Xipeng Qiu; Qi Zhang; Tao Gui; |
| 304 | INVESTORBENCH: A Benchmark for Financial Decision-Making Tasks with LLM-based Agent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite this progress, the field currently encounters two main challenges: (1) the lack of a comprehensive LLM agent framework adaptable to a variety of financial tasks, and (2) the absence of standardized benchmarks and consistent datasets for assessing agent performance. To tackle these issues, we introduce InvestorBench, the first benchmark specifically designed for evaluating LLM-based agents in diverse financial decision-making contexts. |
Haohang Li; Yupeng Cao; Yangyang Yu; Shashidhar Reddy Javaji; Zhiyang Deng; Yueru He; Yuechen Jiang; Zining Zhu; K.P. Subbalakshmi; Jimin Huang; Lingfei Qian; Xueqing Peng; Jordan W. Suchow; Qianqian Xie; |
| 305 | Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We hypothesize that this issue arises because existing VLMs are not explicitly trained to generate texts that are accurately grounded in fine-grained image details. To enhance visual feedback during VLM training, we propose S-VCO (Symmetrical Visual Contrastive Optimization), a novel finetuning objective that steers the model toward capturing important visual details and aligning them with corresponding text tokens. |
Shengguang Wu; Fan-Yun Sun; Kaiyue Wen; Nick Haber; |
| 306 | Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While existing approaches have explored various decomposition strategies, they often lack effective mechanisms to identify and correct errors in intermediate reasoning steps, leading to cascading error propagation. To address these issues, we propose Table-Critic, a novel multi-agent framework that facilitates collaborative criticism and iterative refinement of the reasoning process until convergence to correct solutions. |
Peiying Yu; Guoxin Chen; Jingjing Wang; |
| 307 | Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Cross Lingual Auto Evaluation (CIA) Suite, an extensible framework that includes evaluator LLMs (Hercule) and a novel test set (Recon) specifically designed for multilingual evaluation. |
Sumanth Doddapaneni; Mohammed Safi Ur Rahman Khan; Dilip Venkatesh; Raj Dabre; Anoop Kunchukuttan; Mitesh M Khapra; |
| 308 | DREsS: Dataset for Rubric-based Essay Scoring on EFL Writing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we release DREsS, a large-scale, standard dataset for rubric-based automated essay scoring with 48. |
Haneul Yoo; Jieun Han; So-Yeon Ahn; Alice Oh; |
| 309 | Diffusion Models Through A Global Lens: Are They Culturally Inclusive? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our work, we introduce CULTDIFF benchmark, evaluating whether state-of-the-art diffusion models can generate culturally specific images spanning ten countries. |
Zahra Bayramli; Ayhan Suleymanzade; Na Min An; Huzama Ahmad; Eunsu Kim; Junyeong Park; James Thorne; Alice Oh; |
| 310 | OS-Genesis: Automating GUI Agent Trajectory Construction Via Reverse Task Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Further, these approaches exhibit significant gaps between the generated data and online environments, alongside limited data diversity. To address this issue, we introduce OS-Genesis, a novel GUI data synthesis pipeline that overcomes the challenges above. |
Qiushi Sun; Kanzhi Cheng; Zichen Ding; Chuanyang Jin; Yian Wang; Fangzhi Xu; Zhenyu Wu; Chengyou Jia; Liheng Chen; Zhoumianze Liu; Ben Kao; Guohao Li; Junxian He; Yu Qiao; Zhiyong Wu; |
| 311 | PsyDial: A Large-scale Long-term Conversational Dataset for Mental Health Support Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although removing personally identifiable information is feasible, this process is labor-intensive. To address these challenges, we propose a novel privacy-preserving data reconstruction method that reconstructs real-world client-counselor dialogues while mitigating privacy concerns. |
Huachuan Qiu; Zhenzhong Lan; |
| 312 | Knowledge Boundary of Large Language Models: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this survey, we propose a comprehensive definition of the LLM knowledge boundary and introduce a formalized taxonomy categorizing knowledge into four distinct types. |
Moxin Li; Yong Zhao; Wenxuan Zhang; Shuaiyi Li; Wenya Xie; See-Kiong Ng; Tat-Seng Chua; Yang Deng; |
| 313 | DiffPO: Diffusion-styled Preference Optimization for Inference Time Alignment of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach, Diffusion-styled Preference Optimization (DiffPO), which provides an efficient and policy-agnostic solution for aligning LLMs with humans. |
Ruizhe Chen; Wenhao Chai; Zhifei Yang; Xiaotian Zhang; Ziyang Wang; Tony Quek; Joey Tianyi Zhou; Soujanya Poria; Zuozhu Liu; |
| 314 | Do Large Language Models Have An English Accent? Evaluating and Improving The Naturalness of Multilingual LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the importance of this issue, the naturalness of multilingual LLM outputs has received limited attention. In this paper, we address this gap by introducing novel automatic corpus-level metrics to assess the lexical and syntactic naturalness of LLM outputs in a multilingual context. |
Yanzhu Guo; Simone Conia; Zelin Zhou; Min Li; Saloni Potdar; Henry Xiao; |
| 315 | Information Locality As An Inductive Bias for Neural Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the case of neural language models (LMs), debates persist as to whether these biases align with or diverge from human processing constraints. To address this issue, we propose a quantitative framework that allows for controlled investigations into the nature of these biases. |
Taiga Someya; Anej Svete; Brian DuSell; Timothy J. O’Donnell; Mario Giulianelli; Ryan Cotterell; |
| 316 | KnowShiftQA: How Robust Are RAG Systems When Textbook Knowledge Shifts in K-12 Education? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, discrepancies between these textbooks and the parametric knowledge inherent in Large Language Models (LLMs) can undermine the effectiveness of RAG systems. To systematically investigate RAG system robustness against such knowledge discrepancies, we introduce KnowShiftQA. |
Tianshi Zheng; Weihan Li; Jiaxin Bai; Weiqi Wang; Yangqiu Song; |
| 317 | Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our results indicate that a reliable diversity measure should properly account for both inter-sample differences and the information density in the sample space. Building on this, we propose NovelSum, a new diversity metric based on sample-level “novelty.” |
Yuming Yang; Yang Nan; Junjie Ye; Shihan Dou; Xiao Wang; Shuo Li; Huijie Lv; Tao Gui; Qi Zhang; Xuanjing Huang; |
| 318 | L4Q: Parameter Efficient Quantization-Aware Fine-Tuning on Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose L4Q, a method that integrates Quantization-Aware Training (QAT) with LoRA. |
Hyesung Jeon; Yulhwa Kim; Jae-Joon Kim; |
| 319 | RATIONALYST: Pre-training Process-Supervision for Improving Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The reasoning steps generated by LLMs might be incomplete, as they mimic logical leaps common in everyday communication found in their pre-training data: underlying rationales are frequently left implicit (unstated). To address this challenge, we introduce RATIONALYST, a model for process-supervision of reasoning based on pre-training on a vast collection of rationale annotations extracted from unlabeled data. |
Dongwei Jiang; Guoxuan Wang; Yining Lu; Andrew Wang; Jingyu Zhang; Chuyu Liu; Benjamin Van Durme; Daniel Khashabi; |
| 320 | ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, progress has been hindered by a lack of reliable evaluation datasets. To address this, we present ToolHop, a dataset comprising 995 user queries and 3,912 associated tools, specifically designed for rigorous evaluation of multi-hop tool use. |
Junjie Ye; Zhengyin Du; Xuesong Yao; Weijian Lin; Yufei Xu; Zehui Chen; Zaiyuan Wang; Sining Zhu; Zhiheng Xi; Siyu Yuan; Tao Gui; Qi Zhang; Xuanjing Huang; Jiecao Chen; |
| 321 | MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Leveraging multi-agentic frameworks to enhance large language models (LLMs) has demonstrated significant potential recently, with most existing studies focusing on prompting and developing workflows with frozen LLMs. In this paper, we aim to further unleash the power of such multi-agentic frameworks for post-training LLMs for better collaboration. |
Chanwoo Park; Seungju Han; Xingzhi Guo; Asuman E. Ozdaglar; Kaiqing Zhang; Joo-Kyung Kim; |
| 322 | Advancing Zero-shot Text-to-Speech Intelligibility Across Diverse Domains Via Preference Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new dataset, named the Intelligibility Preference Speech Dataset (INTP), and extend the Direct Preference Optimization (DPO) framework to accommodate diverse TTS architectures. |
Xueyao Zhang; Yuancheng Wang; Chaoren Wang; Ziniu Li; Zhuo Chen; Zhizheng Wu; |
| 323 | Learning to Rewrite: Generalized LLM-Generated Text Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Learning2Rewrite, a novel framework to detect LLM-generated text with exceptional generalization to unseen domains. |
Wei Hao; Ran Li; Weiliang Zhao; Junfeng Yang; Chengzhi Mao; |
| 324 | IOPO: Empowering LLMs with Complex Instruction Following Via Input-Output Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, this paper introduces Trace, a benchmark for improving and evaluating the complex instruction-following ability, which consists of 120K training data and 1K evaluation data. |
Xinghua Zhang; Haiyang Yu; Cheng Fu; Fei Huang; Yongbin Li; |
| 325 | PlanningArena: A Modular Benchmark for Multidimensional Evaluation of Planning and Tool Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent studies have revealed that the performance of LLMs can be significantly improved by integrating external tools. Based on this, we propose a benchmark framework called PlanningArena, which aims to simulate real application scenarios and provide a series of apps and API tools that may be involved in the actual planning process. |
Zihan Zheng; Tianle Cui; Chuwen Xie; Jiahui Pan; Qianglong Chen; Lewei He; |
| 326 | AAD-LLM: Neural Attention-Driven Auditory Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing models do not incorporate this selectivity, limiting their ability to generate perception-aligned responses. To address this, we introduce intention-informed auditory scene understanding (II-ASU) and present Auditory Attention-Driven LLM (AAD-LLM), a prototype system that integrates brain signals to infer listener attention. |
Xilin Jiang; Sukru Samet Dindar; Vishal Choudhari; Stephan Bickel; Ashesh Mehta; Guy M McKhann; Daniel Friedman; Adeen Flinker; Nima Mesgarani; |
| 327 | Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we discover that code-switching in red-teaming queries, a common practice in natural language, can effectively elicit undesirable behaviors of LLMs. |
Haneul Yoo; Yongjin Yang; Hwaran Lee; |
| 328 | Exploring Forgetting in Large Language Model Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on our revised assessment of forgetting metrics, we explored low-cost, straightforward methods to mitigate forgetting during the pre-training phase. |
Chonghua Liao; Ruobing Xie; Xingwu Sun; Haowen Sun; Zhanhui Kang; |
| 329 | NvAgent: Automated Data Visualization from Natural Language Via Collaborative Agent Workflow Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, they often struggle with complex queries that require reasoning across multiple tables. To address this limitation, we propose a collaborative agent workflow, termed **nvAgent**, for NL2Vis. |
Geliang Ouyang; Jingyao Chen; Zhihe Nie; Yi Gui; Yao Wan; Hongyu Zhang; Dongping Chen; |
| 330 | A Large-Scale Real-World Evaluation of An LLM-Based Virtual Teaching Assistant Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we develop an LLM-based VTA and deploy it in an introductory AI programming course with 477 graduate students. |
Sunjun Kweon; Sooyohn Nam; Hyunseung Lim; Hwajung Hong; Edward Choi; |
| 331 | Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While great success has been achieved in building vision models with Contrastive Language-Image Pre-training (CLIP) over Internet-scale image-text pairs, building transferable Graph Neural Networks (GNNs) with the CLIP pipeline is challenging because of the scarcity of labeled data and text supervision, different levels of downstream tasks, and the conceptual gaps between domains. In this work, to address these issues, we propose a multi-modal prompt learning paradigm to effectively adapt a pre-trained GNN to downstream tasks and data, given only a few semantically labeled samples, each with extremely weak text supervision. |
Zihao Li; Lecheng Zheng; Bowen Jin; Dongqi Fu; Baoyu Jing; Yikun Ban; Jingrui He; Jiawei Han; |
| 332 | ChatBench: From Static Benchmarks to Human-AI Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, standard benchmarks, such as MMLU, measure LLM capabilities in isolation (i.e., “AI-alone”). Here, we design and conduct a user study to convert MMLU questions into user-AI conversations, by seeding the user with the question and having them carry out a conversation with the LLM to answer their question. |
Serina Chang; Ashton Anderson; Jake M. Hofman; |
| 333 | UniConv: Unifying Retrieval and Response Generation for Large Language Models in Conversations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore how to unify dense retrieval and response generation for large language models in conversation. |
Fengran Mo; Yifan Gao; Chuan Meng; Xin Liu; Zhuofeng Wu; Kelong Mao; Zhengyang Wang; Pei Chen; Zheng Li; Xian Li; Bing Yin; Meng Jiang; |
| 334 | Maximizing The Effectiveness of Larger BERT Models for Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through Canonical Correlation Analysis, we identify that these methods fail to fully exploit the potential advantages of larger teachers. To address this, we propose an improved distillation approach that effectively enhances knowledge transfer. |
Wen-Shu Fan; Su Lu; Shangyu Xing; Xin-Chun Li; De-Chuan Zhan; |
| 335 | X-TURING: Towards An Enhanced and Efficient Turing Test for Long-Term Dialogue Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes X-Turing, which enhances the original test with a burst dialogue pattern, allowing more dynamic exchanges using consecutive messages. |
Weiqi Wu; Hongqiu Wu; Hai Zhao; |
| 336 | Colloquial Singaporean English Style Transfer with Fine-Grained Explainable Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Style transfer between Singlish and Standard (formal) English is vital for various applications, yet existing methods often lack explainability and fine-grained control. To fill this gap, we contribute in two key ways. |
Jinggui Liang; Dung Vo; Yap Hong Xian; Hai Leong Chieu; Kian Ming A. Chai; Jing Jiang; Lizi Liao; |
| 337 | Pattern Recognition or Medical Knowledge? The Problem with Multiple-Choice Questions in Medicine Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large Language Models (LLMs) such as ChatGPT demonstrate significant potential in the medical domain and are often evaluated using multiple-choice questions (MCQs) modeled on exams like the USMLE. |
Maxime Griot; Jean Vanderdonckt; Demet Yuksel; Coralie Hemptinne; |
| 338 | Sharper and Faster Mean Better: Towards More Efficient Vision-Language Model for Hour-scale Long Video Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite existing multimodal language models showing impressive performance on the video understanding task, extremely long videos still pose significant challenges to a language model’s context length, memory consumption, and computational complexity. To address these issues, we propose a vision-language model named Sophia for long video understanding, which can efficiently handle hour-scale long videos. |
Daoze Zhang; Yuze Zhao; Jintao Huang; Yingda Chen; |
| 339 | Keys to Robust Edits: From Theoretical Insights to Practical Advances Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our solution introduces Robust Edit Pathway (REP), a plug-and-play module that: (1) disentangles editing keys from native model representations; (2) dynamically adjusts keys via contrastive learning to achieve robustness-specificity balance. |
Jianhao Yan; Futing Wang; Yun Luo; Yafu Li; Yue Zhang; |
| 340 | The Essence of Contextual Understanding in Theory of Mind: A Study on Question Answering with Story Characters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, this aspect is largely overlooked in existing benchmarks for evaluating machines’ ToM capabilities, due to their usage of short narratives without global context, especially the personal backgrounds of characters. In this paper, we verify the importance of comprehensive contextual understanding of personal backgrounds in ToM and assess the performance of LLMs in such complex scenarios. |
Chulun Zhou; Qiujing Wang; Mo Yu; Xiaoqian Yue; Rui Lu; Jiangnan Li; Yifan Zhou; Shunchi Zhang; Jie Zhou; Wai Lam; |
| 341 | Towards Effective and Efficient Continual Pre-training of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Continual pre-training (CPT) has been an important approach for adapting language models to specific domains or tasks. In this paper, we comprehensively study its key designs to balance the new abilities while retaining the original abilities, and present an effective CPT method that can greatly improve the Chinese language ability and scientific reasoning ability of LLMs. |
Jie Chen; Zhipeng Chen; Jiapeng Wang; Kun Zhou; Yutao Zhu; Jinhao Jiang; Yingqian Min; Xin Zhao; Zhicheng Dou; Jiaxin Mao; Yankai Lin; Ruihua Song; Jun Xu; Xu Chen; Rui Yan; Zhewei Wei; Di Hu; Wenbing Huang; Ji-Rong Wen; |
| 342 | Rethinking Reward Model Evaluation Through The Lens of Reward Overoptimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing benchmarks for reward models show a weak correlation with the performance of optimized policies, suggesting that they fail to accurately assess the true capabilities of RMs. To bridge this gap, we explore several evaluation designs through the lens of reward overoptimization, i.e., a phenomenon that captures both how well the reward model aligns with human preferences and the dynamics of the learning signal it provides to the policy. |
Sunghwan Kim; Dongjin Kang; Taeyoon Kwon; Hyungjoo Chae; Dongha Lee; Jinyoung Yeo; |
| 343 | Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose UniQuanF (Unified Quantization with Flexible Mapping), an accurate quantization method for LLMs. |
Seungcheol Park; Jeongin Bae; Beomseok Kwon; Minjun Kim; Byeongwook Kim; Se Jung Kwon; U Kang; Dongsoo Lee; |
| 344 | Representation Bending for Large Language Model Safety Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces RepBend, a novel approach that fundamentally disrupts the representations underlying harmful behaviors in LLMs, offering a scalable solution to enhance (potentially inherent) safety. |
Ashkan Yousefpour; Taeheon Kim; Ryan Sungmo Kwon; Seungbeen Lee; Wonje Jeung; Seungju Han; Alvin Wan; Harrison Ngan; Youngjae Yu; Jonghyun Choi; |
| 345 | Croppable Knowledge Graph Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose MED, a novel KGE training framework. |
Yushan Zhu; Wen Zhang; Zhiqiang Liu; Mingyang Chen; Lei Liang; Huajun Chen; |
| 346 | Boosting LLM’s Molecular Structure Elucidation with Knowledge Enhanced Tree Search Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a Knowledge-enhanced reasoning framework for Molecular Structure Elucidation (K-MSE), leveraging Monte Carlo Tree Search for test-time scaling as a plugin. |
Xiang Zhuang; Bin Wu; Jiyu Cui; Kehua Feng; Xiaotong Li; Huabin Xing; Keyan Ding; Qiang Zhang; Huajun Chen; |
| 347 | Towards Robust and Efficient Federated Low-Rank Adaptation with Heterogeneous Clients Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we introduce LoRA-A^2 (Low Rank Adaptation with Alternating freeze and Adaptive rank selection), which demonstrates robustness in challenging settings with low ranks and high data heterogeneity. |
Jabin Koo; Minwoo Jang; Jungseul Ok; |
| 348 | Literature Meets Data: A Synergistic Approach to Hypothesis Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While both have proven effective in generating novel and plausible hypotheses, it remains an open question whether they can complement each other. To address this, we develop the first method that combines literature-based insights with data to perform LLM-powered hypothesis generation. |
Haokun Liu; Yangqiaoyu Zhou; Mingxuan Li; Chenfei Yuan; Chenhao Tan; |
| 349 | LegalAgentBench: Evaluating LLM Agents in Legal Domain Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing general-domain benchmarks are unable to fully capture the complexity and subtle nuances inherent in real-world judicial cognition and decision-making. Therefore, we propose LegalAgentBench, a comprehensive benchmark specifically designed to evaluate LLM Agents in the Chinese legal domain. |
Haitao Li; Junjie Chen; Jingli Yang; Qingyao Ai; Wei Jia; Youfeng Liu; Kai Lin; Yueyue Wu; Guozhi Yuan; Yiran Hu; Wuyue Wang; Yiqun Liu; Minlie Huang; |
| 350 | SHARE: An SLM-based Hierarchical Action CorREction Assistant for Text-to-SQL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose **SHARE**, a **S**LM-based **H**ierarchical **A**ction cor**RE**ction assistant that enables LLMs to perform more precise error localization and efficient correction. |
Ge Qu; Jinyang Li; Bowen Qin; Xiaolong Li; Nan Huo; Chenhao Ma; Reynold Cheng; |
| 351 | Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current data selection methods, such as natural language quality assessments, diversity-based filters, and classifier-based approaches, are limited by single-dimensional evaluation or redundancy-focused strategies. To address these gaps, we propose four dimensions to evaluate data quality: professionalism, readability, reasoning, and cleanliness. |
Xinlin Zhuang; Jiahui Peng; Ren Ma; Yinfan Wang; Tianyi Bai; Xingjian Wei; Qiu Jiantao; Chi Zhang; Ying Qian; Conghui He; |
| 352 | Sparse Latents Steer Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we leverage Sparse Autoencoders (SAEs) within the LLaMA Scope to uncover sparse, interpretable latents that govern RAG behaviors. |
Chunlei Xin; Shuheng Zhou; Huijia Zhu; Weiqiang Wang; Xuanang Chen; Xinyan Guan; Yaojie Lu; Hongyu Lin; Xianpei Han; Le Sun; |
| 353 | The Harmonic Structure of Information Contours Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These fluctuations are often explained by factors such as syntactic constraints, stylistic choices, or audience design. In this work, we explore an alternative perspective: that these fluctuations may be influenced by an implicit linguistic pressure towards periodicity, where the information rate oscillates at regular intervals, potentially across multiple frequencies simultaneously. |
Eleftheria Tsipidi; Samuel Kiegeland; Franz Nowak; Tianyang Xu; Ethan Wilcox; Alex Warstadt; Ryan Cotterell; Mario Giulianelli; |
| 354 | WebWalker: Benchmarking LLMs in Web Traversal Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, traditional search engines may retrieve shallow content, limiting the ability of LLMs to handle complex, multi-layered information. To address this, we introduce WebWalkerQA, a benchmark designed to assess the ability of LLMs to perform web traversal. |
Jialong Wu; Wenbiao Yin; Yong Jiang; Zhenglin Wang; Zekun Xi; Runnan Fang; Linhai Zhang; Yulan He; Deyu Zhou; Pengjun Xie; Fei Huang; |
| 355 | Beyond N-Grams: Rethinking Evaluation Metrics and Strategies for Multilingual Abstractive Summarization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While these metrics are considered indicative (even if imperfect) of human evaluation for English, their suitability for other languages remains unclear. To address this, in this paper we systematically assess evaluation metrics for generation, both n-gram-based and neural-based, to determine their effectiveness across languages and tasks. |
Itai Mondshine; Tzuf Paz-Argaman; Reut Tsarfaty; |
| 356 | Unique Hard Attention: A Tale of Two Sides Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When multiple positions achieve the maximum score, either the rightmost or the leftmost of those is chosen. In this paper, we highlight the importance of this seeming triviality. |
Selim Jerad; Anej Svete; Jiaoda Li; Ryan Cotterell; |
| 357 | Gumbel Reranking: Differentiable End-to-End Reranker Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing distillation-based approaches suffer from training-inference misalignment and fail to capture interdependencies among candidate documents. To overcome these limitations, we reframe the reranking process as an attention-mask problem and propose Gumbel Reranking, an end-to-end training framework for rerankers aimed at minimizing the training-inference gap. |
Siyuan Huang; Zhiyuan Ma; Jintao Du; Changhua Meng; Weiqiang Wang; Jingwen Leng; Minyi Guo; Zhouhan Lin; |
| 358 | Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose the Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning, EMMA-X. |
Qi Sun; Pengfei Hong; Tej Deep Pala; Vernon Toh; U-Xuan Tan; Deepanway Ghosal; Soujanya Poria; |
| 359 | Learning Sparsity for Effective and Efficient Music Performance Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing Music AVQA methods often rely on dense and unoptimized representations, leading to inefficiencies in the isolation of key information, the reduction of redundancy, and the prioritization of critical samples. To address these challenges, we introduce Sparsify, a sparse learning framework specifically designed for Music AVQA. |
Xingjian Diao; Tianzhen Yang; Chunhui Zhang; Weiyi Wu; Ming Cheng; Jiang Gui; |
| 360 | Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we repurpose a relation extraction dataset (e.g., Re-DocRED) to design controlled experiments that quantify the impact of heuristic biases, such as a preference for shorter documents, on retrievers like Dragon+ and Contriever. |
Mohsen Fayyaz; Ali Modarressi; Hinrich Schuetze; Nanyun Peng; |
| 361 | MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce MM-Verifier and MM-Reasoner to enhance multimodal reasoning through longer inference and more robust verification. |
Linzhuang Sun; Hao Liang; Jingxuan Wei; Bihui Yu; Tianpeng Li; Fan Yang; Zenan Zhou; Wentao Zhang; |
| 362 | Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We observe a novel phenomenon, *contextual entrainment*, across a wide range of language models (LMs) and prompt settings, providing a new mechanistic perspective on how LMs become distracted by “irrelevant” contextual information in the input prompt. |
Jingcheng Niu; Xingdi Yuan; Tong Wang; Hamidreza Saghir; Amir H. Abdi; |
| 363 | Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that increasing compute budget at inference time not only helps models answer more questions correctly, but also increases confidence in correct responses. |
William Jurayj; Jeffrey Cheng; Benjamin Van Durme; |
| 364 | Direct Prompt Optimization with Continuous Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we model the prompt optimization problem by the probability distribution of the prompt and present a novel approach that integrates greedy strategies into optimization with continuous representations. |
Yangkun Wang; Zihan Wang; Jingbo Shang; |
| 365 | SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SIFT (Speech Instruction Fine-Tuning), a 50M-example dataset designed for instruction fine-tuning and pre-training of speech-text large language models (LLMs). |
Prabhat Pandey; Rupak Vignesh Swaminathan; K V Vijay Girish; Arunasish Sen; Jian Xie; Grant Strimel; Andreas Schwarz; |
| 366 | Tree-of-Debate: Multi-Persona Debate Trees Elicit Critical Thinking for Scientific Comparative Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With the exponential growth of research facilitated by modern technology and improved accessibility, scientific discoveries have become increasingly fragmented within and across fields. This makes it challenging to assess the significance, novelty, incremental findings, and equivalent ideas between related works, particularly those from different research communities. |
Priyanka Kargupta; Ishika Agarwal; Tal August; Jiawei Han; |
| 367 | Synergizing Unsupervised Episode Detection with LLMs for Large-Scale News Events Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel task, **episode detection**, which identifies episodes within a news corpus of key event articles. |
Priyanka Kargupta; Yunyi Zhang; Yizhu Jiao; Siru Ouyang; Jiawei Han; |
| 368 | Beyond True or False: Retrieval-Augmented Hierarchical Analysis of Nuanced Claims Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This enables a more comprehensive, structured response that provides a well-rounded perspective on a given problem while also allowing the reader to prioritize specific angles of interest within the claim (e.g., safety towards children). Thus, we propose ClaimSpect, a retrieval-augmented generation-based framework for automatically constructing a hierarchy of aspects typically considered when addressing a claim and enriching them with corpus-specific perspectives. |
Priyanka Kargupta; Runchu Tian; Jiawei Han; |
| 369 | TaxoAdapt: Aligning LLM-Based Multidimensional Taxonomy Construction to Evolving Research Corpora Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Additionally, these approaches fail to account for the multi-faceted nature of scientific literature, where a single research paper may contribute to multiple dimensions (e.g., methodology, new tasks, evaluation metrics, benchmarks). To address these gaps, we propose TaxoAdapt, a framework that dynamically adapts an LLM-generated taxonomy to a given corpus across multiple dimensions. |
Priyanka Kargupta; Nan Zhang; Yunyi Zhang; Rui Zhang; Prasenjit Mitra; Jiawei Han; |
| 370 | MDCure: A Scalable Pipeline for Multi-Document Instruction-Following Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While LLMs have improved at processing long inputs, MD contexts still present unique difficulties, including management of inter-document dependencies, redundancy, and incoherent structures. To address this challenge, we introduce MDCure, a scalable and effective instruction data generation framework to enhance the MD capabilities of LLMs without the computational cost of pre-training or reliance on human-annotated data. |
Gabrielle Kaili-May Liu; Bowen Shi; Avi Caciularu; Idan Szpektor; Arman Cohan; |
| 371 | What Matters in Evaluating Book-Length Stories? A Systematic Study of Long Story Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we conduct systematic research in a challenging area: the automatic evaluation of book-length stories (>100K tokens). |
Dingyi Yang; Qin Jin; |
| 372 | SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Key-Value (KV) cache has become a bottleneck of LLMs for long-context generation. Despite the numerous efforts in this area, the optimization for the decoding phase is generally … |
Jialong Wu; Zhenglin Wang; Linhai Zhang; Yilong Lai; Yulan He; Deyu Zhou; |
| 373 | Interpret and Improve In-Context Learning Via The Lens of Input-Label Mappings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the internal mechanisms behind ICL remain under-explored, particularly the mappings between inputs and labels. In this work, we reverse-engineer ICL by examining input-label mappings: what they are within LLMs, where they function, and how LLMs utilize them. |
Chenghao Sun; Zhen Huang; Yonggang Zhang; Le Lu; Houqiang Li; Xinmei Tian; Xu Shen; Jieping Ye; |
| 374 | BRIGHTER: BRIdging The Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present BRIGHTER–a collection of multi-labeled, emotion-annotated datasets in 28 different languages and across several domains. |
Shamsuddeen Hassan Muhammad; Nedjma Ousidhoum; Idris Abdulmumin; Jan Philip Wahle; Terry Ruas; Meriem Beloucif; Christine de Kock; Nirmal Surange; Daniela Teodorescu; Ibrahim Said Ahmad; David Ifeoluwa Adelani; Alham Fikri Aji; Felermino D. M. A. Ali; Ilseyar Alimova; Vladimir Araujo; Nikolay Babakov; Naomi Baes; Ana-Maria Bucur; Andiswa Bukula; Guanqun Cao; Rodrigo Tufiño; Rendi Chevi; Chiamaka Ijeoma Chukwuneke; Alexandra Ciobotaru; Daryna Dementieva; Murja Sani Gadanya; Robert Geislinger; Bela Gipp; Oumaima Hourrane; Oana Ignat; Falalu Ibrahim Lawan; Rooweither Mabuya; Rahmad Mahendra; Vukosi Marivate; Alexander Panchenko; Andrew Piper; Charles Henrique Porto Ferreira; Vitaly Protasov; Samuel Rutunda; Manish Shrivastava; Aura Cristina Udrea; Lilian Diana Awuor Wanzare; Sophie Wu; Florian Valentin Wunderlich; Hanif Muhammad Zhafran; Tianhui Zhang; Yi Zhou; Saif M. Mohammad; |
| 375 | Beyond Output Matching: Bidirectional Alignment for Enhanced In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the finding that the performance of ICL is highly sensitive to the selection of demonstration examples, we propose Bidirectional Alignment (BiAlign) to fully leverage the models’ preferences for ICL examples to improve the ICL abilities of student models. |
Chengwei Qin; Wenhan Xia; Fangkai Jiao; Chen Chen; Yuchen Hu; Bosheng Ding; Ruirui Chen; Shafiq Joty; |
| 376 | Document-Level Text Generation with Minimum Bayes Risk Decoding Using Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the adaptation of Minimum Bayes Risk (MBR) decoding for document-level text generation tasks. |
Yuu Jinnai; |
| 377 | Capability Salience Vector: Fine-grained Alignment of Loss and Capabilities for Downstream Task Scaling Law Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge the gap between validation loss and downstream task capabilities, in this work, we introduce Capability Salience Vector, which decomposes the overall loss and assigns different importance weights to tokens to assess a specific meta-capability, aligning the validation loss with downstream task performance in terms of the model’s capabilities. |
Qiming Ge; Shuhao Xing; Songyang Gao; Yunhua Zhou; Yicheng Zou; Songyang Zhang; Zhi Chen; Hang Yan; Qi Zhang; Qipeng Guo; Kai Chen; |
| 378 | OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing methods primarily focus on mimicking dialogues among roles in textual form, neglecting the role’s voice traits (e.g., voice style and emotions), which play a crucial role in interaction and enable more immersive experiences in realistic scenarios. Towards this goal, we propose OmniCharacter, the first seamless speech-language personality interaction model to achieve immersive RPAs with low latency. |
Haonan Zhang; Run Luo; Xiong Liu; Yuchuan Wu; Ting-En Lin; Pengpeng Zeng; Qiang Qu; Feiteng Fang; Min Yang; Lianli Gao; Jingkuan Song; Fei Huang; Yongbin Li; |
| 379 | Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose an environment-guided neural-symbolic self-training framework named ENVISIONS. |
Fangzhi Xu; Qiushi Sun; Kanzhi Cheng; Jun Liu; Yu Qiao; Zhiyong Wu; |
| 380 | Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recognizing the intrinsic noise and uncertainty of self-supervision, we propose an advantage-calibrated optimization (ACO) loss function to mitigate estimation inconsistencies. |
Fangzhi Xu; Hang Yan; Chang Ma; Haiteng Zhao; Qiushi Sun; Kanzhi Cheng; Junxian He; Jun Liu; Zhiyong Wu; |
| 381 | 𝜙-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Built on it, we propose a novel decoding strategy, named 𝜙-Decoding. |
Fangzhi Xu; Hang Yan; Chang Ma; Haiteng Zhao; Jun Liu; Qika Lin; Zhiyong Wu; |
| 382 | SkillAggregation: Reference-free LLM-Dependent Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A new method called SkillAggregation is proposed, which learns to combine estimates from LLM judges without needing additional data or ground truth. |
Guangzhi Sun; Anmol Kagrecha; Potsawee Manakul; Phil Woodland; Mark Gales; |
| 383 | Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel approach where LLMs are fine-tuned to generate a sequence of Diverse Chains of Thought (DCoT) within a single inference step, which is fundamentally different from prior work that primarily operates on parallel CoT generations. |
Haritz Puerto; Tilek Chubakov; Xiaodan Zhu; Harish Tayyar Madabushi; Iryna Gurevych; |
| 384 | From Informal to Formal – Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Research in AI-based formal mathematical reasoning has shown rapid and sustained growth. |
Jialun Cao; Yaojie Lu; Meiziniu Li; Haoyang Ma; Haokun Li; Mengda He; Cheng Wen; Le Sun; Hongyu Zhang; Shengchao Qin; Shing-Chi Cheung; Cong Tian; |
| 385 | Movie101v2: Improved Movie Narration Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unlike standard video captioning, it involves not only describing key visual details but also inferring plots that unfold across multiple movie shots, presenting distinct and complex challenges. To advance this field, we introduce Movie101v2, a large-scale, bilingual dataset with enhanced data quality specifically designed for movie narration. |
Zihao Yue; Yepeng Zhang; Ziheng Wang; Qin Jin; |
| 386 | Advancing Collaborative Debates with Role Differentiation Through Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in some given tasks, obtaining domain knowledge related to task characteristics and identifying the strengths of different LLMs is hard. To solve these problems, we propose a Multi-LLM Cooperation (MLC) framework with automatic role assignment capabilities. |
Haoran Li; Ziyi Su; Yun Xue; Zhiliang Tian; Yiping Song; Minlie Huang; |
| 387 | ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although temporal reasoning has attracted increasing research attention, comprehensive testing of Allen’s interval relations (e.g., before, after, during) — a fundamental framework for temporal relationships — remains underexplored. To fill this gap, we present ChronoSense, a new benchmark for evaluating LLMs’ temporal understanding. |
Duygu Sezen Islakoglu; Jan-Christoph Kalo; |
| 388 | Boosting Long-Context Information Seeking Via Query-Guided Activation Refilling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a method for processing long-context information-seeking tasks via query-guided ACtivation REfilling (ACRE). |
Hongjin Qian; Zheng Liu; Peitian Zhang; Zhicheng Dou; Defu Lian; |
| 389 | ReLearn: Unlearning Via Learning for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, existing evaluation metrics overemphasize contextual forgetting while inadequately assessing response fluency and relevance. To address these challenges, we propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning, along with a comprehensive evaluation framework. |
Haoming Xu; Ningyuan Zhao; Liming Yang; Sendong Zhao; Shumin Deng; Mengru Wang; Bryan Hooi; Nay Oo; Huajun Chen; Ningyu Zhang; |
| 390 | The Impact of Token Granularity on The Predictive Power of Language Model Surprisal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One factor that has been overlooked in cognitive modeling is the granularity of subword tokens, which explicitly encodes information about word length and frequency, and ultimately influences the quality of vector representations that are learned. This paper presents experiments that manipulate the token granularity and evaluate its impact on the ability of surprisal to account for processing difficulty of naturalistic text and garden-path constructions. |
Byung-Doh Oh; William Schuler; |
| 391 | Transferring Textual Preferences to Vision-Language Understanding Through Model Merging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores a training-free alternative by merging text-based reward models (RMs) with LVLMs to create VLRMs. |
Chen-An Li; Tzu-Han Lin; Yun-Nung Chen; Hung-yi Lee; |
| 392 | TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although promising, these additional components often add complexity to the training and inference process, contravening the efficiency that PEFT is designed to deliver. Considering this, we introduce an innovative PEFT method, **TeamLoRA**, consisting of a collaboration and competition module for LoRA experts, thus achieving the right balance of effectiveness and efficiency: **(i)** For *collaboration*, we introduce a novel knowledge sharing and organization mechanism designed to optimize hierarchical learning while enhancing the efficiency of model training and inference. |
Tianwei Lin; Jiang Liu; Wenqiao Zhang; Yang Dai; Haoyuan Li; Zhelun Yu; Wanggui He; Juncheng Li; Jiannan Guo; Hao Jiang; Siliang Tang; Yueting Zhuang; |
| 393 | Dolphin: Moving Towards Closed-loop Auto-research Through Thinking, Practice, and Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further move towards the ultimate goal (i.e., automatic scientific research), in this paper we introduce Dolphin, a closed-loop LLM-driven framework to enhance the automation level of scientific research. |
Jiakang Yuan; Xiangchao Yan; Bo Zhang; Tao Chen; Botian Shi; Wanli Ouyang; Yu Qiao; Lei Bai; Bowen Zhou; |
| 394 | Pixel-Level Reasoning Segmentation Via Multi-turn Conversations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Such systems cannot reason at the pixel level or comprehend dynamic user intent that changes over the course of an interaction. Our work tackles this issue by introducing a novel task, Pixel-level Reasoning Segmentation (Pixel-level RS) based on multi-turn conversations, tracking evolving user intent via multi-turn interactions for fine-grained segmentation. |
Dexian Cai; Xiaocui Yang; YongKang Liu; Daling Wang; Shi Feng; Yifei Zhang; Soujanya Poria; |
| 395 | MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop a Multilingual End-to-end Meta-Evaluation RAG benchmark MEMERAG. |
María Andrea Cruz Blandón; Jayasimha Talur; Bruno Charron; Dong Liu; Saab Mansour; Marcello Federico; |
| 396 | Synthesizing Post-Training Data for LLMs Through Multi-Agent Simulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To fill this gap, inspired by the recent success of using LLMs to simulate human society, we propose MATRIX, a multi-agent simulator that automatically generates diverse text-based scenarios, capturing a wide range of real-world human needs in a realistic and scalable manner. Leveraging these outputs, we introduce a novel scenario-driven instruction generator MATRIX-Gen for controllable and highly realistic data synthesis. |
Shuo Tang; Xianghe Pang; Zexi Liu; Bohan Tang; Rui Ye; Tian Jin; Xiaowen Dong; Yanfeng Wang; Siheng Chen; |
| 397 | Unanswerability Evaluation for Retrieval Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce UAEval4RAG, a comprehensive evaluation framework designed to evaluate whether RAG systems effectively handle unanswerable queries specific to a given knowledge base. |
Xiangyu Peng; Prafulla Kumar Choubey; Caiming Xiong; Chien-Sheng Wu; |
| 398 | Personalized Text Generation with Contrastive Activation Steering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While these approaches have advanced the field, they suffer from two critical limitations: (1) the entanglement of content semantics and stylistic patterns in historical texts impedes accurate modeling of user-specific writing preferences; and (2) scalability challenges arising from both RAG’s inference latency caused by retrieval operations and PEFT’s parameter storage requirements for per-user models. To overcome these limitations, we propose StyleVector, a training-free framework that disentangles and represents personalized writing style as a vector in the LLM’s activation space, enabling style-steered generation during inference without requiring costly retrieval or parameter storage. |
Jinghao Zhang; Yuting Liu; Wenjie Wang; Qiang Liu; Shu Wu; Liang Wang; Tat-Seng Chua; |
| 399 | Towards Harmonized Uncertainty Estimation for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent efforts have made significant advancements by leveraging the internal logic and linguistic features of LLMs to estimate uncertainty scores, our empirical analysis highlights the failure of these methods to strike a harmonized estimation among indication, balance, and calibration, which hinders their broader capability for accurate uncertainty estimation. To address this challenge, we propose CUE (Corrector for Uncertainty Estimation): a straightforward yet effective method that employs a lightweight model trained on data aligned with the target LLM’s performance to adjust uncertainty scores. |
Rui Li; Jing Long; Muge Qi; Heming Xia; Lei Sha; Peiyi Wang; Zhifang Sui; |
| 400 | AlignDistil: Token-Level Language Model Alignment As Adaptive Policy Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Ignoring token-level rewards may erroneously punish high-quality tokens or encourage low-quality tokens, resulting in suboptimal performance and slow convergence. To address this issue, we propose AlignDistil, an RLHF-equivalent distillation method for token-level reward optimization. |
Songming Zhang; Xue Zhang; Tong Zhang; Bojie Hu; Yufeng Chen; Jinan Xu; |
| 401 | CritiQ: Mining Data Quality Criteria from Human Preferences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce CritiQ, a novel data selection method that automatically mines criteria from human preferences for data quality with only ~30 human-annotated pairs and performs efficient data selection. |
Honglin Guo; Kai Lv; Qipeng Guo; Tianyi Liang; Zhiheng Xi; Demin Song; Qiuyinzhe Zhang; Yu Sun; Kai Chen; Xipeng Qiu; Tao Gui; |
| 402 | A Spatio-Temporal Point Process for Fine-Grained Modeling of Reading Behavior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Standard modeling approaches, however, overlook much of the spatio-temporal dynamics involved in reading by relying on aggregated reading measurements—typically only focusing on fixation durations—and employing models with strong simplifying assumptions. In this paper, we propose a generative model that captures not only how long fixations last, but also where they land and when they occur. |
Francesco Ignazio Re; Andreas Opedal; Glib Manaiev; Mario Giulianelli; Ryan Cotterell; |
| 403 | Structure-aware Domain Knowledge Injection for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces a pioneering methodology, termed StructTuning, to efficiently transform foundation Large Language Models (LLMs) into domain specialists. |
Kai Liu; Ze Chen; Zhihang Fu; Wei Zhang; Rongxin Jiang; Fan Zhou; Yaowu Chen; Yue Wu; Jieping Ye; |
| 404 | SocialEval: Evaluating Social Intelligence of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose SocialEval, a script-based bilingual SI benchmark, integrating outcome- and process-oriented evaluation by manually crafting narrative scripts. |
Jinfeng Zhou; Yuxuan Chen; Yihan Shi; Xuanming Zhang; Leqi Lei; Yi Feng; Zexuan Xiong; Miao Yan; Xunzhi Wang; Yaru Cao; Jianing Yin; Shuai Wang; Quanyu Dai; Zhenhua Dong; Hongning Wang; Minlie Huang; |
| 405 | EcomScriptBench: A Multi-task Benchmark for E-commerce Script Planning Via Step-wise Intention-Driven Product Association Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we step forward by formally defining the task of E-commerce Script Planning (EcomScript) as three sequential subtasks. |
Weiqi Wang; Limeng Cui; Xin Liu; Sreyashi Nag; Wenju Xu; Chen Luo; Sheikh Muhammad Sarwar; Yang Li; Hansu Gu; Hui Liu; Changlong Yu; Jiaxin Bai; Yifan Gao; Haiyang Zhang; Qi He; Shuiwang Ji; Yangqiu Song; |
| 406 | MARS: Benchmarking The Metaphysical Reasoning Abilities of Language Models with A Multi-task Evaluation Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite its fundamental significance, this ability remains underexplored due to the complexity of modeling infinite possible changes in an event and their associated distributions, coupled with the lack of benchmark data with situational transitions. Addressing these gaps, we propose a novel formulation of ***reasoning with distributional changes as a three-step discriminative process***, termed as ***MetAphysical ReaSoning***. |
Weiqi Wang; Yangqiu Song; |
| 407 | STaR-SQL: Self-Taught Reasoner for Text-to-SQL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Self-Taught Reasoner for text-to-SQL (STaR-SQL), a novel approach that reframes SQL query generation as a reasoning-driven process. |
Mingqian He; Yongliang Shen; Wenqi Zhang; Qiuying Peng; Jun Wang; Weiming Lu; |
| 408 | Causal Estimation of Tokenisation Bias Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we quantify one particular type of tokenisation bias: the effect of including or excluding a subword (e.g., ⟨hello⟩) in a tokeniser’s vocabulary on the probability a trained model assigns to the corresponding characters (i.e., “hello”). |
Pietro Lesci; Clara Meister; Thomas Hofmann; Andreas Vlachos; Tiago Pimentel; |
| 409 | OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work identifies the key ingredients for building a top-tier code LLM: optimized heuristic rules for data cleaning and deduplication, effective recall of code-related text corpora, and high-quality synthetic data for both the annealing and supervised fine-tuning stages. By offering this level of openness, we aim to broaden access to all aspects of a top-tier code LLM, with OpenCoder serving as both a powerful model and an open foundation to accelerate research and enable reproducible advancements in code intelligence. |
Siming Huang; Tianhao Cheng; Jason Klein Liu; Weidi Xu; Jiaran Hao; Liuyihan Song; Yang Xu; Jian Yang; Jiaheng Liu; Chenchen Zhang; Linzheng Chai; Ruifeng Yuan; Xianzhen Luo; Qiufeng Wang; YuanTao Fan; Qingfu Zhu; Zhaoxiang Zhang; Yang Gao; Jie Fu; Qian Liu; Houyi Li; Ge Zhang; Yuan Qi; Xu Yinghui; Wei Chu; Zili Wang; |
| 410 | CER: Confidence Enhanced Reasoning in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce an uncertainty-aware framework designed to enhance the accuracy of LLM responses by systematically incorporating model confidence at critical decision points. |
Ali Razghandi; Seyed Mohammad Hadi Hosseini; Mahdieh Soleymani Baghshah; |
| 411 | Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces the Align-SLM framework, which leverages preference optimization inspired by Reinforcement Learning with Human Feedback (RLHF) to enhance the semantic understanding of SLMs. |
Guan-Ting Lin; Prashanth Gurunath Shivakumar; Aditya Gourav; Yile Gu; Ankur Gandhe; Hung-yi Lee; Ivan Bulyko; |
| 412 | Estimating Privacy Leakage of Augmented Contextual Knowledge in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce context influence, a metric that builds on differential privacy, a widely-adopted privacy notion, to estimate the privacy leakage of contextual knowledge during decoding. |
James Flemings; Bo Jiang; Wanrong Zhang; Zafar Takhirov; Murali Annavaram; |
| 413 | Visual Evidence Prompting Mitigates Hallucinations in Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our investigation suggests that such hallucinations often stem from deficiencies in fine-grained comprehension on the visual side, particularly when visual scenes exhibit appearance or semantic similarities (e.g., bicycles vs. motorcycles, baseball bats vs. baseballs). In this work, we show such hallucination is naturally mitigated via a novel method called visual evidence prompting, utilizing small visual models to complement the LVLMs. |
Wei Li; Zhen Huang; Houqiang Li; Le Lu; Yang Lu; Xinmei Tian; Xu Shen; Jieping Ye; |
| 414 | Con Instruction: Universal Jailbreaking of Multimodal Large Language Models Via Non-Textual Modalities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To evaluate whether an attack is successful, we introduce a new attack response categorization (ARC) that considers the response quality and relevancy concerning the malicious instruction. |
Jiahui Geng; Thy Thy Tran; Preslav Nakov; Iryna Gurevych; |
| 415 | AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate various LLM-based evaluation methods on AbGen-Eval, providing insights for future research on developing more effective and reliable LLM-based evaluation systems for complex scientific tasks. |
Yilun Zhao; Weiyuan Chen; Zhijian Xu; Manasi Patwardhan; Chengye Wang; Yixin Liu; Lovekesh Vig; Arman Cohan; |
| 416 | TESS 2: A Large-Scale Generalist Diffusion Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce TESS 2, a general instruction-following diffusion language model that outperforms contemporary instruction-tuned diffusion models, as well as matches and sometimes exceeds strong autoregressive (AR) models. |
Jaesung Tae; Hamish Ivison; Sachin Kumar; Arman Cohan; |
| 417 | Pandora’s Box or Aladdin’s Lamp: A Comprehensive Analysis Revealing The Role of RAG Noise in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we define seven distinct noise types from a linguistic perspective and establish a Noise RAG Benchmark (NoiserBench), a comprehensive evaluation framework encompassing multiple datasets and reasoning tasks. |
Jinyang Wu; Shuai Zhang; Feihu Che; Mingkuan Feng; Pengpeng Shao; Jianhua Tao; |
| 418 | Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models’ Uncertainty? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it remains unclear whether LLMs consistently use these markers to reflect their intrinsic confidence due to the difficulty of quantifying uncertainty associated with various markers. To address this gap, we first define ***marker confidence*** as the observed accuracy when a model employs an epistemic marker. We evaluate its stability across multiple question-answering datasets in both in-distribution and out-of-distribution settings for open-source and proprietary LLMs. |
Jiayu Liu; Qing Zong; Weiqi Wang; Yangqiu Song; |
| 419 | TC–RAG: Turing–Complete RAG’s Case Study on Medical LLM Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches to RAG fall short by neglecting system state variables, which are crucial for ensuring adaptive control, retrieval halting, and system convergence. In this paper, we introduce Turing-Complete-RAG (TC-RAG), a novel framework, established through rigorous proof, that addresses these challenges by incorporating a Turing-complete system to manage state variables, thereby enabling more efficient and accurate knowledge retrieval. |
Xinke Jiang; Yue Fang; Rihong Qiu; Haoyu Zhang; Yongxin Xu; Hao Chen; Wentao Zhang; Ruizhe Zhang; Yuchen Fang; Xinyu Ma; Xu Chu; Junfeng Zhao; Yasha Wang; |
| 420 | HyKGE: A Hypothesis Knowledge Graph Enhanced RAG Framework for Accurate and Reliable Medical LLMs Responses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the retrieval-augmented generation (RAG) based on Knowledge Graphs (KGs) to improve the accuracy and reliability of Large Language Models (LLMs). |
Xinke Jiang; Ruizhe Zhang; Yongxin Xu; Rihong Qiu; Yue Fang; Zhiyuan Wang; Jinyi Tang; Hongxin Ding; Xu Chu; Junfeng Zhao; Yasha Wang; |
| 421 | Confidence V.s. Critique: A Decomposition of Self-Correction Capability for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To have a deeper understanding of self-correction, we endeavor to decompose, evaluate, and analyze the self-correction behaviors of LLMs. By enumerating and analyzing answer correctness before and after self-correction, we decompose the self-correction capability into confidence (being confident in correct answers) and critique (turning wrong answers into correct ones) capabilities, and propose two metrics from a probabilistic perspective to measure these two capabilities, along with another metric for overall self-correction capability evaluation. |
Zhe Yang; Yichang Zhang; Yudong Wang; Ziyao Xu; Junyang Lin; Zhifang Sui; |
| 422 | Are Rules Meant to Be Broken? Understanding Multilingual Moral Reasoning As A Computational Pipeline with UniMoral Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While natural language processing (NLP) offers promising tools for studying this phenomenon, current research lacks cohesion, employing discordant datasets and tasks that examine isolated aspects of moral reasoning. We bridge this gap with UniMoral, a unified dataset integrating psychologically grounded and social-media-derived moral dilemmas annotated with labels for action choices, ethical principles, contributing factors, and consequences, alongside annotators’ moral and cultural profiles. |
Shivani Kumar; David Jurgens; |
| 423 | Mitigating Posterior Salience Attenuation in Long-Context LLMs with Positional Contrastive Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by it, we propose the training-free Positional Contrastive Decoding (PCD) that contrasts the logits derived from long-aware attention with those from designed local-aware attention, enabling the model to focus on the gains introduced by large-scale short-to-long training. |
Zikai Xiao; Ziyang Wang; Wen Ma; Yan Zhang; Wei Shen; WangYan WangYan; Luqi Gong; Zuozhu Liu; |
| 424 | Pre-Training Curriculum for Multi-Token Prediction in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, prior work has shown that smaller language models (SLMs) struggle with the MTP objective. To address this, we propose a curriculum learning strategy for MTP training, exploring two variants: a forward curriculum, which gradually increases the complexity of the pre-training objective from NTP to MTP, and a reverse curriculum, which does the opposite. |
Ansar Aynetdinov; Alan Akbik; |
| 425 | Guiding Not Forcing: Enhancing The Transferability of Jailbreaking Attacks on LLMs Via Removing Superfluous Constraints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through a detailed analysis of the optimization process, we introduce a novel conceptual framework to elucidate transferability and identify superfluous constraints—specifically, the response pattern constraint and the token tail constraint—as significant barriers to improved transferability. |
Junxiao Yang; Zhexin Zhang; Shiyao Cui; Hongning Wang; Minlie Huang; |
| 426 | CoT-based Synthesizer: Enhancing LLM Performance Through Answer Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel inference scaling strategy, CoT-based Synthesizer, which leverages CoT reasoning to synthesize superior answers by analyzing complementary information from multiple candidate responses, even when all candidates are flawed. |
Bohan Zhang; Xiaokang Zhang; Jing Zhang; Jifan Yu; Sijia Luo; Jie Tang; |
| 427 | Segment-Based Attention Masking for GPTs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, attention is masked based on the known block structure at the prefill phase, followed by the conventional token-by-token autoregressive process after that. |
Shahar Katz; Liran Ringel; Yaniv Romano; Lior Wolf; |
| 428 | MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce MegaPairs, a novel data synthesis method that leverages vision language models (VLMs) and open-domain images, together with a massive synthetic dataset generated from this method. |
Junjie Zhou; Yongping Xiong; Zheng Liu; Ze Liu; Shitao Xiao; Yueze Wang; Bo Zhao; Chen Jason Zhang; Defu Lian; |
| 429 | CFBench: A Comprehensive Constraints-Following Benchmark for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing evaluations mainly focus on fragmented constraints or narrow scenarios, but they overlook the comprehensiveness and authenticity of constraints from the user’s perspective. To bridge this gap, we propose CFBench, a large-scale Chinese Comprehensive Constraints Following Benchmark for LLMs, featuring 1,000 curated samples that cover more than 200 real-life scenarios and over 50 NLP tasks. |
Tao Zhang; ChengLIn Zhu; Yanjun Shen; Wenjing Luo; Yan Zhang; Hao Liang; Tao Zhang; Fan Yang; Mingan Lin; Yujing Qiao; Weipeng Chen; Bin Cui; Wentao Zhang; Zenan Zhou; |
| 430 | Vulnerability of LLMs to Vertically Aligned Text Manipulations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the impact of vertical text input on the performance of various LLMs across multiple text classification datasets and analyze the underlying causes. |
Zhecheng Li; Yiwei Wang; Bryan Hooi; Yujun Cai; Zhen Xiong; Nanyun Peng; Kai-Wei Chang; |
| 431 | Whose Boat Does It Float? Improving Personalization in Preference Tuning Via Inferred User Personas Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, such preference data do not convey *why* users prefer responses that are chosen or rejected, so LLMs trained on these datasets cannot tailor responses to varied user needs. To surface these parameters of personalization, we apply *abductive reasoning* to preference data, inferring needs and interests of users, i.e., personas, that may prefer either response. |
Nishant Balepur; Vishakh Padmakumar; Fumeng Yang; Shi Feng; Rachel Rudinger; Jordan Lee Boyd-Graber; |
| 432 | Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of The Above Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In each issue, we give fixes from education, like rubrics to guide MCQ writing; scoring methods to bridle guessing; and Item Response Theory to build harder MCQs. |
Nishant Balepur; Rachel Rudinger; Jordan Lee Boyd-Graber; |
| 433 | A Reality Check on Context Utilisation for Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce DRUID (Dataset of Retrieved Unreliable, Insufficient and Difficult-to-understand contexts) with real-world queries and contexts manually annotated for stance. |
Lovisa Hagström; Sara Vera Marjanovic; Haeun Yu; Arnav Arora; Christina Lioma; Maria Maistro; Pepa Atanasova; Isabelle Augenstein; |
| 434 | GPT-4 As A Homework Tutor Can Improve Student Engagement and Learning Outcomes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work contributes to the scarce empirical literature on LLM-based interactive homework in real-world educational settings and offers a practical, scalable solution to improve homework in schools. |
Alessandro Vanzo; Sankalan Pal Chowdhury; Mrinmaya Sachan; |
| 435 | LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent findings reveal that much of the knowledge in a Transformer-based Large Language Model (LLM) is encoded in its feed-forward (FFN) layers, where each FFN layer can be interpreted as the summation of sub-updates, each corresponding to a weighted column vector from the FFN’s value parameter matrix that often encodes human-interpretable concepts. In light of this, we hypothesize that model performance and behaviors can be further enhanced and controlled by modulating the contributions of these sub-updates based on their relevance to the input or target output style, and propose LLMBraces, a novel and efficient method that computes relevance scores associated with value vectors in FFN layers and leverages these scores to dynamically adjust the contribution of sub-updates. |
Ying Shen; Lifu Huang; |
| 436 | Towards Multi-System Log Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, these models often encounter the **“identical shortcut”** predicament, erroneously predicting normal classes when confronted with rare anomaly logs due to reconstruction errors. To address these issues, we propose **MLAD**, a novel **M**ulti-system **L**og **A**nomaly **D**etection model incorporating semantic relational reasoning. |
Boyang Wang; Runqiang Zang; Hongcheng Guo; Shun Zhang; Shaosheng Cao; Donglin Di; Zhoujun Li; |
| 437 | Enhancing Text Editing for Grammatical Error Correction: Arabic As A Case Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a text editing approach that derives edit tags directly from data, eliminating the need for language-specific edits. |
Bashar Alhafni; Nizar Habash; |
| 438 | Neuron-Level Sequential Editing for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing model editing methods, especially those that alter model parameters, typically focus on single-round editing and often face significant challenges in sequential model editing-most notably issues of model forgetting and failure. To address these challenges, we introduce a new model editing method, namely Neuron-level Sequential Editing (NSE), tailored for supporting sequential model editing. |
Houcheng Jiang; Junfeng Fang; Tianyu Zhang; Baolong Bi; An Zhang; Ruipeng Wang; Tao Liang; Xiang Wang; |
| 439 | T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation Via Fine-grained AI Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current state-of-the-art T2A models still struggle to satisfy human preferences for prompt-following and acoustic quality when generating complex multi-event audio. To improve the performance of the model in these high-level applications, we propose to enhance the basic capabilities of the model with AI feedback learning. |
Zehan Wang; Ke Lei; Chen Zhu; Jiawei Huang; Sashuai Zhou; Luping Liu; Xize Cheng; Shengpeng Ji; Zhenhui Ye; Tao Jin; Zhou Zhao; |
| 440 | CULEMO: Cultural Lenses on Emotion – Benchmarking LLMs for Cross-Cultural Emotion Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing emotion benchmarks suffer from two major shortcomings: (1) they largely rely on keyword-based emotion recognition, overlooking crucial cultural dimensions required for deeper emotion understanding, and (2) many are created by translating English-annotated data into other languages, leading to potentially unreliable evaluation. To address these issues, we introduce Cultural Lenses on Emotion (CuLEmo), the first benchmark designed to evaluate culture-aware emotion prediction across six languages: Amharic, Arabic, English, German, Hindi, and Spanish. |
Tadesse Destaw Belay; Ahmed Haj Ahmed; Alvin C Grissom Ii; Iqra Ameer; Grigori Sidorov; Olga Kolesnikova; Seid Muhie Yimam; |
| 441 | Game Development As Human-LLM Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a data synthesis pipeline based on LLM to generate game script-code pairs and interactions from a few manually crafted seed data. |
Jiale Hong; Hongqiu Wu; Hai Zhao; |
| 442 | LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify two root causes: neuron misidentification due to simplistic parameter magnitude-based selection, and cross-task neuron interference during merging. To address these challenges, we propose LED-Merging, a three-stage framework that Locates task-specific neurons via gradient-based attribution, dynamically Elects critical neurons through multi-model importance fusion, and Disjoints conflicting updates through parameter isolation. |
Qianli Ma; Dongrui Liu; Qian Chen; Linfeng Zhang; Jing Shao; |
| 443 | IRIS: An Iterative and Integrated Framework for Verifiable Causal Discovery in The Absence of Tabular Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent LLM-based methods excel at identifying commonly known causal relations, they fail to uncover novel relations. We introduce IRIS (Iterative Retrieval and Integrated System for Real-Time Causal Discovery), a novel framework that addresses these limitations. |
Tao Feng; Lizhen Qu; Niket Tandon; Gholamreza Haffari; |
| 444 | On The Reliability of Large Language Models for Causal Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study investigates the efficacy of Large Language Models (LLMs) in causal discovery. |
Tao Feng; Lizhen Qu; Niket Tandon; Zhuang Li; Xiaoxi Kang; Gholamreza Haffari; |
| 445 | FlashAudio: Rectified Flow for Fast and High-Fidelity Text-to-Audio Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce FlashAudio with rectified flows to learn straight flow for fast simulation. |
Huadai Liu; Jialei Wang; Rongjie Huang; Yang Liu; Heng Lu; Zhou Zhao; Wei Xue; |
| 446 | UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a benchmark to evaluate whether video-large language models (Video-LLMs) can naturally process continuous first-person visual observations like humans, enabling recall, perception, reasoning, and navigation. |
Baining Zhao; Jianjie Fang; Zichao Dai; Ziyou Wang; Jirong Zha; Weichen Zhang; Chen Gao; Yue Wang; Jinqiang Cui; Xinlei Chen; Yong Li; |
| 447 | LongSafety: Evaluating Long-Context Safety of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the safety of LLMs in long-context tasks remains under-explored, leaving a significant gap in both evaluation and improvement of their safety. To address this, we introduce LongSafety, the first comprehensive benchmark specifically designed to evaluate LLM safety in open-ended long-context tasks. |
Yida Lu; Jiale Cheng; Zhexin Zhang; Shiyao Cui; Cunxiang Wang; Xiaotao Gu; Yuxiao Dong; Jie Tang; Hongning Wang; Minlie Huang; |
| 448 | PrivaCI-Bench: Evaluating Privacy with Contextual Integrity and Legal Compliance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we follow the merit of the Contextual Integrity (CI) theory, which posits that privacy evaluation should not only cover the transmitted attributes but also encompass the whole relevant social context through private information flows. |
Haoran Li; Wenbin Hu; Huihao Jing; Yulin Chen; Qi Hu; Sirui Han; Tianshu Chu; Peizhao Hu; Yangqiu Song; |
| 449 | DNASpeech: A Contextualized and Situated Text-to-Speech Dataset with Dialogues, Narratives and Actions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose contextualized and situated text-to-speech (CS-TTS), a novel TTS task to promote more accurate and customized speech generation using prompts with Dialogues, Narratives, and Actions (DNA). |
Chuanqi Cheng; Hongda Sun; Bo Du; Shuo Shang; Xinrong Hu; Rui Yan; |
| 450 | Extending Complex Logical Queries on Uncertain Knowledge Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The study of machine learning-based logical query-answering enables reasoning with large-scale and incomplete knowledge graphs. |
Weizhi Fei; Zihao Wang; Hang Yin; Yang Duan; Yangqiu Song; |
| 451 | METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we build a vision-language model (VLM) based multi-agent framework for effective automatic chart generation. |
Bingxuan Li; Yiwei Wang; Jiuxiang Gu; Kai-Wei Chang; Nanyun Peng; |
| 452 | SConU: Selective Conformal Uncertainty in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach termed Selective Conformal Uncertainty (SConU), which, for the first time, implements significance tests, by developing two conformal p-values that are instrumental in determining whether a given sample deviates from the uncertainty distribution of the calibration set at a specific manageable risk level. |
Zhiyuan Wang; Qingni Wang; Yue Zhang; Tianlong Chen; Xiaofeng Zhu; Xiaoshuang Shi; Kaidi Xu; |
| 453 | Improving Model Factuality with Fine-grained Critique-based Evaluator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Factuality evaluation aims to detect factual errors produced by language models (LMs) and hence guide the development of more factual models. Towards this goal, we train a factuality evaluator, FenCE, that provides LM generators with claim-level factuality feedback. |
Yiqing Xie; Wenxuan Zhou; Pradyot Prakash; Di Jin; Yuning Mao; Quintin Fettes; Arya Talebzadeh; Sinong Wang; Han Fang; Carolyn Rose; Daniel Fried; Hejia Zhang; |
| 454 | Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Document Image Machine Translation (DIMT) aims to translate text within document images, facing generalization challenges due to limited training data and the complex interplay between visual and textual information. To address these challenges, we introduce M4Doc, a novel single-to-mix Modality alignment framework leveraging Multimodal Large Language Models (MLLMs). |
Yupu Liang; Yaping Zhang; Zhiyang Zhang; Yang Zhao; Lu Xiang; Chengqing Zong; Yu Zhou; |
| 455 | Optimizing Question Semantic Space for Dynamic Retrieval-Augmented Multi-hop Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Optimizing Question Semantic Space for Dynamic Retrieval-Augmented Multi-hop Question Answering (Q-DREAM). |
Linhao Ye; Lang Yu; Zhikai Lei; Qin Chen; Jie Zhou; Liang He; |
| 456 | GIFT-SW: Gaussian Noise Injected Fine-Tuning of Salient Weights for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent studies have shown that a small subset of weights significantly impacts performance. Based on this observation, we introduce a novel PEFT method, called Gaussian noise Injected Fine Tuning of Salient Weights (GIFT-SW). |
Maxim Zhelnin; Viktor Moskvoretskii; Egor Shvetsov; Maria Krylova; Venediktov Egor; Zuev Aleksandr; Evgeny Burnaev; |
| 457 | Just Go Parallel: Improving The Multilingual Capabilities of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we conduct a systematic study on the impact of adding parallel data on LLMs’ multilingual capabilities, focusing specifically on translation and multilingual common-sense reasoning. |
Muhammad Reza Qorib; Junyi Li; Hwee Tou Ng; |
| 458 | Enhancing Chain-of-Thought Reasoning with Critical Representation Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate applying ReFT to complex reasoning tasks. |
Chenxi Huang; Shaotian Yan; Liang Xie; Binbin Lin; Sinan Fan; Yue Xin; Deng Cai; Chen Shen; Jieping Ye; |
| 459 | Dynamic Parallel Tree Search for Efficient LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The challenges of accelerating the ToT lie in the frequent switching of reasoning focus, and the redundant exploration of suboptimal solutions. To alleviate this dilemma, we propose Dynamic Parallel Tree Search (DPTS), a novel parallelism framework that aims to dynamically optimize the reasoning path in inference. |
Yifu Ding; Wentao Jiang; Shunyu Liu; Yongcheng Jing; Jinyang Guo; Yingjie Wang; Jing Zhang; Zengmao Wang; Ziwei Liu; Bo Du; Xianglong Liu; Dacheng Tao; |
| 460 | Recurrent Knowledge Identification and Fusion for Language Model Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Recurrent-KIF, a novel CL framework for Recurrent Knowledge Identification and Fusion, which enables dynamic estimation of parameter importance distributions to enhance knowledge transfer. |
Yujie Feng; Xujia Wang; Zexin Lu; Shenghong Fu; Guangyuan Shi; Yongxin Xu; Yasha Wang; Philip S. Yu; Xu Chu; Xiao-Ming Wu; |
| 461 | TARGA: Targeted Synthetic Data Generation for Practical Reasoning Over Structured Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing methods encounter two significant challenges: reliance on extensive manually annotated datasets and limited generalization capability to unseen examples. To tackle these issues, we propose Targeted Synthetic Data Generation (Targa), a practical framework that dynamically generates high-relevance synthetic data without manual annotation. |
Xiang Huang; Jiayu Shen; Shanshan Huang; Sitao Cheng; Xiaxia Wang; Yuzhong Qu; |
| 462 | CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current Large Language Models (LLMs) face limitations in these specialized domains, highlighting the need for the development of comprehensive datasets that can assess, continuously update, and progressively improve these culturally-grounded linguistic competencies through targeted training optimizations. To address this gap, we introduce CKnowEdit, the first-ever Chinese knowledge editing dataset designed to correct linguistic, factual, and logical errors in LLMs. |
Jizhan Fang; Tianhe Lu; Yunzhi Yao; Ziyan Jiang; Xin Xu; Huajun Chen; Ningyu Zhang; |
| 463 | MMDEND: Dendrite-Inspired Multi-Branch Multi-Compartment Parallel Spiking Neuron for Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Though parallel spiking neurons are an efficient solution, their number of parameters is often tied to the hidden dimension or sequence length, which makes current parallel neurons unsuitable for large architectures. To address these issues, we propose **MMDEND**: a Multi-Branch Multi-Compartment Parallel Spiking Dendritic Neuron. |
Kexin Wang; Yuhong Chou; Di Shang; Shijie Mei; Jiahong Zhang; Yanbin Huang; Man Yao; Bo Xu; Guoqi Li; |
| 464 | On The Risk of Evidence Pollution for Malicious Social Text Detection in The Era of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To mitigate the negative impact, we propose three defense strategies from the data and model sides, including machine-generated text detection, a mixture of experts, and parameter updating. |
Herun Wan; Minnan Luo; Zhixiong Su; Guang Dai; Xiang Zhao; |
| 465 | Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The capabilities of recent large language models (LLMs) to generate high-quality content indistinguishable by humans from human-written texts raises many concerns regarding their … |
Aneta Zugecova; Dominik Macko; Ivan Srba; Robert Moro; Jakub Kopál; Katarína Marcinčinová; Matúš Mesarčík; |
| 466 | ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these, we propose an aggregation algorithm that ensures identifiability (asymptotically recovering ground-truth scores) and rapid convergence, enabling accurate model comparisons with relatively little data. |
Adhiraj Ghosh; Sebastian Dziadzio; Ameya Prabhu; Vishaal Udandarao; Samuel Albanie; Matthias Bethge; |
| 467 | UniRAG: Unified Query Understanding Method for Retrieval Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose UniRAG, a unified framework for query understanding in RAG. |
Rui Li; Liyang He; Qi Liu; Zheng Zhang; Heng Yu; Yuyang Ye; Linbo Zhu; Yu Su; |
| 468 | YuLan-Mini: Pushing The Limits of Open Data-efficient Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the key bottlenecks and designs during pre-training, and make the following contributions: (1) a comprehensive investigation into the factors contributing to training instability; (2) a robust optimization approach designed to mitigate training instability effectively; (3) an elaborate data pipeline that integrates data synthesis, data curriculum, and data selection. |
Hu Yiwen; Huatong Song; Jie Chen; Jia Deng; Jiapeng Wang; Kun Zhou; Yutao Zhu; Jinhao Jiang; Zican Dong; Yang Lu; Xu Miao; Xin Zhao; Ji-Rong Wen; |
| 469 | Activation Steering Decoding: Mitigating Hallucination in Large Vision-Language Models Through Bidirectional Hidden State Intervention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore a novel perspective on hallucination mitigation by examining the intermediate activations of LVLMs during generation. |
Jingran Su; Jingfan Chen; Hongxin Li; Yuntao Chen; Li Qing; Zhaoxiang Zhang; |
| 470 | Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel task to evaluate the robust understanding capability of Large Multimodal Models (LMMs), termed Unsolvable Problem Detection (UPD). |
Atsuyuki Miyai; Jingkang Yang; Jingyang Zhang; Yifei Ming; Qing Yu; Go Irie; Yixuan Li; Hai Helen Li; Ziwei Liu; Kiyoharu Aizawa; |
| 471 | Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide experimental results on the BEIR dataset using the open-source Lucene search library that explicate the tradeoffs between HNSW and flat indexes (including quantized variants) from the perspectives of indexing time, query evaluation performance, and retrieval quality. |
Jimmy Lin; |
| 472 | Retrofitting Large Language Models with Dynamic Tokenization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This default choice typically results in degraded efficiency and language capabilities, especially in languages other than English. To address this issue, we challenge the static design and propose retrofitting LMs with dynamic tokenization: a way to dynamically decide on token boundaries based on the input text via a subword-merging algorithm inspired by byte-pair encoding. |
Darius Feher; Ivan Vulić; Benjamin Minixhofer; |
| 473 | Revisit Self-Debugging with Self-Generated Tests for Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose and analyze two distinct paradigms for the self-debugging process: post-execution and in-execution self-debugging. |
Xiancai Chen; Zhengwei Tao; Kechi Zhang; Changzhi Zhou; Xinyu Zhang; Wanli Gu; Yuanpeng He; Mengdi Zhang; Xunliang Cai; Haiyan Zhao; Zhi Jin; |
| 474 | Unveiling Language-Specific Features in Large Language Models Via Sparse Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel metric to assess the monolinguality of features obtained from SAEs, discovering that some features are strongly related to specific languages. |
Boyi Deng; Yu Wan; Baosong Yang; Yidan Zhang; Fuli Feng; |
| 475 | A Systematic Study of Compositional Syntactic Transformer Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify key aspects of design choices in existing compositional SLMs and propose a unified framework encompassing both existing models and novel variants. |
Yida Zhao; Hao Xve; Xiang Hu; Kewei Tu; |
| 476 | ORMind: A Cognitive-Inspired End-to-End Reasoning Framework for Operations Research Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Preliminary industrial applications of LLMs for operations research face two critical deployment challenges: 1) Self-correction focuses on code syntax rather than mathematical accuracy, causing costly errors; 2) Complex expert selection creates unpredictable workflows that reduce transparency and increase maintenance costs, making them impractical for time-sensitive business applications. To address these business limitations, we introduce ORMind, a cognitive-inspired framework that enhances optimization through counterfactual reasoning. |
Zhiyuan Wang; Bokui Chen; Yinya Huang; Qingxing Cao; Ming He; Jianping Fan; Xiaodan Liang; |
| 477 | Mind The Gesture: Evaluating AI Sensitivity to Culturally Offensive Non-Verbal Gestures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce Multi-Cultural Set of Inappropriate Gestures and Nonverbal Signs (MC-SIGNS), a dataset of 288 gesture-country pairs annotated for offensiveness, cultural significance, and contextual factors across 25 gestures and 85 countries. |
Akhila Yerukola; Saadia Gabriel; Nanyun Peng; Maarten Sap; |
| 478 | Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: How to intervene on such system outputs to mitigate anthropomorphic behaviors and their attendant harmful outcomes, however, remains understudied. With this work, we aim to provide empirical and theoretical grounding for developing such interventions. |
Myra Cheng; Su Lin Blodgett; Alicia DeVrio; Lisa Egede; Alexandra Olteanu; |
| 479 | Pragmatics in The Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To advance pragmatic abilities in models, it is essential to understand current evaluation trends and identify existing limitations. In this survey, we provide a comprehensive review of resources designed for evaluating pragmatic capabilities in NLP, categorizing datasets by the pragmatic phenomena they address. |
Bolei Ma; Yuting Li; Wei Zhou; Ziwei Gong; Yang Janet Liu; Katja Jasinskaja; Annemarie Friedrich; Julia Hirschberg; Frauke Kreuter; Barbara Plank; |
| 480 | Infogen: Generating Complex Statistical Infographics from Documents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Infogen, a two-stage framework where fine-tuned LLMs first generate metadata, which is then converted into infographic code. |
Akash Ghosh; Aparna Garimella; Pritika Ramu; Sambaran Bandyopadhyay; Sriparna Saha; |
| 481 | KiRAG: Knowledge-Driven Iterative Retriever for Enhancing Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, their retrieval processes face two key challenges: (1) they can be disrupted by irrelevant documents or factually inaccurate chain-of-thoughts; (2) their retrievers are not designed to dynamically adapt to the evolving information needs in multi-step reasoning, making it difficult to identify and retrieve the missing information required at each iterative step. Therefore, we propose KiRAG, which uses a knowledge-driven iterative retriever model to enhance the retrieval process of iRAG. |
Jinyuan Fang; Zaiqiao Meng; Craig MacDonald; |
| 482 | CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, their judgments may become inconsistent when the option positions or ID tokens are swapped, compromising the effectiveness and fairness of the evaluation result. To address this challenge, we introduce CalibraEval, a novel label-free method for mitigating selection bias during inference. |
Haitao Li; Junjie Chen; Qingyao Ai; Zhumin Chu; Yujia Zhou; Qian Dong; Yiqun Liu; |
| 483 | EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods for strategic reasoning face challenges in adaptability, scalability, and transferring strategies to new contexts. To address these issues, we propose explicit policy optimization (*EPO*) for strategic reasoning, featuring an LLM that provides strategies in open-ended action space and can be plugged into arbitrary LLM agents to motivate goal-directed behavior. |
Xiaoqian Liu; Ke Wang; Yongbin Li; Yuchuan Wu; Wentao Ma; Aobo Kong; Fei Huang; Jianbin Jiao; Junge Zhang; |
| 484 | ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, existing pre-trained backdoor attacks are idealized in practice due to resource access constraints. Therefore, we establish ELBA-Bench, a comprehensive and unified framework that allows attackers to inject backdoors through parameter-efficient fine-tuning (e.g., LoRA) or without fine-tuning techniques (e.g., in-context learning). |
Xuxu Liu; Siyuan Liang; Mengya Han; Yong Luo; Aishan Liu; Xiantao Cai; Zheng He; Dacheng Tao; |
| 485 | Redundancy Principles for MLLMs Benchmarks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on redundancy from three key perspectives: 1) Redundancy of benchmark capability dimensions, 2) Redundancy in the number of test questions, and 3) Cross-benchmark redundancy within specific domains. |
Zicheng Zhang; Xiangyu Zhao; Xinyu Fang; Chunyi Li; Xiaohong Liu; Xiongkuo Min; Haodong Duan; Kai Chen; Guangtao Zhai; |
| 486 | WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing RAG frameworks are primarily designed for text-based LLMs and rely on Automatic Speech Recognition to process speech input, which discards crucial audio information, risks transcription errors, and increases computational overhead. Therefore, we introduce WavRAG, the first retrieval augmented generation framework with native, end-to-end audio support. |
Yifu Chen; Shengpeng Ji; Haoxiao Wang; Ziqing Wang; Siyu Chen; Jinzheng He; Jin Xu; Zhou Zhao; |
| 487 | JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on our taxonomy and experiments, we identify some important patterns, such as heuristic-based attacks, which could achieve high attack success rates but are easy to mitigate by defenses. Our study offers valuable insights for future research on jailbreak attacks and defenses and serves as a benchmark tool for researchers and practitioners to evaluate them effectively. |
Junjie Chu; Yugeng Liu; Ziqing Yang; Xinyue Shen; Michael Backes; Yang Zhang; |
| 488 | SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose **S**ubtask-**o**riented **R**einforced **F**ine-**T**uning (**SoRFT**), a novel training approach to enhance the issue resolving capability of LLMs. |
Zexiong Ma; Chao Peng; Pengfei Gao; Xiangxin Meng; Yanzhen Zou; Bing Xie; |
| 489 | FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, this paper proposes FaithfulRAG, a novel framework that resolves knowledge conflicts by explicitly modeling discrepancies between the model’s parametric knowledge and retrieved context. |
Qinggang Zhang; Zhishang Xiang; Yilin Xiao; Le Wang; Junhui Li; Xinrun Wang; Jinsong Su; |
| 490 | MMBoundary: Advancing MLLM Knowledge Boundary Awareness Through Reasoning Step Confidence Calibration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present MMBoundary, a novel framework that advances the knowledge boundary awareness of MLLMs through reasoning step confidence calibration. |
Zhitao He; Sandeep Polisetty; Zhiyuan Fan; Yuchen Huang; Shujin Wu; Yi R. Fung; |
| 491 | Improving Chain-of-Thought Reasoning Via Quasi-Symbolic Abstractions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we present QuaSAR (for Quasi-Symbolic Abstract Reasoning), a variation of CoT that guides LLMs to operate at a higher level of abstraction via quasi-symbolic explanations. |
Leonardo Ranaldi; Marco Valentino; Andre Freitas; |
| 492 | Length Controlled Generation for Black-box LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel iterative sampling framework for text length control, integrating the Metropolis-Hastings algorithm with an importance sampling acceleration strategy. |
Yuxuan Gu; Wenjie Wang; Xiaocheng Feng; Weihong Zhong; Kun Zhu; Lei Huang; Ting Liu; Bing Qin; Tat-Seng Chua; |
| 493 | Finding The Sweet Spot: Preference Data Construction for Scaling Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to scale up the number of on-policy samples via repeated random sampling to improve alignment performance. |
Yao Xiao; Hai Ye; Linyao Chen; Hwee Tou Ng; Lidong Bing; Xiaoli Li; Roy Ka-Wei Lee; |
| 494 | Combining Domain and Alignment Vectors Provides Better Knowledge-Safety Trade-offs in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe that simple interpolation between the domain and alignment delta parameters leads to safer domain-specific models that preserve their utility. Building on this, we introduce MergeAlign, a simple, efficient, and effective model merging-based alignment method. |
Megh Thakkar; Quentin Fournier; Matthew Riemer; Pin-Yu Chen; Amal Zouaq; Payel Das; Sarath Chandar; |
| 495 | Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most RM research is centered on English and relies heavily on synthetic resources, which leads to limited and less reliable datasets and benchmarks for Chinese. To address this gap, we introduce CheemsBench, a fully human-annotated RM evaluation benchmark within Chinese contexts, and CheemsPreference, a large-scale and diverse preference dataset annotated through human-machine collaboration to support Chinese RM training. |
Xueru Wen; Jie Lou; Zichao Li; Yaojie Lu; XingYu XingYu; Yuqiu Ji; Guohai Xu; Hongyu Lin; Ben He; Xianpei Han; Le Sun; Debing Zhang; |
| 496 | MorphMark: Flexible Adaptive Watermarking for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unlike existing methods, where watermark strength is typically treated as a fixed hyperparameter, our theoretical insights lead to the development of MorphMark—a method that adaptively adjusts the watermark strength in response to changes in the identified factor, thereby achieving an effective resolution of the dilemma. |
Zongqi Wang; Tianle Gu; Baoyuan Wu; Yujiu Yang; |
| 497 | Words of Warmth: Trust and Sociability Norms for Over 26k English Words Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Words of Warmth, the first large-scale repository of manually derived word–warmth (as well as word–trust and word–sociability) associations for over 26k English words. |
Saif M. Mohammad; |
| 498 | From English to Second Language Mastery: Enhancing LLMs with Cross-Lingual Continued Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Cross-Lingual Continued Instruction Tuning (X-CIT), which fully leverages translation-based parallel instruction data to enhance cross-lingual adaptability. |
Linjuan Wu; Hao-Ran Wei; Baosong Yang; Weiming Lu; |
| 499 | Multi-Level Explanations for Generative Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Multi-Level Explanations for Generative Language Models (MExGen), a technique to provide explanations for context-grounded text generation. |
Lucas Monteiro Paes; Dennis Wei; Hyo Jin Do; Hendrik Strobelt; Ronny Luss; Amit Dhurandhar; Manish Nagireddy; Karthikeyan Natesan Ramamurthy; Prasanna Sattigeri; Werner Geyer; Soumya Ghosh; |
| 500 | ExploraCoder: Advancing Code Generation for Multiple Unseen APIs Via Planning and Chained Exploration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the exploratory programming paradigm in human behavior, we propose **ExploraCoder**, a training-free framework that empowers LLMs to invoke multiple unseen APIs in code solutions by (1) planning a complex problem into several API invocation subtasks, and (2) experimenting with correct API usage at intermediate steps through a novel chain-of-API-exploration. |
Yunkun Wang; Yue Zhang; Zhen Qin; Chen Zhi; Binhua Li; Fei Huang; Yongbin Li; Shuiguang Deng; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~1,800 papers), please visit Paper Digest: ACL-2025 (Full List).