Paper Digest: EMNLP 2025 Papers & Highlights
Note: EMNLP-2025 accepted more than 2,000 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested users can choose to read All 2,000 EMNLP-2025 papers on a separate page.
To search for papers presented at EMNLP-2025 on a specific topic, please use the search by venue (EMNLP-2025) service. To summarize the latest research published at EMNLP-2025 on a specific topic, you can use the review by venue (EMNLP-2025) service. If you are interested in browsing papers by author, we have a comprehensive list of ~9,000 authors (EMNLP-2025). Additionally, you may want to explore our “Best Paper” Digest (EMNLP), which lists the most influential EMNLP papers since 1996.
This curated list is created by the Paper Digest Team. Experience the cutting-edge capabilities of Paper Digest, an innovative AI-powered research platform that delivers personalized, comprehensive daily paper digests on the latest research in your field. It also empowers you to read and write articles, get answers, conduct literature reviews, and generate research reports.
Experience the full potential of our services today!
TABLE 1: Paper Digest: EMNLP 2025 Papers & Highlights
| # | Paper | Author(s) |
|---|---|---|
| 1 | S1: Simple Test-time Scaling. Highlight: We seek the simplest approach to achieve test-time scaling and strong reasoning performance. | Niklas Muennighoff; Zitong Yang; Weijia Shi; Xiang Lisa Li; Li Fei-Fei; Hannaneh Hajishirzi; Luke Zettlemoyer; Percy Liang; Emmanuel Candes; Tatsunori Hashimoto |
| 2 | CodeArena: Evaluating and Aligning CodeLLMs on Human Preference. Highlight: We present CodeArena to emulate the complexity/diversity of real-world coding tasks, spanning 40 categories and 44 programming languages. | Jian Yang; Jiaxi Yang; Wei Zhang; Jin Ke; Yibo Miao; Lei Zhang; Liqun Yang; Zeyu Cui; Yichang Zhang; Zhoujun Li; Binyuan Hui; Junyang Lin |
| 3 | Following Length Constraints in Instructions. Highlight: In this work we show how to train models that can be controlled at inference time with instructions containing desired length constraints. | Weizhe Yuan; Ilia Kulikov; Ping Yu; Kyunghyun Cho; Sainbayar Sukhbaatar; Jason E Weston; Jing Xu |
| 4 | ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities Through Tree-Based Image Exploration. Highlight: In this paper, we propose Zoom Eye, a training-free, model-agnostic tree search algorithm tailored for vision-level reasoning. | Haozhan Shen; Kangjia Zhao; Tiancheng Zhao; Ruochen Xu; Zilun Zhang; Mingwei Zhu; Jianwei Yin |
| 5 | Infini-gram Mini: Exact N-gram Search at The Internet Scale with FM-Index. Highlight: We present Infini-gram mini, an efficient and scalable system that can make petabyte-level text corpora searchable. | Hao Xu; Jiacheng Liu; Yejin Choi; Noah A. Smith; Hannaneh Hajishirzi |
| 6 | Towards Robust Mathematical Reasoning. Highlight: Finding the right north-star metrics is critical for advancing the mathematical reasoning capabilities of foundation models, especially given that existing evaluations are either too easy or focus only on getting correct short answers. To address these issues, we present IMO-Bench, a suite of advanced reasoning benchmarks that specifically targets the level of the International Mathematical Olympiad (IMO), the most prestigious venue for young mathematicians. | Thang Luong; Dawsen Hwang; Hoang H Nguyen; Golnaz Ghiasi; Yuri Chervonyi; Insuk Seo; Junsu Kim; Garrett Bingham; Jonathan Lee; Swaroop Mishra; Alex Zhai; Huiyi Hu; Henryk Michalewski; Jimin Kim; Jeonghyun Ahn; Junhwi Bae; Xingyou Song; Trieu Hoang Trinh; Quoc V Le; Junehyuk Jung |
| 7 | Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching. Highlight: We propose Sketch-of-Thought (SoT), a prompting framework that integrates cognitively inspired reasoning paradigms with linguistic constraints to reduce token usage while preserving reasoning accuracy. | Simon A. Aytes; Jinheon Baek; Sung Ju Hwang |
| 8 | Do RAG Systems Really Suffer From Positional Bias? Highlight: Retrieval Augmented Generation enhances LLM accuracy by adding passages retrieved from an external corpus to the LLM prompt. This paper investigates how positional bias – the tendency of LLMs to weight information differently based on its position in the prompt – affects not only the LLM’s capability to capitalize on relevant passages, but also its susceptibility to distracting passages. | Florin Cuconasu; Simone Filice; Guy Horowitz; Yoelle Maarek; Fabrizio Silvestri |
| 9 | From Chat Logs to Collective Insights: Aggregative Question Answering. Highlight: In this paper, we introduce Aggregative Question Answering, a novel task requiring models to reason explicitly over thousands of user-chatbot interactions to answer aggregational queries, such as identifying emerging concerns among specific demographics. | Wentao Zhang; Woojeong Kim; Yuntian Deng |
| 10 | HELENE: Hessian Layer-wise Clipping and Gradient Annealing for Accelerating Fine-tuning LLM with Zeroth-order Optimization. Highlight: MeZO addresses this using zeroth-order (ZO) optimization, matching memory usage to inference but suffering from slow convergence due to varying curvatures across model parameters. To overcome this limitation, we propose HELENE, a scalable and memory-efficient optimizer that integrates annealed A-GNB gradients with diagonal Hessian estimation and layer-wise clipping as a second-order pre-conditioner. | Huaqin Zhao; Jiaxi Li; Yi Pan; Shizhe Liang; Xiaofeng Yang; Fei Dou; Tianming Liu; Jin Lu |
| 11 | Database-Augmented Query Representation for Information Retrieval. Highlight: However, they may be suboptimal for effectively augmenting the query, and there is plenty of other information available to augment it in a relational database. Motivated by this fact, we present a novel retrieval framework called Database-Augmented Query representation (DAQu), which augments the original query with various (query-related) metadata across multiple tables. | Soyeong Jeong; Jinheon Baek; Sukmin Cho; Sung Ju Hwang; Jong C. Park |
| 12 | Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Ambiguous Prompts and Unanswerable Questions. Highlight: We present a novel, test-time approach to detecting model hallucination through systematic analysis of information flow across model layers. | Hazel Kim; Tom A. Lamb; Adel Bibi; Philip Torr; Yarin Gal |
| 13 | TokenSkip: Controllable Chain-of-Thought Compression in LLMs. Highlight: To address this limitation, we analyze the semantic importance of tokens within CoT outputs and reveal that their contributions to reasoning vary. Building on this insight, we propose TokenSkip, a simple yet effective approach that enables LLMs to selectively skip less important tokens, allowing for controllable CoT compression (a minimal sketch appears after the table). | Heming Xia; Chak Tou Leong; Wenjie Wang; Yongqi Li; Wenjie Li |
| 14 | Unleashing The Reasoning Potential of LLMs By Critique Fine-Tuning on One Problem. Highlight: In this work, we introduce one-shot CFT, a highly compute-efficient approach that leverages critique data generated from a single math problem. | Yubo Wang; Ping Nie; Kai Zou; Lijun Wu; Wenhu Chen |
| 15 | Astra: Efficient Transformer Architecture and Contrastive Dynamics Learning for Embodied Instruction Following. Highlight: In this paper, we introduce Astra, a novel Transformer architecture featuring trajectory attention and learnable action queries, designed to efficiently process segmented multimodal trajectories and predict actions for imitation learning. | Yueen Ma; DaFeng Chi; Shiguang Wu; Yuecheng Liu; Yuzheng Zhuang; Irwin King |
| 16 | Long Chain-of-Thought Fine-tuning Via Understanding-to-Reasoning Transition. Highlight: However, previous research on long-context scaling in language models has generally focused on managing lengthy input prompts instead of producing long outputs. To leverage the strong long context understanding abilities of current models, we introduce Understanding-to-Reasoning Transition (URT) fine-tuning, a sequence-level curriculum learning framework that gradually shifts a model’s focus from interpreting long chain-of-thoughts to generating them. | Chenxin An; Zhihui Xie; Xiaonan Li; Ming Zhong; Shansan Gong; Lei Li; Jun Zhang; Jingjing Xu; Lingpeng Kong |
| 17 | A Simple Yet Effective Method for Non-Refusing Context Relevant Fine-grained Safety Steering in LLMs. Highlight: We propose SafeSteer, a simple and effective method to guide LLM outputs by (i) leveraging category-specific steering vectors for fine-grained control, (ii) applying a gradient-free, unsupervised approach that enhances safety while preserving text quality and topic relevance without forcing explicit refusals, and (iii) eliminating the need for contrastive safe data. | Shaona Ghosh; Amrita Bhattacharjee; Yftah Ziser; Christopher Parisien |
| 18 | Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge. Highlight: However, existing methods have primarily focused on improving model responses rather than judgment capabilities, resulting in rapid saturation during iterative training. To address this issue, we introduce a novel Meta-Rewarding step to the self-improvement process, where the model judges its own judgments and uses that feedback to refine its judgment skills. | Tianhao Wu; Weizhe Yuan; Olga Golovneva; Jing Xu; Yuandong Tian; Jiantao Jiao; Jason E Weston; Sainbayar Sukhbaatar |
| 19 | VisualWebInstruct: Scaling Up Multimodal Instruction Data Through Web Search. Highlight: In this work, we aim to address the scarcity of reasoning-focused multimodal datasets. | Yiming Jia; Jiachen Li; Xiang Yue; Bo Li; Ping Nie; Kai Zou; Wenhu Chen |
| 20 | Mitigating The Privacy Issues in Retrieval-Augmented Generation (RAG) Via Pure Synthetic Data. Highlight: However, when the retrieval process involves private data, RAG systems may face severe privacy risks, potentially leading to the leakage of sensitive information. To address this issue, we propose using synthetic data as a privacy-preserving alternative for the retrieval data. | Shenglai Zeng; Jiankun Zhang; Pengfei He; Jie Ren; Tianqi Zheng; Hanqing Lu; Han Xu; Hui Liu; Yue Xing; Jiliang Tang |
| 21 | InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows. Highlight: Existing benchmarks often fail to test the full range of cognitive skills needed to process these temporally rich and narratively complex inputs. Therefore, we introduce InfiniBench, a comprehensive benchmark designed to rigorously evaluate the capabilities of models in long video understanding. | Kirolos Ataallah; Eslam Mohamed Bakr; Mahmoud Ahmed; Chenhui Gou; Khushbu Pahwa; Jian Ding; Mohamed Elhoseiny |
| 22 | Temporal Scaling Law for Large Language Models. Highlight: In this paper, we propose the novel concept of Temporal Scaling Law, studying how the test loss of an LLM evolves as the training steps scale up. | Yizhe Xiong; Xiansheng Chen; Xin Ye; Hui Chen; Zijia Lin; Haoran Lian; Zhenpeng Su; Wei Huang; Jianwei Niu; Jungong Han; Guiguang Ding |
| 23 | Speculative Streaming: Efficient and Scalable Speculative Decoding with Multi-Stream Attention. Highlight: We introduce a novel speculative decoding method that integrates speculative draft generation directly within the target model using multi-stream attention. | Nikhil Bhendawade; Irina Belousova; Qichen Fu; Henry Mason; Antonie Lin; Mohammad Rastegari; Mahyar Najibi |
| 24 | Value Profiles for Encoding Human Variation. Highlight: We propose representing individuals using value profiles – natural language descriptions of underlying values compressed from in-context demonstrations – along with a steerable decoder model to estimate ratings conditioned on a value profile or other rater information. | Taylor Sorensen; Pushkar Mishra; Roma Patel; Michael Henry Tessler; Michiel A. Bakker; Georgina Evans; Iason Gabriel; Noah Goodman; Verena Rieser |
| 25 | Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation. Highlight: The key limitation lies in the prevalent data organization paradigm, chunk, which commonly divides documents into fixed-size segments, and disrupts the logical coherence and connections within the context. To address this, we propose THREAD, a novel data organization paradigm enabling systems to handle how-to questions more effectively. | Kaikai An; Fangkai Yang; Liqun Li; Junting Lu; Sitao Cheng; Shuzheng Si; Lu Wang; Pu Zhao; Lele Cao; Qingwei Lin; Saravan Rajmohan; Dongmei Zhang; Baobao Chang |
| 26 | UltraIF: Advancing Instruction Following from The Wild. Highlight: To bridge the gap, we propose a simple and scalable approach, UltraIF, for building LLMs that can follow complex instructions with open-source data. | Kaikai An; Li Sheng; Ganqu Cui; Shuzheng Si; Ning Ding; Yu Cheng; Baobao Chang |
| 27 | AutoPenBench: A Vulnerability Testing Benchmark for Generative Agents. Highlight: We introduce milestones per task, allowing the comparison of intermediate steps where agents struggle. | Luca Gioacchini; Alexander Delsanto; Idilio Drago; Marco Mellia; Giuseppe Siracusano; Roberto Bifulco |
| 28 | When Life Gives You Samples: The Benefits of Scaling Up Inference Compute for Multilingual LLMs. Highlight: Our findings highlight the need for tailored sampling and selection strategies. We propose novel solutions tailored for this multi-faceted inference scenario, demonstrating notable gains across languages and tasks. | Ammar Khairi; Daniel D’souza; Ye Shen; Julia Kreutzer; Sara Hooker |
| 29 | M-LongDoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework. Highlight: In this work, we introduce M-LongDoc, a benchmark of 851 samples, and an automated framework to evaluate the performance of large multimodal models. | Yew Ken Chia; Liying Cheng; Hou Pong Chan; Maojia Song; Chaoqun Liu; Mahani Aljunied; Soujanya Poria; Lidong Bing |
| 30 | FLSA: Learning Semantic Structures in Document Collections Using Foundation Models. Highlight: We introduce fLSA, a foundation-model-based Latent Semantic Analysis method that iteratively clusters and tags document segments based on document-level contexts. | Weijia Xu; Nebojsa Jojic; Nicolas Le Roux |
| 31 | Not-Just-Scaling Laws: Towards A Better Understanding of The Downstream Impact of Language Model Design Decisions. Highlight: We find that by incorporating features besides model size and number of training tokens, we can achieve a relative 3-28% increase in ability to predict downstream performance compared with using scale alone. | Emmy Liu; Amanda Bertsch; Lintang Sutawika; Lindia Tjuatja; Patrick Fernandes; Lara Marinov; Michael Chen; Shreya Singhal; Carolin Lawrence; Aditi Raghunathan; Kiril Gashteovski; Graham Neubig |
| 32 | WebInject: Prompt Injection Attack to Web Agents. Highlight: In this work, we propose WebInject, a prompt injection attack that manipulates the webpage environment to induce a web agent to perform an attacker-specified action. | Xilong Wang; John Bloch; Zedian Shao; Yuepeng Hu; Shuyan Zhou; Neil Zhenqiang Gong |
| 33 | Textual Aesthetics in Large Language Models. Highlight: In this work, we introduce a pipeline for aesthetics polishing and help construct a textual aesthetics dataset named TEXAES. | Lingjie Jiang; Shaohan Huang; Xun Wu; Furu Wei |
| 34 | PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving. Highlight: However, leveraging such planning structures during post-training to boost the performance of smaller open-source LLMs remains underexplored. Motivated by this, we introduce PLAN-TUNING, a unified post-training framework that (i) distills synthetic task decompositions (termed “planning trajectories”) from large-scale LLMs and (ii) fine-tunes smaller models via supervised and reinforcement-learning objectives designed to mimic these planning processes to improve complex reasoning. | Mihir Parmar; Palash Goyal; Xin Liu; Yiwen Song; Mingyang Ling; Chitta Baral; Hamid Palangi; Tomas Pfister |
| 35 | BYOKG-RAG: Multi-Strategy Graph Retrieval for Knowledge Graph Question Answering. Highlight: We introduce BYOKG-RAG, a framework that enhances KGQA by synergistically combining LLMs with specialized graph retrieval tools. | Costas Mavromatis; Soji Adeshina; Vassilis N. Ioannidis; Zhen Han; Qi Zhu; Ian Robinson; Bryan Thompson; Huzefa Rangwala; George Karypis |
| 36 | GRPO-LEAD: A Difficulty-Aware Reinforcement Learning Approach for Concise Mathematical Reasoning in Language Models. Highlight: We propose GRPO-LEAD, enhancing GRPO with: (1) length-regularized rewards to encourage conciseness while maintaining accuracy; (2) explicit penalties for incorrect solutions to improve model precision; and (3) difficulty-aware advantage reweighting for robust generalization on challenging problems. | Jixiao Zhang; Chunsheng Zuo |
| 37 | LaMP-QA: A Benchmark for Personalized Long-form Question Answering. Highlight: This is mainly due to a lack of resources for training and evaluating personalized question answering systems. We address this gap by introducing LaMP-QA—a benchmark designed for evaluating personalized long-form answer generation. | Alireza Salemi; Hamed Zamani |
| 38 | From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs. Highlight: In this paper, we systematically investigate WVS-based training for cultural value adaptation and find that relying solely on survey data can homogenize cultural norms and interfere with factual knowledge. | Farid Adilazuarda; Chen Cecilia Liu; Iryna Gurevych; Alham Fikri Aji |
| 39 | VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing. Highlight: We introduce VoiceCraft-X, an autoregressive neural codec language model which unifies multilingual speech editing and zero-shot text-to-speech (TTS) synthesis across 11 languages: English, Mandarin, Korean, Japanese, Spanish, French, German, Dutch, Italian, Portuguese, and Polish. | Zhisheng Zheng; Puyuan Peng; Anuj Diwan; Cong Phuoc Huynh; Xiaohang Sun; Zhu Liu; Vimal Bhat; David Harwath |
| 40 | All Roads Lead to Rome: Graph-Based Confidence Estimation for Large Language Model Reasoning. Highlight: Existing methods are primarily designed for factual QA tasks and often fail to generalize to reasoning tasks. To address this gap, we propose a set of training-free, graph-based confidence estimation methods tailored to reasoning tasks. | Caiqi Zhang; Chang Shu; Ehsan Shareghi; Nigel Collier |
| 41 | ActionStudio: A Lightweight Framework for Data and Training of Large Action Models. Highlight: We introduce ActionStudio, a lightweight and extensible data and training framework designed for large action models. | Jianguo Zhang; Thai Quoc Hoang; Ming Zhu; Zuxin Liu; Shiyu Wang; Tulika Manoj Awalgaonkar; Akshara Prabhakar; Haolin Chen; Weiran Yao; Zhiwei Liu; Juntao Tan; Juan Carlos Niebles; Shelby Heinecke; Huan Wang; Silvio Savarese; Caiming Xiong |
| 42 | GLIMPSE: Do Large Vision-Language Models Truly Think With Videos or Just Glimpse at Them? Highlight: This limits our ability to assess whether large vision-language models (LVLMs) can truly think with videos rather than perform superficial frame-level analysis. To address this, we introduce GLIMPSE, a benchmark specifically designed to evaluate whether LVLMs can genuinely think with videos. | Yiyang Zhou; Linjie Li; Shi Qiu; Zhengyuan Yang; Yuyang Zhao; Siwei Han; Yangfan He; Kangqi Li; Haonian Ji; Zihao Zhao; Haibo Tong; Lijuan Wang; Huaxiu Yao |
| 43 | Context Is Gold to Find The Gold Passage: Evaluating and Training Contextual Document Embeddings. Highlight: Our results show that state-of-the-art embedding models struggle in retrieval scenarios where context is required. To address this limitation, we propose InSeNT (In-sequence Negative Training), a novel contrastive post-training approach which, combined with late chunking pooling, enhances contextual representation learning while preserving computational efficiency. | Max Conti; Manuel Faysse; Gautier Viaud; Antoine Bosselut; Celine Hudelot; Pierre Colombo |
| 44 | UniEDU: Toward Unified and Efficient Large Multimodal Models for Educational Tasks. Highlight: In this paper, we propose a unified language and vision assistant, UniEDU, designed for various educational applications, including knowledge recommendation, knowledge tracing, time cost prediction, and user answer prediction, all within a single model. | Zhendong Chu; Jian Xie; Shen Wang; Zichao Wang; Qingsong Wen |
| 45 | Calibrating Verbal Uncertainty As A Linear Feature to Reduce Hallucinations. Highlight: Achieving the ability to express in language the actual degree of uncertainty around a claim is therefore of great importance. We find that “verbal uncertainty” is governed by a single linear feature in the representation space of LLMs, and show that this has only moderate correlation with the actual “semantic uncertainty” of the model. | Ziwei Ji; Lei Yu; Yeskendir Koishekenov; Yejin Bang; Anthony Hartshorn; Alan Schelten; Cheng Zhang; Pascale Fung; Nicola Cancedda |
| 46 | EasyRec: Simple Yet Effective Language Models for Recommendation. Highlight: Inspired by the success of language models (LMs) and their robust generalization capabilities, we pose the question: How can we leverage language models to enhance recommender systems? | Xubin Ren; Chao Huang |
| 47 | Media Source Matters More Than Content: Unveiling Political Bias in LLM-Generated Citations. Highlight: Through systematic evaluations, we find that LLMs exhibit a consistent tendency to cite left-leaning sources at notably higher rates compared to traditional retrieval systems (e.g., BM25 and dense retrievers). | Sunhao Dai; Zhanshuo Cao; Wenjie Wang; Liang Pang; Jun Xu; See-Kiong Ng; Tat-Seng Chua |
| 48 | Cost-Optimal Grouped-Query Attention for Long-Context Modeling. Highlight: In this work, we analyze the relationship among context length, model size, GQA configuration, and model loss, and introduce two innovations: (1) we decouple the total head size from the hidden size, enabling more flexible control over attention FLOPs; and (2) we jointly optimize the model size and the GQA configuration to arrive at a better allocation of inference resources between attention layers and other components. | Yingfa Chen; Yutong Wu; Chenyang Song; Zhen Leng Thai; Xingyu Shen; Xu Han; Zhiyuan Liu; Maosong Sun |
| 49 | R-PRM: Reasoning-Driven Process Reward Modeling. Highlight: This limitation is further compounded by the scarcity of annotated data. To address these issues, we propose Reasoning-Driven Process Reward Modeling (R-PRM), which activates inherent reasoning to enhance process-level evaluation. | Shuaijie She; Junxiao Liu; Yifeng Liu; Jiajun Chen; Xin Huang; Shujian Huang |
| 50 | Demystifying Domain-adaptive Post-training for Financial LLMs. Highlight: However, significant challenges remain in identifying optimal adaptation criteria and training strategies across varying data and model configurations. To address these challenges, we introduce FINDAP, a systematic and fine-grained investigation into domain-adaptive post-training of LLMs for the finance domain. | Zixuan Ke; Yifei Ming; Xuan-Phi Nguyen; Caiming Xiong; Shafiq Joty |
| 51 | GATEAU: Selecting Influential Samples for Long Context Alignment. Highlight: Thus, we propose GATEAU, a novel framework to address the unique challenge of long context alignment by identifying the influential samples enriched with long-range dependency relations. | Shuzheng Si; Haozhe Zhao; Gang Chen; Yunshui Li; Kangyang Luo; Chuancheng Lv; Kaikai An; Fanchao Qi; Baobao Chang; Maosong Sun |
| 52 | MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning. Highlight: However, refinement faces three key challenges: (1) Excessive refinement: Uniformly refining all instances can cause over-correction and reduce overall performance. (2) Inability to localize and address errors: LLMs struggle to identify and correct their own mistakes. (3) Insufficient refinement: Stopping refinement too soon could leave errors unaddressed. To tackle these issues, we propose MAgICoRe, a framework for Multi-Agent Iteration for Coarse-to-fine Refinement. | Justin Chen; Archiki Prasad; Swarnadeep Saha; Elias Stengel-Eskin; Mohit Bansal |
| 53 | Grounding Multilingual Multimodal LLMs With Cultural Knowledge. Highlight: Multimodal Large Language Models excel in high-resource settings, but often misinterpret long-tail cultural entities and underperform in low-resource languages. To address this gap, we propose a data-centric approach that directly grounds MLLMs in cultural knowledge. | Jean De Dieu Nyandwi; Yueqi Song; Simran Khanuja; Graham Neubig |
| 54 | FuseChat: Knowledge Fusion of Chat Models. Highlight: In this work, we propose a new framework for the knowledge fusion of chat LLMs through two main stages, resulting in FuseChat. | Fanqi Wan; Longguang Zhong; Ziyi Yang; Ruijun Chen; Xiaojun Quan |
| 55 | Demystifying Synthetic Data in LLM Pre-training: A Systematic Study of Scaling Laws, Benefits, and Pitfalls. Highlight: Specifically, we found pre-training on rephrased synthetic data alone is not faster than pre-training on natural web texts; while pre-training on 1/3 rephrased synthetic data mixed with 2/3 natural web texts can speed up 5-10x (to reach the same validation loss) at larger data budgets. | Feiyang Kang; Newsha Ardalani; Michael Kuchnik; Youssef Emad; Mostafa Elhoushi; Shubhabrata Sengupta; Shang-Wen Li; Ramya Raghavendra; Ruoxi Jia; Carole-Jean Wu |
| 56 | From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel Corpora. Highlight: In this paper, we introduce a large-scale, high-quality multi-way parallel corpus, TED2025, based on TED Talks. | Yingli Shen; Wen Lai; Shuo Wang; Ge Gao; Kangyang Luo; Alexander Fraser; Maosong Sun |
| 57 | Exploring Response Uncertainty in MLLMs: An Empirical Evaluation Under Misleading Scenarios. Highlight: We reveal a response uncertainty phenomenon: across nine standard datasets, twelve state-of-the-art open-source MLLMs overturn a previously correct answer in 65% of cases after receiving a single deceptive cue. To systematically quantify this vulnerability, we propose a two-stage evaluation pipeline: (1) elicit each model’s original response on unperturbed inputs; (2) inject explicit (false-answer hints) and implicit (contextual contradictions) misleading instructions, and compute the misleading rate—the fraction of correct-to-incorrect flips (a minimal sketch of this metric appears after the table). | Yunkai Dang; Mengxi Gao; Yibo Yan; Xin Zou; Yanggan Gu; Jungang Li; Jingyu Wang; Peijie Jiang; Aiwei Liu; Jia Liu; Xuming Hu |
| 58 | ProCut: LLM Prompt Compression Via Attribution Estimation. Highlight: This expansion leads to bloated prompts that are difficult to maintain and incur significant inference latency and serving costs. To address this, we introduce Prompt Compression via Attribution Estimation (ProCut), a flexible, LLM-agnostic, training-free framework that compresses prompts through attribution analysis. | Zhentao Xu; Fengyi Li; Albert C. Chen; Xiaofeng Wang |
| 59 | MR. Judge: Multimodal Reasoner As A Judge. Highlight: In this work, we propose Multimodal Reasoner as a Judge (MR. Judge), a paradigm for empowering general-purpose MLLMs judges with strong reasoning capabilities. | Renjie Pi; Haoping Bai; Qibin Chen; Xiaoming Simon Wang; Jiulong Shan; Xiaojiang Liu; Meng Cao |
| 60 | LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation. Highlight: We introduce LiteASR, a low-rank compression scheme for ASR encoders that significantly reduces inference costs while maintaining transcription accuracy. | Keisuke Kamahori; Jungo Kasai; Noriyuki Kojima; Baris Kasikci |
| 61 | An Address Intelligence Framework for E-commerce Deliveries. Highlight: In this two-part study, we first outline the construction of a language model to assist customers with address standardization, and in the latter part, we detail a novel Pareto-ensemble multi-task prediction algorithm that derives critical insights from customer addresses to minimize operational losses arising from a given geographical area. | Gokul Swamy; Aman Gulati; Srinivas Virinchi; Anoop Saladi |
| 62 | Pointing to A Llama and Call It A Camel: On The Sycophancy of Multimodal Large Language Models. Highlight: However, we find that this approach also makes the MLLM overly resistant to corrective instructions (i.e., stubborn even if it is wrong). To alleviate this trade-off, we propose Sycophantic Reflective Tuning (SRT), which enables the MLLM to engage in reflective reasoning, allowing it to determine whether a user’s instruction is misleading or corrective before drawing a conclusion. | Renjie Pi; Kehao Miao; Li Peihang; Runtao Liu; Jiahui Gao; Jipeng Zhang; Xiaofang Zhou |
| 63 | On LLM-Based Scientific Inductive Reasoning Beyond Equations. Highlight: Inspired by the parallels between inductive reasoning and human scientific discovery, we propose the task of LLM-Based Scientific Inductive Reasoning Beyond Equations and introduce a new benchmark, SIRBench-V1, to evaluate the inductive reasoning abilities of LLMs in scientific settings. | Brian S. Lin; Jiaxin Yuan; Zihan Zhou; Shouli Wang; Shuo Wang; Cunliang Kong; Qi Shi; Yuxuan Li; Liner Yang; Zhiyuan Liu; Maosong Sun |
| 64 | LLM Agents Implement An NLG System from Scratch: Building Interpretable Rule-Based RDF-to-Text Generators. Highlight: We present a novel neurosymbolic framework for RDF-to-text generation, in which the model is “trained” through collaborative interactions among multiple LLM agents rather than traditional backpropagation. | Mateusz Lango; Ondrej Dusek |
| 65 | Hidden in Plain Sight: Reasoning in Underspecified and Misspecified Scenarios for Multimodal LLMs. Highlight: In such cases, success hinges not merely on task execution, but on the model’s ability to detect when something is silently wrong. This paper presents a systematic analysis of how current MLLMs handle such underspecified and misspecified scenarios: cases where flaws must be inferred from context rather than explicitly stated. | Qianqi Yan; Hongquan Li; Shan Jiang; Yang Zhao; Xinze Guan; Ching-Chen Kuo; Xin Eric Wang |
| 66 | ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval. Highlight: However, existing methods for multimodal document retrieval often replicate techniques developed for text-only retrieval, whether in how they encode documents, define training objectives, or compute similarity scores. To address these limitations, we present ColMate, a document retrieval model that bridges the gap between multimodal representation learning and document retrieval. | Ahmed Masry; Megh Thakkar; Patrice Bechard; Sathwik Tejaswi Madhusudhan; Rabiul Awal; Shambhavi Mishra; Akshay Kalkunte Suresh; Srivatsava Daruru; Enamul Hoque; Spandana Gella; Torsten Scholak; Sai Rajeswar |
| 67 | Search-o1: Agentic Search-Enhanced Large Reasoning Models. Highlight: However, their extended reasoning processes often suffer from knowledge insufficiency, leading to frequent uncertainties and potential errors. To address this limitation, we introduce **Search-o1**, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module for refining retrieved documents. | Xiaoxi Li; Guanting Dong; Jiajie Jin; Yuyao Zhang; Yujia Zhou; Yutao Zhu; Peitian Zhang; Zhicheng Dou |
| 68 | Router-Tuning: A Simple and Effective Approach for Dynamic Depth. Highlight: In response to the first issue, we propose Router-Tuning, which fine-tunes only the routers on a small dataset, drastically reducing the computational overhead associated with full model training. | Shwai He; Tao Ge; Guoheng Sun; Bowei Tian; Xiaoyang Wang; Dong Yu |
| 69 | P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs. Highlight: Previous assessments often limited their scope to fundamental natural language processing (NLP) or isolated capability-specific tasks. To alleviate this drawback, we aim to present a comprehensive multilingual multitask benchmark. | Yidan Zhang; Yu Wan; Boyi Deng; Baosong Yang; Hao-Ran Wei; Fei Huang; Bowen Yu; Dayiheng Liu; Junyang Lin; Fei Huang; Jingren Zhou |
| 70 | SRS-Stories: Vocabulary-constrained Multilingual Story Generation for Language Learning. Highlight: In this paper, we use large language models to generate personalized stories for language learners, using only the vocabulary they know. | Wiktor Kamzela; Mateusz Lango; Ondrej Dusek |
| 71 | LM-Searcher: Cross-domain Neural Architecture Search with LLMs Via Unified Numerical Encoding. Highlight: In this work, we propose LM-Searcher, a novel framework that leverages LLMs for cross-domain neural architecture optimization without the need for extensive domain-specific adaptation. | Yuxuan Hu; Jihao Liu; Ke Wang; Jinliang Zheng; Weikang Shi; Manyuan Zhang; Qi Dou; Rui Liu; Aojun Zhou; Hongsheng Li |
| 72 | Logits-Based Finetuning. Highlight: However, traditional Supervised Fine-Tuning (SFT), which relies on singular ground truth labels, often fails to capture token-level dependencies and linguistic diversity. To address these limitations, we propose a logits-based fine-tuning framework that integrates the strengths of supervised learning and knowledge distillation. | Jingyao Li; Senqiao Yang; Sitong Wu; Han Shi; Chuanyang Zheng; Hong Xu; Jiaya Jia |
| 73 | CIE: Controlling Language Model Text Generations Using Continuous Signals. Highlight: In this work, we are interested in continuous control signals, ones that exist along a spectrum that can’t easily be captured in a natural language prompt or via existing techniques in conditional generation. | Vinay Samuel; Harshita Diddee; Yiming Zhang; Daphne Ippolito |
| 74 | WebEvolver: Enhancing Web Agent Self-Improvement with Co-evolving World Model. Highlight: To improve the effectiveness of self-improvement, we propose a novel framework that introduces a co-evolving World Model LLM. | Tianqing Fang; Hongming Zhang; Zhisong Zhang; Kaixin Ma; Wenhao Yu; Haitao Mi; Dong Yu |
| 75 | SimulatorArena: Are User Simulators Reliable Proxies for Multi-Turn Evaluation of AI Assistants? Highlight: However, there is no benchmark or systematic study to evaluate whether these simulated users are reliable stand-ins for real users. To address this, we introduce SimulatorArena, a benchmark of 909 annotated human–LLM conversations on two interactive tasks—math tutoring and document creation. | Yao Dou; Michel Galley; Baolin Peng; Chris Kedzie; Weixin Cai; Alan Ritter; Chris Quirk; Wei Xu; Jianfeng Gao |
| 76 | Corrupted But Not Broken: Understanding and Mitigating The Negative Impacts of Corrupted Data in Visual Instruction Tuning. Highlight: Previous approaches to address these challenges have focused on refining datasets through high-quality data collection or rule-based filtering that can be costly or limited in scope. In this paper, we conduct a systematic investigation into the impact of corrupted data on MLLMs and discover that, although corrupted data degrade model performance, such adverse effects are largely reversible, and MLLMs are corrupted but not broken. | Yunhao Gou; Hansi Yang; Zhili Liu; Kai Chen; Yihan Zeng; Lanqing Hong; Zhenguo Li; Qun Liu; Bo Han; James Kwok; Yu Zhang |
| 77 | Position: LLMs Can Be Good Tutors in English Education. Highlight: While recent efforts have begun integrating large language models (LLMs) into English education, they often rely on traditional approaches to learning tasks without fully embracing educational methodologies, thus lacking adaptability to language learning. To address this gap, we argue that **LLMs have the potential to serve as effective tutors in English Education**. | Jingheng Ye; Shen Wang; Deqing Zou; Yibo Yan; Kun Wang; Hai-Tao Zheng; Ruitong Liu; Zenglin Xu; Irwin King; Philip S. Yu; Qingsong Wen |
| 78 | Precise In-Parameter Concept Erasure in Large Language Models. Highlight: Existing approaches for removing such knowledge rely on fine-tuning, training low-rank adapters, or fact-level editing, but these are either too coarse, too shallow, or ineffective. In this work, we propose PISCES, a novel framework for precisely erasing entire concepts from model parameters by directly editing directions that encode them in parameter space. | Yoav Gur-Arieh; Clara Haya Suslik; Yihuai Hong; Fazl Barez; Mor Geva |
| 79 | Intrinsic Test of Unlearning Using Parametric Knowledge Traces. Highlight: To this end, we propose a general evaluation methodology that uses vocabulary projections to inspect concepts encoded in model parameters. | Yihuai Hong; Lei Yu; Haiqin Yang; Shauli Ravfogel; Mor Geva |
| 80 | Alignment with Fill-In-the-Middle for Enhancing Code Generation. Highlight: In this paper, we propose a novel approach that splits code snippets into smaller, granular blocks, creating more diverse DPO pairs from the same test cases. | Houxing Ren; Zimu Lu; Weikang Shi; Haotian Hou; Yunqiao Yang; Ke Wang; Aojun Zhou; Junting Pan; Mingjie Zhan; Hongsheng Li |
| 81 | OWL: Probing Cross-Lingual Recall of Memorized Texts Via World Literature. Highlight: This paper investigates multilingual and cross-lingual memorization in LLMs, probing if memorized content in one language (e.g., English) can be recalled when presented in translation. | Alisha Srivastava; Emir Kaan Korukluoglu; Minh Nhat Le; Duyen Tran; Chau Minh Pham; Marzena Karpinska; Mohit Iyyer |
| 82 | Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training. Highlight: However, in cybersecurity, we have noticed a lack of open-source datasets, particularly high-quality cybersecurity pretraining corpora, even though much research indicates that LLMs acquire their knowledge during pretraining. To address this, we present a comprehensive suite of datasets covering all major training stages, including pretraining, instruction fine-tuning, and reasoning distillation with cybersecurity-specific self-reflection data. | Yao-Ching Yu; Tsun-Han Chiang; Cheng-Wei Tsai; Chien-Ming Huang; Wen-Kwang Tsao |
| 83 | Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance. Highlight: Large language model (LLM) agents often struggle in environments where rules and required domain knowledge frequently change, such as regulatory compliance and user risk screening. To address this limitation, we propose the Adaptive Reflective Interactive Agent (ARIA), an LLM agent framework designed specifically to continuously learn updated domain knowledge at test time. | Yufei He; Ruoyu Li; Alex Chen; Yue Liu; Yulin Chen; Yuan Sui; Cheng Chen; Yi Zhu; Luca Luo; Frank Yang; Bryan Hooi |
| 84 | Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning. Highlight: Their language is often rigid and mechanical, lacking the human-like qualities essential for patient trust. To address these challenges, we propose ***Ask Patients with Patience (APP)***, a multi-turn LLM-based medical assistant designed for grounded reasoning, transparent diagnoses, and human-centric interaction. | Jiayuan Zhu; Jiazhen Pan; Yuyuan Liu; Fenglin Liu; Junde Wu |
| 85 | LightThinker: Thinking Step-by-Step Compression. Highlight: In this paper, we propose LightThinker, a novel method that enables LLMs to dynamically compress intermediate thoughts during reasoning. | Jintian Zhang; Yuqi Zhu; Mengshu Sun; Yujie Luo; Shuofei Qiao; Lun Du; Da Zheng; Huajun Chen; Ningyu Zhang |
| 86 | RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions. Highlight: However, current RAG methods exhibit limited capabilities in complex RAG scenarios and suffer from limited task diversity. To address these limitations, we propose RAG-Instruct, a general method for synthesizing diverse and high-quality RAG instruction data based on any source corpus. | Wanlong Liu; Junying Chen; Ke Ji; Li Zhou; Wenyu Chen; Benyou Wang |
| 87 | ViLBench: A Suite for Vision-Language Process Reward Modeling. Highlight: To further advance evaluation, we introduce ViLBench, a vision-language benchmark designed to require intensive process reward signals. | Haoqin Tu; Weitao Feng; Hardy Chen; Hui Liu; Xianfeng Tang; Cihang Xie |
| 88 | Constructions Are Revealed in Word Distributions. Highlight: This requires computable models of the distribution over strings—namely, pretrained language models (PLMs). Here, we treat a RoBERTa model as a proxy for this distribution and hypothesize that constructions will be revealed within it as patterns of statistical affinity. | Joshua Rozner; Leonie Weissweiler; Kyle Mahowald; Cory Shain |
| 89 | BabyLM’s First Constructions: Causal Interventions Provide A Signal of Learning. Highlight: Here we use Rozner et al.’s methods to evaluate construction learning in masked language models from the 2024 BabyLM Challenge. | Joshua Rozner; Leonie Weissweiler; Cory Shain |
| 90 | SafeScientist: Enhancing AI Scientist Safety for Risk-Aware Scientific Discovery. Highlight: Recent advancements in large language model (LLM) agents have significantly accelerated scientific discovery automation, yet concurrently raised critical ethical and safety concerns. To systematically address these challenges, we introduce **SafeScientist**, an innovative AI scientist framework explicitly designed to enhance safety and ethical responsibility in AI-driven scientific exploration. | Kunlun Zhu; Jiaxun Zhang; Ziheng Qi; Nuoxing Shang; Zijia Liu; Peixuan Han; Yue Su; Haofei Yu; Jiaxuan You |
| 91 | Does Quantization Affect Models’ Performance on Long-context Tasks? Highlight: In this work, we present the first systematic evaluation of quantized LLMs on tasks with long inputs (≥64K tokens) and long-form outputs. | Anmol Mekala; Anirudh Atmakuru; Yixiao Song; Marzena Karpinska; Mohit Iyyer |
| 92 | UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models. Highlight: To address this problem, we propose UNCERTAINTY-LINE (Length-INvariant Estimation), a simple debiasing procedure that regresses uncertainty scores on output length and uses the residuals as corrected, length-invariant estimates (a minimal sketch appears after the table). | Roman Vashurin; Maiya Goloburda; Preslav Nakov; Maxim Panov |
| 93 | Scaling Rich Style-Prompted Text-to-Speech Datasets. Highlight: We introduce Paralinguistic Speech Captions (ParaSpeechCaps), a large-scale dataset that annotates speech utterances with rich style captions. | Anuj Diwan; Zhisheng Zheng; David Harwath; Eunsol Choi |
| 94 | Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs. Highlight: Motivated by the observation that self-consistent errors often differ across LLMs, we propose a simple but effective cross-model probe method that fuses hidden state evidence from an external verifier LLM. | Hexiang Tan; Fei Sun; Sha Liu; Du Su; Qi Cao; Xin Chen; Jingang Wang; Xunliang Cai; Yuanzhuo Wang; Huawei Shen; Xueqi Cheng |
| 95 | SciRIFF: A Resource to Enhance Language Model Instruction-Following Over Scientific Literature. Highlight: We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following instances for training and evaluation, covering 54 tasks. | David Wadden; Kejian Shi; Jacob Morrison; Alan Li; Aakanksha Naik; Shruti Singh; Nitzan Barzilay; Kyle Lo; Tom Hope; Luca Soldaini; Shannon Zejiang Shen; Doug Downey; Hannaneh Hajishirzi; Arman Cohan |
| 96 | Sparse Autoencoder Features for Classifications and Transferability. Highlight: We systematically analyze sparse autoencoders (SAEs) for interpretable feature extraction from LLMs in safety-critical classification tasks. | Jack Gallifant; Shan Chen; Kuleen Sasse; Hugo Aerts; Thomas Hartvigsen; Danielle Bitterman |
| 97 | Agent Vs. Agent: Automated Data Generation and Red-Teaming for Custom Agentic Workflows. Highlight: We introduce a dual-component automated red-teaming framework: AgentHarm-Gen generates adversarial tasks and evaluation functions tailored to arbitrary toolsets, while Red-Agent-Reflect employs iterative prompt refinement with self-reflection to develop progressively more effective attacks. | Ninad Kulkarni; Xian Wu; Siddharth Varia; Dmitriy Bespalov |
| 98 | VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models. Highlight: To analyze explicit bias, we directly pose questions to VLMs related to gender and racial differences: (1) Multiple-choice questions based on a given image (e.g., “What is the education level of the person in the image?”) | Jen-tse Huang; Jiantong Qin; Jianping Zhang; Youliang Yuan; Wenxuan Wang; Jieyu Zhao |
| 99 | Causal Interventions Reveal Shared Structure Across English Filler–Gap Constructions. Highlight: Language Models (LMs) have emerged as powerful sources of evidence for linguists seeking to develop theories of syntax. In this paper, we argue that causal interpretability methods, applied to LMs, can greatly enhance the value of such evidence by helping us characterize the abstract mechanisms that LMs learn to use. | Sasha Boguraev; Christopher Potts; Kyle Mahowald |
| 100 | WebAgent-R1: Training Web Agents Via End-to-End Multi-Turn Reinforcement Learning. Highlight: In this work, we present WebAgent-R1, a simple yet effective end-to-end multi-turn RL framework for training web agents. | Zhepei Wei; Wenlin Yao; Yao Liu; Weizhi Zhang; Qin Lu; Liang Qiu; Changlong Yu; Puyang Xu; Chao Zhang; Bing Yin; Hyokun Yun; Lihong Li |
| 101 | QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models. Highlight: Inspired by these findings, we propose a new zero-shot prompting method, Question-Guided Chain-of-Captions (QG-CoC), a generalized prompting approach that effectively handles problems with an arbitrary number of images. | Kuei-Chun Kao; Hsu Tzu-Yin; Yunqi Hong; Ruochen Wang; Cho-Jui Hsieh |
| 102 | Socratic-MCTS: Test-Time Visual Reasoning By Asking The Right Questions. Highlight: Should we simply abandon them, or is there hope for a search mechanism that can elicit hidden knowledge and induce long reasoning traces—without any additional training or supervision? In this paper, we explore this possibility using a Monte Carlo Tree Search (MCTS)-inspired algorithm, which injects subquestion–subanswer pairs into the model’s output stream. | David Acuna; Ximing Lu; Jaehun Jung; Hyunwoo Kim; Amlan Kar; Sanja Fidler; Yejin Choi |
| 103 | Culture Cartography: Mapping The Landscape of Cultural Knowledge. Highlight: The process would benefit from mixed-initiative collaboration, where users guide the process to meaningfully reflect their cultures, and LLMs steer the process to meet the researcher’s goals. We propose CultureCartography as a methodology that operationalizes this mixed-initiative vision. | Caleb Ziems; William Barr Held; Jane Yu; Amir Goldberg; David Grusky; Diyi Yang |
| 104 | Step-level Verifier-guided Hybrid Test-Time Scaling for Large Language Models. Highlight: In this paper, we focus on training-free TTS methods for reasoning. | Kaiyan Chang; Yonghao Shi; Chenglong Wang; Hang Zhou; Chi Hu; Xiaoqian Liu; Yingfeng Luo; Yuan Ge; Tong Xiao; JingBo Zhu |
| 105 | Collaborative Beam Search: Enhancing LLM Reasoning Via Collective Consensus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While parallel inference-time scaling methods, such as step-level beam search, offer a promising solution, existing approaches typically depend on either domain-specific external verifiers, or self-evaluation which is brittle and prompt-sensitive. To address these issues, we propose Collaborative Beam Search (CBS), an iterative framework that harnesses the collective intelligence of multiple LLMs across both generation and verification stages. |
Yangyifan Xu; Shuo Ren; Jiajun Zhang; |
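Entry 105's collective consensus is easy to mock up: several models propose next steps, every model scores every candidate, and the beam keeps the candidates with the highest summed score. The sketch below is a minimal illustration with hypothetical `propose` and `vote` stubs in place of real LLM calls.

```python
def propose(model_id: int, prefix: str) -> list[str]:
    # Hypothetical stub: each model proposes next-step continuations.
    return [f"{prefix}->m{model_id}s{i}" for i in range(2)]

def vote(model_id: int, candidate: str) -> float:
    # Hypothetical stub: each model scores a candidate (e.g., via logprobs).
    return (hash((model_id, candidate)) % 100) / 100.0

def collaborative_beam_search(models: list[int], depth: int = 3, beam: int = 2) -> str:
    prefixes = ["start"]
    for _ in range(depth):
        candidates = {c for p in prefixes for m in models for c in propose(m, p)}
        # Collective consensus: rank candidates by the sum of all models' votes,
        # avoiding both external verifiers and brittle single-model self-evaluation.
        ranked = sorted(candidates, key=lambda c: sum(vote(m, c) for m in models), reverse=True)
        prefixes = ranked[:beam]
    return prefixes[0]

print(collaborative_beam_search(models=[0, 1, 2]))
```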
| 106 | From Language to Cognition: How LLMs Outgrow The Human Language Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we find that brain alignment tracks the development of formal linguistic competence—i.e., knowledge of linguistic rules—more closely than functional linguistic competence. |
Badr AlKhamissi; Greta Tuckute; Yingtian Tang; Taha Osama A Binhuraib; Antoine Bosselut; Martin Schrimpf; |
| 107 | Personality Vector: Modulating Personality of Large Language Models By Model Merging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel method for personality modulation in LLMs via model merging. |
Seungjong Sun; Seo Yeon Baek; Jang Hyun Kim; |
| 108 | EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The rise of LLM-driven AI characters raises safety concerns, particularly for vulnerable human users with psychological disorders. To address these risks, we propose EmoAgent, a multi-agent AI framework designed to evaluate and mitigate mental health hazards in human-AI interactions. |
Jiahao Qiu; Yinghui He; Xinzhe Juan; Yimin Wang; Yuhan Liu; Zixin Yao; Yue Wu; Xun Jiang; Ling Yang; Mengdi Wang; |
| 109 | CheckEval: A Reliable LLM-as-a-Judge Framework for Evaluating Text Generation Using Checklists Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We attribute this to subjective evaluation criteria combined with Likert scale scoring in existing protocols. To address this issue, we introduce CheckEval, a checklist-based evaluation framework that improves rating reliability via decomposed binary questions. |
Yukyung Lee; JoongHoon Kim; Jaehee Kim; Hyowon Cho; Jaewook Kang; Pilsung Kang; Najoung Kim; |
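The decomposition in entry 109 replaces one subjective Likert rating with many binary checks. A minimal sketch, assuming a hypothetical `llm_yes_no` judge call and an invented three-item checklist:

```python
def llm_yes_no(question: str, text: str) -> bool:
    # Hypothetical stub for an LLM judge answering one binary question.
    return "finding" in text  # placeholder logic only

CHECKLIST = [  # invented example items; real checklists are task-specific
    "Does the summary state the main finding?",
    "Is every claim supported by the source?",
    "Is the text free of repetition?",
]

def checkeval_score(text: str, checklist: list[str] = CHECKLIST) -> float:
    """Aggregate decomposed binary judgments into a single pass rate."""
    return sum(llm_yes_no(q, text) for q in checklist) / len(checklist)

print(checkeval_score("The main finding is that binary checks improve rater agreement."))
```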
| 110 | In-Context Learning Boosts Speech Recognition Via Human-like Adaptation to Speakers and Language Varieties Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a scalable framework that allows for in-context learning (ICL) in Phi-4 Multimodal (Phi-4-MM) using interleaved task prompts and audio-text pairs, and find that as few as 12 example utterances (~50 seconds) at inference time reduce word error rates by a relative 19. |
Nathan Roll; Calbert Graham; Yuka Tatsumi; Kim Tien Nguyen; Meghan Sumner; Dan Jurafsky; |
| 111 | Looking Beyond Text: Reducing Language Bias in Large Vision-Language Models Via Multimodal Dual-Attention and Soft-Image Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large vision-language models (LVLMs) have achieved impressive results in vision-language tasks, but they remain prone to language bias. Therefore, we propose LACING, designed to address such bias with a MuLtimodal DuAl-attention MeChanIsm (MDA) aNd Soft-Image Guidance (SIG). |
Haozhe Zhao; Shuzheng Si; Liang Chen; Yichi Zhang; Maosong Sun; Baobao Chang; Minjia Zhang; |
| 112 | Retrieval-augmented GUI Agents with Generative Guidelines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose RAG-GUI, a lightweight VLM that leverages web tutorials at inference time. |
Ran Xu; Kaixin Ma; Wenhao Yu; Hongming Zhang; Joyce C. Ho; Carl Yang; Dong Yu; |
| 113 | Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we examine how code serves as a structured medium for enhancing reasoning (providing verifiable execution paths, enforcing logical decomposition, and enabling runtime validation) and how advances in reasoning have transformed code intelligence from basic completion to sophisticated agents, enabling models to tackle complex software engineering tasks through deliberate planning and systematic debugging. |
Dayu Yang; Tianyang Liu; Daoan Zhang; Antoine Simoulin; Xiaoyi Liu; Yuwei Cao; Zhaopu Teng; Xin Qian; Grey Yang; Jiebo Luo; Julian McAuley; |
| 114 | SAEs Are Good for Steering – If You Select The Right Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we draw a distinction between two types of features: input features, which mainly capture patterns in the model’s input, and output features, which have a human-understandable effect on the model’s output. |
Dana Arad; Aaron Mueller; Yonatan Belinkov; |
| 115 | ConsistentChat: Building Skeleton-Guided Consistent Multi-Turn Dialogues for Large Language Models from Scratch Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current instruction data synthesis methods primarily focus on single-turn instructions and often neglect cross-turn coherence, resulting in context drift and reduced task completion rates in extended conversations. To address this limitation, we propose Skeleton-Guided Multi-Turn Dialogue Generation, a framework that constrains multi-turn instruction synthesis by explicitly modeling human conversational intent. |
Jiawei Chen; Xinyan Guan; Qianhao Yuan; Mo Guozhao; Weixiang Zhou; Yaojie Lu; Hongyu Lin; Ben He; Le Sun; Xianpei Han; |
| 116 | VerIF: Verification Engineering for Reinforcement Learning in Instruction Following Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore the verification challenge in RL for instruction following and propose VerIF, a verification method that combines rule-based code verification with LLM-based verification from a large reasoning model (e.g., QwQ-32B). |
Hao Peng; Yunjia Qi; Xiaozhi Wang; Bin Xu; Lei Hou; Juanzi Li; |
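Entry 116's recipe is a conjunction of two verifiers: exact rule-based checks for hard constraints and an LLM judge for soft ones. A minimal sketch under that reading; `llm_judge` is a hypothetical stub, and the specific constraints are invented for illustration:

```python
def rule_checks(response: str, max_words: int, must_include: str) -> bool:
    # Hard constraints (length, required keywords) verified exactly in code.
    return len(response.split()) <= max_words and must_include in response

def llm_judge(instruction: str, response: str) -> bool:
    # Hypothetical stub: a large reasoning model (e.g., QwQ-32B per the
    # highlight) would judge soft constraints such as tone or helpfulness.
    return True

def verif_reward(instruction: str, response: str) -> float:
    """RL reward: 1.0 only if both verifiers accept the response."""
    ok_rules = rule_checks(response, max_words=50, must_include="Paris")
    ok_judge = llm_judge(instruction, response)
    return float(ok_rules and ok_judge)

print(verif_reward("Name the capital of France in under 50 words.",
                   "The capital of France is Paris."))  # 1.0
```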
| 117 | A Head to Predict and A Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce pre-trained UQ heads: supervised auxiliary modules for LLMs that substantially enhance their ability to capture uncertainty compared to unsupervised UQ methods. |
Artem Shelmanov; Ekaterina Fadeeva; Akim Tsvigun; Ivan Tsvigun; Zhuohan Xie; Igor Kiselev; Nico Daheim; Caiqi Zhang; Artem Vazhentsev; Mrinmaya Sachan; Preslav Nakov; Timothy Baldwin; |
| 118 | Seeing Is Believing, But How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we conduct a comprehensive evaluation of verbalized confidence in VLMs, spanning three model categories, four task domains, and three evaluation scenarios. |
Weihao Xuan; Qingcheng Zeng; Heli Qi; Junjue Wang; Naoto Yokoya; |
| 119 | Towards A Holistic and Automated Evaluation Framework for Multi-Level Comprehension of LLMs in Book-Length Contexts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce HAMLET, a holistic and automated framework for evaluating the long-context comprehension of large language models (LLMs). |
Yuho Lee; Jiaqi Deng; Nicole Hee-Yeon Kim; Hyangsuk Min; Taewon Yun; Minjeong Ban; Kim Yul; Hwanjun Song; |
| 120 | Re-Align: Aligning Vision Language Models Via Retrieval-Augmented Direct Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Re-Align, a novel alignment framework that leverages image retrieval to construct a dual-preference dataset, effectively incorporating both textual and visual preference signals. |
Shuo Xing; Peiran Li; Yuping Wang; Ruizheng Bai; Yueqi Wang; Chan-Wei Hu; Chengxuan Qian; Huaxiu Yao; Zhengzhong Tu; |
| 121 | MoSEs: Uncertainty-Aware AI-Generated Text Detection Via Mixture of Stylistics Experts with Conditional Thresholds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Mixture of Stylistic Experts (MoSEs) framework that enables stylistics-aware uncertainty quantification through conditional threshold estimation. |
Junxi Wu; Jinpeng Wang; Zheng Liu; Bin Chen; Dongjian Hu; Hao Wu; Shu-Tao Xia; |
| 122 | Selective Preference Optimization Via Token-Level Reward Function Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Selective Preference Optimization (SePO), a novel selective alignment strategy that centers on efficient key token selection without requiring strong, fine-grained supervision signals. |
Kailai Yang; Zhiwei Liu; Qianqian Xie; Jimin Huang; Erxue Min; Sophia Ananiadou; |
| 123 | Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long Contexts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Sequential-NIAH, a benchmark specifically designed to evaluate the capability of LLMs to extract sequential information items (known as needles) from long contexts. |
Yifei Yu; Qian-Wen Zhang; Lingfeng Qiao; Di Yin; Fang Li; Jie Wang; Chen Zeng Xi; Suncong Zheng; Xiaolong Liang; Xing Sun; |
| 124 | RAVEN++: Pinpointing Fine-Grained Violations in Advertisement Videos with Active Reinforcement Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent advancements, such as the RAVEN model, have improved coarse-grained violation detection, critical gaps persist in fine-grained understanding, explainability, and generalization. To address these limitations, we propose RAVEN++, a novel framework that introduces three key innovations: 1) Active Reinforcement Learning (RL), which dynamically adapts training to samples of varying difficulty; 2) Fine-Grained Violation Understanding, achieved through hierarchical reward functions and reasoning distillation; and 3) Progressive Multi-Stage Training, which systematically combines knowledge injection, curriculum-based passive RL, and active RL. |
Deyi Ji; Yuekui Yang; Liqun Liu; Peng Shu; Haiyang Wu; Shaogang Tang; Xudong Chen; Shaoping Ma; Tianrun Chen; Lanyun Zhu; |
| 125 | Retracing The Past: LLMs Emit Training Data When They Get Lost Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Confusion-Inducing Attacks (CIA), a principled framework for extracting memorized data by systematically maximizing model uncertainty. |
Myeongseob Ko; Nikhil Reddy Billa; Adam Nguyen; Charles Fleming; Ming Jin; Ruoxi Jia; |
| 126 | Deep Associations, High Creativity: A Simple Yet Effective Metric for Evaluating Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Drawing inspiration from human creativity assessment, we propose PACE, asking LLMs to generate Parallel Chains of Associations to Evaluate their creativity. |
Ziliang Qiu; Renfen Hu; |
| 127 | PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Many existing methods for these tasks either perform task-level verification without considering constraints or apply inference-time algorithms without adapting to instance-level complexity. To address these limitations, we propose PlanGEN, a model-agnostic and easily scalable agent framework with three key components: constraint, verification, and selection agents. |
Mihir Parmar; Xin Liu; Palash Goyal; Yanfei Chen; Long Le; Swaroop Mishra; Hossein Mobahi; Jindong Gu; Zifeng Wang; Hootan Nakhost; Chitta Baral; Chen-Yu Lee; Tomas Pfister; Hamid Palangi; |
| 128 | Efficient Model Development Through Fine-tuning Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method that transfers fine-tuning updates across model versions. |
Pin-Jie Lin; Rishab Balasubramanian; Fengyuan Liu; Nikhil Kandpal; Tu Vu; |
| 129 | Beyond Input Activations: Identifying Influential Latents By Gradient Sparse Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work is built on two key hypotheses: (1) activated latents do not contribute equally to the construction of the model’s output, and (2) only latents with high influence are effective for model steering. To validate these hypotheses, we propose Gradient Sparse Autoencoder (GradSAE), a simple yet effective method that identifies the most influential latents by incorporating output-side gradient information. |
Dong Shu; Xuansheng Wu; Haiyan Zhao; Mengnan Du; Ninghao Liu; |
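Entry 129 scores latents by combining activation with output-side gradient information. For a linear readout, logit = Σ_i w_i z_i, the gradient with respect to latent z_i is just w_i, which makes the idea visible in a few lines; the numbers are invented and the linear head is a deliberate simplification:

```python
def gradsae_influence(latents: list[float], grads: list[float]) -> list[float]:
    """Score each SAE latent by activation times the output-side gradient.
    For a linear readout logit = sum_i w_i * z_i, d(logit)/dz_i = w_i."""
    return [z * g for z, g in zip(latents, grads)]

z = [0.9, 0.0, 2.0, 0.4]   # latent activations (invented)
g = [0.1, 5.0, 0.8, -1.0]  # output-side gradients (invented)

scores = gradsae_influence(z, g)
top = max(range(len(scores)), key=lambda i: abs(scores[i]))
print(scores, "-> most influential latent:", top)
# Latent 1 has a large gradient but zero activation, so zero influence;
# latent 2 wins by combining both signals, matching hypothesis (1) above.
```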
| 130 | SQLWOZ: A Realistic Task-Oriented Dialogue Dataset with SQL-Based Dialogue State Representation for Complex User Requirements Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this approach does not reflect real-life scenarios in which users may express complex constraints and preferences. To address this gap, in this paper, we propose SQLWOZ, a novel TOD dataset designed to capture complex, real-world user requirements. |
Heng-Da Xu; Xian-Ling Mao; Fanshu Sun; Tian-Yi Che; Cheng-Xin Xin; Heyan Huang; |
| 131 | Mixing Inference-time Experts for Enhancing LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods in this paradigm are limited to using only a single expert and cannot improve upon multiple reasoning aspects. To address this, we propose MIXIE, a novel inference-time expert-mixing framework that dynamically determines mixing proportions for each expert, enabling contextualized and flexible fusion. |
Soumya Sanyal; Tianyi Xiao; Xiang Ren; |
| 132 | How Persuasive Is Your Context? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this article, we introduce targeted persuasion score (TPS), designed to quantify how persuasive a given context is to an LM where persuasion is operationalized as the ability of the context to alter the LM’s answer to the question. |
Tu Nguyen; Kevin Du; Alexander Miserlis Hoyle; Ryan Cotterell; |
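Entry 132 operationalizes persuasion as the context's ability to alter the model's answer. One plausible instantiation, not necessarily the paper's exact formula, measures the shift between answer distributions with and without the context, e.g., via total variation distance:

```python
def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def persuasion_score(no_context: dict[str, float], with_context: dict[str, float]) -> float:
    """Illustrative persuasion measure: how far the context moves the
    LM's answer distribution (0 = no effect, 1 = complete flip)."""
    return total_variation(no_context, with_context)

p_no_ctx = {"yes": 0.8, "no": 0.2}  # hypothetical answer distribution
p_ctx = {"yes": 0.3, "no": 0.7}     # after prepending the context
print(persuasion_score(p_no_ctx, p_ctx))  # 0.5: a strongly persuasive context
```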
| 133 | AdaptThink: Reasoning Models Can Learn When to Think Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we first demonstrate that NoThinking, which prompts the reasoning model to skip thinking and directly generate the final solution, is a better choice for relatively simple tasks in terms of both performance and efficiency. Motivated by this, we propose AdaptThink, a novel RL algorithm to teach reasoning models to choose the optimal thinking mode adaptively based on problem difficulty. |
Jiajie Zhang; Nianyi Lin; Lei Hou; Ling Feng; Juanzi Li; |
| 134 | A Systematic Survey of Automatic Prompt Optimization Techniques Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To mitigate this, Automatic Prompt Optimization (APO) methods have recently emerged that use automated techniques to improve the performance of LLMs on various tasks. In this paper, we present a comprehensive survey summarizing the current progress and remaining challenges in this field. |
Kiran Ramnath; Kang Zhou; Sheng Guan; Soumya Smruti Mishra; Xuan Qi; Zhengyuan Shen; Shuai Wang; Sangmin Woo; Sullam Jeoung; Yawei Wang; Haozhu Wang; Han Ding; Yuzhe Lu; Zhichao Xu; Yun Zhou; Balasubramaniam Srinivasan; Qiaojing Yan; Yueyan Chen; Haibo Ding; Panpan Xu; Lin Lee Cheong; |
| 135 | Training Compute-optimal Transformer Encoder Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the first comprehensive empirical investigation of compute-optimal pretraining for encoder transformers using the Masked Language Modeling (MLM) objective. |
Megi Dervishi; Alexandre Allauzen; Gabriel Synnaeve; Yann LeCun; |
| 136 | CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through an analysis of reasoning circuits—the neural pathways LLMs use for knowledge-based inference—we find that current layer-localized KE approaches (e.g., MEMIT, WISE), which edit only one or a few model layers, inadequately integrate updated knowledge into these reasoning pathways. To address this limitation, we present CaKE (Circuit-aware Knowledge Editing), a novel method that enhances the effective integration of updated knowledge in LLMs. |
Yunzhi Yao; Jizhan Fang; Jia-Chen Gu; Ningyu Zhang; Shumin Deng; Huajun Chen; Nanyun Peng; |
| 137 | BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing agents primarily focus on enhancing the accuracy of individual actions and often lack effective mechanisms for detecting and recovering from errors. To address these shortcomings, we propose the BacktrackAgent, a robust framework that incorporates a backtracking mechanism to improve task completion efficiency. |
Qinzhuo Wu; Pengzhi Gao; Wei Liu; Jian Luan; |
| 138 | Enhancing LLM Text Detection with Retrieved Contexts and Logits Distribution Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the distinction between the LLM-generated and human-written texts may rely on only a few tokens due to the short length or insufficient information in some texts, leading to minimal and hard-to-detect differences in logit distributions. To address this, we propose HALO, an LLM-based detection method that leverages external text corpora to evaluate the difference in the logit distribution of input text under retrieved human-written and LLM-rewritten contexts. |
Zhaoheng Huang; Yutao Zhu; Ji-Rong Wen; Zhicheng Dou; |
| 139 | LATTE: Learning to Think with Vision Specialists Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose LATTE, a family of vision-language models that have LeArned to Think wiTh vision spEcialists. |
Zixian Ma; Jianguo Zhang; Zhiwei Liu; Jieyu Zhang; Juntao Tan; Manli Shu; Juan Carlos Niebles; Shelby Heinecke; Huan Wang; Caiming Xiong; Ranjay Krishna; Silvio Savarese; |
| 140 | TopicAttack: An Indirect Prompt Injection Attack Via Topic Transition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the limitations of existing attack methods, we propose **TopicAttack**, which prompts the LLM to generate a fabricated conversational transition prompt that gradually shifts the topic toward the injected instruction, making the injection smoother and enhancing the plausibility and success of the attack. |
Yulin Chen; Haoran Li; Yuexin Li; Yue Liu; Yangqiu Song; Bryan Hooi; |
| 141 | S3: You Don’t Need That Much Data to Train A Search Agent Via RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose **s3**, a lightweight, model-agnostic framework that decouples the searcher from the generator and trains the searcher using a Gain Beyond RAG reward: the improvement in generation accuracy over naïve RAG. |
Pengcheng Jiang; Xueqiang Xu; Jiacheng Lin; Jinfeng Xiao; Zifeng Wang; Jimeng Sun; Jiawei Han; |
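The Gain Beyond RAG reward in entry 141 is a simple difference: generation accuracy with the trained searcher's documents minus accuracy with naive RAG's documents, with the generator frozen. A sketch with a hypothetical `generation_accuracy` stub:

```python
def generation_accuracy(question: str, docs: list[str], gold: str) -> float:
    # Hypothetical stub: in practice, run the frozen generator on the
    # retrieved docs and score its output against the gold answer.
    return float(any(gold in d for d in docs))

def gain_beyond_rag(question: str, gold: str,
                    searcher_docs: list[str], naive_docs: list[str]) -> float:
    """Searcher's RL reward: improvement over naive RAG, generator held fixed."""
    return (generation_accuracy(question, searcher_docs, gold)
            - generation_accuracy(question, naive_docs, gold))

print(gain_beyond_rag("Who wrote Hamlet?", "Shakespeare",
                      searcher_docs=["Hamlet was written by Shakespeare."],
                      naive_docs=["Hamlet is a tragedy."]))  # 1.0
```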
| 142 | SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present SwiftKV, a novel model transformation and distillation procedure targeted at reducing the prefill compute (in FLOPs) of prompt tokens while preserving high generation quality. |
Aurick Qiao; Zhewei Yao; Samyam Rajbhandari; Yuxiong He; |
| 143 | Graders Should Cheat: Privileged Information Enables Expert-Level Automated Evaluations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For instance, today’s LMs struggle on graduate-level physics and Olympiad-level math, making them unreliable graders in these domains. We show that providing *privileged information* – such as ground-truth solutions or problem-specific guidelines – improves automated evaluations on such frontier problems. |
Jin Peng Zhou; Séb Arnold; Nan Ding; Kilian Q Weinberger; Nan Hua; Fei Sha; |
| 144 | Structuring Radiology Reports: Challenging LLMs with Lightweight Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While large language models (LLMs) have shown strong capabilities in reformatting clinical text, their high computational requirements, lack of transparency, and data privacy concerns hinder practical deployment. To address these challenges, we explore lightweight encoder-decoder models (<300M parameters)—specifically T5 and BERT2BERT—for structuring radiology reports from the MIMIC-CXR and CheXpert Plus datasets. |
Johannes Moll; Louisa Fay; Asfandyar Azhar; Sophie Ostmeier; Sergios Gatidis; Tim C. Lueth; Curtis Langlotz; Jean-Benoit Delbrouck; |
| 145 | Multi-Domain Explainability of Preferences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a fully automated method for generating local and global concept-based explanations of preferences across multiple domains. |
Nitay Calderon; Liat Ein-Dor; Roi Reichart; |
| 146 | LinguaLens: Towards Interpreting Linguistic Mechanisms of Large Language Models Via Sparse Auto-Encoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose LinguaLens, a systematic and comprehensive framework for analyzing the linguistic mechanisms of large language models, based on Sparse Auto-Encoders (SAEs). |
Yi Jing; Zijun Yao; Hongzhu Guo; Lingxu Ran; Xiaozhi Wang; Lei Hou; Juanzi Li; |
| 147 | NILE: Internal Consistency Alignment in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the effective integration and balancing of the internal knowledge of LLMs, acquired during pre-training, with existing IFT datasets remains a largely underexplored area of research. To address this gap, this work introduces NILE, a novel framework to optimize the effectiveness of IFT by adjusting IFT datasets through carefully aligning the world and internal knowledge. |
Minda Hu; Qiyuan Zhang; Yufei Wang; Bowei He; Hongru Wang; Jingyan Zhou; Liangyou Li; Yasheng Wang; Chen Ma; Irwin King; |
| 148 | SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel supervised steering approach that operates in sparse, interpretable representation spaces. |
Zirui He; Mingyu Jin; Bo Shen; Ali Payani; Yongfeng Zhang; Mengnan Du; |
| 149 | RaDeR: Reasoning-aware Dense Retrieval Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose RaDeR, a set of reasoning-based dense retrieval models trained with data derived from mathematical problem solving using large language models (LLMs). |
Debrup Das; Sam O’Nuallain; Razieh Rahimi; |
| 150 | Studying The Role of Input-Neighbor Overlap in Retrieval-Augmented Language Models Training Efficiency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we systematically investigate how varying levels of query–context overlap affect model performance during both training and inference. |
Ehsan Doostmohammadi; Marco Kuhlmann; |
| 151 | Finetuning LLMs for Human Behavior Prediction in Social Science Experiments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large language models (LLMs) offer a powerful opportunity to simulate the results of social science experiments. In this work, we demonstrate that finetuning LLMs directly on individual-level responses from past experiments meaningfully improves the accuracy of such simulations. |
Akaash Kolluri; Shengguang Wu; Joon Sung Park; Michael S. Bernstein; |
| 152 | LLMs Cannot Spot Math Errors, Even When Allowed to Peek Into The Solution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To that end, we propose an approach that generates an intermediate corrected student solution, aligning more closely with the original student’s solution, which helps improve performance. |
Kv Aditya Srivatsa; Kaushal Kumar Maurya; Ekaterina Kochmar; |
| 153 | Balanced Multi-Factor In-Context Learning for Multilingual Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches address these factors independently, without explicitly disentangling their combined impact, leaving optimal example selection underexplored. To address this gap, we propose balanced multi-factor ICL (BMF-ICL), a method that quantifies and optimally balances these factors for improved example selection. |
Masahiro Kaneko; Alham Fikri Aji; Timothy Baldwin; |
| 154 | Investigating How Pre-training Data Leakage Affects Models’ Reproduction and Detection Capabilities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite its criticality, existing studies do not examine how leaked instances in the pre-training data influence LLMs’ output and detection capabilities. In this paper, we conduct an experimental survey to elucidate the relationship between data leakage in training datasets and its effects on the generation and detection by LLMs. |
Masahiro Kaneko; Timothy Baldwin; |
| 155 | Social Genome: Grounded Social Reasoning Abilities of Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Social Genome, the first benchmark for fine-grained, grounded social reasoning abilities of multimodal models. |
Leena Mathur; Marian Qian; Paul Pu Liang; Louis-Philippe Morency; |
| 156 | OmniThink: Expanding Knowledge Boundaries in Machine Writing Through Thinking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, vanilla-retrieved information tends to lack depth and novelty and suffers from redundancy, which negatively impacts the quality of generated articles, leading to shallow, unoriginal, and repetitive outputs. To address these issues, we propose OmniThink, a slow-thinking machine writing framework that emulates the human-like process of iterative expansion and reflection. |
Zekun Xi; Wenbiao Yin; Jizhan Fang; Jialong Wu; Runnan Fang; Yong Jiang; Pengjun Xie; Fei Huang; Huajun Chen; Ningyu Zhang; |
| 157 | START: Self-taught Reasoner with Tools Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Integrating computational tools with LRMs remains challenging, particularly in activating and enhancing models’ tool-use capabilities without compromising their reasoning strengths. We address these challenges through START (Self-taught Reasoner with Tools), introducing two key innovations: (1) Hint-infer, a training-free approach that activates LRMs’ latent tool-use capabilities through artificial hints, enabling test-time performance scaling; (2) Hint-RFT, a self-training framework that enables models to learn effective tool utilization through diverse hint patterns and rejection-based data synthesis. |
Chengpeng Li; Mingfeng Xue; Zhenru Zhang; Jiaxi Yang; Beichen Zhang; Bowen Yu; Binyuan Hui; Junyang Lin; Xiang Wang; Dayiheng Liu; |
| 158 | Towards Infinite-Long Prefix in Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: They are empirically efficient and effective, matching the performance of full parameter fine-tuning, but the theoretical understandings are limited. In this paper, we aim to address this limitation by studying their ability from the perspective of prefix length. |
Yingyu Liang; Zhenmei Shi; Zhao Song; Chiwun Yang; |
| 159 | Understanding Subword Compositionality of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a comprehensive set of experiments to probe how LLMs compose subword information, focusing on three key aspects: structural similarity, semantic decomposability, and form retention. |
Qiwei Peng; Yekun Chai; Anders Søgaard; |
| 160 | Debiasing Multilingual LLMs in Cross-lingual Latent Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous studies have evaluated their cross-lingual transferability by directly applying these methods to LLM representations, revealing their limited effectiveness across languages. In this work, we therefore propose to perform debiasing in a joint latent space rather than directly on LLM representations. |
Qiwei Peng; Guimin Hu; Yekun Chai; Anders Søgaard; |
| 161 | Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce three guesstimation datasets: MARBLES, FUTURE, and ELECPRED, ranging from physical estimation (e.g., how many marbles fit in a cup) to abstract predictions (e.g., the 2024 U.S. presidential election). |
Yun-Shiuan Chuang; Sameer Narendran; Nikunj Harlalka; Alexander Cheung; Sizhe Gao; Siddharth Suresh; Junjie Hu; Timothy T. Rogers; |
| 162 | Benchmarking Large Language Models Under Data Contamination: A Survey from Static to Dynamic Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we conduct an in-depth analysis of existing *static* and *dynamic* benchmarks for evaluating LLMs. |
Simin Chen; Yiming Chen; Zexin Li; Yifan Jiang; Zhongwei Wan; Yixin He; Dezhi Ran; Tianle Gu; Haizhou Li; Tao Xie; Baishakhi Ray; |
| 163 | Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, language models exhibit significant performance degradation with semantically equivalent but differently phrased prompts, and existing solutions either depend on trial-and-error prompt engineering or require computationally expensive inference-time algorithms. In this study, built on the key insight that worst-case prompts exhibit a drift in embedding space, we present Latent Adversarial Paraphrasing (LAP), a dual-loop adversarial framework that iteratively optimizes a trainable perturbation, serving as a “latent continuous paraphrase”, together with language model performance on these perturbations. |
Tingchen Fu; Fazl Barez; |
| 164 | RACCooN: Versatile Instructional Video Editing with Auto-Generated Narratives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes RACCooN, a versatile and user-friendly video-to-paragraph-to-video editing method, supporting diverse video editing capabilities, such as removal, addition, and modification, through a unified pipeline. |
Jaehong Yoon; Shoubin Yu; Mohit Bansal; |
| 165 | CODI: Compressing Chain-of-Thought Into Continuous Space Via Self-Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce CODI (Continuous Chain-of-Thought via Self-Distillation), a novel training framework that effectively compresses natural language CoT into continuous space. |
Zhenyi Shen; Hanqi Yan; Linhai Zhang; Zhanghao Hu; Yali Du; Yulan He; |
| 166 | Draft Model Knows When to Stop: Self-Verification Speculative Decoding for Long-Form Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through both theoretical and empirical estimation, we establish that the discrepancy between the draft and target models can be approximated by the draft model’s prediction entropy: a high entropy indicates a low acceptance rate of draft tokens, and vice versa. Based on this insight, we propose SVIP: Self-Verification Length Policy for Long-Context Speculative Decoding, which is a training-free dynamic length policy for speculative decoding systems that adaptively determines the lengths of draft sequences by referring to the draft entropy. |
Ziyin Zhang; Jiahao Xu; Tian Liang; Xingyu Chen; Zhiwei He; Rui Wang; Zhaopeng Tu; |
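Entry 166's policy follows directly from its stated insight: draft-model entropy approximates the acceptance rate, so drafting should stop once entropy crosses a threshold. A minimal sketch with a hypothetical `draft_next_distribution` stub in place of the real draft model:

```python
import math
import random

def draft_next_distribution(prefix: list[str]) -> dict[str, float]:
    # Hypothetical stub for the draft model's next-token distribution.
    k = random.randint(1, 8)
    return {f"tok{i}": 1.0 / k for i in range(k)}

def entropy(dist: dict[str, float]) -> float:
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def draft_until_uncertain(prefix: list[str], tau: float = 1.0, max_len: int = 16) -> list[str]:
    """Dynamic draft length: high entropy predicts a low acceptance rate,
    so stop drafting and return control to the target model."""
    draft: list[str] = []
    while len(draft) < max_len:
        dist = draft_next_distribution(prefix + draft)
        if entropy(dist) > tau:
            break  # draft model is unsure; further draft tokens would likely be rejected
        draft.append(max(dist, key=dist.get))
    return draft

print(draft_until_uncertain(["The", "capital", "of"]))
```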
| 167 | SAND: Boosting LLM Agents with Self-Taught Action Deliberation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, without reasoning over and comparing alternative actions, LLM agents finetuned with these methods may over-commit to seemingly plausible but suboptimal actions due to limited action space exploration. To address this, in this paper we propose the Self-taught ActioN Deliberation (SAND) framework, enabling LLM agents to explicitly deliberate over candidate actions before committing to one. |
Yu Xia; Yiran Jenny Shen; Junda Wu; Tong Yu; Sungchul Kim; Ryan A. Rossi; Lina Yao; Julian McAuley; |
| 168 | Biased Tales: Cultural and Topic Bias in Generating Children’s Stories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As parents increasingly rely on large language models (LLMs) to craft bedtime stories, the presence of cultural and gender stereotypes in these narratives raises significant concerns. To address this issue, we present Biased Tales, a comprehensive dataset designed to analyze how biases influence protagonists’ attributes and story elements in LLM-generated stories. |
Donya Rooein; Vilém Zouhar; Debora Nozza; Dirk Hovy; |
| 169 | Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at A Time Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce STIM, a novel framework for Source-aware Token-level Identification of Memorization, which attributes each token in a reasoning chain to one of multiple memorization sources – local, mid-range, or long-range – based on their statistical co-occurrence with the token in the pretraining corpus. |
Huihan Li; You Chen; Siyuan Wang; Yixin He; Ninareh Mehrabi; Rahul Gupta; Xiang Ren; |
| 170 | When Words Smile: Generating Diverse Emotional Facial Expressions from Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent advances in talking head synthesis have achieved impressive results in lip synchronization, they tend to overlook the rich and dynamic nature of facial expressions. To fill this critical gap, we introduce an end-to-end text-to-expression model that explicitly focuses on emotional dynamics. |
Haidong Xu; Meishan Zhang; Hao Ju; Zhedong Zheng; Erik Cambria; Min Zhang; Hao Fei; |
| 171 | Recall with Reasoning: Chain-of-Thought Distillation for Mamba’s Long-Context Memory and Extrapolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work explores unlocking Mamba’s long-context memory ability by a simple-yet-effective method, Recall with Reasoning (RwR), by distilling chain-of-thought (CoT) summarization from a teacher model. |
Jun-Yu Ma; Tianqing Fang; Zhisong Zhang; Hongming Zhang; Haitao Mi; Dong Yu; |
| 172 | HS-STaR: Hierarchical Sampling for Self-Taught Reasoners Via Difficulty Estimation and Budget Reallocation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we conduct an empirical study and find that problems near the boundary of the LLM’s reasoning capability offer significantly greater learning utility than both easy and overly difficult ones. |
Feng Xiong; Hongling Xu; Yifei Wang; Runxi Cheng; Yong Wang; Xiangxiang Chu; |
| 173 | NOVA-63: Native Omni-lingual Versatile Assessments of 63 Disciplines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate those shortcomings, we introduce NOVA-63 (Native Omni-lingual Versatile Assessments of 63 Disciplines), a comprehensive, difficult multilingual benchmark featuring 93,536 questions sourced from native speakers across 14 languages and 63 academic disciplines. |
Jinyang Zhang; Kexin Yang; Yu Wan; Muyang Ye; Baosong Yang; Fei Huang; Junyang Lin; Dayiheng Liu; |
| 174 | From Feedback to Checklists: Grounded Evaluation of AI-Generated Clinical Notes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing automated metrics often fail to align with real-world physician preferences. To address this, we propose a pipeline that systematically distills real user feedback into structured checklists for note evaluation. |
Karen Zhou; John Michael Giorgi; Pranav Mani; Peng Xu; Davis Liang; Chenhao Tan; |
| 175 | UNCLE: Benchmarking Uncertainty Expressions in Long-Form Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Along with UNCLE, we propose a suite of new metrics to assess the models’ capabilities to selectively express uncertainty. |
Ruihan Yang; Caiqi Zhang; Zhisong Zhang; Xinting Huang; Dong Yu; Nigel Collier; Deqing Yang; |
| 176 | SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: After thorough investigation, we identify a safety aha moment that can activate safety reasoning and lead to a safe response. |
Kaiwen Zhou; Xuandong Zhao; Jayanth Srinivasa; Gaowen Liu; Aosong Feng; Dawn Song; Xin Eric Wang; |
| 177 | Investigating Value-Reasoning Reliability in Small Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although small large language models (sLLMs) have been widely deployed in practical applications, little attention has been paid to their value-reasoning abilities, particularly in terms of reasoning reliability. To address this gap, we propose a systematic evaluation framework for assessing the Value-Reasoning Reliability of sLLMs. |
Xia Du; Shuhan Sun; Pengyuan Liu; Dong Yu; |
| 178 | UniDebugger: Hierarchical Multi-Agent Framework for Unified Software Debugging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the first end-to-end framework, UniDebugger, for unified debugging through multi-agent synergy. |
Cheryl Lee; Chunqiu Steven Xia; Longji Yang; Jen-tse Huang; Zhouruixing Zhu; Lingming Zhang; Michael R. Lyu; |
| 179 | RLHF Algorithms Ranked: An Extensive Evaluation Across Diverse Tasks, Rewards, and Hyperparameters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work aims to guide practitioners in selecting the most effective RLHF algorithm while promoting a culture of thorough and impartial benchmarking in the field. |
Lucas Spangher; Rama Kumar Pasumarthi; Nick Masiewicki; William F. Arnold; Aditi Kaushal; Dale Johnson; Peter Grabowski; Eugene Ie; |
| 180 | LeanK: Learnable K Cache Channel Pruning for Efficient Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose LeanK, a learning-based method that prunes unimportant key (K) cache channels by leveraging static channel sparsity. |
Yike Zhang; Zhiyuan He; Huiqiang Jiang; Chengruidong Zhang; Yuqing Yang; Jianyong Wang; Lili Qiu; |
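Entry 180 prunes key-cache channels using a static, learned sparsity pattern. The sketch below substitutes a magnitude heuristic for the learned mask (a labeled simplification) just to show the shape of the operation: rank channels on calibration data, keep a fixed fraction, and drop the rest from the K cache.

```python
def rank_k_channels(k_cache: list[list[float]], keep_ratio: float = 0.5) -> list[int]:
    """Rank key-cache channels by mean absolute value over calibration
    tokens and keep the top fraction. LeanK learns this static mask;
    magnitude ranking is only an illustrative proxy."""
    n_channels = len(k_cache[0])
    importance = [sum(abs(row[c]) for row in k_cache) / len(k_cache)
                  for c in range(n_channels)]
    keep = max(1, int(n_channels * keep_ratio))
    return sorted(range(n_channels), key=lambda c: importance[c], reverse=True)[:keep]

calibration_keys = [  # tokens x channels, invented values
    [0.1, 2.0, 0.0, 0.7],
    [0.2, 1.5, 0.1, 0.9],
]
print(rank_k_channels(calibration_keys))  # [1, 3]: channels kept in the pruned cache
```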
| 181 | Towards AI-Assisted Psychotherapy: Emotion-Guided Generative Interventions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large language models (LLMs) hold promise for therapeutic interventions, yet most existing datasets rely solely on text, overlooking non-verbal emotional cues essential to real-world therapy. To address this, we introduce a multimodal dataset of 1,441 publicly sourced therapy session videos containing both dialogue and non-verbal signals such as facial expressions and vocal tone. |
Kilichbek Haydarov; Youssef Mohamed; Emilio Goldenhersch; Paul OCallaghan; Li-jia Li; Mohamed Elhoseiny; |
| 182 | Not Your Typical Government Tipline: LLM-Assisted Routing of Environmental Protection Agency Citizen Tips Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through a case study, we demonstrate how advances in large language models can be utilized to support overburdened agencies with limited capacities. |
Sharanya Majumder; Zehua Li; Derek Ouyang; Kit T Rodolfa; Elena Eneva; Julian Nyarko; Daniel E. Ho; |
| 183 | Firewall Routing: Blocking Leads to Better Hybrid Inference for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The rapid advancement of Large Language Models (LLMs) has significantly enhanced performance across various natural language processing (NLP) tasks, yet the high computational costs and latency associated with deploying such models continue to pose critical bottlenecks, limiting their broader applicability. To mitigate these challenges, we propose a dynamic hybrid inference framework, Firewall Routing, which efficiently selects between a strong and a weak LLM based on the complexity of the query. |
Runyu Peng; Yunhua Zhou; Kai Lv; Yang Gao; Qipeng Guo; Xipeng Qiu; |
| 184 | Large Language Models Have Intrinsic Meta-Cognition, But Need A Good Lens Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose AutoMeco, an Automated Meta-cognition Evaluation framework for benchmarking the existing lenses. |
Ziyang Ma; Qingyue Yuan; Zhenglin Wang; Deyu Zhou; |
| 185 | Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work, inspired by the deep thinking paradigm of DeepSeek-R1, utilizes a steering technique to enhance the reasoning ability of an LLM without external datasets. |
Zihao Li; Xu Wang; Yuzhe Yang; Ziyu Yao; Haoyi Xiong; Mengnan Du; |
| 186 | Single LLM, Multiple Roles: A Unified Retrieval-Augmented Generation Framework Using Role-Specific Token Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing studies have optimized retrieval-augmented generation (RAG) across various sub-tasks, such as query understanding and retrieval refinement, but integrating these optimizations into a unified framework remains challenging. To tackle this problem, this work proposes RoleRAG, a unified RAG framework that achieves efficient multi-task processing through role-specific token optimization. |
Yutao Zhu; Jiajie Jin; Hongjin Qian; Zheng Liu; Zhicheng Dou; Ji-Rong Wen; |
| 187 | How Is LLM Reasoning Distracted By Irrelevant Context? An Analysis Using A Controlled Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Grade School Math with Distracting Context (GSM-DC), a synthetic benchmark to evaluate Large Language Models’ (LLMs) reasoning robustness against systematically controlled irrelevant context (IC). |
Minglai Yang; Ethan Huang; Liang Zhang; Mihai Surdeanu; William Yang Wang; Liangming Pan; |
| 188 | Identifying Unlearned Data in LLMs Via Membership Inference Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such inferences can still pose significant privacy risks, as they may reveal the sensitive data in the model’s training set and the internal policies of model creators. To quantify such privacy risks, we propose a new evaluation framework **Forensic Unlearning Membership Attacks (FUMA)**, drawing on principles from membership inference attacks. |
Advit Deepak; Megan Mou; Jing Huang; Diyi Yang; |
| 189 | COCO-Tree: Compositional Hierarchical Concept Trees for Enhanced Reasoning in Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present “COCO-Tree” – a novel approach that augments VLM outputs with carefully designed neurosymbolic concept trees learned from LLMs to improve VLM’s linguistic reasoning. |
Sanchit Sinha; Guangzhi Xiong; Aidong Zhang; |
| 190 | Image Difference Captioning Via Adversarial Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these limitations, we propose an adversarial direct preference optimization (ADPO) framework for IDC, which formulates IDC as a preference optimization problem under the Bradley-Terry-Luce model, directly aligning the captioning policy with pairwise difference preferences via Direct Preference Optimization (DPO). To model more accurate and diverse IDC preferences, we introduce an adversarially trained hard negative retriever that selects counterfactual captions. This results in a minimax optimization problem, which we solve via policy-gradient reinforcement learning, enabling the policy and retriever to improve jointly. |
Zihan Huang; Junda Wu; Rohan Surana; Tong Yu; David Arbour; Ritwik Sinha; Julian McAuley; |
| 191 | User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy As A Learning Signal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Asking for direct user feedback can be disruptive; thus, we study harvesting implicit user feedback from user-LM interaction logs. |
Yuhan Liu; Michael JQ Zhang; Eunsol Choi; |
| 192 | Group Preference Alignment: Customizing LLM Responses from In-Situ Conversations Only When Needed Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: LLMs often fail to meet the specialized needs of distinct user groups due to their one-size-fits-all approach, and there is limited understanding of what personalization each group expects. To address this, we propose GPA, a group-aware personalization framework that captures context-specific preference variations and steers LLMs accordingly. |
Ishani Mondal; Jack W. Stokes; Sujay Kumar Jauhar; Longqi Yang; Mengting Wan; Xiaofeng Xu; Xia Song; Jordan Lee Boyd-Graber; Jennifer Neville; |
| 193 | CARD: Cross-modal Agent Framework for Generative and Editable Residential Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the CARD framework, which leverages a system of specialized cross-modal agents to adapt to complex open-world environments. |
Pengyu Zeng; Jun Yin; Miao Zhang; Yuqin Dai; Jizhizi Li; ZhanXiang Jin; Shuai Lu; |
| 194 | UnifiedVisual: A Framework for Constructing Unified Vision-Language Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing datasets typically address understanding and generation in isolation, thereby limiting the performance of unified VLLMs. To bridge this critical gap, we introduce a novel dataset construction framework, UnifiedVisual, and present UnifiedVisual-240K, a high-quality dataset meticulously designed to facilitate mutual enhancement between multimodal understanding and generation. |
Pengyu Wang; Shaojun Zhou; Chenkun Tan; Xinghao Wang; Wei Huang; Zhen Ye; Zhaowei Li; Botian Jiang; Dong Zhang; Xipeng Qiu; |
| 195 | Extracting and Combining Abilities For Building Multi-lingual Ability-enhanced Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To solve it, we propose a **M**ulti-lingual **A**bilities **E**xtraction and **C**ombination approach, named **MAEC**. |
Zhipeng Chen; Kun Zhou; Liang Song; Xin Zhao; Bingning Wang; Weipeng Chen; Ji-Rong Wen; |
| 196 | ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these limitations, this work begins by framing two key patterns of redundant reflection in LRMs—Confidence Deficit, wherein the model reflects on correct intermediate steps, and Termination Delay, where reflection continues after a verified, confident answer—through a confidence-guided perspective. Based on this, we introduce ConCISE (Confidence-guided Compression In Step-by-step Efficient Reasoning), a framework designed to generate concise reasoning chains, integrating Confidence Injection to boost reasoning confidence, and Early Stopping to terminate reasoning when confidence is sufficient. |
Ziqing Qiao; Yongheng Deng; Jiali Zeng; Dong Wang; Lai Wei; Guanbo Wang; Fandong Meng; Jie Zhou; Ju Ren; Yaoxue Zhang; |
| 197 | Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While numerous benchmarks have emerged to assess LALMs’ performance, they remain fragmented and lack a structured taxonomy. To bridge this gap, we conduct a comprehensive survey and propose a systematic taxonomy for LALM evaluations, categorizing them into four dimensions based on their objectives: (1) General Auditory Awareness and Processing, (2) Knowledge and Reasoning, (3) Dialogue-oriented Ability, and (4) Fairness, Safety, and Trustworthiness. |
Chih-Kai Yang; Neo S. Ho; Hung-yi Lee; |
| 198 | Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, results reveal a concerning trend of increasing sycophancy in newer open-source models. To address this, we introduce Holistic DPO, a training approach balancing positive and negative persuasion examples. |
Bryan Chen Zhengyu Tan; Daniel Wai Kit Chin; Zhengyuan Liu; Nancy F. Chen; Roy Ka-Wei Lee; |
| 199 | EvolveSearch: An Iterative Self-Evolving Search Agent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current mainstream approaches for enabling LLM web search proficiency face significant challenges: supervised fine-tuning struggles with data production in open-search domains, while RL converges quickly, limiting their data utilization efficiency. To address these issues, we propose EvolveSearch, a novel iterative self-evolution framework that combines SFT and RL to enhance agentic web search capabilities without any external human-annotated reasoning data. |
Ding-Chu Zhang; Yida Zhao; Jialong Wu; Liwen Zhang; Baixuan Li; Wenbiao Yin; Yong Jiang; Yu-Feng Li; Kewei Tu; Pengjun Xie; Fei Huang; |
| 200 | On Pruning State-Space LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Can these models be pruned to further reduce their computation costs? We adapt several pruning methods to the SSM structure, and apply them to four SSM-based LLMs across multiple tasks. |
Tamer Ghattas; Michael Hassid; Roy Schwartz; |
| 201 | NormXLogit: The Head-on-Top Never Lies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent advances in LLM interpretability show promise, many rely on complex, model-specific methods with high computational costs. To address these limitations, we propose NormXLogit, a novel technique for assessing the significance of individual input tokens. |
Sina Abbasi; Mohammad Reza Modarres; Mohammad Taher Pilehvar; |
| 202 | TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current Retrieval-Augmented Generation (RAG) systems concatenate and process numerous retrieved document chunks for prefill, which requires a large volume of computation and therefore leads to significant latency in time-to-first-token (TTFT). To reduce the computation overhead as well as TTFT, we introduce TurboRAG, a hybrid offline-online paradigm that (i) pre-computes chunk-level key-value (KV) caches, (ii) stitches them together at inference time using independent-attention and reordered-RoPE techniques, and (iii) preserves answer quality without changing the model architecture. |
Songshuo Lu; Hua Wang; Yutian Rong; Zhi Chen; Yaohua Tang; |
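Entry 202's offline-online split can be shown with a toy cache. Strings stand in for the real attention KV tensors, and the stitching step silently assumes the paper's independent-attention and reordered-RoPE fixes, which are what make naive concatenation of per-chunk caches sound:

```python
class TurboRAGSketch:
    """Toy offline/online split: compute a 'KV cache' once per chunk,
    then reuse it at query time instead of re-prefilling every chunk."""

    def __init__(self) -> None:
        self.kv_store: dict[str, str] = {}

    def precompute(self, chunk_id: str, chunk_text: str) -> None:
        # Offline phase, run once per chunk in the corpus.
        self.kv_store[chunk_id] = f"KV({chunk_text!r})"

    def answer(self, query: str, retrieved_ids: list[str]) -> str:
        # Online phase: stitch cached KV entries, skipping chunk prefill.
        stitched = " + ".join(self.kv_store[cid] for cid in retrieved_ids)
        return f"decode(query={query!r}, cache={stitched})"

rag = TurboRAGSketch()
rag.precompute("c1", "Paris is the capital of France.")
rag.precompute("c2", "France is in Europe.")
print(rag.answer("Where is Paris?", ["c1", "c2"]))
```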
| 203 | Orthogonal Finetuning Made Scalable Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify the core computational bottleneck in OFT as its weight-centric implementation, which relies on costly matrix-matrix multiplications with cubic complexity. To overcome this, we propose OFTv2, an input-centric reformulation that instead uses matrix-vector multiplications (i.e., matrix-free computation), reducing the computational cost to quadratic. |
Zeju Qiu; Weiyang Liu; Adrian Weller; Bernhard Schölkopf; |
| 204 | Deploying Tiny LVLM Judges for Real-World Evaluation of Chart Models: Lessons Learned and Best Practices Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, tiny models (<=2B parameters) still perform poorly as judges, limiting their real-world use in resource-constrained settings. To address this, we propose two approaches to ensure cost-efficient evaluation: (i) multi-criteria prompting, which combines separate evaluation criteria into a single query, and (ii) domain-adaptive transfer learning, in which we fine-tune a 2B-parameter VLM on synthetic judgments in a chart dataset to create the ChartJudge. |
Md Tahmid Rahman Laskar; Mohammed Saidul Islam; Ridwan Mahbub; Mizanur Rahman; Amran Bhuiyan; Israt Jahan; Mir Tafseer Nayeem; Shafiq Joty; Enamul Hoque; Jimmy Huang; |
| 205 | Semantic Agreement Enables Efficient Open-Ended LLM Cascades Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Cascade systems for open-ended text generation face a fundamental challenge: determining output reliability when generation quality lies on a continuous spectrum, often with multiple valid responses. To address this, we propose _semantic agreement_—meaning-level consensus between ensemble outputs—as a training-free signal for reliable deferral. |
Duncan Soiffer; Steven Kolawole; Virginia Smith; |
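Entry 205's deferral rule needs only sampling and a meaning-level comparison. In the sketch, a crude string normalization stands in for real semantic clustering (which might use NLI or embeddings; that choice is ours, not the paper's), and the cheap model's answer is accepted only when enough samples agree:

```python
from collections import Counter

def normalize_meaning(text: str) -> str:
    # Hypothetical stand-in for meaning-level equivalence; a real system
    # might cluster outputs with an NLI model or sentence embeddings.
    return text.lower().strip().rstrip(".")

def cascade(cheap_samples: list[str], strong_model, agree_ratio: float = 0.6) -> str:
    """Training-free deferral: keep the cheap answer only when a large
    enough fraction of its samples agree at the meaning level."""
    clusters = Counter(normalize_meaning(s) for s in cheap_samples)
    meaning, count = clusters.most_common(1)[0]
    if count / len(cheap_samples) >= agree_ratio:
        return next(s for s in cheap_samples if normalize_meaning(s) == meaning)
    return strong_model()  # no consensus: defer to the expensive model

samples = ["Paris.", "paris", "The capital is Lyon."]
print(cascade(samples, strong_model=lambda: "answer from the strong model"))
```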
| 206 | Zero-shot Multimodal Document Retrieval Via Cross-modal Question Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most documents are private, either owned by individuals or confined within corporate silos, and current retrievers struggle when faced with unseen domains or languages. To address this gap, we introduce PREMIR, a simple yet effective framework that leverages the broad knowledge of an MLLM to generate cross-modal pre-questions (preQs) before retrieval. |
Yejin Choi; Jaewoo Park; Janghan Yoon; Saejin Kim; Jaehyun Jeon; Youngjae Yu; |
| 207 | OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces OmniEval, an omnidirectional and automatic RAG benchmark for the financial domain, featured by its multi-dimensional evaluation framework: First, we categorize RAG scenarios by five task classes and 16 financial topics, leading to a matrix-based structured assessment for RAG evaluation; Next, we leverage a multi-dimensional evaluation data generation method that integrates GPT-4-based automatic generation and human annotation approaches, achieving an 87.47% acceptance ratio in human evaluations of generated instances; Further, we utilize a multi-stage evaluation pipeline to assess both retrieval and generation performance, resulting in an all-sided evaluation of the RAG pipeline. |
Shuting Wang; Jiejun Tan; Zhicheng Dou; Ji-Rong Wen; |
| 208 | Direct Judgement Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To meet the need for strong, generalized judge models, we explore training foundational judge models at large data scales (680K) with direct preference optimization (DPO). |
PeiFeng Wang; Austin Xu; Yilun Zhou; Caiming Xiong; Shafiq Joty; |
| 209 | RLAE: Reinforcement Learning-Assisted Ensemble for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose **R**einforcement **L**earning-**A**ssisted **E**nsemble for LLMs (RLAE), a novel framework that reformulates LLM ensemble through the lens of a Markov Decision Process (MDP). |
Yuqian Fu; Yuanheng Zhu; Jiajun Chai; Guojun Yin; Wei Lin; Qichao Zhang; Dongbin Zhao; |
| 210 | Alignment for Efficient Tool Calling of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a multi-objective alignment framework that combines probabilistic knowledge boundary estimation with dynamic decision-making, allowing LLMs to better assess when to invoke tools based on their confidence. |
Hongshen Xu; Zihan Wang; Zichen Zhu; Lei Pan; Xingyu Chen; Shuai Fan; Lu Chen; Kai Yu; |
| 211 | LoRACoE: Improving Large Language Model Via Composition-based LoRA Expert Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the challenges, we first delve into the parameter patterns in LoRA modules, revealing that there exist task-relevant parameters concentrated along the rank dimension of the LoRA parameters. Based on this, we redesign the construction of experts and propose the method LoRACoE (LoRA Composition of Experts). |
Guanyu Li; Zhiheng Xi; Zhihao Zhang; Boyang Hong; Tao Gui; Qi Zhang; Xuanjing Huang; |
| 212 | Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, they overlook established safety and privacy standards, leading to systemic risks for legal compliance. To address these gaps, we formulate safety and privacy issues into contextualized compliance problems following the Contextual Integrity (CI) theory. |
Wenbin Hu; Haoran Li; Huihao Jing; Qi Hu; Ziqian Zeng; Sirui Han; Xu Heli; Tianshu Chu; Peizhao Hu; Yangqiu Song; |
| 213 | MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, we lack a reliable, easy-to-use, and simple-to-run evaluation that reflects the pedagogical abilities of models. To fill this gap, we present MathTutorBench, an open-source benchmark for holistic tutoring model evaluation. |
Jakub Macina; Nico Daheim; Ido Hakimi; Manu Kapur; Iryna Gurevych; Mrinmaya Sachan; |
| 214 | Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate how large language models (LLMs) perform latent multi-hop reasoning in prompts like “Wolfgang Amadeus Mozart’s mother’s spouse is”. To analyze this process, we introduce logit flow, an interpretability method that traces how logits propagate across layers and positions toward the final prediction. |
Zeping Yu; Yonatan Belinkov; Sophia Ananiadou; |
| 215 | SWAN: An Efficient and Scalable Approach for Long-Context Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Overall, our work presents an effective approach for scaling language models to longer contexts in a robust and efficient manner. |
Krishna C Puvvada; Faisal Ladhak; Santiago Akle Serano; Cheng-Ping Hsieh; Shantanu Acharya; Somshubra Majumdar; Fei Jia; Samuel Kriman; Simeng Sun; Dima Rekesh; Boris Ginsburg; |
| 216 | Extending Automatic Machine Translation Evaluation to Book-Length Documents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SEGALE, an evaluation scheme that extends existing automatic metrics to long-document translation by treating documents as continuous text and applying sentence segmentation and alignment methods. |
Kuang-Da Wang; Shuoyang Ding; Chao-Han Huck Yang; Ping-Chun Hsieh; Wen-Chih Peng; Vitaly Lavrukhin; Boris Ginsburg; |
| 217 | Governance in Motion: Co-evolution of Constitutions and AI Models for Scalable Safety Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, LLMs can exhibit distributional drift during training, and static alignment mechanisms lack the capacity to adaptively correct misaligned behaviors as they emerge. To address this limitation, we develop a two-stage framework that enables dynamic and continuous alignment. |
Chenhao Huang; Ziyu Shen; Yicong Ren; Huiyuan Zheng; Jiazheng Zhang; Mingxu Chai; Ming Zhang; Shihan Dou; Fan Mo; Jie Shi; Tao Gui; Qi Zhang; Xuanjing Huang; |
| 218 | QSpec: Speculative Decoding with Complementary Quantization Schemes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose QSPEC, a novel quantization paradigm that decouples efficiency from quality by integrating two complementary schemes via speculative decoding: low-precision joint quantization for fast drafting and high-precision weight-only quantization for accurate verification. |
Juntao Zhao; Wenhao Lu; Sheng Wang; Lingpeng Kong; Chuan Wu; |
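To make the draft/verify split concrete, here is a toy greedy variant of speculative decoding, not QSPEC's actual algorithm: `draft_next` and `verify_next` are hypothetical stand-ins for the fast low-precision and slow high-precision models, and a real system would verify all drafted tokens in a single batched pass of the verifier rather than one call per token.

```python
def speculative_step(draft_next, verify_next, prefix, k=4):
    """One round of greedy speculative decoding (illustrative sketch)."""
    # 1) Draft k tokens cheaply with the low-precision model.
    draft, p = [], list(prefix)
    for _ in range(k):
        t = draft_next(p)
        draft.append(t)
        p.append(t)

    # 2) Accept the longest draft prefix the high-precision model
    #    would also have produced greedily.
    accepted, p = [], list(prefix)
    for t in draft:
        if verify_next(p) != t:
            break
        accepted.append(t)
        p.append(t)

    # 3) Emit one verifier token at the first mismatch (or after a
    #    fully accepted draft) so every round makes progress.
    accepted.append(verify_next(p))
    return list(prefix) + accepted
```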
| 219 | A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we develop methodologies to assess and mitigate misgendering across 42 languages and dialects using a participatory-design approach to design effective and appropriate guardrails across all languages. |
Sunayana Sitaram; Adrian de Wynter; Isobel McCrum; Qilong Gu; Si-Qing Chen; |
| 220 | Making VLMs More Robot-Friendly: Self-Critical Distillation of Low-Level Procedural Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SelfReVision, a lightweight and scalable self-improvement framework for vision-language procedural planning. |
Chan Young Park; Jillian Fisher; Marius Memmel; Dipika Khullar; Seoho Yun; Abhishek Gupta; Yejin Choi; |
| 221 | To Mask or to Mirror: Human-AI Alignment in Collective Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an empirical framework for assessing collective alignment, in contrast to prior work on the individual level. |
Crystal Qian; Aaron T Parisi; Clémentine Bouleau; Vivian Tsai; Maël Lebreton; Lucas Dixon; |
| 222 | From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through the lens of the scientific method, we introduce a foundational three-level taxonomy—Tool, Analyst, and Scientist—to delineate their escalating autonomy and evolving responsibilities within the research lifecycle. |
Tianshi Zheng; Zheye Deng; Hong Ting Tsang; Weiqi Wang; Jiaxin Bai; Zihao Wang; Yangqiu Song; |
| 223 | AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite long-standing efforts in accelerating scientific discovery with AI, building AI co-scientists remains challenging due to limited high-quality data for training and evaluation. To tackle this data scarcity issue, we present AutoSDT, an automatic pipeline that collects high-quality coding tasks in real-world data-driven discovery workflows. |
Yifei Li; Hanane Nour Moussa; Ziru Chen; Shijie Chen; Botao Yu; Mingyi Xue; Benjamin Burns; Tzu-Yao Chiu; Vishal Dey; Zitong Lu; Chen Wei; Qianheng Zhang; Tianyu Zhang; Song Gao; Xuhui Huang; Xia Ning; Nesreen K. Ahmed; Ali Payani; Huan Sun; |
| 224 | Is The Top Still Spinning? Evaluating Subjectivity in Narrative Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we reframe the task to manage the subjectivity involved with factuality judgments of ambiguous claims. |
Melanie Subbiah; Akankshya Mishra; Grace Kim; Liyan Tang; Greg Durrett; Kathleen McKeown; |
| 225 | Toward Multi-Session Personalized Conversation: A Large-Scale Dataset and Hierarchical Tree Framework for Implicit Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In such cases, traditional retrieval methods fail to capture relevant context, and long-context modeling also becomes inefficient due to numerous complicated persona-related details. To address this gap, we introduce ImplexConv, a large-scale long-term dataset with 2,500 examples, each containing approximately 100 conversation sessions, designed to study implicit reasoning in personalized dialogues. |
Xintong Li; Jalend Bantupalli; Ria Dharmani; Yuwei Zhang; Jingbo Shang; |
| 226 | CoMMIT: Coordinated Multimodal Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives, where we find the unbalanced learning between the feature encoder and the LLM can cause problems of oscillation and biased learning that lead to sub-optimal convergence. |
Xintong Li; Junda Wu; Tong Yu; Rui Wang; Yu Wang; Xiang Chen; Jiuxiang Gu; Lina Yao; Julian McAuley; Jingbo Shang; |
| 227 | MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This dual limitation makes it challenging to assess LLMs’ performance in the multilingual setting comprehensively. To fill this gap, we introduce MMLU-ProX, a comprehensive benchmark covering 29 languages, built on an English benchmark. |
Weihao Xuan; Rui Yang; Heli Qi; Qingcheng Zeng; Yunze Xiao; Aosong Feng; Dairui Liu; Yun Xing; Junjue Wang; Fan Gao; Jinghui Lu; Yuang Jiang; Huitao Li; Xin Li; Kunyu Yu; Ruihai Dong; Shangding Gu; Yuekang Li; Xiaofei Xie; Felix Juefei-Xu; Foutse Khomh; Osamu Yoshie; Qingyu Chen; Douglas Teodoro; Nan Liu; Randy Goebel; Lei Ma; Edison Marrese-Taylor; Shijian Lu; Yusuke Iwasawa; Yutaka Matsuo; Irene Li; |
| 228 | ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present ConvSearch-R1, the first self-driven framework that completely eliminates dependency on external rewrite supervision by leveraging reinforcement learning to optimize reformulation directly through retrieval signals. |
Changtai Zhu; Siyin Wang; Ruijun Feng; Kai Song; Xipeng Qiu; |
| 229 | MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MUSE, a comprehensive framework tackling multi-turn jailbreaks from both attack and defense angles. |
Siyu Yan; Long Zeng; Xuecheng Wu; Chengcheng Han; Kongcheng Zhang; Chong Peng; Xuezhi Cao; Xunliang Cai; Chenjuan Guo; |
| 230 | Empowering GraphRAG with Knowledge Filtering and Integration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we identify two key challenges that plague GraphRAG: (1) Retrieving noisy and irrelevant information can degrade performance and (2) Excessive reliance on external knowledge suppresses the model’s intrinsic reasoning. To address these issues, we propose GraphRAG-FI (Filtering & Integration), consisting of GraphRAG-Filtering and GraphRAG-Integration. |
Kai Guo; Harry Shomer; Shenglai Zeng; Haoyu Han; Yu Wang; Jiliang Tang; |
| 231 | ReWordBench: Benchmarking and Improving The Robustness of Reward Models with Transformed Inputs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, while recent reward models increase performance on standard benchmarks, this may partly be due to overfitting effects, which would confound an understanding of their true capability. In this work, we scrutinize the robustness of reward models and the extent of such overfitting. |
Zhaofeng Wu; Michihiro Yasunaga; Andrew Cohen; Yoon Kim; Asli Celikyilmaz; Marjan Ghazvininejad; |
| 232 | ThinkTuning: Instilling Cognitive Reflections Without Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose ThinkTuning, a GRPO-based interactive training approach where we augment the rollouts of a student model with the guidance from a teacher model. |
Aswin Rrv; Jacob Dineen; Divij Handa; Md Nayem Uddin; Mihir Parmar; Chitta Baral; Ben Zhou; |
| 233 | Too Helpful, Too Harmless, Too Honest or Just Right? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose TrinityX, a modular alignment framework that incorporates a Mixture of Calibrated Experts (MoCaE) within the Transformer architecture. |
Gautam Siddharth Kashyap; Mark Dras; Usman Naseem; |
| 234 | From General Reward to Targeted Reward: Improving Open-ended Long-context Generation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work presents effective methods to enhance the ability of LLMs to address complex open-ended questions posed by humans. |
Zhihan Guo; Jiele Wu; Wenqian Cui; Yifei Zhang; Minda Hu; Yufei Wang; Irwin King; |
| 235 | Large Language Models Badly Generalize Across Option Length, Problem Types, and Irrelevant Noun Replacements Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a “Generalization Stress Test” to assess Large Language Models’ (LLMs) generalization ability under slight and controlled perturbations, including option length, problem types, and irrelevant noun replacements. |
Guangxiang Zhao; Saier Hu; Xiaoqi Jian; Wu Jinzhu; Yuhan Wu; Lin Sun; Xiangzheng Zhang; |
| 236 | AI Knowledge Assist: An Automated Approach for The Creation of Knowledge Bases for Conversational AI Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce AI Knowledge Assist, a system that extracts knowledge in the form of question-answer (QA) pairs from historical customer-agent conversations to automatically build a knowledge base. |
Md Tahmid Rahman Laskar; Julien Bouvier Tremblay; Xue-Yong Fu; Cheng Chen; Shashi Bhushan Tn; |
| 237 | IntentionFrame: A Semi-Structured, Multi-Aspect Framework for Fine-Grained Conversational Intention Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose IntentionFrame, a semi-structured framework inspired by psychological and cognitive intention theories, which organizes conversational intents into four interrelated aspects: situation, emotion, action, and knowledge. |
Jinggui Liang; Dung Vo; Lizi Liao; |
| 238 | Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These methods often rely on large-scale supervised fine-tuning (SFT) with extensive video data and long Chain-of-Thought (CoT) annotations, making them costly and hard to scale. To address this, we present Video-RTS, a new approach to improve video reasoning capability with drastically improved data efficiency by combining data-efficient RL with a video-adaptive test-time scaling (TTS) strategy. |
Ziyang Wang; Jaehong Yoon; Shoubin Yu; Md Mohaiminul Islam; Gedas Bertasius; Mohit Bansal; |
| 239 | MemInsight: Autonomous Memory Augmentation for LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an autonomous memory augmentation approach, MemInsight, to enhance semantic data representation and retrieval mechanisms. |
Rana Salama; Jason Cai; Michelle Yuan; Anna Currey; Monica Sunkara; Yi Zhang; Yassine Benajiba; |
| 240 | DiCoRe: Enhancing Zero-shot Event Detection Via Divergent-Convergent LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose DiCoRe, a divergent-convergent reasoning framework that decouples the task of ED using Dreamer and Grounder. |
Tanmay Parekh; Kartik Mehta; Ninareh Mehrabi; Kai-Wei Chang; Nanyun Peng; |
| 241 | SNaRe: Domain-aware Data Generation for Low-Resource Event Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, when existing generation approaches are applied to specialized domains, they struggle with label noise, where annotations are incorrect, and domain drift, characterized by a distributional mismatch between generated sentences and the target domain. To address these issues, we introduce SNaRe, a domain-aware synthetic data generation framework composed of three components: Scout, Narrator, and Refiner. |
Tanmay Parekh; Yuxuan Dong; Lucas Bandarkar; Artin Kim; I-Hung Hsu; Kai-Wei Chang; Nanyun Peng; |
| 242 | Frequency & Compositionality in Emergent Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In natural languages, frequency and compositionality exhibit an inverse relationship: the most frequent words often resist regular patterns, developing idiosyncratic forms. This phenomenon, exemplified by irregular verbs where the most frequent verbs resist regular patterns, raises a compelling question: do artificial communication systems follow similar principles? Through systematic experiments with neural network agents in a referential game setting, and by manipulating input frequency through Zipfian distributions, we investigate whether these systems mirror the irregular verbs phenomenon, where messages referring to frequent objects develop less compositional structure than messages referring to rare ones. |
Jean-Baptiste Sevestre; Emmanuel Dupoux; |
| 243 | Measuring Chain of Thought Faithfulness By Unlearning Reasoning Steps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a framework for measuring parametric faithfulness of generated reasoning and propose Faithfulness by Unlearning Reasoning steps (FUR), an instance of this framework. |
Martin Tutek; Fateme Hashemi Chaleshtori; Ana Marasovic; Yonatan Belinkov; |
| 244 | Sticker-TTS: Learn to Utilize Historical Experience with A Sticker-driven Test-Time Scaling Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current test-time scaling methods predominantly rely on redundant sampling, ignoring the historical experience utilization, thereby limiting computational efficiency. To overcome this limitation, we propose Sticker-TTS, a novel test-time scaling framework that coordinates three collaborative LRMs to iteratively explore and refine solutions guided by historical attempts. |
Jie Chen; Jinhao Jiang; Yingqian Min; Zican Dong; Shijie Wang; Xin Zhao; Ji-Rong Wen; |
| 245 | Words Like Knives: Backstory-Personalized Modeling and Detection of Violent Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we leverage nonviolent communication (NVC) theory to evaluate LLMs in detecting conversational breakdowns and assessing how relationship backstory influences both human and model perception of conflicts. |
Jocelyn J Shen; Akhila Yerukola; Xuhui Zhou; Cynthia Breazeal; Maarten Sap; Hae Won Park; |
| 246 | FLARE: Faithful Logic-Aided Reasoning and Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Faithful Logic-Aided Reasoning and Exploration (FLARE), which uses LLMs to plan solutions, formalize queries into logic programs, and simulate code execution through multi-hop search without external solvers. |
Erik Arakelyan; Pasquale Minervini; Patrick Lewis; Pat Verga; Isabelle Augenstein; |
| 247 | ToM: Leveraging Tree-oriented MapReduce for Long-Context Reasoning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While effective for local reasoning, DCF struggles to capture long-range dependencies and risks inducing conflicts by processing chunks in isolation. To overcome these limitations, we propose ToM, a novel Tree-oriented MapReduce framework for long-context reasoning. |
Jiani Guo; Zuchao Li; Jie Wu; Qianren Wang; Yun Li; Lefei Zhang; Hai Zhao; Yujiu Yang; |
| 248 | Word Salad Chopper: Reasoning Models Waste A Ton Of Decoding Budget On Useless Repetitions, Self-Knowingly Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large Reasoning Models (LRMs) are often bottlenecked by the high cost of output tokens. We show that a significant portion of these tokens are useless self-repetitions — what we call “word salad” — that exhaust the decoding budget without adding value. |
Wenya Xie; Shaochen Zhong; Hoang Anh Duy Le; Zhaozhuo Xu; Jianwen Xie; Zirui Liu; |
| 249 | Composable Cross-prompt Essay Scoring By Merging Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a source-free adaptation approach that selectively merges the parameters of individually trained source models without further access to the source datasets. |
Sanwoo Lee; Kun Liang; Yunfang Wu; |
| 250 | MUZO: Leveraging Multiple Queries and Momentum for Zeroth-Order Fine-Tuning of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes the Multiple-query Memory Efficient Zeroth-Order (MUZO) method, which is based on variance-reduced multiple queries to obtain the average of gradient estimates. |
Yuezhang Peng; Yuxin Liu; Fei Wen; Xie Chen; |
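As a rough illustration of the idea, assuming a standard two-point zeroth-order estimator: averaging the estimate over several random probes reduces its variance, and a momentum buffer smooths the updates. The function, hyperparameters, and update form below are illustrative, not the paper's exact method.

```python
import numpy as np

def muzo_style_step(loss, theta, buf=None, lr=1e-3, eps=1e-3,
                    n_queries=4, momentum=0.9):
    """One update from an averaged two-point zeroth-order gradient
    estimate plus momentum (illustrative sketch)."""
    g = np.zeros_like(theta)
    for _ in range(n_queries):
        u = np.random.randn(*theta.shape)  # random probe direction
        # Two-point finite-difference estimate of the gradient along u.
        g += (loss(theta + eps * u) - loss(theta - eps * u)) / (2 * eps) * u
    g /= n_queries                          # averaging reduces variance
    buf = g if buf is None else momentum * buf + g
    return theta - lr * buf, buf

# Toy usage: minimize ||theta - 1||^2 with loss queries only.
theta, buf = np.zeros(5), None
for _ in range(200):
    theta, buf = muzo_style_step(lambda w: ((w - 1.0) ** 2).sum(), theta, buf)
print(theta)  # approaches the all-ones vector
```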
| 251 | Advancing E-commerce Merchants Telemarketing with Synthetic Data-Driven LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce a hybrid data synthesis framework with two main innovations. |
Qi Gou; Zehua Xia; Li Juan; Qingyang Zhao; Wenjing Yang; |
| 252 | Mapping Smarter, Not Harder: A Test-Time Reinforcement Learning Agent That Improve Without Labels or Model Updates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a reinforcement learning agent that can self-improve without labeled examples or model weight updates. |
Wen-Kwang Tsao; Yao-Ching Yu; Chien-Ming Huang; |
| 253 | PropRAG: Guiding Retrieval with Beam Search Over Proposition Paths Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce PropRAG, a novel RAG framework that shifts from triples to context-rich propositions and introduces an efficient, LLM-free online beam search over proposition paths to discover multi-step reasoning chains. |
Jingjin Wang; Jiawei Han; |
| 254 | Are LLMs Better Than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we consider the recent approach of LLM-as-a-judge, leveraging an ensemble of LLMs to flag potentially mislabeled examples. |
Omer Nahum; Nitay Calderon; Orgad Keller; Idan Szpektor; Roi Reichart; |
| 255 | SMART: Simulated Students Aligned with Item Response Theory for Question Difficulty Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present SMART (Simulated Students Aligned with IRT), a novel method for aligning simulated students with instructed ability, which can then be used in simulations to predict the difficulty of open-ended items. |
Alexander Scarlatos; Nigel Fernandez; Christopher Ormerod; Susan Lottridge; Andrew Lan; |
| 256 | Structure-Conditional Minimum Bayes Risk Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce three lightweight adaptations to the utility function, designed to make MBR more sensitive to structural variability in the outcome space. |
Bryan Eikema; Anna Rutkiewicz; Mario Giulianelli; |
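For context, vanilla MBR decoding picks the candidate with the highest expected utility against pseudo-references sampled from the model; the paper's adaptations modify the utility function. A minimal sketch, with token-overlap F1 as a hypothetical stand-in utility:

```python
def mbr_select(candidates, references, utility):
    """Return the candidate with the highest average utility
    against the references (vanilla MBR; illustrative only)."""
    def expected_utility(h):
        return sum(utility(h, r) for r in references) / len(references)
    return max(candidates, key=expected_utility)

def overlap_f1(h, r):
    """Toy utility: F1 over word overlap between two strings."""
    h_set, r_set = set(h.split()), set(r.split())
    if not h_set or not r_set:
        return 0.0
    p = len(h_set & r_set) / len(h_set)
    rec = len(h_set & r_set) / len(r_set)
    return 0.0 if p + rec == 0 else 2 * p * rec / (p + rec)

samples = ["the cat sat", "the cat sat down", "the cat"]
print(mbr_select(samples, samples, overlap_f1))
# -> "the cat sat" (highest average overlap with the other samples)
```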
| 257 | VeriLocc: End-to-End Cross-Architecture Register Allocation Via LLM Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce VeriLocc, a framework that combines large language models (LLMs) with formal compiler techniques to enable generalizable and verifiable register allocation across GPU architectures. |
Lesheng Jin; Zhenyuan Ruan; Haohui Mai; Jingbo Shang; |
| 258 | DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We conduct an evaluation of different decomposition, decontextualization, and verification strategies and find that the choice of strategy matters in the resulting factuality scores. Additionally, we introduce DnDScore, a decontextualization aware verification method that validates subclaims in the context of contextual information. |
Miriam Wanner; Benjamin Van Durme; Mark Dredze; |
| 259 | Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study privacy leakage in the reasoning traces of large reasoning models used as personal agents which handle sensitive user data. |
Tommaso Green; Martin Gubri; Haritz Puerto; Sangdoo Yun; Seong Joon Oh; |
| 260 | SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a comprehensive solution for audio logical reasoning (ALR) tasks: we introduce SoundMind, a dataset of 6,446 audio–text annotated samples specifically curated to support complex reasoning. Building on this resource, we propose SoundMind-RL, a rule-based reinforcement learning (RL) algorithm designed to equip audio-language models with robust audio–text reasoning capabilities. |
Xingjian Diao; Chunhui Zhang; Keyi Kong; Weiyi Wu; Chiyu Ma; Zhongyu Ouyang; Peijun Qing; Soroush Vosoughi; Jiang Gui; |
| 261 | ProtoVQA: An Adaptable Prototypical Framework for Explainable Fine-Grained Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present ProtoVQA, a unified prototypical framework that (i) learns question-aware prototypes that serve as reasoning anchors, connecting answers to discriminative image regions, (ii) applies spatially constrained matching to ensure that the selected evidence is coherent and semantically relevant, and (iii) supports both answering and grounding tasks through a shared prototype backbone. |
Xingjian Diao; Weiyi Wu; Keyi Kong; Peijun Qing; Xinwen Xu; Ming Cheng; Soroush Vosoughi; Jiang Gui; |
| 262 | Improving Multilingual Retrieval-Augmented Language Models Through Dialectic Reasoning Argumentations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To make RAG more analytical, critical, and grounded, we introduce Dialectic-RAG (DRAG), a modular approach guided by Argumentative Explanations, i.e., a structured reasoning process that systematically evaluates retrieved information by comparing, contrasting, and resolving conflicting perspectives. |
Leonardo Ranaldi; Federico Ranaldi; Fabio Massimo Zanzotto; Barry Haddow; Alexandra Birch; |
| 263 | SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce SlideCoder, a layout-aware, retrieval-augmented framework for generating editable slides from reference images. |
Wenxin Tang; Jingyu Xiao; Wenxuan Jiang; Xi Xiao; Yuhang Wang; Xuxin Tang; Qing Li; Yuehe Ma; Junliang Liu; Shisong Tang; Michael R. Lyu; |
| 264 | Transparent and Coherent Procedural Mistake Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As our reformulation enables unprecedented transparency, we leverage a natural language inference (NLI) model to formulate two automated metrics for the coherence of generated rationales. |
Shane Storks; Itamar Bar-Yossef; Yayuan Li; Zheyuan Zhang; Jason J Corso; Joyce Chai; |
| 265 | Combining Constrained and Unconstrained Decoding Via Boosting: BoostCD and Its Application to Information Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is advantageous as it allows for dynamic constraints without requiring retraining, but can lead to low-quality output during constrained decoding at test time. We overcome this problem with Boosted Constrained Decoding (BoostCD) which combines constrained and unconstrained decoding in two phases: Phase 1 decodes from the base model M twice, in constrained and unconstrained mode, obtaining two weak predictions. |
Marija Sakota; Robert West; |
| 266 | MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our results demonstrate that LLMs largely fail at this task, and that existing interventions are insufficient: standard prompt approaches provide only marginal gains, and existing, factuality-based calibration techniques can even harm faithful calibration. To address this critical gap, we introduce MetaFaith, a novel prompt-based calibration approach inspired by human metacognition. |
Gabrielle Kaili-May Liu; Gal Yona; Avi Caciularu; Idan Szpektor; Tim G. J. Rudner; Arman Cohan; |
| 267 | WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in The Wild? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces WildDoc, the inaugural benchmark designed specifically for assessing document understanding in natural environments. |
An-Lan Wang; Jingqun Tang; Lei Liao; Hao Feng; Qi Liu; Xiang Fei; Jinghui Lu; Han Wang; Hao Liu; Yuliang Liu; Xiang Bai; Can Huang; |
| 268 | Stop Looking for “Important Tokens” in Multimodal Language Models: Duplication Matters More Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Surprisingly, it usually results in inferior performance compared to random token pruning and leads to incompatibility with efficient attention computation operators. Instead, we propose DART (Duplication-Aware Reduction of Tokens), which prunes tokens based on their duplication with other tokens, leading to significant and training-free acceleration. |
Zichen Wen; Yifeng Gao; Shaobo Wang; Junyuan Zhang; Qintong Zhang; Weijia Li; Conghui He; Linfeng Zhang; |
| 269 | MCIP: Protecting MCP Safety Via Model Contextual Integrity Protocol Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel framework to enhance MCP safety. |
Huihao Jing; Haoran Li; Wenbin Hu; Qi Hu; Xu Heli; Tianshu Chu; Peizhao Hu; Yangqiu Song; |
| 270 | STACKFEED: Structured Textual Actor-Critic Knowledge Base Editing with FEEDback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce STACKFEED, a novel Structured Textual Actor-Critic Knowledge base editing with Feedback approach that iteratively refines the KB based on expert feedback using a multi-actor, centralized critic reinforcement learning framework. |
Shashank Kirtania; Naman Gupta; Priyanshu Gupta; Sumit Gulwani; Arun Iyer; Suresh Parthasarathy Iyengar; Arjun Radhakrishna; Sriram K. Rajamani; Gustavo Soares; |
| 271 | Social Good or Scientific Curiosity? Uncovering The Research Framing Behind NLP Artefacts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent studies manually analyzed NLP research across domains, showing that few papers explicitly identify key stakeholders, intended uses, or appropriate contexts. In this work, we propose to automate this analysis, developing a three-component system that infers research framings by first extracting key elements (means, ends, stakeholders), then linking them through interpretable rules and contextual reasoning. |
Eric Chamoun; Nedjma Ousidhoum; Michael Sejr Schlichtkrull; Andreas Vlachos; |
| 272 | Definition Generation for Word Meaning Modeling: Monolingual, Multilingual, and Cross-Lingual Perspectives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we expand Definition Generation beyond English to a suite of 22 languages and evaluate Llama-based models within a monolingual, multilingual, and cross-lingual setting. |
Francesco Periti; Roksana Goworek; Haim Dubossarsky; Nina Tahmasebi; |
| 273 | Scaling Low-Resource MT Via Synthetic Data Generation with LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Automatic and human evaluation confirm its overall high quality. We study its practical application by (i) identifying effective training regimes, (ii) comparing our data with the HPLT dataset, (iii) studying the effect of varying training data size, and (iv) testing its utility beyond English-centric MT. Finally, we introduce SynOPUS, a public repository for synthetic parallel datasets. |
Ona de Gibert; Joseph Attieh; Teemu Vahtola; Mikko Aulamo; Zihao Li; Raúl Vázquez; Tiancheng Hu; Jörg Tiedemann; |
| 274 | Language-to-Space Programming for Training-Free 3D Visual Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the challenges, we introduce a novel method for training-free 3D visual grounding, namely **La**nguage-to-**S**pace **P**rogramming (LaSP). |
Boyu Mi; Hanqing Wang; Tai Wang; Yilun Chen; Jiangmiao Pang; |
| 275 | DynamicNER: A Dynamic, Multilingual, and Fine-Grained Dataset for LLM-based Named Entity Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, the prevalent fixed and relatively coarse-grained entity categorization in existing datasets fails to adequately assess the superior generalization and contextual understanding capabilities of LLM-based methods, thereby hindering a comprehensive demonstration of their broad application prospects. To address these limitations, we propose DynamicNER, the first NER dataset designed for LLM-based methods with dynamic categorization, introducing various entity types and entity type lists for the same entity in different contexts, to better leverage the generalization ability of LLM-based NER. |
Hanjun Luo; Yingbin Jin; Yiran Wang; Xinfeng Li; Tong Shang; Xuecheng Liu; Ruizhe Chen; Kun Wang; Hanan Salam; Qingsong Wen; Zuozhu Liu; |
| 276 | Unlearning Vs. Obfuscation: Are We Truly Removing Knowledge? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formally distinguish unlearning from obfuscation and introduce a probing-based evaluation framework to assess whether existing approaches genuinely remove targeted information. |
Guangzhi Sun; Potsawee Manakul; Xiao Zhan; Mark Gales; |
| 277 | Efficient Beam Search for Large Language Models Using Trie-Based Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents a novel trie (prefix-tree)-based parallel decoding method that addresses the memory inefficiency of batch-based beam search. |
Brian J Chan; Mao-xun Huang; Jui-Hung Cheng; Chao-Ting Chen; Hen-Hsen Huang; |
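A structural sketch of the prefix-sharing idea, under the assumption that beams extend a shared trie rather than keeping independent copies of their hypotheses (in a real decoder the shared nodes would also index shared KV-cache entries); the class below is illustrative, not the paper's implementation:

```python
class TrieNode:
    """One node per generated token; beams sharing a prefix share
    the nodes along it, so the prefix is stored exactly once."""
    def __init__(self, token=None, parent=None):
        self.token = token
        self.parent = parent
        self.children = {}

    def extend(self, token):
        # Reuse an existing child so shared prefixes are never duplicated.
        if token not in self.children:
            self.children[token] = TrieNode(token, self)
        return self.children[token]

    def sequence(self):
        node, out = self, []
        while node.parent is not None:
            out.append(node.token)
            node = node.parent
        return out[::-1]

root = TrieNode()
beam_a = root.extend("the").extend("cat").extend("sat")
beam_b = root.extend("the").extend("cat").extend("ran")
# "the" -> "cat" is stored once and shared by both beams.
assert beam_a.parent is beam_b.parent
```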
| 278 | FairGen: Controlling Sensitive Attributes for Fair Generations in Diffusion Models Via Adaptive Latent Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tackle the challenge of mitigating generation bias towards any target attribute value (e.g., “male” for “gender”) in diffusion models while preserving generation quality. |
Mintong Kang; Vinayshekhar Bannihatti Kumar; Shamik Roy; Abhishek Kumar; Sopan Khosla; Balakrishnan Murali Narayanaswamy; Rashmi Gangadharaiah; |
| 279 | LLMs Behind The Scenes: Enabling Narrative Scene Illustration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the task of narrative scene illustration, which involves automatically generating an image depicting a scene in a story. |
Melissa Roemmele; John Joon Young Chung; Taewook Kim; Yuqian Sun; Alex Calderwood; Max Kreminski; |
| 280 | LogiCoL: Logically-Informed Contrastive Learning for Set-based Dense Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current dense retrievers struggle with such queries, such that the retrieved results do not respect the logical constraints implied in the queries. To address this challenge, we introduce LogiCoL, a logically-informed contrastive learning objective for dense retrievers. |
Yanzhen Shen; Sihao Chen; Xueqiang Xu; Yunyi Zhang; Chaitanya Malaviya; Dan Roth; |
| 281 | PunMemeCN: A Benchmark to Explore Vision-Language Models’ Understanding of Chinese Pun Memes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce PunMemeCN, a novel benchmark designed to assess VLMs’ capabilities in processing Chinese pun memes across three progressive tasks: pun meme detection, sentiment analysis, and chat-driven meme response. |
Zhijun Xu; Siyu Yuan; Yiqiao Zhang; Jingyu Sun; Tong Zheng; Deqing Yang; |
| 282 | Reward-Shifted Speculative Sampling Is An Efficient Test-Time Weak-to-Strong Aligner Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the reward-shifted speculative sampling (SSS) algorithm, in which the draft model is aligned with human preferences, while the target model remains unchanged. |
Bolian Li; Yanran Wu; Xinyu Luo; Ruqi Zhang; |
| 283 | Think-Search-Patch: A Retrieval-Augmented Reasoning Framework for Repository-Level Code Repair Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large language models usually suffer from multiple-file coding scenarios where strong inter-file dependencies manifest, typically demonstrated in SWE-bench. To mitigate this issue, we propose Think-Search-Patch (TSP), a retrieval-augmented reasoning framework for repository-level code repair. |
Bojian Xiong; Yikun Lei; Xikai Liu; Shaowei Zhang; Pengyun Zhu; Yan Liu; Yongqi Leng; Ling Shi; Meizhi Zhong; Yurong Zhang; Yan Gao; Yiwu; Yao Hu; Deyi Xiong; |
| 284 | EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing benchmarks, focusing on single-task environments with limited constraints, lack the complexity required to fully reflect real-world scenarios. To bridge this gap, we present the Extremely Complex Instruction Following Benchmark (EIFBENCH), meticulously crafted to facilitate a more realistic and robust evaluation of LLMs. |
Tao Zou; Xinghua Zhang; Haiyang Yu; Minzheng Wang; Fei Huang; Yongbin Li; |
| 285 | Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the first large-scale study of multi-dimensional persona effects in AI-AI debates over real-world moral dilemmas. |
Jiarui Liu; Yueqi Song; Yunze Xiao; Mingqian Zheng; Lindia Tjuatja; Jana Schaich Borg; Mona T. Diab; Maarten Sap; |
| 286 | Can An Individual Manipulate The Collective Decisions of Multi-Agents? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to the vulnerabilities of individual LLMs and the difficulty of accessing all agents in a multi-agent system, a key question arises: If attackers only know one agent, could they still generate adversarial samples capable of misleading the collective decision? To explore this question, we formulate it as a game with incomplete information, where attackers know only one target agent and lack knowledge of the other agents in the system. With this formulation, we propose M-Spoiler, a framework that simulates agent interactions within a multi-agent system to generate adversarial samples. |
Fengyuan Liu; Rui Zhao; Shuo Chen; Guohao Li; Philip Torr; Lei Han; Jindong Gu; |
| 287 | Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce SMART (Sycophancy Mitigation through Adaptive Reasoning Trajectories), reconceptualizing sycophancy as a reasoning optimization problem rather than an output alignment issue. |
Mohammad Beigi; Ying Shen; Parshin Shojaee; Qifan Wang; Zichao Wang; Chandan K. Reddy; Ming Jin; Lifu Huang; |
| 288 | (Almost) Free Modality Stitching of Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, given the complexity of training such connectors on large scale web-based datasets coupled with the ever-increasing number of available pretrained uni-modal models, the task of uni-modal models selection and subsequent connector module training becomes computationally demanding. To address this under-studied critical problem, we propose Hypernetwork Model Alignment (Hyma), a novel all-in-one solution for optimal uni-modal model selection and connector training by leveraging hypernetworks. |
Jaisidh Singh; Diganta Misra; Boris Knyazev; Antonio Orvieto; |
| 289 | SLoW: Select Low-frequency Words! Automatic Dictionary Selection for Translation on Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel and effective method which we call Select Low-frequency Words! |
Hongyuan Lu; Zixuan Li; Zefan Zhang; Wai Lam; |
| 290 | Revisiting LLM Value Probing Strategies: Are They Robust and Expressive? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we systematically compare three widely used value probing methods: token likelihood, sequence perplexity, and text generation. |
Siqi Shen; Mehar Singh; Lajanugen Logeswaran; Moontae Lee; Honglak Lee; Rada Mihalcea; |
| 291 | SynC-LLM: Generation of Large-Scale Synthetic Circuit Code with Hierarchical Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose SynC-LLM, the first technique that exploits LLM’s ability to generate new large-scale synthetic digital circuits. |
Shang Liu; Yao Lu; Wenji Fang; Jing Wang; Zhiyao Xie; |
| 292 | Language Models As Continuous Self-Evolving Data Engineers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This reliance sets a ceiling on LLM performance and is particularly challenging in low data resource scenarios where extensive supervision is unavailable. To address this issue, we propose a novel paradigm named LANCE (**LAN**guage models as **C**ontinuous self-**E**volving data engineers) that enables LLMs to train themselves by autonomously generating, cleaning, reviewing, and annotating data with preference information. |
Peidong Wang; Ming Wang; Zhiming Ma; Xiaocui Yang; Shi Feng; Daling Wang; Yifei Zhang; Kaisong Song; |
| 293 | FinMTEB: Finance Massive Text Embedding Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite this progress, existing benchmarks predominantly use general-purpose datasets, inadequately addressing the nuanced requirements of specialized domains like finance. To bridge this gap, we introduce the Finance Massive Text Embedding Benchmark (FinMTEB), a comprehensive evaluation suite specifically designed for the financial domain. |
Yixuan Tang; Yi Yang; |
| 294 | From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces Speech Back-Translation, a scalable pipeline that improves multilingual ASR models by converting large-scale text corpora into synthetic speech via off-the-shelf text-to-speech (TTS) models. |
Tianduo Wang; Lu Xu; Wei Lu; Shanbo Cheng; |
| 295 | Linguistic Neuron Overlap Patterns to Facilitate Cross-lingual Transfer on Low-resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: From a language-bridge perspective, we propose a simple yet effective method, namely BridgeX-ICL, to improve zero-shot Cross-lingual In-Context Learning (X-ICL) for low-resource languages. |
Yuemei Xu; Kexin Xu; Jian Zhou; Ling Hu; Lin Gui; |
| 296 | Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the performance gap between these two paradigms has not been thoroughly explored. To address this gap, we present a fair comparison of self-supervised learning (SSL)-based discrete and continuous features under the same experimental settings. |
Dingdong Wang; Junan Li; Mingyu Cui; Dongchao Yang; Xueyuan Chen; Helen M. Meng; |
| 297 | Crisp: Cognitive Restructuring of Negative Thoughts Through Multi-turn Supportive Dialogues Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, effectively implementing CR is hindered by entrenched cognitive distortions, emotional resistance, and individual differences, which existing works have not overcome. To bridge this gap, we propose CRDial, a novel framework that structures CR as theory-grounded multi-stage multi-turn dialogue, integrating multi-aspect supportive strategies for emotional management and a multi-channel loop mechanism to account for diverse individual distortions. |
Jinfeng Zhou; Yuxuan Chen; Jianing Yin; Yongkang Huang; Yihan Shi; Xikun Zhang; Libiao Peng; Rongsheng Zhang; Tangjie Lv; Zhipeng Hu; Hongning Wang; Minlie Huang; |
| 298 | Let’s Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM’s Math Capability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, this integration is challenging due to inherent disparities in problem structure and reasoning format between NL and FL. To address these challenges, we introduce **NL-FL HybridReasoning (NFL-HR)**, an end-to-end framework designed to incorporate the FL expert into NL math problem-solving. |
Ruida Wang; Yuxin Li; Yi R. Fung; Tong Zhang; |
| 299 | RoT: Enhancing Table Reasoning with Iterative Row-Wise Traversals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, Long CoT suffers from high cost for training and exhibits low reliability due to table content hallucinations. Therefore, we propose Row-of-Thought (RoT), which performs iterative row-wise table traversal, allowing for reasoning extension and reflection-based refinement at each traversal. |
Xuanliang Zhang; Dingzirui Wang; Keyan Xu; Qingfu Zhu; Wanxiang Che; |
| 300 | Shallow Focus, Deep Fixes: Enhancing Shallow Layers Vision Attention Sinks to Alleviate Hallucination in LVLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the attention score distribution of image tokens across layers and attention heads in models, revealing an intriguing but common phenomenon: most hallucinations are closely linked to the attention sink patterns in the image-token attention matrix, where shallow layers exhibit dense sinks and deep layers exhibit sparse ones. |
Xiaofeng Zhang; Yihao Quan; Chen Shen; Chaochen Gu; Xiaosong Yuan; Shaotian Yan; Jiawei Cao; Hao Cheng; Kaijie Wu; Jieping Ye; |
| 301 | Efficient Real-time Refinement of Language Model Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, in this work, we propose Streaming-VR (Streaming Verification and Refinement), a novel approach designed to enhance the efficiency of verification and refinement of LLM outputs. |
Joonho Ko; Jinheon Baek; Sung Ju Hwang; |
| 302 | Route Sparse Autoencoder to Interpret Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Route Sparse Autoencoder (RouteSAE), a new framework that integrates a routing mechanism with a shared SAE to efficiently extract features from multiple layers. |
Wei Shi; Sihang Li; Tao Liang; Mingyang Wan; Guojun Ma; Xiang Wang; Xiangnan He; |
| 303 | SATER: A Self-Aware and Token-Efficient Approach to Routing and Cascading Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further address the limitations of both approaches, we introduce SATER, a dual-mode compatible approach that fine-tunes models through shortest-response preference optimization and a confidence-aware rejection mechanism. |
Yuanzhe Shen; Yide Liu; Zisu Huang; Ruicheng Yin; Xiaoqing Zheng; Xuanjing Huang; |
| 304 | Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we seek to fully unleash the two paradigms’ strengths for mutual enhancement and ultimately achieve simultaneous improvements. |
Senjie Jin; Lu Chen; Zhiheng Xi; Yuhui Wang; Sirui Song; Yuhao Zhou; Xinbo Zhang; Peng Sun; Hong Lu; Tao Gui; Qi Zhang; Xuanjing Huang; |
| 305 | KG-RAG: Enhancing GUI Agent Decision-Making Via Knowledge Graph-Driven Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce KG-RAG, a Knowledge Graph-driven Retrieval-Augmented Generation framework that transforms fragmented UTGs into structured vector databases for efficient real-time retrieval. |
Ziyi Guan; Jason Chun Lok Li; Zhijian Hou; Pingping Zhang; Donglai Xu; Yuzhi Zhao; Mengyang Wu; Jinpeng Chen; Thanh-Toan Nguyen; Pengfei Xian; Wenao Ma; Shengchao Qin; Graziano Chesi; Ngai Wong; |
| 306 | FillerSpeech: Towards Human-Like Text-to-Speech Synthesis with Filler Insertion and Filler Style Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further advance toward human-like conversational speech synthesis, this paper presents FillerSpeech, a novel speech synthesis framework that enables natural filler insertion and control over filler style. |
Seung-Bin Kim; Jun-Hyeok Cha; Hyung-Seok Oh; Heejin Choi; Seong-Whan Lee; |
| 307 | SOCIAL SCAFFOLDS: A Generalization Framework for Social Understanding Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Without such cues, NLP models risk missing social signals, instead relying on surface patterns. We introduce SOCIAL SCAFFOLDS, an automated framework for facilitating generalization across social reasoning tasks by generating rationales that make these social cues explicit. |
Ritam Dutt; Carolyn Rose; Maarten Sap; |
| 308 | Probabilistic Soundness Guarantees in LLM Reasoning Chains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current LLM-based error detection methods often fail to detect propagated errors because earlier errors can corrupt judgments of downstream reasoning. To better detect such errors, we introduce Autoregressive Reasoning Entailment Stability (ARES), a probabilistic framework that evaluates each reasoning step based solely on previously-verified premises. |
Weiqiu You; Anton Xue; Shreya Havaldar; Delip Rao; Helen Jin; Chris Callison-Burch; Eric Wong; |
| 309 | LMR-BENCH: Evaluating LLM Agent’s Ability on Reproducing Language Modeling Research Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This task includes unique complex reasoning challenges in the intellectual synthesis of abstract concepts and the comprehension of code repositories with interdependent files. Motivated by this gap, we present LMR-BENCH, a benchmark designed to systematically evaluate the capability of LLM agents on code reproduction from Language Modeling Research. |
Shuo Yan; Ruochen Li; Ziming Luo; Zimu Wang; Daoyang Li; Liqiang Jing; Kaiyu He; Peilin Wu; Juntong Ni; George Michalopoulos; Yue Zhang; Ziyang Zhang; Mian Zhang; Zhiyu Chen; Xinya Du; |
| 310 | Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we leverage a causal mediation technique called activation patching, to measure the faithfulness of an explanation towards supporting the explained answer. |
Wei Jie Yeo; Ranjan Satapathy; Erik Cambria; |
| 311 | A Probabilistic Inference Scaling Theory for LLM Self-Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the mechanisms underlying how and why accuracy evolves during this iterative process remain unexplored. To fill this gap, we propose a probabilistic theory to model the dynamics of accuracy change and explain the performance improvements observed in multi-round self-correction. |
Zhe Yang; Yichang Zhang; Yudong Wang; Ziyao Xu; Junyang Lin; Zhifang Sui; |
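One generic way to model such round-to-round accuracy dynamics, not the paper's specific theory, is a two-state Markov chain: a correct answer survives a correction round with some probability, an incorrect one gets fixed with another, and accuracy converges to a fixed point. A toy sketch with illustrative probabilities:

```python
def accuracy_dynamics(a0, p_stay, p_fix, rounds=10):
    """Evolve answer accuracy across self-correction rounds under a
    two-state Markov model: a correct answer stays correct with
    probability p_stay, an incorrect one is fixed with probability
    p_fix. All numbers here are illustrative."""
    a, trace = a0, [a0]
    for _ in range(rounds):
        a = a * p_stay + (1 - a) * p_fix
        trace.append(a)
    return trace  # converges to p_fix / (1 - p_stay + p_fix)

print(accuracy_dynamics(0.6, p_stay=0.95, p_fix=0.3))
# rises monotonically toward the fixed point 0.3 / 0.35 ~= 0.857
```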
| 312 | LLMInit: A Free Lunch from Large Language Models for Selective Initialization of Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel and scalable LLM-RecSys framework, LLMInit, designed to integrate pretrained LLM embeddings into CF models through selective initialization strategies. |
Weizhi Zhang; Liangwei Yang; Wooseong Yang; Henry Peng Zou; Yuqing Liu; Ke Xu; Sourav Medya; Philip S. Yu; |
| 313 | ProReason: Multi-Modal Proactive Reasoning with Decoupled Eyesight and Wisdom Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We then decompose the visual reasoning process into two stages: proactive visual perception (i.e., eyesight) and textual reasoning (i.e., wisdom), and introduce a novel visual reasoning framework named ProReason. |
Jingqi Zhou; Sheng Wang; Jingwei Dong; Kai Liu; Lei Li; Jiahui Gao; Jiyue Jiang; Lingpeng Kong; Chuan Wu; |
| 314 | Parallel Continuous Chain-of-Thought with Jacobi Iteration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the sequential dependencies between latent thought tokens spoil parallel training, leading to long training time. In this paper, we propose Parallel Continuous Chain-of-Thought (PCCoT), which performs Jacobi iteration on the latent thought tokens, updating them iteratively in parallel instead of sequentially and thus improving both training and inference efficiency of continuous CoT. |
Haoyi Wu; Zhihao Teng; Kewei Tu; |
| 315 | Sparse Activation Editing for Reliable Instruction Following in Narratives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Complex narrative contexts often challenge language models’ ability to follow instructions, and existing benchmarks fail to capture these difficulties. To address this, we propose Concise-SAE, a training-free framework that improves instruction following by identifying and editing instruction-relevant neurons using only natural language instructions, without requiring labelled data. |
Runcong Zhao; Chengyu Cao; Qinglin Zhu; Xiucheng Ly; Shun Shao; Lin Gui; Ruifeng Xu; Yulan He; |
| 316 | No Need for Explanations: LLMs Can Implicitly Learn from Mistakes In-context Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To gain an understanding of *why* LLMs learn from mistakes more effectively without explicit corrective rationales, we perform a thorough analysis, investigating changes in context length and answer diversity between different prompting strategies, and their effect on performance. |
Lisa Alazraki; Maximilian Mozes; Jon Ander Campos; Tan Yi-Chern; Marek Rei; Max Bartolo; |
| 317 | A Good Plan Is Hard to Find: Aligning Models with Preferences Is Misaligned with What Helps Users Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We expose: 1) user/model preferences and agent success do not accurately predict which plans help users, so common alignment feedback can misalign with helpfulness; 2) this gap is not due to user-specific preferences, as users are similarly successful when using plans they prefer/disprefer; 3) surface-level cues like brevity and question similarity strongly link to preferences, but such biases fail to predict helpfulness. In all, we argue aligning helpful LLMs needs feedback from real user interactions—not just preferences of what looks helpful—so we discuss the plan NLP researchers can execute to solve this problem. |
Nishant Balepur; Matthew Shu; Yoo Yeon Sung; Seraphina Goldfarb-Tarrant; Shi Feng; Fumeng Yang; Rachel Rudinger; Jordan Lee Boyd-Graber; |
| 318 | Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these metrics depend on hardware and running-time choices (parallel or not, batch size, etc.), and often fail to account for model size, making it difficult to interpret and obscuring the evaluation of the efficiency-effectiveness tradeoff. To address this issue, we propose two metrics for LLM-based rerankers: RPP (ranking metrics per PetaFLOP), measuring how much ranking quality (e.g., NDCG or MRR) a method achieves per PetaFLOP, and QPP (queries per PetaFLOP), measuring how many queries can be processed per PetaFLOP. |
Zhiyuan Peng; Ting-Ruen Wei; Tingyu Song; Yilun Zhao; |
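The two metrics in entry 318 are simple ratios, so a direct transcription suffices; the numbers below are hypothetical.

```python
# RPP: ranking quality per PetaFLOP; QPP: queries processed per PetaFLOP.
def rpp(ranking_metric: float, petaflops: float) -> float:
    return ranking_metric / petaflops  # e.g. NDCG@10 per PetaFLOP

def qpp(num_queries: int, petaflops: float) -> float:
    return num_queries / petaflops

# Hypothetical reranker: NDCG@10 of 0.72 over 1,000 queries at 3.5 PetaFLOPs.
print(f"RPP = {rpp(0.72, 3.5):.3f}, QPP = {qpp(1000, 3.5):.1f}")
```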
| 319 | Castle: Causal Cascade Updates in Relational Databases with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces Castle, the first framework for schema-only cascade update generation using large language models (LLMs). |
Yongye Su; Yucheng Zhang; Zeru Shi; Bruno Ribeiro; Elisa Bertino; |
| 320 | LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite their general success, Large Language Models (LLMs) often struggle with hallucinations, handling domain-specific data effectively (e.g., crystal structures), and integrating experimental workflows. To address these challenges, we introduce LLaMP, a hierarchical multi-agent framework designed to emulate the materials science research workflow. |
Yuan Chiang; Elvis Hsieh; Chia-Hong Chou; Janosh Riebesell; |
| 321 | Learn and Unlearn: Addressing Misinformation in Multilingual LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates the propagation of information in multilingual large language models (LLMs) and evaluates the efficacy of various unlearning methods. |
TaiMing Lu; Philipp Koehn; |
| 322 | How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: EchoMist targets circulated, harmful, and ever-evolving implicit misinformation from diverse sources, including realistic human-AI conversations and social media interactions. Through extensive empirical studies on 15 state-of-the-art LLMs, we find that current models perform alarmingly poorly on this task, often failing to detect false premises and generating counterfactual explanations. |
Ruohao Guo; Wei Xu; Alan Ritter; |
| 323 | PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Patch Context Robustness Index (PCRI), the first systematic and interpretable score for quantifying MLLM robustness to variations in visual context granularity, measuring performance changes between localized image patches and full-image input. |
Hitesh Laxmichand Patel; Amit Agarwal; Srikant Panda; Hansa Meghwani; Karan Dua; Paul Li; Tao Sheng; Sujith Ravi; Dan Roth; |
| 324 | Evaluating AI for Finance: Is AI Credible at Assessing Investment Risk Appetite? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We assess whether AI systems can credibly evaluate investment risk appetite—a task that must be thoroughly validated before automation. |
Divij Chawla; Ashita Bhutada; Duc Anh Do; Abhinav Raghunathan; Vinod Sp; Cathy Guo; Dar Win Liew; Prannaya Gupta; Rishabh Bhardwaj; Rajat Bhardwaj; Soujanya Poria; |
| 325 | Tree-of-Quote Prompting Improves Factuality and Attribution in Multi-Hop and Medical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Tree-of-Quote (ToQ) prompting, a structured framework that decomposes complex questions into subquestions, generates quotes to support each step without retrieval, and selectively advances reasoning based on quote quality. |
Justin Xu; Yiming Li; Zizheng Zhang; Augustine Yui Hei Luk; Mayank Jobanputra; Samarth Oza; Ashley Murray; Meghana Reddy Kasula; Andrew Parker; David W Eyre; |
| 326 | REARANK: Reasoning Re-ranking Agent Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present REARANK, a large language model (LLM)-based listwise reasoning reranking agent. |
Le Zhang; Bo Wang; Xipeng Qiu; Siva Reddy; Aishwarya Agrawal; |
| 327 | Learning to Ask: When LLM Agents Meet Unclear Instruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that due to the next-token prediction training objective, LLM agents tend to arbitrarily generate the missing argument, which may lead to hallucinations and risks. To address this issue, we propose a novel framework, Ask-when-Needed, which prompts LLM agents to ask questions to users whenever they encounter obstacles due to unclear instructions. |
Wenxuan Wang; Shi Juluan; Zixuan Ling; Yuk-Kit Chan; Chaozheng Wang; Cheryl Lee; Youliang Yuan; Jen-tse Huang; Wenxiang Jiao; Michael R. Lyu; |
| 328 | DeepResearcher: Scaling Deep Research Via Reinforcement Learning in Real-world Environments Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce DeepResearcher, the first comprehensive framework for end-to-end training of LLM-based deep research agents through scaling reinforcement learning (RL) in real-world environments with authentic web search interactions. |
Yuxiang Zheng; Dayuan Fu; Xiangkun Hu; Xiaojie Cai; Lyumanshan Ye; Pengrui Lu; Pengfei Liu; |
| 329 | A Systematic Analysis of Base Model Choice for Reward Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a systematic analysis of the effect of base model selection on reward modeling performance. |
Kian Ahrabian; Pegah Jandaghi; Negar Mokhberian; Sai Praneeth Karimireddy; Jay Pujara; |
| 330 | Linear-Time Demonstration Selection for In-Context Learning Via Gradient Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces an algorithm to select demonstration examples for in-context learning of a query set. |
Ziniu Zhang; Zhenshuo Zhang; Dongyue Li; Lu Wang; Jennifer Dy; Hongyang R. Zhang; |
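A first-order reading of gradient-based demonstration selection (entry 330) scores each candidate by how well its gradient aligns with the query set's gradient; the estimator below is a generic influence-style sketch and not necessarily the paper's exact algorithm.

```python
# Score demonstrations by gradient inner product with the query gradient.
import numpy as np

rng = np.random.default_rng(1)
demo_grads = rng.normal(size=(50, 128))  # sketched gradients, one per candidate
query_grad = rng.normal(size=128)        # aggregated gradient on the query set

scores = demo_grads @ query_grad         # first-order influence estimates
top_k = np.argsort(-scores)[:4]          # the 4 most helpful demonstrations
print("selected demonstrations:", top_k)
```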
| 331 | Creativity in LLM-based Multi-Agent Systems: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This survey offers a structured framework and roadmap for advancing the development, evaluation, and standardization of creative MAS. |
Yi-Cheng Lin; Kang-Chieh Chen; Zhe-Yan Li; Tzu-Heng Wu; Tzu-Hsuan Wu; Kuan-Yu Chen; Hung-yi Lee; Yun-Nung Chen; |
| 332 | AskToAct: Enhancing LLMs Tool Use Via Self-Correcting Clarification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing interactive clarification approaches face two critical limitations: reliance on manually constructed datasets, which inherently constrains training data scale and diversity, and lack of error correction mechanisms during multi-turn clarification, leading to error accumulation that compromises both accuracy and efficiency. We present AskToAct, which addresses these challenges by exploiting the structural mapping between queries and their tool invocation solutions. |
Xuan Zhang; Yongliang Shen; Zhe Zheng; Linjuan Wu; Wenqi Zhang; Yuchen Yan; Qiuying Peng; Jun Wang; Weiming Lu; |
| 333 | MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce MiCRo, a two-stage framework that enhances personalized preference learning by leveraging large-scale binary preference datasets without requiring explicit fine-grained annotations. |
Jingyan Shen; Jiarui Yao; Rui Yang; Yifan Sun; Feng Luo; Rui Pan; Tong Zhang; Han Zhao; |
| 334 | ModalPrompt: Towards Efficient Multimodal Continual Instruction Tuning with Dual-Modality Guided Prompt Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel prompt learning framework for MCIT to effectively alleviate forgetting of previous knowledge while managing computational complexity with natural image-text supervision. |
Fanhu Zeng; Fei Zhu; Haiyang Guo; Xu-Yao Zhang; Cheng-Lin Liu; |
| 335 | SHIFT: Selected Helpful Informative Frame for Video-guided Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this paradigm frequently incurs significant computational overhead and introduces redundant multimodal content, which degrades both efficiency and translation quality. To tackle these challenges, we propose SHIFT (Selected Helpful Informative Frame for Translation). |
Boyu Guan; Chuang Han; Yining Zhang; Yupu Liang; Zhiyang Zhang; Yang Zhao; Chengqing Zong; |
| 336 | ReSURE: Regularizing Supervision Unreliability for Multi-turn Dialogue Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods typically address data quality via static prefiltering, which decouples quality control from training and fails to mitigate turn-level error propagation. In this context, we propose **ReSURE** (REgularizing Supervision UnREliability), an adaptive learning method that dynamically down-weights unreliable supervision without explicit filtering. |
Yiming Du; Yifan Xiang; Bin Liang; Dahua Lin; Kam-Fai Wong; Fei Tan; |
| 337 | ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our analysis reveals that reasoning length is governed by a linear direction in the representation space, allowing us to induce overly short reasoning by steering the model along this direction. Building on this insight, we introduce ThinkEdit, a simple yet effective weight-editing approach to mitigate the issue of overly short reasoning. |
Chung-En Sun; Ge Yan; Tsui-Wei Weng; |
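Entry 337's observation that reasoning length is governed by a linear direction implies a one-line intervention: shift hidden states along that direction. The sketch below uses a random unit vector as a stand-in for the learned direction.

```python
# Steering hidden states along a "reasoning length" direction (toy values).
import torch

torch.manual_seed(0)
d_model = 16
direction = torch.randn(d_model)
direction = direction / direction.norm()   # unit stand-in for the real direction

def steer(hidden: torch.Tensor, alpha: float) -> torch.Tensor:
    # hidden: (seq_len, d_model); negative alpha pushes toward shorter reasoning
    return hidden + alpha * direction

hidden = torch.randn(5, d_model)
steered = steer(hidden, alpha=-2.0)
print((steered - hidden).norm(dim=-1))     # every position shifted by |alpha|
```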
| 338 | Conditional [MASK] Discrete Diffusion Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Non-auto-regressive methods could be an alternative but often produce degenerate outputs and exhibit shortcomings in conditional generation. To address these challenges, we propose Diffusion-EAGS, a novel framework that integrates conditional masked language models into diffusion language models through the theoretical lens of a conditional Markov Random Field. |
Hyukhun Koh; Minha Jhang; Dohyung Kim; Sangmook Lee; Kyomin Jung; |
| 339 | Reliable Evaluation and Benchmarks for Statement Autoformalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Evaluating statement autoformalization, translating natural language mathematics into formal languages like Lean 4, remains a significant challenge, with few metrics, datasets, and standards to robustly measure progress. In this work, we present a comprehensive approach combining improved metrics, robust benchmarks, and systematic evaluation, to fill this gap. |
Auguste Poiroux; Gail Weiss; Viktor Kunčak; Antoine Bosselut; |
| 340 | Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing approaches leverage template-based or LLM-assisted methods for geometric CoT data creation, they often face challenges in achieving both diversity and precision. To bridge this gap, we introduce a two-stage Theorem-Validated Reverse Chain-of-Thought Reasoning Synthesis (TR-CoT) framework. |
Deng Linger; Linghao Zhu; Yuliang Liu; Yu Wang; Qunyi Xie; Jingjing Wu; Gang Zhang; Yingying Zhu; Xiang Bai; |
| 341 | GeoEdit: Geometric Knowledge Editing for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing training-based model editing methods often struggle to effectively incorporate new knowledge while preserving unrelated general knowledge. To address this challenge, we propose a novel framework called Geometric Knowledge Editing (GeoEdit). |
Yujie Feng; Li-Ming Zhan; Zexin Lu; Yongxin Xu; Xu Chu; Yasha Wang; Jiannong Cao; Philip S. Yu; Xiao-Ming Wu; |
| 342 | AIMMerging: Adaptive Iterative Model Merging Using Training Trajectories for Language Model Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Adaptive Iterative Model Merging (AimMerging), a novel CL framework that utilizes learning and forgetting signals from the training trajectory to dynamically monitor the model’s training status. |
Yujie Feng; Jian Li; Xiaoyu Dong; Pengfei Xu; Xiaohui Zhou; Yujia Zhang; Zexin Lu; Yasha Wang; Alan Zhao; Xu Chu; Xiao-Ming Wu; |
| 343 | ProLongVid: A Simple But Strong Baseline for Long-context Video Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a simple yet effective baseline for long-context video understanding, including dataset construction and training recipes. |
Rui Wang; Bohao Li; Xiyang Dai; Jianwei Yang; Yi-Ling Chen; Zhen Xing; Yifan Yang; Dongdong Chen; Xipeng Qiu; Zuxuan Wu; Yu-Gang Jiang; |
| 344 | Graph-Guided Textual Explanation Generation Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, highlight explanations–input fragments critical for the model’s predicted answers–exhibit measurable faithfulness. Building on this foundation, we propose G-TEx, a Graph-Guided Textual Explanation Generation framework designed to enhance the faithfulness of NLEs. |
Shuzhou Yuan; Jingyi Sun; Ran Zhang; Michael Färber; Steffen Eger; Pepa Atanasova; Isabelle Augenstein; |
| 345 | UI-Hawk: Unleashing The Screen Stream Understanding for Mobile GUI Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing GUI agents merely depend on current visual observations and plain-text action history, ignoring the significance of history screens. To mitigate this issue, we propose **UI-Hawk**, a multi-modal GUI agent specially designed to process screen streams encountered during GUI navigation. |
Jiwen Zhang; Ya-Qi Yu; Minghui Liao; WenTao Li; Jihao Wu; Zhongyu Wei; |
| 346 | How Do Social Bots Participate in Misinformation Spread? A Comprehensive Dataset and Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores the interplay between social bots and misinformation on the Sina Weibo platform. |
Herun Wan; Minnan Luo; Zihan Ma; Guang Dai; Xiang Zhao; |
| 347 | TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce TinySQL, a synthetic dataset, progressing from basic to advanced SQL operations, and train models ranging from 33M to 1B parameters to establish a comprehensive testbed for interpretability. |
Abir Harrasse; Philip Quirke; Clement Neo; Dhruv Nathawani; Luke Marks; Amir Abdullah; |
| 348 | Droid: A Resource Suite for AI-Generated Code Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present DroidCollection, the most extensive open data suite for training and evaluating machine-generated code detectors, comprising over a million code samples, seven programming languages, outputs from 43 coding models, and three real-world coding domains. |
Daniil Orel; Indraneil Paul; Iryna Gurevych; Preslav Nakov; |
| 349 | Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore’s Low-Resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we introduce SGToxicGuard, a novel dataset and evaluation framework for benchmarking LLM safety in Singapore’s diverse linguistic context, including Singlish, Chinese, Malay, and Tamil. |
Yujia Hu; Ming Shan Hee; Preslav Nakov; Roy Ka-Wei Lee; |
| 350 | M-Wanda: Improving One-Shot Pruning for Multilingual LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study multilingual performance under different sparsity constraints and show that moderate ratios already substantially harm performance. |
Rochelle Choenni; Ivan Titov; |
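M-Wanda (entry 350) extends the Wanda one-shot pruning criterion, which scores each weight by its magnitude times the norm of the corresponding input activation; the sketch below shows the base criterion only, and the multilingual adjustments are not reproduced.

```python
# Wanda importance: score[i, j] = |W[i, j]| * ||X[:, j]||_2, pruned per row.
import torch

torch.manual_seed(0)
W = torch.randn(8, 16)             # weight matrix (out_features x in_features)
X = torch.randn(32, 16)            # calibration activations (tokens x in_features)

score = W.abs() * X.norm(dim=0)    # broadcast the per-input activation norms
k = W.shape[1] // 2                # 50% sparsity within each output row
idx = score.argsort(dim=1)[:, :k]  # lowest-scoring inputs per row
mask = torch.ones_like(W)
mask.scatter_(1, idx, 0.0)
print(f"sparsity after pruning: {(W * mask == 0).float().mean():.2f}")
```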
| 351 | MEPT: Mixture of Expert Prompt Tuning As A Manifold Mapper Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although prior fine-tuning approaches demonstrate success, their rigid parameter space limits their ability to dynamically activate appropriate neural pathways, rendering them ill-equipped to adapt flexibly to the diverse and evolving data distributions. In light of this view, we propose a novel approach, Mixture of Expert Prompt Tuning (MEPT), as an effective and efficient manifold-mapping framework. |
Runjia Zeng; Guangyan Sun; Qifan Wang; Tong Geng; Sohail Dianat; Xiaotian Han; Raghuveer Rao; Xueling Zhang; Cheng Han; Lifu Huang; Dongfang Liu; |
| 352 | DPED: Multi-Layer Noise Distillation for Privacy-Preserving Text Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose DPED (Differentially Private Embedding Distillation), a framework that leverages teacher-student distillation with multi-layer noise injection to learn high-quality embeddings while providing differential privacy guarantees. |
Shuya Feng; Yuan Hong; |
| 353 | Threading The Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We thus propose a novel LLM-based pipeline enriched with linguistically-grounded discourse segmenters to extract supporting and opposing statements for each answer option from CoTs with improved accuracy. |
Beiduo Chen; Yang Janet Liu; Anna Korhonen; Barbara Plank; |
| 354 | Table-LLM-Specialist: Language Model Specialists for Tables Using Iterative Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Table-Specialist, a new self-trained fine-tuning paradigm specifically designed for table tasks. |
Junjie Xing; Yeye He; Mengyu Zhou; Haoyu Dong; Shi Han; Dongmei Zhang; Surajit Chaudhuri; |
| 355 | Proactive Hearing Assistants That Isolate Egocentric Conversations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce proactive hearing assistants that automatically identify and separate the wearer’s conversation partners, without requiring explicit prompts. |
Guilin Hu; Malek Itani; Tuochao Chen; Shyamnath Gollakota; |
| 356 | MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present C-IMHI, the first multi-task Chinese social media interpretable mental health instruction dataset (9K samples) with quality control and manual validation. |
Wei Zhai; Nan Bai; Qing Zhao; Jianqiang Li; Fan Wang; Hongzhi Qi; Meng Jiang; Xiaoqin Wang; Bing Xiang Yang; Guanghui Fu; |
| 357 | CausalVLBench: Benchmarking Visual Causal Reasoning in Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite increasing interest in the utility of LLMs in causal reasoning tasks such as causal discovery and counterfactual reasoning, there has been relatively little work showcasing the abilities of LVLMs on visual causal reasoning tasks. We take this opportunity to formally introduce a comprehensive causal reasoning benchmark for multi-modal in-context learning from LVLMs. |
Aneesh Komanduri; Karuna Bhaila; Xintao Wu; |
| 358 | Order Doesn’t Matter, But Reasoning Does: Training LLMs with Order-Centric Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: LLMs often rely on fixed sequential patterns rather than true logical understanding. To address this issue, we introduce an order-centric data augmentation framework based on commutativity in logical reasoning. |
Qianxi He; Qianyu He; Jiaqing Liang; Weikang Zhou; Zeye Sun; Fei Yu; Yanghua Xiao; |
| 359 | Latent Inter-User Difference Modeling for LLM Personalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent work has attempted to model such differences, the reliance on language-based prompts often hampers the effective extraction of meaningful distinctions. To address these issues, we propose Difference-aware Embedding-based Personalization (DEP), a framework that models inter-user differences in the latent space instead of relying on language prompts. |
Yilun Qiu; Tianhao Shi; Xiaoyan Zhao; Fengbin Zhu; Yang Zhang; Fuli Feng; |
| 360 | What Makes A Good Reasoning Chain? Uncovering Structural Patterns in Long Chain-of-Thought Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present LCoT2Tree, an automated framework that converts sequential LCoTs into hierarchical tree structures and thus enables deeper structural analysis of LLM reasoning. |
Gangwei Jiang; Yahui Liu; Zhaoyi Li; Wei Bi; Fuzheng Zhang; Linqi Song; Ying Wei; Defu Lian; |
| 361 | PEBR: A Probabilistic Approach to Embedding Based Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel probabilistic Embedding-Based Retrieval (pEBR) framework. |
Han Zhang; Yunjiang Jiang; Mingming Li; Haowei Yuan; Yiming Qiu; Wen-Yun Yang; |
| 362 | Evolving Chinese Spelling Correction with Corrector-Verifier Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: LLMs are advantageous in their extensive knowledge but fall into low efficiency in character-level editing. To address this dilemma, we propose Automatic Corrector Iteration (ACI), a novel model collaboration pipeline to iteratively optimize a BERT-based corrector. |
Linfeng Liu; Hongqiu Wu; Hai Zhao; |
| 363 | Adaptively Profiling Models with Task Elicitation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce task elicitation, a method that automatically builds new evaluations to profile model behavior. |
Davis Brown; Prithvi Balehannina; Helen Jin; Shreya Havaldar; Hamed Hassani; Eric Wong; |
| 364 | Following The Autoregressive Nature of LLM Embeddings Via Compression and Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This discrepancy hinders the full utilization of LLMs’ pre-training capabilities, resulting in inefficient learning. In response to this issue, we propose AutoRegEmbed, a new contrastive learning method built on embedding conditional probability distributions, which integrates two core tasks: information compression and conditional distribution alignment. |
Jingcheng Deng; Zhongtao Jiang; Liang Pang; Zihao Wei; Liwei Chen; Kun Xu; Yang Song; Huawei Shen; Xueqi Cheng; |
| 365 | Structured Preference Optimization for Vision-Language Long-Horizon Task Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing vision-language planning methods perform well on short-horizon tasks but struggle with long-horizon reasoning in dynamic environments due to the difficulty of training models to generate high-quality reasoning processes. To address this, we propose Structured Preference Optimization (SPO), a framework that enhances reasoning and action selection for long-horizon task planning through structured evaluation and optimized training. |
Xiwen Liang; Min Lin; Weiqi Ruan; Rongtao Xu; Yuecheng Liu; Jiaqi Chen; Bingqian Lin; Yuzheng Zhuang; Xiaodan Liang; |
| 366 | Improving Neutral Point-of-View Generation with Data- and Parameter-Efficient RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The paper shows that parameter-efficient reinforcement learning (PE-RL) is a highly effective training regime to improve large language models’ (LLMs) ability to answer queries on sensitive topics with a Neutral Point of View (NPOV), i.e., to provide significantly more informative, diverse and impartial answers. |
Jessica Hoffmann; Christiane Ahlheim; Zac Yu; Aria Walfrand; Jarvis Jin; Marie Tano; Ahmad Beirami; Erin MacMurray van Liemt; Nithum Thain; Hakim Sidahmed; Lucas Dixon; |
| 367 | Tiny Budgets, Big Gains: Parameter Placement Strategy in Parameter Super-Efficient Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose FoRA-UA, a novel method that, using only 1–5% of the standard LoRA’s parameters, achieves state-of-the-art performance across a wide range of tasks. |
Jinman Zhao; Xueyan Zhang; Jiaru Li; Jingcheng Niu; Yulan Hu; Erxue Min; Gerald Penn; |
| 368 | Certified Mitigation of Worst-Case LLM Copyright Infringement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose BloomScrub, a remarkably simple yet highly effective inference-time approach that provides certified copyright takedown. |
Jingyu Zhang; Jiacan Yu; Marc Marone; Benjamin Van Durme; Daniel Khashabi; |
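Entry 368's highlight does not spell out the mechanism, but the name suggests membership checks over hashed n-grams; the sketch below uses a plain hash set as a stand-in for a Bloom filter and should be read as an assumption, not the paper's method.

```python
# Quote screening by hashed n-gram overlap (Bloom filter replaced by a set).
import hashlib

def ngrams(text: str, n: int = 8) -> set[str]:
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def fingerprint(grams: set[str]) -> set[str]:
    return {hashlib.sha1(g.encode()).hexdigest() for g in grams}

protected = fingerprint(ngrams(
    "the quick brown fox jumps over the lazy dog and runs far away"))

def flagged(candidate: str) -> bool:
    # any shared 8-gram with the protected corpus triggers a takedown/rewrite
    return bool(fingerprint(ngrams(candidate)) & protected)

print(flagged("he said the quick brown fox jumps over the lazy dog and left"))
```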
| 369 | All for One: LLMs Solve Mental Math at The Last Token With Information Transferred From Other Tokens Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In practice, to what extent are such operations present? In this paper, on mental math tasks (i.e., direct math calculation via next-token prediction without explicit reasoning), we investigate this question in three steps: inhibiting input-specific token computations in the initial layers, restricting the routes of information transfer in the next few layers, and forcing all computation to happen at the last token in the remaining layers. |
Siddarth Mamidanna; Daking Rai; Ziyu Yao; Yilun Zhou; |
| 370 | Z1: Efficient Test-time Scaling with Code Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an efficient test-time scaling method that trains LLMs on code-related reasoning trajectories, facilitating their reduction of excess thinking tokens while maintaining performance. |
Zhaojian Yu; Yinghao Wu; Yilun Zhao; Arman Cohan; Xiao-Ping Zhang; |
| 371 | Multi-Modal Framing Analysis of News Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Especially for framing in the news, this leaves out valuable information about editorial choices, which include not just the written article but also accompanying photographs. To overcome such limitations, we present a method for conducting multi-modal, multi-label framing analysis at scale using large (vision-) language models. |
Arnav Arora; Srishti Yadav; Maria Antoniak; Serge Belongie; Isabelle Augenstein; |
| 372 | IL-PCSR: Legal Corpus for Prior Case and Statute Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Researchers to date have addressed the two tasks independently, thus developing completely different datasets and models for each task; however, both retrieval tasks are inherently related, e.g., similar cases tend to cite similar statutes (due to similar factual situations). In this paper, we address this gap. |
Shounak Paul; Dhananjay Ghumare; Pawan Goyal; Saptarshi Ghosh; Ashutosh Modi; |
| 373 | Improving Context Fidelity Via Native Retrieval-Augmented Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose CARE, a novel native retrieval-augmented reasoning framework that teaches LLMs to explicitly integrate in-context evidence within their reasoning process with the model’s own retrieval capabilities. |
Suyuchen Wang; Jinlin Wang; Xinyu Wang; Shiqi Li; Xiangru Tang; Sirui Hong; Xiao-Wen Chang; Chenglin Wu; Bang Liu; |
| 374 | Enhancing LLM Language Adaption Through Cross-lingual In-Context Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Cross-lingual In-context Pre-training (CrossIC-PT), a simple and scalable approach that enhances cross-lingual transfer by leveraging semantically related bilingual texts via simple next-word prediction. |
Linjuan Wu; Hao-Ran Wei; Huan Lin; Tianhao Li; Baosong Yang; Fei Huang; Weiming Lu; |
| 375 | SPaRC: A Spatial Pathfinding Reasoning Challenge Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce SPaRC (Spatial Pathfinding Reasoning Challenge), a dataset of 1,000 2D grid pathfinding puzzles to evaluate spatial and rule-based reasoning, requiring step-by-step planning with arithmetic and geometric rules. |
Lars Benedikt Kaesberg; Jan Philip Wahle; Terry Ruas; Bela Gipp; |
| 376 | METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose METok, a training-free, Multi-stage Event-based Token compression framework designed to accelerate VLLMs’ inference while preserving accuracy. |
Mengyue Wang; Shuo Chen; Kristian Kersting; Volker Tresp; Yunpu Ma; |
| 377 | Personalized Language Models Via Privacy-Preserving Evolutionary Model Merging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the limitations, we propose Privacy-Preserving Model Merging via Evolutionary Algorithms (PriME), a novel personalization approach that employs gradient-free methods to directly optimize utility while reducing privacy risks. |
Kyuyoung Kim; Jinwoo Shin; Jaehyung Kim; |
| 378 | NeuroAda: Activating Each Neuron’s Potential for Parameter-Efficient Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, the latter directly fine-tunes a carefully chosen subset of the original model parameters, allowing for more precise and effective adaptation, but at the cost of significantly increased memory consumption. To reconcile this trade-off, we propose NeuroAda, a novel PEFT method that enables fine-grained model finetuning while maintaining high memory efficiency. |
Zhi Zhang; Yixian Shen; Congfeng Cao; Ekaterina Shutova; |
| 379 | FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing methods often struggle with insufficient multimodal fusion, weak generalization, and poor interpretability. To address these challenges, we propose FlightGPT, a novel UAV VLN framework built upon Vision-Language Models (VLMs) with powerful multimodal perception capabilities. |
Hengxing Cai; Jinhan Dong; Jingjun Tan; Jingcheng Deng; Sihang Li; Zhifeng Gao; Haidong Wang; Zicheng Su; Agachai Sumalee; Renxin Zhong; |
| 380 | Benchmarking Deep Search Over Heterogeneous Enterprise Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new benchmark for evaluating Deep Search—a realistic and complex form of retrieval-augmented generation (RAG) that requires source-aware, multi-hop reasoning over diverse, sparse, but related sources. |
Prafulla Kumar Choubey; Xiangyu Peng; Shilpa Bhagavath; Kung-Hsiang Huang; Caiming Xiong; Chien-Sheng Wu; |
| 381 | Knowledge Editing Through Chain-of-Thought Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, their reliance on few-shot prompting for task decomposition makes them unstable and less effective in generalizing across diverse tasks. In response to these limitations, we propose EditCoT, a novel knowledge editing framework that flexibly and efficiently updates LLMs across various tasks without retraining. |
Changyue Wang; Weihang Su; Qingyao Ai; Yichen Tang; Yiqun Liu; |
| 382 | X-CoT: Explainable Text-to-Video Retrieval Via LLM-based Chain-of-Thought Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes X-CoT, an explainable retrieval framework upon LLM CoT reasoning in place of the embedding model-based similarity ranking. |
Prasanna Reddy Pulakurthi; Jiamian Wang; Majid Rabbani; Sohail Dianat; Raghuveer Rao; Zhiqiang Tao; |
| 383 | Mechanisms Vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Adopting a “mechanisms vs. outcomes” framework, we evaluate 32 open-weight transformer models and find that syntactic features extracted via probing fail to predict outcomes of targeted syntax evaluations across English linguistic phenomena. |
Ananth Agarwal; Jasper Jian; Christopher D Manning; Shikhar Murty; |
| 384 | Crossing Domains Without Labels: Distant Supervision for Term Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This highlights the need for more robust, scalable solutions and realistic evaluation settings. To address this, we introduce a comprehensive benchmark spanning seven diverse domains, enabling performance evaluation at both the document- and corpus-levels. |
Elena Senger; Yuri Campbell; Rob Van Der Goot; Barbara Plank; |
| 385 | Glider: Global and Local Instruction-Driven Expert Router Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We observe that current token-level routing mechanisms neglect the global semantic context of the input task. To address this, we propose a novel method, Global and Local Instruction Driven Expert Router (GLIDER), which introduces a multi-scale routing mechanism encompassing a semantic global router and a learned local router. |
Pingzhi Li; Prateek Yadav; Jaehong Yoon; Jie Peng; Yi-Lin Sung; Mohit Bansal; Tianlong Chen; |
| 386 | MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing multimodal benchmarks typically overlook linguistic and visual ambiguities, relying mainly on unimodal context for disambiguation and thus failing to exploit the mutual clarification potential between modalities. To bridge this gap, we introduce MUCAR, a novel and challenging benchmark designed explicitly for evaluating multimodal ambiguity resolution across multilingual and cross-modal scenarios. |
Xiaolong Wang; Zhaolu Kang; Wangyuxuan Zhai; Xinyue Lou; Yunghwei Lai; Ziyue Wang; Yawen Wang; Kaiyu Huang; Yile Wang; Peng Li; Yang Liu; |
| 387 | Understanding The Thinking Process of Reasoning Models: A Perspective from Schoenfeld’s Episode Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel approach by applying Schoenfeld’s Episode Theory, a classic cognitive framework for human mathematical problem-solving, to analyze the reasoning traces of LRMs. |
Ming Li; Nan Zhang; Chenrui Fan; Hong Jiao; Yanbin Fu; Sydney Peters; Qingshu Xu; Robert Lissitz; Tianyi Zhou; |
| 388 | LLMs Don’t Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To collaborate effectively with humans, language models must be able to explain their decisions in natural language. We study a specific type of self-explanation: self-generated counterfactual explanations (SCEs), where a model explains its prediction by modifying the input such that it would have predicted a different outcome. |
Harry Mayne; Ryan Othniel Kearns; Yushi Yang; Andrew M. Bean; Eoin D. Delaney; Chris Russell; Adam Mahdi; |
| 389 | DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our paper proposes an alternative approach: Single-Page Hard Negative Query Generation, which goes the other way around. |
Navve Wasserman; Oliver Heinimann; Yuval Golbari; Tal Zimbalist; Eli Schwartz; Michal Irani; |
| 390 | Multilingual Language Model Pretraining Using Machine-translated Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we find that documents machine-translated from a high-quality English corpus can contribute significantly to the pretraining quality of multilingual LLMs. |
Jiayi Wang; Yao Lu; Maurice Weber; Max Ryabinin; David Ifeoluwa Adelani; Yihong Chen; Raphael Tang; Pontus Stenetorp; |
| 391 | More Data or Better Data? A Critical Analysis of Data Selection and Synthesis for Mathematical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite various proposed data construction methods, their practical utility in real-world pipelines remains underexplored. In this work, we conduct a comprehensive analysis of open-source datasets and data synthesis techniques for mathematical reasoning, evaluating them under a unified pipeline designed to mirror training and deployment scenarios. |
Yike Zhao; Simin Guo; Ziqing Yang; Shifan Han; Dahua Lin; Fei Tan; |
| 392 | Facilitating Long Context Understanding Via Supervised Chain-of-Thought Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we integrate Chain-of-Thought (CoT) reasoning into LLMs in a supervised manner to facilitate effective long-context understanding. |
Jingyang Lin; Andy Wong; Tian Xia; Shenghua He; Hui Wei; Mei Han; Jiebo Luo; |
| 393 | Reshaping Representation Space to Balance The Safety and Over-rejection in Large Audio Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an unsupervised safety-fine-tuning strategy as a remedy that reshapes the model’s representation space to enhance existing LALMs’ safety alignment while balancing the risk of over-rejection. |
Hao Yang; Lizhen Qu; Ehsan Shareghi; Gholamreza Haffari; |
| 394 | Understanding LLMs’ Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we evaluate the cross-lingual context retrieval of over 40 LLMs across 12 languages, using cross-lingual machine reading comprehension (xMRC) as a representative scenario. |
Changjiang Gao; Hankun Lin; Xin Huang; Xue Han; Junlan Feng; Chao Deng; Jiajun Chen; Shujian Huang; |
| 395 | Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While multilingual financial NLP has revealed large performance gaps across languages, no benchmarks or LLMs have been tailored for Greek financial tasks until now. To bridge this gap, we introduce Plutus-ben, the first Greek Financial Evaluation Benchmark, and Plutus-8B, the first financial LLM fine-tuned on Greek-specific financial data. |
Xueqing Peng; Triantafillos Papadopoulos; Efstathia Soufleri; Polydoros Giannouris; Ruoyu Xiang; Yan Wang; Lingfei Qian; Jimin Huang; Qianqian Xie; Sophia Ananiadou; |
| 396 | Discursive Circuits: How Do Language Models Understand Discourse Relations? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To make circuit discovery feasible, we introduce a task called Completion under Discourse Relation (CuDR), where a model completes a discourse given a specified relation. |
Yisong Miao; Min-Yen Kan; |
| 397 | Aligning LLMs for Multilingual Consistency in Enterprise Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a practical, batch-wise alignment strategy for fine-tuning LLMs, leveraging semantically equivalent multilingual data in each training batch to directly align model outputs across languages. |
Amit Agarwal; Hansa Meghwani; Hitesh Laxmichand Patel; Tao Sheng; Sujith Ravi; Dan Roth; |
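One plausible instantiation of entry 397's batch-wise alignment is an auxiliary consistency term between output distributions for translation pairs in the same batch; the symmetric-KL form below is my assumption, added on top of the usual task loss.

```python
# Symmetric KL between next-token distributions of two languages' inputs.
import torch
import torch.nn.functional as F

def consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    p = F.log_softmax(logits_a, dim=-1)
    q = F.log_softmax(logits_b, dim=-1)
    return 0.5 * (F.kl_div(q, p, log_target=True, reduction="batchmean")
                  + F.kl_div(p, q, log_target=True, reduction="batchmean"))

logits_en = torch.randn(4, 100)    # toy logits for English inputs
logits_de = torch.randn(4, 100)    # toy logits for their German equivalents
print(consistency_loss(logits_en, logits_de))  # weight this against the task loss
```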
| 398 | RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Region Comprehension Index (RCI), the first model-based score to directly quantify a dataset’s reliance on global versus local visual information. |
Amit Agarwal; Hitesh Laxmichand Patel; Srikant Panda; Hansa Meghwani; Jyotika Singh; Karan Dua; Paul Li; Tao Sheng; Sujith Ravi; Dan Roth; |
| 399 | JUREX-4E: Juridical Expert-Annotated Four-Element Knowledge Base for Legal Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While recent work has explored prompting LLMs to follow FET, our evaluation demonstrates that LLM-generated four-elements are often incomplete and less representative, limiting their effectiveness in legal reasoning. To address these issues, we present JUREX-4E, an expert-annotated four-element knowledge base covering 155 criminal charges. |
Huanghai Liu; Quzhe Huang; Qingjing Chen; Yiran Hu; Jiayu Ma; Yun Liu; Weixing Shen; Yansong Feng; |
| 400 | ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce ESGenius, a comprehensive benchmark for evaluating and enhancing the proficiency of Large Language Models (LLMs) in Environmental, Social, and Governance (ESG) and sustainability-focused question answering. |
Chaoyue He; Xin Zhou; Yi Wu; Xinjia Yu; Yan Zhang; Lei Zhang; Di Wang; Shengfei Lyu; Hong Xu; Wang Xiaoqiao; Wei Liu; Chunyan Miao; |
| 401 | HD-PiSSA: High-Rank Distributed Orthogonal Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing parameter-efficient fine-tuning (PEFT) methods for large language models (LLMs), such as LoRA and PiSSA, constrain model updates to low-rank subspaces, limiting their expressiveness and leading to suboptimal performance on complex tasks. To address this, we introduce **H**igh-rank **D**istributed **PiSSA (HD-PiSSA)**, a distributed PEFT approach that initializes **orthogonal adapters** across different devices and aggregates their delta updates collectively on W for fine-tuning. |
Yiding Wang; Fanxu Meng; Xuefeng Zhang; Fan Jiang; Pingzhi Tang; Muhan Zhang; |
| 402 | XCoRe: Cross-context Coreference Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is inherently similar to what happens in the cross-document setting, in which systems infer coreference relations between mentions that are found in separate documents. In this paper, we unify these two challenging settings under the general framework of cross-context coreference, and introduce xCoRe, a new unified approach designed to efficiently handle short-, long-, and cross-document coreference resolution. |
Giuliano Martinelli; Bruno Gatti; Roberto Navigli; |
| 403 | TALON: A Multi-Agent Framework for Long-Table Exploration and Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose TALON, a multi-agent framework designed for question answering over long tables. |
Ruochun Jin; Xiyue Wang; Dong Wang; Haoqi Zheng; Yunpeng Qi; Silin Yang; Meng Zhang; |
| 404 | MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present MaZO, the first framework specifically designed for multi-task LLM fine-tuning under ZO optimization. |
Zhen Zhang; Yifan Yang; Kai Zhen; Nathan Susanj; Athanasios Mouchtaris; Siegfried Kunzmann; Zheng Zhang; |
| 405 | Women, Infamous, and Exotic Beings: A Comparative Study of Honorific Usages in Wikipedia and LLMs for Bengali and Hindi Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, (i) We present the first large-scale study of third-person honorific pronoun and verb usage across 10,000 Hindi and Bengali Wikipedia articles with annotations linked to key socio-demographic attributes of the subjects, including gender, age group, fame, and cultural origin. |
Sourabrata Mukherjee; Atharva Mehta; Sougata Saha; Akhil Arora; Monojit Choudhury; |
| 406 | Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we introduce JQL, a systematic approach that efficiently curates diverse and high-quality multilingual data at scale while significantly reducing computational demands. |
Mehdi Ali; Manuel Brack; Max Lübbering; Elias Wendt; Abbas Goher Khan; Richard Rutmann; Alex Jude; Maurice Kraus; Alexander Arno Weber; Felix Stollenwerk; David Kaczér; Florian Mai; Lucie Flek; Rafet Sifa; Nicolas Flores-Herr; Joachim Koehler; Patrick Schramowski; Michael Fromm; Kristian Kersting; |
| 407 | AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key observation in this work is that T2I models frequently struggle to capture nuanced and often implicit attributes inherent in action depiction, leading to generating images that lack key contextual details. |
Vatsal Malaviya; Agneet Chatterjee; Maitreya Patel; Yezhou Yang; Chitta Baral; |
| 408 | End-to-End Learnable Psychiatric Scale Guided Risky Post Screening for Depression Detection on Social Media Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current methods utilize frozen screening models, which can miss critical information and limit overall performance due to isolated screening and detection processes. To address these limitations, we propose **E2-LPS** (**E**nd-to-**E**nd **L**earnable **P**sychiatric Scale Guided Risky Post **S**creening Model) for jointly training our screening model, guided by psychiatric scales, alongside the detection model. |
Bichen Wang; Yuzhe Zi; Yixin Sun; Hao Yang; Yanyan Zhao; Bing Qin; |
| 409 | Unconditional Truthfulness: Learning Unconditional Uncertainty of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, obtaining proper uncertainty scores is complicated by the conditional dependency between the generation steps of an autoregressive LLM, because it is hard to model it explicitly. Here, we propose to learn this dependency from attention-based features. |
Artem Vazhentsev; Ekaterina Fadeeva; Rui Xing; Gleb Kuzmin; Ivan Lazichny; Alexander Panchenko; Preslav Nakov; Timothy Baldwin; Maxim Panov; Artem Shelmanov; |
| 410 | ModelCitizens: Representing Community Voices in Online Safety Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing toxicity detection models are typically trained on annotations that collapse diverse annotator perspectives into a single ground truth, erasing important context-specific notions of toxicity such as reclaimed language. To address this, we introduce MODELCITIZENS, a dataset of 6. |
Ashima Suvarna; Christina A Chance; Karolina Naranjo; Hamid Palangi; Sophie Hao; Thomas Hartvigsen; Saadia Gabriel; |
| 411 | Weights-Rotated Preference Optimization for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, we propose a novel Weights-Rotated Preference Optimization (RoPO) algorithm, which implicitly constrains the output layer logits with the KL divergence inherited from DPO and explicitly constrains the intermediate hidden states by fine-tuning on a multi-granularity orthogonal matrix. |
Chenxu Yang; Ruipeng Jia; Mingyu Zheng; Naibin Gu; Zheng Lin; Siyuan Chen; Weichong Yin; Hua Wu; Weiping Wang; |
| 412 | Date Fragments: A Hidden Bottleneck of Tokenization for Temporal Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we (1) introduce a simple yet interpretable metric, termed date fragmentation ratio, that measures how faithfully a tokeniser preserves multi-digit date components; (2) release DateAugBench, a suite of 6500 examples spanning three temporal reasoning tasks: context-based date resolution, format-invariance puzzles, and date arithmetic across historical, contemporary, and future time periods; and (3) through layer-wise probing and causal attention-hop analyses, uncover an emergent date-abstraction mechanism whereby large language models stitch together the fragments of month, day, and year components for temporal reasoning. |
Gagan Bhatia; Maxime Peyrard; Wei Zhao; |
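The highlight in entry 412 defines the metric only informally, so the sketch below is one plausible reading of it: compare the token count a tokenizer produces for a date against the ideal of one token per component (year, month, day).

```python
# A possible date-fragmentation ratio: tokens emitted / date components.
def date_fragmentation_ratio(tokens: list[str], n_components: int = 3) -> float:
    # 1.0 = each component kept whole; larger values = heavier fragmentation
    return len(tokens) / n_components

# e.g. a BPE tokenizer splitting "2025-11-04" into six pieces:
print(date_fragmentation_ratio(["20", "25", "-", "11", "-", "04"]))  # -> 2.0
```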
| 413 | Simple Yet Effective: An Information-Theoretic Approach to Multi-LLM Uncertainty Quantification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We hypothesize that LLMs make complementary predictions due to differences in training and the Zipfian nature of language, and that aggregating their outputs leads to more reliable uncertainty estimates. To leverage this, we propose MUSE (Multi-LLM Uncertainty via Subset Ensembles), a simple information-theoretic method that uses Jensen-Shannon Divergence to identify and aggregate well-calibrated subsets of LLMs. |
Maya Kruse; Majid Afshar; Saksham Khatwani; Anoop Mayampurath; Guanhua Chen; Yanjun Gao; |
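The Jensen-Shannon divergence at the core of MUSE (entry 413) is easy to state; the thresholding and averaging below are simplified assumptions about how a well-agreeing subset might be aggregated.

```python
# Pairwise JSD between two models' answer distributions, then averaging.
import numpy as np

def jsd(p: np.ndarray, q: np.ndarray) -> float:
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.7, 0.2, 0.1])   # model A over answer options
q = np.array([0.6, 0.3, 0.1])   # model B over the same options
if jsd(p, q) < 0.05:            # hypothetical agreement threshold
    print("ensemble:", 0.5 * (p + q))
```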
| 414 | MemeIntel: Explainable Detection of Propagandistic and Hateful Memes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this challenge, we introduce MemeXplain, an explanation-enhanced dataset for propaganda memes in Arabic and hateful memes in English, making it the first large-scale resource for these tasks. To solve these tasks, we propose a novel multi-stage optimization approach and train Vision-Language Models (VLMs). |
Mohamed Bayan Kmainasi; Abul Hasnat; Md Arid Hasan; Ali Ezzat Shahroor; Firoj Alam; |
| 415 | Co-Eval: Augmenting LLM-based Evaluation with Machine Metrics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing LLM-based evaluations often suffer from biases and misalignment, particularly in domain-specific tasks, due to limited functional understanding and knowledge gaps. To address these challenges, we first investigate the relationship between an LLM-based evaluator’s familiarity with the target task and its evaluation performance. We then introduce the Co-Eval framework, which leverages a criteria planner model and optimized machine metrics to enhance the scalability and fairness of LLM-based evaluation. |
Ling-I Wu; Weijie Wu; Minyu Chen; Jianxin Xue; Guoqiang Li; |
| 416 | MMDocIR: Benchmarking Multimodal Retrieval for Long Documents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite its increasing popularity, there is a notable lack of a comprehensive and robust benchmark to effectively evaluate the performance of systems in such tasks. To address this gap, this work introduces a new benchmark, named MMDocIR, that encompasses two distinct tasks: page-level and layout-level retrieval. |
Kuicai Dong; Yujing Chang; Derrick Goh Xin Deik; Dexun Li; Ruiming Tang; Yong Liu; |
| 417 | Orchestrating Audio: Multi-Agent Framework for Long-Video Audio Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose LVAS-Agent, a multi-agent framework that offers a coordinated, multi-component approach to long-video audio generation. |
Yehang Zhang; Xinli Xu; Xiaojie Xu; Doudou Zhang; Li Liu; Ying-Cong Chen; |
| 418 | The LLM Already Knows: Estimating LLM-Perceived Question Difficulty Via Hidden Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach for difficulty estimation that leverages only the hidden representations produced by the target LLM. |
Yubo Zhu; Dongrui Liu; Zecheng Lin; Wei Tong; Sheng Zhong; Jing Shao; |
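Entry 418's recipe (hidden representations in, perceived difficulty out) admits a very small sketch: fit a regressor from hidden states to an observed difficulty signal. The probe choice and the toy data below are assumptions.

```python
# Ridge probe from hidden states to a difficulty signal (e.g. 1 - accuracy).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
H = rng.normal(size=(200, 64))        # hidden states for 200 questions (toy)
difficulty = rng.uniform(size=200)    # observed difficulty labels (toy)

probe = Ridge(alpha=1.0).fit(H[:150], difficulty[:150])
print(probe.predict(H[150:155]))      # estimated difficulty for unseen questions
```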
| 419 | An Empirical Study of LLM Reasoning Ability Under Strict Output Length Constraint Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we test 30 LLMs on common reasoning datasets under a wide range of output length budgets, and we analyze the correlation between the inference accuracy and various properties including model type, model size, prompt style, etc. |
Yi Sun; Han Wang; Jiaqiang Li; Jiacheng Liu; Xiangyu Li; Hao Wen; Yizhen Yuan; Huiwen Zheng; Yan Liang; Yuanchun Li; Yunxin Liu; |
| 420 | JI2S: Joint Influence-Aware Instruction Data Selection for Efficient Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Influence-based methods address this by estimating each example’s marginal contribution to overall performance, but they typically assume additive contributions and therefore overlook higher-order interactions among samples. To overcome these limitations, we propose JI2S, a novel framework that jointly models both marginal and combinatorial influences within sample groups. |
Jingyu Wei; Bo Liu; Tianjiao Wan; Baoyun Peng; Xingkong Ma; Mengmeng Guo; |
| 421 | ViDoRAG: Visual Document Retrieval-Augmented Generation Via Dynamic Iterative Reasoning Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on it, we identify key limitations in current RAG approaches: (i) purely visual retrieval methods struggle to effectively integrate both textual and visual features, and (ii) previous approaches often allocate insufficient reasoning tokens, limiting their effectiveness. To address these challenges, we propose ViDoRAG, a novel multi-agent RAG framework tailored for complex reasoning across visual documents. |
Qiuchen Wang; Ruixue Ding; Zehui Chen; Weiqi Wu; Shihang Wang; Pengjun Xie; Feng Zhao; |
| 422 | NUTMEG: Separating Signal From Noise in Annotator Disagreement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce NUTMEG, a new Bayesian model that incorporates information about annotator backgrounds to remove noisy annotations from human-labeled training data while preserving systematic disagreements. |
Jonathan Ivey; Susan Gauch; David Jurgens; |
| 423 | LimRank: Less Is More for Reasoning-Intensive Information Reranking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate that modern LLMs can be effectively adapted using only minimal, high-quality supervision. |
Tingyu Song; Yilun Zhao; Siyue Zhang; Chen Zhao; Arman Cohan; |
| 424 | Towards Transferable Personality Representation Learning Based on Triplet Comparisons and Its Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the application side, concentrating on classifying the entire corpus limits its applicability in more common single-instance scenarios. To address these issues, we propose a new task paradigm in text-based personality representation learning. |
Kai Tang; Rui Wang; Renyu Zhu; Minmin Lin; Xiao Ding; Tangjie Lv; Changjie Fan; Runze Wu; Haobo Wang; |
| 425 | UnitCoder: Scalable Code Synthesis from Pre-training Corpora Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce UnitCoder, which directly supervises pre-training data quality through automatically generated unit tests, while ensuring the correctness via an iterative fix and refine flow. |
Yichuan Ma; Yunfan Shao; Peiji Li; Demin Song; Qipeng Guo; Linyang Li; Xipeng Qiu; Kai Chen; |
| 426 | Alignment Quality Index (AQI): Beyond Refusals: AQI As An Intrinsic Alignment Diagnostic Via Latent Geometry, Cluster Divergence, and Layer Wise Pooled Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Aligned models are often vulnerable to jailbreaking, stochasticity of generation, and alignment faking. To address this issue, we introduce the **Alignment Quality Index (AQI)**. |
Abhilekh Borah; Chhavi Sharma; Danush Khanna; Utkarsh Bhatt; Gurpreet Singh; Hasnat Md Abdullah; Raghav Kaushik Ravi; Vinija Jain; Jyoti Patel; Shubham Singh; Vasu Sharma; Arpita Vats; Rahul Raja; Aman Chadha; Amitava Das; |
| 427 | SolEval: Benchmarking Large Language Models for Repository-level Solidity Smart Contract Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to the lack of adequate benchmarks for Solidity, LLMs’ ability to generate secure, cost-effective smart contracts remains unexplored. To fill this gap, we construct SolEval, the first repository-level benchmark designed for Solidity smart contract generation, to evaluate the performance of LLMs on Solidity. |
Zhiyuan Peng; Xin Yin; Rui Qian; Peiqin Lin; YongKang Liu; Hao Zhang; Chenhao Ying; Yuan Luo; |
| 428 | Training A Utility-based Retriever Through Shared Context Attribution for Retrieval-Augmented Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes SCARLet, a framework for training utility-based retrievers in RALMs, which incorporates two key factors, multi-task generalization and inter-passage interaction. |
Yilong Xu; Jinhua Gao; Xiaoming Yu; Yuanhai Xue; Baolong Bi; Huawei Shen; Xueqi Cheng; |
| 429 | NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Research on tool calling has gathered momentum, but evaluation benchmarks and datasets representing the complexity of the tasks have lagged behind. In this work, we focus on one such complexity, nested sequencing, with the goal of extending existing benchmarks and evaluation. |
Kinjal Basu; Ibrahim Abdelaziz; Kiran Kate; Mayank Agarwal; Maxwell Crouse; Yara Rizk; Kelsey Bradford; Asim Munawar; Sadhana Kumaravel; Saurabh Goyal; Xin Wang; Luis A. Lastras; Pavan Kapanipathi; |
| 430 | AdaSteer: Your Aligned LLM Is Inherently An Adaptive Jailbreak Defender Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Activation steering offers a training-free defense method but relies on fixed steering coefficients, resulting in suboptimal protection and increased false rejections of benign inputs. To address this, we propose AdaSteer, an adaptive activation steering method that dynamically adjusts model behavior based on input characteristics. |
Weixiang Zhao; Jiahe Guo; Yulin Hu; Yang Deng; An Zhang; Xingyu Sui; Xinyang Han; Yanyan Zhao; Bing Qin; Tat-Seng Chua; Ting Liu; |
| 431 | R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent T2I models have made impressive progress in producing photorealistic images, their reasoning capability remains underdeveloped and insufficiently evaluated. To bridge this gap, we introduce R2I-Bench, a comprehensive benchmark specifically designed to rigorously assess reasoning-driven T2I generation. |
Kaijie Chen; Zihao Lin; Zhiyang Xu; Ying Shen; Yuguang Yao; Joy Rimchala; Jiaxin Zhang; Lifu Huang; |
| 432 | Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To effectively utilize multiple positive samples per query, we introduce a novel loss that maximizes their summed marginal likelihood. |
Hengran Zhang; Minghao Tang; Keping Bi; Jiafeng Guo; Shihao Liu; Daiting Shi; Dawei Yin; Xueqi Cheng; |
| 433 | RecBase: Generative Foundation Model Pretraining for Zero-Shot Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods, relying on language-level knowledge, fail to capture dynamic, item-level user interests across domains. To bridge this gap, we propose RecBase, a domain-agnostic foundational model pretrained with a recommendation-oriented objective. |
Sashuai Zhou; Weinan Gan; Qijiong Liu; Ke Lei; Jieming Zhu; Hai Huang; Yan Xia; Ruiming Tang; Zhenhua Dong; Zhou Zhao; |
| 434 | REIC: RAG-Enhanced Intent Classification at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, as companies expand their product lines, intent classification faces scalability challenges due to the increasing number of intents and variations in taxonomy across different verticals. In this paper, we introduce REIC, a Retrieval-augmented generation Enhanced Intent Classification approach, which addresses these challenges effectively. |
Ziji Zhang; Michael Yang; Zhiyu Chen; Yingying Zhuang; Shu-Ting Pi; Qun Liu; Rajashekar Maragoud; Vy Nguyen; Anurag Beniwal; |
| 435 | Follow The Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the task of Fine-grained Flowchart Attribution, which traces the specific components grounding a flowchart-referring LLM response. |
Manan Suri; Puneet Mathur; Nedim Lipka; Franck Dernoncourt; Ryan A. Rossi; Vivek Gupta; Dinesh Manocha; |
| 436 | Leveraging Multilingual Training for Authorship Representation: Enhancing Generalization Across Languages and Domains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel method for multilingual AR learning that incorporates two key innovations: probabilistic content masking, which encourages the model to focus on stylistically indicative words rather than content-specific words, and language-aware batching, which improves contrastive learning by reducing cross-lingual interference. |
Junghwan Kim; Haotian Zhang; David Jurgens; |
| 437 | Sentence Smith: Controllable Edits for Evaluating Text Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using modern parsers and a safety supervision mechanism, we show how close current methods come to this goal. Concretely, we propose the framework for English, which has three steps: 1. Parsing a sentence into a semantic graph. 2. Applying human-designed semantic manipulation rules. 3. Generating text from the manipulated graph. A final entailment check (4) verifies the validity of the applied transformation. |
Hongji Li; Andrianos Michail; Reto Gubelmann; Simon Clematide; Juri Opitz; |
| 438 | Recontextualizing Revitalization: A Mixed Media Approach to Reviving The Nüshu Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent Natural Language Processing (NLP) work has digitized small Nüshu-Chinese corpora, but the script remains computationally inaccessible due to its handwritten, mixed-media form and dearth of multimodal resources. We address this gap with two novel datasets: NüshuVision, an image corpus of 500 rendered sentences in traditional vertical, right-to-left orthography, and NüshuStrokes, the first sequential handwriting recordings of all 397 Unicode Nüshu characters by an expert calligrapher. |
Ivory Yang; Xiaobo Guo; Yuxin Wang; Hefan Zhang; Yaning Jia; William Dinauer; Soroush Vosoughi; |
| 439 | Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction Via Pseudo Query Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Lookahead Q-Cache (LAQ), a novel eviction framework that generates low-cost pseudo lookahead queries to better approximate the true decoding-stage queries. |
Yixuan Wang; Shiyu Ji; Yijun Liu; Yuzhuang Xu; Yang Xu; Qingfu Zhu; Wanxiang Che; |
| 440 | DecEx-RAG: Boosting Agentic Retrieval-Augmented Generation with Decision and Execution Optimization Via Process Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this approach still suffers from inefficient exploration, sparse reward signals, and ambiguous global reward feedback. To address these challenges, we propose DecEx-RAG, which models RAG as a Markov Decision Process (MDP) incorporating decision-making and execution, while introducing an efficient pruning strategy to optimize data expansion. |
Yongqi Leng; Yikun Lei; Xikai Liu; Meizhi Zhong; Bojian Xiong; Yurong Zhang; Yan Gao; Yiwu; Yao Hu; Deyi Xiong; |
| 441 | Koel-TTS: Enhancing LLM Based Speech Generation with Preference Alignment and Classifier Free Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Autoregressive speech token generation models produce speech with remarkable variety and naturalness but often suffer from hallucinations and undesired vocalizations that do not conform to conditioning inputs. To address these challenges, we introduce Koel-TTS, an encoder-decoder transformer model for multilingual TTS that improves contextual adherence of speech generation LLMs through preference alignment and classifier-free guidance (CFG). |
Shehzeen Samarah Hussain; Paarth Neekhara; Xuesong Yang; Edresson Casanova; Subhankar Ghosh; Roy Fejgin; Mikyas T. Desta; Rafael Valle; Jason Li; |
| 442 | VC4VG: Optimizing Video Captions for Text-to-Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce VC4VG (Video Captioning for Video Generation), a comprehensive caption optimization framework tailored to the needs of T2V models. |
Yang Du; Zhuoran Lin; Kaiqiang Song; Biao Wang; Zhicheng Zheng; Tiezheng Ge; Bo Zheng; Qin Jin; |
| 443 | XQuant: Achieving Ultra-Low Bit KV Cache Quantization with Cross-Layer Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose XQuant, a training-free and plug-and-play framework that achieves ultra-low equivalent bit-width KV cache quantization. |
Haoqi Yang; Yao Yao; Zuchao Li; Baoyuan Qi; Liu Guoming; Hai Zhao; |
| 444 | Detecting Knowledge Boundary of Vision Large Language Models By Sampling-Based Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To mitigate the dependence on retrieval and simultaneously maintain, or even improve, the performance benefits provided by retrieval, we propose a method to detect the knowledge boundary of VLLMs, allowing for more efficient use of techniques like RAG. |
Zhuo Chen; Xinyu Wang; Yong Jiang; Zhen Zhang; Xinyu Geng; Pengjun Xie; Fei Huang; Kewei Tu; |
| 445 | PoseStitch-SLT: Linguistically Inspired Pose-Stitching for End-to-End Sign Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose PoseStitch-SLT, a novel pre-training scheme that is inspired by linguistic-templates-based sentence generation technique. |
Abhinav Joshi; Vaibhav Sharma; Sanjeet Singh; Ashutosh Modi; |
| 446 | Calibration Across Layers: Understanding Calibration Evolution in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide a complementary perspective by investigating how calibration evolves throughout the network’s depth. |
Abhinav Joshi; Areeb Ahmad; Ashutosh Modi; |
| 447 | SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify that this paradigm often leads to suboptimal alignment between modalities, significantly constraining the LLM’s ability to properly interpret and reason with visual features, particularly for smaller language models. To address this fundamental limitation, we propose Supervised Embedding Alignment (SEA), a token-level supervision alignment method that enables more precise visual-text alignment during pretraining. |
Yuanyang Yin; Yaqi Zhao; Yajie Zhang; Yuanxing Zhang; Ke Lin; Jiahao Wang; Xin Tao; Pengfei Wan; Wentao Zhang; Feng Zhao; |
| 448 | VLP: Vision-Language Preference Learning for Embodied Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Vision-Language Preference learning framework, named VLP, which learns a vision-language preference model to provide feedback for embodied manipulation tasks. |
Runze Liu; Chenjia Bai; Jiafei Lyu; Shengjie Sun; Yali Du; Xiu Li; |
| 449 | Auto-Weighted Group Relative Preference Optimization for Multi-Objective Text Generation Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Failing to balance the objectives adequately can lead to overfitting or insufficient learning of each reward function. To address this problem, we propose Auto-Weighted Group Relative Policy Optimization (AW-GRPO), which adjusts reward weights during training according to each objective’s learning progress so far. |
Yuki Ichihara; Yuu Jinnai; |
| 450 | Compound AI Systems Optimization: A Survey of Methods, Challenges, and Future Directions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper provides a systematic review of recent progress in optimizing compound AI systems, encompassing both numerical and language-based techniques. |
Yu-Ang Lee; Guan-Ting Yi; Mei-Yi Liu; Jui-Chao Lu; Guan-Bo Yang; Yun-Nung Chen; |
| 451 | FedMABench: Benchmarking Mobile GUI Agents on Decentralized Heterogeneous User Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the challenges, we introduce FedMABench, the first benchmark for federated training and evaluation of mobile GUI agents, specifically designed for heterogeneous scenarios. |
WenHao Wang; Zijie Yu; Rui Ye; Jianqing Zhang; Guangyi Liu; Liang Liu; Siheng Chen; Yanfeng Wang; |
| 452 | Language Models As Causal Effect Generators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present sequence-driven structural causal models (SD-SCMs), a framework for specifying causal models with user-defined structure and language-model-defined mechanisms. |
Lucius E.j. Bynum; Kyunghyun Cho; |
| 453 | InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Social deduction games (SDGs) provide a natural testbed for evaluating individualized reasoning styles, where different players may adopt diverse but contextually valid reasoning strategies under identical conditions. To address this, we introduce InMind, a cognitively grounded evaluation framework designed to assess whether LLMs can capture and apply personalized reasoning styles in SDGs. |
Zizhen Li; Chuanhao Li; Yibin Wang; Qi Chen; Diping Song; Yukang Feng; Jianwen Sun; Jiaxin Ai; Fanrui Zhang; Mingzhu Sun; Kaipeng Zhang; |
| 454 | Are Generative Models Underconfident? Better Quality Estimation with Boosted Model Probability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, lower probability does not necessarily mean lower output quality. Based on this observation, we propose a QE approach called BoostedProb, which boosts the model’s confidence in cases where there are multiple viable output options. |
Tu Anh Dinh; Jan Niehues; |
| 455 | Sparse Neurons Carry Strong Signals of Question Ambiguity in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that question ambiguity is linearly encoded in the internal representations of LLMs and can be both detected and controlled at the neuron level. |
Zhuoxuan Zhang; Jinhao Duan; Edward Kim; Kaidi Xu; |
| 456 | Understanding and Leveraging The Expert Specialization of Context Faithfulness in Mixture-of-Experts LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the emergent expert specialization observed in mixture-of-experts architectures, this work investigates whether certain experts exhibit specialization in context utilization—offering a potential pathway toward targeted optimization for improved context faithfulness. To explore this, we propose Router Lens, a method that accurately identifies context-faithful experts. |
Jun Bai; Minghao Tong; Yang Liu; Zixia Jia; Zilong Zheng; |
| 457 | MKT: A Multi-Stage Knowledge Transfer Framework to Mitigate Catastrophic Forgetting in Multi-Domain Chinese Spelling Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the key flaw of the CSC model when adapting to multi-domain scenarios: the tendency to forget previously acquired knowledge upon learning new domain-specific knowledge (i.e., **catastrophic forgetting**). |
Peng Xing; Yinghui Li; Shirong Ma; Xinnian Liang; Haojing Huang; Yangning Li; Shu-Yu Guo; Hai-Tao Zheng; Wenhao Jiang; Ying Shen; |
| 458 | DDO: Dual-Decision Optimization for LLM-Based Medical Consultation Via Multi-Agent Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This mismatch often results in ineffective symptom inquiry and unreliable disease diagnosis. To address this, we propose DDO, a novel LLM-based framework that performs Dual-Decision Optimization by decoupling the two sub-tasks and optimizing them with distinct objectives through a collaborative multi-agent workflow. |
Zhihao Jia; Mingyi Jia; Junwen Duan; Jianxin Wang; |
| 459 | STEER-BENCH: A Benchmark for Evaluating The Steerability of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce STEER-BENCH, a benchmark for assessing population-specific steering using contrasting Reddit communities. |
Kai Chen; Zihao He; Taiwei Shi; Kristina Lerman; |
| 460 | Predicate-Guided Generation for Mathematical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Prolog-MATH, a curated corpus designed to support mathematical reasoning in large language models (LLMs) through logic programming. |
Jiajun Chen; Yik-Cheung Tam; |
| 461 | CLLMate: A Multimodal Benchmark for Weather and Climate Events Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing environmental forecasting research focuses narrowly on predicting numerical meteorological variables (e.g., temperature), neglecting the translation of these variables into actionable textual narratives of events and their consequences. To bridge this gap, we propose Weather and Climate Event Forecasting (WCEF), a new task that leverages numerical meteorological raster data and textual event data to predict weather and climate events. |
Haobo Li; Zhaowei Wang; Jiachen Wang; Yueya Wang; Alexis Kai Hon Lau; Huamin Qu; |
| 462 | LORAXBENCH: A Multitask, Multilingual Benchmark Suite for 20 Indonesian Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce LORAXBENCH, a benchmark that focuses on low-resource languages of Indonesia and covers 6 diverse tasks: reading comprehension, open-domain QA, language inference, causal reasoning, translation, and cultural QA. |
Alham Fikri Aji; Trevor Cohn; |
| 463 | Idiosyncratic Versus Normative Modeling of Atypical Speech Recognition: Dysarthric Case Studies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To investigate this, we compare four strategies: (a) *normative* models trained on typical speech (no personalization), (b) *idiosyncratic* models completely personalized to individuals, (c) *dysarthric-normative* models trained on other dysarthric speakers, and (d) *dysarthric-idiosyncratic* models which combine strategies by first modeling normative patterns before adapting to individual speech. In this case study, we find the dysarthric-idiosyncratic model performs better than the idiosyncratic approach while requiring less than half as much personalized data. |
Vishnu Raja; Adithya V Ganesan; Anand Syamkumar; Ritwik Banerjee; H. Schwartz; |
| 464 | Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present the first systematic study of LRM unlearning and reveal that conventional unlearning methods often overlook critical information leakage in reasoning traces, even when final answers are successfully removed. To address this, we propose Reasoning-aware Representation Misdirection for Unlearning (R2MU), a method that suppresses sensitive reasoning traces while preserving the model’s general reasoning ability. |
Changsheng Wang; Chongyu Fan; Yihua Zhang; Jinghan Jia; Dennis Wei; Parikshit Ram; Nathalie Baracaldo; Sijia Liu; |
| 465 | Beyond Demonstrations: Dynamic Vector Construction from Latent Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing ICV methods remain sensitive to ICL-specific factors, often use coarse or semantically fragmented representations as the source of the vector, and rely on heuristic-based injection positions, limiting their applicability. To address these issues, we propose Dynamic Vector (DyVec), which incorporates an Exhaustive Query Rotation (EQR) strategy to extract robust semantically aggregated latent representations by mitigating variance introduced by ICL. |
Wang Cai; Hsiu-Yuan Huang; Zhixiang Wang; Yunfang Wu; |
| 466 | From Parameters to Performance: A Data-Driven Study on LLM Structure and Development Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the rapid growth in model scale and capability, systematic, data-driven research on how structural configurations affect performance remains scarce. To address this gap, we present a large-scale dataset encompassing diverse open-source LLM structures and their performance across multiple benchmarks. |
Suqing Wang; Zuchao Li; Shi Luohe; Bo Du; Hai Zhao; Yun Li; Qianren Wang; |
| 467 | Reflective Agreement: Combining Self-Mixture of Agents with A Sequence Tagger for Robust Event Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conversely, generative approaches leveraging Large Language Models (LLMs) provide higher semantic flexibility and recall but suffer from hallucinations and inconsistent predictions. To address these challenges, we propose Agreement-based Reflective Inference System (ARIS), a hybrid approach combining a Self Mixture of Agents with a discriminative sequence tagger. |
Fatemeh Haji; Mazal Bethany; Cho-Yu Jason Chiang; Anthony Rios; Peyman Najafirad; |
| 468 | Advancing Oversight Reasoning Across Languages for Audit Sycophantic Behaviour Via X-Agent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To make interactions consistent, reliable and safe, we introduce X-Agent, an Oversight Reasoning framework that audits human–LLM dialogues, reasons about them, captures sycophancy and corrects the final outputs. |
Giulia Pucci; Leonardo Ranaldi; |
| 469 | DiplomacyAgent: Do LLMs Balance Interests and Ethical Principles in International Events? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by think-tank decision-making philosophy, we propose DiplomacyAgent, an LLM-based multi-agent system for diplomatic position analysis. |
Jianxiang Peng; Ling Shi; Xinwei Wu; Hanwen Zhang; Fujiang Liu; Haocheng Lyu; Deyi Xiong; |
| 470 | REVIVING YOUR MNEME: Predicting The Side Effects of LLM Unlearning and Fine-Tuning Via Sparse Model Diffing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing evaluation methods assess performance after such interventions, there remains no general approach for detecting unintended side effects, such as unlearning biology content degrading performance on chemistry tasks, particularly when these effects are unpredictable or emergent. To address this issue, we introduce MNEME, Model diffiNg for Evaluating Mechanistic Effects, a framework for identifying these side effects using sparse model diffing. |
Aly M. Kassem; Zhuan Shi; Negar Rostamzadeh; Golnoosh Farnadi; |
| 471 | Towards Event Extraction with Massive Types: LLM-based Collaborative Annotation and Partitioning Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, to alleviate the excessively long prompts caused by massive types, we propose an LLM-based Partitioning method for EE called LLM-PEE. |
Wenxuan Liu; Zixuan Li; Long Bai; Yuxin Zuo; Daozhu Xu; Xiaolong Jin; Jiafeng Guo; Xueqi Cheng; |
| 472 | Stepwise Reasoning Checkpoint Analysis: A Test Time Scaling Method to Enhance LLMs’ Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods, despite improving accuracy by allocating more computational resources during inference, often suffer from path homogenization and inefficient use of intermediate results. To address these limitations, we propose Stepwise Reasoning Checkpoint Analysis (SRCA), a framework that introduces checkpoints between reasoning steps. |
Zezhong Wang; Xingshan Zeng; Weiwen Liu; Yufei Wang; Liangyou Li; Yasheng Wang; Lifeng Shang; Xin Jiang; Qun Liu; Kam-Fai Wong; |
| 473 | Compositional Generalisation for Explainable Hate Speech Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This has been linked to dataset biases and the use of sentence-level labels, which fail to teach models the underlying structure of hate speech. In this work, we show that even when models are trained with more fine-grained, span-level annotations (e.g., “artists” is labeled as target and “are parasites” as dehumanising comparison), they struggle to disentangle the meaning of these labels from the surrounding context. |
Agostina Calabrese; Tom Sherborne; Björn Ross; Mirella Lapata; |
| 474 | Quantized But Deceptive? A Multi-Dimensional Truthfulness Evaluation of Quantized LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce TruthfulnessEval, a comprehensive evaluation framework for assessing the truthfulness of quantized LLMs across three dimensions: (1) Truthfulness on Logical Reasoning; (2) Truthfulness on Common Sense; and (3) Truthfulness on Imitative Falsehoods. |
Yao Fu; Xianxuan Long; Runchao Li; Haotian Yu; Mu Sheng; Xiaotian Han; Yu Yin; Pan Li; |
| 475 | Do LLMs Adhere to Label Definitions? Examining Their Receptivity to External Label Definitions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Do LLMs genuinely incorporate external definitions, or do they primarily rely on their parametric knowledge? To address these questions, we conduct controlled experiments across multiple explanation benchmark datasets (general and domain-specific) and label definition conditions, including expert-curated, LLM-generated, perturbed, and swapped definitions. |
Seyedali Mohammadi; Bhaskara Hanuma Vedula; Hemank Lamba; Edward Raff; Ponnurangam Kumaraguru; Francis Ferraro; Manas Gaur; |
| 476 | Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel training objective, HierarchicalTopK, which trains a single SAE to optimise reconstructions across multiple sparsity levels simultaneously. |
Nikita Balagansky; Yaroslav Aksenov; Daniil Laptev; Vadim Kurochkin; Gleb Gerasimov; Nikita Koriagin; Daniil Gavrilov; |
| 477 | ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through a systematic analysis of existing training paradigms in vision-language-action models (VLA), we identify two key challenges: spurious forgetting, where robot training overwrites crucial visual-text alignments, and task interference, where competing control and understanding tasks degrade performance when trained jointly. To overcome these limitations, we propose ChatVLA, a novel framework featuring Phased Alignment Training, which incrementally integrates multimodal data after initial control mastery, and a Mixture-of-Experts architecture to minimize task interference. |
Zhongyi Zhou; Yichen Zhu; Minjie Zhu; Junjie Wen; Ning Liu; Zhiyuan Xu; Weibin Meng; Yaxin Peng; Chaomin Shen; Feifei Feng; Yi Xu; |
| 478 | CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we propose a model-agnostic Diversified Multiplet Upcycling (DMU) framework for CLIP. |
Jihai Zhang; Xiaoye Qu; Tong Zhu; Yu Cheng; |
| 479 | WildScore: Benchmarking MLLMs In-the-Wild Symbolic Music Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To facilitate a comprehensive evaluation, we propose a systematic taxonomy, comprising both high-level and fine-grained musicological ontologies. |
Gagan Mundada; Yash Vishe; Amit Namburi; Xin Xu; Zachary Novack; Julian McAuley; Junda Wu; |
| 480 | Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we propose WriteHERE, a general agent framework that achieves human-like adaptive writing through recursive task decomposition and dynamic integration of three fundamental task types: retrieval, reasoning, and composition. |
Ruibin Xiong; Yimeng Chen; Dmitrii Khizbullin; Mingchen Zhuge; Jürgen Schmidhuber; |
| 481 | Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The rapid development of Multimodal Large Reasoning Models (MLRMs) has demonstrated broad application potential, yet their safety and reliability remain critical concerns that require systematic exploration. To address this gap, we conduct a comprehensive and systematic safety evaluation of 13 MLRMs across 5 benchmarks and unveil prevalent safety degradation phenomena in most advanced models. |
Xinyue Lou; You Li; Jinan Xu; Xiangyu Shi; Chi Chen; Kaiyu Huang; |
| 482 | ICL CIPHERS: Quantifying “Learning” in In-Context Learning Via Substitution Ciphers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce ICL CIPHERS, a class of task reformulations based on substitution ciphers borrowed from classic cryptography. |
Zhouxiang Fang; Aayush Mishra; Muhan Gao; Anqi Liu; Daniel Khashabi; |
| 483 | Interpretable Mnemonic Generation for Kanji Learning Via Expectation-Maximization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite recent efforts to use large language models (LLMs) to assist learners, existing methods for LLM-based keyword mnemonic generation function as a black box, offering limited interpretability. We propose a generative framework that explicitly models the mnemonic construction process as driven by a set of common rules, and learn them using a novel Expectation-Maximization-type algorithm. |
Jaewook Lee; Alexander Scarlatos; Andrew Lan; |
| 484 | Pre-trained Models Perform The Best When Token Distributions Follow Zipf’s Law Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a principled method for determining the vocabulary size by analyzing token frequency distributions through Zipf’s law. |
Yanjin He; Qingkai Zeng; Meng Jiang; |
| 485 | RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose RethinkMCTS, a framework that systematically explores and refines the reasoning process for code generation. |
Qingyao Li; Wei Xia; Xinyi Dai; Kounianhua Du; Weiwen Liu; Yasheng Wang; Ruiming Tang; Yong Yu; Weinan Zhang; |
| 486 | MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel, open-source social network simulation framework, MOSAIC, where generative language agents predict user behaviors such as liking, sharing, and flagging content. |
Genglin Liu; Vivian T. Le; Salman Rahman; Elisa Kreiss; Marzyeh Ghassemi; Saadia Gabriel; |
| 487 | PBI-Attack: Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most previous work requires access to model gradients or relies on human knowledge (prompt engineering) to perform jailbreaks, and it rarely considers the interaction of images and text, resulting in an inability to jailbreak in black-box scenarios or in poor performance. To overcome these limitations, we propose a Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for toxicity maximization, referred to as PBI-Attack. |
Ruoxi Cheng; Yizhong Ding; Shuirong Cao; Ranjie Duan; Xiaoshuang Jia; Shaowei Yuan; Simeng Qin; Zhiqiang Wang; Xiaojun Jia; |
| 488 | Improving Low-Resource Sequence Labeling with Knowledge Fusion and Contextual Label Explanations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these approaches still struggle with inadequate model applicability and semantic distribution biases in domain-specific contexts. To overcome these limitations, we propose a novel framework that combines an LLM-based knowledge enhancement workflow with a span-based Knowledge Fusion for Rich and Efficient Extraction (KnowFREE) model. |
Peichao Lai; Jiaxin Gan; Feiyang Ye; Wentao Zhang; Fangcheng Fu; Yilei Wang; Bin Cui; |
| 489 | Profiler: Black-box AI-generated Text Origin Detection Via Context-aware Inference Pattern Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel black-box AI-generated text origin detection method, dubbed Profiler, which accurately predicts the origin of an input text by extracting distinct context inference patterns through calculating and analyzing novel context losses between the surrogate model’s output logits and the adjacent input context. |
Hanxi Guo; Siyuan Cheng; Xiaolong Jin; Zhuo Zhang; Guangyu Shen; Kaiyuan Zhang; Shengwei An; Guanhong Tao; Xiangyu Zhang; |
| 490 | Beyond Task-Oriented and Chitchat Dialogues: Proactive and Transition-Aware Conversational Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, real-world conversations naturally involve fluid transitions between these modes. To address this gap, we introduce TACT (TOD-And-Chitchat Transition), a dataset designed for transition-aware dialogue modeling that incorporates structurally diverse and integrated mode flows. |
Yejin Yoon; Yuri Son; Namyeong So; Minseo Kim; Minsoo Cho; Chanhee Park; Seungshin Lee; Taeuk Kim; |
| 491 | Proactive Assistant Dialogue Generation from Streaming Egocentric Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These systems must provide interactive, proactive assistance based on streaming visual inputs, yet their development is constrained by the costly and labor-intensive process of data collection and system evaluation. To address these limitations, we present a comprehensive framework with three key contributions. First, we introduce a novel data curation pipeline that synthesizes dialogues from annotated egocentric videos, resulting in ProAssist, a large-scale synthetic dialogue dataset spanning multiple domains. |
Yichi Zhang; Xin Luna Dong; Zhaojiang Lin; Andrea Madotto; Anuj Kumar; Babak Damavandi; Joyce Chai; Seungwhan Moon; |
| 492 | Understanding and Mitigating Overrefusal in LLMs from An Unveiling Perspective of Safety Decision Boundary Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our findings reveal that overrefusal is closely tied to misalignment at these boundary regions, where models struggle to distinguish subtle differences between benign and harmful content. Building on these insights, we present **RASS**, an automated framework for prompt generation and selection that strategically targets overrefusal prompts near the safety boundary. |
Licheng Pan; Yongqi Tong; Xin Zhang; Xiaolu Zhang; Jun Zhou; Zhixuan Chu; |
| 493 | EQA-RM: A Generative Embodied Reward Model with Test-time Scaling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce EQA-RM, a novel generative multimodal reward model specifically architected for EQA, trained via our innovative Contrastive Group Relative Policy Optimization (C-GRPO) strategy to learn fine-grained behavioral distinctions. |
Yuhang Chen; Zhen Tan; Tianlong Chen; |
| 494 | Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Firstly, this paper presents a comprehensive systematic analysis of BFE vulnerabilities in key LLM components, revealing distinct sensitivities across parameters, activations, and gradients during fine-tuning and inference. Secondly, based on our findings, we introduce a novel defense strategy FlipGuard: (i) exponent bit protection, and (ii) a self-correction based fine-tuning mechanism, to address BFE consequences. |
Yuhang Chen; Zhen Tan; Ajay Kumar Jaiswal; Huaizhi Qu; Xinyu Zhao; Qi Lin; Yu Cheng; Andrew Kwong; Zhichao Cao; Tianlong Chen; |
| 495 | Weight-Aware Activation Sparsity with Constrained Bayesian Optimization Scheduling for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite its promise, existing activation sparsification methods suffer from two major limitations: (1) solely relying on activation magnitude for sparsification, ignoring the coupling influence with the corresponding weights, (2) applying uniform sparsity rates across all blocks without considering block-wise sparsity sensitivity. To address these issues, this paper proposes a novel training-free weight-aware activation sparsity framework, called **WAS**. |
Ming Wang; Miao Zhang; Xuebo Liu; Liqiang Nie; |
| 496 | Uncovering The Bigger Picture: Comprehensive Event Understanding Via Diverse News Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose NEWSCOPE, a two-stage framework for diverse news retrieval that enhances event coverage by explicitly modeling semantic variation at the sentence level. |
Yixuan Tang; Yuanyuan Shi; Yiqun Sun; Anthony Kum Hoe Tung; |
| 497 | The Missing Parts: Augmenting Fact Verification with Half Truth Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the task of half-truth detection, and propose PolitiFact-Hidden, a new benchmark with 15k political claims annotated with sentence-level evidence alignment and inferred claim intent. To address this challenge, we present TRACER, a modular re-assessment framework that identifies omission-based misinformation by aligning evidence, inferring implied intent, and estimating the causal impact of hidden content. |
Yixuan Tang; Jincheng Wang; Anthony Kum Hoe Tung; |
| 498 | Skeletons Matter: Dynamic Data Augmentation for Text-to-Query Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formally define the Text-to-Query task paradigm, unifying semantic parsing tasks across various query languages. |
Yuchen Ji; Bo Xu; Jie Shi; Jiaqing Liang; Deqing Yang; Yu Mao; Hai Chen; Yanghua Xiao; |
| 499 | Scalable and Culturally Specific Stereotype Dataset Construction Via Human-LLM Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Research on stereotypes in large language models (LLMs) has largely focused on English-speaking contexts, due to the lack of datasets in other languages and the high cost of manual annotation in underrepresented cultures. To address this gap, we introduce a cost-efficient human-LLM collaborative annotation framework and apply it to construct EspanStereo, a Spanish-language stereotype dataset spanning multiple Spanish-speaking countries across Europe and Latin America. |
Weicheng Ma; John J. Guerrerio; Soroush Vosoughi; |
| 500 | DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces DiffusionAttacker, an end-to-end generative approach for jailbreak rewriting inspired by diffusion models. |
Hao Wang; Hao Li; Junda Zhu; Xinyuan Wang; Chengwei Pan; Minlie Huang; Lei Sha; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~2,000 papers), please visit Paper Digest: EMNLP-2025 (Full List).