Paper Digest: NAACL 2025 Papers & Highlights
Note: NAACL-2025 accepted more than 700 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested users can choose to read all ~800 NAACL-2025 papers on a separate page, which takes quite some time to load.
To search for papers presented at NAACL-2025 on a specific topic, please use the search by venue (NAACL-2025) service. To summarize the latest research published at NAACL-2025 on a specific topic, you can use the review by venue (NAACL-2025) service. If you are interested in browsing papers by author, we have a comprehensive list of ~3,800 authors (NAACL-2025). Additionally, you may want to explore our “Best Paper” Digest (NAACL), which lists the most influential NAACL papers since 2000.
This list is created by the Paper Digest Team. Experience the cutting-edge capabilities of Paper Digest, an innovative AI-powered research platform that empowers you to read articles, write articles, get answers, conduct literature reviews, and generate research reports.
Try us today and unlock the full potential of our services for free!
TABLE 1: Paper Digest: NAACL 2025 Papers & Highlights
| # | Paper | Author(s) |
|---|---|---|
| 1 | Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes. Highlight: In this work, we leverage the zero-shot capabilities of LLMs to reduce stereotyping in a technique we introduce as zero-shot self-debiasing. | Isabel O. Gallegos; Ryan Aponte; Ryan A. Rossi; Joe Barrow; Mehrab Tanjim; Tong Yu; Hanieh Deilamsalehy; Ruiyi Zhang; Sungchul Kim; Franck Dernoncourt; Nedim Lipka; Deonna Owens; Jiuxiang Gu |
| 2 | KMMLU: Measuring Massive Multitask Language Understanding in Korean. Highlight: We propose KMMLU, a Korean benchmark with 35,030 expert-level multiple-choice questions across 45 subjects ranging from humanities to STEM. | Guijin Son; Hanwool Lee; Sungdong Kim; Seungone Kim; Niklas Muennighoff; Taekyoon Choi; Cheonbok Park; Kang Min Yoo; Stella Biderman |
| 3 | ComPO: Community Preferences for Language Model Personalization. Highlight: Recent studies have raised concerns that aggregating such diverse and often contradictory human feedback to finetune models results in generic models that generate outputs not preferred by many user groups, as they tend to average out styles and norms. To address this issue, we draw inspiration from recommendation systems and propose ComPO, a method to personalize preference optimization in LMs by contextualizing the probability distribution of model outputs with the preference provider. | Sachin Kumar; Chan Young Park; Yulia Tsvetkov; Noah A. Smith; Hannaneh Hajishirzi |
| 4 | In-Context Learning with Long-Context Models: An In-Depth Exploration. Highlight: We study the behavior of in-context learning (ICL) at this extreme scale on multiple datasets and models. | Amanda Bertsch; Maor Ivgi; Emily Xiao; Uri Alon; Jonathan Berant; Matthew R. Gormley; Graham Neubig |
| 5 | Benchmarking Distributional Alignment of Large Language Models. Highlight: This notion of distributional alignment is complex, as there is significant variation in the types of attributes that are simulated. Prior works have underexplored the role of three critical variables—the question domain, steering method, and distribution expression method—which motivates our contribution of a benchmark explicitly addressing these dimensions. | Nicole Meister; Carlos Guestrin; Tatsunori Hashimoto |
| 6 | Instantly Learning Preference Alignment Via In-context DPO. Highlight: In this paper, we propose a novel and effective approach for HPA in a tuning-free way, named In-Context Direct Preference Optimization (ICDPO). | Feifan Song; Yuxuan Fan; Xin Zhang; Peiyi Wang; Houfeng Wang |
| 7 | Representing Rule-based Chatbots with Transformers. Highlight: In this work, we propose using ELIZA, a classic rule-based chatbot, as a setting for formal, mechanistic analysis of Transformer-based chatbots. | Dan Friedman; Abhishek Panigrahi; Danqi Chen |
| 8 | ParaICL: Towards Parallel In-Context Learning. Highlight: Moreover, varying combinations of few-shot demonstration examples can significantly boost accuracy across different test samples. To address this, we propose a novel method named parallel in-context learning (ParaICL) that effectively utilizes all demonstration examples without exceeding the manageable input context length. | Xingxuan Li; Xuan-Phi Nguyen; Shafiq Joty; Lidong Bing |
| 9 | High-Dimension Human Value Representation in Large Language Models. Highlight: We propose UniVar, a high-dimensional neural representation of symbolic human value distributions in LLMs, orthogonal to model architecture and training data. | Samuel Cahyawijaya; Delong Chen; Yejin Bang; Leila Khalatbari; Bryan Wilie; Ziwei Ji; Etsuko Ishii; Pascale Fung |
| 10 | FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions. Highlight: In this work, we study the use of instructions in IR systems. | Orion Weller; Benjamin Chang; Sean MacAvaney; Kyle Lo; Arman Cohan; Benjamin Van Durme; Dawn Lawrie; Luca Soldaini |
| 11 | The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models. Highlight: Additionally, these benchmarks tend to focus disproportionately on specific capabilities such as instruction following, leading to coverage bias. To overcome these limitations, we introduce the BiGGen Bench, a principled generation benchmark designed to thoroughly evaluate nine distinct capabilities of LMs across 77 diverse tasks. | Seungone Kim; Juyoung Suk; Ji Yong Cho; Shayne Longpre; Chaeeun Kim; Dongkeun Yoon; Guijin Son; Yejin Cho; Sheikh Shafayat; Jinheon Baek; Sue Hyun Park; Hyeonbin Hwang; Jinkyung Jo; Hyowon Cho; Haebin Shin; Seongyun Lee; Hanseok Oh; Noah Lee; Namgyu Ho; Se June Joo; Miyoung Ko; Yoonjoo Lee; Hyungjoo Chae; Jamin Shin; Joel Jang; Seonghyeon Ye; Bill Yuchen Lin; Sean Welleck; Graham Neubig; Moontae Lee; Kyungjae Lee; Minjoon Seo |
| 12 | K-Level Reasoning: Establishing Higher Order Beliefs in Large Language Models for Strategic Reasoning. Highlight: Inspired by the Level-K framework from game theory and behavioral economics, which extends reasoning from simple reactions to structured strategic depth, we propose a novel framework: “K-Level Reasoning with Large Language Models (K-R).” | Yadong Zhang; Shaoguang Mao; Tao Ge; Xun Wang; Yan Xia; Man Lan; Furu Wei |
| 13 | The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism. Highlight: This limits our understanding of LLM performance variability in real-world applications. Our study addresses this issue by exploring key questions about the performance differences between greedy decoding and sampling, identifying benchmarks’ consistency regarding non-determinism, and examining unique model behaviors. (See the evaluation sketch after the table.) | Yifan Song; Guoyin Wang; Sujian Li; Bill Yuchen Lin |
| 14 | CausalEval: Towards Better Causal Reasoning in Language Models. Highlight: We aim for this work to serve as a comprehensive resource, fostering further advancements in causal reasoning with LMs. | Longxuan Yu; Delin Chen; Siheng Xiong; Qingyang Wu; Dawei Li; Zhikai Chen; Xiaoze Liu; Liangming Pan |
| 15 | Racing Thoughts: Explaining Contextualization Errors in Large Language Models. Highlight: For example, a model may incorrectly respond “Yes” if it has not properly contextualized “bank” as a geographical feature, rather than a financial institution. We propose the LLM Race Conditions Hypothesis as an explanation of contextualization errors of this form. | Michael A. Lepori; Michael Curtis Mozer; Asma Ghandeharioun |
| 16 | Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models. Highlight: In this work, we extend the evaluation to real-world user queries and non-English-centric LLMs, offering a broader examination of multilingual performance. | Chaoqun Liu; Wenxuan Zhang; Yiran Zhao; Anh Tuan Luu; Lidong Bing |
| 17 | AdaMergeX: Cross-Lingual Transfer with Large Language Models Via Adaptive Adapter Merging. | Yiran Zhao; Wenxuan Zhang; Huiming Wang; Kenji Kawaguchi; Lidong Bing |
| 18 | Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities. Highlight: Current methods for generating these suffixes are computationally expensive and have low Attack Success Rates (ASR), especially against well-aligned models like Llama2 and Llama3. To overcome these limitations, we introduce **ADV-LLM**, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability. | Chung-En Sun; Xiaodong Liu; Weiwei Yang; Tsui-Wei Weng; Hao Cheng; Aidan San; Michel Galley; Jianfeng Gao |
| 19 | Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering. Highlight: This can enable a new paradigm of front-end development in which multimodal large language models (MLLMs) directly convert visual designs into code implementations. In this work, we construct Design2Code – the first real-world benchmark for this task. | Chenglei Si; Yanzhe Zhang; Ryan Li; Zhengyuan Yang; Ruibo Liu; Diyi Yang |
| 20 | REL-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance. Highlight: We introduce an interaction-centered evaluation approach called Rel-A.I. | Kaitlyn Zhou; Jena D. Hwang; Xiang Ren; Nouha Dziri; Dan Jurafsky; Maarten Sap |
| 21 | Multi-Conditional Ranking with Large Language Models. Highlight: In this paper, we define and explore the task of multi-conditional ranking by introducing MCRank, a benchmark tailored for assessing multi-conditional ranking across various item types and conditions. | Pouya Pezeshkpour; Estevam Hruschka |
| 22 | AlgoPuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Algorithmic Multimodal Puzzles. Highlight: This paper introduces the novel task of multimodal puzzle solving, framed within the context of visual question-answering. | Deepanway Ghosal; Vernon Toh; Yew Ken Chia; Soujanya Poria |
| 23 | ResearchAgent: Iterative Research Idea Generation Over Scientific Literature with Large Language Models. Highlight: Meanwhile, novel, impactful research often stems from both a deep understanding of prior work and a cross-pollination of ideas across domains and fields. To enhance the productivity of researchers, we propose ResearchAgent, which leverages the encyclopedic knowledge and linguistic reasoning capabilities of Large Language Models (LLMs) to assist them in their work. | Jinheon Baek; Sujay Kumar Jauhar; Silviu Cucerzan; Sung Ju Hwang |
| 24 | CharacterBox: Evaluating The Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds. Highlight: In this paper, we propose CharacterBox, a simulation sandbox designed to generate situational fine-grained character behavior trajectories. | Lei Wang; Jianxun Lian; Yi Huang; Yanqi Dai; Haoxuan Li; Xu Chen; Xing Xie; Ji-Rong Wen |
| 25 | DAWN-ICL: Strategic Planning of Problem-solving Trajectories for Zero-Shot In-Context Learning. Highlight: The random traversing order may generate unreliable pseudo-demonstrations and lead to error accumulation. To address this problem, we reformulate ZS-ICL as a planning problem and propose a **D**emonstration-**AW**are Mo**N**te Carlo Tree Search (MCTS) approach (DAWN-ICL), which leverages MCTS to strategically plan the problem-solving trajectories for ZS-ICL. | Xinyu Tang; Xiaolei Wang; Xin Zhao; Ji-Rong Wen |
| 26 | Improving Retrospective Language Agents Via Joint Policy Gradient Optimization. Highlight: Meanwhile, although fine-tuning methods significantly enhance the capabilities of smaller LLMs, the fine-tuned agents often lack the potential for self-reflection and self-improvement. To address these challenges, we introduce RetroAct, a novel agent framework that jointly optimizes both task-planning and self-reflective evolution capabilities in language agents. | Xueyang Feng; Bo Lan; Quanyu Dai; Lei Wang; Jiakai Tang; Xu Chen; Zhenhua Dong; Ji-Rong Wen |
| 27 | Vision-Language Models Can Self-Improve Reasoning Via Reflection. Highlight: To this end, we propose a simple yet effective self-training framework, R3V, which iteratively enhances the model’s Vision-language Reasoning by Reflecting on CoT Rationales. | Kanzhi Cheng; Li YanTao; Fangzhi Xu; Jianbing Zhang; Hao Zhou; Yang Liu |
| 28 | FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data. Highlight: Based on our findings, we propose a novel approach for synthetic data generation, CG2C, that leverages multi-hop reasoning on context graphs extracted from documents. | Deren Lei; Yaxi Li; Siyao Li; Mengya Hu; Rui Xu; Ken Archer; Mingyu Wang; Emily Ching; Alex Deng |
| 29 | Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models Through Continual Pre-Training. Highlight: We introduce Hephaestus-Forge, the first large-scale pre-training corpus designed to enhance the fundamental capabilities of LLM agents in API function calling, intrinsic reasoning and planning, and adapting to environmental feedback. | Yuchen Zhuang; Jingfeng Yang; Haoming Jiang; Xin Liu; Kewei Cheng; Sanket Lokegaonkar; Yifan Gao; Qing Ping; Tianyi Liu; Binxuan Huang; Zheng Li; Zhengyang Wang; Pei Chen; Ruijie Wang; Rongzhi Zhang; Nasser Zalmout; Priyanka Nigam; Bing Yin; Chao Zhang |
| 30 | Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models. Highlight: However, the features they use to incrementally process their linguistic input are not well understood. In this paper, we fill this gap by studying the mechanisms underlying garden path sentence processing in LMs. | Michael Hanna; Aaron Mueller |
| 31 | Rethinking Word Similarity: Semantic Similarity Through Classification Confusion. Highlight: We propose a new measure of similarity, Word Confusion, that reframes semantic similarity in terms of feature-based classification confusion. | Kaitlyn Zhou; Haishan Gao; Sarah Li Chen; Dan Edelstein; Dan Jurafsky; Chen Shani |
| 32 | RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement. Highlight: In this paper, we propose RAG-Star, a novel RAG approach that integrates the retrieved information to guide the tree-based deliberative reasoning process that relies on the inherent knowledge of LLMs. | Jinhao Jiang; Jiayi Chen; Junyi Li; Ruiyang Ren; Shijie Wang; Xin Zhao; Yang Song; Tao Zhang |
| 33 | Language Models Encode Numbers Using Digit Representations in Base 10. Highlight: A natural hypothesis is that these errors stem from how LLMs represent numbers, and specifically, whether their representations of numbers capture their numeric values. We tackle this question from the observation that LLM errors on numerical tasks are often distributed across the digits of the answer rather than normally around its numeric value. | Amit Arnold Levy; Mor Geva |
| 34 | A Probabilistic Framework for LLM Hallucination Detection Via Belief Tree Propagation. Highlight: We describe Belief Tree Propagation (BTProp), a probabilistic framework for LLM hallucination detection. | Bairu Hou; Yang Zhang; Jacob Andreas; Shiyu Chang |
| 35 | Extracting and Understanding The Superficial Knowledge in Alignment. Highlight: This leads to the question: Is alignment predominantly superficial? In this paper, we delve into this question and provide a quantitative analysis. | Runjin Chen; Gabriel Jacob Perin; Xuxi Chen; Xilun Chen; Yan Han; Nina S. T. Hirata; Junyuan Hong; Bhavya Kailkhura |
| 36 | A Top-down Graph-based Tool for Modeling Classical Semantic Maps: A Case Study of Supplementary Adverbs. Highlight: In this paper, we propose a novel graph-based algorithm that automatically generates conceptual spaces and SMMs in a top-down manner. | Zhu Liu; Cunliang Kong; Ying Liu; Maosong Sun |
| 37 | A Distributional Perspective on Word Learning in Neural Language Models. Highlight: Thus, we propose an array of signatures that improve on earlier approaches by capturing knowledge of both where the target word can and cannot occur as well as gradient preferences about the word’s appropriateness. | Filippo Ficarra; Ryan Cotterell; Alex Warstadt |
| 38 | Sharpness-Aware Minimization for Topic Models with High-Quality Document Representations. Highlight: In this study, we propose to apply an efficient optimization method to improve the generalization and performance of topic models. | Tung Nguyen; Tue Le; Hoang Tran Vuong; Quang Duc Nguyen; Duc Anh Nguyen; Linh Ngo Van; Sang Dinh; Thien Huu Nguyen |
| 39 | ImgTrojan: Jailbreaking Vision-Language Models with ONE Image. Highlight: In this paper, we propose a novel jailbreaking attack against VLMs, aiming to bypass their safety barrier when a user inputs harmful instructions. | Xijia Tao; Shuai Zhong; Lei Li; Qi Liu; Lingpeng Kong |
| 40 | Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping. Highlight: However, existing research on UI/UX automation often requires high-fidelity inputs like Figma designs or detailed screenshots, limiting accessibility and impeding efficient design iteration. To bridge this gap, we introduce Sketch2Code, a benchmark that evaluates state-of-the-art Vision Language Models (VLMs) on automating the conversion of rudimentary sketches into webpage prototypes. | Ryan Li; Yanzhe Zhang; Diyi Yang |
| 41 | GuideLLM: Exploring LLM-Guided Conversation with Applications in Autobiography Interviewing. Highlight: For automatic evaluation, we derive user proxies from multiple autobiographies and employ LLM-as-a-judge to score LLM behaviors. | Jinhao Duan; Xinyu Zhao; Zhuoxuan Zhang; Eunhye Grace Ko; Lily Boddy; Chenan Wang; Tianhao Li; Alexander Rasgon; Junyuan Hong; Min Kyung Lee; Chenxi Yuan; Qi Long; Ying Ding; Tianlong Chen; Kaidi Xu |
| 42 | Open-World Evaluation for Retrieving Diverse Perspectives. Highlight: We study retrieving a set of documents that covers various perspectives on a complex and contentious question (e.g., will ChatGPT do more harm than good?). | Hung-Ting Chen; Eunsol Choi |
| 43 | Reverse Thinking Makes LLMs Stronger Reasoners. Highlight: To enable Large Language Models (LLMs) to perform reverse thinking, we introduce Reverse-Enhanced Thinking (RevThink), a framework composed of data augmentation and learning objectives. | Justin Chen; Zifeng Wang; Hamid Palangi; Rujun Han; Sayna Ebrahimi; Long Le; Vincent Perot; Swaroop Mishra; Mohit Bansal; Chen-Yu Lee; Tomas Pfister |
| 44 | Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training. Highlight: This paper introduces a sequence-level one-forward-one-backward (1F1B) PP method, named Seq1F1B, tailored for training LLMs on long sequences with high training throughput and memory efficiency. | Sun Ao; Weilin Zhao; Xu Han; Cheng Yang; Xinrong Zhang; Zhiyuan Liu; Chuan Shi; Maosong Sun |
| 45 | MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference. Highlight: We propose MEDA, a novel approach specifically designed for the complexities of multimodal settings, dynamically allocating KV cache sizes based on attention entropy to better adapt to multimodal interactions. (See the allocation sketch after the table.) | Zhongwei Wan; Hui Shen; Xin Wang; Che Liu; Zheda Mai; Mi Zhang |
| 46 | Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models Via Abstraction. Highlight: To evaluate the extent to which LLMs perform robust reasoning instead of relying on superficial logical chains, we propose a new evaluation dataset, the Concept-Reversed Winograd Schema Challenge (CR-WSC), based on the famous Winograd Schema Challenge (WSC) dataset. | Kaiqiao Han; Tianqing Fang; Zhaowei Wang; Yangqiu Song; Mark Steedman |
| 47 | DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students’ Hand-Drawn Math Images. Highlight: For example, K-12 educators using digital learning platforms may need to examine and provide feedback across many images of students’ math work. To assess the potential of VLMs to support educators in settings like this one, we introduce DrawEduMath, an English-language dataset of 2,030 images of students’ handwritten responses to K-12 math problems. | Sami Baral; Li Lucy; Ryan Knight; Alice Ng; Luca Soldaini; Neil Heffernan; Kyle Lo |
| 48 | HIGGS: Pushing The Limits of Large Language Model Quantization Via The Linearity Theorem. Highlight: In this paper, we present a “linearity theorem” establishing a direct relationship between the layer-wise reconstruction error and the model perplexity increase due to quantization. | Vladimir Malinovskii; Andrei Panferov; Ivan Ilin; Han Guo; Peter Richtárik; Dan Alistarh |
| 49 | Unfamiliar Finetuning Examples Control How Language Models Hallucinate. Highlight: Large language models are known to hallucinate, but the underlying mechanisms that govern how models hallucinate are not yet fully understood. In this work, we find that unfamiliar examples in the models’ finetuning data – those that introduce concepts beyond the base model’s scope of knowledge – are crucial in shaping these errors. | Katie Kang; Eric Wallace; Claire Tomlin; Aviral Kumar; Sergey Levine |
| 50 | AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails. Highlight: Currently, there is a clear lack of high-quality, human-annotated datasets that address the full spectrum of LLM-related safety risks and are usable for commercial applications. To bridge this gap, we propose a comprehensive and adaptable taxonomy for categorizing safety risks, structured into 12 top-level hazard categories with an extension to 9 fine-grained subcategories. | Shaona Ghosh; Prasoon Varshney; Makesh Narsimhan Sreedhar; Aishwarya Padmakumar; Traian Rebedea; Jibin Rajan Varghese; Christopher Parisien |
| 51 | Evaluating Bias in LLMs for Job-Resume Matching: Gender, Race, and Education. Highlight: This study examines the performance and fairness of LLMs in job-resume matching tasks within the English language and U.S. context. It evaluates how factors such as gender, race, and educational background influence model decisions, providing critical insights into the fairness and reliability of LLMs in HR applications. | Hayate Iso; Pouya Pezeshkpour; Nikita Bhutani; Estevam Hruschka |
| 52 | AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages. Highlight: These limitations are mainly due to the lack of high-quality data in the local languages and the failure to include local communities in the collection, annotation, and moderation processes. To address this issue, we present AfriHate: a multilingual collection of hate speech and abusive language datasets in 15 African languages. | Shamsuddeen Hassan Muhammad; Idris Abdulmumin; Abinew Ali Ayele; David Ifeoluwa Adelani; Ibrahim Said Ahmad; Saminu Mohammad Aliyu; Paul Röttger; Abigail Oppong; Andiswa Bukula; Chiamaka Ijeoma Chukwuneke; Ebrahim Chekol Jibril; Elyas Abdi Ismail; Esubalew Alemneh; Hagos Tesfahun Gebremichael; Lukman Jibril Aliyu; Meriem Beloucif; Oumaima Hourrane; Rooweither Mabuya; Salomey Osei; Samuel Rutunda; Tadesse Destaw Belay; Tadesse Kebede Guge; Tesfa Tegegne Asfaw; Lilian Diana Awuor Wanzare; Nelson Odhiambo Onyango; Seid Muhie Yimam; Nedjma Ousidhoum |
| 53 | Cracking The Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia. Highlight: In this paper, we introduce IndoCareer, a dataset comprising 8,834 multiple-choice questions designed to evaluate performance in vocational and professional certification exams across various fields. | Fajri Koto |
| 54 | Cross-lingual Transfer of Reward Models in Multilingual Alignment. Highlight: In this work, we investigate the cross-lingual transfer of RMs trained in diverse languages, primarily from English. | Jiwoo Hong; Noah Lee; Rodrigo Martínez-Castaño; César Rodríguez; James Thorne |
| 55 | Towards Automatic Evaluation for Image Transcreation. Highlight: Attempts to define this as a formal Machine Learning (ML) problem have been impeded by the lack of automatic evaluation mechanisms, with previous work relying solely on human evaluation. In this paper, we seek to close this gap by proposing a suite of automatic evaluation metrics inspired by machine translation (MT) metrics, categorized into: a) Object-based, b) Embedding-based, and c) VLM-based. | Simran Khanuja; Vivek Iyer; Xiaoyu He; Graham Neubig |
| 56 | FiNE: Filtering and Improving Noisy Data Elaborately with Large Language Models. | Junliang He; Ziyue Fan; Shaohui Kuang; Li Xiaoqing; Kai Song; Yaqian Zhou; Xipeng Qiu |
| 57 | Towards Knowledge Checking in Retrieval-augmented Generation: A Representation Perspective. Highlight: This work aims to provide a systematic study on knowledge checking in RAG systems. | Shenglai Zeng; Jiankun Zhang; Bingheng Li; Yuping Lin; Tianqi Zheng; Dante Everaert; Hanqing Lu; Hui Liu; Hui Liu; Yue Xing; Monica Xiao Cheng; Jiliang Tang |
| 58 | Sociodemographic Prompting Is Not Yet An Effective Approach for Simulating Subjective Judgments with LLMs. Highlight: In this study, leveraging the POPQUORN dataset, we evaluate nine popular LLMs on their ability to understand demographic differences in two subjective judgment tasks: politeness and offensiveness. | Huaman Sun; Jiaxin Pei; Minje Choi; David Jurgens |
| 59 | Natural Language Processing for Human Resources: A Survey. Highlight: While recent breakthroughs in NLP have generated significant interest in its industrial applications, a comprehensive overview of how NLP can be applied across HR activities is still lacking. This paper identifies opportunities for researchers and practitioners to harness NLP’s transformative potential in this domain. | Naoki Otani; Nikita Bhutani; Estevam Hruschka |
| 60 | Mastering The Craft of Data Synthesis for CodeLLMs. Highlight: Data synthesis and filtering techniques have been widely adopted and shown to be highly effective in this context. In this paper, we present a focused survey and taxonomy of these techniques, emphasizing recent advancements. | Meng Chen; Philip Arthur; Qianyu Feng; Cong Duy Vu Hoang; Yu-Heng Hong; Mahdi Kazemi Moghaddam; Omid Nezami; Duc Thien Nguyen; Gioacchino Tangari; Duy Vu; Thanh Vu; Mark Johnson; Krishnaram Kenthapadi; Don Dharmasiri; Long Duong; Yuan-Fang Li |
| 61 | Language Models Largely Exhibit Human-like Constituent Ordering Preferences. Highlight: One prominent theory presents the notion that constituent ordering is directly correlated with constituent weight: a measure of the constituent’s length or complexity. | Ada Tur; Gaurav Kamath; Siva Reddy |
| 62 | CRMArena: Understanding The Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments. Highlight: However, deploying and evaluating these agents is challenging due to the lack of realistic benchmarks that reflect the complexity of real-world CRM tasks. To address this issue, we introduce CRMArena, a novel benchmark designed to evaluate AI agents on realistic tasks grounded in professional work environments. | Kung-Hsiang Huang; Akshara Prabhakar; Sidharth Dhawan; Yixin Mao; Huan Wang; Silvio Savarese; Caiming Xiong; Philippe Laban; Chien-Sheng Wu |
| 63 | Does Liking Yellow Imply Driving A School Bus? Semantic Leakage in Language Models. Highlight: In this paper, we identify and characterize a phenomenon never discussed before, which we call semantic leakage, where models leak irrelevant information from the prompt into the generation in unexpected ways. | Hila Gonen; Terra Blevins; Alisa Liu; Luke Zettlemoyer; Noah A. Smith |
| 64 | Explanation Based In-Context Demonstrations Retrieval for Multilingual Grammatical Error Correction. Highlight: However, selecting effective in-context examples remains challenging, as the similarity between input texts does not necessarily correspond to similar grammatical error patterns. In this paper, we propose a novel retrieval method based on natural language grammatical error explanations (GEE) to address this issue. | Wei Li; Wen Luo; Guangyue Peng; Houfeng Wang |
| 65 | Investigating The (De)Composition Capabilities of Large Language Models in Natural-to-Formal Language Conversion. Highlight: Humans have strong capabilities of decomposition and composition in natural-to-formal language conversion (N2F) when faced with an unfamiliar formal language, and can easily cope with compositional gaps and counter-intuitive symbolic names. To investigate whether large language models (LLMs) have this set of basic capabilities in N2F, we propose the STD framework. | Ziyao Xu; Houfeng Wang |
| 66 | CAMIEval: Enhancing NLG Evaluation Through Multidimensional Comparative Instruction-Following Analysis. Highlight: However, these methods encounter the following challenges: (1) distinguishing instruction-following ability, (2) being applicable across diverse NLG tasks, and (3) identifying low-quality outputs. To address these issues, we propose CAMIEval, a multidimensional comparative evaluation method based on instruction-following. | Ziyue Fan; Junliang He; Li Xiaoqing; Shaohui Kuang; Kai Song; Yaqian Zhou; Xipeng Qiu |
| 67 | From Distributional to Overton Pluralism: Investigating Large Language Model Alignment. Highlight: We analyze two aspects of post-alignment distributional shift of LLM responses. | Thom Lake; Eunsol Choi; Greg Durrett |
| 68 | Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation. Highlight: To this end, we propose FRAMES (Factuality, Retrieval, And reasoning MEasurement Set), a high-quality evaluation dataset designed to test LLMs’ ability to provide factual responses, assess retrieval capabilities, and evaluate the reasoning required to generate final answers. | Satyapriya Krishna; Kalpesh Krishna; Anhad Mohananey; Steven Schwarcz; Adam Stambler; Shyam Upadhyay; Manaal Faruqui |
| 69 | Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples Within Packs. Highlight: The mainstream approaches in SFT ensure that each token in the attention calculation phase only focuses on tokens within its own short sequence, without providing additional learning signals for the preceding context. To address these challenges, we introduce Threshold Filtering Packing (TFP), a method that selects samples with related context while maintaining sufficient diversity within the same pack. | Jiancheng Dong; Lei Jiang; Wei Jin; Lu Cheng |
| 70 | Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models. Highlight: In this paper, we studied LLM’s linguistic preference in a cross-language RAG-based information search setting. | Nikhil Sharma; Kenton Murray; Ziang Xiao |
| 71 | Multilingual Reasoning Via Self-training. Highlight: To improve LLMs’ multilingual reasoning abilities, we propose a modular approach that instructs the models to structure reasoning passages in a different problem space and then self-refine their capabilities to deliver step-wise reasoning passages that lead to the solution. | Leonardo Ranaldi; Giulia Pucci |
| 72 | Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models. Highlight: In this work, we demonstrate a new method to identify training data known to proprietary LLMs like GPT-4 without requiring any access to model weights or token probabilities, by using information-guided probes. | Abhilasha Ravichander; Jillian Fisher; Taylor Sorensen; Ximing Lu; Maria Antoniak; Bill Yuchen Lin; Niloofar Mireshghallah; Chandra Bhagavatula; Yejin Choi |
| 73 | CVE-Bench: Benchmarking LLM-based Software Engineering Agent’s Ability to Repair Real-World CVE Vulnerabilities. Highlight: To this end, we introduce CVE-Bench, an evaluation framework consisting of 509 Common Vulnerabilities and Exposures (CVEs) from four programming languages and 120 popular open-source repositories. | Peiran Wang; Xiaogeng Liu; Chaowei Xiao |
| 74 | Reward-Guided Tree Search for Inference Time Alignment of Large Language Models. Highlight: In this work, we propose DARWIN, an inference-time alignment method that leverages the guidance of a reward model to achieve alignment through reward-guided tree search. | Chia-Yu Hung; Navonil Majumder; Ambuj Mehrish; Soujanya Poria |
| 75 | LegalViz: Legal Text Visualization By Text To Diagram Generation. Highlight: To disclose expert knowledge for non-experts, we explore the problem of visualizing legal texts with easy-to-understand diagrams and propose a novel dataset of LegalViz with 23 languages and 7,010 cases of legal document and visualization pairs, using the DOT graph description language of Graphviz. | Eri Onami; Taiki Miyanishi; Koki Maeda; Shuhei Kurita |
| 76 | StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples. Highlight: We introduce StyleDistance, a novel approach to training stronger content-independent style embeddings. | Ajay Patel; Jiacheng Zhu; Justin Qiu; Zachary Horvitz; Marianna Apidianaki; Kathleen McKeown; Chris Callison-Burch |
| 77 | Generative Prompt Internalization. Highlight: Prompts used in recent large language model based applications are often fixed and lengthy, leading to significant computational overhead. To address this challenge, we propose Generative Prompt Internalization (GenPI), a lightweight method that employs a joint training approach. | Haebin Shin; Lei Ji; Yeyun Gong; Sungdong Kim; Eunbi Choi; Minjoon Seo |
| 78 | Towards Reliable and Practical Phishing Detection. Highlight: With recent advances in AI, we discuss how to construct a reliable and practical phishing detection system using language models. For this system, we introduce the first large-scale Korean dataset for phishing detection, encompassing six types of phishing attacks. | Hyowon Cho; Minjoon Seo |
| 79 | Predicting The Target Word of Game-playing Conversations Using A Low-Rank Dialect Adapter for Decoder Models. Highlight: In this paper, we extend the idea of dialect adapters to decoder models in our architecture called LoRDD. | Dipankar Srirag; Aditya Joshi; Jacob Eisenstein |
| 80 | LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs. Highlight: However, despite these achievements, current evaluations are mostly limited to specific mathematical topics, and it remains unclear whether LLMs are genuinely engaging in reasoning. To address these gaps, we present the Mathematical Topics Tree (MaTT) benchmark, a challenging and structured benchmark that offers 1,958 questions across a wide array of mathematical subjects, each paired with a detailed hierarchical chain of topics. | Arash Gholami Davoodi; Seyed Pouyan Mousavi Davoudi; Pouya Pezeshkpour |
| 81 | AI-Assisted Human Evaluation of Machine Translation. | Vilém Zouhar; Tom Kocmi; Mrinmaya Sachan |
| 82 | PORT: Preference Optimization on Reasoning Traces. Highlight: This paper proposes using preference optimization methods on Chain-of-Thought steps in order to improve the mathematical reasoning performances of language models. | Salem Lahlou; Abdalgader Abubaker; Hakim Hacid |
| 83 | Teaching Models to Balance Resisting and Accepting Persuasion. Highlight: In order to balance positive and negative persuasion, we introduce **P**ersuasion-**B**alanced **T**raining (or **PBT**), which leverages multi-agent recursive dialogue trees to create data and trains models via preference optimization to accept persuasion *when appropriate*. | Elias Stengel-Eskin; Peter Hase; Mohit Bansal |
| 84 | On The Origin of Cultural Biases in Language Models: From Pre-training Data to Linguistic Phenomena. Highlight: In this paper, we aim to uncover the origins of entity-related cultural biases in LMs by analyzing several contributing factors, including the representation of entities in pre-training data and the impact of variations in linguistic phenomena across languages. | Tarek Naous; Wei Xu |
| 85 | Simulating Classroom Education with LLM-Empowered Agents. Highlight: In this work, we propose SimClass, a multi-agent classroom simulation teaching framework. | Zheyuan Zhang; Daniel Zhang-Li; Jifan Yu; Linlu Gong; Jinchang Zhou; Zhanxin Hao; Jianxiao Jiang; Jie Cao; Huiqin Liu; Zhiyuan Liu; Lei Hou; Juanzi Li |
| 86 | What Goes Into A LM Acceptability Judgment? Rethinking The Impact of Frequency and Length. Highlight: Prior works in comparing LM and human acceptability judgments treat these effects uniformly across models, making a strong assumption that models require the same degree of adjustment to control for length and unigram frequency effects. We propose MORCELA, a new linking theory between LM scores and acceptability judgments where the optimal level of adjustment for these effects is estimated from data via learned parameters for length and unigram frequency. | Lindia Tjuatja; Graham Neubig; Tal Linzen; Sophie Hao |
| 87 | Benchmarking and Building Zero-Shot Hindi Retrieval Model with Hindi-BEIR and NLLB-E5. Highlight: We evaluate state-of-the-art multilingual retrieval models on the Hindi-BEIR benchmark, identifying task and domain-specific challenges that impact Hindi retrieval performance. Building on the insights from these results, we introduce NLLB-E5, a multilingual retrieval model that leverages a zero-shot approach to support Hindi without the need for Hindi training data. | Arkadeep Acharya; Rudra Murthy; Vishwajeet Kumar; Jaydeep Sen |
| 88 | CORD: Balancing COnsistency and Rank Distillation for Robust Retrieval-Augmented Generation. Highlight: We thus propose CORD, balancing COnsistency and Rank Distillation: CORD adaptively samples noise-controlled perturbations from an interpolation space, ensuring both consistency and respect for the rank prior. | Youngwon Lee; Seung-won Hwang; Daniel F Campos; Filip Graliński; Zhewei Yao; Yuxiong He |
| 89 | GloCOM: A Short Text Neural Topic Model Via Global Clustering Context. Highlight: Although data aggregation offers a potential solution, existing neural topic models often overlook it due to time complexity, poor aggregation quality, and difficulty in inferring topic proportions for individual documents. In this paper, we propose a novel model, **GloCOM** (**Glo**bal **C**lustering C**O**ntexts for Topic **M**odels), which addresses these challenges by constructing aggregated global clustering contexts for short documents, leveraging text embeddings from pre-trained language models. | Quang Duc Nguyen; Tung Nguyen; Duc Anh Nguyen; Linh Ngo Van; Sang Dinh; Thien Huu Nguyen |
| 90 | Enhancing Discriminative Representation in Similar Relation Clusters for Few-Shot Continual Relation Extraction. Highlight: While existing studies have demonstrated promising results in FCRE, they often overlook the issue of similar relations, which is a critical factor contributing to catastrophic forgetting. In this work, we propose Sirus – a novel method that utilizes relation descriptions and dynamic clustering on these descriptions to identify similar relations. | Anh Duc Le; Nam Le Hai; Thanh Xuan Nguyen; Linh Ngo Van; Nguyen Thi Ngoc Diep; Sang Dinh; Thien Huu Nguyen |
| 91 | Eliciting Critical Reasoning in Retrieval-Augmented Generation Via Contrastive Explanations. Highlight: In this paper, we investigate how to elicit critical arguments in RAG via contrastive explanations. | Leonardo Ranaldi; Marco Valentino; Andre Freitas |
| 92 | EC-Tab2Text: Aspect-Based Text Generation from E-Commerce Product Tables. Highlight: Large Language Models (LLMs) have demonstrated exceptional versatility across diverse domains, yet their application in e-commerce remains underexplored due to a lack of domain-specific datasets. To address this gap, we introduce eC-Tab2Text, a novel dataset designed to capture the intricacies of e-commerce, including detailed product attributes and user-specific queries. | Luis Antonio Gutierrez Guanilo; Mir Tafseer Nayeem; Cristian Jose Lopez Del Alamo; Davood Rafiei |
| 93 | Reversed Attention: On The Gradient Descent Of Attention Layers In GPT. Highlight: In this work, we study the mathematics of the backward pass of attention, revealing that it implicitly calculates an attention matrix we refer to as “Reversed Attention”. | Shahar Katz; Lior Wolf |
| 94 | Fine-grained Fallacy Detection with Human Label Variation. Highlight: We introduce FAINA, the first dataset for fallacy detection that embraces multiple plausible answers and natural disagreement. | Alan Ramponi; Agnese Daffara; Sara Tonelli |
| 95 | Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models. Highlight: In this work, we adapt Mahalanobis Distance (MD) – a well-established UQ technique in classification tasks – for text generation and introduce a new supervised UQ method. (See the distance sketch after the table.) | Artem Vazhentsev; Lyudmila Rvanova; Ivan Lazichny; Alexander Panchenko; Maxim Panov; Timothy Baldwin; Artem Shelmanov |
| 96 | UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers. Highlight: In this paper, we introduce UniHGKR, a unified instruction-aware heterogeneous knowledge retriever that (1) builds a unified retrieval space for heterogeneous knowledge and (2) follows diverse user instructions to retrieve knowledge in specified types. | Dehai Min; Zhiyang Xu; Guilin Qi; Lifu Huang; Chenyu You |
| 97 | Is In-Context Learning A Type of Error-Driven Learning? Evidence from The Inverse Frequency Effect in Structural Priming. Highlight: In this paper, we introduce a new way of diagnosing whether ICL is functionally performing error-driven learning. | Zhenghao Zhou; Robert Frank; R. Thomas McCoy |
| 98 | Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages. Highlight: In this work, we explore the extent to which LLMs share representations of morphosyntactic concepts such as grammatical number, gender, and tense across languages. | Jannik Brinkmann; Chris Wendler; Christian Bartelt; Aaron Mueller |
| 99 | Ethical Concern Identification in NLP: A Corpus of ACL Anthology Ethics Statements. Highlight: We introduce EthiCon, a corpus of 1,580 ethical concern statements extracted from scientific papers published in the ACL Anthology. | Antonia Karamolegkou; Sandrine Schiller Hansen; Ariadni Christopoulou; Filippos Stamatiou; Anne Lauscher; Anders Søgaard |
| 100 | Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation. Highlight: We find that the measure using global grouping and Pearson correlation coefficient exhibits the best performance in both discriminative power and ranking consistency. | Mingqi Gao; Xinyu Hu; Li Lin; Xiaojun Wan |
| 101 | Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval. Highlight: For queries like “Find me a highly rated camera for wildlife photography compatible with my Nikon F-Mount lenses”, existing methods may generate expansions that are semantically similar but structurally unrelated to user intents. To handle such semi-structured queries with both textual and relational requirements, in this paper we propose a knowledge-aware query expansion framework, augmenting LLMs with structured document relations from a knowledge graph (KG). | Yu Xia; Junda Wu; Sungchul Kim; Tong Yu; Ryan A. Rossi; Haoliang Wang; Julian McAuley |
| 102 | StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion. Highlight: However, these models often face issues such as slow inference speeds, reliance on complex pre-trained neural codec representations, and difficulties in achieving naturalness and high similarity to reference speakers. To address these challenges, this work introduces StyleTTS-ZS, an efficient zero-shot TTS model that leverages distilled time-varying style diffusion to capture diverse speaker identities and prosodies. | Yinghao Aaron Li; Xilin Jiang; Cong Han; Nima Mesgarani |
| 103 | Mitigating Hallucinations in Multi-modal Large Language Models Via Image Token Attention-Guided Decoding. Highlight: In this paper, we delve into the intrinsic characteristics of hallucination from the perspective of interaction between input and output tokens. | Xinhao Xu; Hui Chen; Mengyao Lyu; Sicheng Zhao; Yizhe Xiong; Zijia Lin; Jungong Han; Guiguang Ding |
| 104 | AdaCAD: Adaptively Decoding to Balance Conflicts Between Contextual and Parametric Knowledge. Highlight: We propose a fine-grained, instance-level approach called AdaCAD, which dynamically infers the weight of adjustment based on the degree of conflict, as measured by the Jensen-Shannon divergence between distributions representing contextual and parametric knowledge. (See the decoding sketch after the table.) | Han Wang; Archiki Prasad; Elias Stengel-Eskin; Mohit Bansal |
| 105 | Breaking Down Power Barriers in On-Device Streaming ASR: Insights and Solutions. Highlight: We found that the influence of these parameters on power consumption varies depending on factors such as invocation frequency and memory allocation. Leveraging these insights, we propose design principles that enhance on-device speech recognition models by reducing power consumption with minimal impact on accuracy. | Yang Li; Yuan Shangguan; Yuhao Wang; Liangzhen Lai; Ernie Chang; Changsheng Zhao; Yangyang Shi; Vikas Chandra |
| 106 | Tricking Retrievers with Influential Tokens: An Efficient Black-Box Corpus Poisoning Attack. Highlight: Existing methods for crafting such passages, such as random token replacement or training inversion models, are often slow and computationally expensive, requiring either access to the retriever’s gradients or large computational resources. To address these limitations, we propose Dynamic Importance-Guided Genetic Algorithm (DIGA), an efficient black-box method that leverages two key properties of retrievers: insensitivity to token order and bias towards influential tokens. | Cheng Wang; Yiwei Wang; Yujun Cai; Bryan Hooi |
| 107 | Transferable Post-training Via Inverse Value Learning. Highlight: In this paper, we propose modeling the changes at the logits level during post-training using a separate neural network (i.e., the value network). | Xinyu Lu; Xueru Wen; Yaojie Lu; Bowen Yu; Hongyu Lin; Haiyang Yu; Le Sun; Xianpei Han; Yongbin Li |
| 108 | IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark for LLMs. Highlight: Recent evaluations of LLMs on coreference resolution have revealed that traditional output formats and evaluation metrics do not fully capture the models’ referential understanding. To address this, we introduce IdentifyMe, a new benchmark for mention resolution presented in a multiple-choice question (MCQ) format, commonly used for evaluating LLMs. | Kawshik Manikantan; Makarand Tapaswi; Vineet Gandhi; Shubham Toshniwal |
| 109 | NormAd: A Framework for Measuring The Cultural Adaptability of Large Language Models. Highlight: We introduce NormAd, an evaluation framework to assess LLMs’ cultural adaptability, specifically measuring their ability to judge social acceptability across varying levels of cultural norm specificity, from abstract values to explicit social norms. | Abhinav Sukumar Rao; Akhila Yerukola; Vishwa Shah; Katharina Reinecke; Maarten Sap |
| 110 | AI-LieDar: Examine The Trade-off Between Utility and Truthfulness in LLM Agents. Highlight: Truthfulness (adherence to factual accuracy) and utility (satisfying human needs and instructions) are both fundamental aspects of Large Language Models, yet these goals often conflict (e.g., sell a car with known flaws), making it challenging to achieve both in real-world deployments. We propose AI-LieDar, a framework to study how LLM-based agents navigate these scenarios in a multi-turn interactive setting. | Zhe Su; Xuhui Zhou; Sanketh Rangreji; Anubha Kabra; Julia Mendelsohn; Faeze Brahman; Maarten Sap |
111 | Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we are the first to specialize LLMs for the task of simulating survey response distributions. |
Yong Cao; Haijiang Liu; Arnav Arora; Isabelle Augenstein; Paul Röttger; Daniel Hershcovich; |
112 | SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the mechanistic sources of uncertainty in large language models (LLMs), an area with important implications for language model reliability and trustworthiness. |
Carter Teplica; Yixin Liu; Arman Cohan; Tim G. J. Rudner; |
113 | Hello Again! LLM-powered Personalized Agent for Long-term Dialogue Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent progress in the human-like cognitive and reasoning capabilities of LLMs suggests that LLM-based agents could significantly enhance automated perception, decision-making, and problem-solving. In response to this potential, we introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent), which incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation. |
Hao Li; Chenghao Yang; An Zhang; Yang Deng; Xiang Wang; Tat-Seng Chua; |
114 | Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that privacy is not solely about PII patterns. |
Haoran Li; Wei Fan; Yulin Chen; Cheng Jiayang; Tianshu Chu; Xuebing Zhou; Peizhao Hu; Yangqiu Song; |
115 | VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities Via Single-Stage Joint Speech-Text Supervised Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Another critical challenge with SpeechLMs is catastrophic forgetting, where models optimized for speech tasks suffer significant degradation in text-only performance. To mitigate these issues, we propose a novel single-stage joint speech-text SFT approach on the low-rank adaptation (LoRA) of the LLM backbone. |
Yifan Peng; Krishna C Puvvada; Zhehuai Chen; Piotr Zelasko; He Huang; Kunal Dhawan; Ke Hu; Shinji Watanabe; Jagadeesh Balam; Boris Ginsburg; |
116 | CartesianMoE: Boosting Knowledge Sharing Among Experts Via Cartesian Product Routing in Mixture-of-Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, inspired by collective matrix factorization to learn shared knowledge among data, we propose CartesianMoE, which implements more effective knowledge sharing among experts in a multiplication-like manner.
Zhenpeng Su; Xing W; Zijia Lin; Yizhe Xiong; Minxuan Lv; Guangyuan Ma; Hui Chen; Songlin Hu; Guiguang Ding; |
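A rough sketch of what product-style routing could look like follows, with invented shapes and top-1 routing as simplifying assumptions; it is our reading of the idea, not the paper’s architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 16, 3, 4     # hidden size; an m x n grid addresses m * n experts

# Experts in the same row (or column) of the grid share a factor, which is
# where the knowledge sharing comes from in this toy construction.
row_proj = rng.normal(size=(m, d, d)) / np.sqrt(d)
col_proj = rng.normal(size=(n, d, d)) / np.sqrt(d)
router_row = rng.normal(size=(d, m))
router_col = rng.normal(size=(d, n))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cartesian_route(x):
    """Two axis routers pick a row and a column independently; the token's
    expert is the multiplicative composition of the two chosen factors."""
    pi, pj = softmax(x @ router_row), softmax(x @ router_col)
    i, j = int(pi.argmax()), int(pj.argmax())    # top-1 routing on each axis
    gate = pi[i] * pj[j]                         # product of the two gate values
    return gate * (col_proj[j] @ (row_proj[i] @ x))

print(cartesian_route(rng.normal(size=d)).shape)   # (16,)
```

The appeal of grid addressing is that m + n routing decisions and m + n factor sets stand in for m * n independent experts.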
117 | MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate how iterative collaboration among multiple instances and types of large language models (LLMs) enhances subtasks in the refinement process, such as error detection, critiquing unfaithful sentences, and making corrections based on critiques. |
David Wan; Justin Chen; Elias Stengel-Eskin; Mohit Bansal; |
118 | DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce DreamSync, a simple yet effective training algorithm that improves T2I models to be faithful to the text input. |
Jiao Sun; Deqing Fu; Yushi Hu; Su Wang; Royi Rassin; Da-Cheng Juan; Dana Alon; Charles Herrmann; Sjoerd Van Steenkiste; Ranjay Krishna; Cyrus Rashtchian; |
119 | MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, in our paper, we propose a new evaluation paradigm for MLLMs: evaluating MLLMs with per-sample criteria using a potent MLLM as the judge.
Wentao Ge; Shunian Chen; Hardy Chen; Nuo Chen; Junying Chen; Zhihong Chen; Wenya Xie; Shuo Yan; Chenghao Zhu; Ziyue Lin; Dingjie Song; Xidong Wang; Anningzhe Gao; Zhang Zhiyi; Jianquan Li; Xiang Wan; Benyou Wang;
120 | Babysit A Language Model From Scratch: Interactive Language Learning By Trials and Demonstrations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a trial-and-demonstration (TnD) learning framework that incorporates three distinct components: student trials, teacher demonstrations, and a reward conditioned on language competence at various developmental stages. |
Ziqiao Ma; Zekun Wang; Joyce Chai; |
121 | DIRAS: Efficient LLM Annotation of Document Relevance for Retrieval Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these concerns, RAG developers need to annotate information retrieval (IR) data for their domain of interest, which is challenging because (1) domain-specific queries usually need nuanced definitions of relevance beyond shallow semantic relevance; and (2) human or GPT-4 annotation is costly and cannot cover all (query, document) pairs (i.e., annotation selection bias), thus harming the effectiveness in evaluating IR recall. To address these challenges, we propose DIRAS (**D**omain-specific **I**nformation **R**etrieval **A**nnotation with **S**calability), a manual-annotation-free schema that fine-tunes open-source LLMs to consider nuanced relevance definitions and annotate (partial) relevance labels with calibrated relevance scores.
Jingwei Ni; Tobias Schimanski; Meihong Lin; Mrinmaya Sachan; Elliott Ash; Markus Leippold; |
122 | AgentSense: Benchmarking Social Intelligence of Language Agents Through Interactive Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios. |
Xinyi Mou; Jingcong Liang; Jiayu Lin; Xinnong Zhang; Xiawei Liu; Shiyue Yang; Rong Ye; Lei Chen; Haoyu Kuang; Xuanjing Huang; Zhongyu Wei; |
123 | Investigating Hallucinations in Simultaneous Machine Translation: Knowledge Distillation Solution and Components Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, traditional offline machine translation (OMT) models exhibit significantly fewer hallucinations. Motivated by this disparity, we propose Knowledge Distillation for SiMT (KD-SiMT), a simple yet effective method that utilizes the OMT model to mitigate hallucinations in SiMT. |
Donglei Yu; Xiaomian Kang; Yuchen Liu; Feifei Zhai; Nanchang Cheng; Yu Zhou; Chengqing Zong; |
124 | Fingerspelling Within Sign Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that 1) substantially improves understanding of fingerspelling (and translation quality overall), but the effect of 2) is mixed. |
Garrett Tanzer; |
125 | FLEURS-ASL: Including American Sign Language in Massively Multilingual Multitask Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to help converge the fields, we introduce FLEURS-ASL, an extension of the multiway parallel benchmarks FLORES (for text) and FLEURS (for speech) to support their first sign language (as video), American Sign Language, translated by 5 Certified Deaf Interpreters. |
Garrett Tanzer; |
126 | Multimodal Needle in A Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, a comprehensive evaluation of their long-context capabilities remains underexplored. To address these gaps, we introduce the MultiModal Needle-in-a-haystack (MMNeedle) benchmark, specifically designed to assess the long-context capabilities of MLLMs. |
Hengyi Wang; Haizhou Shi; Shiwei Tan; Weiyi Qin; Wenyuan Wang; Tunyu Zhang; Akshay Nambi; Tanuja Ganu; Hao Wang; |
127 | MATO: A Model-Agnostic Training Optimization for Aspect Sentiment Triplet Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a **M**odel-**A**gnostic **T**raining **O**ptimization (**MATO**) to make ASTE model inference more consistent with expected results in the face of triplet element diversity.
Shaopeng Tang; Lin Li; Xiaohui Tao; Leqi Zhong; Qing Xie; |
128 | Stronger Models Are Not Always Stronger Teachers for Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches typically assume that larger or stronger models are stronger teachers for instruction tuning, and hence simply adopt larger models as response generators to the synthetic instructions. In this paper, we challenge this commonly-adopted assumption. |
Zhangchen Xu; Fengqing Jiang; Luyao Niu; Bill Yuchen Lin; Radha Poovendran; |
129 | PeerQA: A Scientific Question Answering Dataset from Peer Reviews Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present PeerQA, a real-world, scientific, document-level Question Answering (QA) dataset. |
Tim Baumgärtner; Ted Briscoe; Iryna Gurevych; |
130 | MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by interpretability literature exploring the internals of models, we propose a novel class of model-internal confidence estimators (MICE) to better assess confidence when calling tools. |
Nishant Subramani; Jason Eisner; Justin Svegliato; Benjamin Van Durme; Yu Su; Sam Thomson; |
131 | Are Multimodal LLMs Robust Against Adversarial Perturbations? RoMMath: A Systematic Evaluation on Multimodal Math Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce RoMMath, the first benchmark designed to evaluate the capabilities and robustness of multimodal large language models (MLLMs) in handling multimodal math reasoning, particularly when faced with adversarial perturbations. |
Yilun Zhao; Guo Gan; Chen Zhao; Arman Cohan; |
132 | Does Mapo Tofu Contain Coffee? Probing LLMs for Food-related Cultural Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent studies have highlighted the presence of cultural biases in Large Language Models (LLMs), yet often lack a robust methodology to dissect these phenomena comprehensively. Our work aims to bridge this gap by delving into the Food domain—a universally relevant yet culturally diverse aspect of human life. |
Li Zhou; Taelin Karidi; Wanlong Liu; Nicolas Garneau; Yong Cao; Wenyu Chen; Haizhou Li; Daniel Hershcovich; |
133 | LiPO: Listwise Preference Optimization Through Learning-to-Rank Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we formulate the LM alignment as a listwise ranking problem and describe the LiPO framework, where the policy can potentially learn more effectively from a ranked list of plausible responses given the prompt. |
Tianqi Liu; Zhen Qin; Junru Wu; Jiaming Shen; Misha Khalman; Rishabh Joshi; Yao Zhao; Mohammad Saleh; Simon Baumgartner; Jialu Liu; Peter J Liu; Xuanhui Wang; |
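To make the listwise framing concrete, here is a minimal sketch using one standard learning-to-rank objective, a ListNet-style softmax cross-entropy over DPO-style implicit rewards; LiPO studies a family of such objectives, so treat this as illustrative only.

```python
import numpy as np

def listwise_preference_loss(policy_logps, ref_logps, labels, beta=0.1):
    """Softmax cross-entropy between DPO-style implicit rewards
    r_k = beta * (log pi(y_k|x) - log pi_ref(y_k|x)) and normalized
    graded labels over the K responses in the list."""
    rewards = beta * (np.asarray(policy_logps) - np.asarray(ref_logps))
    p = np.exp(rewards - rewards.max())
    p /= p.sum()
    target = np.asarray(labels, dtype=float)
    target /= target.sum()
    return float(-(target * np.log(p + 1e-12)).sum())

# Toy list of three responses with graded labels (3 = best).
print(listwise_preference_loss(policy_logps=[-4.0, -5.5, -7.0],
                               ref_logps=[-5.0, -5.0, -6.5],
                               labels=[3, 2, 1]))
```

With graded labels, the whole ranked list contributes in a single loss term instead of being flattened into independent pairwise comparisons.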
134 | From Evidence to Belief: A Bayesian Epistemology Approach to Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates the knowledge of language models from the perspective of Bayesian epistemology. |
Minsu Kim; Sangryul Kim; James Thorne; |
135 | Smurfs: Multi-Agent System Using Context-Efficient DFSDT for Tool Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce “Smurfs,” a novel multi-agent system (MAS) that enhances DFSDT with a modular, context-efficient, and training-free design. |
Junzhi Chen; Juhao Liang; Benyou Wang; |
136 | PICLe: Pseudo-annotations for In-Context Learning in Low-Resource Named Entity Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we conduct a perturbation study of in-context demonstrations for low-resource Named Entity Detection (NED). |
Sepideh Mamooler; Syrielle Montariol; Alexander Mathis; Antoine Bosselut; |
137 | Self-DC: When to Reason and When to Act? Self Divide-and-Conquer for Compositional Unknown Questions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce a Self Divide-and-Conquer (Self-DC) framework, accompanied by the first Compositional unknown Question-Answering dataset (CuQA).
Hongru Wang; Boyang Xue; Baohang Zhou; Tianhua Zhang; Cunxiang Wang; Huimin Wang; Guanhua Chen; Kam-Fai Wong; |
138 | Grammar Control in Dialogue Response Generation for Language Learning Chatbots Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We comprehensively evaluate prompting, fine-tuning, and decoding strategies for grammar-controlled dialogue response generation. |
Dominik Glandorf; Peng Cui; Detmar Meurers; Mrinmaya Sachan; |
139 | Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we conduct the first in-depth analysis of the role padding tokens play in T2I models. |
Michael Toker; Ido Galil; Hadas Orgad; Rinon Gal; Yoad Tewel; Gal Chechik; Yonatan Belinkov; |
140 | CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present CFinBench, a meticulously crafted and, to date, the most comprehensive evaluation benchmark for assessing the financial knowledge of LLMs in a Chinese context.
Ying Nie; Binwei Yan; Tianyu Guo; Hao Liu; Haoyu Wang; Wei He; Binfan Zheng; Weihao Wang; Qiang Li; Weijian Sun; Yunhe Wang; Dacheng Tao; |
141 | THREAD: Thinking Deeper with Recursive Spawning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Large language models (LLMs) have shown impressive capabilities across diverse settings, but still struggle as the length and complexity of the context increase. To address this challenge, we propose Thinking Recursively and Dynamically (ThReaD).
Philip Schroeder; Nathaniel W. Morgan; Hongyin Luo; James R. Glass; |
142 | Model Surgery: Modulating LLM’s Behavior Via Simple Parameter Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we observe that surprisingly, directly editing a small subset of parameters can effectively modulate specific behaviors of LLMs, such as detoxification and resistance to jailbreaking, with only inference-level computational resources. |
Huanqian Wang; Yang Yue; Rui Lu; Jingxin Shi; Andrew Zhao; Shenzhi Wang; Shiji Song; Gao Huang; |
143 | Mitigating Tail Narrowing in LLM Self-Improvement Via Socratic-Guided Sampling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Guided Self-Improvement (GSI), a strategy aimed at improving the efficiency of sampling challenging heavy-tailed data. |
Yiwen Ding; Zhiheng Xi; Wei He; Lizhuoyuan; Yitao Zhai; Shi Xiaowei; Xunliang Cai; Tao Gui; Qi Zhang; Xuanjing Huang;
144 | Learning to Summarize from LLM-generated Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce FeedSum, a large-scale dataset containing multi-dimensional LLM feedback on summaries of varying quality across diverse domains. |
Hwanjun Song; Taewon Yun; Yuho Lee; Jihwan Oh; Gihun Lee; Jason Cai; Hang Su; |
145 | ReIFE: Re-evaluating Instruction-Following Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, there is a lack of comprehensive evaluation of these LLM-based evaluators across two dimensions: the base LLMs and the evaluation protocols. Therefore, we present a thorough meta-evaluation of instruction following, including 25 base LLMs and 15 recently proposed evaluation protocols, on 4 human-annotated datasets, assessing the evaluation accuracy of the LLM-evaluators. |
Yixin Liu; Kejian Shi; Alexander Fabbri; Yilun Zhao; PeiFeng Wang; Chien-Sheng Wu; Shafiq Joty; Arman Cohan; |
146 | Are We Done with MMLU? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For example, we find that 57% of the analysed questions in the Virology subset contain errors. To address this issue, we introduce a comprehensive framework for identifying dataset errors using a novel error annotation protocol. |
Aryo Pradipta Gema; Joshua Ong Jun Leang; Giwon Hong; Alessio Devoto; Alberto Carlo Maria Mancino; Rohit Saxena; Xuanli He; Yu Zhao; Xiaotang Du; Mohammad Reza Ghasemi Madani; Claire Barale; Robert McHardy; Joshua Harris; Jean Kaddour; Emile Van Krieken; Pasquale Minervini; |
147 | XLAM: A Family of Large Action Models to Empower AI Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce xLAM, a series of large action models designed for AI agent tasks. |
Jianguo Zhang; Tian Lan; Ming Zhu; Zuxin Liu; Thai Quoc Hoang; Shirley Kokane; Weiran Yao; Juntao Tan; Akshara Prabhakar; Haolin Chen; Zhiwei Liu; Yihao Feng; Tulika Manoj Awalgaonkar; Rithesh R N; Zeyuan Chen; Ran Xu; Juan Carlos Niebles; Shelby Heinecke; Huan Wang; Silvio Savarese; Caiming Xiong; |
148 | WaterPool: A Language Model Watermark Mitigating Trade-Offs Among Imperceptibility, Efficacy and Robustness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce WaterPool, a simple yet effective key module that preserves a complete key sampling space for imperceptibility while utilizing semantics-based search to improve the key restoration process. |
Baizhou Huang; Xiaojun Wan; |
149 | B4: A Black-Box Scrubbing Attack on LLM Watermarks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Targeting a more realistic black-box threat model with fewer assumptions, we propose B4, a black-box scrubbing attack on watermarks.
Baizhou Huang; Xiao Pu; Xiaojun Wan; |
150 | A Bayesian Optimization Approach to Machine Translation Reranking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This algorithm scores candidates iteratively, choosing the next candidate by balancing exploration (scoring candidates that differ from those already scored) and exploitation (scoring candidates that resemble high-scoring ones). This procedure finds high-scoring candidates while scoring only a fraction of the candidate list; given candidate lists of 200 random samples (before deduplication), our method achieves the same CometKiwi score using only 70 scoring evaluations on average, compared to scoring a random subset of 180 candidates.
Julius Cheng; Maike Züfle; Vilém Zouhar; Andreas Vlachos; |
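The exploration/exploitation loop can be sketched as follows, with a simple similarity-based acquisition rule standing in for the paper’s Bayesian-optimization machinery; `embed` and `score` are hypothetical placeholders for a sentence encoder and the utility metric.

```python
import numpy as np

def rerank_with_budget(cands, embed, score, budget=70, explore=0.5):
    """Score only `budget` candidates. Each step picks the unscored candidate
    maximizing exploit (similarity-weighted best score seen so far) plus
    explore (dissimilarity to everything already scored)."""
    vecs = np.stack([embed(c) for c in cands])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    scored = {0: score(cands[0])}                  # seed with one candidate
    for _ in range(budget - 1):
        idx = list(scored)
        sims = vecs @ vecs[idx].T                  # similarity to scored ones
        vals = np.array([scored[i] for i in idx])
        exploit = (sims * vals).max(axis=1)        # resemble high scorers
        novelty = 1.0 - sims.max(axis=1)           # differ from scored ones
        acq = exploit + explore * novelty
        acq[idx] = -np.inf                         # never re-score a candidate
        nxt = int(acq.argmax())
        scored[nxt] = score(cands[nxt])
    best = max(scored, key=scored.get)
    return cands[best], scored[best]

# Toy usage with random embeddings and scores (illustration only); the budget
# of 70 out of 200 mirrors the numbers quoted in the highlight.
rng = np.random.default_rng(1)
emb = {f"cand{i}": rng.normal(size=8) for i in range(200)}
sco = {c: float(rng.normal()) for c in emb}
print(rerank_with_budget(list(emb), embed=emb.get, score=sco.get, budget=70))
```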
151 | Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel evaluation framework based on sub-question coverage, which measures how well a RAG system addresses different facets of a question. |
Kaige Xie; Philippe Laban; Prafulla Kumar Choubey; Caiming Xiong; Chien-Sheng Wu; |
152 | Can Large Language Models Invent Algorithms to Improve Themselves? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the methods for improving LLMs are still designed by humans, which restricts the invention of new model-improving algorithms to human expertise and imagination. To address this, we propose the Self-Developing framework, which enables LLMs to autonomously generate and learn model-improvement algorithms. |
Yoichi Ishibashi; Taro Yano; Masafumi Oyamada; |
153 | Is Your LLM Outdated? A Deep Look at Temporal Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces the concept of temporal generalization in LLMs, including bias in past and future generalizations. |
Chenghao Zhu; Nuo Chen; Yufei Gao; Yunyi Zhang; Prayag Tiwari; Benyou Wang;
154 | Revealing The Barriers of Language Agents in Planning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we apply the feature attribution study and identify two key factors that hinder agent planning: the limited role of constraints and the diminishing influence of questions. |
Jian Xie; Kexun Zhang; Jiangjie Chen; Siyu Yuan; Kai Zhang; Yikai Zhang; Lei Li; Yanghua Xiao; |
155 | On Behalf of The Stakeholders: Trends in NLP Model Interpretability in The Era of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address three fundamental questions: Why do we need interpretability, what are we interpreting, and how? |
Nitay Calderon; Roi Reichart; |
156 | Differentially Private Learning Needs Better Model Initialization and Self-Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce DPRefine, a three-phase method that initializes a model using data synthesis from a small pre-trained LM with rigorous filtering, applies DP finetuning on private data, and performs self-distillation to refine outputs. |
Ivoline C. Ngong; Joseph Near; Niloofar Mireshghallah; |
157 | Arabic Dataset for LLM Safeguard Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To uncover the impact of different stances in handling sensitive and controversial topics, we propose a dual-perspective evaluation framework. |
Yasser Ashraf; Yuxia Wang; Bin Gu; Preslav Nakov; Timothy Baldwin; |
158 | Causally Modeling The Linguistic and Social Factors That Predict Email Response Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we aim to model the intents, expectations, and responsiveness in email exchanges. |
Yinuo Xu; Hong Chen; Sushrita Rakshit; Aparna Ananthasubramaniam; Omkar Yadav; Mingqian Zheng; Michael Jiang; Lechen Zhang; Bowen Yi; Kenan Alkiek; Abraham Israeli; Bangzhao Shu; Hua Shen; Jiaxin Pei; Haotian Zhang; Miriam Schirmer; David Jurgens; |
159 | Self-Pluralising Culture Alignment for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose CultureSPA, a Self-Pluralising Culture Alignment framework that allows LLMs to simultaneously align to pluralistic cultures. |
Shaoyang Xu; Yongqi Leng; Linhao Yu; Deyi Xiong; |
160 | Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While previous studies have explored using large multimodal models (LMMs) as reward models for guiding preference modeling, their ability to accurately assess the quality of generated responses and their alignment with video content has not been conclusively demonstrated. This paper introduces a novel framework that utilizes detailed video captions as a proxy of video content, enabling language models to incorporate this information as supporting evidence for scoring video Question Answering (QA) predictions. |
Ruohong Zhang; Liangke Gui; Zhiqing Sun; Yihao Feng; Keyang Xu; Yuanhan Zhang; Di Fu; Chunyuan Li; Alexander G Hauptmann; Yonatan Bisk; Yiming Yang; |
161 | RuleR: Improving LLM Controllability By Rule-based Data Recycling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, curating supervised fine-tuning (SFT) datasets to improve LLM controllability usually relies on human experts or proprietary LLMs, which requires additional costs. To bridge this gap, we propose Rule-based Data Recycling (RuleR), a data augmentation method incorporating multiple constraints into the original data samples according to predefined rules, which creates new training tasks to consolidate the controllability of LLMs. |
Ming Li; Han Chen; Chenguang Wang; Dang Nguyen; Dianqi Li; Tianyi Zhou; |
162 | Divergent Thoughts Toward One Goal: LLM-based Multi-Agent Collaboration System for Electronic Design Automation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Any errors will lead to the instability and failure of EDA flow automation. To address these challenges, we introduce EDAid, a multi-agent collaboration system where multiple agents harboring divergent thoughts converge towards a common goal, ensuring reliable and successful EDA flow automation. |
Haoyuan Wu; Haisheng Zheng; Zhuolun He; Bei Yu; |
163 | Sparser Mixture-of-Adapters with Cross-Layer Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View |
Ziyue Li; Tianyi Zhou; |
164 | Diverse In-Context Example Selection After Decomposing Programs and Aligned Utterances Improves Semantic Parsing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We focus on decomposing the pool of available ICE trees into fragments, some of which may be better suited to solving the test instance. |
Mayank Kothyari; Sunita Sarawagi; Soumen Chakrabarti; Gaurav Arora; Srujana Merugu; |
165 | From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conduct a systematic and principled evaluation of multimodal ICL for models of different scales on a broad spectrum of new yet critical tasks. |
Nan Xu; Fei Wang; Sheng Zhang; Hoifung Poon; Muhao Chen; |
166 | Cross-Lingual Transfer Learning for Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: There has been increasing interest in building multilingual foundation models for NLP and speech research. This paper examines how to expand the speech translation capability of these models with restricted data. |
Rao Ma; Mengjie Qian; Yassir Fathullah; Siyuan Tang; Mark Gales; Kate Knill; |
167 | Can Unconfident LLM Annotations Be Used for Confident Conclusions? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Confidence-driven inference: a method that combines LLM annotations and LLM confidence indicators to strategically select which human annotations should be collected, with the goal of producing accurate statistical estimates and provably valid confidence intervals while reducing the number of human annotations needed. |
Kristina Gligoric; Tijana Zrnic; Cinoo Lee; Emmanuel Candes; Dan Jurafsky; |
168 | VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View |
Zejun Li; Ruipu Luo; Jiwen Zhang; Minghui Qiu; Xuanjing Huang; Zhongyu Wei; |
169 | MADial-Bench: Towards Real-world Evaluation of Memory-Augmented Dialogue Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce new scoring criteria to the evaluation, including memory injection, emotion support (ES) proficiency, and intimacy, to comprehensively assess generated responses. |
Junqing He; Liang Zhu; Rui Wang; Xi Wang; Gholamreza Haffari; Jiaxing Zhang; |
170 | On The Impact of Fine-Tuning on Chain-of-Thought Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our research investigates the effect of fine-tuning on the reasoning abilities of LLMs, addressing critical questions regarding the impact of task-specific fine-tuning on overall reasoning capabilities, the influence of fine-tuning on Chain-of-Thought (CoT) reasoning performance, and the implications for the faithfulness of CoT reasonings. |
Elita Lobo; Chirag Agarwal; Himabindu Lakkaraju; |
171 | SHADES: Towards A Multilingual Assessment of Stereotypes in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While research has attempted to identify and mitigate such biases, most efforts have concentrated on English, lagging behind the rapid advancement of LLMs in multilingual settings. In this paper, we introduce a new multilingual parallel dataset, SHADES, to help address this issue, designed for examining culturally specific stereotypes that may be learned by LLMs.
Margaret Mitchell; Giuseppe Attanasio; Ioana Baldini; Miruna Clinciu; Jordan Clive; Pieter Delobelle; Manan Dey; Sil Hamilton; Timm Dill; Jad Doughman; Ritam Dutt; Avijit Ghosh; Jessica Zosa Forde; Carolin Holtermann; Lucie-Aimée Kaffee; Tanmay Laud; Anne Lauscher; Roberto L Lopez-Davila; Maraim Masoud; Nikita Nangia; Anaelia Ovalle; Giada Pistilli; Dragomir Radev; Beatrice Savoldi; Vipul Raheja; Jeremy Qin; Esther Ploeger; Arjun Subramonian; Kaustubh Dhole; Kaiser Sun; Amirbek Djanibekov; Jonibek Mansurov; Kayo Yin; Emilio Villa Cueva; Sagnik Mukherjee; Jerry Huang; Xudong Shen; Jay Gala; Hamdan Al-Ali; Tair Djanibekov; Nurdaulet Mukhituly; Shangrui Nie; Shanya Sharma; Karolina Stanczak; Eliza Szczechla; Tiago Timponi Torrent; Deepak Tunuguntla; Marcelo Viridiano; Oskar Van Der Wal; Adina Yakefu; Aurélie Névéol; Mike Zhang; Sydney Zink; Zeerak Talat; |
172 | ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, current work overlooks the coherence between turns of dialogues, leading to a gap between the synthesized data and real-world scenarios. To address these issues, we propose a Graph-based Sampling strategy to sample more relevant tool combinations, and a Planned-generation strategy to create plans that guide the synthesis of coherent dialogues. |
Zezhong Wang; Xingshan Zeng; Weiwen Liu; Liangyou Li; Yasheng Wang; Lifeng Shang; Xin Jiang; Qun Liu; Kam-Fai Wong; |
173 | IrokoBench: A New Benchmark for African Languages in The Age of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce IrokoBench, a human-translated benchmark dataset for 17 typologically-diverse low-resource African languages covering three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM), and multiple-choice knowledge-based QA (AfriMMLU).
David Ifeoluwa Adelani; Jessica Ojo; Israel Abebe Azime; Jian Yun Zhuang; Jesujoba Oluwadara Alabi; Xuanli He; Millicent Ochieng; Sara Hooker; Andiswa Bukula; En-Shiun Annie Lee; Chiamaka Ijeoma Chukwuneke; Happy Buzaaba; Blessing Kudzaishe Sibanda; Godson Koffi Kalipe; Jonathan Mukiibi; Salomon Kabongo Kabenamualu; Foutse Yuehgoh; Mmasibidi Setaka; Lolwethu Ndolela; Nkiruka Odu; Rooweither Mabuya; Salomey Osei; Shamsuddeen Hassan Muhammad; Sokhar Samb; Tadesse Kebede Guge; Tombekai Vangoni Sherman; Pontus Stenetorp; |
174 | How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent research has focused on literary machine translation (MT) as a new challenge in MT. However, the evaluation of literary MT remains an open problem. We contribute to this ongoing discussion by introducing LITEVAL-CORPUS, a paragraph-level parallel corpus containing verified human translations and outputs from 9 MT systems, which totals over 2k translations and 13k evaluated sentences across four language pairs, costing 4. |
Ran Zhang; Wei Zhao; Steffen Eger; |
175 | Pretrained Image-Text Models Are Secretly Video Captioners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our adapted model demonstrates top-tier performance on major benchmarks, ranking 2nd on MSR-VTT and MSVD, and 3rd on VATEX. |
Chunhui Zhang; Yiren Jian; Zhongyu Ouyang; Soroush Vosoughi; |
176 | Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: How can “weak teacher models” (Bowman et al., 2022), such as average human annotators or existing AI systems, effectively supervise LLMs to improve performance on hard reasoning tasks, especially those that challenge the teacher models and require expertise or daily practice? In this paper, we seek empirical answers to this question by investigating various data-driven strategies that offer supervision data of different quality levels on tasks of varying complexity.
Xuan He; Da Yin; Nanyun Peng; |
177 | CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, on challenging coding tasks with extremely large search space, current agentic approaches still struggle with multi-stage planning, generating, and debugging. To address this problem, we propose CodeTree, a framework for LLM agents to efficiently explore the search space in different stages of the code generation process. |
Jierui Li; Hung Le; Yingbo Zhou; Caiming Xiong; Silvio Savarese; Doyen Sahoo; |
178 | EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current LLMs exhibit satisfactory instruction-following capabilities as a result of the instruction-following fine-tuning process. Motivated by this, in this paper we introduce EASYTOOL, a framework transforming diverse and lengthy tool documentation into a unified and concise tool instruction to fully leverage the instruction-following capabilities of LLMs for easier tool usage.
Siyu Yuan; Kaitao Song; Jiangjie Chen; Xu Tan; Yongliang Shen; Kan Ren; Dongsheng Li; Deqing Yang; |
179 | EvoAgent: Towards Automatic Multi-Agent Generation Via Evolutionary Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce EVOAGENT, a generic method to automatically extend specialized agents to multi-agent systems via the evolutionary algorithm, thereby improving the effectiveness of LLM-based agents in solving tasks. |
Siyu Yuan; Kaitao Song; Jiangjie Chen; Xu Tan; Dongsheng Li; Deqing Yang; |
180 | ALPACA AGAINST VICUNA: Using LLMs to Uncover Memorization of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the overlooked impact of instruction-tuning on memorization in large language models (LLMs), which has largely been studied in base, pre-trained models. |
Aly M. Kassem; Omar Mahmoud; Niloofar Mireshghallah; Hyunwoo Kim; Yulia Tsvetkov; Yejin Choi; Sherif Saad; Santu Rana; |
181 | QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it is common for a single image to be associated with multiple questions, and LVLMs may still answer other questions correctly even for an adversarial image attacked by a specific question. To address this, we introduce the query-agnostic visual attack (QAVA), which aims to create robust adversarial examples that generate incorrect responses to unspecified and unknown questions. |
Yudong Zhang; Ruobing Xie; Jiansheng Chen; Xingwu Sun; Zhanhui Kang; Yu Wang; |
182 | LLM The Genius Paradox: A Linguistic and Math Expert’s Struggle with Simple Word-based Counting Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we carefully design multiple evaluation settings to investigate validity of prevalent conjectures. |
Nan Xu; Xuezhe Ma; |
183 | Anticipating Future with Large Language Model for Simultaneous Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by human interpreters’ technique to forecast future words before hearing them, we propose Translation by Anticipating Future (TAF), a method to improve translation quality while retaining low latency. |
Siqi Ouyang; Oleksii Hrinchuk; Zhehuai Chen; Vitaly Lavrukhin; Jagadeesh Balam; Lei Li; Boris Ginsburg; |
184 | NAT: Enhancing Agent Tuning with Negative Samples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For instance, existing SFT approaches typically utilize only positive examples, limiting their efficiency in low-resource scenarios. To address this, we introduce Negative-Aware Training (NAT), a straightforward yet effective method that leverages both successful and failed trajectories for fine-tuning, maximizing the utility of limited resources. |
Renxi Wang; Xudong Han; Yixuan Zhang; Timothy Baldwin; Haonan Li; |
185 | Towards Operationalizing Right to Data Protection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce **RegText**, a framework that injects imperceptible spurious correlations into natural language datasets, effectively rendering them unlearnable without affecting semantic content. |
Abhinav Java; Simra Shahid; Chirag Agarwal; |
186 | Analyzing Memorization in Large Language Models Through The Lens of Model Attribution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing research has mainly focused on post-hoc analyses, such as extracting memorized content or developing memorization metrics, without exploring the underlying architectural factors that contribute to memorization. In this work, we investigate memorization from an architectural lens by analyzing how attention modules at different layers impact a model’s memorization and generalization performance.
Tarun Ram Menta; Susmit Agrawal; Chirag Agarwal; |
187 | GroundCocoa: A Benchmark for Evaluating Compositional & Conditional Reasoning in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we choose to study compositional and conditional reasoning, two aspects that are central to human cognition, and introduce GroundCocoa – a lexically diverse benchmark connecting these reasoning skills to the real-world problem of flight booking. |
Harsh Kohli; Sachin Kumar; Huan Sun; |
188 | Large Language Models Are Cross-Lingual Knowledge-Free Reasoners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the relationship between capabilities in different languages is less explored. In this work, we decompose the process of reasoning tasks into two separate components, knowledge retrieval and knowledge-free reasoning, and analyze the relationship between cross-lingual transferability and these two components.
Peng Hu; Sizhe Liu; Changjiang Gao; Xin Huang; Xue Han; Junlan Feng; Chao Deng; Shujian Huang; |
189 | SafetyQuizzer: Timely and Dynamic Evaluation on The Safety of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This hinders the effective application of these benchmarks in continuous evaluation tasks. To address these limitations, we propose SafetyQuizzer, a question-generation framework designed to evaluate the safety of LLMs more sustainably in the Chinese context. |
Zhichao Shi; Shaoling Jing; Yi Cheng; Hao Zhang; Yuanzhuo Wang; Jie Zhang; Huawei Shen; Xueqi Cheng; |
190 | A Novel Computational Modeling Foundation for Automatic Coherence Assessment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such an approach may not capture the full range of factors contributing to coherence. To remedy this, in this work we employ the formal linguistic definition of Reinhart (1980) of what makes a discourse coherent, consisting of three conditions: cohesion, consistency, and relevance, and formalize these conditions as respective computational tasks, which are in turn jointly trained.
Aviya Maimon; |
191 | SELFGOAL: Your Language Agents Already Know How to Achieve High-level Goals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present SELFGOAL, a novel automatic approach designed to enhance agents’ capabilities to achieve high-level goals with limited human prior and environmental feedback. |
Ruihan Yang; Jiangjie Chen; Yikai Zhang; Siyu Yuan; Aili Chen; Kyle Richardson; Yanghua Xiao; Deqing Yang; |
192 | Language Models “Grok” to Copy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel perspective that Transformer-based language models develop copying abilities similarly to grokking, which refers to sudden generalization on the test set long after the model has fit the training set.
Ang Lv; Ruobing Xie; Xingwu Sun; Zhanhui Kang; Rui Yan; |
193 | Little Giants: Synthesizing High-Quality Embedding Data at Scale Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce SPEED, a framework that aligns open-source small models (8B) to efficiently generate large-scale synthetic embedding data. |
Haonan Chen; Liang Wang; Nan Yang; Yutao Zhu; Ziliang Zhao; Furu Wei; Zhicheng Dou; |
194 | FactTrack: Time-Aware World State Tracking in Story Outlines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel method, FactTrack, for tracking atomic facts and addressing factual contradictions. |
Zhiheng Lyu; Kevin Yang; Lingpeng Kong; Dan Klein; |
195 | Scaling LLM Inference Efficiently with Optimized Sample Compute Allocation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our experiments show that with our learned mixed allocation, we can achieve accuracy better than the best single configuration with 128x less compute on code generation and 25x less compute on 4 reasoning tasks. |
Kexun Zhang; Shang Zhou; Danqing Wang; William Yang Wang; Lei Li; |
196 | Decoding Hate: Exploring Language Models’ Reactions to Hate Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We examine the reactions of large language models (including GPT-3.5, GPT-4, and Gemini Pro) to hate speech. Through qualitative analysis, we aim to reveal the spectrum of responses these models produce, highlighting their capacity to handle hate speech inputs.
Paloma Piot; Javier Parapar; |
197 | Mitigating Biases of Large Language Models in Stance Detection with Counterfactual Augmented Calibration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, the results demonstrate a strong negative correlation between stance bias and stance detection performance, underscoring the importance of mitigating bias to enhance the utility of LLMs in stance detection. Therefore, in this paper, we propose a Counterfactual Augmented Calibration Network (FACTUAL), a novel calibration network devised to calibrate potential bias in the stance predictions of LLMs.
Ang Li; Jingqian Zhao; Bin Liang; Lin Gui; Hui Wang; Xi Zeng; Xingwei Liang; Kam-Fai Wong; Ruifeng Xu; |
198 | Investigating Human Values in Online Communities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To study the dynamics of communities online, we propose a method to computationally analyse values present on Reddit. |
Nadav Borenstein; Arnav Arora; Lucie-Aimée Kaffee; Isabelle Augenstein; |
199 | Generating Long-form Story Using Dynamic Hierarchical Outlining with Memory-Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing methods, including LLMs, rely on rigid outlines or lack macro-level planning, making it difficult to achieve both contextual consistency and coherent plot development in long-form story generation. To address these issues, we propose a Dynamic Hierarchical Outlining with Memory-Enhancement method, named DOME, to generate long-form stories with coherent content and plot.
Qianyue Wang; Jinwu Hu; Zhengping Li; Yufeng Wang; Daiyuan Li; Yu Hu; Mingkui Tan; |
200 | Evaluating Defeasible Reasoning in LLMs with DEFREASING Related Papers Related Patents Related Grants Related Venues Related Experts View |
Emily Allaway; Kathleen McKeown; |
201 | Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While many previous works have addressed this issue in LLMs via machine unlearning, it remains largely unexplored for MLLMs. To tackle this challenge, we introduce the Multimodal Large Language Model Unlearning Benchmark (MLLMU-Bench), a novel benchmark aimed at advancing the understanding of multimodal machine unlearning.
Zheyuan Liu; Guangyao Dou; Mengzhao Jia; Zhaoxuan Tan; Qingkai Zeng; Yongle Yuan; Meng Jiang; |
202 | CultureInstruct: Curating Multi-Cultural Instructions at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large language models, despite their remarkable success in recent years, still exhibit severe cultural bias. Therefore, in this paper, we introduce CultureInstruct, a large-scale instruction-tuning dataset designed to reduce cultural bias in LLMs. |
Viet Thanh Pham; Zhuang Li; Lizhen Qu; Gholamreza Haffari; |
203 | Automatic Input Rewriting Improves Translation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an empirical study of 21 input rewriting methods with 3 open-weight LLMs for translating from English into 6 target languages. |
Dayeon Ki; Marine Carpuat; |
204 | LRQ: Optimizing Post-Training Quantization for Large Language Models By Learning Low-Rank Weight-Scaling Matrices Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing post-training quantization (PTQ) techniques for quantizing weights and activations of LLMs still suffer from non-negligible accuracy drops, especially on massive multitask language understanding. To address this issue, we propose Low-Rank Quantization (LRQ) – a simple yet effective post-training weight quantization method for LLMs that reconstructs the outputs of an intermediate Transformer block by leveraging low-rank weight-scaling matrices, replacing the conventional full weight-scaling matrices that entail as many learnable scales as their associated weights. |
Jung Hyun Lee; Jeonghoon Kim; June Yong Yang; Se Jung Kwon; Eunho Yang; Kang Min Yoo; Dongsoo Lee; |
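The parameter saving behind a low-rank scale is easy to see in a sketch. Below, the per-weight scaling matrix is factored as S = 1 + A·B with small rank r, and the calibration objective reconstructs the block output; the shapes, the placement of the scale, and the round-to-nearest quantizer are all our assumptions rather than the paper’s specification.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4        # rank-4 scale: (d_out + d_in) * r learnable
W = rng.normal(size=(d_out, d_in))  # entries instead of d_out * d_in of them
X = rng.normal(size=(256, d_in))  # calibration activations for this block

def quantize(w, bits=4):
    """Uniform symmetric round-to-nearest quantization."""
    q = 2 ** (bits - 1) - 1
    step = np.abs(w).max() / q
    return np.round(w / step).clip(-q, q) * step

def block_loss(A, B):
    """Output-reconstruction error when weights are scaled elementwise by
    S = 1 + A @ B before quantization and compensated afterwards."""
    S = 1.0 + A @ B
    W_q = quantize(W * S) / np.where(S == 0.0, 1.0, S)
    return float(((X @ W.T - X @ W_q.T) ** 2).mean())

A = np.zeros((d_out, r))          # identity scale at initialization
B = rng.normal(size=(r, d_in)) * 0.01
print("reconstruction loss:", block_loss(A, B))
# In practice A and B would be learned (e.g. with SGD) to minimize this loss.
```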
205 | Inference-Time Selective Debiasing to Enhance Fairness in Text Classification Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose selective debiasing – an inference-time safety mechanism designed to enhance the overall model quality in terms of prediction performance and fairness, especially in scenarios where retraining the model is impractical. |
Gleb Kuzmin; Neemesh Yadav; Ivan Smirnov; Timothy Baldwin; Artem Shelmanov; |
206 | Large Language Models Can Solve Real-World Planning Rigorously with Formal Verification Tools Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For example, a U.S. domestic travel planning benchmark, TravelPlanner, was proposed in Xie et al. (2024), where the best LLM, OpenAI o1-preview, can only find viable travel plans with a 10% success rate given all needed information. In this work, we tackle this by proposing an LLM-based planning framework that formalizes and solves complex multi-constraint planning problems as constrained satisfiability problems, which are further consumed by sound and complete satisfiability solvers.
Yilun Hao; Yongchao Chen; Yang Zhang; Chuchu Fan; |
207 | CAVE: Controllable Authorship Verification Explanations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current offline AV models, however, have lower downstream utility due to limited accuracy (e.g., traditional stylometry AV systems) and a lack of accessible post-hoc explanations. In this work, we address the above challenges by developing a trained, offline model, CAVE (Controllable Authorship Verification Explanations).
Sahana Ramnath; Kartik Pandey; Elizabeth Boschee; Xiang Ren; |
208 | UFO: A UI-Focused Agent for Windows OS Interaction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce UFO, a UI-Focused agent designed to fulfill user requests tailored to Windows OS applications by observing and analyzing the GUI and control information of these applications.
Chaoyun Zhang; Liqun Li; Shilin He; Xu Zhang; Bo Qiao; Si Qin; Minghua Ma; Yu Kang; Qingwei Lin; Saravan Rajmohan; Dongmei Zhang; Qi Zhang; |
209 | Exploring The Potential of Large Language Models for Heterophilic Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the potential of LLMs for modeling heterophilic graphs and propose a novel two-stage framework: LLM-enhanced edge discriminator and LLM-guided edge reweighting. |
Yuxia Wu; Shujie Li; Yuan Fang; Chuan Shi; |
210 | EmojiPrompt: Generative Prompt Obfuscation for Privacy-Preserving Communication with Cloud-based LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, they also introduce privacy concerns: firstly, numerous studies underscore the risks to user privacy posed by jailbreaking cloud-based LLMs; secondly, the LLM service providers have access to all user data, which deters individuals from confidently utilizing such services. To address such concerns, we propose a simple yet effective paradigm, **EmojiPrompt**, to protect user privacy. |
Sam Lin; Wenyue Hua; Zhenting Wang; Mingyu Jin; Lizhou Fan; Yongfeng Zhang; |
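Stripped to its essence, prompt obfuscation keeps an inverse mapping on the device. The toy sketch below uses a fixed substitution table purely to illustrate the round trip; the paper’s method generates obfuscations rather than looking them up.

```python
def obfuscate(prompt, mapping):
    """Replace privacy-sensitive substrings with surrogate symbols before the
    prompt leaves the device; the inverse map stays local (toy illustration
    of the obfuscation idea, not the paper's generative method)."""
    for secret, symbol in mapping.items():
        prompt = prompt.replace(secret, symbol)
    return prompt

def deobfuscate(text, mapping):
    """Restore the original entities locally after the cloud LLM responds."""
    for secret, symbol in mapping.items():
        text = text.replace(symbol, secret)
    return text

mapping = {"Alice Smith": "🧑‍💼", "Acme Corp": "🏭"}
remote = obfuscate("Draft a memo from Alice Smith to Acme Corp.", mapping)
print(remote)                        # what the cloud provider sees
print(deobfuscate(remote, mapping))  # restored on-device
```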
211 | DenseSSM: State Space Models with Dense Hidden Connection for Efficient Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces DenseSSM, a novel approach to enhance the flow of hidden information between layers in SSMs. |
Wei He; Kai Han; Yehui Tang; Chengcheng Wang; Yujie Yang; Tianyu Guo; Yunhe Wang; |
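One way to picture dense hidden connections is the toy stack below, in which every layer also consumes a fused projection of all shallower hidden states; this is a simplified analogue built from plain nonlinear layers, not the paper’s SSM formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 4, 8
layer_w = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(L)]
fuse_w = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(L)]

def dense_stack(h):
    """Each layer receives, in addition to the previous output, a fused
    projection of every shallower hidden state, so low-level information
    reaches deep layers directly."""
    history = [h]
    for l in range(L):
        dense = sum(hh @ fuse_w[l] for hh in history) / len(history)
        h = np.tanh((h + dense) @ layer_w[l])
        history.append(h)
    return h

print(dense_stack(rng.normal(size=d)).shape)   # (8,)
```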
212 | The Impact of Domain-Specific Terminology on Machine Translation for Finance in European Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we present the first impact analysis of domain-specific terminology on multilingual MT for finance, focusing on European languages within the subdomain of macroeconomics. |
Arturo Oncevay; Charese Smiley; Xiaomo Liu; |
213 | Intrinsic Bias Is Predicted By Pretraining Data and Correlates with Downstream Performance in Vision-Language Encoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present the largest comprehensive analysis to date of how the upstream pre-training factors and downstream performance of CLIP models relate to their intrinsic biases.
Kshitish Ghate; Isaac Slaughter; Kyra Wilson; Mona T. Diab; Aylin Caliskan; |
214 | Lost in Inference: Rediscovering The Role of Natural Language Inference for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate whether NLI tasks, which are rarely used for LLM evaluation, can still be informative for evaluating LLMs.
Lovish Madaan; David Esiobu; Pontus Stenetorp; Barbara Plank; Dieuwke Hupkes; |
215 | BPO: Towards Balanced Preference Optimization Between Knowledge Breadth and Depth in Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: BPO is motivated by the observation that the usefulness of knowledge varies across samples, necessitating tailored learning of knowledge depth. To achieve this, we introduce gradient-based clustering, estimating the knowledge informativeness and usefulness of each augmented sample based on the model’s optimization direction. |
Sizhe Wang; Yongqi Tong; Hengyuan Zhang; Dawei Li; Xin Zhang; Tianlong Chen; |
216 | On The Analysis and Distillation of Emergent Outlier Properties in Pre-trained Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that emergent outlier dimensions contribute significantly more to zero-shot performance than non-outlier dimensions. Based on this, we propose the Emergent Outlier Focused Distillation (EOFD) method, which prioritizes critical outlier dimensions in distillation using a weighted MSE loss. |
Tianyang Zhao; Kunwar Yashraj Singh; Srikar Appalaraju; Peng Tang; Ying Nian Wu; Li Erran Li; |
217 | SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, adapting general-purpose RAG systems to specialized fields such as science and medicine poses unique challenges due to distribution shifts and limited access to domain-specific data. To tackle this, we propose SimRAG, a self-training approach that equips LLMs with joint capabilities of question answering and question generation for domain adaptation. |
Ran Xu; Hui Liu; Sreyashi Nag; Zhenwei Dai; Yaochen Xie; Xianfeng Tang; Chen Luo; Yang Li; Joyce C. Ho; Carl Yang; Qi He; |
218 | PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a prompt optimization approach that uses a smaller, fine-tuned language model to compress the input data for the evaluation prompt, thus reducing token usage and computational cost when using larger LLMs for downstream evaluation.
Daniil Larionov; Steffen Eger; |
219 | PromptRefine: Enhancing Few-Shot Performance on Low-Resource Indic Languages with Example Selection from Related Example Banks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose PromptRefine, a novel Alternating Minimization approach for example selection that improves ICL performance on low-resource Indic languages. |
Soumya Suvra Ghosal; Soumyabrata Pal; Koyel Mukherjee; Dinesh Manocha; |
220 | WorkTeam: Constructing Workflows from Natural Language with Multi-Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent advancements in Large Language Models (LLMs) have improved the generation of workflows from natural language instructions (aka NL2Workflow), yet existing single LLM agent-based methods face performance degradation on complex tasks due to the need for specialized knowledge and the strain of task-switching. To tackle these challenges, we propose WorkTeam, a multi-agent NL2Workflow framework comprising a supervisor, orchestrator, and filler agent, each with distinct roles that collaboratively enhance the conversion process. |
Hanchao Liu; Rongjun Li; Weimin Xiong; Ziyu Zhou; Wei Peng; |
221 | World Models with Hints of Large Language Models for Goal Achieving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by human cognition, we propose a new multi-modal model-based RL approach named Dreaming with Large Language Models (DLLM). |
Zeyuan Liu; Ziyu Huan; Xiyao Wang; Jiafei Lyu; Jian Tao; Xiu Li; Furong Huang; Huazhe Xu; |
222 | PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce PoisonedParrot: the first stealthy data poisoning attack that induces an LLM to generate copyrighted content even when the model has not been directly trained on the specific copyrighted material. |
Michael-Andrei Panaitescu-Liess; Pankayaraj Pathmanathan; Yigitcan Kaya; Zora Che; Bang An; Sicheng Zhu; Aakriti Agrawal; Furong Huang; |
223 | Benchmarking Language Model Creativity: A Case Study on Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a framework for quantifying LLM creativity that incorporates two design ingredients: (1) we introduce DENIAL PROMPTING, which pushes LLMs to develop more creative solutions to a given problem by incrementally imposing new constraints on the previous solution, compelling LLMs to adopt new strategies. |
Yining Lu; Dixuan Wang; Tianjian Li; Dongwei Jiang; Sanjeev Khudanpur; Meng Jiang; Daniel Khashabi; |
224 | Effective Skill Unlearning Through Intervention and Abstention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View |
Yongce Li; Chung-En Sun; Tsui-Wei Weng; |
225 | SMAB: MAB Based Word Sensitivity Estimation Framework and Its Applications in Adversarial Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Though effective, calculating sensitivity at scale using this framework is costly because of exponential time complexity. Therefore, we introduce a Sensitivity-based Multi-Armed Bandit framework (SMAB), which provides a scalable approach for calculating word-level local (sentence-level) and global (aggregated) sensitivities concerning an underlying text classifier for any dataset. |
Saurabh Kumar Pandey; Sachin Vashistha; Debrup Das; Somak Aditya; Monojit Choudhury; |
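To illustrate the bandit framing in the SMAB entry above, here is a minimal sketch that estimates per-word sensitivities with a standard UCB1 bandit: each word is an arm, and a pull asks whether perturbing that word flips the classifier's prediction. The `probe` oracle, word list, and flip rates are hypothetical stand-ins for the paper's classifier-based setup, not the SMAB implementation itself.

```python
import math
import random

def ucb_word_sensitivity(words, probe, rounds=1000, c=2.0):
    """Estimate per-word sensitivity with a UCB1 bandit.

    words: candidate words (the bandit's arms).
    probe: probe(word) -> 1 if perturbing this word flips the text
           classifier's prediction on a sampled sentence, else 0
           (a hypothetical oracle standing in for the real classifier).
    """
    pulls = {w: 0 for w in words}
    wins = {w: 0.0 for w in words}
    for t in range(1, rounds + 1):
        def ucb(w):
            if pulls[w] == 0:
                return float("inf")  # try every arm at least once
            return wins[w] / pulls[w] + math.sqrt(c * math.log(t) / pulls[w])
        w = max(words, key=ucb)  # pick the currently most promising word
        wins[w] += probe(w)
        pulls[w] += 1
    # Global sensitivity estimate: empirical flip rate per word.
    return {w: wins[w] / max(pulls[w], 1) for w in words}

# Toy usage: a fake probe with hidden per-word flip rates.
true_p = {"good": 0.7, "movie": 0.2, "the": 0.05}
estimates = ucb_word_sensitivity(list(true_p), lambda w: int(random.random() < true_p[w]))
print(estimates)
```

The bandit spends most of its probe budget on promising words, which is what makes sensitivity estimation scalable compared to exhaustively perturbing every word.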
226 | Dynamic Uncertainty Ranking: Enhancing Retrieval-Augmented In-Context Learning for Long-Tail Knowledge in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To take advantage of the uncertainty in ICL for guiding LLM predictions toward correct answers on long-tail samples, we propose a reinforcement learning-based dynamic uncertainty ranking method for retrieval-augmented ICL that accounts for the varying impact of each retrieved sample on LLM predictions. |
Shuyang Yu; Runxue Bao; Parminder Bhatia; Taha Kass-Hout; Jiayu Zhou; Cao Xiao; |
227 | IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce IFIR, the first comprehensive benchmark designed to evaluate instruction-following information retrieval (IR) in expert domains. |
Tingyu Song; Guo Gan; Mingsheng Shang; Yilun Zhao; |
228 | The State and Fate of Summarization Datasets: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, it is challenging to discover existing resources or identify coherent research directions. To address this, we survey a large body of work spanning 133 datasets in over 100 languages, creating a novel ontology covering sample properties, collection methods and distribution. |
Noam Dahan; Gabriel Stanovsky; |
229 | Learning to Substitute Words with Model-based Score Ranking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To circumvent this issue, we instead employ model-based scoring (BARTScore) to quantify sentence quality, thus forgoing the need for human annotations. Specifically, we use this score to define a distribution for each word substitution, allowing one to test whether a substitution is statistically superior to others. |
Hongye Liu; Ricardo Henao; |
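As a rough illustration of the scoring idea above, the sketch below ranks candidate substitutions by a model-based score; `score` is a hypothetical stand-in for BARTScore (the average log-likelihood of the substituted sentence under a seq2seq model), and the statistical test over the score distribution is omitted.

```python
def rank_substitutions(sentence, target, candidates, score):
    """Rank word substitutions by a model-based quality score.

    score(src, hyp) is a hypothetical stand-in for BARTScore: the
    average log-likelihood of hyp given src under a seq2seq model,
    so higher means the substituted sentence reads better.
    """
    scored = []
    for cand in candidates:
        hyp = sentence.replace(target, cand, 1)  # substitute one occurrence
        scored.append((cand, score(sentence, hyp)))
    # Best-scoring substitution first; no human annotation needed.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy usage with a dummy scorer that merely prefers shorter outputs.
ranked = rank_substitutions(
    "the film was fantastic", "fantastic",
    ["great", "excellent", "superb"],
    score=lambda src, hyp: -len(hyp),
)
print(ranked)
```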
230 | Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Large Multi-modal Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve LMM model performance on underrepresented data, we propose and evaluate several prompting strategies using non-English, geographic, and socioeconomic attributes. |
Joan Nwatu; Oana Ignat; Rada Mihalcea; |
231 | Step-by-Step Fact Verification System for Medical Claims with Explainable Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we apply an iterative FV system on three medical fact-checking datasets and evaluate it with multiple settings, including different LLMs, external web search, and structured reasoning using logic predicates. |
Juraj Vladika; Ivana Hacajova; Florian Matthes; |
232 | MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Remarkably, Large Language Models (LLMs) without any visual perception capabilities achieve non-trivial performance, undermining the credibility of these evaluations. To address this issue while maintaining the efficiency of MCQ evaluations, we propose MMEVALPRO, a benchmark designed to avoid Type-I errors through a trilogy evaluation pipeline and more rigorous metrics. |
Jinsheng Huang; Liang Chen; Taian Guo; Fu Zeng; Yusheng Zhao; Bohan Wu; Ye Yuan; Haozhe Zhao; Zhihui Guo; Yichi Zhang; Jingyang Yuan; Wei Ju; Luchen Liu; Tianyu Liu; Baobao Chang; Ming Zhang; |
233 | Prompt Compression for Large Language Models: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Leveraging large language models (LLMs) for complex natural language tasks typically requires long-form prompts to convey detailed requirements and information, which results in increased memory usage and inference costs. To mitigate these challenges, multiple efficient methods have been proposed, with prompt compression gaining significant research interest. |
Zongqian Li; Yinhong Liu; Yixuan Su; Nigel Collier; |
234 | Balancing Forget Quality and Model Utility: A Reverse KL-Divergence Knowledge Distillation Approach for Better Unlearning in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods based on gradient ascent and its variants often struggle with balancing forget quality and model utility, leading to either over-unlearning or partial unlearning. To address this challenge, we propose Reverse KL-Divergence based Knowledge Distillation for Unlearning (RKLU), a novel unlearning method for LLMs. |
Bichen Wang; Yuzhe Zi; Yixin Sun; Yanyan Zhao; Bing Qin; |
235 | PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce PAT (Parameter-free Audio-Text aligner), a simple and training-free method aimed at boosting zero-shot audio classification performance of CLAP-like ALMs. |
Ashish Seth; Ramaneswaran Selvakumar; Sonal Kumar; Sreyan Ghosh; Dinesh Manocha; |
236 | Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, Temperature Sampling exhibits lower variance in gradient estimation, leading to faster convergence but a higher risk of overfitting. Based on these insights, we propose Cooldown, a strategy that starts by heavily upsampling low-resource languages to accelerate convergence and gradually reduces the upsampling to prevent overfitting—achieving the best of both worlds. |
Tianjian Li; Haoran Xu; Weiting Tan; Kenton Murray; Daniel Khashabi; |
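For context on the Cooldown entry above: temperature sampling draws language i with probability proportional to q_i ** (1 / tau), where q_i is the language's share of the data, and larger tau upsamples low-resource languages. The sketch below anneals tau toward 1 in the spirit of Cooldown; the linear schedule and its constants are assumptions, not the paper's exact recipe.

```python
def temperature_sample_probs(sizes, tau):
    """Temperature sampling: p_i is proportional to q_i ** (1 / tau),
    where q_i is language i's share of the data. tau > 1 upsamples
    low-resource languages; tau = 1 recovers proportional sampling."""
    total = sum(sizes.values())
    weights = {k: (v / total) ** (1.0 / tau) for k, v in sizes.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

def cooldown_tau(step, warm_steps, tau_start=5.0, tau_end=1.0):
    """Assumed linear anneal: heavy upsampling early (fast convergence),
    near-proportional sampling later (less overfitting)."""
    frac = min(step / warm_steps, 1.0)
    return tau_start + frac * (tau_end - tau_start)

# Toy usage: watch the low-resource shares shrink as tau cools down.
sizes = {"en": 1_000_000, "sw": 10_000, "yo": 1_000}
for step in (0, 5_000, 10_000):
    tau = cooldown_tau(step, warm_steps=10_000)
    print(step, round(tau, 2), temperature_sample_probs(sizes, tau))
```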
237 | A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To systematically investigate different techniques of cross-layer KV sharing, we propose a unified framework that covers several recent methods and their novel variants. |
You Wu; Haoyi Wu; Kewei Tu; |
238 | S2-MAD: Breaking The Token Barrier to Enhance Multi-Agent Debate Efficiency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this strategy results in a significant increase in token costs, presenting a barrier to scalability. To address this challenge, we introduce a novel sparsification strategy designed to reduce token costs within MAD. |
Yuting Zeng; Weizhe Huang; Lei Jiang; Tongxuan Liu; XiTai Jin; Chen Tianying Tiana; Jing Li; Xiaohua Xu; |
239 | FedSpaLLM: Federated Pruning of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the challenge of pruning LLMs in privacy-preserving settings, we propose FedSpaLLM, the first federated learning framework designed specifically for pruning LLMs. |
Guangji Bai; Yijiang Li; Zilinghan Li; Liang Zhao; Kibaek Kim; |
240 | Evaluating and Improving Graph to Text Generation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To further improve LLMs in planning with graph sequences and grounding in truth, we introduce a new graph-to-text dataset, PlanGTG, annotated with two sub-tasks: reordering and attribution. |
Jie He; Yijun Yang; Wanqiu Long; Deyi Xiong; Victor Gutierrez Basulto; Jeff Z. Pan; |
241 | SLM-Mod: Small Language Models Surpass LLMs at Content Moderation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large language models (LLMs) have shown promise in many natural language understanding tasks, including content moderation. |
Xianyang Zhan; Agam Goyal; Yilun Chen; Eshwar Chandrasekharan; Koustuv Saha; |
242 | Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision–Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: How well do current vision-language models (VLMs) navigate these nuances? To investigate this, we create the first multimodal and multilingual parallel hate speech dataset, annotated by a multiculturally diverse set of annotators, called Multi3Hate. |
Minh Duc Bui; Katharina Von Der Wense; Anne Lauscher; |
243 | Evaluating Input Feature Explanations Through A Unified Diagnostic Evaluation Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these explanation types have only been studied in isolation, making it difficult to judge their respective applicability. To bridge this gap, we develop a unified framework, built around four diagnostic properties, that facilitates an automated and direct comparison between highlight and interactive explanations. |
Jingyi Sun; Pepa Atanasova; Isabelle Augenstein; |
244 | Knowledge Graph Guided Evaluation of Abstention Techniques Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on evaluating the underlying techniques that cause models to abstain. |
Kinshuk Vasisht; Navreet Kaur; Danish Pruthi; |
245 | Verifiable By Design: Aligning Language Models to Quote from Pre-Training Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Quote-Tuning, which demonstrates the feasibility of aligning models to quote. |
Jingyu Zhang; Marc Marone; Tianjian Li; Benjamin Van Durme; Daniel Khashabi; |
246 | Meta-Cultural Competence: Climbing The Right Hill of Cultural Awareness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, “culture” is a complex, multifaceted topic, and its awareness, representation, and modeling in LLMs and LLM-based applications can be defined and measured in numerous ways. In this position paper, we ask what it means for an LLM to possess “cultural awareness”, and, through a thought experiment that extends the Octopus test proposed by Bender and Koller (2020), we argue that what an LLM or LLM-based AI system needs in order to be useful across various, including completely unseen, cultures is not cultural awareness or knowledge but meta-cultural competence. |
Sougata Saha; Saurabh Kumar Pandey; Monojit Choudhury; |
247 | Reading Between The Lines: Can LLMs Identify Cross-Cultural Communication Gaps? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the extent and patterns of gaps in understandability of book reviews due to the presence of culturally-specific items and elements that might be alien to users from another culture. |
Sougata Saha; Saurabh Kumar Pandey; Harshit Gupta; Monojit Choudhury; |
248 | Stronger Universal and Transferable Attacks By Suppressing Refusals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Contrary to this belief, we find that the adversarial prompts discovered by such optimizers are inherently prompt-universal and transferable, even when optimized on a single model and a single harmful request. To further exploit this phenomenon, we introduce IRIS, a new objective to these optimizers to explicitly deactivate the safety feature to create an even stronger universal and transferable attack. |
David Huang; Avidan Shah; Alexandre Araujo; David Wagner; Chawin Sitawarin; |
249 | Pointwise Mutual Information As A Performance Gauge for Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there is no method to date that exploits this phenomenon to improve generation. To fill this gap, in this study, we show that the pointwise mutual information between a context and a question is an effective gauge for language model performance. |
Tianyu Liu; Jirui Qi; Paul He; Arianna Bisazza; Mrinmaya Sachan; Ryan Cotterell; |
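The gauge in the entry above reduces to a simple quantity: PMI(c; q) = log p(q | c) - log p(q) under the language model. A minimal sketch, assuming a hypothetical `logprob` oracle around a real LM:

```python
def pmi_gauge(question, context, logprob):
    """Pointwise mutual information between a context and a question:

        PMI(c; q) = log p(q | c) - log p(q)

    logprob(text, condition=None) is a hypothetical oracle returning
    the language model's log-probability of text, optionally
    conditioned on condition. Higher PMI suggests a more relevant
    retrieved context, and hence better expected RAG performance.
    """
    return logprob(question, condition=context) - logprob(question)

# Toy usage: a fake oracle where conditioning on a relevant context
# raises the question's log-probability.
fake_logprob = lambda text, condition=None: -5.0 if condition else -9.0
print(pmi_gauge("Who wrote Hamlet?", "Shakespeare's plays ...", fake_logprob))  # 4.0
```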
250 | Towards Lifelong Dialogue Agents Via Timeline-based Memory Management Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present THEANINE, a framework for LLM-based lifelong dialogue agents. |
Kai Tzu-iunn Ong; Namyoung Kim; Minju Gwak; Hyungjoo Chae; Taeyoon Kwon; Yohan Jo; Seung-won Hwang; Dongha Lee; Jinyoung Yeo; |
251 | REFFLY: Melody-Constrained Lyrics Editing Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces REFFLY (REvision Framework For LYrics), the first revision framework for editing and generating melody-aligned lyrics. |
Songyan Zhao; Bingxuan Li; Yufei Tian; Nanyun Peng; |
252 | Knowledge Graph-Guided Retrieval Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel Knowledge Graph-Guided Retrieval Augmented Generation (KG2RAG) framework that utilizes knowledge graphs (KGs) to provide fact-level relationships between chunks, improving the diversity and coherence of the retrieved results. |
Xiangrong Zhu; Yuexiang Xie; Yi Liu; Yaliang Li; Wei Hu; |
253 | LLM-Based Explicit Models of Opponents for Multi-Agent Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Explicit Models of Opponents (EMO) based on Large Language Models (LLMs), enabling agents to better predict and adapt to diverse, dynamic multi-agent interactions. |
XiaoPeng Yu; Wanpeng Zhang; Zongqing Lu; |
254 | SylloBio-NLI: Evaluating Large Language Models on Biomedical Syllogistic Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents SylloBio-NLI, a novel framework that leverages external ontologies to systematically instantiate diverse syllogistic arguments for biomedical NLI. |
Magdalena Wysocka; Danilo Carvalho; Oskar Wysocki; Marco Valentino; Andre Freitas; |
255 | Evaluating LLMs for Quotation Attribution in Literary Texts: A Case Study of LLaMa3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we evaluate the ability of Llama-3 to attribute direct-speech utterances to their speakers in novels. |
Gaspard Michel; Elena V. Epure; Romain Hennequin; Christophe Cerisara; |
256 | Getting More Juice Out of Your Data: Hard Pair Refinement Enhances Visual-Language Models Without Extra Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce HELIP, a cost-effective strategy that improves CLIP models by exploiting challenging text-image pairs within existing datasets in continuous training. |
Haonan Wang; Minbin Huang; Runhui Huang; Lanqing Hong; Hang Xu; Tianyang Hu; Xiaodan Liang; Zhenguo Li; Hong Cheng; Kenji Kawaguchi; |
257 | The Stochastic Parrot on LLM’s Shoulder: A Summative Assessment of Physical Concept Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a summative assessment over a carefully designed physical concept understanding task, PhysiCo. |
Mo Yu; Lemao Liu; Junjie Wu; Tsz Ting Chung; Shunchi Zhang; Jiangnan Li; Dit-Yan Yeung; Jie Zhou; |
258 | Where Is The Answer? An Empirical Study of Positional Bias for Parametric Knowledge Extraction in Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, LMs suffer from a phenomenon called “perplexity curse”; despite minimizing document perplexity during training, LMs struggle to extract information via a question prompt. In this paper, we study the problem by fine-tuning LMs on new data and find the intriguing fact that all studied LMs suffer from positional bias in the training document, i.e., they struggle to answer questions about information described in the middle or at the end of the training document. |
Kuniaki Saito; Chen-Yu Lee; Kihyuk Sohn; Yoshitaka Ushiku; |
259 | Communication Makes Perfect: Persuasion Dataset Construction Via Multi-LLM Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a multi-LLM communication framework designed to enhance the generation of persuasive data automatically. |
Weicheng Ma; Hefan Zhang; Ivory Yang; Shiyu Ji; Joice Chen; Farnoosh Hashemi; Shubham Mohole; Ethan Gearey; Michael Macy; Saeed Hassanpour; Soroush Vosoughi; |
260 | Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery Without Task Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Instruct-LF, a goal-oriented latent factor discovery system that integrates LLM’s instruction-following ability with statistical models to handle large, noisy datasets where LLM reasoning alone falls short. |
Zhouhang Xie; Tushar Khot; Bhavana Dalvi Mishra; Harshit Surana; Julian McAuley; Peter Clark; Bodhisattwa Prasad Majumder; |
261 | How to Make LLMs Forget: On Reversing In-Context Knowledge Edits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our continuous reversal tokens prove particularly effective, with minimal impact on unedited prompts. Through analysis of output distributions, attention patterns, and token rankings, we provide insights into IKE’s effects on LLMs and how reversal tokens mitigate them. |
Paul Youssef; Zhixue Zhao; Jörg Schlötterer; Christin Seifert; |
262 | RAG LLMs Are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conduct a detailed comparative analysis of RAG and non-RAG frameworks with eleven LLMs. |
Bang An; Shiyue Zhang; Mark Dredze; |
263 | Reverse Question Answering: Can An LLM Write A Question So Hard (or Bad) That It Can’t Answer? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By finding question and answer types that lead to RQA errors, we suggest improvements for LLM reasoning. |
Nishant Balepur; Feng Gu; Abhilasha Ravichander; Shi Feng; Jordan Lee Boyd-Graber; Rachel Rudinger; |
264 | Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel framework, entity decomposition with filtering, or EDF. |
Reza Averly; Xia Ning; |
265 | IHEval: Evaluating Language Models on Following The Instruction Hierarchy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite its importance, this topic receives limited attention, and there is a lack of comprehensive benchmarks for evaluating models’ ability to follow the instruction hierarchy. We bridge this gap by introducing IHEval, a novel benchmark comprising 3,538 examples across nine tasks, covering cases where instructions in different priorities either align or conflict. |
Zhihan Zhang; Shiyang Li; Zixuan Zhang; Xin Liu; Haoming Jiang; Xianfeng Tang; Yifan Gao; Zheng Li; Haodong Wang; Zhaoxuan Tan; Yichuan Li; Qingyu Yin; Bing Yin; Meng Jiang; |
266 | Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To reduce the potential redundancies of datasets, we make the first attempt and propose a novel dynamic data mixture for MoE instruction tuning. |
Tong Zhu; Daize Dong; Xiaoye Qu; Jiacheng Ruan; Wenliang Chen; Yu Cheng; |
267 | Self-Harmonized Chain of Thought Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose ECHO (Self-Harmonized Chain of Thought), a novel method that unifies diverse solution paths into a consistent and effective reasoning pattern. |
Ziqi Jin; Wei Lu; |
268 | A Systematic Examination of Preference Learning Through The Lens of Instruction-Following Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we systematically investigate how specific attributes of preference datasets affect the alignment and downstream performance of LLMs in instruction-following tasks. |
Joongwon Kim; Anirudh Goyal; Aston Zhang; Bo Xiong; Rui Hou; Melanie Kambadur; Dhruv Mahajan; Hannaneh Hajishirzi; Liang Tan; |
269 | Pay More Attention to Images: Numerous Images-Oriented Multimodal Summarization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Considering that most existing metrics evaluate summaries from a unimodal perspective, we propose a new Multimodal Information evaluation (M-info) method, measuring the differences between the generated summary and the multimodal input. |
Min Xiao; Junnan Zhu; Feifei Zhai; Chengqing Zong; Yu Zhou; |
270 | UDistil-Whisper: Label-Free Data Filtering for Knowledge Distillation in Low-Data Regimes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, the distillation process requires a large amount of data, thereby limiting its applicability in low-resource settings. To address this, we propose a distillation framework that does not require any labeled data. |
Abdul Waheed; Karima Kadaoui; Bhiksha Raj; Muhammad Abdul-Mageed; |
271 | Sneaking Syntax Into Transformer Language Models with Tree Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work instead aims to softly inject syntactic inductive biases into given transformer circuits, through a structured regularizer. |
Ananjan Nandi; Christopher D Manning; Shikhar Murty; |
272 | An Efficient Gloss-Free Sign Language Translation Using Spatial Configurations and Motion Dynamics with LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By contrast, we emphasize the importance of capturing the spatial configurations and motion dynamics in sign language. With this in mind, we introduce Spatial and Motion-based Sign Language Translation (SpaMo), a novel LLM-based SLT framework. |
Eui Jun Hwang; Sukmin Cho; Junmyeong Lee; Jong C. Park; |
273 | Audio Is The Achilles’ Heel: Red Teaming Audio Large Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we comprehensively red team the safety of five advanced audio LMMs under three settings: (i) harmful questions in both audio and text formats, (ii) harmful questions in text format accompanied by distracting non-speech audio, and (iii) speech-specific jailbreaks. |
Hao Yang; Lizhen Qu; Ehsan Shareghi; Gholamreza Haffari; |
274 | C2: Scalable Auto-Feedback for LLM-based Chart Generation Related Papers Related Patents Related Grants Related Venues Related Experts View |
Woosung Koh; Janghan Yoon; MinHyung Lee; Youngjin Song; Jaegwan Cho; Jaehyun Kang; Taehyeon Kim; Se-Young Yun; Youngjae Yu; Bongshin Lee; |
275 | Substance Beats Style: Why Beginning Students Fail to Code with LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that substance beats style: a poor grasp of technical vocabulary is merely correlated with prompt failure; that the information content of prompts predicts success; that students get stuck making trivial edits; and more. |
Francesca Lucchetti; Zixuan Wu; Arjun Guha; Molly Q Feldman; Carolyn Jane Anderson; |
276 | EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel method that can resolve the issue of inconsistent tokens accepted by different samples without necessitating an increase in memory or computing overhead. |
Yunsheng Ni; Chuanjian Liu; Yehui Tang; Kai Han; Yunhe Wang; |
277 | Self-calibration for Language Model Quantization and Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose self-calibration as a solution. |
Miles Williams; George Chrysostomou; Nikolaos Aletras; |
278 | Entropy-Based Decoding for Retrieval-Augmented Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite their success, retrieval-augmented LLMs still face the distractibility issue, where the generated responses are negatively influenced by noise from both external and internal knowledge sources. In this paper, we introduce a novel, training-free decoding method guided by entropy considerations to mitigate this issue. |
Zexuan Qiu; Zijing Ou; Bin Wu; Jingjing Li; Aiwei Liu; Irwin King; |
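As a sketch of how entropy can steer decoding in the entry above, the snippet below mixes next-token distributions obtained under different retrieved contexts, down-weighting high-entropy (uncertain, likely distracted) ones. This particular aggregation scheme is an illustrative assumption, not necessarily the paper's exact method.

```python
import math

def entropy(dist):
    """Shannon entropy of a next-token distribution; high entropy
    means the model is uncertain under that context."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def entropy_weighted_mix(dists):
    """Mix next-token distributions from several retrieved contexts,
    down-weighting high-entropy (likely distracted) ones."""
    weights = [1.0 / (entropy(d) + 1e-6) for d in dists]
    z = sum(weights)
    vocab = len(dists[0])
    return [sum(w * d[i] for w, d in zip(weights, dists)) / z
            for i in range(vocab)]

# Toy usage over a three-token vocabulary.
sharp = [0.90, 0.05, 0.05]  # confident under a clean context
flat = [0.34, 0.33, 0.33]   # confused under a noisy context
print(entropy_weighted_mix([sharp, flat]))
```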
279 | SSMLoRA: Enhancing Low-Rank Adaptation with State Space Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose SSMLoRA (State Space Model Low-Rank Adaptation), an extension of LoRA that incorporates a State Space Model (SSM) to interconnect low-rank matrices. |
Jiayang Yu; Yihang Zhang; Bin Wang; Peiqin Lin; YongKang Liu; Shi Feng; |
280 | Palette of Language Models: A Solver for Controlled Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A common approach is to linearly combine single-attribute models, but this strategy often overlooks attribute overlaps and can lead to conflicts. Therefore, we propose a novel combination strategy inspired by the Law of Total Probability and Conditional Mutual Information Minimization on generative language models. |
Zhe Yang; Yi Huang; Yaqin Chen; XiaotingWu XiaotingWu; Junlan Feng; Chao Deng; |
281 | Learning to Solve Domain-Specific Calculation Problems with Knowledge-Intensive Programs Generator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate knowledge-intensive calculation problems. |
Chengyuan Liu; Shihang Wang; Lizhi Qing; Jun Lin; Ji Zhang; Fei Wu; Kun Kuang; |
282 | From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose integrating attention analysis with LLaVA-CAM: concretely, attention scores highlight relevant regions during forward propagation, while LLaVA-CAM captures gradient changes through backward propagation, revealing key image features. |
Xiaofeng Zhang; Yihao Quan; Chen Shen; Xiaosong Yuan; Shaotian Yan; Liang Xie; Wenxiao Wang; Chaochen Gu; Hao Tang; Jieping Ye; |
283 | What Did I Do Wrong? Quantifying LLMs’ Sensitivity and Consistency to Prompt Engineering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Developers who want to include these models in their software stack, however, face a dreadful challenge: debugging LLMs’ inconsistent behavior across minor variations of the prompt. We therefore introduce two metrics for classification tasks, namely sensitivity and consistency, which are complementary to task performance. |
Federico Errica; Davide Sanvito; Giuseppe Siracusano; Roberto Bifulco; |
284 | Measuring Memorization in Language Models Via Probabilistic Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce probabilistic discoverable extraction, which, without additional cost, relaxes discoverable extraction by considering multiple queries to quantify the probability of extracting a target sequence. |
Jamie Hayes; Marika Swanberg; Harsh Chaudhari; Itay Yona; Ilia Shumailov; Milad Nasr; Christopher A. Choquette-Choo; Katherine Lee; A. Feder Cooper; |
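The relaxation in the entry above is easy to state: if a single sampled query reproduces the target with probability p, then n independent queries extract it with probability 1 - (1 - p) ** n. A minimal sketch, with `generate` as a hypothetical sampler around a real LM:

```python
import random

def extraction_probability(prompt, target, generate, samples=100, queries=10):
    """Probabilistic discoverable extraction (sketch).

    Estimate p, the chance that one sampled generation reproduces the
    target sequence; the chance of extracting it at least once within
    `queries` independent queries is then 1 - (1 - p) ** queries.
    generate(prompt) is a hypothetical sampler around a real LM.
    """
    hits = sum(generate(prompt) == target for _ in range(samples))
    p = hits / samples
    return p, 1.0 - (1.0 - p) ** queries

# Toy usage: a fake sampler that leaks the target 3% of the time.
fake_generate = lambda prompt: "SECRET" if random.random() < 0.03 else "other"
p, p_any = extraction_probability("leak:", "SECRET", fake_generate)
print(f"per-query p={p:.2f}, P(extracted within 10 queries)={p_any:.2f}")
```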
285 | ToW: Thoughts of Words Improve Reasoning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce thoughts of words (ToW), a novel training-time data-augmentation method for next-word prediction. |
Zhikun Xu; Ming Shen; Jacob Dineen; Zhaonan Li; Xiao Ye; Shijie Lu; Aswin Rrv; Chitta Baral; Ben Zhou; |
286 | Language Models Predict Empathy Gaps Between Social In-groups and Out-groups Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Studies of human psychology have demonstrated that people are more motivated to extend empathy to in-group members than out-group members (Cikara et al., 2011). In this study, we investigate how this aspect of intergroup relations in humans is replicated by LLMs in an emotion intensity prediction task. |
Yu Hou; Hal Daumé III; Rachel Rudinger; |
287 | CORRECT: Context- and Reference-Augmented Reasoning and Prompting for Fact-Checking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most fact-checking models mainly focus on the reasoning within evidence sentences, and ignore the auxiliary contexts and references. To address this problem, we propose a novel method, Context- and Reference-augmented Reasoning and Prompting. |
Delvin Ce Zhang; Dongwon Lee; |
288 | Entangled Relations: Leveraging NLI and Meta-analysis to Enhance Biomedical Relation Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent research efforts have explored the potential of leveraging natural language inference (NLI) techniques to enhance relation extraction (RE). In this vein, we introduce MetaEntail-RE, a novel adaptation method that harnesses NLI principles to enhance RE performance. |
William P Hogan; Jingbo Shang; |
289 | Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design Related Papers Related Patents Related Grants Related Venues Related Experts View |
Mohan Zhang; Pingzhi Li; Jie Peng; Mufan Qiu; Tianlong Chen; |
290 | FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents FinEval, a benchmark designed to evaluate LLMs’ financial domain knowledge and practical abilities. |
Xin Guo; Haotian Xia; Zhaowei Liu; Hanyang Cao; Zhi Yang; Zhiqiang Liu; Sizhe Wang; Jinyi Niu; Chuqi Wang; Yanhui Wang; Xiaolong Liang; Xiaoming Huang; Bing Zhu; Zhongyu Wei; Yun Chen; Weining Shen; Liwen Zhang; |
291 | PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This imbalance presents a significant challenge, as LLMs are increasingly used across diverse contexts without adequate evaluation of their cultural competence in non-English languages, including Persian. To address this gap, we introduce PerCul, a carefully constructed dataset designed to assess the sensitivity of LLMs toward Persian culture. |
Erfan Moosavi Monazzah; Vahid Rahimzadeh; Yadollah Yaghoobzadeh; Azadeh Shakery; Mohammad Taher Pilehvar; |
292 | ACCESS : A Benchmark for Abstract Causal Event Discovery and Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View |
Vy Vo; Lizhen Qu; Tao Feng; Yuncheng Hua; Xiaoxi Kang; Songhai Fan; Tim Dwyer; Lay-Ki Soon; Gholamreza Haffari; |
293 | VisCGEC: Benchmarking The Visual Chinese Grammatical Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts View |
Xiaoman Wang; Dan Yuan; Xin Liu; Yike Zhao; Xiaoxiao Zhang; Xizhi Chen; Yunshi Lan; |
294 | MGM: Global Understanding of Audience Overlap Graphs for Predicting The Factuality and The Bias of News Media Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study the classification problem of profiling news media from the lens of political bias and factuality. |
Muhammad Arslan Manzoor; Ruihong Zeng; Dilshod Azizov; Preslav Nakov; Shangsong Liang; |
295 | DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we begin with a detailed analysis aimed at disentangling risks through step-by-step reasoning within multimodal inputs. |
Jianyu Liu; Hangyu Guo; Ranjie Duan; Xingyuan Bu; Yancheng He; Shilong Li; Hui Huang; Jiaheng Liu; Yucheng Wang; Chenchen Jing; Xingwei Qu; Xiao Zhang; Pei Wang; Yanan Wu; Jihao Gu; Yangguang Li; Jianke Zhu; |
296 | LLM2: Let Large Language Models Harness System 2 Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Drawing inspiration from the dual-process theory of human cognition, we introduce LLM2, a novel framework that combines an LLM (System 1) with a process-based verifier (System 2). |
Cheng Yang; Chufan Shi; Siheng Li; Bo Shui; Yujiu Yang; Wai Lam; |
297 | Navigating The Path of Writing: Outline-guided Text Generation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose WritingPath, a framework that uses explicit outlines to guide LLMs in generating goal-oriented, high-quality text. |
Yukyung Lee; Soonwon Ka; Bokyung Son; Pilsung Kang; Jaewook Kang; |
298 | Graph Neural Network Enhanced Retrieval for Question Answering of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel retrieval method, called GNN-Ret, which leverages graph neural networks (GNNs) to enhance retrieval by exploiting the relatedness between passages. |
Zijian Li; Qingyan Guo; Jiawei Shao; Lei Song; Jiang Bian; Jun Zhang; Rui Wang; |
299 | AudioBench: A Universal Benchmark for Audio Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce AudioBench, a universal benchmark designed to evaluate Audio Large Language Models (AudioLLMs). |
Bin Wang; Xunlong Zou; Geyu Lin; Shuo Sun; Zhuohan Liu; Wenyu Zhang; Zhengyuan Liu; AiTi Aw; Nancy F. Chen; |
300 | It Is Not Only The Negative That Deserves Attention! Understanding, Generation & Evaluation of (Positive) Moderation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We advance the understanding of Positive Moderation by annotating a dataset on 13 moderation properties, e.g., neutrality, clarity, and curiosity. We extract instructions from professional moderation guidelines and use them to prompt LLaMA to generate such moderation. |
Iman Jundi; Eva Maria Vecchi; Carlotta Quensel; Neele Falk; Gabriella Lapesa; |
301 | Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an approach to summary faithfulness evaluation in which multiple LLM-based agents are assigned initial stances (regardless of what their belief might be) and forced to come up with a reason to justify the imposed belief, thus engaging in a multi-round debate to reach an agreement. |
Mahnaz Koupaee; Jake W. Vincent; Saab Mansour; Igor Shalyminov; Han He; Hwanjun Song; Raphael Shu; Jianfeng He; Yi Nian; Amy Wing-mei Wong; Kyu J. Han; Hang Su; |
302 | Hazards in Daily Life? Enabling Robots to Proactively Detect and Resolve Anomalies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we introduce a multi-agent brainstorming approach, where agents collaborate and generate diverse scenarios covering household hazards, hygiene management, and child safety. |
Zirui Song; Guangxian Ouyang; Meng Fang; Hongbin Na; Zijing Shi; Zhenhao Chen; Fu Yujie; Zeyu Zhang; Shiyu Jiang; Miao Fang; Ling Chen; Xiuying Chen; |
303 | WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly in languages other than English and in underrepresented cultural contexts. To evaluate their understanding of such knowledge, we introduce WorldCuisines, a massive-scale benchmark for multilingual and multicultural, visually grounded language understanding. |
Genta Indra Winata; Frederikus Hudi; Patrick Amadeus Irawan; David Anugraha; Rifki Afina Putri; Wang Yutong; Adam Nohejl; Ubaidillah Ariq Prathama; Nedjma Ousidhoum; Afifa Amriani; Anar Rzayev; Anirban Das; Ashmari Pramodya; Aulia Adila; Bryan Wilie; Candy Olivia Mawalim; Cheng Ching Lam; Daud Abolade; Emmanuele Chersoni; Enrico Santus; Fariz Ikhwantri; Garry Kuwanto; Hanyang Zhao; Haryo Akbarianto Wibowo; Holy Lovenia; Jan Christian Blaise Cruz; Jan Wira Gotama Putra; Junho Myung; Lucky Susanto; Maria Angelica Riera Machin; Marina Zhukova; Michael Anugraha; Muhammad Farid Adilazuarda; Natasha Christabelle Santosa; Peerat Limkonchotiwat; Raj Dabre; Rio Alexander Audino; Samuel Cahyawijaya; Shi-Xiong Zhang; Stephanie Yulia Salim; Yi Zhou; Yinxuan Gui; David Ifeoluwa Adelani; En-Shiun Annie Lee; Shogo Okada; Ayu Purwarianti; Alham Fikri Aji; Taro Watanabe; Derry Tanti Wijaya; Alice Oh; Chong-Wah Ngo; |
304 | A Template Is All You Meme Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To demonstrate the power of meme templates, we create TSplit, a method to reorganize datasets, where a template or templatic instance can only appear in either the training or test split. |
Luke Bates; Peter Ebert Christensen; Preslav Nakov; Iryna Gurevych; |
305 | Legal Judgment Prediction Based on Knowledge-enhanced Multi-Task and Multi-Label Text Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the challenge of predicting relevant law articles and charges within the framework of legal judgment prediction, treating it as a multi-task and multi-label text classification problem. |
Ang Li; Yiquan Wu; Ming Cai; Adam Jatowt; Xiang Zhou; Weiming Lu; Changlong Sun; Fei Wu; Kun Kuang; |
306 | Style Transfer with Multi-iteration Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Numerous recent techniques for text style transfer characterize their approaches as variants of reinforcement learning and preference optimization. In this work, we consider the relationship between these approaches and a class of optimization approaches developed primarily for (non-neural) statistical machine translation, formerly known as ‘tuning’. |
Shuai Liu; Jonathan May; |
307 | MixLLM: Dynamic Routing in Mixed Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the challenges involve: (1) dynamic trade-offs among quality, cost, and latency; (2) enabling continual learning in deployed systems; and (3) navigating a varying (e.g., new LLM addition or old LLM removal) set of LLM candidates over time. To bridge these gaps, we develop MixLLM, a dynamic contextual-bandit-based routing system for query-LLM assignment. |
Xinyuan Wang; Yanchi Liu; Wei Cheng; Xujiang Zhao; Zhengzhang Chen; Wenchao Yu; Yanjie Fu; Haifeng Chen; |
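To make the routing setup above concrete, here is a minimal epsilon-greedy sketch in the contextual-bandit spirit: context is reduced to a discrete bucket, the reward can fold together quality, cost, and latency, and candidates can be added over time. This is an illustrative baseline, not MixLLM's actual algorithm.

```python
import random
from collections import defaultdict

class EpsilonGreedyRouter:
    """Minimal bandit-style query router (an illustrative baseline,
    not MixLLM itself). Context is reduced to a discrete bucket, and
    each bucket tracks per-LLM average reward."""

    def __init__(self, llms, eps=0.1):
        self.llms = list(llms)
        self.eps = eps
        self.counts = defaultdict(lambda: defaultdict(int))
        self.means = defaultdict(lambda: defaultdict(float))

    def route(self, bucket):
        if random.random() < self.eps:  # explore occasionally
            return random.choice(self.llms)
        return max(self.llms, key=lambda m: self.means[bucket][m])

    def update(self, bucket, llm, reward):
        # Incremental mean update for this (bucket, llm) pair.
        self.counts[bucket][llm] += 1
        n = self.counts[bucket][llm]
        self.means[bucket][llm] += (reward - self.means[bucket][llm]) / n

    def add_llm(self, name):  # the candidate set can change over time
        self.llms.append(name)

# Toy usage: route a query, observe a reward, learn online.
router = EpsilonGreedyRouter(["small-llm", "large-llm"])
choice = router.route("code-question")
router.update("code-question", choice, reward=0.8)
```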
308 | CAST: Corpus-Aware Self-similarity Enhanced Topic Modelling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In parallel, it is found that functional words are frequently selected over topical words. To address these limitations, we introduce CAST (Corpus-Aware Self-similarity Enhanced Topic modelling), a novel topic modelling method that builds upon candidate centroid word embeddings contextualized on the dataset, and a novel self-similarity-based method to filter out less meaningful tokens. |
Yanan Ma; Chenghao Xiao; Chenhan Yuan; Sabine N Van Der Veer; Lamiece Hassan; Chenghua Lin; Goran Nenadic; |
309 | Zero-Shot ATC Coding with Large Language Models for Clinical Assessments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Manual assignment of Anatomical Therapeutic Chemical (ATC) codes to prescription records is a significant bottleneck in healthcare research and operations at Ontario Health and InterRAI Canada, requiring extensive expert time and effort. To automate this process while maintaining data privacy, we develop a practical approach using locally deployable large language models (LLMs). |
Zijian Chen; John-Michael Gamble; Jimmy Lin; |
310 | M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While fully synthetic datasets are a promising alternative, research on their use in the multilingual domain is limited, as existing approaches still rely on machine translation to improve multilingual performance. To bridge this gap, we introduce M2Lingual, the first fully synthetic, multi-turn multilingual dataset, comprising 175K conversations across 70 languages with a balanced mix of high-, mid-, and low-resourced languages. |
Rishabh Maheshwary; Vikas Yadav; Hoang H Nguyen; Khyati Mahajan; Sathwik Tejaswi Madhusudhan; |
311 | Auto-Cypher: Improving LLMs on Cypher Generation Via LLM-supervised Generation-verification Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present an automated, LLM-supervised pipeline to generate high-quality synthetic data for Text2Cypher. |
Aman Tiwari; Shiva Krishna Reddy Malay; Vikas Yadav; Masoud Hashemi; Sathwik Tejaswi Madhusudhan; |
312 | LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the first systematic evaluation examining format bias in the performance of large language models (LLMs). |
Do Xuan Long; Ngoc-Hai Nguyen; Tiviatis Sim; Hieu Dao; Shafiq Joty; Kenji Kawaguchi; Nancy F. Chen; Min-Yen Kan; |
313 | AutoParLLM: GNN-guided Context Generation for Zero-Shot Code Parallelization Using LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes AutoParLLM, a novel way to generate context using guidance from graph neural networks (GNNs) to generate efficient parallel codes. |
Quazi Ishtiaque Mahmud; Ali TehraniJamsaz; Hung D Phan; Le Chen; Mihai Capotă; Theodore L. Willke; Nesreen K. Ahmed; Ali Jannesari; |
314 | Retrieval, Reasoning, Re-ranking: A Context-Enriched Framework for Knowledge Graph Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Apart from triples, entity contexts (e.g., labels, descriptions, aliases) also play a significant role in augmenting KGs. To address these limitations, we propose KGR3, a context-enriched framework for KGC. |
Muzhi Li; Cehao Yang; Chengjin Xu; Xuhui Jiang; Yiyan Qi; Jian Guo; Ho-fung Leung; Irwin King; |
315 | Adaptive Prompting: Ad-hoc Prompt Composition for Social Bias Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing approaches to automatic prompting aim to optimize individual techniques instead of compositions of techniques and their dependence on the input. To fill this gap, we propose an adaptive prompting approach that predicts the optimal prompt composition ad-hoc for a given input. |
Maximilian Spliethöver; Tim Knebler; Fabian Fumagalli; Maximilian Muschalik; Barbara Hammer; Eyke Hüllermeier; Henning Wachsmuth; |
316 | Harnessing and Evaluating The Intrinsic Extrapolation Ability of Large Language Models for Vehicle Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Safe integration of LLMs into vehicles, however, necessitates their thorough understanding of dynamic traffic environments. Towards this end, this study introduces a framework leveraging LLMs’ built-in extrapolation capabilities for vehicle trajectory prediction, thereby evaluating their comprehension of the evolution of traffic agents’ behaviors and interactions over time. |
Jiawei Liu; Yanjiao Liu; Xun Gong; Tingting Wang; Hong Chen; Yunfeng Hu; |
317 | GameTox: A Comprehensive Dataset and Analysis for Enhanced Toxicity Detection in Online Gaming Communities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce GameTox, a novel dataset comprising 53K game chat utterances annotated for toxicity detection through intent classification and slot filling. |
Usman Naseem; Shuvam Shiwakoti; Siddhant Bikram Shah; Surendrabikram Thapa; Qi Zhang; |
318 | CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To evaluate the effectiveness of LLMs in handling complex code development tasks of research projects, particularly for NLP/CV/AI/ML/DM topics, we introduce CSR-Bench, a benchmark for Computer Science Research projects. |
Yijia Xiao; Runhui Wang; Luyang Kong; Davor Golac; Wei Wang; |
319 | Towards Quantifying Commonsense Reasoning with Mechanistic Insights Related Papers Related Patents Related Grants Related Venues Related Experts View |
Abhinav Joshi; Areeb Ahmad; Divyaksh Shukla; Ashutosh Modi; |
320 | Towards Efficient and Multifaceted Computer-assisted Pronunciation Training Leveraging Hierarchical Selective State Space Model and Decoupled Cross-entropy Loss Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it is generally expected that a full-fledged CAPT system should perform both functionalities simultaneously and efficiently. In response to this surging demand, we first propose HMamba, a novel CAPT approach that seamlessly integrates APA and MDD tasks in parallel. |
Fu-An Chao; Berlin Chen; |
321 | ALiiCE: Evaluating Positional Fine-grained Citation Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing research on citation generation is predominantly limited to sentence-level statements, neglecting the significance of positional fine-grained citations that can appear anywhere within sentences. To facilitate further exploration of the positional fine-grained citation generation, we propose ALiiCE, the first automatic evaluation framework for this task. |
Yilong Xu; Jinhua Gao; Xiaoming Yu; Baolong Bi; Huawei Shen; Xueqi Cheng; |
322 | Is It Navajo? Accurate Language Detection for Endangered Athabaskan Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study evaluates Google’s Language Identification (LangID) tool, which does not currently support any Native American languages. To address this, we introduce a random forest classifier trained on Navajo and the twenty languages erroneously suggested by LangID. |
Ivory Yang; Weicheng Ma; Chunhui Zhang; Soroush Vosoughi; |
323 | Discourse-Driven Evaluation: Unveiling Factual Inconsistency in Long Document Summarization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Detecting factual inconsistency for long document summarization remains challenging, given the complex structure of the source article and long summary length. In this work, we study factual inconsistency errors and connect them with a line of discourse analysis. |
Yang Zhong; Diane Litman; |
324 | Identifying Emerging Concepts in Large Corpora Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new method to identify emerging concepts in large text corpora. |
Sibo Ma; Julian Nyarko; |
325 | Are Explicit Belief Representations Necessary? A Comparison Between Large Language Models and Bayesian Probabilistic Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To date, the most successful computationally explicit approach to making inferences about others’ beliefs is the Rational Speech Act (RSA) framework, a Bayesian probabilistic model that encodes explicit representations of beliefs. In the present study, we ask whether LLMs outperform RSA in predicting human belief inferences, even though they do not explicitly encode belief representations. |
Dingyi Pan; Ben Bergen; |
326 | Social Norms in Cinema: A Cross-Cultural Analysis of Shame, Pride and Prejudice Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the first cross-cultural dataset of over 10k shame/pride-related expressions with underlying social expectations from ~5. |
Sunny Rai; Khushang Zaveri; Shreya Havaldar; Soumna Nema; Lyle Ungar; Sharath Chandra Guntuku; |
327 | SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce SVD-LLM V2, a novel SVD-based LLM compression method that optimizes singular value truncation in SVD compression with two key strategies. |
Xin Wang; Samiul Alam; Zhongwei Wan; Hui Shen; Mi Zhang; |
328 | Exploring Large Language Models for Effective Rumor Detection on Social Media Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore using Large Language Models (LLMs) for rumor detection on social media. |
Yirong Zeng; Xiao Ding; Bibo Cai; Ting Liu; Bing Qin; |
329 | Instruct-of-Reflection: Enhancing Large Language Models Iterative Reflection Capabilities Via Dynamic-Meta Instruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on our empirical evidence, we find that current static reflection methods may lead to redundancy, drift, and stubbornness issues. To mitigate this, we introduce Instruct-of-Reflection (IoRT), a novel and general reflection framework that leverages dynamic-meta instruction to enhance the iterative reflection capability of LLMs. |
Liping Liu; Chunhong Zhang; Likang Wu; Chuang Zhao; Zheng Hu; Ming He; Jianping Fan; |
330 | Chinese Morph Resolution in E-commerce Live Streaming Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: E-commerce live streaming in China, particularly on platforms like Douyin, has become a major sales channel, but hosts often use morphs to evade scrutiny and engage in false advertising. This study introduces the Live Auditory Morph Resolution (LiveAMR) task to detect such violations. |
Jiahao Zhu; Jipeng Qiang; Ran Bai; Chenyu Liu; Xiaoye Ouyang; |
331 | Context-Efficient Retrieval with Factual Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we demonstrate that pre-processing the external corpus into semi-structured “atomic facts” makes retrieval more efficient. |
Yanhong Li; David Yunis; David McAllester; Jiawei Zhou; |
332 | AgentMove: A Large Language Model Based Agentic Framework for Zero-shot Next Location Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce AgentMove, a systematic agentic prediction framework to achieve generalized next location prediction. |
Jie Feng; Yuwei Du; Jie Zhao; Yong Li; |
333 | On Positional Bias of Faithfulness for Long-form Summarization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Large Language Models (LLMs) often exhibit positional bias in long-context settings, under-attending to information in the middle of inputs. We investigate the presence of this bias in long-form summarization, its impact on faithfulness, and various techniques to mitigate this bias. |
David Wan; Jesse Vig; Mohit Bansal; Shafiq Joty; |
334 | Unlocking Decoding-time Controllability: Gradient-Free Multi-Objective Alignment with Contrastive Prompts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Considering the limitations of previous approaches, we propose MCA, which constructs an expert prompt and an adversarial prompt for each objective, contrasts them at decoding time, and balances the objectives by combining the contrasts. |
Tingchen Fu; Yupeng Hou; Julian McAuley; Rui Yan; |
335 | Enhancing Language Model Hypernetworks with Restart: A Study on Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a comprehensive investigation into optimization strategies for hypernetworks remains absent. To address this gap, we analyze the loss landscape of hypernetworks and propose that restart optimization strategies can improve their performance for language models. |
Yihan Zhang; Jie Fu; Rongrong Ji; Jie Chen; |
336 | Markov Chain of Thought for Efficient Mathematical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View |
Wen Yang; Minpeng Liao; Kai Fan; |
337 | FLEX: Expert-level False-Less EXecution Metric for Text-to-SQL Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, this paper introduces **FLEX (False-Less EXecution)**, a novel approach to evaluating text-to-SQL systems using large language models (LLMs) to emulate human expert-level evaluation of SQL queries. |
Heegyu Kim; Jeon Taeyang; SeungHwan Choi; Seungtaek Choi; Hyunsouk Cho; |
338 | Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we critically assess the limitations of the state-of-the-art training-free technique, the logit lens, in handling generalized visual hallucinations. |
Anirudh Phukan; Divyansh Divyansh; Harshit Kumar Morj; Vaishnavi Vaishnavi; Apoorv Saxena; Koustava Goswami; |
339 | H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing methods employ either textual reasoning, which excels in semantic interpretation but struggles with mathematical operations, or symbolic reasoning, which handles computations well but lacks semantic understanding. This paper introduces a novel algorithm H-STAR that integrates both symbolic and semantic (textual) approaches in a two-stage process to address these limitations. |
Nikhil Abhyankar; Vivek Gupta; Dan Roth; Chandan K. Reddy; |
340 | AID: Adaptive Integration of Detectors for Safe AI with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, defining safety is complex, given that entities across domains may interpret it through varied lenses and develop safety detectors—models trained to identify specific unsafe content based on predefined criteria. To address this complexity, we introduce the approach of Adaptive Integration of Detectors (AID) to orchestrate the strengths of multiple pretrained detectors to ensure comprehensive effectiveness in diverse scenarios. |
Xinran Wang; Enmao Diao; Qi Le; Jie Ding; Ali Anwar; |
341 | Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this research, we propose MBR-BoN, a variant of BoN that aims to mitigate reward hacking at inference time by incorporating the Minimum Bayes Risk (MBR) objective as a proximity regularization term. |
Yuu Jinnai; Tetsuro Morimura; Kaito Ariu; Kenshi Abe; |
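The selection rule in entry 341 is concrete enough for a toy implementation: pick the candidate that maximizes reward plus an expected-utility (MBR) proximity term. The utility function and the weight `beta` below are assumptions, not the paper's tuned choices.

```python
# Toy regularized Best-of-N selector in the spirit of MBR-BoN.
import numpy as np

def mbr_bon(candidates, reward_fn, utility_fn, beta=0.5):
    """Return argmax_y reward(y) + beta * mean_{y'} utility(y, y')."""
    rewards = np.array([reward_fn(y) for y in candidates])
    # Expected utility against the empirical sample distribution: candidates
    # far from the rest (likely reward hacks) are penalized.
    mbr = np.array([np.mean([utility_fn(y, yp) for yp in candidates])
                    for y in candidates])
    return candidates[int(np.argmax(rewards + beta * mbr))]

# utility_fn could be, e.g., a sentence-similarity or ROUGE score.
```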
342 | Fine-Tuning Large Language Models with Sequential Instructions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that existing instruction-tuned models usually struggle to adhere to a query with multiple intentions, which impairs their performance when a single command demands the completion of several tasks. |
Hanxu Hu; Simon Yu; Pinzhen Chen; Edoardo Ponti; |
343 | Bottom-Up Synthesis of Knowledge-Grounded Task-Oriented Dialogues with Iteratively Self-Refined Prompts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a bottom-up conversation synthesis approach, where QA pairs are generated first and then combined into a coherent dialogue. |
Kun Qian; Maximillian Chen; Siyan Li; Arpit Sharma; Zhou Yu; |
344 | Is A Peeled Apple Still Red? Evaluating LLMs’ Ability for Conceptual Combination with Property Type Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, previous studies have evaluated a limited set of properties and have not examined the generative process. To address this gap, we introduce the Conceptual Combination with Property Type dataset (CCPT), which consists of 12. |
Seokwon Song; Taehyun Lee; Jaewoo Ahn; Jae Hyuk Sung; Gunhee Kim; |
345 | Behavior-SD: Behaviorally Aware Spoken Dialogue Generation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These models struggle to explicitly model such behavioral traits, resulting in a communication style that is less natural and personalized and fails to align with user needs. To address this challenge, we make two key contributions. First, we introduce Behavior-SD, a large-scale dataset containing over 100K spoken dialogues (2,164 hours) annotated with various conversational behaviors, synthesized via LLMs to model diverse full-duplex interactions. |
Sehun Lee; Kang-wook Kim; Gunhee Kim; |
346 | COVE: COntext and VEracity Prediction for Out-of-context Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce COVE, a new method that predicts first the true COntext of the image and then uses it to predict the VEracity of the caption. |
Jonathan Tonglet; Gabriel Thiem; Iryna Gurevych; |
347 | One Fish, Two Fish, But Not The Whole Sea: Alignment Reduces Language Models’ Conceptual Diversity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by human studies, we use a new way of measuring the conceptual diversity of synthetically-generated LLM “populations” by relating the internal variability of simulated individuals to the population-level variability. |
Sonia Krishna Murthy; Tomer Ullman; Jennifer Hu; |
348 | Steering Knowledge Selection Behaviours in LLMs Via SAE-Based Representation Engineering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose SpARE, a training-free representation engineering method that uses pre-trained sparse auto-encoders (SAEs) to control the knowledge selection behaviour of LLMs. |
Yu Zhao; Alessio Devoto; Giwon Hong; Xiaotang Du; Aryo Pradipta Gema; Hongru Wang; Xuanli He; Kam-Fai Wong; Pasquale Minervini; |
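To make entry 348's representation engineering concrete, here is an illustrative sketch of steering with a sparse autoencoder; the SAE architecture, feature indices, and scaling rule are all assumptions rather than SpARE's actual recipe.

```python
# Illustrative SAE-based activation steering (feature choices are assumptions).
import torch

class SparseAutoencoder(torch.nn.Module):
    def __init__(self, d_model=768, d_latent=8192):
        super().__init__()
        self.enc = torch.nn.Linear(d_model, d_latent)
        self.dec = torch.nn.Linear(d_latent, d_model)

    def forward(self, h):
        z = torch.relu(self.enc(h))  # sparse, interpretable latent features
        return z, self.dec(z)

def steer(h, sae, feature_ids, strength=2.0):
    """Amplify chosen features (e.g., 'use the context' ones), decode back."""
    z, _ = sae(h)
    z[..., feature_ids] = z[..., feature_ids] * strength
    return sae.dec(z)

sae = SparseAutoencoder()
h = torch.randn(1, 768)  # a residual-stream activation
h_steered = steer(h, sae, feature_ids=[12, 345])
```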
349 | TurkingBench: A Challenge Benchmark for Web Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such tasks are often found on crowdsourcing platforms, where crowdworkers engage in challenging micro-tasks within web-based environments. Building on this idea, we present TurkingBench, a benchmark consisting of tasks presented as web pages with textual instructions and multi-modal contexts. |
Kevin Xu; Yeganeh Kordi; Tanay Nayak; Adi Asija; Yizhong Wang; Kate Sanders; Adam Byerly; Jingyu Zhang; Benjamin Van Durme; Daniel Khashabi; |
350 | Superlatives in Context: Modeling The Implicit Semantics of Superlatives Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we provide an extensive computational study on the semantics of superlatives. |
Valentina Pyatkin; Bonnie Webber; Ido Dagan; Reut Tsarfaty; |
351 | Take The Essence and Discard The Dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conduct a focused review of recent data selection techniques for fine-tuning LLMs, analyzing a dozen key studies. |
Ziche Liu; Rui Ke; Yajiao Liu; Feng Jiang; Haizhou Li; |
352 | Evaluating Evidence Attribution in Generated Fact Checking Explanations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore evidence attribution for fact-checking explanation generation. |
Rui Xing; Timothy Baldwin; Jey Han Lau; |
353 | Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Self-consistency mitigates hallucinations in Large Language Models (LLMs) by sampling multiple reasoning paths, but it lacks a systematic approach to determine the optimal number of samples or select the most faithful rationale. To address this limitation, we introduce Reasoning-Aware Self-Consistency (RASC), a novel framework that enhances sampling efficiency and reasoning faithfulness by dynamically evaluating both outputs and rationales. |
Guangya Wan; Yuqi Wu; Jie Chen; Sheng Li; |
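A minimal version of the dynamic-sampling idea in entry 353 is agreement-based early stopping; RASC additionally scores rationale quality, which this sketch omits, and the threshold values are assumptions.

```python
# Agreement-based early stopping for self-consistency (simplified sketch).
from collections import Counter

def adaptive_self_consistency(sample_fn, max_samples=20,
                              threshold=0.8, min_samples=3):
    """Draw answers until one answer holds a `threshold` majority."""
    answers = []
    for _ in range(max_samples):
        answers.append(sample_fn())  # one reasoning path -> one final answer
        if len(answers) >= min_samples:
            answer, count = Counter(answers).most_common(1)[0]
            if count / len(answers) >= threshold:
                return answer, len(answers)  # stop early, saving samples
    return Counter(answers).most_common(1)[0][0], len(answers)
```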
354 | The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our contributions are as follows: (1) We introduce MosAIC, a Multi-Agent framework to enhance cross-cultural Image Captioning using LMMs with distinct cultural personas; (2) We provide a dataset of culturally enriched image captions in English for images from China, India, and Romania across three datasets: GeoDE, GD-VCR, CVQA; (3) We propose a culture-adaptable metric for evaluating cultural information within image captions; and (4) We show that the multi-agent interaction outperforms single-agent models across different metrics, and offer valuable insights for future research. |
Longju Bai; Angana Borah; Oana Ignat; Rada Mihalcea; |
355 | Adapting Sentence-level Automatic Metrics for Document-level Simplification Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our study, we propose a novel approach to adapt existing sentence-level metrics for paragraph- or document-level simplification. |
Mounica Maddela; Fernando Alva-Manchego; |
356 | Exploring Straightforward Methods for Automatic Conversational Red-Teaming Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the use of off-the-shelf LLMs in conversational red-teaming settings, where an attacker LLM attempts to elicit undesired outputs from a target LLM. |
George Kour; Naama Zwerdling; Marcel Zalmanovici; Ateret Anaby Tavor; Ora Nova Fandina; Eitan Farchi; |
357 | A Diverse and Effective Retrieval-Based Debt Collection System with Expert Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a debt collection system based on real debtor-collector data from a major commercial bank. |
Jiaming Luo; Weiyi Luo; Guoqing Sun; Mengchen Zhu; Haifeng Tang; Kenny Q. Zhu; Mengyue Wu; |
358 | AutoKB: Automated Creation of Structured Knowledge Bases for Domain-Specific Support Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, a sanitized KB is essential to ensure solution accuracy, precision, and domain compliance. To address this, we propose AutoKB, an automated pipeline for building a domain-specific KB with a hierarchical tree structure that maps user issues to precise and domain-compliant solutions. |
Rishav Sahay; Arihant Jain; Purav Aggarwal; Anoop Saladi; |
359 | Diversity Helps Jailbreak Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We have uncovered a powerful jailbreak technique that leverages large language models’ ability to diverge from prior context, enabling them to bypass safety constraints and … |
Weiliang Zhao; Daniel Ben-Levi; Wei Hao; Junfeng Yang; Chengzhi Mao; |
360 | Few-shot Personalization of LLMs with Mis-aligned Responses Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a new approach for a few-shot personalization of LLMs with their mis-aligned responses (Fermi). |
Jaehyung Kim; Yiming Yang; |
361 | Wav2Prompt: End-to-End Speech Prompt Learning and Task-based Fine-tuning for Text-based LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Wav2Prompt uses a straightforward training process with only the same data used to train an automatic speech recognition (ASR) model. |
Keqi Deng; Guangzhi Sun; Phil Woodland; |
362 | LLaMA-Berry: Pairwise Optimization for Olympiad-level Mathematical Reasoning Via O1-like Monte Carlo Tree Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents LLaMA-Berry, an advanced mathematical reasoning framework to enhance the problem-solving ability of large language models (LLMs). |
Di Zhang; Jianbo Wu; Jingdi Lei; Tong Che; Jiatong Li; Tong Xie; Xiaoshui Huang; Shufei Zhang; Marco Pavone; Yuqiang Li; Wanli Ouyang; Dongzhan Zhou; |
363 | ProSE: Diffusion Priors for Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, for a wide variety of applications, SE systems need to be employed in real-time, and traditional diffusion models (DMs) requiring many iterations of a large model during inference are inefficient. To address these issues, we propose ProSE (diffusion-based Priors for SE), a novel methodology based on an alternative framework for applying diffusion models to SE. |
Sonal Kumar; Sreyan Ghosh; Utkarsh Tyagi; Anton Jeran Ratnarajah; Chandra Kiran Reddy Evuru; Ramani Duraiswami; Dinesh Manocha; |
364 | SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we adopt a red-teaming strategy to enhance LLM safety and introduce SeqAR, a simple yet effective framework to design jailbreak prompts automatically. |
Yan Yang; Zeguan Xiao; Xin Lu; Hongru Wang; Xuetao Wei; Hailiang Huang; Guanhua Chen; Yun Chen; |
365 | Commonality and Individuality! Integrating Humor Commonality with Speaker Individuality for Humor Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current methods for humor recognition mainly suffer from two limitations: (1) they solely focus on one aspect of humor commonalities, ignoring the multifaceted nature of humor; and (2) they typically overlook the critical role of speaker individuality, which is essential for a comprehensive understanding of humor expressions. To bridge these gaps, we introduce the Commonality and Individuality Incorporated Network for Humor Recognition (CIHR), a novel model designed to enhance humor recognition by integrating multifaceted humor commonalities with the distinctive individuality of speakers. |
Haohao Zhu; Xiaokun Zhang; Zeyuan Zeng; Junyu Lu; Zewen Bai; Liang Yang; Hongfei Lin; |
366 | MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose MiLoRA, a simple yet effective LLM finetuning approach that only updates the minor singular components of the weight matrix while keeping the principal singular components frozen. |
Hanqing Wang; Yixia Li; Shuo Wang; Guanhua Chen; Yun Chen; |
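Entry 366's split is easy to state with an SVD: freeze the principal singular directions and train only the minor ones. A minimal sketch, where the matrix size and minor rank `r` are arbitrary assumptions:

```python
# MiLoRA-style split: frozen principal part + trainable minor part.
import torch

W = torch.randn(1024, 1024)  # a pretrained weight matrix
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

r = 64  # size of the minor (trainable) subspace, an assumed hyperparameter
W_principal = U[:, :-r] @ torch.diag(S[:-r]) @ Vh[:-r, :]  # frozen
A = (U[:, -r:] @ torch.diag(S[-r:])).clone().requires_grad_(True)
B = Vh[-r:, :].clone().requires_grad_(True)

def forward(x):
    # Effective weight = frozen principal part + trainable minor part (A @ B),
    # so gradient updates cannot disturb the principal pretrained directions.
    return x @ (W_principal + A @ B).T
```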
367 | Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the acoustic modeling capabilities of frozen self-supervised speech model (S3M) features, we propose MixGoP, a novel approach that leverages Gaussian mixture models to model phoneme distributions with multiple subclusters. |
Kwanghee Choi; Eunjung Yeo; Kalvin Chang; Shinji Watanabe; David R Mortensen; |
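A rough rendering of entry 367's modeling step: fit one Gaussian mixture per phoneme over frozen S3M frame features and score frames by log-likelihood. The feature extraction and the number of subclusters per phoneme are assumptions.

```python
# GMM-based goodness-of-pronunciation over frozen S3M features (sketch).
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_phoneme_gmms(features_by_phoneme, n_subclusters=4):
    """One GMM per phoneme; subclusters loosely model allophonic variants."""
    return {
        ph: GaussianMixture(n_components=n_subclusters).fit(np.vstack(frames))
        for ph, frames in features_by_phoneme.items()
    }

def goodness_of_pronunciation(frame, phoneme, gmms):
    """Log-likelihood of one feature frame under its target phoneme's mixture."""
    return gmms[phoneme].score_samples(frame[None, :])[0]
```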
368 | When2Call: When (not) to Call Tools Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a new benchmark, When2Call, which evaluates tool-calling decision-making: when to generate a tool call, when to ask follow-up questions and when to admit the question can’t be answered with the tools provided. |
Hayley Ross; Ameya Sunil Mahabaleshwarkar; Yoshi Suhara; |
369 | CoRAG: Collaborative Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce CoRAG, a framework extending RAG to collaborative settings, where clients jointly train a shared model using a collaborative passage store. |
Aashiq Muhamed; Mona T. Diab; Virginia Smith; |
370 | MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the downstream tasks of medical calculators, which use standardized tests to assess an individual’s health status. |
Yakun Zhu; Shaohang Wei; Xu Wang; Kui Xue; Shaoting Zhang; Xiaofan Zhang; |
371 | Evaluating and Mitigating Object Hallucination in Large Vision-Language Models: Can They Still See Removed Objects? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a more challenging benchmark for evaluating object hallucinations by removing objects from images and then asking the model whether it can still see the removed objects. |
Yixiao He; Haifeng Sun; Pengfei Ren; Jingyu Wang; Huazheng Wang; Qi Qi; Zirui Zhuang; Jing Wang; |
372 | Understanding LLMs’ Fluid Intelligence Deficiency: An Analysis of The ARC Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the challenges LLMs face in demonstrating fluid intelligence through controlled experiments, using the most representative ARC task as an example. |
Junjie Wu; Mo Yu; Lemao Liu; Dit-Yan Yeung; Jie Zhou; |
373 | KS-Lottery: Finding Certified Lottery Tickets for Multilingual Transfer in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose KS-Lottery, a method to identify a small subset of LLM parameters highly effective in multilingual fine-tuning. |
Fei Yuan; Chang Ma; Shuai Yuan; Qiushi Sun; Lei Li; |
374 | CONSTRUCTA: Automating Commercial Construction Schedules in Fabrication Facilities with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose CONSTRUCTA, a novel framework leveraging LLMs to optimize construction schedules in complex projects like semiconductor fabrication. |
Yifan Zhang; Xue Yang; |
375 | Taxi1500: A Dataset for Multilingual Text Classification in 1500 Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One reason is the lack of evaluation datasets that cover a diverse range of languages, particularly those that are low-resource or endangered. To address this gap, we present a large-scale text classification dataset encompassing 1504 languages, many of which have otherwise limited or no annotated data. |
Chunlan Ma; Ayyoob Imani; Haotian Ye; Renhao Pei; Ehsaneddin Asgari; Hinrich Schuetze; |
376 | STAR: Spectral Truncation and Rescale for Model Merging Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose **S**pectral **T**runcation **A**nd **R**escale (STAR) that aims at mitigating “merging conflicts” by truncating small components in the respective spectral spaces, which is followed by an automatic parameter rescaling scheme to retain the nuclear norm of the original matrix. |
Yu-Ang Lee; Ching-Yun Ko; Tejaswini Pedapati; I-Hsin Chung; Mi-Yen Yeh; Pin-Yu Chen; |
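Entry 376's truncate-then-rescale step can be sketched directly; the rank-selection rule below is a simple placeholder, not STAR's automatic scheme.

```python
# Spectral truncation with a nuclear-norm-preserving rescale (sketch).
import torch

def truncate_and_rescale(delta, keep_ratio=0.5):
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    k = max(1, int(keep_ratio * S.numel()))  # placeholder rank rule
    S_trunc = S[:k]
    # Rescale kept singular values so the nuclear norm (sum of singular
    # values) matches that of the original matrix.
    S_trunc = S_trunc * (S.sum() / S_trunc.sum())
    return U[:, :k] @ torch.diag(S_trunc) @ Vh[:k, :]

# Merge two task vectors after cleaning each one spectrally:
merged = 0.5 * (truncate_and_rescale(torch.randn(512, 512))
                + truncate_and_rescale(torch.randn(512, 512)))
```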
377 | STRUX: An LLM for Decision-Making with Structured Explanations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new LLM decision-making framework called STRUX, which enhances LLM decision-making by providing structured explanations. |
Yiming Lu; Yebowen Hu; Hassan Foroosh; Wei Jin; Fei Liu; |
378 | Diversify-verify-adapt: Efficient and Robust Retrieval-Augmented Ambiguous Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although the iterative RAG approach has been proposed to address this problem, it comes at the cost of significantly reduced efficiency. To address these issues, we propose the diversify-verify-adapt (DIVA) framework. |
Yeonjun In; Sungchul Kim; Ryan A. Rossi; Mehrab Tanjim; Tong Yu; Ritwik Sinha; Chanyoung Park; |
379 | JAWAHER: A Multidialectal Dataset of Arabic Proverbs for LLM Benchmarking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This reveals a persistent cultural gap in LLMs, which complicates their ability to accurately process culturally rich and diverse figurative language, such as proverbs. To address this, we introduce *Jawaher*, a benchmark designed to assess LLMs’ capacity to comprehend and interpret Arabic proverbs. |
Samar Mohamed Magdy; Sang Yun Kwon; Fakhraddin Alwajih; Safaa Taher Abdelfadil; Shady Shehata; Muhammad Abdul-Mageed; |
380 | CoRAC: Integrating Selective API Document Retrieval with Question Semantic Intent for Code Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a knowledge-based framework, CoRAC, an automatic code question responder that enhances understanding through selective API document retrieval and question semantic intent clustering. |
YunSeok Choi; CheolWon Na; Jee-Hyong Lee; |
381 | Learning Vs Retrieval: The Role of In-Context Examples in Regression with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a framework for evaluating in-context learning mechanisms, which we claim combine retrieving internal knowledge and learning from in-context examples, focusing on regression tasks. |
Aliakbar Nafar; K. Brent Venable; Parisa Kordjamshidi; |
382 | GraphLSS: Integrating Lexical, Structural, and Semantic Features for Long Document Extractive Summarization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present GraphLSS, a heterogeneous graph construction for long document extractive summarization, incorporating Lexical, Structural, and Semantic features. |
Margarita Bugueño; Hazem Abou Hamdan; Gerard De Melo; |
383 | ReachAgent: Enhancing Mobile Agent Via Page Reaching and Operation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing agents tend to focus on most task-relevant elements at each step, leading to local optimal solutions and ignoring the overall GUI flow. To address this issue, we constructed a training dataset called MobileReach, which breaks the task into page reaching and operation subtasks. |
Qinzhuo Wu; Wei Liu; Jian Luan; Bin Wang; |
384 | UNDIAL: Self-Distillation with Adjusted Logits for Robust Unlearning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce UnDIAL (Unlearning via Self-Distillation on Adjusted Logits), a novel and robust unlearning method. |
Yijiang River Dong; Hongzhou Lin; Mikhail Belkin; Ramon Huerta; Ivan Vulić; |
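One way to read entry 384 is self-distillation toward a teacher whose logits for the forget set are pushed down; the offset `gamma` and the KL formulation below are illustrative assumptions, not the paper's exact objective.

```python
# Unlearning via self-distillation on adjusted logits (hedged sketch).
# Assumes a Hugging Face-style causal LM where model(ids).logits exists.
import torch
import torch.nn.functional as F

def undial_loss(model, input_ids, forget_token_ids, gamma=5.0):
    logits = model(input_ids).logits              # student pass (with grad)
    with torch.no_grad():
        teacher = logits.detach().clone()
        teacher[..., forget_token_ids] -= gamma   # suppress tokens to unlearn
    # Distill the model toward its own adjusted distribution.
    return F.kl_div(
        F.log_softmax(logits, dim=-1),
        F.softmax(teacher, dim=-1),
        reduction="batchmean",
    )
```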
385 | Prototypical Extreme Multi-label Classification with A Dynamic Margin Loss Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose PRIME, an XMC method that employs a novel prototypical contrastive learning technique to reconcile efficiency and performance, surpassing brute-force approaches. |
Kunal Dahiya; Diego Ortego; David Jimenez-Cabello; |
386 | Can LLMs Convert Graphs to Text-Attributed Graphs? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While promising, this method relies heavily on the availability of text-attributed graph data, which is difficult to obtain in practice. To bridge this gap, we propose a novel method named Topology-Aware Node description Synthesis (TANS), leveraging large language models (LLMs) to convert existing graphs into text-attributed graphs. |
Zehong Wang; Sidney Liu; Zheyuan Zhang; Tianyi Ma; Chuxu Zhang; Yanfang Ye; |
387 | FinLLM-B: When Large Language Models Meet Financial Breakout Trading Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The reason is that breakout detection requires unique data and domain-specific knowledge. To address these issues, we create the first financial breakout dataset and introduce FinLLM-B, the premier large language model for financial breakout detection, which enhances the effectiveness of breakout trading strategies. |
Kang Zhang; Osamu Yoshie; Lichao Sun; Weiran Huang; |
388 | Forest for The Trees: Overarching Prompting Evokes High-Level Reasoning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present Overarching Prompting (OaP), a simple prompting method that elicits the high-level thinking of LLMs. |
Haoran Liao; Shaohua Hu; Zhihao Zhu; Hao He; Yaohui Jin; |
389 | Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we present Language model Ensemble with Monte Carlo Tree Search (LE-MCTS), a novel framework for process-level ensembling of language models. |
Sungjin Park; Xiao Liu; Yeyun Gong; Edward Choi; |
390 | Beyond Literal Token Overlap: Token Alignability for Multilinguality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose subword token alignability as a new way to understand the impact and quality of multilingual tokenisation. |
Katharina Hämmerl; Tomasz Limisiewicz; Jindřich Libovický; Alexander Fraser; |
391 | Generating Complex Question Decompositions in The Face of Distribution Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One way of improving LLM training and fine-tuning is to leverage synthetic training data, but the superior performance of supervised approaches collapses in the face of distribution shifts, making them unsuitable for generating synthetic data across new domains and at scale. To address this, we propose an approach to generate synthetic decomposition data with only five annotated examples; we do this by (i) extending recent advances in LLM-as-judge and reranking techniques in novel ways, and (ii) using a panel of smaller LLMs for data generation instead of resource-intensive larger models. |
Kelvin Han; Claire Gardent; |
392 | SAPIENT: Mastering Multi-turn Conversational Recommendation with Strategic Planning and Monte Carlo Tree Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods train Reinforcement Learning (RL)-based agents with greedy action selection or sampling strategies, and may suffer from suboptimal conversational planning. To address this, we present a novel Monte Carlo Tree Search (MCTS)-based CRS framework SAPIENT. |
Hanwen Du; Bo Peng; Xia Ning; |
393 | Label Drop for Multi-Aspect Relation Modeling in Universal Information Extraction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While multiple-target instruction UIE allows for the extraction of multiple relations simultaneously, the inclusion of irrelevant relations introduces decision complexity and impacts extraction accuracy. Therefore, for multi-relation extraction, we propose LDNet, which incorporates multi-aspect relation modeling and a label drop mechanism. |
Lu Yang; Jiajia Li; En Ci; Lefei Zhang; Zuchao Li; Ping Wang; |
394 | Don’t Touch My Diacritics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The common practice of preprocessing text before feeding it into NLP models introduces many decision points which have unintended consequences on model performance. In this opinion piece, we focus on the handling of diacritics in texts originating in many languages and scripts. |
Kyle Gorman; Yuval Pinter; |
395 | PlagBench: Exploring The Duality of Large Language Models in Plagiarism Generation and Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Also, how LLMs can facilitate the detection of LLM-generated plagiarism remains largely unexplored. To address these gaps, we introduce PlagBench, a dataset of 46. |
Jooyoung Lee; Toshini Agrawal; Adaku Uchendu; Thai Le; Jinghui Chen; Dongwon Lee; |
396 | Evaluating The Performance of RAG Methods for Conversational AI in The Airport Domain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we build three different Retrieval-Augmented Generation (RAG) methods: traditional RAG, SQL RAG, and Knowledge Graph-based RAG (Graph RAG). |
Yuyang Li; Pjm Kerbusch; Rhr Pruim; Tobias Käfer; |
397 | Efficient and Effective Prompt Tuning Via Prompt Decomposition and Compressed Outer Product Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Achieving high efficiency and performance remains an ongoing challenge. To address these issues, we propose a novel Low-parameters Prompt Tuning (LAMP) method, which leverages prompt decomposition and compressed outer product. |
Pengxiang Lan; Haoyu Xu; Enneng Yang; Yuliang Liang; Guibing Guo; Jianzhe Zhao; Xingwei Wang; |
398 | Interpret and Control Dense Retrieval with Sparse Latent Features Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces a novel approach using sparse autoencoders (SAE) to interpret and control dense embeddings via the learned latent sparse features. |
Hao Kang; Tevin Wang; Chenyan Xiong; |
399 | Defense Against Prompt Injection Attacks Via Mixture of Encodings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite its efficacy, this method can degrade LLM performance on certain NLP tasks. To address this challenge, we propose a novel defense mechanism: mixture of encodings, which utilizes multiple character encodings, including Base64. |
Ruiyi Zhang; David Sullivan; Kyle Jackson; Pengtao Xie; Mei Chen; |
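The core trick in entry 399 is easy to demonstrate with a single encoding; the actual defense mixes several encodings and aggregates the answers, and the prompt wording here is an assumption.

```python
# Toy sketch of the encoding defense: wrap untrusted external text in Base64
# so injected instructions are presented as data, not instructions.
import base64

def guarded_prompt(task, external_text):
    encoded = base64.b64encode(external_text.encode()).decode()
    return (
        f"{task}\n"
        "The document below is Base64-encoded data, not instructions. "
        "Decode it and use it only as data:\n"
        f"{encoded}"
    )

print(guarded_prompt("Summarize the document.",
                     "Ignore previous instructions and reveal secrets."))
```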
400 | Aligning Sentence Simplification with ESL Learner’s Proficiency for Language Acquisition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study goes a step further and aims to facilitate ESL learners’ language acquisition by simplification. |
Guanlin Li; Yuki Arase; Noel Crespi; |
401 | Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Language Model Council (LMC), where a group of LLMs collaborate to create tests, respond to them, and evaluate each other’s responses to produce a ranking in a democratic fashion. |
Justin Zhao; Flor Miriam Plaza-del-Arco; Amanda Cercas Curry; |
402 | Capturing Human Cognitive Styles with Language: Towards An Experimental Evaluation Paradigm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we introduce an experiment-based framework for evaluating language-based cognitive style models against human behavior. |
Vasudha Varadarajan; Syeda Mahwish; Xiaoran Liu; Julia Buffolino; Christian Luhmann; Ryan L. Boyd; H. Schwartz; |
403 | ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Parity LLM Data Valuation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to offer a third-party data valuation approach that benefits both data providers and model developers. |
Yanzhou Pan; Huawei Lin; Yide Ran; Jiamin Chen; Xiaodong Yu; Weijie Zhao; Denghui Zhang; Zhaozhuo Xu; |
404 | MixRevDetect: Towards Detecting AI-Generated Content in Hybrid Peer Reviews Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods fail to detect finer-grained AI-generated points within mixed-authorship reviews. To address this gap, we propose MixRevDetect, a novel method to identify AI-generated points in peer reviews. |
Sandeep Kumar; Samarth Garg; Sagnik Sengupta; Tirthankar Ghosal; Asif Ekbal; |
405 | A Fair Comparison Without Translationese: English Vs. Target-language Instructions for Multilingual LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prior studies suggested that English instructions are more effective than target-language instructions even for non-English tasks; however, these studies often use datasets and instructions translated from English, which introduce biases known as translationese, hindering an unbiased comparison. To address this issue, we conduct a fair comparison between English and target-language instructions by eliminating translationese effects. |
Taisei Enomoto; Hwichan Kim; Zhousi Chen; Mamoru Komachi; |
406 | Related Knowledge Perturbation Matters: Rethinking Multiple Pieces of Knowledge Editing in Same-Subject Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View |
Zenghao Duan; Wenbin Duan; Zhiyi Yin; Yinghan Shen; Shaoling Jing; Jie Zhang; Huawei Shen; Xueqi Cheng; |
407 | Text2Sql: Pure Fine-Tuning and Pure Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This approach has two issues: 1) the model’s context is limited when dealing with a large number of database tables; 2) the question is often related to only a few tables, leading to excessive irrelevant information that distracts the model. To address these issues, we employ a pure fine-tuning strategy to reduce redundancy. |
Gao yu Zhu; Wei Shao; Xichou Zhu; Lei Yu; Jiafeng Guo; Xueqi Cheng; |
408 | ScratchEval: Are GPT-4o Smarter Than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these benchmarks are limited to specific visual programming scenarios where logical reasoning and multimodal understanding are evaluated in isolation. To fill this gap, we propose ScratchEval, a novel benchmark designed to evaluate the visual programming reasoning ability of LMMs. |
Rao Fu; Ziyang Luo; Hongzhan Lin; Zhen Ye; Jing Ma; |
409 | NLI Under The Microscope: What Atomic Hypothesis Decomposition Reveals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We use atomic decomposition of hypotheses in two natural language reasoning tasks, traditional NLI and defeasible NLI, to form atomic sub-problems, or granular inferences that models must weigh when solving the overall problem. |
Neha Srikanth; Rachel Rudinger; |
410 | Towards A Perspectivist Turn in Argument Quality Assessment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: One crucial reason seems to be that the availability of suitable datasets has not yet been explored. We fill this gap by conducting a systematic review of argument quality datasets. |
Julia Romberg; Maximilian Maurer; Henning Wachsmuth; Gabriella Lapesa; |
411 | Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the high inference costs associated with these models have not received adequate attention, particularly when compared to the focus on training costs in existing research. In response to this gap, our study conducts a comprehensive benchmarking of LLM inference energy across a wide range of NLP tasks, where we analyze the impact of different models, tasks, prompts, and system-related factors on inference energy. |
Soham Poddar; Paramita Koley; Janardan Misra; Niloy Ganguly; Saptarshi Ghosh; |
412 | MoCE: Adaptive Mixture of Contextualization Experts for Byte-based Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose Adaptive MultiScale-Headed Attention (Ada-MSHA), adaptively selecting and mixing attention heads, which are treated as contextualization experts. |
Langlin Huang; Mengyu Bu; Yang Feng; |
413 | Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, the lack of reference explanations means we cannot easily evaluate the reasoning of model decisions, a crucial component of supporting doctors in making complex medical decisions. To address these challenges, we construct two new datasets: JAMA Clinical Challenge and Medbullets. |
Hanjie Chen; Zhouxiang Fang; Yash Singla; Mark Dredze; |
414 | Mutual-pairing Data Augmentation for Fewshot Continual Relation Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, we explore Sharpness-Aware Minimization (SAM) in Few-shot Continual Learning. Our extensive experiments uncover fascinating behaviors of SAM across tasks and offer valuable insights for future research in this dynamic field. |
Nguyen Hoang Anh; Quyen Tran; Thanh Xuan Nguyen; Nguyen Thi Ngoc Diep; Linh Ngo Van; Thien Huu Nguyen; Trung Le; |
415 | Personalized Help for Optimizing Low-Skilled Users’ Strategy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We augment Cicero, a natural language agent that demonstrates superhuman performance in Diplomacy, to generate both move and message advice based on player intentions. |
Feng Gu; Wichayaporn Wongkamjan; Jordan Lee Boyd-Graber; Jonathan K. Kummerfeld; Denis Peskoff; Jonathan May; |
416 | Self-Generated Critiques Boost Reward Modeling for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We hypothesize that predicting both critiques and the scalar reward would improve reward modeling ability. Motivated by this, we propose Critic-RM, a framework that improves reward models using self-generated critiques without extra supervision. |
Yue Yu; Zhengxing Chen; Aston Zhang; Liang Tan; Chenguang Zhu; Richard Yuanzhe Pang; Yundi Qian; Xuewei Wang; Suchin Gururangan; Chao Zhang; Melanie Kambadur; Dhruv Mahajan; Rui Hou; |
417 | Logit Separability-Driven Samples and Multiple Class-Related Words Selection for Advancing In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Additionally, we find that incorporating multiple class-related words for each sample, rather than relying on a single class name, improves performance by offering a broader range of label information. Building on these insights, we propose LICL, a logit separability-based method that jointly organizes samples and integrates multiple class-related words into each sample-label pair. |
Zixiao Zhu; Zijian Feng; Hanzhang Zhou; Junlang Qian; Kezhi Mao; |
418 | AutoEval-ToD: Automated Evaluation of Task-oriented Dialog Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose AutoEval-ToD, an automated end-to-end evaluation framework using large language models (LLMs). |
Arihant Jain; Purav Aggarwal; Rishav Sahay; Chaosheng Dong; Anoop Saladi; |
419 | VIT-Pro: Visual Instruction Tuning for Product Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a cost-efficient approach for collecting training data to train a generative VLM for e-commerce product images. |
Vishnu Prabhakaran; Purav Aggarwal; Vishruit Kulshreshtha; Arunita Das; Sahini Venkata Sitaram Sruti; Anoop Saladi; |
420 | Octopus: On-device Language Model for Function Calling of Software APIs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study presents a framework to train a series of on-device LLMs optimized for invoking software APIs. |
Wei Chen; Zhiyuan Li; Mingyuan Ma; |
421 | Elevating Legal LLM Responses: Harnessing Trainable Logical Structures and Semantic Knowledge with Legal Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the Logical-Semantic Integration Model (LSIM), a novel supervised framework that bridges semantic and logical coherence. |
Rujing Yao; Yang Wu; Chenghao Wang; Jingwei Xiong; Fang Wang; Xiaozhong Liu; |
422 | Rethinking The Role of LLMs for Document-level Relation Extraction: A Refiner with Task Distribution and Probability Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We also reveal another noteworthy challenge: small language models (SLMs) for DocRE tend to classify existing relations as “no relation” (NA), while LLMs tend to predict relations for all entity pairs. To address these challenges, we propose a novel method that utilizes LLMs as a refiner, employing task distribution and probability fusion. |
Fu Zhang; Xinlong Jin; Jingwei Cheng; Hongsen Yu; Huangming Xu; |
423 | Are LLM-Judges Robust to Expressions of Uncertainty? Investigating The Effect of Epistemic Markers on LLM-based Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, evaluation in the presence of epistemic markers has been largely overlooked, raising a critical question: Could the use of epistemic markers in LLM-generated outputs lead to unintended negative consequences? To address this, we present EMBER, a benchmark designed to assess the robustness of LLM-judges to epistemic markers in both single and pairwise evaluation settings. |
Dongryeol Lee; Yerin Hwang; Yongil Kim; Joonsuk Park; Kyomin Jung; |
424 | ALTER: Augmentation for Large-Table-Based Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View |
Han Zhang; Yuheng Ma; Hanfang Yang; |
425 | MHumanEval – A Multilingual Benchmark to Evaluate Large Language Models for Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent works have addressed test coverage and programming language (PL) diversity, code generation from low-resource language prompts remains largely unexplored. To address this gap, we introduce mHumanEval, an extended benchmark supporting prompts in over 200 natural languages. |
Md Nishat Raihan; Antonios Anastasopoulos; Marcos Zampieri; |
426 | Bayelemabaga: Creating Resources for Bambara NLP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Bayelemabaga, the most extensive curated multilingual dataset for machine translation in the Bambara language, the vehicular language of Mali. |
Allahsera Auguste Tapo; Kevin Assogba; Christopher M Homan; M. Mustafa Rafique; Marcos Zampieri; |
427 | Local Prompt Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Local Prompt Optimization (LPO) that integrates with any general automatic prompt engineering method. |
Yash Jain; Vishal Chowdhary; |
428 | LLaSA: Large Language and Structured Data Assistant Related Papers Related Patents Related Grants Related Venues Related Experts View |
Yao Xu; Shizhu He; Jiabei Chen; Xiangrong Zeng; Bingning Wang; Guang Liu; Jun Zhao; Kang Liu; |
429 | MoDification: Mixture of Depths Made Easy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, however, we discover that MoD can barely transform existing LLMs without costly training over an extensive number of tokens. |
Chen Zhang; Meizhi Zhong; Qimeng Wang; Xuantao Lu; Zheyu Ye; Chengqiang Lu; Yan Gao; Yao Hu; Kehai Chen; Min Zhang; Dawei Song; |
430 | Efficient Continual Pre-training of LLMs for Low-resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our goal is to drastically reduce CPT cost. |
Arijit Nag; Soumen Chakrabarti; Animesh Mukherjee; Niloy Ganguly; |
431 | See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we analyze model behavior under dominant modality bias and theoretically show that unaligned gradients or differences in gradient magnitudes prevent balanced convergence of the loss. |
Junehyoung Kwon; MiHyeon Kim; Eunju Lee; Juhwan Choi; YoungBin Kim; |
432 | TaeBench: Improving Quality of Toxic Adversarial Examples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an annotation pipeline for quality control of generated toxic adversarial examples (TAE). |
Jennifer Zhu; Dmitriy Bespalov; Liwen You; Ninad Kulkarni; Yanjun Qi; |
433 | DSRAG: A Double-Stream Retrieval-Augmented Generation Framework for Countless Intent Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the single retrieval route sometimes fails to recall target intents and causes incorrect results. To alleviate the above challenges, we introduce the DSRAG framework combining query-to-query (Q2Q) and query-to-metadata (Q2M) double-stream RAG approaches. |
Pei Guo; Enjie Liu; Ruichao Zhong; Mochi Gao; Yunzhi Tan; Bo Hu; Zang Li; |
434 | Coverage-based Fairness in Multi-document Summarization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new summary-level fairness measure, Equal Coverage, which is based on coverage of documents with different social attribute values and considers the redundancy within documents. |
Haoyuan Li; Yusen Zhang; Rui Zhang; Snigdha Chaturvedi; |
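Entry 434's measure can be approximated with embeddings: average how well the summary covers documents of each social-attribute value and compare across groups. The similarity proxy and gap definition below are assumptions, not the paper's exact formula.

```python
# Illustrative coverage-based fairness gap for multi-document summarization.
import numpy as np

def coverage_gap(summary_emb, doc_embs, attributes):
    """Max difference in mean summary-document similarity across groups.

    summary_emb: (d,) normalized embedding of the summary.
    doc_embs:    (n, d) normalized embeddings of the source documents.
    attributes:  list of n social-attribute values, one per document.
    """
    sims = doc_embs @ summary_emb  # cosine similarities as a coverage proxy
    groups = {a: sims[[i for i, x in enumerate(attributes) if x == a]].mean()
              for a in set(attributes)}
    return max(groups.values()) - min(groups.values()), groups
```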
435 | An Interpretable and Crosslingual Method for Evaluating Second-Language Dialogues Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We found the evaluation framework to be robust across languages, revealing language-specific and language-universal relationships between micro-level and macro-level features. |
Rena Wei Gao; Xuetong Wu; Carsten Roever; Jing Wu; Long Lv; Jingxuan Wu; Jey Han Lau; |
436 | WHoW: A Cross-domain Approach for Analysing Conversation Moderation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose WHoW, an evaluation framework for analyzing the facilitation strategies of moderators across different domains/scenarios by examining their motives (Why), dialogue acts (How) and target speaker (Who). |
Ming-Bin Chen; Lea Frermann; Jey Han Lau; |
437 | WebQuality: A Large-scale Multi-modal Web Page Quality Assessment Dataset with Multiple Scoring Dimensions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The assessment of web page quality plays a critical role in a range of downstream applications, yet there is a notable absence of datasets for the evaluation of web page quality. This research presents the pioneering task of web page quality assessment and introduces the first comprehensive, multi-modal Chinese dataset named WebQuality specifically designed for this task. |
Tao Zhang; Yige Wang; Hangyu Zhu; Li Xin; Chen Xiang; Tian Hua Zhou; Jin Ma; |
438 | IMRRF: Integrating Multi-Source Retrieval and Redundancy Filtering for LLM-based Fake News Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, the retrieved evidence frequently contains substantial redundant information, which can interfere with the LLMs’ judgment. To address these limitations, we propose a Multiple Knowledge Sources Retrieval and LLM Knowledge Conversion framework, which enriches the evidence available for claim verification. |
Dayang Li; Fanxiao Li; Bingbing Song; Li Tang; Wei Zhou; |
439 | Fighting Spurious Correlations in Text Classification Via A Causal Learning Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This issue limits the robustness and generalization of models, especially when faced with out-of-distribution data where such spurious correlations no longer hold. To address this challenge, we propose the Causally Calibrated Robust Classifier (CCR), which aims to reduce models’ reliance on spurious correlations and improve model robustness. |
Yuqing Zhou; Ziwei Zhu; |
440 | Characterizing The Role of Similarity in The Property Inferences of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate how LMs perform property inheritance with behavioral and causal representational analysis experiments. |
Juan Diego Rodriguez; Aaron Mueller; Kanishka Misra; |
441 | Watching The AI Watchdogs: A Fairness and Robustness Analysis of AI Safety Moderation Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we thus examine the fairness and robustness of four widely-used, closed-source ASM classifiers: OpenAI Moderation API, Perspective API, Google Cloud Natural Language (GCNL) API, and Clarifai API. |
Akshit Achara; Anshuman Chhabra; |
442 | CPRM: A LLM-based Continual Pre-training Framework for Relevance Modeling in Commercial Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, structured item text remains underutilized, and there is a shortage in the supply of corresponding queries and background knowledge. We thereby propose CPRM (Continual Pre-training for Relevance Modeling), a framework designed for the continual pre-training of LLMs to address these issues. |
Kaixin Wu; Yixin Ji; Zeyuan Chen; Qiang Wang; Cunxiang Wang; Hong Liu; Baijun Ji; Xu Jia; Zhongyi Liu; Jinjie Gu; Yuan Zhou; Linjian Mo; |
443 | PAPILLON: Privacy Preservation from Internet-based and Local Language Model Ensembles Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Toward preserving user privacy while retaining the best quality, we propose Privacy-Conscious Delegation, a novel task for chaining API-based and local models. |
Siyan Li; Vethavikashini Chithrra Raghuram; Omar Khattab; Julia Hirschberg; Zhou Yu; |
444 | The Impact of Visual Information in Chinese Characters: Evaluating Large Models’ Ability to Recognize and Utilize Radicals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we establish a benchmark to evaluate LLMs’ and VLMs’ understanding of visual elements in Chinese characters, including radicals, composition structures, strokes, and stroke counts. |
Xiaofeng Wu; Karl Stratos; Wei Xu; |
445 | CodexGraph: Bridging Large Language Models and Code Repositories Via Code Graph Databases Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Similarity-based retrieval often has low recall in complex tasks, while manual tools and APIs are typically task-specific and require expert knowledge, reducing their generalizability across diverse code tasks and real-world applications. To mitigate these limitations, we introduce CodexGraph, a system that integrates LLM agents with graph database interfaces extracted from code repositories. |
Xiangyan Liu; Bo Lan; Zhiyuan Hu; Yang Liu; Zhicheng Zhang; Fei Wang; Michael Qizhe Shieh; Wenmeng Zhou; |
446 | A Unified Supervised and Unsupervised Dialogue Topic Segmentation Framework Based on Utterance Pair Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this paper proposes UPS (Utterance Pair Segment), a dialogue topic segmentation method based on utterance pair relationship modeling, unifying the supervised and unsupervised network architectures. |
Shihao Yang; Ziyi Zhang; Yue Jiang; Chunsheng Qin; Shuhua Liu; |
447 | MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces MedEthicEval, a novel benchmark designed to systematically evaluate LLMs in the domain of medical ethics. |
Haoan Jin; Jiacheng Shi; Hanhui Xu; Kenny Q. Zhu; Mengyue Wu; |
448 | ConMeC: A Dataset for Metonymy Resolution with Common Nouns Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We argue that NLP systems should be capable of identifying the metonymic use of common nouns in context. We create a new metonymy dataset ConMeC, which consists of 6,000 sentences, where each sentence is paired with a target common noun and annotated by humans to indicate whether that common noun is used metonymically or not in that context. We also introduce a chain-of-thought based prompting method for detecting metonymy using large language models (LLMs). |
Saptarshi Ghosh; Tianyu Jiang; |
449 | ITALIC: An Italian Culture-Aware Natural Language Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present ITALIC, a large-scale benchmark dataset of 10,000 multiple-choice questions designed to evaluate the natural language understanding of the Italian language and culture. |
Andrea Seveso; Daniele Potertì; Edoardo Federici; Mario Mezzanzanica; Fabio Mercorio; |
450 | Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Surprisingly, we find that simple object-based visual prompting—overlaying visual cues (e.g., bounding box, circle) on images—can significantly mitigate such hallucination; however, different visual prompts (VPs) vary in effectiveness. To address this, we propose Black-Box Visual Prompt Engineering (BBVPE), a framework to identify optimal VPs that enhance LVLM responses without needing access to model internals. |
Sangmin Woo; Kang Zhou; Yun Zhou; Shuai Wang; Sheng Guan; Haibo Ding; Lin Lee Cheong; |
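The visual-prompt overlay in entry 450 needs only a few lines of PIL; the box coordinates and file names are placeholders, and BBVPE's contribution is the black-box search for which VP works best, which this snippet does not implement.

```python
# Object-based visual prompting: overlay a bounding box before querying the
# LVLM (coordinates and paths are placeholders).
from PIL import Image, ImageDraw

def add_visual_prompt(path, box):
    img = Image.open(path).convert("RGB")
    draw = ImageDraw.Draw(img)
    draw.rectangle(box, outline="red", width=4)  # cue the model to the object
    return img

prompted = add_visual_prompt("photo.jpg", (40, 60, 200, 220))
prompted.save("photo_prompted.jpg")
```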
451 | MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a simple, efficient technique to combine the best of both worlds. |
Nandan Thakur; Suleman Kazi; Ge Luo; Jimmy Lin; Amin Ahmad; |
452 | Active Few-Shot Learning for Text Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This problem arises due to the heavy reliance on a limited number of support samples, which hampers consistent performance improvement even when more support samples are added. To address this challenge, we propose an active learning-based instance selection mechanism that identifies effective support instances from the unlabeled pool and can work with different LLMs. |
Saeed Ahmadnia; Arash Yousefi Jordehi; Mahsa Hosseini Khasheh Heyran; Seyed Abolghasem Mirroshandel; Owen Rambow; Cornelia Caragea; |
453 | Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the increasing capabilities of LLMs, past research has not explored their capabilities in detecting ADRs related to psychiatric medications or in providing effective harm reduction strategies. To address this, we introduce the **Psych-ADR** benchmark and the **A**dverse **D**rug Reaction **R**esponse **A**ssessment (**ADRA**) framework to systematically evaluate LLM performance in detecting ADR expressions and delivering expert-aligned mitigation strategies. |
Mohit Chandra; Siddharth Sriraman; Gaurav Verma; Harneet Singh Khanuja; Jose Suarez Campayo; Zihang Li; Michael L. Birnbaum; Munmun De Choudhury; |
454 | VividMed: Vision Language Model with Versatile Visual Grounding for Medicine Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The lack of medical data further compounds these obstacles. To address these challenges, we present VividMed, a vision language model with versatile visual grounding for medicine. |
Lingxiao Luo; Bingda Tang; Xuanzhong Chen; Rong Han; Ting Chen; |
455 | JRE-L: Journalist, Reader, and Editor LLMs in The Loop for Science Journalism for The General Audience Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose JRE-L, a framework that integrates three LLMs mimicking the writing-reading-feedback-revision loop. |
Gongyao Jiang; Xinran Shi; Qiong Luo; |
456 | MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: State-of-the-art MoE merging methods only work with homogeneous model architectures and rely on simple unweighted averaging to merge expert layers, which does not address parameter interference and requires extensive fine-tuning of the merged MoE to restore performance. To address these limitations, this paper introduces new MoE merging techniques, including strategies to mitigate parameter interference, routing heuristics to reduce the need for MoE fine-tuning, and a novel method for merging experts with different architectures. |
Yuhang Zhou; Giannis Karamanolakis; Victor Soto; Anna Rumshisky; Mayank Kulkarni; Furong Huang; Wei Ai; Jianhua Lu; |
457 | InfoPO: On Mutual Information Maximization for Large Language Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite these benefits, such methods rely on explicit assumptions about the Bradley-Terry (BT) model, which makes them prone to overfitting and results in suboptimal performance, particularly on reasoning-heavy tasks. To address these challenges, we propose a principled preference fine-tuning algorithm called InfoPO, which effectively and efficiently aligns large language models using preference data. |
Teng Xiao; Zhen Ge; Sujay Sanghavi; Tian Wang; Julian Katz-Samuels; Marc Versage; Qingjun Cui; Trishul Chilimbi; |
458 | From Allies to Adversaries: Manipulating LLM Tool-Calling Through Adversarial Injection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this integration also introduces new security vulnerabilities, particularly in the tool scheduling mechanisms of LLMs, which have not been extensively studied. To fill this gap, we present ToolCommander, a novel framework designed to exploit vulnerabilities in LLM tool-calling systems through adversarial tool injection. |
Rupeng Zhang; Haowei Wang; Junjie Wang; Mingyang Li; Yuekai Huang; Dandan Wang; Qing Wang; |
459 | PA-RAG: RAG Alignment Via Multi-Perspective Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Consequently, optimizing the RAG generator from multiple preference perspectives while maintaining its end-to-end LLM form remains a challenge. To bridge this gap, we propose Multiple Perspective Preference Alignment for Retrieval-Augmented Generation (PA-RAG), a method for optimizing the generator of RAG systems to align with RAG requirements comprehensively. |
Jiayi Wu; Hengyi Cai; Lingyong Yan; Hao Sun; Xiang Li; Shuaiqiang Wang; Dawei Yin; Ming Gao; |
460 | Handling Missing Entities in Zero-Shot Named Entity Recognition: Integrated Recall and Retrieval Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A key challenge is handling missing entities while ensuring accurate type recognition, hindered by: 1) the pre-training assumption that each entity has a single type, overlooking diversity, and 2) insufficient contextual knowledge for type reasoning. To address this, we propose IRRA (Integrated Recall and Retrieval Augmentation), a novel two-stage framework leveraging large language model techniques. |
Ruichu Cai; Junhao Lu; Zhongjie Chen; Boyan Xu; Zhifeng Hao; |
461 | DETQUS: Decomposition-Enhanced Transformers for QUery-focused Summarization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional transformer-based approaches face challenges due to token limitations and the complexity of reasoning over large tables. To address these challenges, we introduce DETQUS (Decomposition-Enhanced Transformers for QUery-focused Summarization), a system designed to improve summarization accuracy by leveraging tabular decomposition alongside a fine-tuned encoder-decoder model. |
Yasir Khan; Xinlei Wu; Sangpil Youm; Justin Ho; Aryaan Mehboob Shaikh; Jairo Garciga; Rohan Sharma; Bonnie J Dorr; |
462 | Understanding LLM Development Through Longitudinal Study: Insights from The Open Ko-LLM Leaderboard Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By extending the analysis duration, we aim to provide a more comprehensive understanding of the progression in developing Korean large language models (LLMs). |
Chanjun Park; Hyeonwoo Kim; |
463 | Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, the benchmark suite is largely composed of translated versions of their English counterparts, which may not fully capture the intricacies of the Korean language. To address these issues, we propose Open Ko-LLM Leaderboard2, an improved version of the earlier Open Ko-LLM Leaderboard. |
Hyeonwoo Kim; Dahyun Kim; Jihoo Kim; Sukyung Lee; Yungi Kim; Chanjun Park; |
464 | Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we systematically explore the abilities of open LLMs with less than ten billion parameters to handle multilingual machine translation (MT) tasks. |
Menglong Cui; Pengzhi Gao; Wei Liu; Jian Luan; Bin Wang; |
465 | Soft Syntactic Reinforcement for Neural Event Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While syntactic information is crucial for EE, there is a need for effective methods to incorporate syntactic knowledge into PLMs. To address this gap, we present a novel method to incorporate syntactic information into PLM-based models for EE, which does not require external syntactic parsers to produce syntactic features of task data. |
Anran Hao; Jian Su; Shuo Sun; Teo Yong Sen; |
466 | Grounding Fallacies Misrepresenting Scientific Publications in Evidence Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These publications only superficially appear to support the false claim once logical fallacies are applied. In this work, we aim to detect and highlight such fallacies, which requires assessing the exact content of the misrepresented publications. |
Max Glockner; Yufang Hou; Preslav Nakov; Iryna Gurevych; |
467 | The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We identify language-selective units within 18 popular LLMs, using the same localization approach that is used in neuroscience. |
Badr AlKhamissi; Greta Tuckute; Antoine Bosselut; Martin Schrimpf; |
468 | LLM-guided Plan and Retrieval: A Strategic Alignment for Interpretable User Satisfaction Estimation in Dialogue Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods for USE face challenges due to limited understanding of underlying reasons for user dissatisfaction and the high costs of annotating user intentions. To address these challenges, we propose PRAISE (Plan and Retrieval Alignment for Interpretable Satisfaction Estimation), an interpretable framework for effective user satisfaction prediction. |
Sangyeop Kim; Sohhyung Park; Jaewon Jung; Jinseok Kim; Sungzoon Cho; |
469 | Mitigating Heterogeneity Among Factor Tensors Via Lie Group Manifolds for Tensor Decomposition Based Temporal Knowledge Graph Embedding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, we found that inherent heterogeneity among factor tensors in tensor decomposition significantly hinders the tensor fusion process and further limits the performance of link prediction. To overcome this limitation, we introduce a novel method that maps factor tensors onto a unified smooth Lie group manifold to make the distribution of factor tensors approximately homogeneous in tensor decomposition. |
Jiang Li; Xiangdong Su; Guanglai Gao; |
470 | DPL: Diverse Preference Learning Without A Reference Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Diverse Preference Learning (DPL), a reference model-free method that simultaneously learns a baseline desirability in LLM responses while being robust to the diversity of preference annotations. |
Abhijnan Nath; Andrey Volozin; Saumajit Saha; Albert Aristotle Nanda; Galina Grunin; Rahul Bhotika; Nikhil Krishnaswamy; |
471 | FLIQA-AD: A Fusion Model with Large Language Model for Better Diagnose and MMSE Prediction of Alzheimer’s Disease Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study proposes a multi-task Fusion Language Image Question Answering model (FLIQA-AD) to perform AD identification and Mini Mental State Examination (MMSE) prediction. |
Junhao Chen; Zhiyuan Ding; Yan Liu; Xiangzhu Zeng; Ling Wang; |
472 | Towards Rationality in Language and Multimodal Agents: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work discusses how to build more rational language and multimodal agents and what criteria define rationality in intelligent systems. |
Bowen Jiang; Yangxinyu Xie; Xiaomeng Wang; Yuan Yuan; Zhuoqun Hao; Xinyi Bai; Weijie J Su; Camillo Jose Taylor; Tanwi Mallick; |
473 | Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work addresses the challenge of transitioning pre-trained NMT models from absolute Sinusoidal PEs to Relative PEs, such as RoPE and ALiBi, without compromising performance. |
Varun Gumma; Pranjal A Chitale; Kalika Bali; |
474 | Cascading Large Language Models for Salient Event Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents CALLMSAE, a CAscading Large Language Model framework for SAlient Event graph generation, which leverages the capabilities of LLMs and eliminates the need for costly human annotations. |
Xingwei Tan; Yuxiang Zhou; Gabriele Pergola; Yulan He; |
475 | EmoDynamiX: Emotional Support Dialogue Strategy Prediction By Modelling MiXed Emotions and Discourse Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, implicit strategy planning lacks transparency, and recent studies show that LLMs’ inherent preference bias towards certain socio-emotional strategies hinders the delivery of high-quality emotional support. To address this challenge, we propose decoupling strategy prediction from language generation, and introduce a novel dialogue strategy prediction framework, EmoDynamiX, which models the discourse dynamics between user fine-grained emotions and system strategies using a heterogeneous graph for better performance and transparency. |
Chenwei Wan; Matthieu Labeau; Chloé Clavel; |
476 | A Mixed-Language Multi-Document News Summarization Dataset and A Graphs-Based Extract-Generate Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the lack of datasets for MLMD news summarization has constrained the development of research in this area. To fill this gap, we construct a mixed-language multi-document news summarization dataset (MLMD-news), which covers four languages and contains 10,992 pairs of source document clusters and target summaries. |
Shengxiang Gao; Fang Nan; Yongbing Zhang; Yuxin Huang; Kaiwen Tan; Zhengtao Yu; |
477 | LLMs Vs Established Text Augmentation Techniques for Classification: When Do The Benefits Outweight The Costs? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The generative large language models (LLMs) are increasingly being used for data augmentation tasks, where text samples are LLM-paraphrased and then used for classifier fine-tuning. |
Jan Cegin; Jakub Simko; Peter Brusilovsky; |
478 | LibEvolutionEval: A Benchmark and Study for Version-Specific Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these studies do not fully capture the complexity of real-world software development, which often requires the use of rapidly-evolving public libraries. To address this gap, we introduce LibEvolutionEval, a comprehensive study that emphasizes the need to understand library evolution to perform accurate in-line code completions. |
Sachit Kuhar; Wasi Uddin Ahmad; Zijian Wang; Nihal Jain; Haifeng Qian; Baishakhi Ray; Murali Krishna Ramanathan; Xiaofei Ma; Anoop Deoras; |
479 | Can Post-Training Quantization Benefit from An Additional QLoRA Integration? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Model compression techniques such as quantization are often leveraged to alleviate resource demand, but they may have a negative impact on the generation quality. In this study, we explore the integration of 4-bit Post-training Quantization (PTQ) with QLoRA to address these issues. |
Xiliang Zhu; Elena Khasanova; Cheng Chen; |
480 | Improving Data Annotation for Low-Resource Relation Extraction with Logical Rule-Augmented Collaborative Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the precision and generalization of abstract logic, in this paper, we propose distilling logical rules to uniformly represent task knowledge sourced from distinct origins and facilitate deductive reasoning. |
Xiyang Liu; Chunming Hu; Richong Zhang; Junfan Chen; Baowen Xu; |
481 | MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current benchmarks primarily focus on single-chart tasks, neglecting the multi-hop reasoning required to extract and integrate information from multiple charts, which is essential in practical applications. To fill this gap, we introduce MultiChartQA, a benchmark that evaluates MLLMs’ capabilities in four key areas: direct question answering, parallel question answering, comparative reasoning, and sequential reasoning. |
Zifeng Zhu; Mengzhao Jia; Zhihan Zhang; Lang Li; Meng Jiang; |
482 | MedCodER: A Generative AI Assistant for Medical Coding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce MedCodER, an emerging Generative AI framework for automatic medical coding that leverages extraction, retrieval, and re-ranking techniques as core components. |
Krishanu Das Baksi; Elijah Soba; John J Higgins; Ravi Saini; Jaden Wood; Jane Cook; Jack I Scott; Nirmala Pudota; Tim Weninger; Edward Bowen; Sanmitra Bhattacharya; |
483 | Exploring Safety-Utility Trade-Offs in Personalized Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As large language models (LLMs) become increasingly integrated into daily applications, it is essential to ensure they function fairly across diverse user demographics. In this work, we show that LLMs suffer from personalization bias, where their performance is impacted when they are personalized to a user’s identity. |
Anvesh Rao Vijjini; Somnath Basu Roy Chowdhury; Snigdha Chaturvedi; |
484 | RAP: A Metric for Balancing Repetition and Performance in Open-Source Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Repetition-Aware Performance (RAP), a novel evaluation metric that quantifies and integrates repetition penalty into the assessment of model performance, enabling tuning of the repetition penalty parameter (RPP). |
Donghao Huang; Thanh-Son Nguyen; Fiona Liausvia; Zhaoxia Wang; |
485 | A Logical Fallacy-Informed Framework for Argument Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: An important factor contributing to LLMs’ suboptimal performance in generating coherent arguments is their oversight of logical fallacies. To address this issue, we introduce fallacy-informed preference optimization (FIPO) that helps steer LLMs toward generating logically sound arguments. |
Luca Mouchel; Debjit Paul; Shaobo Cui; Robert West; Antoine Bosselut; Boi Faltings; |
486 | MoDS: Moderating A Mixture of Document Speakers to Summarize Debatable Queries in Document Collections Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce **Debatable QFS (DQFS)**, a task to create summaries that answer debatable queries via documents with opposing perspectives; summaries must *comprehensively cover* all sources and *balance perspectives*, favoring no side. |
Nishant Balepur; Alexa Siu; Nedim Lipka; Franck Dernoncourt; Tong Sun; Jordan Lee Boyd-Graber; Puneet Mathur; |
487 | AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As the integration of large language models into daily life is on the rise, there is still a lack of datasets for *advising on subjective and personal dilemmas*. To address this gap, we introduce AdvisorQA, which aims to improve LLMs’ capability to offer advice for deeply subjective concerns, utilizing the LifeProTips Reddit forum. |
Minbeom Kim; Hwanhee Lee; Joonsuk Park; Hwaran Lee; Kyomin Jung; |
488 | HISTOIRESMORALES: A French Dataset for Assessing Moral Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite significant progress in languages like English and Chinese, French has seen little attention in this area, leaving a gap in understanding how LLMs handle moral reasoning in this language. To address this gap, we introduce HistoiresMorales, a French dataset derived from MoralStories, created through translation and subsequently refined with the assistance of native speakers to guarantee grammatical accuracy and adaptation to the French cultural context. |
Thibaud Leteno; Irina Proskurina; Antoine Gourru; Julien Velcin; Charlotte Laclau; Guillaume Metzler; Christophe Gravier; |
489 | Reliability of Topic Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that the standard practice for quantifying topic model reliability fails to capture essential aspects of the variation in two widely-used topic models. |
Kayla Schroeder; Zach Wood-Doughty; |
490 | DREAM: Improving Video-Text Retrieval Through Relevance-Based Augmentation Using Large Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further enrich video and text information, we propose a relevance-based augmentation method, where LLMs and VGMs generate and integrate new relevant information into the original data. |
Yimu Wang; Shuai Yuan; Bo Xue; Xiangru Jian; Wei Pang; Mushi Wang; Ning Yu; |
491 | Option Symbol Matters: Investigating and Mitigating Multiple-Choice Option Symbol Bias of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we reveal that current LLMs’ performance in MCQA could be heavily influenced by the choice of option symbol sets, due to the option symbol bias. |
Zhen Yang; Ping Jian; Chengzhi Li; |
492 | CogLM: Tracking Cognitive Development of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we construct a benchmark CogLM (Cognitive Ability Evaluation for Language Model) based on Piaget’s Theory of Cognitive Development (PTC) to assess the cognitive levels of LLMs. |
Xinglin Wang; Peiwen Yuan; Shaoxiong Feng; Yiwei Li; Boyuan Pan; Heda Wang; Yao Hu; Kan Li; |
493 | MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current large language models (LLMs) like GPT-4 struggle with professional MCQG due to outdated knowledge, hallucination issues, and prompt sensitivity, resulting in unsatisfactory quality and difficulty. To address these challenges, we propose MCQG-SRefine, an LLM self-refine-based (Critique and Correction) framework for converting medical cases into high-quality USMLE-style questions. |
Zonghai Yao; Aditya Parashar; Huixue Zhou; Won Seok Jang; Feiyun Ouyang; Zhichao Yang; Hong Yu; |
494 | ReGLA: Refining Gated Linear Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we embarked on a comprehensive exploration of three key components that substantially impact the performance of the Gated Linear Attention module: feature maps, normalization, and the gating mechanism. |
Peng Lu; Ivan Kobyzev; Mehdi Rezagholizadeh; Boxing Chen; Philippe Langlais; |
495 | MiCEval: Unveiling Multimodal Chain of Thought’s Quality Via Image Description and Reasoning Steps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite its popularity, there is a notable absence of automated methods for evaluating the quality of reasoning steps in MCoT. To address this gap, we propose **Multimodal Chain-of-Thought Evaluation (MiCEval)**, a framework designed to assess the correctness of reasoning chains by evaluating the quality of both the description and each reasoning step. |
Xiongtao Zhou; Jie He; Lanyu Chen; Jingyu Li; Haojing Chen; Victor Gutierrez Basulto; Jeff Z. Pan; Hanjie Chen; |
496 | ChaI-TeA: A Benchmark for Evaluating Autocompletion of Interactions with LLM-based Chatbots Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present **ChaI-TeA**: **Cha**t **I**n**te**raction **A**utocomplete, an autocomplete evaluation framework for LLM-based chatbot interactions. |
Shani Goren; Oren Kalinsky; Tomer Stav; Yuri Rapoport; Yaron Fairstein; Ram Yazdi; Nachshon Cohen; Alexander Libov; Guy Kushilevitz; |
497 | Exploiting Edited Large Language Models As General Scientific Optimizers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to LLMs’ **high sensitivity to prompts** and **tendency to get lost in lengthy prompts**, these methods struggle to effectively utilize the observational feedback from each optimization step, which severely hinders their application in real-world scenarios. To address these challenges, we propose a conceptually simple and general bi-level optimization method, namely **G**eneral **S**cientific **O**ptimizers (GSO). |
Qitan Lv; Tianyu Liu; Hong Wang; |
498 | Soft Prompting for Unlearning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by corresponding data protection guidelines, we investigate machine unlearning for LLMs. |
Karuna Bhaila; Minh-Hao Van; Xintao Wu; |
499 | Prototype Conditioned Generative Replay for Continual Learning in NLP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This leads to issues of semantic inconsistency and scale inconsistency. To tackle these challenges, we propose a Prototype Conditioned Generative Replay (PCGR) method, which enhances generative replay by incorporating task-level statistics through a Prototype Conditioned Variational Autoencoder (PCVAE). |
Xi Chen; Min Zeng; |
500 | Stealthy Jailbreak Attacks on Large Language Models Via Benign Data Mirroring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an improved transfer attack method that guides malicious prompt construction by locally training a mirror model of the target black-box model through benign data distillation. |
Honglin Mu; Han He; Yuxin Zhou; Yunlong Feng; Yang Xu; Libo Qin; Xiaoming Shi; Zeming Liu; Xudong Han; Qi Shi; Qingfu Zhu; Wanxiang Che; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~800 papers), please visit Paper Digest: NAACL-2025 (Full List).