Paper Digest: Recent Papers on Machine Translation
The Paper Digest Team extracted all recent machine translation papers on our radar and generated highlight sentences for them. The results are sorted by relevance and date. In addition to this ‘static’ page, we also provide a real-time version of this article, which has broader coverage and is continuously updated with the most recent work on this topic.
This curated list is created by the Paper Digest Team.
TABLE 1: Paper Digest: Recent Papers on Machine Translation
# | Paper | Author(s) | Source | Date |
---|---|---|---|---|
1 | Using Sign Language Production As Data Augmentation to Enhance Sign Language Translation Highlight: Here, we propose leveraging recent advancements in Sign Language Production to augment existing sign language datasets and enhance the performance of Sign Language Translation models. | Harry Walsh; Maksym Ivashechkin; Richard Bowden | arxiv-cs.CL | 2025-06-11 |
2 | TACTIC: Translation Agents with Cognitive-Theoretic Interactive Collaboration Highlight: These insights emphasize how human translators employ different cognitive strategies, such as balancing literal and free translation, refining expressions based on context, and iteratively evaluating outputs. To address this limitation, we propose a cognitively informed multi-agent framework called TACTIC, which stands for Translation Agents with Cognitive-Theoretic Interactive Collaboration. | Weiya Li et al. | arxiv-cs.CL | 2025-06-09 |
3 | ConECT Dataset: Overcoming Data Scarcity in Context-Aware E-Commerce MT Highlight: Neural Machine Translation (NMT) has improved translation by using Transformer-based models, but it still struggles with word ambiguity and context. | Mikołaj Pokrywka; Wojciech Kusa; Mieszko Rutkowski; Mikołaj Koszowski | arxiv-cs.CL | 2025-06-05 |
4 | Design of Intelligent Proofreading System for English Translation Based on CNN and BERT Highlight: This paper proposes a novel hybrid approach for robust proofreading that combines convolutional neural networks (CNN) with Bidirectional Encoder Representations from Transformers (BERT). | Feijun Liu; Huifeng Wang; Kun Wang; Yizhen Wang | arxiv-cs.CL | 2025-06-05 |
5 | It’s Not A Walk in The Park! Challenges of Idiom Translation in Speech-to-text Systems Highlight: In this paper, we systematically evaluate idiom translation as compared to conventional news translation in both text-to-text machine translation (MT) and speech-to-text translation (SLT) systems across two language pairs (German to English, Russian to English). | Iuliia Zaitova et al. | arxiv-cs.CL | 2025-06-03 |
6 | Different Speech Translation Models Encode and Translate Speaker Gender Differently Highlight: If so, what are the implications for the speaker’s gender assignment in translation? We address these questions from an interpretability perspective, using probing methods to assess gender encoding across diverse ST models. | Dennis Fucci et al. | arxiv-cs.CL | 2025-06-02 |
7 | How Programming Concepts and Neurons Are Shared in Code Language Models Highlight: In this paper, we investigate the relationship between multiple PLs and English in the concept space of LLMs. | Amir Hossein Kargaran; Yihong Liu; François Yvon; Hinrich Schütze | arxiv-cs.CL | 2025-06-01 |
8 | Translate With Care: Addressing Gender Bias, Neutrality, and Reasoning in Large Language Model Translations Highlight: We introduce the Translate-with-Care (TWC) dataset, comprising 3,950 challenging scenarios across six low- to mid-resource languages, to assess translation systems’ performance. | Pardis Sadat Zahraei; Ali Emami | arxiv-cs.CL | 2025-05-31 |
9 | CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation Highlight: In this work, we investigate whether images can act as cultural context in multimodal translation. | Emilio Villa-Cueva et al. | arxiv-cs.CL | 2025-05-30 |
10 | VietMix: A Naturally Occurring Vietnamese-English Code-Mixed Corpus with Iterative Augmentation for Machine Translation Highlight: Machine translation systems fail when processing code-mixed inputs for low-resource languages. We address this challenge by curating VietMix, a parallel corpus of naturally occurring code-mixed Vietnamese text paired with expert English translations. | Hieu Tran et al. | arxiv-cs.CL | 2025-05-30 |
11 | BeaverTalk: Oregon State University’s IWSLT 2025 Simultaneous Speech Translation System Highlight: This paper discusses the construction, fine-tuning, and deployment of BeaverTalk, a cascaded system for speech-to-text translation as part of the IWSLT 2025 simultaneous translation task. | Matthew Raffel; Victor Agostinelli; Lizhong Chen | arxiv-cs.CL | 2025-05-29 |
12 | TAT-R1: Terminology-Aware Translation with Reinforcement Learning and Word Alignment Highlight: In this paper, we propose TAT-R1, a terminology-aware translation model trained with reinforcement learning and word alignment. | Zheng Li; Mao Zheng; Mingyang Song; Wenjie Yang | arxiv-cs.CL | 2025-05-27 |
13 | GMU Systems for The IWSLT 2025 Low-Resource Speech Translation Shared Task Highlight: This paper describes the GMU systems for the IWSLT 2025 low-resource speech translation shared task. | Chutong Meng; Antonios Anastasopoulos | arxiv-cs.CL | 2025-05-27 |
14 | KIT’s Low-resource Speech Translation Systems for IWSLT2025: System Enhancement with Synthetic Data and Model Regularization Highlight: This paper presents KIT’s submissions to the IWSLT 2025 low-resource track. | Zhaolin Li et al. | arxiv-cs.CL | 2025-05-26 |
15 | SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation Highlight: We present Sequential Policy Optimization for Simultaneous Machine Translation (SeqPO-SiMT), a new policy optimization framework that defines the simultaneous machine translation (SiMT) task as a sequential decision making problem, incorporating a tailored reward to enhance translation quality while reducing latency. | Ting Xu; Zhichao Huang; Jiankai Sun; Shanbo Cheng; Wai Lam | arxiv-cs.CL | 2025-05-26 |
16 | Evaluating Machine Translation Models for English-Hindi Language Pairs: A Comparative Analysis Highlight: The study aims to provide insights into the effectiveness of different machine translation approaches in handling both general and specialized language domains. | Ahan Prasannakumar Shetty | arxiv-cs.CL | 2025-05-26 |
17 | How Well Do Large Reasoning Models Translate? A Comprehensive Evaluation for Multi-Domain Machine Translation Highlight: In this work, we compare the performance of LRMs with traditional LLMs across 15 representative domains and four translation directions. | Yongshi Ye; Biao Fu; Chongxuan Huang; Yidong Chen; Xiaodong Shi | arxiv-cs.CL | 2025-05-26 |
18 | MT$^{3}$: Scaling MLLM-based Text Image Machine Translation Via Multi-Task Reinforcement Learning Highlight: Recent advances in large-scale Reinforcement Learning (RL) have improved reasoning in Large Language Models (LLMs) and Multimodal LLMs (MLLMs), but their application to end-to-end TIMT is still underexplored. To bridge this gap, we introduce MT$^{3}$, the first framework to apply Multi-Task RL to MLLMs for end-to-end TIMT. | Zhaopeng Feng et al. | arxiv-cs.CL | 2025-05-26 |
19 | TULUN: Transparent and Adaptable Low-resource Machine Translation Highlight: While researchers have proposed various techniques for domain adaptation, these approaches typically require model fine-tuning, making them impractical for non-technical users and small organizations. To address this gap, we propose Tulun, a versatile solution for terminology-aware translation, combining neural MT with large language model (LLM)-based post-editing guided by existing glossaries and translation memories. | Raphaël Merx et al. | arxiv-cs.CL | 2025-05-24 |
20 | Building A Functional Machine Translation Corpus for Kpelle Highlight: In this paper, we introduce the first publicly available English-Kpelle dataset for machine translation, comprising over 2000 sentence pairs drawn from everyday communication, religious texts, and educational materials. | Kweku Andoh Yamoah; Jackson Weako; Emmanuel J. Dorley | arxiv-cs.CL | 2025-05-24 |
21 | Low-Resource NMT: A Case Study on The Written and Spoken Languages in Hong Kong Highlight: This paper describes a transformer-based neural machine translation (NMT) system for written-Chinese-to-written-Cantonese translation. | Hei Yi Mak; Tan Lee | arxiv-cs.CL | 2025-05-23 |
22 | Mutarjim: Advancing Bidirectional Arabic-English Translation with A Small Language Model Highlight: We introduce Mutarjim, a compact yet powerful language model for bidirectional Arabic-English translation. | Khalil Hennara et al. | arxiv-cs.CL | 2025-05-23 |
23 | Comparative Analysis of Subword Tokenization Approaches for Indian Languages Highlight: This paper examines how different subword tokenization techniques, such as SentencePiece, Byte Pair Encoding (BPE), and WordPiece Tokenization, affect ILs. | Sudhansu Bala Das; Samujjal Choudhury; Tapas Kumar Mishra; Bidyut Kr. Patra | arxiv-cs.CL | 2025-05-22 |
24 | SSR-Zero: Simple Self-Rewarding Reinforcement Learning for Machine Translation Highlight: However, most advanced MT-specific LLMs heavily rely on external supervision signals during training, such as human-annotated reference data or trained reward models (RMs), which are often expensive to obtain and challenging to scale. To overcome this limitation, we propose a Simple Self-Rewarding (SSR) Reinforcement Learning (RL) framework for MT that is reference-free, fully online, and relies solely on self-judging rewards. | Wenjie Yang; Mao Zheng; Mingyang Song; Zheng Li | arxiv-cs.CL | 2025-05-22 |
25 | Exploring In-Image Machine Translation with Real-World Background Highlight: To address the issue, we propose the DebackX model, which separates the background and text-image from the source image, performs translation on text-image directly, and fuses the translated text-image with the background, to generate the target image. | Yanzhi Tian; Zeming Liu; Zhengyang Liu; Yuhang Guo | arxiv-cs.CL | 2025-05-21 |
26 | FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation Highlight: In this paper, we present FuxiMT, a novel Chinese-centric multilingual machine translation model powered by a sparsified large language model (LLM). | Shaolin Zhu; Tianyu Dong; Bo Li; Deyi Xiong | arxiv-cs.CL | 2025-05-20 |
27 | TransBench: Benchmarking Machine Translation for Industrial-Scale Applications Highlight: Existing evaluation frameworks inadequately assess performance in specialized contexts, creating a gap between academic benchmarks and real-world efficacy. To address this, we propose a three-level translation capability framework: (1) Basic Linguistic Competence, (2) Domain-Specific Proficiency, and (3) Cultural Adaptation, emphasizing the need for holistic evaluation across these dimensions. | Haijun Li et al. | arxiv-cs.CL | 2025-05-20 |
28 | SlangDIT: Benchmarking LLMs in Interpretative Slang Translation Highlight: Based on the benchmark, we propose a deep thinking model, named SlangOWL. | Yunlong Liang; Fandong Meng; Jiaan Wang; Jie Zhou | arxiv-cs.CL | 2025-05-20 |
29 | ExTrans: Multilingual Deep Reasoning Translation Via Exemplar-Enhanced Reinforcement Learning Highlight: With a carefully designed lightweight reward modeling in RL, we can simply transfer the strong MT ability from a single direction into multiple (i.e., 90) translation directions and achieve impressive multilingual MT performance. | Jiaan Wang; Fandong Meng; Jie Zhou | arxiv-cs.CL | 2025-05-19 |
30 | Multi-head Temporal Latent Attention Highlight: This paper proposes Multi-head Temporal Latent Attention (MTLA), which further reduces the KV cache size along the temporal dimension, greatly lowering the memory footprint of self-attention inference. | Keqi Deng; Philip C. Woodland | arxiv-cs.LG | 2025-05-18 |
31 | LLM-Based Evaluation of Low-Resource Machine Translation: A Reference-less Dialect Guided Approach with A Refined Sylheti-English Benchmark Highlight: In this work, we propose a comprehensive framework that enhances LLM-based MT evaluation using a dialect guided approach. | Md. Atiqur Rahman; Sabrina Islam; Mushfiqul Haque Omi | arxiv-cs.CL | 2025-05-18 |
32 | Multilingual Machine Translation with Quantum Encoder Decoder Attention-based Convolutional Variational Circuits Highlight: Cloud-based multilingual translation services like Google Translate and Microsoft Translator achieve state-of-the-art translation capabilities. These services inherently use large multilingual language models such as GRU, LSTM, BERT, GPT, T5, or similar encoder-decoder architectures with attention mechanisms as the backbone. | Subrit Dikshit; Ritu Tiwari; Priyank Jain | arxiv-cs.CL | 2025-05-14 |
33 | Privacy-Preserving Real-Time Vietnamese-English Translation on IOS Using Edge AI Highlight: This research addresses the growing need for privacy-preserving and accessible language translation by developing a fully offline Neural Machine Translation (NMT) system for Vietnamese-English translation on iOS devices. | Cong Le | arxiv-cs.SE | 2025-05-12 |
34 | Do Not Change Me: On Transferring Entities Without Modification in Neural Machine Translation: A Multilingual Perspective Highlight: In this paper, we explore the abilities of popular NMT models, including models from the OPUS project, Google Translate, MADLAD, and EuroLLM, to preserve entities such as URL addresses, IBAN numbers, or emails when producing translations between four languages: English, German, Polish, and Ukrainian. | Dawid Wisniewski; Mikolaj Pokrywka; Zofia Rostek | arxiv-cs.CL | 2025-05-09 |
35 | TopicVD: A Topic-Based Dataset of Video-Guided Multimodal Machine Translation for Documentaries Highlight: As a result, they fail to meet the demands of real-world MMT tasks, such as documentary translation. In this study, we developed TopicVD, a topic-based dataset for video-supported multimodal machine translation of documentaries, aiming to advance research in this field. | Jinze Lv; Jian Chen; Zi Long; Xianghua Fu; Yin Chen | arxiv-cs.CL | 2025-05-08 |
36 | Data Augmentation With Back Translation for Low Resource Languages: A Case of English and Luganda Highlight: In this paper, we explore the application of back translation (BT) as a semi-supervised technique to enhance Neural Machine Translation (NMT) models for the English-Luganda language pair, specifically addressing the challenges faced by low-resource languages. | Richard Kimera; Dongnyeong Heo; Daniela N. Rim; Heeyoul Choi | arxiv-cs.CL | 2025-05-05 |
37 | AI-Assisted Human Evaluation of Machine Translation | Vilém Zouhar; Tom Kocmi; Mrinmaya Sachan | naacl | 2025-05-04 |
38 | The Impact of Domain-Specific Terminology on Machine Translation for Finance in European Languages Highlight: In this study, we present the first impact analysis of domain-specific terminology on multilingual MT for finance, focusing on European languages within the subdomain of macroeconomics. | Arturo Oncevay; Charese Smiley; Xiaomo Liu | naacl | 2025-05-04 |
39 | Detect, Disambiguate, and Translate: On-Demand Visual Reasoning for Multimodal Machine Translation with Large Vision-Language Models Highlight: These systems face three key issues: they often overlook that visual signals are unnecessary in many cases, they lack transparency in how visual information is used for disambiguation when needed, and they have yet to fully explore the potential of large-scale vision-language models (LVLMs) for MMT tasks. To address these issues, we propose the Detect, Disambiguate, and Translate (DeDiT) framework, the first reasoning-based framework for MMT leveraging LVLMs. | Danyang Liu et al. | naacl | 2025-05-04 |
40 | SwissADT: An Audio Description Translation System for Swiss Languages Highlight: In this work, we introduce SwissADT, an emerging ADT system for three main Swiss languages and English, designed for future use by our industry partners. | Lukas Fischer; Yingqiang Gao; Alexa Lintner; Annette Rios; Sarah Ebling | naacl | 2025-05-04 |
41 | Automatic Input Rewriting Improves Translation with Large Language Models Highlight: We present an empirical study of 21 input rewriting methods with 3 open-weight LLMs for translating from English into 6 target languages. | Dayeon Ki; Marine Carpuat | naacl | 2025-05-04 |
42 | FLEURS-ASL: Including American Sign Language in Massively Multilingual Multitask Evaluation Highlight: In order to help converge the fields, we introduce FLEURS-ASL, an extension of the multiway parallel benchmarks FLORES (for text) and FLEURS (for speech) to support their first sign language (as video), American Sign Language, translated by 5 Certified Deaf Interpreters. | Garrett Tanzer | naacl | 2025-05-04 |
43 | Characterizing The Effects of Translation on Intertextuality Using Multilingual Embedding Spaces Highlight: We investigate the use of multilingual embedding spaces to characterize the preservation of intertextuality, one common rhetorical device, across human and machine translation. | Hope McGovern; Hale Sirin; Tom Lippincott | naacl | 2025-05-04 |
44 | MHumanEval: A Multilingual Benchmark to Evaluate Large Language Models for Code Generation Highlight: While recent works have addressed test coverage and programming language (PL) diversity, code generation from low-resource language prompts remains largely unexplored. To address this gap, we introduce mHumanEval, an extended benchmark supporting prompts in over 200 natural languages. | Md Nishat Raihan; Antonios Anastasopoulos; Marcos Zampieri | naacl | 2025-05-04 |
45 | Large Language Models for Persian-English Idiom Translation Highlight: This paper introduces two parallel datasets of sentences containing idiomatic expressions for Persian→English and English→Persian translations, with Persian idioms sampled from our PersianIdioms resource, a collection of 2,200 idioms and their meanings, with 700 including usage examples. | Sara Rezaeimanesh; Faezeh Hosseini; Yadollah Yaghoobzadeh | naacl | 2025-05-04 |
46 | Same Evaluation, More Tokens: On The Effect of Input Length for Machine Translation Evaluation Using Large Language Models Highlight: Recent work has shown that large language models (LLMs) can serve as reliable and interpretable sentence-level translation evaluators via MQM error span annotations. | Tobias Domhan; Dawei Zhu | arxiv-cs.CL | 2025-05-03 |
47 | Team ACK at SemEval-2025 Task 2: Beyond Word-for-Word Machine Translation for English-Korean Pairs Highlight: We evaluate 13 models (LLMs and MT models) using automatic metrics and human assessment by bilingual annotators. | Daniel Lee et al. | arxiv-cs.CL | 2025-04-29 |
48 | To MT or Not to MT: An Eye-tracking Study on The Reception By Dutch Readers of Different Translation and Creativity Levels Highlight: This article presents the results of a pilot study involving the reception of a fictional short story translated from English into Dutch under four conditions: machine translation (MT), post-editing (PE), human translation (HT) and original source text (ST). | Kyo Gerrits; Ana Guerberof-Arenas | arxiv-cs.CL | 2025-04-28 |
49 | Optimising ChatGPT for Creativity in Literary Translation: A Case Study from English Into Dutch, Chinese, Catalan and Spanish Highlight: This study examines the variability of Chat-GPT machine translation (MT) outputs across six different configurations in four languages, with a focus on creativity in a literary text. | Shuxiang Du; Ana Guerberof Arenas; Antonio Toral; Kyo Gerrits; Josep Marco Borillo | arxiv-cs.CL | 2025-04-25 |
50 | Low-Resource Neural Machine Translation Using Recurrent Neural Networks and Transfer Learning: A Case Study on English-to-Igbo Highlight: In this study, we develop Neural Machine Translation (NMT) and Transformer-based transfer learning models for English-to-Igbo translation, targeting a low-resource African language spoken by over 40 million people across Nigeria and West Africa. | Ocheme Anthony Ekle; Biswarup Das | arxiv-cs.CL | 2025-04-24 |
51 | Memory Reviving, Continuing Learning and Beyond: Evaluation of Pre-trained Encoders and Decoders for Multimodal Machine Translation Highlight: In this work, we conduct a systematic study on the impact of pre-trained encoders and decoders in multimodal translation models. | Zhuang Yu; Shiliang Sun; Jing Zhao; Tengfei Song; Hao Yang | arxiv-cs.CL | 2025-04-24 |
52 | DIMT25@ICDAR2025: HW-TSC’s End-to-End Document Image Machine Translation System Leveraging Large Vision-Language Model Highlight: This paper presents the technical solution proposed by Huawei Translation Service Center (HW-TSC) for the End-to-End Document Image Machine Translation for Complex Layouts competition at the 19th International Conference on Document Analysis and Recognition (DIMT25@ICDAR2025). | Zhanglin Wu et al. | arxiv-cs.CV | 2025-04-24 |
53 | Comparing Large Language Models and Traditional Machine Translation Tools for Translating Medical Consultation Summaries: A Pilot Study Highlight: This study evaluates how well large language models (LLMs) and traditional machine translation (MT) tools translate medical consultation summaries from English into Arabic, Chinese, and Vietnamese. | Andy Li; Wei Zhou; Rashina Hoda; Chris Bain; Peter Poon | arxiv-cs.CL | 2025-04-23 |
54 | FairTranslate: An English-French Dataset for Gender Bias Evaluation in Machine Translation By Overcoming Gender Binarity Highlight: This paper presents FairTranslate, a novel, fully human-annotated dataset designed to evaluate non-binary gender biases in machine translation systems from English to French. | Fanny Jourdan; Yannick Chevalier; Cécile Favre | arxiv-cs.CL | 2025-04-22 |
55 | CLIRudit: Cross-Lingual Information Retrieval of Scientific Documents Highlight: By making the dataset and code publicly available, we aim to facilitate further research that will help make scientific knowledge more accessible across language barriers. | Francisco Valentini; Diego Kozlowski; Vincent Larivière | arxiv-cs.IR | 2025-04-22 |
56 | The Paradox of Poetic Intent in Back-Translation: Evaluating The Quality of Large Language Models in Chinese Translation Highlight: This study constructs a diverse corpus encompassing Chinese scientific terminology, historical translation paradoxes, and literary metaphors. | Li Weigang; Pedro Carvalho Brom | arxiv-cs.CL | 2025-04-22 |
57 | Trans-Zero: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data Highlight: The rise of Large Language Models (LLMs) has reshaped machine translation (MT), but multilingual MT still relies heavily on parallel data for supervised fine-tuning (SFT), facing challenges like data scarcity for low-resource languages and catastrophic forgetting. To address these issues, we propose TRANS-ZERO, a self-play framework that leverages only monolingual data and the intrinsic multilingual knowledge of LLM. | Wei Zou et al. | arxiv-cs.CL | 2025-04-20 |
58 | A Multimodal Recaptioning Framework to Account for Perceptual Diversity in Multilingual Vision-Language Modeling Highlight: Machine translation of captions has pushed multilingual capabilities in vision-language models (VLMs), but data comes mainly from English speakers, indicating a perceptual bias and lack of model flexibility. In this work, we address this challenge and outline a data-efficient framework to instill multilingual VLMs with greater understanding of perceptual diversity. | Kyle Buettner; Jacob Emmerson; Adriana Kovashka | arxiv-cs.CV | 2025-04-19 |
59 | Multilingual Contextualization of Large Language Models for Document-Level Machine Translation Highlight: In this work, we propose a method to improve LLM-based long-document translation through targeted fine-tuning on high-quality document-level data, which we curate and introduce as DocBlocks. | Miguel Moura Ramos; Patrick Fernandes; Sweta Agrawal; André F. T. Martins | arxiv-cs.CL | 2025-04-16 |
60 | Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models Highlight: In this paper, we propose Menu OCR and Translation Benchmark (MOTBench), a specialized evaluation framework emphasizing the pivotal role of menu translation in cross-cultural communication. | Zhanglin Wu et al. | arxiv-cs.LG | 2025-04-15 |
61 | Investigating Numerical Translation with Large Language Models Highlight: This study focuses on evaluating the reliability of LLM-based machine translation systems when handling numerical data. | W. Tang | icassp | 2025-04-15 |
62 | AskQE: Question Answering As Automatic Evaluation for Machine Translation Highlight: We introduce AskQE, a question generation and answering framework designed to detect critical MT errors and provide actionable feedback, helping users decide whether to accept or reject MT outputs even without the knowledge of the target language. | Dayeon Ki; Kevin Duh; Marine Carpuat | arxiv-cs.CL | 2025-04-15 |
63 | Textless Streaming Speech-to-Speech Translation Using Semantic Speech Tokens Highlight: In this paper, we propose a transducer-based speech translation model that outputs discrete speech tokens in a low-latency streaming fashion. | J. Zhao | icassp | 2025-04-15 |
64 | Non-Autoregressive Multimodal Machine Translation Highlight: In this paper, we propose the non-autoregressive language model (NA-LM) for multimodal machine translation. | G. Liu et al. | icassp | 2025-04-15 |
65 | Automated Python Translation Highlight: To that end, we introduce the task of automatically translating Python’s natural modality (keywords, error types, identifiers, etc.) into other human languages. | Joshua Otten; Antonios Anastasopoulos; Kevin Moran | arxiv-cs.CL | 2025-04-15 |
66 | MT-R1-Zero: Advancing LLM-based Machine Translation Via R1-Zero-like Reinforcement Learning Highlight: In this work, we introduce MT-R1-Zero, the first open-source adaptation of the R1-Zero RL framework for MT without supervised fine-tuning or cold-start. | Zhaopeng Feng et al. | arxiv-cs.CL | 2025-04-14 |
67 | Can You Map It to English? The Role of Cross-Lingual Alignment in Multilingual Performance of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For this purpose, we introduce cross-lingual alignment metrics such as the Discriminative Alignment Index (DALI) to quantify the alignment at an instance level for discriminative tasks. |
Kartik Ravisankar; Hyojung Han; Marine Carpuat; | arxiv-cs.CL | 2025-04-12 |
68 | Enhancing Contrastive Demonstration Selection with Semantic Diversity for Robust In-Context Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose DiverseConE (Diversity-Enhanced Contrastive Example Selection), a novel approach for demonstration selection in in-context learning for machine translation. |
Owen Patterson; Chee Ng; | arxiv-cs.CL | 2025-04-12 |
69 | Redefining Machine Translation on Social Network Services with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces RedTrans, a 72B LLM tailored for SNS translation, trained on a novel dataset developed through three innovations: (1) Supervised Finetuning with Dual-LLM Back-Translation Sampling, an unsupervised sampling method using LLM-based back-translation to select diverse data for large-scale finetuning; (2) Rewritten Preference Optimization (RePO), an algorithm that identifies and corrects erroneous preference pairs through expert annotation, building reliable preference corpora; and (3) RedTrans-Bench, the first benchmark for SNS translation, evaluating phenomena like humor localization, emoji semantics, and meme adaptation. |
HONGCHENG GUO et. al. | arxiv-cs.CL | 2025-04-10 |
70 | High-Resource Translation: Turning Abundance Into Accessibility Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel approach to constructing an English-to-Telugu translation model by leveraging transfer learning techniques and addressing the challenges associated with low-resource languages. |
Abhiram Reddy Yanampally; | arxiv-cs.CL | 2025-04-08 |
71 | GlotEval: A Test Suite for Massively Multilingual Evaluation of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing evaluation frameworks are disproportionately focused on English and a handful of high-resource languages, thereby overlooking the realistic performance of LLMs in multilingual and lower-resource scenarios. To address this gap, we introduce GlotEval, a lightweight framework designed for massively multilingual evaluation. |
HENGYU LUO et. al. | arxiv-cs.CL | 2025-04-05 |
72 | MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present the first systematic study on medical ST, to our best knowledge, by releasing MultiMed-ST, a large-scale ST dataset for the medical domain, spanning all translation directions in five languages: Vietnamese, English, German, French, Traditional Chinese and Simplified Chinese, together with the models. |
KHAI LE-DUC et. al. | arxiv-cs.CL | 2025-04-04 |
73 | Limitations of Religious Data and The Importance of The Target Domain: Towards Machine Translation for Guinea-Bissau Creole Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new dataset for machine translation of Guinea-Bissau Creole (Kiriol), comprising around 40 thousand parallel sentences to English and Portuguese. |
Jacqueline Rowe; Edward Gow-Smith; Mark Hepple; | arxiv-cs.CL | 2025-04-03 |
74 | State-of-the-Art Translation of Text-to-Gloss Using MBART: A Case Study of Bangla Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the results, this study proposes a new paradigm for the text-to-gloss task using mBART models. |
Sharif Md. Abdullah; Abhijit Paul; Shebuti Rayana; Ahmedul Kabir; Zarif Masud; | arxiv-cs.CL | 2025-04-03 |
75 | Overcoming Vocabulary Constraints with Pixel-level Fallback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to augment pretrained language models with a vocabulary-free encoder that generates input embeddings from text rendered as pixels. |
Jonas F. Lotz; Hendra Setiawan; Stephan Peitz; Yova Kementchedjhieva; | arxiv-cs.CL | 2025-04-02 |
76 | You Cannot Feed Two Birds with One Score: The Accuracy-Naturalness Tradeoff in Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, researchers in the machine translation community usually assess translations using a single score intended to capture semantic accuracy and the naturalness of the output simultaneously. In this paper, we build on recent advances in information theory to mathematically prove and empirically demonstrate that such single-score summaries do not and cannot give the complete picture of a system’s true performance. |
Gergely Flamich; David Vilar; Jan-Thorsten Peter; Markus Freitag; | arxiv-cs.CL | 2025-03-31 |
77 | VNJPTranslate: A Comprehensive Pipeline for Vietnamese-Japanese Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce VNJPTranslate, a pipeline designed to systematically address the Vi-Ja translation task. |
Hoang Hai Phan; Nguyen Duc Minh Vu; Nam Dang Phuong; | arxiv-cs.CL | 2025-03-31 |
78 | Is LLM The Silver Bullet to Low-Resource Languages Machine Translation? Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Low-Resource Languages (LRLs) present significant challenges in natural language processing due to their limited linguistic resources and underrepresentation in standard datasets. … |
YEWEI SONG et. al. | arxiv-cs.CL | 2025-03-31 |
79 | Beyond Vanilla Fine-Tuning: Leveraging Multistage, Multilingual, and Domain-Specific Methods for Low-Resource Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, conventional single-stage fine-tuning methods struggle in extremely low-resource NMT settings, where training data is very limited. This paper contributes to artificial intelligence by proposing two approaches for adapting msLLMs in these challenging scenarios: (1) continual pre-training (CPT), where the msLLM is further trained with domain-specific monolingual data to compensate for the under-representation of LRLs, and (2) intermediate task transfer learning (ITTL), a method that fine-tunes the msLLM with both in-domain and out-of-domain parallel data to enhance its translation capabilities across various domains and tasks. |
Sarubi Thillainathan; Songchen Yuan; En-Shiun Annie Lee; Sanath Jayasena; Surangika Ranathunga; | arxiv-cs.CL | 2025-03-28 |
80 | Low-resource Machine Translation for Code-switched Kazakh-Russian Language Pair Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method to build a machine translation model for code-switched Kazakh-Russian language pair with no labeled data. |
Maksim Borisov; Zhanibek Kozhirbayev; Valentin Malykh; | arxiv-cs.CL | 2025-03-25 |
81 | HausaNLP at SemEval-2025 Task 2: Entity-Aware Fine-tuning Vs. Prompt Engineering in Entity-Aware Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents our findings for SemEval 2025 Task 2, a shared task on entity-aware machine translation (EA-MT). |
ABDULHAMID ABUBAKAR et. al. | arxiv-cs.CL | 2025-03-25 |
82 | Automatically Generating Chinese Homophone Words to Probe Machine Translation Estimation Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, whether these models are robust to the challenge of preserving emotional nuances has been left largely unexplored. To address this gap, we introduce a novel method inspired by information theory which generates challenging Chinese homophone words related to emotions, by leveraging the concept of self-information. |
Shenbin Qian; Constantin Orăsan; Diptesh Kanojia; Félix do Carmo; | arxiv-cs.CL | 2025-03-20 |
83 | Scaling Laws for Downstream Task Performance in Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the scaling behavior in a transfer learning setting, where LLMs are finetuned for machine translation tasks. |
BERIVAN ISIK et. al. | iclr | 2025-03-17 |
84 | LLMs for Translation: Historical, Low-Resourced Languages and Contemporary AI Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large Language Models (LLMs) have demonstrated remarkable adaptability in performing various tasks, including machine translation (MT), without explicit training. |
Merve Tekgurler; | arxiv-cs.CL | 2025-03-14 |
85 | New Trends for Modern Machine Translation with Large Reasoning Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify three foundational shifts: 1) contextual coherence, where LRMs resolve ambiguities and preserve discourse structure through explicit reasoning over cross-sentence and complex context or even lack of context; 2) cultural intentionality, enabling models to adapt outputs by inferring speaker intent, audience expectations, and socio-linguistic norms; 3) self-reflection, LRMs can perform self-reflection during the inference time to correct the potential errors in translation especially extremely noisy cases, showing better robustness compared to simply mapping X->Y translation. |
SINUO LIU et. al. | arxiv-cs.CL | 2025-03-13 |
86 | Explicit Learning and The LLM in Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study explores an LLM’s ability to learn new languages using explanations found in a grammar book, a process we term explicit learning. |
Malik Marmonier; Rachel Bawden; Benoît Sagot; | arxiv-cs.CL | 2025-03-12 |
87 | Word2winners at SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper describes our system for SemEval 2025 Task 7: Previously Fact-Checked Claim Retrieval. |
Amirmohammad Azadi; Sina Zamani; Mohammadmostafa Rostamkhani; Sauleh Eetemadi; | arxiv-cs.CL | 2025-03-11 |
88 | Contextual Cues in Machine Translation: Investigating The Potential of Multi-Source Input Strategies in LLMs and NMT Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We explore the impact of multi-source input strategies on machine translation (MT) quality, comparing GPT-4o, a large language model (LLM), with a traditional multilingual neural machine translation (NMT) system. |
Lia Shahnazaryan; Patrick Simianer; Joern Wuebker; | arxiv-cs.CL | 2025-03-10 |
89 | Assumed Identities: Quantifying Gender Bias in Machine Translation of Gender-Ambiguous Occupational Terms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this, we introduce GRAPE, a probability-based metric designed to evaluate gender bias by analyzing aggregated model responses. Alongside this, we present GAMBIT-MT, a benchmarking dataset in English with gender-ambiguous occupational terms. |
Orfeas Menis Mastromichalakis; Giorgos Filandrianos; Maria Symeonaki; Giorgos Stamou; | arxiv-cs.CL | 2025-03-06 |
90 | Comparative Study of Zero-Shot Cross-Lingual Transfer for Bodo POS and NER Tagging Using Gemini 2.0 Flash Thinking Experimental Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article presents a comparative empirical study investigating the effectiveness of Google’s Gemini 2.0 Flash Thinking Experimental model for zero-shot cross-lingual transfer of POS and NER tagging to Bodo. |
SANJIB NARZARY et. al. | arxiv-cs.CL | 2025-03-06 |
91 | SwiLTra-Bench: The Swiss Legal Translation Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this process traditionally relies on professionals who must be both legal experts and skilled translators — creating bottlenecks and impacting effective access to justice. To address this challenge, we introduce SwiLTra-Bench, a comprehensive multilingual benchmark of over 180K aligned Swiss legal translation pairs comprising laws, headnotes, and press releases across all Swiss languages along with English, designed to evaluate LLM-based translation systems. |
JOEL NIKLAUS et. al. | arxiv-cs.CL | 2025-03-03 |
92 | Direct Speech to Speech Translation: A Review Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our review examines the evolution of S2ST, comparing traditional cascade models which rely on automatic speech recognition (ASR), machine translation (MT), and text to speech (TTS) components with newer end to end and direct speech translation (DST) models that bypass intermediate text representations. |
Mohammad Sarim; Saim Shakeel; Laeeba Javed; Mohammad Nadeem; | arxiv-cs.CL | 2025-03-03 |
93 | Parallel Corpora for Machine Translation in Low-resource Indic Languages: A Comprehensive Review Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To the best of our knowledge, this paper presents the first comprehensive review of parallel corpora specifically tailored for low-resource Indic languages in the context of machine translation. |
Rahul Raja; Arpita Vats; | arxiv-cs.CL | 2025-03-02 |
94 | Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide the system description of our submission as part of the English to Lowres Multimodal Translation Task at the Workshop on Asian Translation (WAT2024). |
SHAHARUKH KHAN et. al. | arxiv-cs.CL | 2025-02-27 |
95 | R1-T1: Fully Incentivizing Translation Capability in LLMs Via Reasoning Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces R1-Translator (R1-T1), a novel framework to achieve inference-time reasoning for general MT via reinforcement learning (RL) with human-aligned CoTs comprising six common patterns. |
MINGGUI HE et. al. | arxiv-cs.CL | 2025-02-26 |
96 | Unsupervised Translation of Emergent Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study employs unsupervised neural machine translation (UNMT) techniques to decipher ECs formed during referential games with varying task complexities, influenced by the semantic diversity of the environment. |
IDO LEVY et. al. | aaai | 2025-02-25 |
97 | EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our work, we observe that both direct and pivot translations are noisy and achieve less satisfactory performance. We propose EBBS, an ensemble method with a novel bi-level beam search algorithm, where each ensemble component explores its own prediction step by step at the lower level but all components are synchronized by a "soft voting" mechanism at the upper level. |
Yuqiao Wen; Behzad Shayegh; Chenyang Huang; Yanshuai Cao; Lili Mou; | aaai | 2025-02-25 |
98 | Contextual Effects of Sentiment Deployment in Human and Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper illustrates how the overall sentiment of a text may be shifted in translation and the implications for automated sentiment analyses, particularly those that utilize machine translation and assess findings via semantic similarity metrics. |
Lindy Comstock; Priyanshu Sharma; Mikhail Belov; | arxiv-cs.CL | 2025-02-25 |
99 | Adaptive Few-shot Prompting for Machine Translation with Pre-trained Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing evidence shows that LLMs are prompt-sensitive and it is sub-optimal to apply the fixed prompt to any input for downstream machine translation tasks. To address this issue, we propose an adaptive few-shot prompting (AFSP) framework to automatically select suitable translation demonstrations for various source input sentences to further elicit the translation capability of an LLM for better machine translation. |
Lei Tang; Jinghui Qin; Wenxuan Ye; Hao Tan; Zhijing Yang; | aaai | 2025-02-25 |
100 | UrduLLaMA 1.0: Dataset Curation, Preprocessing, and Evaluation in Low-Resource Settings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces UrduLLaMA 1.0, a model derived from the open-source Llama-3.1-8B-Instruct architecture and continually pre-trained on 128 million Urdu tokens, capturing the rich diversity of the language. |
Layba Fiaz; Munief Hassan Tahir; Sana Shams; Sarmad Hussain; | arxiv-cs.CL | 2025-02-24 |
101 | Using Machine Learning to Detect Fraudulent SMSs in Chichewa Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a first dataset for SMS fraud detection in Chichewa, a major language in Africa, and reports on experiments with machine learning algorithms for classifying SMSs in Chichewa as fraud or non-fraud. |
Amelia Taylor; Amoss Robert; | arxiv-cs.LG | 2025-02-24 |
102 | MultiSlav: Using Cross-Lingual Knowledge Transfer to Combat The Curse of Multilinguality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we explore multiple approaches for extending the available data-regime in NMT and we prove cross-lingual benefits even in 0-shot translation regime for low-resource languages. |
ARTUR KOT et. al. | arxiv-cs.CL | 2025-02-20 |
103 | English Please: Evaluating Machine Translation with Large Language Models for Multilingual Bug Reports Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we conduct the first comprehensive evaluation of machine translation (MT) performance on bug reports, analyzing the capabilities of DeepL, AWS Translate, and large language models such as ChatGPT, Claude, Gemini, LLaMA, and Mistral using data from the Visual Studio Code GitHub repository, specifically focusing on reports labeled with the english-please tag. |
Avinash Patil; Siru Tao; Aryan Jadon; | arxiv-cs.CL | 2025-02-20 |
104 | Identifying Gender Stereotypes and Biases in Automated Translation from English to Italian Using Similarity Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper is a collaborative effort between Linguistics, Law, and Computer Science to evaluate stereotypes and biases in automated translation systems. |
Fatemeh Mohammadi; Marta Annamaria Tamborini; Paolo Ceravolo; Costanza Nardocci; Samira Maghool; | arxiv-cs.CL | 2025-02-17 |
105 | GLoT: A Novel Gated-Logarithmic Transformer for Efficient Sign Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Gated-Logarithmic Transformer (GLoT) that captures the long-term temporal dependencies of the sign language as a time-series data. |
Nada Shahin; Leila Ismail; | arxiv-cs.CL | 2025-02-17 |
106 | WMT24++: Expanding The Language Coverage of WMT24 to 55 Languages & Dialects Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we extend the WMT24 dataset to cover 55 languages by collecting new human-written references and post-edits for 46 new languages and dialects in addition to post-edits of the references in 8 out of 9 languages in the original WMT24 dataset. |
DANIEL DEUTSCH et. al. | arxiv-cs.CL | 2025-02-17 |
107 | Truth Knows No Language: Evaluating Truthfulness Beyond English Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a professionally translated extension of the TruthfulQA benchmark designed to evaluate truthfulness in Basque, Catalan, Galician, and Spanish. |
BLANCA CALVO FIGUERAS et. al. | arxiv-cs.CL | 2025-02-13 |
108 | Multilingual Non-Autoregressive Machine Translation Without Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose an M-DAT approach to non-autoregressive multilingual machine translation. |
CHENYANG HUANG et. al. | arxiv-cs.CL | 2025-02-06 |
109 | BOUQuET: Dataset, Benchmark and Open Initiative for Universal Quality Evaluation in Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents BOUQuET, a multicentric and multi-register/domain dataset and benchmark, and its broader collaborative extension initiative. |
THE OMNILINGUAL MT TEAM et. al. | arxiv-cs.CL | 2025-02-06 |
110 | High-Fidelity Simultaneous Speech-To-Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Hibiki, a decoder-only model for simultaneous speech translation. |
TOM LABIAUSSE et. al. | arxiv-cs.CL | 2025-02-05 |
111 | Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to tackle streaming speaker change detection and gender classification by incorporating speaker embeddings into a transducer-based streaming end-to-end speech translation model. |
PEIDONG WANG et. al. | arxiv-cs.SD | 2025-02-04 |
112 | Beyond English: Evaluating Automated Measurement of Moral Foundations in Non-English Discourse with A Chinese Case Study Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study explores computational approaches for measuring moral foundations (MFs) in non-English corpora. |
Calvin Yixiang Cheng; Scott A Hale; | arxiv-cs.CL | 2025-02-04 |
113 | When End-to-End Is Overkill: Rethinking Cascaded Speech-to-Text Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the benefits of incorporating multiple candidates from ASR and self-supervised speech features into MT. Our analysis reveals that the primary cause of cascading errors stems from the increased divergence between similar samples in the speech domain when mapped to the text domain. |
Anna Min; Chenxu Hu; Yi Ren; Hang Zhao; | arxiv-cs.CL | 2025-02-01 |
114 | An Efficient Approach for Machine Translation on Low-resource Languages: A Case Study in Vietnamese-Chinese Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we proposed an approach for machine translation in low-resource languages such as Vietnamese-Chinese. |
Tran Ngoc Son; Nguyen Anh Tu; Nguyen Minh Tri; | arxiv-cs.CL | 2025-01-31 |
115 | Cross-Language Approach for Quranic QA Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these systems face unique challenges, including the linguistic disparity between questions written in Modern Standard Arabic and answers found in Quranic verses written in Classical Arabic, and the small size of existing datasets, which further restricts model performance. To address these challenges, we adopt a cross-language approach by (1) Dataset Augmentation: expanding and enriching the dataset through machine translation to convert Arabic questions into English, paraphrasing questions to create linguistic diversity, and retrieving answers from an English translation of the Quran to align with multilingual training requirements; and (2) Language Model Fine-Tuning: utilizing pre-trained models such as BERT-Medium, RoBERTa-Base, DeBERTa-v3-Base, ELECTRA-Large, Flan-T5, Bloom, and Falcon to address the specific requirements of Quranic QA. |
Islam Oshallah; Mohamed Basem; Ali Hamdi; Ammar Mohammed; | arxiv-cs.CL | 2025-01-29 |
116 | A Comparison of Data Filtering Techniques for English-Polish LLM-based Machine Translation in The Biomedical Domain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper evaluates the impact of commonly used data filtering techniques, such as LASER, MUSE, and LaBSE, on English-Polish translation within the biomedical domain. |
Jorge del Pozo Lérida; Kamil Kojs; János Máté; Mikołaj Antoni Barański; Christian Hardmeier; | arxiv-cs.CL | 2025-01-27 |
117 | Improving Estonian Text Simplification Through Pretrained Language Models and Custom Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study introduces an approach to Estonian text simplification using two model architectures: a neural machine translation model and a fine-tuned large language model (LLaMA). |
Eduard Barbu; Meeri-Ly Muru; Sten Marcus Malva; | arxiv-cs.CL | 2025-01-26 |
118 | Domain-Specific Machine Translation to Translate Medicine Brochures in English to Sorani Kurdish Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Access to Kurdish medicine brochures is limited, depriving Kurdish-speaking communities of critical health information. To address this problem, we developed a specialized Machine Translation (MT) model to translate English medicine brochures into Sorani Kurdish using a parallel corpus of 22,940 aligned sentence pairs from 319 brochures, sourced from two pharmaceutical companies in the Kurdistan Region of Iraq (KRI). |
Mariam Shamal; Hossein Hassani; | arxiv-cs.CL | 2025-01-23 |
119 | CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Direct Preference Optimization (DPO) has emerged as a simpler and more efficient alternative, but its performance depends heavily on the quality of preference data. To address this, we propose Confidence-Reward driven Preference Optimization (CRPO), a novel method that combines reward scores with model confidence to improve data selection for fine-tuning. |
GUOFENG CUI et. al. | arxiv-cs.CL | 2025-01-23 |
120 | Proverbs Run in Pairs: Evaluating Proverb Translation Capability of Large Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We construct a translation dataset of standalone proverbs and proverbs in conversation for four language pairs. |
Minghan Wang; Viet-Thanh Pham; Farhad Moghimifar; Thuy-Trang Vu; | arxiv-cs.CL | 2025-01-21 |
121 | Solving The Unsolvable: Translating Case Law in Hong Kong Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, it evolves from a single-agent system to a multi-agent system, incorporating Translator, Annotator, and Proofreader agents. This multi-agent approach, supported by a grant, aims to facilitate efficient, high-quality translation of judicial judgments by integrating advanced artificial intelligence and continuous feedback mechanisms, thus better meeting the needs of a bilingual legal system. |
King-kui Sin; Xi Xuan; Chunyu Kit; Clara Ho-yan Chan; Honic Ho-kin Ip; | arxiv-cs.CL | 2025-01-16 |
122 | ViBidirectionMT-Eval: Machine Translation for Vietnamese-Chinese and Vietnamese-Lao Language Pair Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents the results of the VLSP 2022-2023 Machine Translation Shared Tasks, focusing on Vietnamese-Chinese and Vietnamese-Lao machine translation. |
Hong-Viet Tran; Minh-Quy Nguyen; Van-Vinh Nguyen; | arxiv-cs.CL | 2025-01-15 |
123 | AFRIDOC-MT: Document-level MT Corpus for African Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces AFRIDOC-MT, a document-level multi-parallel translation dataset covering English and five African languages: Amharic, Hausa, Swahili, Yorùbá, and Zulu. |
JESUJOBA O. ALABI et. al. | arxiv-cs.CL | 2025-01-10 |
124 | Investigating Numerical Translation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study focuses on evaluating the reliability of LLM-based machine translation systems when handling numerical data. |
WEI TANG et. al. | arxiv-cs.CL | 2025-01-08 |
125 | Quality Estimation Based Feedback Training for Improving Pronoun Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Pronoun translation is a longstanding challenge in neural machine translation (NMT), often requiring inter-sentential context to ensure linguistic accuracy. To address this, we introduce ProNMT, a novel framework designed to enhance pronoun and overall translation quality in context-aware machine translation systems. |
Harshit Dhankhar; Baban Gain; Asif Ekbal; Yogesh Mani Tripathi; | arxiv-cs.CL | 2025-01-06 |
126 | Crossing Language Borders: A Pipeline for Indonesian Manhwa Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this project, we develop a practical and efficient solution for automating the Manhwa translation from Indonesian to English. |
Nithyasri Narasimhan; Sagarika Singh; | arxiv-cs.LG | 2025-01-02 |
127 | Advancing Explainability in Neural Machine Translation: Analytical Metrics for Attention and Alignment Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The interpretability of these models, especially their internal attention mechanisms, is critical for building trust and verifying that these systems behave as intended. In this work, we introduce a systematic framework to quantitatively evaluate the explainability of an NMT model’s attention patterns by comparing them against statistical alignments and correlating them with standard machine translation quality metrics. |
Anurag Mishra; | arxiv-cs.AI | 2024-12-24 |
128 | Multiple References with Meaningful Variations Improve Literary Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We classify the semantic similarity between paraphrases into three levels: low, medium, and high, and fine-tune three different models (mT5-large, LLaMA-2-7B, and Opus-MT) for literary MT tasks. |
Si Wu; John Wieting; David A. Smith; | arxiv-cs.CL | 2024-12-24 |
129 | Towards Global AI Inclusivity: A Large-Scale Multilingual Terminology Dataset (GIST) Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce GIST, a large-scale multilingual AI terminology dataset containing 5K terms extracted from top AI conference papers spanning 2000 to 2023. |
JIARUI LIU et. al. | arxiv-cs.CL | 2024-12-24 |
130 | RepoTransBench: A Real-World Benchmark for Repository-Level Code Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such benchmarks do not accurately reflect real-world demands, where entire repositories often need to be translated, involving longer code length and more complex functionalities. To address this gap, we propose a new benchmark, named RepoTransBench, which is a real-world repository-level code translation benchmark with an automatically executable test suite. |
YANLI WANG et. al. | arxiv-cs.SE | 2024-12-23 |
131 | Investigating Length Issues in Document-level Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we challenge the ability of MT systems to handle texts comprising up to several thousands of tokens. |
Ziqian Peng; Rachel Bawden; François Yvon; | arxiv-cs.CL | 2024-12-23 |
132 | Ensuring Consistency for In-Image Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The former entails incorporating image information during translation, while the latter involves maintaining consistency between the style of the text-image and the original image, ensuring background integrity. To address these consistency requirements, we introduce a novel two-stage framework named HCIIT (High-Consistency In-Image Translation) which involves text-image translation using a multimodal multilingual large language model in the first stage and image backfilling with a diffusion model in the second stage. |
CHENGPENG FU et. al. | arxiv-cs.CL | 2024-12-23 |
133 | A Thorough Investigation Into The Application of Deep CNN for Enhancing Natural Language Processing Capabilities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, traditional NLP models struggle with accuracy and efficiency. This paper introduces Deep Convolutional Neural Networks (DCNN) into NLP to address these issues. |
CHANG WENG et. al. | arxiv-cs.CL | 2024-12-20 |
134 | Mention Attention for Pronoun Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We assume that extracting additional mention features can help pronoun translation. Therefore, we introduce an additional mention attention module in the decoder to pay extra attention to source mentions but not non-mention tokens. |
Gongbo Tang; Christian Hardmeier; | arxiv-cs.CL | 2024-12-19 |
135 | Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Why do we build local large language models (LLMs)? |
KOSHIRO SAITO et. al. | arxiv-cs.CL | 2024-12-18 |
136 | Understanding and Analyzing Model Robustness and Knowledge-Transfer in Multilingual Neural Machine Translation Using TX-Ray Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, Multilingual Neural Machine Translation (MNMT) in extremely low-resource settings remains underexplored. This research investigates how knowledge transfer across languages can enhance MNMT in such scenarios. |
Vageesh Saxena; Sharid Loáiciga; Nils Rethmeier; | arxiv-cs.CL | 2024-12-18 |
137 | The Role of Handling Attributive Nouns in Improving Chinese-To-English Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we specifically target the translation challenges posed by attributive nouns in Chinese, which frequently cause ambiguities in English translation. |
Lisa Wang; Adam Meyers; John E. Ortega; Rodolfo Zevallos; | arxiv-cs.CL | 2024-12-18 |
138 | Findings of The WMT 2024 Shared Task on Discourse-Level Literary Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We focus on three language directions: Chinese-English, Chinese-German, and Chinese-Russian, with the latter two newly added. |
LONGYUE WANG et. al. | arxiv-cs.CL | 2024-12-16 |
139 | Analyzing The Attention Heads for Pronoun Disambiguation in Context-aware Machine Translation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the role of attention heads in Context-aware Machine Translation models for pronoun disambiguation in the English-to-German and English-to-French language directions. |
Paweł Mąka; Yusuf Can Semerci; Jan Scholtes; Gerasimos Spanakis; | arxiv-cs.CL | 2024-12-15 |
140 | Large Language Models for Persian ↔ English Idiom Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces two parallel datasets of sentences containing idiomatic expressions for Persian→English and English→Persian translations, with Persian idioms sampled from our PersianIdioms resource, a collection of 2,200 idioms and their meanings, with 700 including usage examples. |
Sara Rezaeimanesh; Faezeh Hosseini; Yadollah Yaghoobzadeh; | arxiv-cs.CL | 2024-12-13 |
141 | Shiksha: A Technical Domain Focused Translation Dataset and Model for Indian Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Finding a translation dataset that tends to these domains in particular, poses a difficult challenge. In this paper, we address this by creating a multilingual parallel corpus containing more than 2.8 million rows of English-to-Indic and Indic-to-Indic high-quality translation pairs across 8 Indian languages. |
Advait Joglekar; Srinivasan Umesh; | arxiv-cs.CL | 2024-12-12 |
142 | Multi-perspective Alignment for Increasing Naturalness in Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the reinforcement learning from human feedback framework, we introduce a novel method that rewards both naturalness and content preservation. |
Huiyuan Lai; Esther Ploeger; Rik van Noord; Antonio Toral; | arxiv-cs.CL | 2024-12-11 |
143 | Domain-Specific Translation with Open-Source Large Language Models: Resource-Oriented Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we compare the domain-specific translation performance of open-source autoregressive decoder-only large language models (LLMs) with task-oriented machine translation (MT) models. |
Aman Kassahun Wassie; Mahdi Molaei; Yasmin Moslem; | arxiv-cs.CL | 2024-12-08 |
144 | Representation Purification for End-to-End Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conceptualize speech representation as a combination of content-agnostic and content-relevant factors. |
Chengwei Zhang; Yue Zhou; Rui Zhao; Yidong Chen; Xiaodong Shi; | arxiv-cs.CL | 2024-12-05 |
145 | BhashaVerse: Translation Ecosystem for Indian Subcontinent Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper focuses on developing translation models and related applications for 36 Indian languages, including Assamese, Awadhi, Bengali, Bhojpuri, Braj, Bodo, Dogri, English, Konkani, Gondi, Gujarati, Hindi, Hinglish, Ho, Kannada, Kangri, Kashmiri (Arabic and Devanagari), Khasi, Mizo, Magahi, Maithili, Malayalam, Marathi, Manipuri (Bengali and Meitei), Nepali, Oriya, Punjabi, Sanskrit, Santali, Sinhala, Sindhi (Arabic and Devanagari), Tamil, Tulu, Telugu, and Urdu. |
Vandan Mujadia; Dipti Misra Sharma; | arxiv-cs.CL | 2024-12-05 |
146 | Agent AI with LangGraph: A Modular Framework for Enhancing Machine Translation Using Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores the transformative role of Agent AI and LangGraph in advancing the automation and effectiveness of machine translation (MT). |
Jialin Wang; Zhihua Duan; | arxiv-cs.CL | 2024-12-04 |
147 | A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose and evaluate the feasibility of a two-stage pipeline to evaluate literary machine translation, in a fine-grained manner, from English to Korean. |
SHEIKH SHAFAYAT et. al. | arxiv-cs.CL | 2024-12-02 |
148 | A Multi-way Parallel Named Entity Annotated Corpus for English, Tamil and Sinhala Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a multi-way parallel English-Tamil-Sinhala corpus annotated with Named Entities (NEs), where Sinhala and Tamil are low-resource languages. |
SURANGIKA RANATHUNGA et. al. | arxiv-cs.CL | 2024-12-02 |
149 | Towards Santali Linguistic Inclusion: Building The First Santali-to-English Translation Model Using MT5 Transformer and Data Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our paper aims to include Santali in the NLP spectrum. |
SYED MOHAMMED MOSTAQUE BILLAH et. al. | arxiv-cs.CL | 2024-11-29 |
150 | Aligning Pre-trained Models for Spoken Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates a novel approach to end-to-end speech translation (ST) based on aligning frozen pre-trained automatic speech recognition (ASR) and machine translation (MT) models via a small connector module (Q-Former, our Subsampler-Transformer Encoder). |
Šimon Sedláček; Santosh Kesiraju; Alexander Polok; Jan Černocký; | arxiv-cs.CL | 2024-11-27 |
151 | SwissADT: An Audio Description Translation System for Swiss Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By collecting well-crafted AD data augmented with video clips in German, French, Italian, and English, and leveraging the power of Large Language Models (LLMs), we aim to enhance information accessibility for diverse language populations in Switzerland by automatically translating AD scripts to the desired Swiss language. |
Lukas Fischer; Yingqiang Gao; Alexa Lintner; Sarah Ebling; | arxiv-cs.CL | 2024-11-22 |
152 | Benchmarking GPT-4 Against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study presents a comprehensive evaluation of GPT-4’s translation capabilities compared to human translators of varying expertise levels. |
JIANHAO YAN et. al. | arxiv-cs.CL | 2024-11-20 |
153 | A Comparative Study of Text Retrieval Models on DaReCzech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article presents a comprehensive evaluation of 7 off-the-shelf document retrieval models: Splade, Plaid, Plaid-X, SimCSE, Contriever, OpenAI ADA and Gemma2 chosen to determine their performance on the Czech retrieval dataset DaReCzech. |
Jakub Stetina; Martin Fajcik; Michal Stefanik; Michal Hradis; | arxiv-cs.IR | 2024-11-19 |
154 | Fine-Tuning Large Language Models to Translate: Will A Touch of Noisy Data in Misaligned Languages Suffice? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditionally, success in multilingual machine translation can be attributed to three key factors in training data: large volume, diverse translation directions, and high quality. In the current practice of fine-tuning large language models (LLMs) for translation, we revisit the importance of these factors. |
DAWEI ZHU et. al. | emnlp | 2024-11-11 |
155 | Chain-of-Dictionary Prompting Elicits Translation in Large Language Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we present a novel framework, CoD, Chain-of-Dictionary Prompting, which augments LLMs with prior knowledge with the chains of multilingual dictionaries for a subset of input words to elicit translation abilities for LLMs. |
HONGYUAN LU et. al. | emnlp | 2024-11-11 |
156 | MiTTenS: A Dataset for Evaluating Gender Mistranslation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Translation systems, including foundation models capable of translation, can produce errors that result in gender mistranslation, and such errors can be especially harmful. To measure the extent of such potential harms when translating into and out of English, we introduce a dataset, MiTTenS, covering 26 languages from a variety of language families and scripts, including several traditionally under-represented in digital resources. |
Kevin Robinson; Sneha Kudugunta; Romina Stella; Sunipa Dev; Jasmijn Bastings; | emnlp | 2024-11-11 |
157 | Using Language Models to Disambiguate Lexical Choices in Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We work with native speakers of nine languages to create DTAiLS, a dataset of 1,377 sentence pairs that exhibit cross-lingual concept variation when translating from English. |
Josh Barua; Sanjay Subramanian; Kayo Yin; Alane Suhr; | emnlp | 2024-11-11 |
158 | SpeechQE: Estimating The Quality of Direct Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we formulate the task of quality estimation for speech translation (SpeechQE), construct a benchmark, and evaluate a family of systems based on cascaded and end-to-end architectures. |
HyoJung Han; Kevin Duh; Marine Carpuat; | emnlp | 2024-11-11 |
159 | Reconsidering Sentence-Level Sign Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Historically, sign language machine translation has been posed as a sentence-level task: datasets consisting of continuous narratives are chopped up and presented to the model as isolated clips. In this work, we explore the limitations of this task framing. |
Garrett Tanzer; Maximus Shengelia; Ken Harrenstien; David Uthus; | emnlp | 2024-11-11 |
160 | Error Analysis of Multilingual Language Models in Machine Translation: A Case Study of English-Amharic Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We employed both automatic and human evaluation methods to analyze translation errors. |
Hizkiel Mitiku Alemayehu; Hamada M Zahera; Axel-Cyrille Ngonga Ngomo; | emnlp | 2024-11-11 |
161 | Isochrony-Controlled Speech-to-Text Translation: A Study on Translating from Sino-Tibetan to Indo-European Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous methods often controlled the number of words or characters generated by the Machine Translation model to approximate the source sentence’s length without considering the isochrony of pauses and speech segments, as duration can vary between languages. To address this, we present improvements to the duration alignment component of our sequence-to-sequence ST model. |
MIDIA YOUSEFI et. al. | arxiv-cs.CL | 2024-11-11 |
162 | Building Resources for Emakhuwa: Machine Translation and News Classification Benchmarks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a comprehensive collection of NLP resources for Emakhuwa, Mozambique’s most widely spoken language. |
Felermino D. M. A. Ali; Henrique Lopes Cardoso; Rui Sousa-Silva; | emnlp | 2024-11-11 |
163 | Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the problem of cross-cultural translation on two fronts: (i) we introduce XC-Translate, the first large-scale, manually-created benchmark for machine translation that focuses on text that contains potentially culturally-nuanced entity names, and (ii) we propose KG-MT, a novel end-to-end method to integrate information from a multilingual knowledge graph into a neural machine translation model by leveraging a dense retrieval mechanism. |
SIMONE CONIA et. al. | emnlp | 2024-11-11 |
164 | CULL-MT: Compression Using Language and Layer Pruning for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present CULL-MT, a compression method for machine translation models based on structural layer pruning and selected language directions. |
Pedram Rostami; Mohammad Javad Dousti; | arxiv-cs.CL | 2024-11-10 |
165 | Fineweb-Edu-Ar: Machine-translated Corpus to Support Arabic Small Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This report introduces FineWeb-Edu-Ar, a machine-translated version of the exceedingly popular (deduplicated) FineWeb-Edu dataset from HuggingFace. |
Sultan Alrashed; Dmitrii Khizbullin; David R. Pugh; | arxiv-cs.CL | 2024-11-10 |
166 | Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a series of advanced explorations of Transformer architecture design to minimize the error compared to the true “solution.” |
BEI LI et. al. | arxiv-cs.CL | 2024-11-05 |
167 | Context-Informed Machine Translation of Manga Using Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate to what extent multimodal large language models (LLMs) can provide effective manga translation, thereby assisting manga authors and publishers in reaching wider audiences. |
Philip Lippmann; Konrad Skublicki; Joshua Tanner; Shonosuke Ishiwatari; Jie Yang; | arxiv-cs.CL | 2024-11-04 |
168 | MetaMetrics-MT: Tuning Meta-Metrics for Machine Translation Via Human Preference Calibration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present MetaMetrics-MT, an innovative metric designed to evaluate machine translation (MT) tasks by aligning closely with human preferences through Bayesian optimization with Gaussian Processes. |
David Anugraha; Garry Kuwanto; Lucky Susanto; Derry Tanti Wijaya; Genta Indra Winata; | arxiv-cs.CL | 2024-11-01 |
169 | Evaluation of Speech Translation Subtitles Generated By ASR with Unnecessary Word Detection Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This study addresses the problem of generating understandable speech translation subtitles for spontaneous speech, such as lectures and talks, which often contain disfluencies … |
Makoto Hotta; Chee Siang Leow; N. Kitaoka; Hiromitsu Nishizaki; | 2024 IEEE 13th Global Conference on Consumer Electronics … | 2024-10-29 |
170 | Anticipating Future with Large Language Model for Simultaneous Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by human interpreters’ technique to forecast future words before hearing them, we propose Translation by Anticipating Future (TAF), a method to improve translation quality while retaining low latency. |
SIQI OUYANG et. al. | arxiv-cs.CL | 2024-10-29 |
171 | GrammaMT: Improving Machine Translation with Grammar-Informed In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce GrammaMT, a grammatically-aware prompting approach for machine translation that uses Interlinear Glossed Text (IGT), a common form of linguistic description providing morphological and lexical annotations for source sentences. |
Rita Ramos; Everlyn Asiko Chimoto; Maartje ter Hoeve; Natalie Schluter; | arxiv-cs.CL | 2024-10-24 |
172 | Dialectal and Low-Resource Machine Translation for Aromanian Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The primary contribution of this research is twofold: (1) the creation of the most extensive Aromanian-Romanian parallel corpus to date, consisting of 79,000 sentence pairs, and (2) the development and comparative analysis of several machine translation models optimized for Aromanian. |
Alexandru-Iulius Jerpelea; Alina Rădoi; Sergiu Nisioi; | arxiv-cs.CL | 2024-10-23 |
173 | Can General-Purpose Large Language Models Generalize to English-Thai Machine Translation? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large language models (LLMs) perform well on common tasks but struggle with generalization in low-resource and low-computation settings. We examine this limitation by testing various LLMs and specialized translation models on English-Thai machine translation and code-switching datasets. |
JIRAT CHIARANAIPANICH et. al. | arxiv-cs.CL | 2024-10-22 |
174 | Learning from Others’ Mistakes: Finetuning Machine Translation Models with Span-level Error Annotations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the potential of utilizing fine-grained span-level annotations from offline datasets to improve model quality. |
LILY H. ZHANG et. al. | arxiv-cs.CL | 2024-10-21 |
175 | On Creating An English-Thai Code-switched Machine Translation in Medical Domain Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our research prioritizes not merely improving translation accuracy but also maintaining medical terminology in English within the translated text through code-switched (CS) translation. We developed a method to produce CS medical translation data, fine-tuned a CS translation model with this data, and evaluated its performance against strong baselines, such as Google Neural Machine Translation (NMT) and GPT-3.5/GPT-4. |
PARINTHAPAT PENGPUN et. al. | arxiv-cs.CL | 2024-10-21 |
176 | MHumanEval — A Multilingual Benchmark to Evaluate Large Language Models for Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While recent works have addressed test coverage and programming language (PL) diversity, code generation from low-resource language prompts remains largely unexplored. To address this gap, we introduce mHumanEval, an extended benchmark supporting prompts in over 200 natural languages. |
Nishat Raihan; Antonios Anastasopoulos; Marcos Zampieri; | arxiv-cs.CL | 2024-10-19 |
177 | MHumanEval – A Multilingual Benchmark to Evaluate Large Language Models for Code Generation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recent advancements in large language models (LLMs) have significantly enhanced code generation from natural language prompts. The HumanEval Benchmark, developed by OpenAI, … |
Nishat Raihan; Antonios Anastasopoulos; Marcos Zampieri; | ArXiv | 2024-10-19 |
178 | Context-Aware or Context-Insensitive? Assessing LLMs’ Performance in Document-Level Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on document-level translation, where some words cannot be translated without context from outside the sentence. |
Wafaa Mohammed; Vlad Niculae; | arxiv-cs.CL | 2024-10-18 |
179 | NLIP_Lab-IITH Multilingual MT System for WAT24 MT Shared Task Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper describes NLIP Lab’s multilingual machine translation system for the WAT24 shared task on multilingual Indic MT task for 22 scheduled languages belonging to 4 language families. |
Maharaj Brahma; Pramit Sahoo; Maunendra Sankar Desarkar; | arxiv-cs.CL | 2024-10-17 |
180 | Quantity Vs. Quality of Monolingual Source Data in Automatic Text Translation: Can It Be Too Little If It Is Too Good? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it has been shown that too much of this data can be detrimental to the performance of the model if the available parallel data is comparatively extremely low. In this study, we investigate whether the monolingual data can also be too little and if this reduction, based on quality, has any effect on the performance of the translation model. |
Idris Abdulmumin; Bashir Shehu Galadanci; Garba Aliyu; Shamsuddeen Hassan Muhammad; | arxiv-cs.CL | 2024-10-17 |
181 | Is Hate Lost in Translation?: Evaluation of Multilingual LGBTQIA+ Hate Speech Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores the challenges of detecting LGBTQIA+ hate speech of large language models across multiple languages, including English, Italian, Chinese and (code-switched) English-Tamil, examining the impact of machine translation and whether the nuances of hate speech are preserved across translation. |
Fai Leui Chan; Duke Nguyen; Aditya Joshi; | arxiv-cs.CL | 2024-10-14 |
182 | IsoChronoMeter: A Simple and Effective Isochronic Translation Evaluation Metric Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using ICM we demonstrate the shortcomings of state-of-the-art translation systems and show the need for new methods. |
Nikolai Rozanov; Vikentiy Pankov; Dmitrii Mukhutdinov; Dima Vypirailenko; | arxiv-cs.CL | 2024-10-14 |
183 | Machine Translation Evaluation Benchmark for Wu Chinese: Workflow and Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a FLORES+ dataset as an evaluation benchmark for modern Wu Chinese machine translation models and showcase its compatibility with existing Wu data. |
Hongjian Yu; Yiming Shi; Zherui Zhou; Christopher Haberland; | arxiv-cs.CL | 2024-10-14 |
184 | Code-Mixer Ya Nahi: Novel Approaches to Measuring Multilingual LLMs’ Code-Mixing Capabilities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Rule-Based Prompting, a novel prompting technique to generate code-mixed sentences. |
Ayushman Gupta; Akhil Bhogal; Kripabandhu Ghosh; | arxiv-cs.CL | 2024-10-14 |
185 | ChakmaNMT: A Low-resource Machine Translation On Chakma Language Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The geopolitical division between the indigenous Chakma population and mainstream Bangladesh creates a significant cultural and linguistic gap, as the Chakma community, mostly … |
AUNABIL CHAKMA et. al. | arxiv-cs.CL | 2024-10-14 |
186 | QE-EBM: Using Quality Estimators As Energy Loss for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose QE-EBM, a method of employing quality estimators as trainable loss networks that can directly backpropagate to the NMT model. |
Gahyun Yoo; Jay Yoon Lee; | arxiv-cs.CL | 2024-10-14 |
187 | Ukrainian-to-English Folktale Corpus: Parallel Corpus Creation and Augmentation for Machine Translation in Low-resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We have created a new Ukrainian-To-English parallel corpus of familiar Ukrainian folktales based on available English translations and suggested several new ones. We offer a combined domain-specific approach to building and augmenting this corpus, considering the nature of the domain and differences in the purpose of human versus machine translation. |
Olena Burda-Lassen; | arxiv-cs.CL | 2024-10-13 |
188 | NusaMT-7B: Machine Translation for Low-Resource Indonesian Languages with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces NusaMT-7B, an LLM-based machine translation model for low-resource Indonesian languages, starting with Balinese and Minangkabau. |
William Tan; Kevin Zhu; | arxiv-cs.CL | 2024-10-10 |
189 | Translation Canvas: An Explainable Interface to Pinpoint and Analyze Translation Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these tools provide limited insights for fine-grained system-level comparisons and the analysis of instance-level defects. To address these limitations, we introduce Translation Canvas, an explainable interface designed to pinpoint and analyze translation systems’ performance: 1) Translation Canvas assists machine translation researchers in comprehending system-level model performance by identifying common errors (their frequency and severity) and analyzing relationships between different systems based on various evaluation metrics. |
Chinmay Dandekar; Wenda Xu; Xi Xu; Siqi Ouyang; Lei Li; | arxiv-cs.CL | 2024-10-07 |
190 | Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a series of advanced explorations of Transformer architecture design to minimize the error compared to the true “solution.” |
BEI LI et. al. | nips | 2024-10-07 |
191 | TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion yet facilitates end-to-end inference through joint probability. |
CHENYANG LE et. al. | nips | 2024-10-07 |
192 | CTC-GMM: CTC Guided Modality Matching for Fast and Accurate Streaming Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a methodology named Connectionist Temporal Classification guided modality matching (CTC-GMM) that enhances the streaming ST model by leveraging extensive machine translation (MT) text data. |
Rui Zhao; Jinyu Li; Ruchao Fan; Matt Post; | arxiv-cs.CL | 2024-10-07 |
193 | Efficient Minimum Bayes Risk Decoding Using Low-Rank Matrix Completion Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel approach for approximating MBR decoding using matrix completion techniques, focusing on a machine translation task. |
Firas Trabelsi; David Vilar; Mara Finkelstein; Markus Freitag; | nips | 2024-10-07 |
194 | QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address the problem of sampling a set of high-quality and diverse translations. |
GONÇALO FARIA et. al. | nips | 2024-10-07 |
195 | Scaling Sign Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we push forward the frontier of SLT by scaling pretraining data, model size, and number of translation directions. |
Biao Zhang; Garrett Tanzer; Orhan Firat; | nips | 2024-10-07 |
196 | Cogs in A Machine, Doing What They’re Meant to Do – The AMI Submission to The WMT24 General Translation Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents the submission of the Árni Magnússon Institute’s team to the WMT24 General translation task. We work on the English->Icelandic translation direction. Our … |
Atli Jasonarson; Hinrik Hafsteinsson; Bjarki Ármannsson; Steinþór Steingrímsson; | ArXiv | 2024-10-04 |
197 | Cogs in A Machine, Doing What They’re Meant to Do — The AMI Submission to The WMT24 General Translation Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents the submission of the Árni Magnússon Institute’s team to the WMT24 General translation task. |
Atli Jasonarson; Hinrik Hafsteinsson; Bjarki Ármannsson; Steinþór Steingrímsson; | arxiv-cs.CL | 2024-10-04 |
198 | Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our evaluation of prominent LLMs reveals a discernible performance gap against traditional MT systems, highlighting domain overfitting and catastrophic forgetting issues after fine-tuning on domain-limited corpora. To mitigate this, we propose a domain Chain of Thought (CoT) fine-tuning technique that utilizes the intrinsic multi-domain intelligence of LLMs to improve translation performance. |
TIANXIANG HU et. al. | arxiv-cs.CL | 2024-10-03 |
199 | Efficient Technical Term Translation: A Knowledge Distillation Approach for Parenthetical Terminology Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses the challenge of accurately translating technical terms, which are crucial for clear communication in specialized fields. We introduce the Parenthetical Terminology Translation (PTT) task, designed to mitigate potential inaccuracies by displaying the original term in parentheses alongside its translation. |
Jiyoon Myung; Jihyeon Park; Jungki Son; Kyungro Lee; Joohyung Han; | arxiv-cs.CL | 2024-10-01 |
200 | Disentangling Singlish Discourse Particles with Task-Driven Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: After disentanglement, we cluster these discourse particles to differentiate their pragmatic functions, and perform Singlish-to-English machine translation. Our work provides a computational method to understanding Singlish discourse particles, and opens avenues towards a deeper comprehension of the language and its usage. |
Linus Tze En Foo; Lynnette Hui Xian Ng; | arxiv-cs.CL | 2024-09-30 |
201 | Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, aligning their representations poses challenges due to the significant semantic gap between vision and text, as well as the lower quality of non-English representations caused by pre-trained encoders and data noise. To overcome these challenges, we propose LECCR, a novel solution that incorporates the multi-modal large language model (MLLM) to improve the alignment between visual and non-English representations. |
YABING WANG et. al. | arxiv-cs.CV | 2024-09-30 |
202 | AVIATE: Exploiting Translation Variants of Artifacts to Improve IR-based Traceability Recovery in Bilingual Software Projects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the translation can also bring in synonymous terms that are not consistent with those in the bilingual projects (e.g., another translation of ShuXing as property). Therefore, we propose an enhancement strategy called AVIATE that exploits translation variants from different translators by utilizing the word pairs that appear simultaneously across the translation variants from different kinds of artifacts (a.k.a. consensual biterms). |
KEXIN SUN et. al. | arxiv-cs.SE | 2024-09-28 |
203 | Can LLMs Really Learn to Translate A Low-Resource Language from One Grammar Book? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Machine Translation from One Book (Tanzer et al., 2024) suggests that prompting long-context LLMs with one grammar book enables English-Kalamang translation, an XLR language unseen by LLMs – a noteworthy case of linguistics helping an NLP task. We investigate the source of this translation ability, finding almost all improvements stem from the book’s parallel examples rather than its grammatical explanations. |
Seth Aycock; David Stap; Di Wu; Christof Monz; Khalil Sima’an; | arxiv-cs.CL | 2024-09-27 |
204 | On Translating Technical Terminology: A Translation Workflow for Machine-Translated Acronyms Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The typical workflow for a professional translator to translate a document from its source language (SL) to a target language (TL) is not always focused on what many language … |
Richard Yue; John E. Ortega; Kenneth Ward Church; | arxiv-cs.CL | 2024-09-26 |
205 | Predicting Anchored Text from Translation Memories for Machine Translation Using Deep Learning Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this article, we show that for a large part of those words which are anchored, we can use other techniques that are based on machine learning approaches such as Word2Vec. |
Richard Yue; John E. Ortega; | arxiv-cs.CL | 2024-09-26 |
206 | Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We do so by introducing Direct Quality Optimization (DQO), a variant of DPO leveraging a pre-trained translation quality estimation model as a proxy for human preferences, and verify the improvements with both automatic metrics and human evaluation. |
Kaden Uhlig; Joern Wuebker; Raphael Reinauer; John DeNero; | arxiv-cs.CL | 2024-09-26 |
207 | Multilingual Transfer and Domain Adaptation for Low-Resource Languages of Spain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article introduces the submission status of the Translation into Low-Resource Languages of Spain task at (WMT 2024) by Huawei Translation Service Center (HW-TSC). |
YUANCHANG LUO et. al. | arxiv-cs.CL | 2024-09-24 |
208 | Context-aware and Style-related Incremental Decoding Framework for Discourse-Level Literary Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This report outlines our approach for the WMT24 Discourse-Level Literary Translation Task, focusing on the Chinese-English language pair in the Constrained Track. |
YUANCHANG LUO et. al. | arxiv-cs.AI | 2024-09-24 |
209 | Machine Translation Advancements of Low-Resource Indian Languages By Transfer Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces the submission by Huawei Translation Center (HW-TSC) to the WMT24 Indian Languages Machine Translation (MT) Shared Task. |
BIN WEI et. al. | arxiv-cs.CL | 2024-09-24 |
210 | Choose The Final Translation from NMT and LLM Hypotheses Using MBR Decoding: HW-TSC’s Submission to The WMT24 General MT Shared Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents the submission of Huawei Translate Services Center (HW-TSC) to the WMT24 general machine translation (MT) shared task, where we participate in the English to Chinese (en2zh) language pair. |
ZHANGLIN WU et. al. | arxiv-cs.AI | 2024-09-23 |
211 | HW-TSC’s Submission to The CCMT 2024 Machine Translation Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents the submission of Huawei Translation Services Center (HW-TSC) to machine translation tasks of the 20th China Conference on Machine Translation (CCMT 2024). |
ZHANGLIN WU et. al. | arxiv-cs.AI | 2024-09-23 |
212 | Scaling Laws of Decoder-Only Models on The Multilingual Machine Translation Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work explores the scaling laws of decoder-only models on the multilingual and multidomain translation task. |
Gaëtan Caillaut; Raheel Qader; Mariam Nakhlé; Jingshu Liu; Jean-Gabriel Barthélemy; | arxiv-cs.CL | 2024-09-23 |
213 | Brotherhood at WMT 2024: Leveraging LLM-Generated Contextual Conversations for Cross-Lingual Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we describe our system under the team name Brotherhood for the English-to-Lowres Multi-Modal Translation Task. |
Siddharth Betala; Ishan Chokshi; | arxiv-cs.CL | 2024-09-23 |
214 | RoMath: A Mathematical Reasoning Benchmark in Romanian Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces RoMath, a Romanian mathematical reasoning benchmark suite comprising three subsets: Baccalaureate, Competitions and Synthetic, which cover a range of mathematical domains and difficulty levels, aiming to improve non-English language models and promote multilingual AI development. |
Adrian Cosma; Ana-Maria Bucur; Emilian Radoi; | arxiv-cs.CL | 2024-09-17 |
215 | American Sign Language to Text Translation Using Transformer and Seq2Seq with LSTM Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study compares the Transformer with the Sequence-to-Sequence (Seq2Seq) model in translating sign language to text. |
Gregorius Guntur Sunardi Putra; Adifa Widyadhani Chanda D’Layla; Dimas Wahono; Riyanarto Sarno; Agus Tri Haryono; | arxiv-cs.CL | 2024-09-17 |
216 | GOSt-MT: A Knowledge Graph for Occupation-related Gender Biases in Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel approach to studying occupation-related gender bias through the creation of the GOSt-MT (Gender and Occupation Statistics for Machine Translation) Knowledge Graph. |
ORFEAS MENIS MASTROMICHALAKIS et. al. | arxiv-cs.CL | 2024-09-17 |
217 | Task Arithmetic for Language Expansion in Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this, we aim to build a one-to-many ST system from existing one-to-one ST systems using task arithmetic without re-training. Direct application of task arithmetic in ST leads to language confusion; therefore, we introduce an augmented task arithmetic method incorporating a language control model to ensure correct target language generation. |
YAO-FEI CHENG et. al. | arxiv-cs.CL | 2024-09-17 |
218 | Translating Step-by-Step: Decomposing The Translation Process for Improved Translation Quality of Long-Form Texts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present a step-by-step approach to long-form text translation, drawing on established processes in translation studies. |
Eleftheria Briakou; Jiaming Luo; Colin Cherry; Markus Freitag; | arxiv-cs.CL | 2024-09-10 |
219 | Evaluation of Google Translate for Mandarin Chinese Translation Using Sentiment and Semantic Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we provide an automated assessment of translation quality of Google Translate with human experts using sentiment and semantic analysis. |
Xuechun Wang; Rodney Beard; Rohitash Chandra; | arxiv-cs.CL | 2024-09-08 |
220 | Open Language Data Initiative: Advancing Low-Resource Machine Translation for Karakalpak Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study presents several contributions for the Karakalpak language: a FLORES+ devtest dataset translated to Karakalpak, parallel corpora for Uzbek-Karakalpak, Russian-Karakalpak and English-Karakalpak of 100,000 pairs each and open-sourced fine-tuned neural models for translation across these languages. |
Mukhammadsaid Mamasaidov; Abror Shopulatov; | arxiv-cs.CL | 2024-09-06 |
221 | Creating Domain-Specific Translation Memories for Machine Translation Fine-tuning: The TRENCARD Bilingual Cardiology Corpus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The article introduces a semi-automatic TM preparation methodology leveraging primarily translation tools used by translators in favor of data quality and control by the translators. |
Gokhan Dogru; | arxiv-cs.CL | 2024-09-04 |
222 | A Data Selection Approach for Enhancing Low Resource Machine Translation Using Cross-Lingual Sentence Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To mitigate the impact of data quality issues, we propose a data filtering approach based on cross-lingual sentence representations. |
Nidhi Kowtal; Tejas Deshpande; Raviraj Joshi; | arxiv-cs.CL | 2024-09-04 |
223 | Towards Cross-Lingual Explanation of Artwork in Large-scale Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, multilingual QA benchmarks that create datasets using machine translation have cultural differences and biases, remaining issues for use as evaluation tasks. To address these challenges, this study created an extended dataset in multiple languages without relying on machine translation. |
SHINTARO OZAKI et. al. | arxiv-cs.CL | 2024-09-02 |
224 | Human Versus Neural Machine Translation Creativity: A Study on Manipulated MWEs in Literature Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the digital era, the (r)evolution of neural machine translation (NMT) has reshaped both the market and translators’ workflow. However, the adoption of this technology has not … |
Gloria Corpas Pastor; Laura Noriega-Santiáñez; | Inf. | 2024-09-02 |
225 | Towards Tailored Recovery of Lexical Diversity in Literary Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel approach that consists of reranking translation candidates with a classifier that distinguishes between original and translated text. |
Esther Ploeger; Huiyuan Lai; Rik van Noord; Antonio Toral; | arxiv-cs.CL | 2024-08-30 |
226 | From Rule-Based Models to Deep Learning Transformers Architectures for Natural Language Processing and Sign Language Translation Systems: Survey, Taxonomy and Performance Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there are few works on sign language machine translation considering the particularity of the language being continuous and dynamic. This paper aims to address this void, providing a retrospective analysis of the temporal evolution of sign language machine translation algorithms and a taxonomy of the Transformers architectures, the most used approach in language translation. |
Nada Shahin; Leila Ismail; | arxiv-cs.AI | 2024-08-27 |
227 | Cultural Adaptation of Menus: A Fine-Grained Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the ChineseMenuCSI dataset, the largest for Chinese-English menu corpora, annotated with CSI vs Non-CSI labels and a fine-grained test set. |
Zhonghe Zhang; Xiaoyu He; Vivek Iyer; Alexandra Birch; | arxiv-cs.CL | 2024-08-24 |
228 | Expanding FLORES+ Benchmark for More Low-Resource Settings: Portuguese-Emakhuwa Machine Translation Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present baseline results from training a Neural Machine Translation system and fine-tuning existing multilingual translation models. |
Felermino D. M. Antonio Ali; Henrique Lopes Cardoso; Rui Sousa-Silva; | arxiv-cs.CL | 2024-08-21 |
229 | Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Babel-ImageNet, a massively multilingual benchmark that offers (partial) translations of ImageNet labels to 100 languages, built without machine translation or manual annotation. |
Gregor Geigle; Radu Timofte; Goran Glavaš; | acl | 2024-08-20 |
230 | Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address key challenges facing LLMs fine-tuned for SimulMT, validate classical SimulMT concepts and practices in the context of LLMs, explore adapting LLMs that are fine-tuned for NMT to the task of SimulMT, and introduce Simul-LLM, the first open-source fine-tuning and evaluation pipeline development framework for LLMs focused on SimulMT. |
Victor Agostinelli; Max Wild; Matthew Raffel; Kazi Fuad; Lizhong Chen; | acl | 2024-08-20 |
231 | The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our findings emphasize the need for fine-tuning strategies that preserve the benefits of LLMs for machine translation. |
David Stap; Eva Hasler; Bill Byrne; Christof Monz; Ke Tran; | acl | 2024-08-20 |
232 | Self-Modifying State Modeling for Simultaneous Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Besides, building decision paths requires unidirectional encoders to simulate streaming source inputs, which impairs the translation quality of SiMT models. To solve these issues, we propose Self-Modifying State Modeling (SM2), a novel training paradigm for SiMT task. |
Donglei Yu; Xiaomian Kang; Yuchen Liu; Yu Zhou; Chengqing Zong; | acl | 2024-08-20 |
233 | Speech Sense Disambiguation: Tackling Homophone Ambiguity in End-to-End Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To facilitate this, we first create a comprehensive homophone dictionary and an annotated dataset rich with homophone information established based on speech-text alignment. Building on this unique dictionary, we introduce AmbigST, an innovative homophone-aware contrastive learning approach that integrates a homophone-aware masking strategy. |
TENGFEI YU et. al. | acl | 2024-08-20 |
234 | What Is The Best Way for ChatGPT to Translate Poetry? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite promising outcomes, our analysis reveals persistent issues in the translations generated by ChatGPT that warrant attention. To address these shortcomings, we propose an Explanation-Assisted Poetry Machine Translation (EAPMT) method, which leverages monolingual poetry explanation as a guiding information for the translation process. |
Shanshan Wang; Derek Wong; Jingming Yao; Lidia Chao; | acl | 2024-08-20 |
235 | Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Intuitively, with a growing number of seen languages the encoder sentence representation grows more flexible and easily adaptable to new languages. In this work, we test this hypothesis by zero-shot translating from unseen languages. |
Carlos Mullov; Quan Pham; Alexander Waibel; | acl | 2024-08-20 |
236 | Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, we propose a Retrieval-Augmented Machine Translation (RAT) method which incorporates knowledge related to classical poetry for advancing the translation of Chinese Poetry in LLMs. |
ANDONG CHEN et. al. | arxiv-cs.CL | 2024-08-19 |
237 | Cross-Lingual Conversational Speech Summarization with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We build a baseline cascade-based system using open-source speech recognition and machine translation models. |
Max Nelson; Shannon Wotherspoon; Francis Keith; William Hartmann; Matthew Snover; | arxiv-cs.CL | 2024-08-12 |
238 | Simplifying Translations for Children: Iterative Simplification Considering Age of Acquisition with LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose a method that replaces words with high Age of Acquisitions (AoA) in translations with simpler words to match the translations to the user’s level. |
Masashi Oshika; Makoto Morishita; Tsutomu Hirao; Ryohei Sasano; Koichi Takeda; | arxiv-cs.CL | 2024-08-08 |
239 | Evaluating The Translation Performance of Large Language Models Based on Euas-20 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the significant progress in translation performance achieved by large language models, machine translation still faces many challenges. Therefore, in this paper, we construct the dataset Euas-20 to evaluate the performance of large language models on translation tasks, the translation ability on different languages, and the effect of pre-training data on the translation ability of LLMs for researchers and developers. |
Yan Huang; Wei Liu; | arxiv-cs.CL | 2024-08-06 |
240 | Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Intuitively, with a growing number of seen languages the encoder sentence representation grows more flexible and easily adaptable to new languages. In this work, we test this hypothesis by zero-shot translating from unseen languages. |
Carlos Mullov; Ngoc-Quan Pham; Alexander Waibel; | arxiv-cs.CL | 2024-08-05 |
241 | Improving Multilingual Neural Machine Translation By Utilizing Semantic and Linguistic Features Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose to exploit both semantic and linguistic features between multiple languages to enhance multilingual translation. |
Mengyu Bu; Shuhao Gu; Yang Feng; | arxiv-cs.CL | 2024-08-02 |
242 | In-Context Example Selection Via Similarity Search Improves Low-Resource Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on machine translation (MT), a task that has been shown to benefit from in-context translation examples. |
Armel Zebaze; Benoît Sagot; Rachel Bawden; | arxiv-cs.CL | 2024-08-01 |
243 | Encoder–Decoder Calibration for Multimodal Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The main purpose of multimodal machine translation (MMT) is to improve the quality of translation results by taking the corresponding visual context as an additional input. … |
Turghun Tayir; Lin Li; Bei Li; Jianquan Liu; Kong Aik Lee; | IEEE Transactions on Artificial Intelligence | 2024-08-01 |
244 | Generating Gender Alternatives in Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key technical contribution is a novel semi-supervised solution for generating alternatives that integrates seamlessly with standard MT models and maintains high performance without requiring additional components or increasing inference overhead. |
SARTHAK GARG et. al. | arxiv-cs.CL | 2024-07-29 |
245 | The Power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our findings reveal pervasive gender bias across all models, with base LLMs exhibiting a higher degree of bias compared to NMT models. To combat this bias, we explore prompting engineering techniques applied to an instruction-tuned LLM. |
Aleix Sant; Carlos Escolano; Audrey Mash; Francesca De Luca Fornaciari; Maite Melero; | arxiv-cs.CL | 2024-07-26 |
246 | Machine Translation Hallucination Detection for Low and High Resource Languages Using Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper evaluates sentence-level hallucination detection approaches using Large Language Models (LLMs) and semantic similarity within massively multilingual embeddings. |
KENZA BENKIRANE et. al. | arxiv-cs.CL | 2024-07-23 |
247 | Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Gender bias has been a focal point in the study of bias in machine translation and language models. |
YIJIE CHEN et. al. | arxiv-cs.CL | 2024-07-23 |
248 | Fine-grained Gender Control in Machine Translation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tackle controlled translation in a more realistic setting of inputs with multiple entities and propose Gender-of-Entity (GoE) prompting method for LLMs. |
Minwoo Lee; Hyukhun Koh; Minsung Kim; Kyomin Jung; | arxiv-cs.CL | 2024-07-21 |
249 | CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With our dataset, CoVoSwitch, spanning 13 languages, we evaluate the code-switching translation performance of two multilingual translation models, M2M-100 418M and NLLB-200 600M. |
Yeeun Kang; | arxiv-cs.CL | 2024-07-19 |
250 | Towards Zero-Shot Multimodal Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a method to bypass the need for fully supervised data to train MMT systems, using multimodal English data only. |
Matthieu Futeral; Cordelia Schmid; Benoît Sagot; Rachel Bawden; | arxiv-cs.CL | 2024-07-18 |
251 | Translate-and-Revise: Boosting Large Language Models for Constrained Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Imposing constraints on machine translation systems presents a challenging issue because these systems are not trained to make use of constraints in generating adequate, fluent translations. In this paper, we leverage the capabilities of large language models (LLMs) for constrained translation, given that LLMs can easily adapt to this task by taking translation instructions and constraints as prompts. |
PENGCHENG HUANG et. al. | arxiv-cs.CL | 2024-07-18 |
252 | Towards Chapter-to-Chapter Context-Aware Literary Translation Via Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through our comprehensive analysis, we unveil that literary translation under the Ch2Ch setting is challenging in nature, with respect to both model learning methods and translation decoding algorithms. |
Linghao Jin; Li An; Xuezhe Ma; | arxiv-cs.CL | 2024-07-12 |
253 | An Automatic Quality Metric for Evaluating Simultaneous Interpretation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an automatic evaluation metric for SI and SiMT focusing on word order synchronization. |
Mana Makinae; Katsuhito Sudoh; Masaru Yamada; Satoshi Nakamura; | arxiv-cs.CL | 2024-07-09 |
254 | Segment-Based Interactive Machine Translation for Pre-trained Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Pre-trained large language models (LLM) are starting to be widely used in many applications. In this work, we explore the use of these models in interactive machine translation (IMT) environments. |
Angel Navarro; Francisco Casacuberta; | arxiv-cs.CL | 2024-07-09 |
255 | Identifying Intensity of The Structure and Content in Tweets and The Discriminative Power of Attributes in Context with Referential Translation Machines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We use referential translation machines (RTMs) to identify the similarity between an attribute and two words in English by casting the task as machine translation performance prediction (MTPP) between the words and the attribute word and the distance between their similarities for Task 10 with stacked RTM models. |
Ergun Biçici; | arxiv-cs.CL | 2024-07-06 |
256 | Enhancing Language Learning Through Technology: Introducing A New English-Azerbaijani (Arabic Script) Parallel Corpus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a pioneering English-Azerbaijani (Arabic Script) parallel corpus, designed to bridge the technological gap in language learning and machine translation (MT) for under-resourced languages. |
JALIL NOURMOHAMMADI KHIARAK et. al. | arxiv-cs.CL | 2024-07-06 |
257 | Toucan: Many-to-Many Translation for 150 African Language Pairs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work aims to advance the field of NLP, fostering cross-cultural understanding and knowledge exchange, particularly in regions with limited language resources such as Africa. |
AbdelRahim Elmadany; Ife Adebara; Muhammad Abdul-Mageed; | arxiv-cs.CL | 2024-07-05 |
258 | Low Resource Twi-English Parallel Corpus for Machine Translation in Multiple Domains (Twi-2-ENG) Related Papers Related Patents Related Grants Related Venues Related Experts View |
EMMANUEL AGYEI et. al. | Discov. Comput. | 2024-07-05 |
259 | Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluated three publicly available end-to-end models: Whisper, OWSM 3.1, and SeamlessM4T. |
Tiia Sildam; Andra Velve; Tanel Alumäe; | arxiv-cs.CL | 2024-07-04 |
260 | A Case Study on Context-Aware Neural Machine Translation with Multi-Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conduct experiments on cascade MTL architecture, which consists of one encoder and two decoders. |
Ramakrishna Appicharla; Baban Gain; Santanu Pal; Asif Ekbal; Pushpak Bhattacharyya; | arxiv-cs.CL | 2024-07-03 |
261 | Language Portability Strategies for Open-domain Dialogue with Pre-trained Language Models from High to Low Resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose a study of linguistic portability strategies of large pre-trained language models (PLMs) used for open-domain dialogue systems in a high-resource language for this task. |
Ahmed Njifenjou; Virgile Sucal; Bassam Jabaian; Fabrice Lefèvre; | arxiv-cs.CL | 2024-07-01 |
262 | Language-agnostic Zero-Shot Machine Translation with Language-specific Modeling Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Zero-shot translation plays a key role in the multilingual Neural Machine Translation (NMT) domain, allowing multilingual systems to translate language pairs unseen in training. … |
Xiao Chen; Chirui Zhang; | 2024 International Joint Conference on Neural Networks … | 2024-06-30 |
263 | Document-Level Machine Translation with Effective Batch-Level Context Representation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: It is critical to provide inter-sentential context for document-level neural machine translation (DocNMT) to achieve higher-quality translations. As the document-level information … |
Kang Zhong; Jie Zhang; Wu Guo; | 2024 International Joint Conference on Neural Networks … | 2024-06-30 |
264 | Towards Massive Multilingual Holistic Bias Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the current landscape of automatic language generation, there is a need to understand, evaluate, and mitigate demographic biases as existing models are becoming increasingly multilingual. To address this, we present the initial eight languages from the MASSIVE MULTILINGUAL HOLISTICBIAS (MMHB) dataset and benchmark consisting of approximately 6 million sentences representing 13 demographic axes. |
XIAOQING ELLEN TAN et. al. | arxiv-cs.CL | 2024-06-29 |
265 | Less Is More: Accurate Speech Recognition & Translation Without Web-Scale Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that state-of-the-art accuracy can be reached without relying on web-scale data. |
KRISHNA C. PUVVADA et. al. | arxiv-cs.CL | 2024-06-28 |
266 | Sparse Regression for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the dice instance selection method for proper selection of training instances, which plays an important role in learning correct feature mappings and improving the source and target coverage of the training set. |
Ergun Biçici; | arxiv-cs.CL | 2024-06-27 |
267 | XTower: A Multilingual LLM for Explaining and Correcting Translation Errors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces xTower, an open large language model (LLM) built on top of TowerBase designed to provide free-text explanations for translation errors in order to guide the generation of a corrected translation. |
MARCOS TREVISO et. al. | arxiv-cs.CL | 2024-06-27 |
268 | FFN: A Fine-grained Chinese-English Financial Domain Parallel Corpus Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For comparison, we also trained an OpenNMT model based on our dataset. We detail problems of LLMs and provide in-depth analysis, intending to stimulate further research and solutions in this largely uncharted territory. |
Yuxin Fu; Shijing Si; Leyi Mai; Xi-ang Li; | arxiv-cs.CL | 2024-06-26 |
269 | ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the widespread increase in the phenomenon of code-switching between Egyptian Arabic and English in recent times, this paper explores the intricacies of machine translation (MT) and automatic speech recognition (ASR) systems, focusing on translating code-switched Egyptian Arabic-English to either English or Egyptian Arabic. Our goal is to present the methodologies employed in developing these systems, utilizing large language models such as LLama and Gemma. |
Ahmed Heakl; Youssef Zaghloul; Mennatullah Ali; Rania Hossam; Walid Gomaa; | arxiv-cs.CL | 2024-06-26 |
270 | Document Image Machine Translation with Dynamic Multi-pre-trained Models Assembling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extend the current TIMT task and propose a novel task, Document Image Machine Translation to Markdown (DIMT2Markdown), which aims to translate a source document image with long context and complex layout structure to markdown-formatted target translation. |
YUPU LIANG et. al. | naacl | 2024-06-20 |
271 | Do Multilingual Language Models Think Better in English? IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a new approach called self-translate that leverages the few-shot translation capabilities of multilingual language models. |
Julen Etxaniz; Gorka Azkune; Aitor Soroa; Oier Lacalle; Mikel Artetxe; | naacl | 2024-06-20 |
272 | Contextual Label Projection for Cross-Lingual Structured Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel label projection approach, CLaP, which translates text to the target language and performs *contextual translation* on the labels using the translated text as the context, ensuring better accuracy for the translated labels. |
Tanmay Parekh; I-Hung Hsu; Kuan-Hao Huang; Kai-Wei Chang; Nanyun Peng; | naacl | 2024-06-20 |
273 | M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This complexity is particularly evident in widely used PDF documents, which represent information visually. This paper addresses this gap by introducing M3T, a novel benchmark dataset tailored to evaluate NMT systems on the comprehensive task of translating semi-structured documents. |
BENJAMIN HSU et. al. | naacl | 2024-06-20 |
274 | An Empirical Study of Consistency Regularization for End-to-End Speech-to-Text Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we conduct empirical studies on intra-modal and cross-modal consistency and propose two training strategies, SimRegCR and SimZeroCR, for E2E ST in regular and zero-shot scenarios. |
Pengzhi Gao; Ruiqing Zhang; Zhongjun He; Hua Wu; Haifeng Wang; | naacl | 2024-06-20 |
275 | Fixing Rogue Memorization in Many-to-One Multilingual Translators of Extremely-Low-Resource Languages By Rephrasing Training Samples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we study the fine-tuning of pre-trained large high-resource language models (LLMs) into many-to-one multilingual machine translators for extremely-low-resource languages such as endangered Indigenous languages. |
Paulo Cavalin; Pedro Henrique Domingues; Claudio Pinhanez; Julio Nogima; | naacl | 2024-06-20 |
276 | Grammar-based Data Augmentation for Low-Resource Languages: The Case of Guarani-Spanish Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One of the main problems low-resource languages face in NLP can be pictured as a vicious circle: data is needed to build and test tools, but the available text is scarce and there are not powerful tools to collect it. In order to break this circle for Guarani, we explore if text automatically generated from a grammar can work as a Data Augmentation technique to boost the performance of Guarani-Spanish Machine Translation (MT) systems. |
AGUSTÍN LUCAS et. al. | naacl | 2024-06-20 |
277 | Complexity of Symbolic Representation in Working Memory of Transformer Correlates with The Complexity of A Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores the properties of the content of symbolic working memory added to the Transformer model decoder. |
Alsu Sagirova; Mikhail Burtsev; | arxiv-cs.CL | 2024-06-20 |
278 | Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Surprisingly, our initial experiments found that fine-tuning with Q-LoRA for translation purposes led to performance improvements in terms of BLEU but degradation in COMET compared to in-context learning. To overcome this, we propose an alternative approach: adapting LLMs as Automatic Post-Editors (APE) rather than direct translators. |
Sai Koneru; Miriam Exel; Matthias Huck; Jan Niehues; | naacl | 2024-06-20 |
279 | How Effective Is Multi-source Pivoting for Translation of Low Resource Indian Languages? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Taking the case of English to Indian language MT, this paper explores the ‘multi-source translation’ approach with pivoting, using both source and pivot sentences to improve translation. |
Pranav Gaikwad; Meet Doshi; Raj Dabre; Pushpak Bhattacharyya; | arxiv-cs.CL | 2024-06-19 |
280 | Evaluating Structural Generalization in Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this question, we construct SGET, a machine translation dataset covering various types of compositional generalization with control of words and sentence structures. We evaluate neural machine translation models on SGET and show that they struggle more in structural generalization than in lexical generalization. |
Ryoma Kumon; Daiki Matsuoka; Hitomi Yanaka; | arxiv-cs.CL | 2024-06-19 |
281 | Does Context Help Mitigate Gender Bias in Neural Machine Translation? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Context-aware models have been previously suggested as a means to mitigate this type of bias. In this work, we examine this claim by analysing in detail the translation of stereotypical professions in English to German, and translation with non-informative context in Basque to Spanish. |
Harritxu Gete; Thierry Etchegoyhen; | arxiv-cs.CL | 2024-06-18 |
282 | Leveraging Statistical Machine Translation for Code Search Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Machine Translation (MT) has numerous applications in Software Engineering (SE). Recently, it has been employed not only for programming language translation but also as an oracle … |
Hung Phan; Ali Jannesari; | Proceedings of the 28th International Conference on … | 2024-06-18 |
283 | Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we get the best of both worlds by integrating MT encoders directly into LLM backbones via sample-efficient self-distillation. |
Fabian David Schmidt; Philipp Borchert; Ivan Vulić; Goran Glavaš; | arxiv-cs.CL | 2024-06-18 |
284 | LiLiuM: EBay’s Large Language Models for E-commerce Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the LiLiuM series of large language models (LLMs): 1B, 7B, and 13B parameter models developed 100% in-house to fit eBay’s specific needs in the e-commerce domain. |
CHRISTIAN HEROLD et. al. | arxiv-cs.CL | 2024-06-17 |
285 | Error Span Annotation: A Balanced Approach for Human Evaluation of Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Error Span Annotation (ESA), a human evaluation protocol which combines the continuous rating of DA with the high-level error severity span marking of MQM. |
TOM KOCMI et. al. | arxiv-cs.CL | 2024-06-17 |
286 | CoSTA: Code-Switched Speech Translation Using Aligned Speech-Text Interleaving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on the problem of spoken translation (ST) of code-switched speech in Indian languages to English text. |
Bhavani Shankar; Preethi Jyothi; Pushpak Bhattacharyya; | arxiv-cs.CL | 2024-06-16 |
287 | Datasets for Multilingual Answer Sentence Selection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce new high-quality datasets for AS2 in five European languages (French, German, Italian, Portuguese, and Spanish), obtained through supervised Automatic Machine Translation (AMT) of existing English AS2 datasets such as ASNQ, WikiQA, and TREC-QA using a Large Language Model (LLM). |
Matteo Gabburo; Stefano Campese; Federico Agostini; Alessandro Moschitti; | arxiv-cs.CL | 2024-06-14 |
288 | Word Order in English-Japanese Simultaneous Interpretation: Analyses and Evaluation Using Chunk-wise Monotonic Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper analyzes the features of monotonic translations, which follow the word order of the source language, in simultaneous interpreting (SI). |
Kosuke Doi; Yuka Ko; Mana Makinae; Katsuhito Sudoh; Satoshi Nakamura; | arxiv-cs.CL | 2024-06-13 |
289 | Towards Multilingual Audio-Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we work towards extending Audio-Visual Question Answering (AVQA) to multilingual settings. |
ORCHID CHETIA PHUKAN et. al. | arxiv-cs.LG | 2024-06-13 |
290 | M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This complexity is particularly evident in widely used PDF documents, which represent information visually. This paper addresses this gap by introducing M3T, a novel benchmark dataset tailored to evaluate NMT systems on the comprehensive task of translating semi-structured documents. |
BENJAMIN HSU et. al. | arxiv-cs.CL | 2024-06-12 |
291 | Guiding In-Context Learning of LLMs Through Quality Estimation for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel methodology for in-context learning (ICL) that relies on a search algorithm guided by domain-specific quality estimation (QE). |
Javad Pourmostafa Roshan Sharami; Dimitar Shterionov; Pieter Spronck; | arxiv-cs.CL | 2024-06-12 |
292 | DUAL-REFLECT: Enhancing Large Language Models for Reflective Translation Through Dual Learning Feedback Mechanisms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing self-reflection methods lack effective feedback information, limiting the translation performance. To address this, we introduce a DUAL-REFLECT framework, leveraging the dual learning of translation tasks to provide effective feedback, thereby enhancing the models’ self-reflective abilities and improving translation performance. |
ANDONG CHEN et. al. | arxiv-cs.CL | 2024-06-11 |
293 | Building Bridges: A Dataset for Evaluating Gender-Fair Machine Translation Into German Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We address this research gap by studying gender-fair language in English-to-German MT. Concretely, we enrich a community-created gender-fair language dictionary and sample multi-sentence test instances from encyclopedic text and parliamentary speeches. |
Manuel Lardelli; Giuseppe Attanasio; Anne Lauscher; | arxiv-cs.CL | 2024-06-10 |
294 | Recovering Document Annotations for Sentence-level Bitext Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we reconstruct document-level information for three (ParaCrawl, News Commentary, and Europarl) large datasets in German, French, Spanish, Italian, Polish, and Portuguese (paired with English). |
Rachel Wicks; Matt Post; Philipp Koehn; | arxiv-cs.CL | 2024-06-06 |
295 | StatBot.Swiss: Bilingual Open Data Exploration in Natural Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we release the StatBot.Swiss dataset, the first bilingual benchmark for evaluating Text-to-SQL systems based on real-world applications. |
FARHAD NOORALAHZADEH et. al. | arxiv-cs.CL | 2024-06-05 |
296 | What Is The Best Way for ChatGPT to Translate Poetry? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite promising outcomes, our analysis reveals persistent issues in the translations generated by ChatGPT that warrant attention. To address these shortcomings, we propose an Explanation-Assisted Poetry Machine Translation (EAPMT) method, which leverages monolingual poetry explanation as guiding information for the translation process. |
Shanshan Wang; Derek F. Wong; Jingming Yao; Lidia S. Chao; | arxiv-cs.CL | 2024-06-05 |
297 | Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that these artifacts can significantly affect the models, confirmed by extensive experiments across diverse models, languages, and translation processes. In light of this, we present a simple data augmentation strategy that can alleviate the adverse impacts of translation artifacts. |
CHAEHUN PARK et. al. | arxiv-cs.CL | 2024-06-04 |
298 | How Multilingual Are Large Language Models Fine-Tuned for Translation? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: How does translation fine-tuning impact the MT capabilities of LLMs for zero-shot languages, zero-shot language pairs, and translation tasks that do not involve English? To address these questions, we conduct an extensive empirical evaluation of the translation quality of the TOWER family of language models (Alves et al., 2024) on 132 translation tasks from the multi-parallel FLORES-200 data. |
Aquia Richburg; Marine Carpuat; | arxiv-cs.CL | 2024-05-30 |
299 | Significance of Chain of Thought in Gender Bias Mitigation for English-Dravidian Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper examines gender bias in machine translation systems for languages such as Telugu and Kannada from the Dravidian family, analyzing how gender inflections affect translation accuracy and neutrality using Google Translate and ChatGPT. |
Lavanya Prahallad; Radhika Mamidi; | arxiv-cs.CL | 2024-05-30 |
300 | Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new data pruning technique: Checkpoints Across Time (CAT), that leverages early model training dynamics to identify the most relevant data points for model performance. |
EVERLYN ASIKO CHIMOTO et. al. | arxiv-cs.CL | 2024-05-29 |
301 | QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address the problem of sampling a set of high-quality and diverse translations. |
GONÇALO R. A. FARIA et. al. | arxiv-cs.CL | 2024-05-28 |
302 | Spanish and LLM Benchmarks: Is MMLU Lost in Translation? Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The evaluation of Large Language Models (LLMs) is a key element in their continuous improvement process and many benchmarks have been developed to assess the performance of LLMs … |
IRENE PLAZA et. al. | ArXiv | 2024-05-28 |
303 | TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion yet facilitates end-to-end inference through joint probability. |
CHENYANG LE et. al. | arxiv-cs.CL | 2024-05-28 |
304 | Optimizing Example Selection for Retrieval-augmented Machine Translation with Translation Memories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We aim to improve the upstream retrieval step and consider a fixed downstream edit-based model: the multi-Levenshtein Transformer. |
Maxime Bouthors; Josep Crego; François Yvon; | arxiv-cs.CL | 2024-05-23 |
305 | Improving Language Models Trained on Translated Data with Continual Pre-Training and Dictionary Learning Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the role of translation and synthetic data in training language models. |
Sabri Boughorbel; MD Rizwan Parvez; Majd Hawasly; | arxiv-cs.CL | 2024-05-23 |
306 | MELD-ST: An Emotion-aware Speech Translation Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the MELD-ST dataset for the emotion-aware speech translation task, comprising English-to-Japanese and English-to-German language pairs. |
SIROU CHEN et. al. | arxiv-cs.CL | 2024-05-21 |
307 | A Survey on Multi-modal Machine Translation: Tasks, Methods and Challenges Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we begin by offering an exhaustive overview of 99 prior works, comprehensively summarizing representative studies from the perspectives of dominant models, datasets, and evaluation metrics. |
HUANGJUN SHEN et. al. | arxiv-cs.CL | 2024-05-21 |
308 | DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce DiffNorm, a diffusion-based normalization strategy that simplifies data distributions for training NAT models. |
Weiting Tan; Jingyu Zhang; Lingfeng Shen; Daniel Khashabi; Philipp Koehn; | arxiv-cs.CL | 2024-05-21 |
309 | Beyond MLE: Investigating SEARNN for Low-Resourced Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With an average BLEU score improvement of 5.4% over the MLE objective, we proved that SEARNN is indeed a viable algorithm to effectively train RNNs on machine translation for low-resourced languages. |
Chris Emezue; | arxiv-cs.CL | 2024-05-20 |
310 | (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce TransAgents, a novel multi-agent framework that simulates the roles and collaborative practices of a human translation company, including a CEO, Senior Editor, Junior Editor, Translator, Localization Specialist, and Proofreader. |
MINGHAO WU et. al. | arxiv-cs.CL | 2024-05-20 |
311 | Neural Machine Translation for Low-Resource Languages from A Chinese-centric Perspective: A Survey Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Machine translation–the automatic transformation of one natural language (source language) into another (target language) through computational means–occupies a central role in … |
JINYI ZHANG et. al. | ACM Transactions on Asian and Low-Resource Language … | 2024-05-16 |
312 | Enhancing Gender-Inclusive Machine Translation with Neomorphemes and Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Accordingly, they still fall short in using gender-inclusive language, also representative of non-binary identities. In this paper, we look at gender-inclusive neomorphemes, neologistic elements that avoid binary gender markings as an approach towards fairer MT. In this direction, we explore prompting techniques with large language models (LLMs) to translate from English into Italian using neomorphemes. |
Andrea Piergentili; Beatrice Savoldi; Matteo Negri; Luisa Bentivogli; | arxiv-cs.CL | 2024-05-14 |
313 | LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new paradigm for machine translation that is particularly useful for no-resource languages (those without any publicly available bilingual or monolingual corpora): LLM-RBMT (LLM-Assisted Rule Based Machine Translation). |
Jared Coleman; Bhaskar Krishnamachari; Khalil Iskarous; Ruben Rosales; | arxiv-cs.CL | 2024-05-14 |
314 | An Empirical Study on The Robustness of Massively Multilingual Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we empirically investigate the translation robustness of Indonesian-Chinese translation in the face of various naturally occurring noise. |
Leiyu Pan; Deyi Xiong; | arxiv-cs.CL | 2024-05-13 |
315 | CANTONMT: Investigating Back-Translation and Model-Switch Mechanisms for Cantonese-English Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper investigates the development and evaluation of machine translation models from Cantonese to English, where we propose a novel approach to tackle low-resource language translations. |
Kung Yin Hong; Lifeng Han; Riza Batista-Navarro; Goran Nenadic; | arxiv-cs.CL | 2024-05-13 |
316 | Using Machine Translation to Augment Multilingual Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we explore the effects of using machine translation to fine-tune a multilingual model for a classification task across multiple languages. |
Adam King; | arxiv-cs.CL | 2024-05-08 |
317 | Relay Decoding: Concatenating Large Language Models for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When it is challenging to find large models that support the desired languages, resorting to continuous learning methods becomes a costly endeavor. To mitigate these expenses, we propose an innovative approach called RD (Relay Decoding), which entails concatenating two distinct large models that individually support the source and target languages. |
CHENGPENG FU et. al. | arxiv-cs.CL | 2024-05-05 |
318 | Sentiment Analysis Across Languages: Evaluation Before and After Machine Translation to English Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper examines the performance of transformer models in Sentiment Analysis tasks across multilingual datasets and text that has undergone machine translation. |
Aekansh Kathunia; Mohammad Kaif; Nalin Arora; N Narotam; | arxiv-cs.CL | 2024-05-05 |
319 | The IgboAPI Dataset: Empowering Igbo Language Technologies Through Multi-dialectal Enrichment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we present the IgboAPI dataset, a multi-dialectal Igbo-English dictionary dataset, developed with the aim of enhancing the representation of Igbo dialects. |
CHRIS CHINENYE EMEZUE et. al. | arxiv-cs.CL | 2024-05-02 |
320 | E-learning Application in English Writing Classroom Based on Neural Machine Translation and Semantic Analysis Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View |
Yaqiu Wang; | Entertain. Comput. | 2024-05-01 |
321 | Context-Aware Machine Translation with Source Coreference Explanation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This can lead to the explain-away effect, wherein the models only consider features easier to explain predictions, resulting in inaccurate translations. To address this issue, we propose a model that explains the decisions made for translation by predicting coreference features in the input. |
Huy Hien Vu; Hidetaka Kamigaito; Taro Watanabe; | arxiv-cs.CL | 2024-04-30 |
322 | Suvach — Generated Hindi QA Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a new benchmark specifically designed for evaluating Hindi EQA models and discusses the methodology to do the same for any task. |
Vaishak Narayanan; Prabin Raj KP; Saifudheen Nouphal; | arxiv-cs.CL | 2024-04-30 |
323 | 3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Multimodal machine translation (MMT) is a challenging task that seeks to improve translation quality by incorporating visual information. However, recent studies have indicated … |
XINYU MA et. al. | International Conference on Language Resources and … | 2024-04-29 |
324 | Quality Estimation with $k$-nearest Neighbors and Automatic Evaluation for Model-specific Quality Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a model-specific, unsupervised QE approach, termed $k$NN-QE, that extracts information from the MT model’s training data using $k$-nearest neighbors. |
Tu Anh Dinh; Tobias Palzer; Jan Niehues; | arxiv-cs.CL | 2024-04-27 |
325 | Translation of Multifaceted Data Without Re-Training of Machine Translation Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we argue that this practice often overlooks the interrelation between components within the same data point. To address this limitation, we propose a novel MT pipeline that considers the intra-data relation in implementing MT for training data. |
HYEONSEOK MOON et. al. | arxiv-cs.CL | 2024-04-24 |
326 | The Impact of Multilinguality and Tokenization on Statistical Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Multilingual neural machine translation systems have achieved state-of-the-art results on translation quality, especially for low-resource languages, yet statistical machine … |
Alidar Asvarov; Andrey Grabovoy; | 2024 35th Conference of Open Innovations Association (FRUCT) | 2024-04-24 |
327 | Setting Up The Data Printer with Improved English to Ukrainian Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Examples of task performance expressed in English are abundant, so with a high-quality translation system our community will be enabled to curate datasets faster. To aid this goal, we introduce a recipe to build a translation system using supervised finetuning of a large pretrained language model with a noisy parallel dataset of 3M pairs of Ukrainian and English sentences followed by a second phase of training using 17K examples selected by k-fold perplexity filtering on another dataset of higher quality. |
Yurii Paniv; Dmytro Chaplynskyi; Nikita Trynus; Volodymyr Kyrylov; | arxiv-cs.CL | 2024-04-23 |
328 | Automated Multi-Language to English Machine Translation Using Generative Pre-Trained Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we examine using local Generative Pretrained Transformer (GPT) models to perform automated zero shot black-box, sentence wise, multi-natural-language translation into English text. |
Elijah Pelofske; Vincent Urias; Lorie M. Liebrock; | arxiv-cs.CL | 2024-04-22 |
329 | From LLM to NMT: Advancing Low-Resource Machine Translation with Claude IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that Claude 3 Opus, a large language model (LLM) released by Anthropic in March 2024, exhibits stronger machine translation competence than other LLMs. |
Maxim Enis; Mark Hopkins; | arxiv-cs.CL | 2024-04-21 |
330 | End-to-End Speech Translation with Mutual Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we find that triple-task MTL (ST+MT+ASR) suffers from a knowledge transfer limitation that leads to performance stagnation compared with dual-task MTL (ST+MT or ST+ASR). |
H. Wang; Z. Xue; Y. Lei; D. Xiong; | icassp | 2024-04-15 |
331 | Vector Quantization Knowledge Transfer for End-to-End Text Image Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To better align TIMT features with MT semantic features, we propose a novel Vector Quantization Knowledge Transfer (VQKT) method that employs a trainable codebook to quantize continuous features into discrete space. |
C. Ma; Y. Zhang; Y. Zhao; Y. Zhou; C. Zong; | icassp | 2024-04-15 |
332 | M2BART: Multilingual and Multimodal Encoder-Decoder Pre-Training for Any-to-Any Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces M2BART, a streamlined multilingual and multimodal framework for encoder-decoder models. |
P. -J. Chen; B. Shi; K. Niu; A. Lee; W. -N. Hsu; | icassp | 2024-04-15 |
333 | Memory-Augmented Speech-to-text Translation with Multi-Scale Context Translation Strategy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose memory-augmented speech-to-text translation, which leverages a memory module to perform context-aware translation. |
Y. Yuan; Y. Zhou; X. Shi; | icassp | 2024-04-15 |
334 | End-to-End Speech Translation with Mutual Knowledge Distillation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Multi-task learning (MTL) is widely used to improve end-to-end speech translation (ST), which implicitly transfers knowledge from auxiliary automatic speech recognition (ASR) … |
Hao Wang; Zhengshan Xue; Yikun Lei; Deyi Xiong; | ICASSP 2024 – 2024 IEEE International Conference on … | 2024-04-14 |
335 | Vector Quantization Knowledge Transfer for End-to-End Text Image Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: End-to-end text image machine translation (TIMT) aims at translating source language embedded in images into target language without recognizing intermediate texts in images. … |
Cong Ma; Yaping Zhang; Yang Zhao; Yu Zhou; Chengqing Zong; | ICASSP 2024 – 2024 IEEE International Conference on … | 2024-04-14 |
336 | Multilingual Evaluation of Semantic Textual Relatedness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work aims to not only showcase our achievements but also inspire further research in multilingual STR, particularly for low-resourced languages. |
SHARVI ENDAIT et. al. | arxiv-cs.CL | 2024-04-13 |
337 | Understanding Machine Translation Fit for Language Learning: The Mediating Effect of Machine Translation Literacy Related Papers Related Patents Related Grants Related Venues Related Experts View |
Yanxia Yang; | Educ. Inf. Technol. | 2024-04-13 |
338 | Investigating Neural Machine Translation for Low-Resource Languages: Using Bavarian As A Case Study Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we revisit state-of-the-art Neural Machine Translation techniques to develop automatic translation systems between German and Bavarian. |
Wan-Hua Her; Udo Kruschwitz; | arxiv-cs.CL | 2024-04-12 |
339 | Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Machine Translation (MT) remains one of the last NLP tasks where large language models (LLMs) have not yet replaced dedicated supervised systems. This work exploits the complementary strengths of LLMs and supervised MT by guiding LLMs to automatically post-edit MT with external feedback on its quality, derived from Multidimensional Quality Metric (MQM) annotations. |
Dayeon Ki; Marine Carpuat; | arxiv-cs.CL | 2024-04-11 |
340 | Exploring The Necessity of Visual Modality in Multimodal Machine Translation Using Authentic Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we adhere to the universal multimodal machine translation framework proposed by Tang et al. (2022). |
ZI LONG et. al. | arxiv-cs.CL | 2024-04-09 |
341 | Low-Resource Machine Translation Through Retrieval-Augmented LLM Prompting: A Study on The Mambai Language Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study explores the use of large language models (LLMs) for translating English into Mambai, a low-resource Austronesian language spoken in Timor-Leste, with approximately 200,000 native speakers. |
Raphaël Merx; Aso Mahmudi; Katrina Langford; Leo Alberto de Araujo; Ekaterina Vylomova; | arxiv-cs.CL | 2024-04-07 |
342 | MaiNLP at SemEval-2024 Task 1: Analyzing Source Language Selection in Cross-Lingual Textual Relatedness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness (STR), on Track C: Cross-lingual. |
Shijia Zhou; Huangyan Shan; Barbara Plank; Robert Litschko; | arxiv-cs.CL | 2024-04-03 |
343 | Towards Better Understanding of Cybercrime: The Role of Fine-Tuned LLMs in Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose using fine-tuned Large Language Models (LLM) to generate translations that can accurately capture the nuances of cybercrime language. |
Veronica Valeros; Anna Širokova; Carlos Catania; Sebastian Garcia; | arxiv-cs.CL | 2024-04-02 |
344 | Low-resource Neural Machine Translation with Morphological Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a framework-solution for modeling complex morphology in low-resource settings. |
Antoine Nzeyimana; | arxiv-cs.CL | 2024-04-02 |
345 | Improving Vietnamese-English Medical Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce MedEV — a high-quality Vietnamese-English parallel dataset constructed specifically for the medical domain, comprising approximately 360K sentence pairs. |
Nhu Vo; Dat Quoc Nguyen; Dung D. Le; Massimo Piccardi; Wray Buntine; | arxiv-cs.CL | 2024-03-28 |
346 | A Tulu Resource for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the first parallel dataset for English-Tulu translation. |
Manu Narayanan; Noëmi Aepli; | arxiv-cs.CL | 2024-03-28 |
347 | KazParC: Kazakh Parallel Corpus for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce KazParC, a parallel corpus designed for machine translation across Kazakh, English, Russian, and Turkish. |
Rustem Yeshpanov; Alina Polonskaya; Huseyin Atakan Varol; | arxiv-cs.CL | 2024-03-28 |
348 | Going Beyond Word Matching: Syntax Improves In-context Example Selection for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a syntax-based in-context example selection method for MT, by computing the syntactic similarity between dependency trees using Polynomial Distance. |
Chenming Tang; Zhixiang Wang; Yunfang Wu; | arxiv-cs.CL | 2024-03-28 |
349 | M3P: Towards Multimodal Multilingual Translation with Multimodal Prompt Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a framework to leverage the multimodal prompt to guide the Multimodal Multilingual neural Machine Translation (m3P), which aligns the representations of different languages with the same meaning and generates the conditional vision-language memory for translation. |
JIAN YANG et. al. | arxiv-cs.CL | 2024-03-26 |
350 | The Impact of Syntactic and Semantic Proximity on Machine Translation with Back-Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Theoretically, however, the method should not work in general. We therefore conduct controlled experiments with artificial languages to determine what properties of languages make back-translation an effective training method, covering lexical, syntactic, and semantic properties. |
Nicolas Guerin; Shane Steinert-Threlkeld; Emmanuel Chemla; | arxiv-cs.CL | 2024-03-26 |
351 | Synthetic Data Generation and Joint Learning for Robust Code-Mixed Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tackle the problem of code-mixed (Hinglish and Bengalish) to English machine translation. |
Kartik Kartik; Sanjana Soni; Anoop Kunchukuttan; Tanmoy Chakraborty; Md Shad Akhtar; | arxiv-cs.CL | 2024-03-25 |
352 | Prediction of Translation Techniques for The Translation Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, the process of human-generated translation relies on a wide range of translation techniques, which are crucial for ensuring linguistic adequacy and fluency. This study suggests that these translation techniques could further optimize machine translation if they are automatically identified before being applied to guide the translation process effectively. |
Fan Zhou; Vincent Vandeghinste; | arxiv-cs.CL | 2024-03-21 |
353 | Multi-Dimensional Machine Translation Evaluation: Model Evaluation and Resource for Korean Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous studies have demonstrated the feasibility of MQM annotation but there are, to our knowledge, no computational models that predict MQM scores for novel texts, due to a lack of resources. In this paper, we address these shortcomings by (a) providing a 1200-sentence MQM evaluation benchmark for the language pair English-Korean and (b) reframing MT evaluation as the multi-task problem of simultaneously predicting several MQM scores using SOTA language models, both in a reference-based MT evaluation setup and a reference-free quality estimation (QE) setup. |
Dojun Park; Sebastian Padó; | arxiv-cs.CL | 2024-03-19 |
354 | Enhancing Taiwanese Hokkien Dual Translation By Exploring and Standardizing of Four Writing Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Machine translation focuses mainly on high-resource languages (HRLs), while low-resource languages (LRLs) like Taiwanese Hokkien are relatively under-explored. The study aims to address this gap by developing a dual translation model between Taiwanese Hokkien and both Traditional Mandarin Chinese and English. |
Bo-Han Lu; Yi-Hsuan Lin; En-Shiun Annie Lee; Richard Tzong-Han Tsai; | arxiv-cs.CL | 2024-03-18 |
355 | CantonMT: Cantonese to English NMT Platform with Fine-Tuned Models Using Synthetic Back-Translation Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we deploy a standard data augmentation methodology by back-translation to a new language translation direction Cantonese-to-English. |
Kung Yin Hong; Lifeng Han; Riza Batista-Navarro; Goran Nenadic; | arxiv-cs.CL | 2024-03-17 |
356 | CantonMT: Cantonese to English NMT Platform with Fine-Tuned Models Using Real and Synthetic Back-Translation Data Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Neural Machine Translation (NMT) for low-resource languages remains a challenge for many NLP researchers. In this work, we deploy a standard data augmentation methodology by … |
Kung Yin Hong; Lifeng Han; R. Batista-Navarro; Goran Nenadic; | ArXiv | 2024-03-17 |
357 | A Novel Paradigm Boosting Translation Capabilities of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a study on strategies to enhance the translation capabilities of large language models (LLMs) in the context of machine translation (MT) tasks. |
JIAXIN GUO et. al. | arxiv-cs.CL | 2024-03-17 |
358 | To Label or Not to Label: Hybrid Active Learning for Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Both approaches have limitations – diversity methods may extract varied but trivial examples, while uncertainty sampling can yield repetitive, uninformative instances. To bridge this gap, we propose Hybrid Uncertainty and Diversity Sampling (HUDS), an AL strategy for domain adaptation in NMT that combines uncertainty and diversity for sentence selection. |
Abdul Hameed Azeemi; Ihsan Ayyub Qazi; Agha Ali Raza; | arxiv-cs.CL | 2024-03-14 |
359 | Scaling Behavior of Machine Translation with Large Language Models Under Prompt Injection Attacks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Their generality, however, opens them up to subversion by end users who may embed into their requests instructions that cause the model to behave in unauthorized and possibly unsafe ways. In this work we study these Prompt Injection Attacks (PIAs) on multiple families of LLMs on a Machine Translation task, focusing on the effects of model size on the attack success rates. |
Zhifan Sun; Antonio Valerio Miceli-Barone; | arxiv-cs.CL | 2024-03-14 |
360 | Multilingual Neural Machine Translation for Indic to Indic Languages Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The method of translation from one language to another without human intervention is known as Machine Translation (MT). Multilingual neural machine translation (MNMT) is a … |
Sudhansu Bala Das; Divyajyoti Panda; Tapas Kumar Mishra; Bidyut Kr. Patra; Asif Ekbal; | ACM Transactions on Asian and Low-Resource Language … | 2024-03-12 |
361 | ACT-MNMT Auto-Constriction Turning for Multilingual Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to the mixture of multilingual data during the pre-training of LLM, the LLM-based translation models face the off-target issue in both prompt-based methods, including a series of phenomena, namely instruction misunderstanding, translation with wrong language and over-generation. For this issue, this paper introduces an Auto-Constriction Turning mechanism for Multilingual Neural Machine Translation (ACT-MNMT), which is a novel supervised fine-tuning mechanism and orthogonal to the traditional prompt-based methods. |
Shaojie Dai; Xin Liu; Ping Luo; Yue Yu; | arxiv-cs.CL | 2024-03-11 |
362 | Consensus-Based Machine Translation for Code-Mixed Texts Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Multilingualism in India is widespread due to its long history of foreign acquaintances. This leads to the presence of an audience familiar with conversing using more than one … |
S. Mahata; Dipankar Das; Sivaji Bandyopadhyay; | ACM Transactions on Asian and Low-Resource Language … | 2024-03-09 |
363 | Enhanced Auto Language Prediction with Dictionary Capsule — A Novel Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The paper presents a novel Auto Language Prediction Dictionary Capsule (ALPDC) framework for language prediction and machine translation. |
PINNI VENKATA ABHIRAM et. al. | arxiv-cs.CL | 2024-03-09 |
364 | Cross-lingual Transfer or Machine Translation? On Data Augmentation for Monolingual Semantic Textual Similarity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we directly compared two data augmentation techniques as potential solutions for monolingual STS: (a) cross-lingual transfer that exploits English resources alone as training data to yield non-English sentence embeddings as zero-shot inference, and (b) machine translation that converts English data into pseudo non-English training data in advance. |
Sho Hoshino; Akihiko Kato; Soichiro Murakami; Peinan Zhang; | arxiv-cs.CL | 2024-03-08 |
365 | Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an Indian English-to-Hindi SSMT system that can transfer stress and aim to enhance the overall quality and engagement of educational content. |
Sai Akarsh; Vamshi Raghusimha; Anindita Mondal; Anil Vuppala; | arxiv-cs.CL | 2024-03-06 |
366 | BiVert: Bidirectional Vocabulary Evaluation Using Relations for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a bidirectional semantic-based evaluation method designed to assess the sense distance of the translation from the source text. |
Carinne Cherf; Yuval Pinter; | arxiv-cs.CL | 2024-03-06 |
367 | GaHealth: An English-Irish Bilingual Corpus of Health Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our study outlines the process used in developing the corpus and empirically demonstrates the benefits of using an in-domain dataset for the health domain. |
Séamus Lankford; Haithem Afli; Órla Ní Loinsigh; Andy Way; | arxiv-cs.CL | 2024-03-06 |
368 | General2Specialized LLMs Translation for E-commerce Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Taking e-commerce as an example, the texts usually include amounts of domain-related words and have more grammar problems, which leads to inferior performances of current NMT methods. To address these problems, we collect two domain-related resources, including a set of term pairs (aligned Chinese-English bilingual terms) and a parallel corpus annotated for the e-commerce domain. |
KAIDI CHEN et. al. | arxiv-cs.CL | 2024-03-06 |
369 | GaHealth: An English–Irish Bilingual Corpus of Health Data Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Machine Translation is a mature technology for many high-resource language pairs. However in the context of low-resource languages, there is a paucity of parallel data datasets … |
Séamus Lankford; Haithem Afli; Orla Ni Loinsigh; Andy Way; | ArXiv | 2024-03-06 |
370 | Detecting Concrete Visual Tokens for Multimodal Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce new methods for detection of visually and contextually relevant (concrete) tokens from source sentences, including detection with natural language processing (NLP), detection with object detection, and a joint detection-verification technique. |
Braeden Bowen; Vipin Vijayan; Scott Grigsby; Timothy Anderson; Jeremy Gwinnup; | arxiv-cs.CL | 2024-03-05 |
371 | Adding Multimodal Capabilities to A Text-only Translation Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to perform well on both Multi30k and typical text-only datasets, we use a performant text-only machine translation (MT) model as the starting point of our MMT model. |
Vipin Vijayan; Braeden Bowen; Scott Grigsby; Timothy Anderson; Jeremy Gwinnup; | arxiv-cs.CL | 2024-03-05 |
372 | The Case for Evaluating Multimodal Translation Models on Text Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Namely, the use of visual information by the MMT model cannot be shown directly from the Multi30k test set results and the sentences in Multi30k are image captions, i.e., short, descriptive sentences, as opposed to complex sentences that typical text-only machine translation models are evaluated against. Therefore, we propose that MMT models be evaluated using 1) the CoMMuTE evaluation framework, which measures the use of visual information by MMT models, 2) the text-only WMT news translation task test sets, which evaluates translation performance against complex sentences, and 3) the Multi30k test sets, for measuring MMT model performance against a real MMT dataset. |
Vipin Vijayan; Braeden Bowen; Scott Grigsby; Timothy Anderson; Jeremy Gwinnup; | arxiv-cs.CL | 2024-03-05 |
373 | Transformers for Low-Resource Languages: Is Féidir Linn! IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The Transformer model is the state-of-the-art in Machine Translation. However, in general, neural translation models often underperform on language pairs with insufficient … |
Séamus Lankford; H. Afli; Andy Way; | Machine Translation Summit | 2024-03-04 |
374 | Machine Translation in The Covid Domain: An English-Irish Case Study for LoResMT 2021 IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Translation models for the specific domain of translating Covid data from English to Irish were developed for the LoResMT 2021 shared task. |
Séamus Lankford; Haithem Afli; Andy Way; | arxiv-cs.CL | 2024-03-02 |
375 | A Multitask Co-training Framework for Improving Speech Translation By Leveraging Speech Recognition and Machine Translation Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View |
Yue Zhou; Yuxuan Yuan; Xiaodong Shi; | Neural Comput. Appl. | 2024-02-27 |
376 | An Interpretable Error Correction Method for Enhancing Code-to-code Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, researchers frequently invest substantial time and computational resources in retraining models, yet the improvement in translation accuracy is quite limited. To address these issues, we introduce a novel approach, kNN-ECD, which combines k-nearest-neighbor search with a key-value error correction datastore to overwrite the wrong translations of TransCoder-ST. |
Min Xue; Artur Andrzejak; Marla Leuther; | iclr | 2024-02-26 |
377 | MT-Ranker: Reference-free Machine Translation Evaluation By Inter-system Ranking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we formulate the reference-free MT evaluation into a pairwise ranking problem. |
Ibraheem Muhammad Moosa; Rui Zhang; Wenpeng Yin; | iclr | 2024-02-26 |
378 | A Benchmark for Learning to Translate A New Language from One Grammar Book IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We turn to a field that is explicitly motivated and bottlenecked by a scarcity of web data: low-resource languages. In this paper, we introduce MTOB (Machine Translation from One Book), a benchmark for learning to translate between English and Kalamang—a language with less than 200 speakers and therefore virtually no presence on the web—using several hundred pages of field linguistics reference materials. |
Garrett Tanzer; Mirac Suzgun; Eline Visser; Dan Jurafsky; Luke Melas-Kyriazi; | iclr | 2024-02-26 |
379 | TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Importantly, feeding back such error information into the LLMs can lead to self-refinement and result in improved translation performance. Motivated by these insights, we introduce a systematic LLM-based self-refinement translation framework, named TEaR, which stands for Translate, Estimate, and Refine, marking a significant step forward in this direction. |
ZHAOPENG FENG et. al. | arxiv-cs.CL | 2024-02-26 |
380 | A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose a novel fine-tuning approach for LLMs that is specifically designed for the translation task, eliminating the need for the abundant parallel data that traditional translation models usually depend on. |
Haoran Xu; Young Jin Kim; Amr Sharaf; Hany Hassan Awadalla; | iclr | 2024-02-26 |
381 | TMT: Tri-Modal Translation Between Speech, Image, and Text By Processing Different Modalities As Different Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel Tri-Modal Translation (TMT) model that translates between arbitrary modalities spanning speech, image, and text. |
MINSU KIM et. al. | arxiv-cs.CL | 2024-02-25 |
382 | Direct Punjabi to English Speech Translation Using Discrete Units Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With a motive to contribute towards speech translation research for low-resource languages, our work presents a direct speech-to-speech translation model for one of the Indic languages called Punjabi to English. |
Prabhjot Kaur; L. Andrew M. Bush; Weisong Shi; | arxiv-cs.CL | 2024-02-24 |
383 | Could We Have Had Better Multilingual LLMs If English Was Not The Central Language? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large Language Models (LLMs) demonstrate strong machine translation capabilities on languages they are trained on. |
Ryandito Diandaru; Lucky Susanto; Zilu Tang; Ayu Purwarianti; Derry Wijaya; | arxiv-cs.CL | 2024-02-21 |
384 | GATE X-E : A Challenge Set for Gender-Fair Translations from Weakly-Gendered Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite numerous studies on gender bias in translations into English from weakly gendered-languages, there are no benchmarks for evaluating this phenomenon or for assessing mitigation strategies. To address this gap, we introduce GATE X-E, an extension to the GATE (Rarrick et al., 2023) corpus, that consists of human translations from Turkish, Hungarian, Finnish, and Persian into English. |
Spencer Rarrick; Ranjita Naik; Sundar Poudel; Vishal Chowdhary; | arxiv-cs.CL | 2024-02-21 |
385 | Bangla AI: A Framework for Machine Translation Utilizing Large Language Models for Ethnic Media Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The paper outlines a theoretical framework elucidating the integration of LLM and MMT into the news searching and translation processes for ethnic media. |
MD Ashraful Goni; Fahad Mostafa; Kerk F. Kee; | arxiv-cs.CL | 2024-02-21 |
386 | SiLLM: Large Language Models for Simultaneous Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose SiLLM, which delegates the two sub-tasks to separate agents, thereby incorporating LLM into SiMT. |
Shoutao Guo; Shaolei Zhang; Zhengrui Ma; Min Zhang; Yang Feng; | arxiv-cs.CL | 2024-02-20 |
387 | UMBCLU at SemEval-2024 Task 1A and 1C: Semantic Textual Relatedness with and Without Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Large language models (LLMs) have shown impressive performance on several natural language understanding tasks such as multilingual machine translation (MMT), semantic similarity (STS), and encoding sentence embeddings. Using a combination of LLMs that perform well on these tasks, we developed two STR models, TranSem and FineSem, for the supervised and cross-lingual settings. |
Shubhashis Roy Dipta; Sai Vallurupalli; | arxiv-cs.CL | 2024-02-20 |
388 | UMBCLU at SemEval-2024 Task 1: Semantic Textual Relatedness with and Without Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The aim of SemEval-2024 Task 1, “Semantic Textual Relatedness for African and Asian Languages” is to develop models for identifying semantic textual relatedness (STR) between two … |
Shubhashis Roy Dipta; Sai Vallurupalli; | Proceedings of the 18th International Workshop on Semantic … | 2024-02-20 |
389 | Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore leveraging reinforcement learning with human feedback (RLHF) to improve translation quality. |
NUO XU et. al. | arxiv-cs.CL | 2024-02-18 |
390 | Rethinking Human-like Translation Strategy: Integrating Drift-Diffusion Model with Large Language Models for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, prior work on LLM-based machine translation has mainly focused on better utilizing training data, demonstrations, or pre-defined and universal knowledge to improve performance, with a lack of consideration of decision-making like human translators. In this paper, we incorporate Thinker with the Drift-Diffusion Model (Thinker-DDM) to address this issue. |
HONGBIN NA et. al. | arxiv-cs.CL | 2024-02-16 |
391 | A Study for Enhancing Low-resource Thai-Myanmar-English Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Several methodologies have recently been proposed to enhance the performance of low-resource Neural Machine Translation (NMT). However, these techniques have yet to be explored … |
Mya Ei San; Sasiporn Usanavasin; Ye Kyaw Thu; Manabu Okumura; | ACM Transactions on Asian and Low-Resource Language … | 2024-02-13 |
392 | Unsupervised Sign Language Translation and Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a sliding window method to address the issues of aligning variable-length text with video sequences. |
ZHENGSHENG GUO et. al. | arxiv-cs.CL | 2024-02-12 |
393 | TransLLaMa: LLM-based Simultaneous Translation System IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study demonstrates that, after fine-tuning on a small dataset comprising causally aligned source and target sentence pairs, a pre-trained open-source LLM can control input segmentation directly by generating a special wait token. |
Roman Koshkin; Katsuhito Sudoh; Satoshi Nakamura; | arxiv-cs.CL | 2024-02-07 |
394 | Error Analysis of Pretrained Language Models (PLMs) in English-to-Arabic Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View |
H. Al-Khalifa; Khaloud Al-Khalefah; Hesham Haroon; | Hum. Centric Intell. Syst. | 2024-02-05 |
395 | A Morphologically-Aware Dictionary-based Data Augmentation Technique for Machine Translation of Under-Represented Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose strategies to synthesize parallel data relying on morpho-syntactic information and using bilingual lexicons along with a small amount of seed parallel data. |
Md Mahfuz Ibn Alam; Sina Ahmadi; Antonios Anastasopoulos; | arxiv-cs.CL | 2024-02-02 |
396 | Neural Machine Translation for Malayalam Paraphrase Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study explores four methods of generating paraphrases in Malayalam, utilizing resources available for English paraphrasing and pre-trained Neural Machine Translation (NMT) models. |
Christeena Varghese; Sergey Koshelev; Ivan P. Yamshchikov; | arxiv-cs.CL | 2024-01-31 |
397 | Massively Multilingual Text Translation For Low-Resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We attempt to leverage translation resources from rich-resource languages to efficiently produce best possible translation quality for well known texts, which are available in multiple languages, in a new, low-resource language. |
Zhong Zhou; | arxiv-cs.CL | 2024-01-29 |
398 | MultiMUC: Multilingual Template Filling on MUC-4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce MultiMUC, the first multilingual parallel corpus for template filling, comprising translations of the classic MUC-4 template filling benchmark into five languages: Arabic, Chinese, Farsi, Korean, and Russian. |
WILLIAM GANTT et. al. | arxiv-cs.CL | 2024-01-29 |
399 | Misgendering and Assuming Gender in Machine Translation When Working with Low-Resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This chapter focuses on gender-related errors in machine translation (MT) in the context of low-resource languages. |
Sourojit Ghosh; Srishti Chatterjee; | arxiv-cs.CL | 2024-01-23 |
400 | How Far Can 100 Samples Go? Unlocking Overall Zero-Shot Multilingual Translation Via Tiny Multi-Parallel Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that for an English-centric model, surprisingly large zero-shot improvements can be achieved by simply fine-tuning with a very small amount of multi-parallel data. |
Di Wu; Shaomu Tan; Yan Meng; David Stap; Christof Monz; | arxiv-cs.CL | 2024-01-22 |
401 | Gender Bias in Machine Translation and The Era of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This chapter examines the role of Machine Translation in perpetuating gender bias, highlighting the challenges posed by cross-linguistic settings and statistical dependencies. |
Eva Vanmassenhove; | arxiv-cs.CL | 2024-01-18 |
402 | Gradable ChatGPT Translation Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Accordingly, this paper proposes a generic taxonomy, which defines gradable translation prompts in terms of expression type, translation style, POS information and explicit statement, thus facilitating the construction of prompts endowed with distinct attributes tailored for various translation tasks. |
Hui Jiao; Bei Peng; Lu Zong; Xiaojun Zhang; Xinwei Li; | arxiv-cs.CL | 2024-01-18 |
403 | Machine Translation with Large Language Models: Prompt Engineering for Persian, English, and Russian Directions Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Generative large language models (LLMs) have demonstrated exceptional proficiency in various natural language processing (NLP) tasks, including machine translation, question … |
Nooshin Pourkamali; Shler Ebrahim Sharifi; | ArXiv | 2024-01-16 |
404 | A Novel Approach for Automatic Program Repair Using Round-Trip Translation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back. |
Fernando Vallecillos Ruiz; Anastasiia Grishina; Max Hort; Leon Moonen; | arxiv-cs.SE | 2024-01-15 |
405 | An Approach for Mistranslation Removal from Popular Dataset for Indic MT Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, the MT systems built using this dataset cannot perform to their usual potential. In this paper, we propose an algorithm to remove mistranslations from the training corpus and evaluate its performance and efficiency. |
Sudhansu Bala Das; Leo Raphael Rodrigues; Tapas Kumar Mishra; Bidyut Kr. Patra; | arxiv-cs.CL | 2024-01-12 |
406 | Adapting Large Language Models for Document-Level Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our results show that specialized models can sometimes surpass GPT-4 in translation performance but still face issues like off-target translation due to error propagation in decoding. We provide an in-depth analysis of these LLMs tailored for DocMT, examining translation errors, discourse phenomena, strategies for training and inference, the data efficiency of parallel documents, recent test set evaluations, and zero-shot crosslingual transfer. |
Minghao Wu; Thuy-Trang Vu; Lizhen Qu; George Foster; Gholamreza Haffari; | arxiv-cs.CL | 2024-01-12 |
407 | Machine Translation Models Are Zero-Shot Detectors of Translation Direction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore an unsupervised approach to translation direction detection based on the simple hypothesis that $p(\text{translation}|\text{original})>p(\text{original}|\text{translation})$, motivated by the well-known simplification effect in translationese or machine-translationese. |
Michelle Wastl; Jannis Vamvas; Rico Sennrich; | arxiv-cs.CL | 2024-01-12 |
408 | Lost in The Source Language: How Large Language Models Evaluate The Quality of Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study investigates how Large Language Models (LLMs) leverage source and reference data in machine translation evaluation task, aiming to better understand the mechanisms behind their remarkable performance in this task. |
XU HUANG et. al. | arxiv-cs.CL | 2024-01-12 |
409 | Towards Boosting Many-to-Many Multilingual Machine Translation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on boosting many-to-many multilingual translation of LLMs with an emphasis on zero-shot translation directions. |
Pengzhi Gao; Zhongjun He; Hua Wu; Haifeng Wang; | arxiv-cs.CL | 2024-01-11 |
410 | End to End Hindi to English Speech Conversion Using Bark, MBART and A Finetuned XLSR Wav2Vec2 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech has long been a barrier to effective communication and connection, persisting as a challenge in our increasingly interconnected world. This research paper introduces a … |
Aniket Tathe; Anand Kamble; Suyash Kumbharkar; Atharva Bhandare; Anirban C. Mitra; | ArXiv | 2024-01-11 |
411 | Addressing Data Scarcity Issue for English-Mizo Neural Machine Translation Using Data Augmentation and Language Model Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Low-resource language in machine translation systems poses multiple complications regarding accuracy in translation due to insufficient incorporation of linguistic information. … |
Vanlalmuansangi Khenglawt; Sahinur Rahman Laskar; Partha Pakray; Ajoy Kumar Khan; | J. Intell. Fuzzy Syst. | 2024-01-11 |
412 | Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article introduces contrastive alignment instructions (AlignInstruct) to address two challenges in machine translation (MT) on large language models (LLMs). |
Zhuoyuan Mao; Yen Yu; | arxiv-cs.CL | 2024-01-11 |
413 | MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image Translation By Prompts Redescription and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To make reconstruction explicit, we propose a prompt redescription strategy to realize a mirror effect between the source and reconstructed image in the diffusion model (MirrorDiffusion). |
Yupei Lin; Xiaoyu Xian; Yukai Shi; Liang Lin; | arxiv-cs.CV | 2024-01-06 |
414 | SCIR-MT’s Submission for WMT24 General Machine Translation Task Related Papers Related Patents Related Grants Related Venues Related Experts View |
Baohang Li; Zekai Ye; Yi-Chong Huang; Xiaocheng Feng; Bing Qin; | Conference on Machine Translation | 2024-01-01 |
415 | Domain Dynamics: Evaluating Large Language Models in English-Hindi Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in machine translation, leveraging extensive pre-training on vast amounts of data. However, this generalist … |
Soham Bhattacharjee; Baban Gain; Asif Ekbal; | Conference on Machine Translation | 2024-01-01 |
416 | Occiglot at WMT24: European Open-source Large Language Models Evaluated on Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This document describes the submission of the very first version of the Occiglot open-source large language model to the General MT Shared Task of the 9th Conference of Machine … |
ELEFTHERIOS AVRAMIDIS et. al. | Conference on Machine Translation | 2024-01-01 |
417 | Investigating The Linguistic Performance of Large Language Models in Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper summarizes the results of our test suite evaluation on 39 machine translation systems submitted at the Shared Task of the Ninth Conference of Machine Translation … |
SHUSHEN MANAKHIMOVA et. al. | Conference on Machine Translation | 2024-01-01 |
418 | CUNI at WMT24 General Translation Task: LLMs, (Q)LoRA, CPO and Model Merging Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents the contributions of Charles University teams to the WMT24 General Translation task (English to Czech, German and Russian, and Czech to Ukrainian) and the … |
MIROSLAV HRABAL et. al. | Conference on Machine Translation | 2024-01-01 |
419 | NTTSU at WMT2024 General Translation Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The NTTSU team’s submission leverages several large language models developed through a training procedure that includes continual pre-training and supervised fine-tuning. For … |
MINATO KONDO et. al. | Conference on Machine Translation | 2024-01-01 |
420 | TSU HITS’s Submissions to The WMT 2024 General Machine Translation Shared Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the TSU HITS team’s submission system for the WMT’24 general translation task. We focused on exploring the capabilities of discrete diffusion models for the … |
Vladimir Mynka; Nikolay Mikhaylovskiy; | Conference on Machine Translation | 2024-01-01 |
421 | Document-level Translation with LLM Reranking: Team-J at WMT 2024 General Translation Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We participated in the constrained track for English-Japanese and Japanese-Chinese translations at the WMT 2024 General Machine Translation Task. Our approach was to generate a … |
KEITO KUDO et. al. | Conference on Machine Translation | 2024-01-01 |
422 | From General LLM to Translation: How We Dramatically Improve Translation Quality Using Human Evaluation Data for LLM Finetuning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes Yandex submission to the WMT2024 General Translation Task. More specifically, we present a novel pipeline designed to build a strong paragraph-level … |
DENIS ELSHIN et. al. | Conference on Machine Translation | 2024-01-01 |
423 | UvA-MT’s Participation in The WMT24 General Translation Shared Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Fine-tuning Large Language Models (FT-LLMs) with parallel data has emerged as a promising paradigm in recent machine translation research. In this paper, we explore the … |
Shaomu Tan; David Stap; Seth Aycock; C. Monz; Di Wu; | Conference on Machine Translation | 2024-01-01 |
424 | AlphaIntellect at SemEval-2024 Task 6: Detection of Hallucinations in Generated Text Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: One major issue in natural language generation (NLG) models is detecting hallucinations (semantically inaccurate outputs). This study investigates a hallucination detection system … |
Sohan Choudhury; Priyam Saha; Subharthi Ray; Shankha S. Das; Dipankar Das; | International Workshop on Semantic Evaluation | 2024-01-01 |
425 | Recent Advances in Interactive Machine Translation With Large Language Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper explores the role of Large Language Models (LLMs) in revolutionizing interactive Machine Translation (MT), providing a comprehensive analysis across nine innovative … |
YANSHU WANG et. al. | IEEE Access | 2024-01-01 |
426 | How Far Can 100 Samples Go? Unlocking Zero-Shot Translation with Tiny Multi-Parallel Data Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Zero-shot translation aims to translate between language pairs not seen during training in Multilingual Machine Translation (MMT) and is widely considered an open problem. A … |
Di Wu; Shaomu Tan; Yan Meng; David Stap; C. Monz; | Annual Meeting of the Association for Computational … | 2024-01-01 |
427 | BLASER 2.0: A Metric for Evaluation and Quality Estimation of Massively Multilingual Speech and Text Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present BLASER 2.0, an automatic metric of machine translation quality which supports both speech and text modalities. Compared to its predecessor BLASER (Chen et al., 2023), … |
David Dale; M. Costa-jussà; | Conference on Empirical Methods in Natural Language … | 2024-01-01 |
428 | Hidetsune at SemEval-2024 Task 4: An Application of Machine Learning to Multilingual Propagandistic Memes Identification Using Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this system paper for SemEval-2024 Task4 subtask 2b, I present my approach to identifying propagandistic memes in multiple languages. I firstly establish a baseline for … |
Hidetsune Takahashi; | International Workshop on Semantic Evaluation | 2024-01-01 |
429 | Enabling Human-Centered Machine Translation Using Concept-Based Large Language Model Prompting and Translation Memory Related Papers Related Patents Related Grants Related Venues Related Experts View |
Ming Qian; Chuiqing Kong; | Interacción | 2024-01-01 |
430 | ASOS at NADI 2024 Shared Task: Bridging Dialectness Estimation and MSA Machine Translation for Arabic Language Enhancement Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This study undertakes a comprehensive investigation of transformer-based models to advance Arabic language processing, focusing on two pivotal aspects: the estimation of Arabic … |
Omer Nacar; Serry Sibaee; Abdullah Alharbi; L. Ghouti; Anis Koubaa; | ARABICNLP | 2024-01-01 |
431 | Enhancing Low-Resource NLP By Consistency Training With Data and Model Perturbations Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Natural language processing (NLP) has recently shown significant progress in rich-resource scenarios. However, it is much less effective for low-resource scenarios due to the … |
XIAOBO LIANG et. al. | IEEE/ACM Transactions on Audio, Speech, and Language … | 2024-01-01 |
432 | Improving BERTScore for Machine Translation Evaluation Through Contrastive Learning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: BERTScore is an automatic evaluation metric for machine translation. It calculates similarity scores between candidate and reference tokens through embeddings. The quality of … |
Gongbo Tang; Oreen Yousuf; Zeying Jin; | IEEE Access | 2024-01-01 |
433 | Naïve Bayes Approach for Word Sense Disambiguation System With A Focus on Parts-of-Speech Ambiguity Resolution Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Natural languages are written and spoken languages, and NLP (Natural Language Processing) is the ability of a computer program to recognize both written and spoken languages. Word … |
AJITH ABRAHAM et. al. | IEEE Access | 2024-01-01 |
434 | Improving LLM-based Machine Translation with Systematic Self-Correction IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Large Language Models (LLMs) have achieved impressive results in Machine Translation (MT). However, careful evaluations by human reveal that the translations produced by LLMs … |
ZHAOPENG FENG et. al. | ArXiv | 2024-01-01 |
435 | Findings of The WMT24 General Machine Translation Shared Task: The LLM Era Is Here But MT Is Not Solved Yet IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This overview paper presents the results of the General Machine Translation Task organised as part of the 2024 Conference on Machine Translation (WMT). In the general MT task, … |
TOM KOCMI et. al. | Conference on Machine Translation | 2024-01-01 |
436 | Findings of The WMT 2024 Biomedical Translation Shared Task: Test Sets on Abstract Level Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present the results of the ninth edition of the Biomedical Translation Task at WMT’24. We released test sets for six language pairs, namely, French, German, Italian, … |
MARIANA L. NEVES et. al. | Conference on Machine Translation | 2024-01-01 |
437 | IOL Research Machine Translation Systems for WMT24 General Machine Translation Shared Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper illustrates the submission system of the IOL Research team for the WMT24 General Machine Translation shared task. We submitted translations for all translation … |
Wenbo Zhang; | Conference on Machine Translation | 2024-01-01 |
438 | MSLC24 Submissions to The General Machine Translation Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The MSLC (Metric Score Landscape Challenge) submissions for English–German, English–Spanish, and Japanese–Chinese are constrained systems built using Transformer models for the … |
Samuel Larkin; Chi-kiu Lo; Rebecca Knowles; | Conference on Machine Translation | 2024-01-01 |
439 | Speech Recognition and Intelligent Translation Under Multimodal Human-computer Interaction System Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The traditional translation robot is limited to the translation of single-mode text images and text videos, which has the problem of low translation accuracy. Therefore, speech … |
Danhua Huang; Shuaiqiu Xiang; | J. Intell. Syst. | 2024-01-01 |
440 | The Bangla/Bengali Seed Dataset Submission to The WMT24 Open Language Data Initiative Shared Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We contribute a seed dataset for the Bangla/Bengali language as part of the WMT24 Open Language Data Initiative shared task. We validate the quality of the dataset against a mined … |
Firoz Ahmed; Nitin Venkateswaran; Sarah Moeller; | Conference on Machine Translation | 2024-01-01 |
441 | HW-TSC’s Participation in The WMT 2024 QEAPE Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The paper presents the submission by HW-TSC in the WMT 2024 Quality-informed Automatic Post Editing (QEAPE) shared task for the English-Hindi (En-Hi) and English-Tamil (En-Ta) … |
JIAWEI YU et. al. | Conference on Machine Translation | 2024-01-01 |
442 | FLORES+ Translation and Machine Translation Evaluation for The Erzya Language Related Papers Related Patents Related Grants Related Venues Related Experts View |
Isai Gordeev; Sergey Kuldin; David Dale; | Conference on Machine Translation | 2024-01-01 |
443 | A High-quality Seed Dataset for Italian Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the submission of a high-quality translation of the OLDI Seed dataset into Italian for the WMT 2024 Open Language Data Initiative shared task. The base of … |
Edoardo Ferrante; | Conference on Machine Translation | 2024-01-01 |
444 | Evaluating WMT 2024 Metrics Shared Task Submissions on AfriMTE (the African Challenge Set) Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The AfriMTE challenge set from WMT 2024 Metrics Shared Task aims to evaluate the capabilities of evaluation metrics for machine translation on low-resource African languages, … |
Jiayi Wang; D. Adelani; Pontus Stenetorp; | Conference on Machine Translation | 2024-01-01 |
445 | MSLC24: Further Challenges for Metrics on A Wide Landscape of Translation Quality Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this second edition of the Metric Score Landscape Challenge (MSLC), we examine how automatic metrics for machine translation perform on a wide variety of machine translation … |
Rebecca Knowles; Samuel Larkin; Chi-kiu Lo; | Conference on Machine Translation | 2024-01-01 |
446 | SRIB-NMT’s Submission to The Indic MT Shared Task in WMT 2024 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the context of the Indic Low Resource Machine Translation (MT) challenge at WMT-24 (Pakray et al., 2024), we participated in four language pairs: English-Assamese (en-as), … |
Pranamya Patil; Raghavendra Hr; Aditya Raghuwanshi; Kushal Verma; | Conference on Machine Translation | 2024-01-01 |
447 | DLUT-NLP Machine Translation Systems for WMT24 Low-Resource Indic Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View |
Chenfei Ju; Junpeng Liu; Kaiyu Huang; Degen Huang; | Conference on Machine Translation | 2024-01-01 |
448 | Enhancing Tuvan Language Resources Through The FLORES Dataset Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: FLORES is a benchmark dataset designed for evaluating machine translation systems, particularly for low-resource languages. This paper, conducted as a part of Open Language Data … |
Ali Kuzhuget; Airana Mongush; Nachyn-Enkhedorzhu Oorzhak; | Conference on Machine Translation | 2024-01-01 |
449 | MTNLP-IIITH: Machine Translation for Low-Resource Indic Languages Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Machine Translation for low-resource languages poses significant challenges, primarily due to the limited availability of data. The WMT24 Low-Resource Indic Neural Machine … |
Abhinav P M; Ketaki Shetye; Parameswari Krishnamurthy; | Conference on Machine Translation | 2024-01-01 |
450 | Findings of The WMT 2024 Shared Task on Non-Repetitive Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The repetition of words in an English sentence can create a monotonous or awkward impression. In such cases, repetition should be avoided appropriately. To evaluate the … |
Kazutaka Kinugawa; Hideya Mino; Isao Goto; Naoto Shirai; | Conference on Machine Translation | 2024-01-01 |
451 | Findings of WMT 2024’s MultiIndic22MT Shared Task for Machine Translation of 22 Indian Languages Related Papers Related Patents Related Grants Related Venues Related Experts View |
Raj Dabre; Anoop Kunchukuttan; | Conference on Machine Translation | 2024-01-01 |
452 | Findings of WMT 2024 Shared Task on Low-Resource Indic Languages Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents the results of the low-resource Indic language translation task, organized in conjunction with the Ninth Conference on Machine Translation (WMT) 2024. In this … |
PARTHA PAKRAY et. al. | Conference on Machine Translation | 2024-01-01 |
453 | Spanish Corpus and Provenance with Computer-Aided Translation for The WMT24 OLDI Shared Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents the SEED-CAT submission to the WMT24 Open Language Data Initiative shared task. We detail our data collection method, which involves a computer-aided … |
Jose Cols; | Conference on Machine Translation | 2024-01-01 |
454 | English-to-Low-Resource Translation: A Multimodal Approach for Hindi, Malayalam, Bengali, and Hausa Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Multimodal machine translation leverages multiple data modalities to enhance translation quality, particularly for low-resourced languages. This paper uses a multimodal model that … |
ALI HATAMI et. al. | Conference on Machine Translation | 2024-01-01 |
455 | DCU ADAPT at WMT24: English to Low-resource Multi-Modal Translation Task Related Papers Related Patents Related Grants Related Venues Related Experts View |
Sami Haq; Rudali Huidrom; Sheila Castilho; | Conference on Machine Translation | 2024-01-01 |
456 | Tulu Language Text Recognition and Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Language is a primary means of communication, but it is not the only means; knowing a language does, however, assist speed up the process. Many distinct languages are spoken … |
Prathwini; Anisha P. Rodrigues; P. Vijaya; Roshan Fernandes; | IEEE Access | 2024-01-01 |
457 | An Intelligent Error Detection Model for Machine Translation Using Composite Neural Network-Based Semantic Perception Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Although machine translation has received great progress in recent years, machine translation results usually existed some errors due to the complex relationship between sentence … |
Yaoxi Wu; Qiao Liang; | IEEE Access | 2024-01-01 |
458 | SYSTRAN @ WMT24 Non-Repetitive Translation Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Many contemporary NLP systems rely on neural decoders for text generation, which demonstrate an impressive ability to generate text approaching human fluency levels. However, in … |
Marko Avila; Josep Crego; | Conference on Machine Translation | 2024-01-01 |
459 | Mediapi-RGB: Enabling Technological Breakthroughs in French Sign Language (LSF) Research Through An Extensive Video-Text Corpus Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: We introduce Mediapi-RGB, a new dataset of French Sign Language (LSF) along with the first LSF-to-French machine translation model. With 86 hours of video, it is the largest LSF … |
YANIS OUAKRIM et. al. | VISIGRAPP : VISAPP | 2024-01-01 |
460 | Research on Automatic Identification of Machine English Translation Errors Based on Improved GLR Algorithm Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Machine translation is a powerful tool for overcoming linguistic obstacles, but it often introduces errors that lower the overall translation quality. This research project aims … |
Guanghuan Li; | Informatica (Slovenia) | 2024-01-01 |
461 | WMT24 System Description for The MultiIndic22MT Shared Task on Manipuri Language Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents a Transformer-based Neural Machine Translation (NMT) system developed by the Centre for Natural Language Processing and the Department of Computer Science and … |
Ningthoujam Justwant Singh; Kshetrimayum Boynao Singh; Avichandra Singh Ningthoujam; Sanjita Phijam; Thoudam Doren Singh; | Conference on Machine Translation | 2024-01-01 |
462 | System Description of BV-SLP for Sindhi-English Machine Translation in MultiIndic22MT 2024 Shared Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents our machine translation system that was developed for the WAT2024 MultiIndic MT shared task. We built our system for the Sindhi-English language pair. We … |
Nisheeth Joshi; Pragya Katyayan; Palak Arora; Bharti Nathani; | Conference on Machine Translation | 2024-01-01 |
463 | NovelTrans: System for WMT24 Discourse-Level Literary Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes our submission system, NovelTrans, from NLP2CT and DeepTranx for the WMT24 Discourse-Level Literary Translation Task in Chinese-English, Chinese-German, and … |
Yuchen Liu; Yutong Yao; Runzhe Zhan; Yuchu Lin; Derek F. Wong; | Conference on Machine Translation | 2024-01-01 |
464 | Context-Aware Linguistic Steganography Model Based on Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Linguistic steganography based on text generation is a hot topic in the field of text information hiding. Previous studies have managed to improve the syntactic quality of … |
CHANGHAO DING et. al. | IEEE/ACM Transactions on Audio, Speech, and Language … | 2024-01-01 |
465 | Benchmarking and Improving Long-Text Translation with Large Language Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recent studies have illuminated the promising capabilities of large language models (LLMs) in handling long texts. However, their performance in machine translation (MT) of long … |
LONGYUE WANG et. al. | Annual Meeting of the Association for Computational … | 2024-01-01 |
466 | HalOmi: A Manually Annotated Benchmark for Multilingual Hallucination and Omission Detection in Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we release an annotated dataset for the hallucination and omission phenomena covering 18 translation directions with varying resource levels and scripts. |
DAVID DALE et. al. | emnlp | 2023-12-22 |
467 | Program Translation Via Code Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose a novel model called Code Distillation (CoDist) whereby we capture the semantic and structural equivalence of code in a language agnostic intermediate representation. |
YUFAN HUANG et. al. | emnlp | 2023-12-22 |
468 | MMNMT: Modularizing Multilingual Neural Machine Translation with Flexibly Assembled MoE and Dense Blocks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a modularized MNMT framework that is able to flexibly assemble dense and MoE-based sparse modules to achieve the best of both worlds. |
SHANGJIE LI et. al. | emnlp | 2023-12-22 |
469 | Towards A Better Understanding of Variations in Zero-Shot Neural Machine Translation Performance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through systematic experimentation, spanning 1,560 language directions across 40 languages, we identify three key factors contributing to high variations in ZS NMT performance: 1) target-side translation quality, 2) vocabulary overlap, and 3) linguistic properties. |
Shaomu Tan; Christof Monz; | emnlp | 2023-12-22 |
470 | MT2: Towards A Multi-Task Machine Translation Model with Translation-Specific In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most of the previous work uses separate models or methods to solve these tasks, which is not conducive to knowledge transfer of different tasks and increases the complexity of system construction. In this work, we explore the potential of pre-trained language model in machine translation tasks and propose a Multi-Task Machine Translation (MT2) model to integrate these translation tasks. |
CHUNYOU LI et. al. | emnlp | 2023-12-22 |
471 | Revisiting Machine Translation for Cross-lingual Classification IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that, by using a stronger MT system and mitigating the mismatch between training on original text and running inference on machine translated text, translate-test can do substantially better than previously assumed. |
Mikel Artetxe; Vedanuj Goswami; Shruti Bhosale; Angela Fan; Luke Zettlemoyer; | emnlp | 2023-12-22 |
472 | Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with The GeNTE Corpus IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on GeNTE, we then overview existing reference-based evaluation approaches, highlight their limits, and propose a reference-free method more suitable to assess gender-neutral translation. |
Andrea Piergentili; Beatrice Savoldi; Dennis Fucci; Matteo Negri; Luisa Bentivogli; | emnlp | 2023-12-22 |
473 | A Tale of Pronouns: Interpretability Informs Gender Bias Mitigation for Fairer Instruction-Tuned Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In MT, this might lead to misgendered translations, resulting, among other harms, in the perpetuation of stereotypes and prejudices. In this work, we address this gap by investigating whether and to what extent such models exhibit gender bias in machine translation and how we can mitigate it. |
Giuseppe Attanasio; Flor Plaza del Arco; Debora Nozza; Anne Lauscher; | emnlp | 2023-12-22 |
474 | Learn and Consolidate: Continual Adaptation for Zero-Shot and Multilingual Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a two-stage approach that encourages original models to acquire language-agnostic multilingual representations from new data, and preserves the model architecture without introducing parameters. |
Kaiyu Huang; Peng Li; Junpeng Liu; Maosong Sun; Yang Liu; | emnlp | 2023-12-22 |
475 | Crossing The Threshold: Idiomatic Machine Translation Through Retrieval Augmentation and Loss Weighting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve translation of natural idioms, we introduce two straightforward yet effective techniques: the strategic upweighting of training loss on potentially idiomatic sentences, and using retrieval-augmented models. |
Emmy Liu; Aditi Chaudhary; Graham Neubig; | emnlp | 2023-12-22 |
476 | DecoMT: Decomposed Prompting for Machine Translation Between Related Languages Using Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce DecoMT, a novel approach of few-shot prompting that decomposes the translation process into a sequence of word chunk translations. |
Ratish Puduppully; Anoop Kunchukuttan; Raj Dabre; Ai Ti Aw; Nancy Chen; | emnlp | 2023-12-22 |
477 | On The Use of Metaphor Translation in Psychiatry Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Now, metaphor has been shown to be paramount in both identifying individuals struggling with mental problems and helping those individuals understand and communicate their experiences. Therefore, this paper aims to survey the potential of Machine Translation for providing equitable psychiatric healthcare and highlights the need for further research on the transferability of existing machine and metaphor translation research in the domain of psychiatry. |
Lois Wong; | arxiv-cs.CL | 2023-12-22 |
478 | PromptST: Abstract Prompt Learning for End-to-End Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take the first step toward understanding the fusion of speech and text features in S2T model. |
TENGFEI YU et. al. | emnlp | 2023-12-22 |
479 | Increasing Coverage and Precision of Textual Information in Multilingual Knowledge Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, when it comes to non-English languages, the quantity and quality of textual information are comparatively scarce. To address this issue, we introduce the novel task of automatic Knowledge Graph Enhancement (KGE) and perform a thorough investigation on bridging the gap in both the quantity and quality of textual information between English and non-English languages. |
SIMONE CONIA et. al. | emnlp | 2023-12-22 |
480 | Challenges in Context-Aware Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate and present several core challenges that impede progress within the field, relating to discourse phenomena, context usage, model architectures, and document-level evaluation. |
Linghao Jin; Jacqueline He; Jonathan May; Xuezhe Ma; | emnlp | 2023-12-22 |
481 | Video-Helpful Multimodal Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce EVA (Extensive training set and Video-helpful evaluation set for Ambiguous subtitles translation), an MMT dataset containing 852k Japanese-English parallel subtitle pairs, 520k Chinese-English parallel subtitle pairs, and corresponding video clips collected from movies and TV episodes. |
Yihang Li; Shuichiro Shimizu; Chenhui Chu; Sadao Kurohashi; Wei Li; | emnlp | 2023-12-22 |
482 | CLAD-ST: Contrastive Learning with Adversarial Data for Robust Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address this robustness problem in downstream MT models by forcing the MT encoder to bring the representations of a noisy input closer to its clean version in the semantic space. This is achieved by introducing a contrastive learning method that leverages adversarial examples in the form of ASR outputs paired with their corresponding human transcripts to optimize the network parameters. |
Sathish Indurthi; Shamil Chollampatt; Ravi Agrawal; Marco Turchi; | emnlp | 2023-12-22 |
483 | An Empirical Study of Translation Hypothesis Ensembling with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate how hypothesis ensembling can improve the quality of the generated text for the specific problem of LLM-based machine translation. |
António Farinhas; José de Souza; Andre Martins; | emnlp | 2023-12-22 |
484 | PROSE: A Pronoun Omission Solution for Chinese-English Spoken Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate the negative impact introduced by pro-drop, we propose Mention-Aware Semantic Augmentation, a novel approach that leverages the semantic embedding of dropped pronouns to augment training pairs. |
Ke Wang; Xiutian Zhao; Yanghui Li; Wei Peng; | emnlp | 2023-12-22 |
485 | Document-Level Machine Translation with Large Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The study focuses on three aspects: 1) Effects of Context-Aware Prompts, where we investigate the impact of different prompts on document-level translation quality and discourse phenomena; 2) Comparison of Translation Models, where we compare the translation performance of ChatGPT with commercial MT systems and advanced document-level MT methods; 3) Analysis of Discourse Modelling Abilities, where we further probe discourse knowledge encoded in LLMs and shed light on impacts of training techniques on discourse modeling. |
LONGYUE WANG et. al. | emnlp | 2023-12-22 |
486 | Exploring Discourse Structure in Document-level Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a more sound paragraph-to-paragraph translation mode and explore whether discourse structure can improve DocMT. |
Xinyu Hu; Xiaojun Wan; | emnlp | 2023-12-22 |
487 | Contextual Code Switching for Machine Translation Using Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an extensive study on the code switching task specifically for the machine translation task comparing multiple LLMs. |
Arshad Kaji; Manan Shah; | arxiv-cs.CL | 2023-12-20 |
488 | An Empirical Study of Unsupervised Neural Machine Translation: Analyzing NMT Output, Model’s Behavior and Sentences’ Contribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We focus on three very diverse languages, French, Gujarati, and Kazakh, and train bilingual NMT models, to and from English, with various levels of supervision, in high- and low-resource setups, measure quality of the NMT output and compare the generated sequences’ word order and semantic similarity to source and reference sentences. |
Isidora Chara Tourni; Derry Wijaya; | arxiv-cs.CL | 2023-12-19 |
489 | Fine-tuning Large Language Models for Adaptive Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents the outcomes of fine-tuning Mistral 7B, a general-purpose large language model (LLM), for adaptive machine translation (MT). |
Yasmin Moslem; Rejwanul Haque; Andy Way; | arxiv-cs.CL | 2023-12-19 |
490 | Overview of MTIL Track at FIRE 2023: Machine Translation for Indian Languages Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The objective of the MTIL track in FIRE 2023 was to encourage the development of Indian Language to Indian Language (IL-IL) Neural Machine Translation models. The languages … |
SURUPENDU GANGOPADHYAY et. al. | Proceedings of the 15th Annual Meeting of the Forum for … | 2023-12-15 |
491 | Neural Machine Translation of Clinical Text: An Empirical Investigation Into Multilingual Pre-Trained Language Models and Transfer-Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We conduct investigations on clinical text machine translation by examining multilingual neural network models using deep learning such as Transformer based structures. |
LIFENG HAN et. al. | arxiv-cs.CL | 2023-12-12 |
492 | Performance Evaluation of Popular Deep Neural Networks for Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The field of Neural Machine Translation (NMT) has shown impressive performance for quick and easy communication in various languages spoken all over the world. NMT helps us by … |
MUHAMMAD NAEEM et. al. | 2023 International Conference on Frontiers of Information … | 2023-12-11 |
493 | First Attempt at Building Parallel Corpora for Machine Translation of Northeast India’s Very Low-Resource Languages Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents the creation of initial bilingual corpora for thirteen very low-resource languages of India, all from Northeast India. It also presents the results of initial … |
A. Tonja; Melkamu Abay Mersha; Ananya Kalita; Olga Kolesnikova; Jugal Kalita; | ArXiv | 2023-12-08 |
494 | Design of Automatic Translation System for English for Special Purpose in Agriculture Based on Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Agricultural terms have some unique characteristics, which make them need special treatment in machine translation. Agriculture is a highly specialized field, with a large number … |
Meilin Wang; | Proceedings of the 3rd International Conference on … | 2023-12-08 |
495 | Converting Epics/Stories Into Pseudocode Using Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With this research paper, we aim to present a methodology to generate pseudocode from a given agile user story of small functionalities so as to reduce the overall time spent on the industrial project. |
Gaurav Kolhatkar; Akshit Madan; Nidhi Kowtal; Satyajit Roy; Sheetal Sonawane; | arxiv-cs.CL | 2023-12-08 |
496 | First Attempt at Building Parallel Corpora for Machine Translation of Northeast India’s Very Low-Resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents the creation of initial bilingual corpora for thirteen very low-resource languages of India, all from Northeast India. |
Atnafu Lambebo Tonja; Melkamu Mersha; Ananya Kalita; Olga Kolesnikova; Jugal Kalita; | arxiv-cs.CL | 2023-12-07 |
497 | Making Translators Privacy-aware on The User’s Side Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose PRISM to enable users of machine translation systems to preserve the privacy of data on their own initiative. |
Ryoma Sato; | arxiv-cs.CR | 2023-12-07 |
498 | Improving Neural Machine Translation By Multi-Knowledge Integration with Prompting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on how to integrate multi-knowledge, multiple types of knowledge, into NMT models to enhance the performance with prompting. |
Ke Wang; Jun Xie; Yuqi Zhang; Yu Zhao; | arxiv-cs.CL | 2023-12-07 |
499 | Efficient Monotonic Multihead Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Efficient Monotonic Multihead Attention (EMMA), a state-of-the-art simultaneous translation model with numerically-stable and unbiased monotonic alignment estimation. |
Xutai Ma; Anna Sun; Siqi Ouyang; Hirofumi Inaguma; Paden Tomasello; | arxiv-cs.CL | 2023-12-07 |
500 | Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address key challenges facing LLMs fine-tuned for SimulMT, validate classical SimulMT concepts and practices in the context of LLMs, explore adapting LLMs that are fine-tuned for NMT to the task of SimulMT, and introduce Simul-LLM, the first open-source fine-tuning and evaluation pipeline development framework for LLMs focused on SimulMT. |
Victor Agostinelli; Max Wild; Matthew Raffel; Kazi Ahmed Asif Fuad; Lizhong Chen; | arxiv-cs.CL | 2023-12-07 |
501 | English-Arabic Text Translation and Abstractive Summarization Using Transformers Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The vast growth of online and offline data has revolutionized how we gather, evaluate, and understand information. Comprehending lengthy text documents and extracting crucial … |
Heidi Ahmed Holiel; Nancy Mohamed; Arwa Ahmed; Walaa Medhat; | 2023 20th ACS/IEEE International Conference on Computer … | 2023-12-04 |
502 | End-to-End Speech-to-Text Translation: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, researchers have been exploring end-to-end (E2E) models for ST translation. |
Nivedita Sethiya; Chandresh Kumar Maurya; | arxiv-cs.CL | 2023-12-02 |
503 | Beyond Lexical Consistency: Preserving Semantic Consistency for Program Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Program translation aims to convert the input programs from one programming language to another. Automatic program translation is a prized target of software engineering research, … |
Yali Du; Yiwei Ma; Zheng Xie; Ming Li; | 2023 IEEE International Conference on Data Mining (ICDM) | 2023-12-01 |
504 | Quick Back-Translation for Unsupervised Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a two-for-one improvement to Transformer back-translation: Quick Back-Translation (QBT). |
Benjamin Brimacombe; Jiawei Zhou; | arxiv-cs.CL | 2023-12-01 |
505 | Women Are Beautiful, Men Are Leaders: Gender Stereotypes in Machine Translation and Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present GEST — a new manually created dataset designed to measure gender-stereotypical reasoning in language models and machine translation systems. |
Matúš Pikuliak; Andrea Hrckova; Stefan Oresko; Marián Šimko; | arxiv-cs.CL | 2023-11-30 |
506 | Relevance-guided Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an explainability-based training approach for NMT, applied in Unsupervised and Supervised model training, for translation of three languages of varying resources, French, Gujarati, Kazakh, to and from English. |
Isidora Chara Tourni; Derry Wijaya; | arxiv-cs.CL | 2023-11-30 |
507 | INarIG: Iterative Non-autoregressive Instruct Generation Model For Word-Level Auto Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the INarIG (Iterative Non-autoregressive Instruct Generation) model, which constructs the human typed sequence into Instruction Unit and employs iterative decoding with subwords to fully utilize input information given in the task. |
HENGCHAO SHANG et. al. | arxiv-cs.CL | 2023-11-29 |
508 | AdaptMLLM: Fine-Tuning Multilingual Language Models on Low-Resource Languages with Integrated LLM Playgrounds IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: The advent of Multilingual Language Models (MLLMs) and Large Language Models (LLMs) has spawned innovation in many areas of natural language processing. Despite the exciting … |
Séamus Lankford; Haithem Afli; Andy Way; | Inf. | 2023-11-29 |
509 | A Benchmark for Evaluating Machine Translation Metrics on Dialects Without Standard Orthography Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we evaluate how robust metrics are to non-standardized dialects, i.e. spelling differences in language varieties that do not have a standard orthography. |
Noëmi Aepli; Chantal Amrhein; Florian Schottmann; Rico Sennrich; | arxiv-cs.CL | 2023-11-28 |
510 | Reducing Gender Bias in Machine Translation Through Counterfactual Data Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We also propose a novel domain-adaptation technique that leverages in-domain data created with the counterfactual data generation techniques proposed by Zmigrod et al. (2019) to further improve accuracy on the WinoMT challenge test set without significant loss in translation quality. We show its effectiveness in NMT systems from English into three morphologically rich languages French, Spanish, and Italian. |
Ranjita Naik; Spencer Rarrick; Vishal Chowdhary; | arxiv-cs.CL | 2023-11-27 |
511 | Increasing Coverage and Precision of Textual Information in Multilingual Knowledge Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, when it comes to non-English languages, the quantity and quality of textual information are comparatively scarce. To address this issue, we introduce the novel task of automatic Knowledge Graph Enhancement (KGE) and perform a thorough investigation on bridging the gap in both the quantity and quality of textual information between English and non-English languages. |
SIMONE CONIA et. al. | arxiv-cs.AI | 2023-11-27 |
512 | Improving Word Sense Disambiguation in Neural Machine Translation with Salient Document Context Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a simple and scalable approach to resolve translation ambiguity by incorporating a small amount of extra-sentential context in neural MT. Our approach requires no sense annotation and no change to standard model architectures. |
Elijah Rippeth; Marine Carpuat; Kevin Duh; Matt Post; | arxiv-cs.CL | 2023-11-26 |
513 | Machine Translation to Control Formality Features in The Target Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When a language translation technique is used to translate from a source language that does not pertain the formality (e.g. English) to a target language that does, there is a missing information on formality that could be a challenge in producing an accurate outcome. This research explores how this issue should be resolved when machine learning methods are used to translate from English to languages with formality, using Hindi as the example data. |
Harshita Tyagi; Prashasta Jung; Hyowon Lee; | arxiv-cs.CL | 2023-11-22 |
514 | Context-aware Neural Machine Translation for English-Japanese Business Scene Dialogues Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore how context-awareness can improve the performance of the current Neural Machine Translation (NMT) models for English-Japanese business dialogues translation, and what kind of context provides meaningful information to improve translation. |
Sumire Honda; Patrick Fernandes; Chrysoula Zerva; | arxiv-cs.CL | 2023-11-20 |
515 | Multi-Task Self-Supervised Learning Based Tibetan-Chinese Speech-to-Speech Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech-to-speech translation tasks are commonly tackled by using a three-level cascade system which comprises of speech recognition, machine translation, and speech synthesis. … |
Rouhe Liu; Yue Zhao; Xiaona Xu; | 2023 International Conference on Asian Language Processing … | 2023-11-18 |
516 | Vashantor: A Large-scale Multilingual Benchmark Dataset for Automated Translation of Bangla Regional Dialects to Bangla Language Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite extensive study into translating Bangla to English, English to Bangla, and Banglish to Bangla in the past, there has been a noticeable gap in translating Bangla regional dialects into standard Bangla. In this study, we set out to fill this gap by creating a collection of 32,500 sentences, encompassing Bangla, Banglish, and English, representing five regional Bangla dialects. |
FATEMA TUJ JOHORA FARIA et. al. | arxiv-cs.CL | 2023-11-18 |
517 | Rethinking The Exploitation of Monolingual Data for Low-Resource Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The utilization of monolingual data has been shown to be a promising strategy for addressing low-resource machine translation problems. Previous studies have demonstrated the … |
JIANHUI PANG et. al. | Computational Linguistics | 2023-11-16 |
518 | SentAlign: Accurate and Scalable Sentence Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present SentAlign, an accurate sentence alignment tool designed to handle very large parallel document pairs. |
Steinþór Steingrímsson; Hrafn Loftsson; Andy Way; | arxiv-cs.CL | 2023-11-15 |
519 | English–Vietnamese Machine Translation Using Deep Learning for Chatbot Applications Related Papers Related Patents Related Grants Related Venues Related Experts View |
Sakya Tuan; P. Meesad; Ha Huy Cuong Nguyen; | SN Computer Science | 2023-11-15 |
520 | Assessing Translation Capabilities of Large Language Models Involving English and Indian Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, our aim is to explore the multilingual capabilities of large language models by using machine translation as a task involving English and 22 Indian languages. |
VANDAN MUJADIA et. al. | arxiv-cs.CL | 2023-11-15 |
521 | Evaluating Gender Bias in The Translation of Gender-Neutral Languages Into English Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite numerous studies into gender bias in translations from gender-neutral languages such as Turkish into more strongly gendered languages like English, there are no benchmarks for evaluating this phenomenon or for assessing mitigation strategies. To address this gap, we introduce GATE X-E, an extension to the GATE (Rarrick et al., 2023) corpus, that consists of human translations from Turkish, Hungarian, Finnish, and Persian into English. |
Spencer Rarrick; Ranjita Naik; Sundar Poudel; Vishal Chowdhary; | arxiv-cs.CL | 2023-11-15 |
522 | Extending Multilingual Machine Translation Through Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We aim to extend large-scale MNMT models to a new language, allowing for translation between the newly added and all of the already supported languages in a challenging scenario: using only a parallel corpus between the new language and English. |
Wen Lai; Viktor Hangya; Alexander Fraser; | arxiv-cs.CL | 2023-11-14 |
523 | On-the-Fly Fusion of Large Language Models and Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the on-the-fly ensembling of a machine translation model with an LLM, prompted on the same task and input. |
Hieu Hoang; Huda Khayrallah; Marcin Junczys-Dowmunt; | arxiv-cs.CL | 2023-11-14 |
524 | Investigating Multi-Pivot Ensembling with Massively Multilingual Machine Translation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Pivoting via high-resource languages remains a strong strategy for low-resource directions, and in this paper we revisit ways of pivoting through multiple languages. |
Alireza Mohammadshahi; Jannis Vamvas; Rico Sennrich; | arxiv-cs.CL | 2023-11-13 |
525 | Added Toxicity Mitigation at Inference Time for Multimodal and Massively Multilingual Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Added toxicity in the context of translation refers to the fact of producing a translation output with more toxicity than there exists in the input. In this paper, we present MinTox which is a novel pipeline to identify added toxicity and mitigate this issue which works at inference time. |
Marta R. Costa-jussà; David Dale; Maha Elbayad; Bokai Yu; | arxiv-cs.CL | 2023-11-11 |
526 | Gender Inflected or Bias Inflicted: On Using Grammatical Gender Cues for Bias Evaluation in Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To demonstrate our point, in this work, we use Hindi as the source language and construct two sets of gender-specific sentences: OTSC-Hindi and WinoMT-Hindi that we use to evaluate different Hindi-English (HI-EN) NMT systems automatically for gender bias. |
Pushpdeep Singh; | arxiv-cs.CL | 2023-11-07 |
527 | Findings of The WMT 2023 Shared Task on Discourse-Level Literary Translation: A Fresh Orb in The Cosmos of LLMs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We employ both automatic and human evaluations to measure the performance of the submitted systems. |
LONGYUE WANG et. al. | arxiv-cs.CL | 2023-11-06 |
528 | CBSiMT: Mitigating Hallucination in Simultaneous Machine Translation with Weighted Prefix-to-Prefix Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a Confidence-Based Simultaneous Machine Translation (CBSiMT) framework, which uses model confidence to perceive hallucination tokens and mitigates their negative impact with weighted prefix-to-prefix training. |
MENGGE LIU et. al. | arxiv-cs.CL | 2023-11-06 |
529 | Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To create the parallel corpora, we propose a dynamic programming based sentence alignment algorithm which leverages the cosine similarity of machine-translated sentences. |
Haiyue Song; Raj Dabre; Chenhui Chu; Atsushi Fujita; Sadao Kurohashi; | arxiv-cs.CL | 2023-11-06 |
530 | Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Neural machine translation (NMT) for low-resource local languages in Indonesia faces significant challenges, including the need for a representative benchmark and limited data availability. This work addresses these challenges by comprehensively analyzing training NMT systems for four low-resource local languages in Indonesia: Javanese, Sundanese, Minangkabau, and Balinese. |
Lucky Susanto; Ryandito Diandaru; Adila Krisnadhi; Ayu Purwarianti; Derry Wijaya; | arxiv-cs.CL | 2023-11-02 |
531 | Parts of Speech Tagged Phrase-Based Statistical Machine Translation System for English → Mizo Language Related Papers Related Patents Related Grants Related Venues Related Experts View |
Chanambam Sveta Devi; Amit Kumar Roy; Bipul Syam Purkayastha; | SN Computer Science | 2023-11-01 |
532 | Is Robustness Transferable Across Languages in Multilingual Neural Machine Translation? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the transferability of robustness across different languages in multilingual neural machine translation. |
Leiyu Pan; Deyi Xiong; | arxiv-cs.AI | 2023-10-31 |
533 | Towards A Deep Understanding of Multilingual End-to-End Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we employ Singular Value Canonical Correlation Analysis (SVCCA) to analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages. |
Haoran Sun; Xiaohu Zhao; Yikun Lei; Shaolin Zhu; Deyi Xiong; | arxiv-cs.CL | 2023-10-31 |
534 | Ge’ez-English Bi-Directional Neural Machine Translation Using Transformer Abstract: Machine translation is the technique of translating texts from one language to another without human intervention using artificial intelligence. Neural Machine Translation (NMT) … |
Sefineh Getachew; Yirga Yayeh; | 2023 International Conference on Information and … | 2023-10-26 |
535 | Cultural Adaptation of Recipes Highlight: We introduce a new task involving the translation and cultural adaptation of recipes between Chinese and English-speaking cuisines. |
YONG CAO et. al. | arxiv-cs.CL | 2023-10-26 |
536 | DISCO: A Large Scale Human Annotated Corpus for Disfluency Correction in Indo-European Languages Highlight: Towards the goal of multilingual disfluency correction, we present a high-quality human-annotated DC corpus covering four important Indo-European languages: English, Hindi, German and French. |
Vineet Bhat; Preethi Jyothi; Pushpak Bhattacharyya; | arxiv-cs.CL | 2023-10-25 |
537 | CUNI Submission to MRL 2023 Shared Task on Multi-lingual Multi-task Information Retrieval Highlight: To keep the inferred tags on the correct positions in the original language, we propose a method based on scoring the candidate positions using a label-sensitive translation model. |
Jindřich Helcl; Jindřich Libovický; | arxiv-cs.CL | 2023-10-25 |
538 | ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation IF:3 Highlight: We present ComSL, a speech-language model built atop a composite architecture of public pre-trained speech-only and language-only models and optimized data-efficiently for spoken language tasks. |
CHENYANG LE et. al. | nips | 2023-10-24 |
539 | Machine Translation for Nko: Tools, Corpora and Baseline Results Highlight: Currently, there is no usable machine translation system for Nko, a language spoken by tens of millions of people across multiple West African countries, which holds significant cultural and educational value. To address this issue, we present a set of tools, resources, and baseline results aimed towards the development of usable machine translation systems for Nko and other languages that do not currently have sufficiently large parallel text corpora available. |
MOUSSA KOULAKO BALA DOUMBOUYA et. al. | arxiv-cs.CL | 2023-10-24 |
540 | Non-autoregressive Machine Translation with Probabilistic Context-free Grammar Highlight: However, conventional NAT models suffer from limited-expression power and performance degradation compared to autoregressive (AT) models due to the assumption of conditional independence among target tokens. To address these limitations, we propose a novel approach called PCFG-NAT, which leverages a specially designed Probabilistic Context-Free Grammar (PCFG) to enhance the ability of NAT models to capture complex dependencies among output tokens. |
SHANGTONG GUI et. al. | nips | 2023-10-24 |
541 | Data Augmentation Techniques for Machine Translation of Code-Switched Texts: A Comparative Study Highlight: In this work, we compare three popular approaches: lexical replacements, linguistic theories, and back-translation (BT), in the context of Egyptian Arabic-English CSW. |
Injy Hamed; Nizar Habash; Ngoc Thang Vu; | arxiv-cs.CL | 2023-10-23 |
542 | Domain Terminology Integration Into Machine Translation: Leveraging Large Language Models IF:3 Highlight: This paper discusses the methods that we used for our submissions to the WMT 2023 Terminology Shared Task for German-to-English (DE-EN), English-to-Czech (EN-CS), and Chinese-to-English (ZH-EN) language pairs. |
D. Kelleher; | arxiv-cs.CL | 2023-10-22 |
543 | Boosting Unsupervised Machine Translation with Pseudo-Parallel Data Highlight: We propose a training strategy that relies on pseudo-parallel sentence pairs mined from monolingual corpora in addition to synthetic sentence pairs back-translated from monolingual corpora. |
Ivana Kvapilíková; Ondřej Bojar; | arxiv-cs.CL | 2023-10-22 |
544 | Evaluating and Optimizing The Effectiveness of Neural Machine Translation in Supporting Code Retrieval Models: A Study on The CAT Benchmark Highlight: In this work, we analyze the performance of NMT in natural language-to-code translation in the newly curated CAT benchmark[31] that includes the optimized versions of three Java datasets TLCodeSum, CodeSearchNet, Funcom, and a Python dataset PCSD. |
Hung Phan; Ali Jannesari; | cikm | 2023-10-21 |
545 | Code-Switching with Word Senses for Pretraining in Neural Machine Translation Highlight: In this work, we introduce Word Sense Pretraining for Neural Machine Translation (WSP-NMT) – an end-to-end approach for pretraining multilingual NMT models leveraging word sense-specific information from Knowledge Bases. |
Vivek Iyer; Edoardo Barba; Alexandra Birch; Jeff Z. Pan; Roberto Navigli; | arxiv-cs.CL | 2023-10-21 |
546 | Translation Performance from The User’s Perspective of Large Language Models and Neural Machine Translation Systems IF:3 Abstract: The rapid global expansion of ChatGPT, which plays a crucial role in interactive knowledge sharing and translation, underscores the importance of comparative performance … |
Jungha Son; Boyoung Kim; | Inf. | 2023-10-19 |
547 | A Tale of Pronouns: Interpretability Informs Gender Bias Mitigation for Fairer Instruction-Tuned Machine Translation IF:3 Highlight: In MT, this might lead to misgendered translations, resulting, among other harms, in the perpetuation of stereotypes and prejudices. In this work, we address this gap by investigating whether and to what extent such models exhibit gender bias in machine translation and how we can mitigate it. |
Giuseppe Attanasio; Flor Miriam Plaza-del-Arco; Debora Nozza; Anne Lauscher; | arxiv-cs.CL | 2023-10-18 |
548 | knn-seq: Efficient, Extensible kNN-MT Framework Highlight: In this paper, we present an efficient and extensible kNN-MT framework, knn-seq, for researchers and developers that is carefully designed to run efficiently, even with a billion-scale large datastore. |
HIROYUKI DEGUCHI et. al. | arxiv-cs.CL | 2023-10-18 |
549 | Direct Neural Machine Translation with Task-level Mixture of Experts Models Highlight: In this work, we examine Task-level MoE’s applicability in direct NMT and propose a series of high-performing training and evaluation configurations, through which Task-level MoE-based direct NMT systems outperform bilingual and pivot-based models for a large number of low and high-resource direct pairs, and translation directions. |
Isidora Chara Tourni; Subhajit Naskar; | arxiv-cs.CL | 2023-10-18 |
550 | An Empirical Study of Translation Hypothesis Ensembling with Large Language Models Highlight: In this paper, we investigate how hypothesis ensembling can improve the quality of the generated text for the specific problem of LLM-based machine translation. |
António Farinhas; José G. C. de Souza; André F. T. Martins; | arxiv-cs.CL | 2023-10-17 |
551 | Long-form Simultaneous Speech Translation: Thesis Proposal Highlight: This thesis proposal addresses end-to-end simultaneous speech translation, particularly in the long-form setting, i.e., without pre-segmentation. We present a survey of the latest advancements in E2E SST, assess the primary obstacles in SST and its relevance to long-form scenarios, and suggest approaches to tackle these challenges. |
Peter Polák; | arxiv-cs.CL | 2023-10-17 |
552 | Exploring Automatic Evaluation Methods Based on A Decoder-based LLM for Text Generation Highlight: This paper compares various methods, including tuning with encoder-based models and large language models under equal conditions, on two different tasks, machine translation evaluation and semantic textual similarity, in two languages, Japanese and English. |
Tomohito Kasahara; Daisuke Kawahara; | arxiv-cs.CL | 2023-10-17 |
553 | UvA-MT’s Participation in The WMT23 General Translation Shared Task Highlight: This paper describes the UvA-MT’s submission to the WMT 2023 shared task on general machine translation. |
Di Wu; Shaomu Tan; David Stap; Ali Araabi; Christof Monz; | arxiv-cs.CL | 2023-10-15 |
554 | UvA-MT’s Participation in The WMT 2023 General Translation Shared Task Abstract: This paper describes the UvA-MT’s submission to the WMT 2023 shared task on general machine translation. We participate in the constrained track in two directions: English … |
Di Wu; Shaomu Tan; David Stap; Ali Araabi; C. Monz; | ArXiv | 2023-10-15 |
555 | MILPaC: A Novel Benchmark for Evaluating Translation of Legal Text to Indian Languages Highlight: In this work, we construct the first high-quality legal parallel corpus containing aligned text units in English and nine Indian languages, that includes several low-resource languages. |
Sayan Mahapatra; Debtanu Datta; Shubham Soni; Adrijit Goswami; Saptarshi Ghosh; | arxiv-cs.CL | 2023-10-15 |
556 | Human-in-the-loop Machine Translation with Large Language Model Highlight: In this study, we propose a human-in-the-loop pipeline that guides LLMs to produce customized outputs with revision instructions. |
Xinyi Yang; Runzhe Zhan; Derek F. Wong; Junchao Wu; Lidia S. Chao; | arxiv-cs.CL | 2023-10-13 |
557 | Political Claim Identification and Categorization in A Multilingual Setting: First Experiments Highlight: This paper explores different strategies for the cross-lingual projection of political claims analysis. |
Urs Zaberer; Sebastian Padó; Gabriella Lapesa; | arxiv-cs.CL | 2023-10-13 |
558 | xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark Highlight: To address the issue, we introduce xDial-Eval, built on top of open-source English dialogue evaluation datasets. |
CHEN ZHANG et. al. | arxiv-cs.CL | 2023-10-13 |
559 | Edge NLP for Efficient Machine Translation in Low Connectivity Areas Abstract: Machine translation (MT) usually requires connectivity and access to the cloud which is often limited in many parts of the world, including hard to reach rural areas. Natural … |
Tess Watt; Christos Chrysoulas; Dimitra Gkatzia; | 2023 IEEE 9th World Forum on Internet of Things (WF-IoT) | 2023-10-12 |
560 | Enhancing Expressivity Transfer in Textless Speech-to-speech Translation Highlight: Expressivity plays a vital role in conveying emotions, nuances, and cultural subtleties, thereby enhancing communication across diverse languages. To address this issue, this study presents a novel method that operates at the discrete speech unit level and leverages multilingual emotion embeddings to capture language-agnostic information. |
Jarod Duret; Benjamin O’Brien; Yannick Estève; Titouan Parcollet; | arxiv-cs.SD | 2023-10-11 |
561 | Task-Oriented Semantic Communications for Speech Transmission Abstract: Semantic communications execute intelligent tasks at the receiver by only transmitting necessary information. In this paper, we introduce TOS-ST, a task-oriented semantic … |
Zhenzi Weng; Zhijin Qin; Xiaoming Tao; | 2023 IEEE 98th Vehicular Technology Conference … | 2023-10-10 |
562 | Larth: Dataset and Machine Translation for Etruscan Highlight: To the best of our knowledge, there are no publicly available Etruscan corpora for natural language processing. Therefore, we propose a dataset for machine translation from Etruscan to English, which contains 2891 translated examples from existing academic sources. |
Gianluca Vico; Gerasimos Spanakis; | arxiv-cs.CL | 2023-10-09 |
563 | All Translation Tools Are Not Equal: Investigating The Quality of Language Translation for Forced Migration Abstract: As the volume and complexity of forced movement continues to grow, there is an urgent need to use new data sources to better understand emerging crises. Organic sources, like … |
AMEETA AGRAWAL et. al. | 2023 IEEE 10th International Conference on Data Science and … | 2023-10-09 |
564 | Terminology-Aware Translation with Constrained Decoding and Large Language Model Prompting Highlight: Alternatively, we leverage a large language model to refine a hypothesis by providing it with terminology constraints. |
Nikolay Bogoychev; Pinzhen Chen; | arxiv-cs.CL | 2023-10-09 |
565 | Synslator: An Interactive Machine Translation Tool with Online Learning Highlight: This paper introduces Synslator, a user-friendly computer-aided translation (CAT) tool that not only supports IMT, but is adept at online learning with real-time translation memories. |
JIAYI WANG et. al. | arxiv-cs.CL | 2023-10-08 |
566 | CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation IF:3 Highlight: We develop multilingual modeling approaches for code translation and demonstrate their great potential in improving the translation quality of both low-resource and high-resource language pairs and boosting the training efficiency. |
Weixiang Yan; Yuchen Tian; Yunzhe Li; Qian Chen; Wen Wang; | arxiv-cs.AI | 2023-10-07 |
567 | Evaluation of Cross-Lingual Bug Localization: Two Industrial Cases Highlight: This study reports the results of applying the cross-lingual bug localization approach proposed by Xia et al. to industrial software projects. |
Shinpei Hayashi; Takashi Kobayashi; Tadahisa Kato; | arxiv-cs.SE | 2023-10-03 |
568 | Tuning Large Language Model for End-to-end Speech Translation Highlight: This paper introduces LST, a Large multimodal model designed to excel at the E2E-ST task. |
HAO ZHANG et. al. | arxiv-cs.CL | 2023-10-03 |
569 | Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation Highlight: To understand when and why the navigation capabilities of language IDs are weakened, we compare two extreme decoder input cases in the ZST directions: Off-Target (OFF) and On-Target (ON) cases. |
CHANGTONG ZAN et. al. | arxiv-cs.CL | 2023-09-28 |
570 | MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition IF:3 Highlight: Nonetheless, visual speech is not as distinguishable as audio speech, making it difficult to develop a mapping from source speech phonemes to the target language text. To address this issue, we propose MixSpeech, a cross-modality self-learning framework that utilizes audio speech to regularize the training of visual speech tasks. |
XIZE CHENG et. al. | iccv | 2023-09-27 |
571 | CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation Highlight: However, these are not directly applicable to MMT since they do not provide aligned multimodal multilingual features for generative tasks. To alleviate this issue, instead of designing complex modules for MMT, we propose CLIPTrans, which simply adapts the independently pre-trained multimodal M-CLIP and the multilingual mBART. |
DEVAANSH GUPTA et. al. | iccv | 2023-09-27 |
572 | Direct Models for Simultaneous Translation and Automatic Subtitling: FBK@IWSLT2023 Highlight: This paper describes the FBK’s participation in the Simultaneous Translation and Automatic Subtitling tracks of the IWSLT 2023 Evaluation Campaign. |
Sara Papi; Marco Gaido; Matteo Negri; | arxiv-cs.CL | 2023-09-27 |
573 | Segmentation-Free Streaming Machine Translation Highlight: This paper proposes a Segmentation-Free framework that enables the model to translate an unsegmented source stream by delaying the segmentation decision until the translation has been generated. |
Javier Iranzo-Sánchez; Jorge Iranzo-Sánchez; Adrià Giménez; Jorge Civera; Alfons Juan; | arxiv-cs.CL | 2023-09-26 |
574 | Hindi to English: Transformer-Based Neural Machine Translation Highlight: In this paper, we have developed a Neural Machine Translation (NMT) system by training the Transformer model to translate texts from Indian Language Hindi to English. |
Kavit Gangar; Hardik Ruparel; Shreyas Lele; | arxiv-cs.CL | 2023-09-22 |
575 | Audience-specific Explanations for Machine Translation Highlight: In this work we explore techniques to extract example explanations from a parallel corpus. |
Renhan Lou; Jan Niehues; | arxiv-cs.CL | 2023-09-22 |
576 | Domain Adaptation for Arabic Machine Translation: The Case of Financial Texts Highlight: To this end, we carefully developed a parallel corpus for Arabic-English (AR-EN) translation in the financial domain for benchmarking different domain adaptation methods. |
Emad A. Alghamdi; Jezia Zakraoui; Fares A. Abanmy; | arxiv-cs.CL | 2023-09-22 |
577 | OSN-MDAD: Machine Translation Dataset for Arabic Multi-Dialectal Conversations on Online Social Media Highlight: While few attempts have been made to build translation datasets for dialectal Arabic, they are domain dependent and are not OSN cultural-language friendly. In this work, we attempt to alleviate these limitations by proposing an online social network-based multidialect Arabic dataset that is crafted by contextually translating English tweets into four Arabic dialects: Gulf, Yemeni, Iraqi, and Levantine. |
Fatimah Alzamzami; Abdulmotaleb El Saddik; | arxiv-cs.CL | 2023-09-21 |
578 | SpeechAlign: A Framework for Speech Translation Alignment Evaluation Highlight: Speech-to-Speech and Speech-to-Text translation are currently dynamic areas of research. In our commitment to advance these fields, we present SpeechAlign, a framework designed to evaluate the underexplored field of source-target alignment in speech models. |
Belen Alastruey; Aleix Sant; Gerard I. Gállego; David Dale; Marta R. Costa-jussà; | arxiv-cs.CL | 2023-09-20 |
579 | Machine Translation of Electrical Terminology Constraints Abstract: In practical applications, the accuracy of domain terminology translation is an important criterion for the performance evaluation of domain machine translation models. Aiming at … |
Zepeng Wang; Yuan Chen; Juwei Zhang; | Inf. | 2023-09-20 |
580 | SignBank+: Preparing A Multilingual Sign Language Dataset for Machine Translation Using Large Language Models Highlight: We introduce SignBank+, a clean version of the SignBank dataset, optimized for machine translation between spoken language text and SignWriting, a phonetic sign language writing system. |
Amit Moryossef; Zifan Jiang; | arxiv-cs.CL | 2023-09-20 |
581 | NSOAMT — New Search Only Approach to Machine Translation Highlight: The idea is to develop a solution that, by indexing an incremental set of words that combine a certain semantic meaning, makes it possible to create a process of correspondence between their native language record and the language of translation. |
João Luís; Diogo Cardoso; José Marques; Luís Campos; | arxiv-cs.CL | 2023-09-19 |
582 | LoGenText-Plus: Improving Neural Machine Translation-based Logging Texts Generation with Syntactic Templates Abstract: Developers insert logging statements in the source code to collect important runtime information about software systems. The textual descriptions in logging statements (i.e., … |
Zishuo Ding; Yiming Tang; Xiaoyu Cheng; Heng Li; Weiyi Shang; | ACM Transactions on Software Engineering and Methodology | 2023-09-18 |
583 | Optimizing Machine Translation for Virtual Assistants: Multi-Variant Generation with VerbNet and Conditional Beam Search Abstract: In this paper, we introduce a domain-adapted machine translation (MT) model for intelligent virtual assistants (IVA) designed to translate natural language understanding (NLU) … |
Marcin Sowański; Artur Janicki; | 2023 18th Conference on Computer Science and Intelligence … | 2023-09-17 |
584 | Controllability for English-Ukrainian Machine Translation By Using Style Transfer Techniques Abstract: While straightforward machine translation got significant improvements in the last 10 years with the arrival of encoder-decoder neural networks and transformers architecture, … |
DANIIL MAKSYMENKO et. al. | 2023 18th Conference on Computer Science and Intelligence … | 2023-09-17 |
585 | Use of Neural Machine Translation in Multimodal Translation Abstract: Multimodal Neural Machine Translation (MNMT) is a type of Machine Translation that allows the translation of source language that contains various forms of information, such as … |
Manavi Nair; Sarvesh Tanwar; Sumit Badotra; Vinay Kukreja; | 2023 6th International Conference on Contemporary Computing … | 2023-09-14 |
586 | Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Decoding IF:3 Highlight: Hallucinations and off-target translation remain unsolved problems in MT, especially for low-resource languages and massively multilingual models. In this paper, we introduce two related methods to mitigate these failure cases with a modified decoding objective, without either requiring retraining or external models. |
Rico Sennrich; Jannis Vamvas; Alireza Mohammadshahi; | arxiv-cs.CL | 2023-09-13 |
587 | Design of A Smart Teaching English Translation System Based on Big Data Machine Learning Abstract: In the context of artificial intelligence, the use of machine translation in English reading classroom teaching is a more common learning method. In traditional teaching methods, … |
Chunye Zhang; Tianyue Yu; Yingqi Gao; Mau Luen Tham; | Int. J. Web Based Learn. Teach. Technol. | 2023-09-12 |
588 | Dual-view Curricular Optimal Transport for Cross-lingual Cross-modal Retrieval Highlight: Improperly assuming the pseudo-parallel data are correctly correlated will make the networks overfit to the noisy correspondence. Therefore, we propose Dual-view Curricular Optimal Transport (DCOT) to learn with noisy correspondence in CCR. |
YABING WANG et. al. | arxiv-cs.CV | 2023-09-11 |
589 | The Effect of Alignment Objectives on Code-Switching Translation Highlight: In this paper, we are proposing a way of training a single machine translation model that is able to translate monolingual sentences from one language to another, along with translating code-switched sentences to either language. |
Mohamed Anwar; | arxiv-cs.CL | 2023-09-10 |
590 | Algorithmic Translation Correction Mechanisms: An End-to-end Algorithmic Implementation of English-Chinese Machine Translation Abstract: INTRODUCTION: Machine translation is a modern natural language processing research field with important scientific and practical significance. In practice, the variation of … |
Lei Shi; | EAI Endorsed Trans. Scalable Inf. Syst. | 2023-09-05 |
591 | Epi-Curriculum: Episodic Curriculum Learning for Low-Resource Domain Adaptation in Neural Machine Translation Highlight: In this paper, we present a novel approach Epi-Curriculum to address low-resource domain adaptation (DA), which contains a new episodic training framework along with denoised curriculum learning. |
Keyu Chen; Di Zhuang; Mingchen Li; J. Morris Chang; | arxiv-cs.LG | 2023-09-05 |
592 | Advancing Text-to-GLOSS Neural Translation Using A Novel Hyper-parameter Optimization Technique Highlight: In this paper, we investigate the use of transformers for Neural Machine Translation of text-to-GLOSS for Deaf and Hard-of-Hearing communication. |
Younes Ouargani; Noussaima El Khattabi; | arxiv-cs.CL | 2023-09-05 |
593 | Exploration of Low-resource Language-oriented Machine Translation System of Genetic Algorithm-optimized Hyper-task Network Under Cloud Platform Technology |
Xiao Liu; Junlong Chen; Deyu Qi; Tong Zhang; | J. Supercomput. | 2023-09-04 |
594 | Neural Machine Translation Systems for English to Khasi: A Case Study of An Austroasiatic Language |
A. V. Hujon; Thoudam Doren Singh; Khwairakpam Amitab; | Expert Syst. Appl. | 2023-09-01 |
595 | Impact of Visual Context on Noisy Multimodal NMT: An Empirical Study for English to Indian Languages Highlight: The study investigates the effectiveness of utilizing multimodal information in Neural Machine Translation (NMT). |
Baban Gain; Dibyanayan Bandyopadhyay; Samrat Mukherjee; Chandranath Adak; Asif Ekbal; | arxiv-cs.CL | 2023-08-30 |
596 | Training and Meta-Evaluating Machine Translation Evaluation Metrics at The Paragraph Level Abstract: As research on machine translation moves to translating text beyond the sentence level, it remains unclear how effective automatic evaluation metrics are at scoring longer … |
Daniel Deutsch; Juraj Juraska; Mara Finkelstein; Markus Freitag; | arxiv-cs.CL | 2023-08-25 |
597 | Improving Translation Faithfulness of Large Language Models Via Augmenting Instructions IF:3 Highlight: Large Language Models (LLMs) present strong general capabilities, and a current compelling challenge is stimulating their specialized capabilities, such as machine translation, through low-cost instruction tuning. |
YIJIE CHEN et. al. | arxiv-cs.CL | 2023-08-24 |
598 | SeamlessM4T: Massively Multilingual & Multimodal Machine Translation Highlight: More specifically, conventional speech-to-speech translation systems rely on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. |
SEAMLESS COMMUNICATION et. al. | arxiv-cs.CL | 2023-08-22 |
599 | An Effective Method Using Phrase Mechanism in Neural Machine Translation Highlight: In this paper, we report an effective method using a phrase mechanism, PhraseTransformer, to improve the strong baseline model Transformer in constructing a Neural Machine Translation (NMT) system for parallel corpora Vietnamese-Chinese. |
Phuong Minh Nguyen; Le Minh Nguyen; | arxiv-cs.CL | 2023-08-21 |
600 | Knowledge Distillation on Joint Task End-to-End Speech Translation Abstract: An End-to-End Speech Translation (E2E-ST) model takes input audio in one language and directly produces output text in another language. The model requires to learn both … |
Khandokar Md. Nayem; Ran Xue; Ching-Yun Chang; A. Shanbhogue; | Interspeech | 2023-08-20 |
601 | Factuality Detection Using Machine Translation — A Use Case for German Clinical Text Highlight: In the context of factuality detection, this work presents a simple solution using machine translation to translate English data to German to train a transformer-based factuality detection model. |
Mohammed Bin Sumait; Aleksandra Gabryszak; Leonhard Hennig; Roland Roller; | arxiv-cs.CL | 2023-08-17 |
602 | Fast Training of NMT Model with Data Sorting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One potential area for improvement is to address the computation of empty tokens that the Transformer computes only to discard them later, leading to an unnecessary computational burden. To tackle this, we propose an algorithm that sorts translation sentence pairs based on their length before batching, minimizing the waste of computing power. |
Daniela N. Rim; Kimera Richard; Heeyoul Choi; | arxiv-cs.CL | 2023-08-16 |
603 | VBD-MT Chinese-Vietnamese Translation Systems for VLSP 2022 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present our systems participated in the VLSP 2022 machine translation shared task. |
Hai Long Trieu; Song Kiet Bui; Tan Minh Tran; Van Khanh Tran; Hai An Nguyen; | arxiv-cs.CL | 2023-08-15 |
604 | Extrapolating Large Language Models to Non-English By Aligning Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we empower pre-trained LLMs on non-English languages by building semantic alignment across languages. |
WENHAO ZHU et. al. | arxiv-cs.CL | 2023-08-09 |
605 | Negative Lexical Constraints in Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We compared various methods based on modifying either the decoding process or the training data. |
Josef Jon; Dušan Variš; Michal Novák; João Paulo Aires; Ondřej Bojar; | arxiv-cs.CL | 2023-08-07 |
606 | Show Me The World in My Language: Establishing The First Baseline for Scene-Text to Scene-Text Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study the task of “visually” translating scene text from a source language (e.g., Hindi) to a target language (e.g., English). |
Shreyas Vaidya; Arvind Kumar Sharma; Prajwal Gatti; Anand Mishra; | arxiv-cs.CV | 2023-08-06 |
607 | Do Multilingual Language Models Think Better in English? IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a new approach called self-translate, which overcomes the need of an external translation system by leveraging the few-shot translation capabilities of multilingual language models. |
Julen Etxaniz; Gorka Azkune; Aitor Soroa; Oier Lopez de Lacalle; Mikel Artetxe; | arxiv-cs.CL | 2023-08-02 |
608 | Predicting Perfect Quality Segments in MT Output with Fine-Tuned OpenAI LLM: Is It Possible to Capture Editing Distance Patterns from Historical Data? Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Translation Quality Estimation (TQE) is an essential step before deploying the output translation into usage. TQE is also critical in assessing machine translation (MT) and human … |
Serge Gladkoff; G. Erofeev; Lifeng Han; G. Nenadic; | ArXiv | 2023-07-31 |
609 | MTUncertainty: Assessing The Need for Post-editing of Machine Translation Outputs By Fine-tuning OpenAI LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We take OpenAI models as the best state-of-the-art technology and approach TQE as a binary classification task. |
Serge Gladkoff; Lifeng Han; Gleb Erofeev; Irina Sorokina; Goran Nenadic; | arxiv-cs.CL | 2023-07-31 |
610 | Structural Transfer Learning in NL-to-Bash Semantic Parsers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a methodology for obtaining a quantitative understanding of structural overlap between machine translation tasks. |
Kyle Duffy; Satwik Bhattamishra; Phil Blunsom; | arxiv-cs.CL | 2023-07-31 |
611 | Toward Quantum Machine Translation of Syntactically Distinct Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The present study aims to explore the feasibility of language translation using quantum natural language processing algorithms on noisy intermediate-scale quantum (NISQ) devices. |
Mina Abbaszade; Mariam Zomorodi; Vahid Salari; Philip Kurian; | arxiv-cs.CL | 2023-07-31 |
612 | Using Online Machine Translation in International Scholarly Writing and Publishing: A Longitudinal Case of A Chinese Engineering Scholar Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Scholars who use English as an additional language (EAL) worldwide are under increasing pressure to write and publish in English due to the pervasive publish‐or‐perish culture and … |
C. Zou; Wei Gong; Ping Li; | Learned Publishing | 2023-07-31 |
613 | Morpheme-Based Neural Machine Translation Models for Low-Resource Fusion Languages Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Neural approaches, which are currently state-of-the-art in many areas, have contributed significantly to the exciting advancements in machine translation. However, Neural Machine … |
A. Gezmu; A. Nürnberger; | ACM Transactions on Asian and Low-Resource Language … | 2023-07-28 |
614 | Multilingual Lexical Simplification Via Paraphrase Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel multilingual LS method via paraphrase generation, as paraphrases provide diversity in word selection while preserving the sentence’s meaning. |
KANG LIU et. al. | arxiv-cs.CL | 2023-07-27 |
615 | XDLM: Cross-lingual Diffusion Language Model for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, while pretraining with diffusion models has been studied within a single language, the potential of cross-lingual pretraining remains understudied. To address these gaps, we propose XDLM, a novel Cross-lingual diffusion model for machine translation, consisting of pretraining and fine-tuning stages. |
Linyao Chen; Aosong Feng; Boming Yang; Zihui Li; | arxiv-cs.CL | 2023-07-25 |
616 | Lost In Translation: Generating Adversarial Examples Robust to Round-Trip Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a comprehensive study on the robustness of current text adversarial attacks to round-trip translation. |
Neel Bhandari; Pin-Yu Chen; | arxiv-cs.CL | 2023-07-24 |
617 | Incorporating Human Translator Style Into English-Turkish Literary Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on English-Turkish literary translation and develop machine translation models that take into account the stylistic features of translators. |
ZEYNEP YIRMIBEŞOĞLU et. al. | arxiv-cs.CL | 2023-07-21 |
618 | Construction of Mizo: English Parallel Corpus for Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Parallel corpus is a key component of statistical and Neural Machine Translation (NMT). While most research focuses on machine translation, corpus creation studies are limited for … |
Thangkhanhau Haulai; J. Hussain; | ACM Transactions on Asian and Low-Resource Language … | 2023-07-21 |
619 | Improving End-to-End Speech Translation By Imitation-Based Knowledge Distillation with Synthetic Transcripts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an imitation learning approach where a teacher NMT system corrects the errors of an AST student without relying on manual transcripts. |
Rebekka Hubert; Artem Sokolov; Stefan Riezler; | arxiv-cs.CL | 2023-07-17 |
620 | Data Augmentation for Machine Translation Via Dependency Subtree Swapping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a generic framework for data augmentation via dependency subtree swapping that is applicable to machine translation. |
Attila Nagy; Dorina Petra Lakatos; Botond Barta; Patrick Nanys; Judit Ács; | arxiv-cs.CL | 2023-07-13 |
621 | Investigating Unsupervised Neural Machine Translation for Low-resource Language Pair English-Mizo Via Lexically Enhanced Pre-trained Language Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The vast majority of languages in the world at present are considered to be low-resource languages. Since the availability of large parallel data is crucial for the success of … |
C. Lalrempuii; B. Soni; | ACM Transactions on Asian and Low-Resource Language … | 2023-07-13 |
622 | Back Deduction Based Testing for Word Sense Disambiguation Ability of Machine Translation Systems Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Machine translation systems have penetrated our daily lives, providing translation services from source language to target language to millions of users online daily. Word Sense … |
JUN WANG et. al. | Proceedings of the 32nd ACM SIGSOFT International Symposium … | 2023-07-12 |
623 | The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes the NPU-MSXF system for the IWSLT 2023 speech-to-speech translation (S2ST) task which aims to translate from English speech of multi-source to Chinese speech. |
KUN SONG et. al. | arxiv-cs.SD | 2023-07-10 |
624 | Multi-VALUE: A Framework for Cross-Dialectal English NLP IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a suite of resources for evaluating and achieving English dialect invariance. |
CALEB ZIEMS et. al. | acl | 2023-07-08 |
625 | TeCS: A Dataset and Benchmark for Tense Consistency of Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a parallel tense test set, containing French-English 552 utterances. |
Yiming Ai; Zhiwei He; Kai Yu; Rui Wang; | acl | 2023-07-08 |
626 | Simple and Effective Unsupervised Speech Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The amount of labeled data to train models for speech tasks is limited for most languages, however, the data scarcity is exacerbated for speech translation which requires labeled data covering two different languages. To address this issue, we study a simple and effective approach to build speech translation systems without labeled data by leveraging recent advances in unsupervised speech recognition, machine translation and speech synthesis, either in a pipeline approach, or to generate pseudo-labels for training end-to-end speech translation models. |
CHANGHAN WANG et. al. | acl | 2023-07-08 |
627 | Learning Optimal Policy for Simultaneous Machine Translation Via Binary Search IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a new method for constructing the optimal policy online via binary search. |
Shoutao Guo; Shaolei Zhang; Yang Feng; | acl | 2023-07-08 |
628 | Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple yet efficient approach to adapt VLP to unseen languages using MPLM. |
Yasmine Karoui; Rémi Lebret; Negar Foroutan Eghlidi; Karl Aberer; | acl | 2023-07-08 |
629 | MCLIP: Multilingual CLIP Via Cross-lingual Transfer IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce mCLIP, a retrieval-efficient dual-stream multilingual VLP model, trained by aligning the CLIP model and a Multilingual Text Encoder (MTE) through a novel Triangle Cross-modal Knowledge Distillation (TriKD) method. |
GUANHUA CHEN et. al. | acl | 2023-07-08 |
630 | Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Firstly, the current objective of KD spreads its focus to whole distributions to learn the knowledge, yet lacks special treatment on the most crucial top-1 information. Secondly, the knowledge is largely covered by the golden information due to the fact that most top-1 predictions of teachers overlap with ground-truth tokens, which further restricts the potential of KD. To address these issues, we propose a new method named Top-1 Information Enhanced Knowledge Distillation (TIE-KD). |
SONGMING ZHANG et. al. | acl | 2023-07-08 |
631 | What About “em”? How Commercial Machine Translation Fails to Handle (Neo-)Pronouns Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Wrong pronoun translations can discriminate against marginalized groups, e.g., non-binary individuals (Dev et al., 2021). In this “reality check”, we study how three commercial MT systems translate 3rd-person pronouns. |
Anne Lauscher; Debora Nozza; Ehm Miltersen; Archie Crowley; Dirk Hovy; | acl | 2023-07-08 |
632 | Easy Guided Decoding in Providing Suggestions for Interactive Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we utilize the parameterized objective function of neural machine translation (NMT) and propose a novel constrained decoding algorithm, namely Prefix-Suffix Guided Decoding (PSGD), to deal with the TS problem without additional training. |
Ke Wang; Xin Ge; Jiayi Wang; Yuqi Zhang; Yu Zhao; | acl | 2023-07-08 |
633 | Understanding and Bridging The Modality Gap for Speech Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We find that the modality gap is relatively small during training except for some difficult cases, but keeps increasing during inference due to the cascading effect. To address these problems, we propose the Cross-modal Regularization with Scheduled Sampling (Cress) method. |
Qingkai Fang; Yang Feng; | acl | 2023-07-08 |
634 | Back Translation for Speech-to-text Translation Without Transcripts IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to utilize large amounts of target-side monolingual data to enhance ST without transcripts. |
Qingkai Fang; Yang Feng; | acl | 2023-07-08 |
635 | Exploiting Biased Models to De-bias Text: A Gender-Fair Rewriting Model IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To eliminate the rule-based nature of data creation, we instead propose using machine translation models to create gender-biased text from real gender-fair text via round-trip translation. |
Chantal Amrhein; Florian Schottmann; Rico Sennrich; Samuel Läubli; | acl | 2023-07-08 |
636 | CMOT: Cross-modal Mixup Via Optimal Transport for Speech Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Cross-modal Mixup via Optimal Transport (CMOT) to overcome the modality gap. |
Yan Zhou; Qingkai Fang; Yang Feng; | acl | 2023-07-08 |
637 | Translation-Enhanced Multilingual Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide two key contributions. 1) Relying on a multilingual multi-modal encoder, we provide a systematic empirical study of standard methods used in cross-lingual NLP when applied to mTTI: Translate Train, Translate Test, and Zero-Shot Transfer. 2) We propose Ensemble Adapter (EnsAd), a novel parameter-efficient approach that learns to weigh and consolidate the multilingual text knowledge within the mTTI framework, mitigating the language gap and thus improving mTTI performance. |
Yaoyiran Li; Ching-Yun Chang; Stephen Rawls; Ivan Vulic; Anna Korhonen; | acl | 2023-07-08 |
638 | Do GPTs Produce Less Literal Translations? IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, there has been relatively little investigation on how such translations differ qualitatively from the translations generated by standard Neural Machine Translation (NMT) models. In this work, we investigate these differences in terms of the literalness of translations produced by the two systems. |
Vikas Raunak; Arul Menezes; Matt Post; Hany Hassan; | acl | 2023-07-08 |
639 | Scene Graph As Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate a more realistic unsupervised multimodal machine translation (UMMT) setup, inference-time image-free UMMT, where the model is trained with source-text image pairs, and tested with only source-text inputs. |
Hao Fei; Qian Liu; Meishan Zhang; Min Zhang; Tat-Seng Chua; | acl | 2023-07-08 |
640 | Searching for Needles in A Haystack: On The Role of Incidental Bilingualism in PaLM’s Translation Capability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a mixed-method approach to measure and understand incidental bilingualism at scale. |
Eleftheria Briakou; Colin Cherry; George Foster; | acl | 2023-07-08 |
641 | Cross2StrA: Unpaired Cross-lingual Image Captioning with Cross-lingual Cross-modal Structure-pivoted Alignment IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unpaired cross-lingual image captioning has long suffered from irrelevancy and disfluency issues, due to the inconsistencies of the semantic scene and syntax attributes during transfer. In this work, we propose to address the above problems by incorporating the scene graph (SG) structures and the syntactic constituency (SC) trees. |
Shengqiong Wu; Hao Fei; Wei Ji; Tat-Seng Chua; | acl | 2023-07-08 |
642 | A Simple Concatenation Can Effectively Improve Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the works of video Transformer, we propose a simple unified cross-modal ST method, which concatenates speech and text as the input, and builds a teacher that can utilize both cross-modal information simultaneously. |
Linlin Zhang; Kai Fan; Boxing Chen; Luo Si; | acl | 2023-07-08 |
643 | On Evaluating Multilingual Compositional Generalization with Translated Datasets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, we show that this entails critical semantic distortion. To address this limitation, we craft a faithful rule-based translation of the MCWQ dataset from English to Chinese and Japanese. |
Zi Wang; Daniel Hershcovich; | acl | 2023-07-08 |
644 | XPQA: Cross-Lingual Product Question Answering in 12 Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing work on PQA focuses mainly on English, in practice there is need to support multiple customer languages while leveraging product information available in English. To study this practical industrial task, we present xPQA, a large-scale annotated cross-lingual PQA dataset in 12 languages, and report results in (1) candidate ranking, to select the best English candidate containing the information to answer a non-English question; and (2) answer generation, to generate a natural-sounding non-English answer based on the selected English candidate. |
Xiaoyu Shen; Akari Asai; Bill Byrne; Adria De Gispert; | acl | 2023-07-08 |
645 | Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a new MMT approach based on a strong text-only MT model, which uses neural adapters, a novel guided self-attention mechanism and which is jointly trained on both visually-conditioned masking and MMT. |
Matthieu Futeral; Cordelia Schmid; Ivan Laptev; Benoît Sagot; Rachel Bawden; | acl | 2023-07-08 |
646 | Neural Machine Translation Methods for Translating Text to Sign Language Glosses IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our experiments, we improve the performance of the transformer-based models via (1) data augmentation, (2) semi-supervised Neural Machine Translation (NMT), (3) transfer learning and (4) multilingual NMT. |
Dele Zhu; Vera Czehmann; Eleftherios Avramidis; | acl | 2023-07-08 |
647 | Exploring Better Text Image Translation with Multimodal Codebook Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we first annotate a Chinese-English TIT dataset named OCRMT30K, providing convenience for subsequent studies. |
ZHIBIN LAN et. al. | acl | 2023-07-08 |
648 | Understanding and Improving The Robustness of Terminology Constraints in Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the robustness of two typical terminology translation methods: Placeholder (PH) and Code-Switch (CS), concerning (1) the number of constraints and (2) the target constraint length. |
HUAAO ZHANG et. al. | acl | 2023-07-08 |
649 | Continual Knowledge Distillation for Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a method called continual knowledge distillation to take advantage of existing translation models to improve one model of interest. |
Yuanchi Zhang; Peng Li; Maosong Sun; Yang Liu; | acl | 2023-07-08 |
650 | Using Neural Machine Translation for Generating Diverse Challenging Exercises for Language Learner Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel approach to automatically generate distractors for cloze exercises for English language learners, using round-trip neural machine translation. |
Frank Palma Gomez; Subhadarshi Panda; Michael Flor; Alla Rozovskaya; | acl | 2023-07-08 |
651 | Extrinsic Evaluation of Machine Translation Metrics IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate how useful MT metrics are at detecting the segment-level quality by correlating metrics with how useful the translations are for downstream task. |
Nikita Moghe; Tom Sherborne; Mark Steedman; Alexandra Birch; | acl | 2023-07-08 |
652 | PEIT: Bridging The Modality Gap with Pre-trained Models for End-to-End Image Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose PEIT, an end-to-end image translation framework that bridges the modality gap with pre-trained models. |
Shaolin Zhu; Shangjie Li; Yikun Lei; Deyi Xiong; | acl | 2023-07-08 |
653 | Songs Across Borders: Singable and Controllable Neural Lyric Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper bridges the singability quality gap by formalizing lyric translation into a constrained translation problem, converting theoretical guidance and practical techniques from translatology literature to prompt-driven NMT approaches, exploring better adaptation methods, and instantiating them to an English-Chinese lyric translation system. |
Longshen Ou; Xichu Ma; Min-Yen Kan; Ye Wang; | acl | 2023-07-08 |
654 | Learning Language-Specific Layers for Multilingual Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Language-Specific Transformer Layers (LSLs), which allow us to increase model capacity, while keeping the amount of computation and the number of parameters used in the forward pass constant. |
Telmo Pires; Robin Schmidt; Yi-Hsiu Liao; Stephan Peitz; | acl | 2023-07-08 |
655 | INK: Injecting KNN Knowledge in Nearest Neighbor Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an effective training framework INK to directly smooth the representation space via adjusting representations of kNN neighbors with a small number of new parameters. |
Wenhao Zhu; Jingjing Xu; Shujian Huang; Lingpeng Kong; Jiajun Chen; | acl | 2023-07-08 |
656 | Neural Machine Translation for Mathematical Formulae Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we perform the tasks of translating from LaTeX to Mathematica as well as from LaTeX to semantic LaTeX. |
Felix Petersen; Moritz Schubotz; Andre Greiner-Petter; Bela Gipp; | acl | 2023-07-08 |
657 | Text Style Transfer Back-Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For natural inputs, BT brings only slight improvements and sometimes even adverse effects. To address this issue, we propose Text Style Transfer Back Translation (TST BT), which uses a style transfer to modify the source side of BT data. |
DAIMENG WEI et. al. | acl | 2023-07-08 |
658 | MultiTACRED: A Multilingual Version of The TAC Relation Extraction Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Relation extraction (RE) is a fundamental task in information extraction, whose extension to multilingual settings has been hindered by the lack of supervised resources comparable in size to large English datasets such as TACRED (Zhang et al., 2017). To address this gap, we introduce the MultiTACRED dataset, covering 12 typologically diverse languages from 9 language families, which is created by machine-translating TACRED instances and automatically projecting their entity annotations. |
Leonhard Hennig; Philippe Thomas; Sebastian Möller; | acl | 2023-07-08 |
659 | An Analysis of Error Types in Chinese to English Translation By Google Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Due to the rapid development of globalization and digitalization, neural machine translation (NMT) systems have gradually developed into the mainstream technology in the … |
Yanqi Lu; | Proceedings of the 2023 International Joint Conference on … | 2023-07-07 |
660 | Performance Evaluation of English to Bodo Neural Machine Translation System with Varying Model Architecture and Vocabulary Size Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper is about a work done on Neural Machine Translation of English-Bodo language pair using deep learning technique. Bodo is a language of northeastern part of India … |
P. Boruah; Shikhar Kr. Sarma; Kishore Kashyap; Simanta Kalita; | 2023 14th International Conference on Computing … | 2023-07-06 |
661 | Tokenization Effect on Neural Machine Translation: An Experimental Investigation for English-Assamese Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Tokenization, as a research task, is mostly overlooked when dealing with machine translation as much emphasis is placed on modelling or data enhancement, not to speak for language … |
Mazida Akhtara Ahmed; Kishore Kashyap; Shikhar Kumar Sarma; | 2023 14th International Conference on Computing … | 2023-07-06 |
662 | To Be or Not to Be: A Translation Reception Study of A Literary Text Translated Into Dutch and Catalan Using Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article presents the results of a study involving the reception of a fictional story by Kurt Vonnegut translated from English into Catalan and Dutch in three conditions: machine-translated (MT), post-edited (PE) and translated from scratch (HT). |
Ana Guerberof Arenas; Antonio Toral; | arxiv-cs.CL | 2023-07-05 |
663 | Simplification of Arabic Text: A Hybrid Approach Integrating Machine Translation and Transformer-based Lexical Model Related Papers Related Patents Related Grants Related Venues Related Experts View |
Suha Al-Thanyyan; Aqil M. Azmi; | J. King Saud Univ. Comput. Inf. Sci. | 2023-07-01 |
664 | X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Task-oriented dialogue research has mainly focused on a few popular languages like English and Chinese, due to the high dataset creation cost for a new language. |
MEHRAD MORADSHAHI et. al. | arxiv-cs.CL | 2023-06-30 |
665 | Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple yet efficient approach to adapt VLP to unseen languages using MPLM. |
Yasmine Karoui; Rémi Lebret; Negar Foroutan; Karl Aberer; | arxiv-cs.CL | 2023-06-29 |
666 | Learning Multilingual Expressive Speech Representation for Prosody Prediction Without Parallel Data Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We propose a method for speech-to-speech emotion-preserving translation that operates at the level of discrete speech units. Our approach relies on the use of multilingual emotion … |
J. Duret; Titouan Parcollet; Y. Estève; | ArXiv | 2023-06-29 |
667 | The Unreasonable Effectiveness of Few-shot Learning for Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning, is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems. |
XAVIER GARCIA et. al. | icml | 2023-06-27 |
668 | Slot Lost in Translation? Not Anymore: A Machine Translation Model for Virtual Assistants with Type-Independent Slot Transfer Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this article, we present a machine translation model adapted to the domain of intelligent virtual assistants (IVA) that can be used to translate training and evaluation … |
Marcin Sowanski; A. Janicki; | 2023 30th International Conference on Systems, Signals and … | 2023-06-27 |
669 | Quality Estimation of Machine Translated Texts Based on Direct Evidence from Training Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we show that the parallel corpus used as training data for training the MT system holds direct clues for estimating the quality of translations produced by the MT system. |
Vibhuti Kumari; Narayana Murthy Kavi; | arxiv-cs.CL | 2023-06-27 |
670 | Constructing Multilingual Code Search Dataset Using Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this research, we create a multilingual code search dataset in four natural and four programming languages using a neural machine translation model. |
Ryo Sekizawa; Nan Duan; Shuai Lu; Hitomi Yanaka; | arxiv-cs.CL | 2023-06-27 |
671 | A Graph Fusion Approach for Cross-Lingual Machine Reading Comprehension Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel approach, which jointly models the cross-lingual alignment information and the mono-lingual syntax information using a graph. |
ZENAN XU et. al. | aaai | 2023-06-26 |
672 | Evaluation of Chinese-English Machine Translation of Emotion-Loaded Microblog Texts: A Human Annotated Dataset for The Quality Assessment of Emotion Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on how current Machine Translation (MT) tools perform on the translation of emotion-loaded texts by evaluating outputs from Google Translate according to a framework proposed in this paper. |
Shenbin Qian; Constantin Orasan; Felix do Carmo; Qiuliang Li; Diptesh Kanojia; | arxiv-cs.CL | 2023-06-20 |
673 | BayLing: Bridging Cross-lingual Alignment and Instruction Following Through Interactive Translation for Large Language Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To minimize human workload, we propose to transfer the capabilities of language generation and instruction following from English to other languages through an interactive translation task. |
SHAOLEI ZHANG et. al. | arxiv-cs.CL | 2023-06-19 |
674 | Data Augmentation Via Back-translation for Aspect Term Extraction Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We tackle Aspect Term Extraction (ATE), a task that automatically recognizes aspect terms conditioned on the understanding of word-level semantics. Due to the capacity of … |
Qingting Xu; Yu Hong; Jiaxiang Chen; Jianmin Yao; Guodong Zhou; | 2023 International Joint Conference on Neural Networks … | 2023-06-18 |
675 | Robust Secret Data Hiding for Transformer-based Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Hiding secret information in text is a research area of significant importance and a great challenge. In recent years, there have been huge developments and exciting advances in … |
Tianhe Lu; Gongshen Liu; Ru Zhang; Peixuan Li; Tianjie Ju; | 2023 International Joint Conference on Neural Networks … | 2023-06-18 |
676 | Sheffield’s Submission to The AmericasNLP Shared Task on Machine Translation Into Indigenous Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we describe the University of Sheffield’s submission to the AmericasNLP 2023 Shared Task on Machine Translation into Indigenous Languages which comprises the translation from Spanish to eleven indigenous languages. |
Edward Gow-Smith; Danae Sánchez Villegas; | arxiv-cs.CL | 2023-06-16 |
677 | Discourse Representation Structure Parsing for Chinese Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We describe the pipeline of automatically collecting the linearized Chinese meaning representation data for sequential-to-sequential neural networks. |
Chunliu Wang; Xiao Zhang; Johan Bos; | arxiv-cs.CL | 2023-06-16 |
678 | Baseline Transliteration Corpus for Improved English-Amharic Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View |
Yohannes Biadgligne; Kamel Smaïli; | Informatica (Slovenia) | 2023-06-15 |
679 | Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Babel-ImageNet, a massively multilingual benchmark that offers (partial) translations of ImageNet labels to 100 languages, built without machine translation or manual annotation. |
Gregor Geigle; Radu Timofte; Goran Glavaš; | arxiv-cs.CL | 2023-06-14 |
680 | A Survey of Vision-Language Pre-training from The Lens of Multimodal Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We summarize the common architectures, pre-training objectives, and datasets from literature and conjecture what further is needed to make progress on multimodal machine translation. |
Jeremy Gwinnup; Kevin Duh; | arxiv-cs.CL | 2023-06-12 |
681 | Rethinking Translation Memory Augmented Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper rethinks translation memory augmented neural machine translation (TM-augmented NMT) from two perspectives, i.e., a probabilistic view of retrieval and the variance-bias … |
HONGKUN HAO et. al. | arxiv-cs.CL | 2023-06-12 |
682 | Textual Augmentation Techniques Applied to Low Resource Machine Translation: Case of Swahili Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we investigate the impact of applying textual data augmentation tasks to low resource machine translation. |
Catherine Gitau; Vukosi Marivate; | arxiv-cs.CL | 2023-06-12 |
683 | Measuring Sentiment Bias in Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we explore how machine translation might introduce a bias in sentiments as classified by sentiment analysis models. |
KAI HARTUNG et. al. | arxiv-cs.CL | 2023-06-12 |
684 | A Benchmark Dataset and Evaluation Methodology for Chinese Zero Pronoun Translation Related Papers Related Patents Related Grants Related Venues Related Experts View |
MINGZHOU XU et. al. | Language Resources and Evaluation | 2023-06-10 |
685 | Good, But Not Always Fair: An Evaluation of Gender Bias for Three Commercial Machine Translation Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, analyses have been redirected to more nuanced aspects, intricate phenomena, as well as potential risks that may arise from the widespread use of MT tools. Along this line, this paper offers a meticulous assessment of three commercial MT systems – Google Translate, DeepL, and Modern MT – with a specific focus on gender translation and bias. |
Silvia Alma Piazzolla; Beatrice Savoldi; Luisa Bentivogli; | arxiv-cs.CL | 2023-06-09 |
686 | Assisting Language Learners: Automated Trans-Lingual Definition Generation Via Contrastive Prompt Learning IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel task of Trans-Lingual Definition Generation (TLDG), which aims to generate definitions in another language, i.e., the native speaker’s language. |
HENGYUAN ZHANG et. al. | arxiv-cs.CL | 2023-06-09 |
687 | Improving Language Model Integration for Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, some works on automatic speech recognition have demonstrated that, if the implicit language model is neutralized in decoding, further improvements can be gained when integrating an external language model. In this work, we transfer this concept to the task of machine translation and compare with the most prominent way of including additional monolingual data – namely back-translation. |
Christian Herold; Yingbo Gao; Mohammad Zeineldeen; Hermann Ney; | arxiv-cs.CL | 2023-06-08 |
688 | Twi Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: French is a strategically and economically important language in the regions where the African language Twi is spoken. However, only a very small proportion of Twi speakers in … |
Frederick Gyasi; Tim Schlippe; | Big Data Cogn. Comput. | 2023-06-08 |
689 | A Little Is Enough: Few-Shot Quality Estimation Based Corpus Filtering Improves Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: All the scripts and datasets utilized in this study will be publicly available. |
Akshay Batheja; Pushpak Bhattacharyya; | arxiv-cs.CL | 2023-06-06 |
690 | MCTS: A Multi-Reference Chinese Text Simplification Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce MCTS, a multi-reference Chinese text simplification dataset. |
RUINING CHONG et. al. | arxiv-cs.CL | 2023-06-05 |
691 | Machine Translation to Sign Language Using Post-Translation Replacement Without Placeholders Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Sign language is typically the first language for those who are born deaf or who lose their hearing in early childhood. To provide important information for these individuals, it … |
Taro Miyazaki; Naoki Nakatani; Tsubasa Uchida; H. Kaneko; Masanori Sano; | 2023 IEEE International Conference on Acoustics, Speech, … | 2023-06-04 |
692 | Extract and Attend: Improving Entity Translation in Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When we humans encounter an unknown entity during translation, we usually first look up in a dictionary and then organize the entity translation together with the translations of other parts to form a smooth target sentence. Inspired by this translation process, we propose an Extract-and-Attend approach to enhance entity translation in NMT, where the translation candidates of source entities are first extracted from a dictionary and then attended to by the NMT model to generate the target sentence. |
ZIXIN ZENG et. al. | arxiv-cs.CL | 2023-06-03 |
693 | Speech Translation with Foundation Models and Optimal Transport: UPC at IWSLT23 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes the submission of the UPC Machine Translation group to the IWSLT 2023 Offline Speech Translation task. |
Ioannis Tsiamas; Gerard I. Gállego; José A. R. Fonollosa; Marta R. Costa-jussà; | arxiv-cs.CL | 2023-06-02 |
694 | Machine Versus Corpus-based Translation of Multiword Terms Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Machine translation (MT) post-editing is an increasingly common practice in the translation industry which is also slowly being applied in the development of terminological … |
Melania Cabezas-García; P. L. Araúz; | Digit. Scholarsh. Humanit. | 2023-06-01 |
695 | Improving Polish to English Neural Machine Translation with Transfer Learning: Effects of Data Volume and Language Similarity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates the impact of data volume and the use of similar languages on transfer learning in a machine translation task. |
Juuso Eronen; Michal Ptaszynski; Karol Nowakowski; Zheng Lin Chia; Fumito Masui; | arxiv-cs.CL | 2023-06-01 |
696 | Regressing Word and Sentence Embeddings for Low-Resource Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In recent years, neural machine translation (NMT) has achieved unprecedented performance in the automated translation of resource-rich languages. However, it has not yet managed … |
Inigo Jauregi Unanue; E. Z. Borzeshi; M. Piccardi; | IEEE Transactions on Artificial Intelligence | 2023-06-01 |
697 | Improved Cross-Lingual Transfer Learning For Automatic Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this work is to improve cross-lingual transfer learning in multilingual speech-to-text translation via semantic knowledge distillation. |
SAMEER KHURANA et. al. | arxiv-cs.CL | 2023-06-01 |
698 | How Does Pretraining Improve Discourse-Aware Translation? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the underlying reasons for their strong performance have not been well explained. To bridge this gap, we introduce a probing task to interpret the ability of PLMs to capture discourse relation knowledge. |
Zhihong Huang; Longyue Wang; Siyou Liu; Derek F. Wong; | arxiv-cs.CL | 2023-05-31 |
699 | Automatic Discrimination of Human and Neural Machine Translation in Multilingual Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We tackle the task of automatically discriminating between human and machine translations. |
Malina Chichirau; Rik van Noord; Antonio Toral; | arxiv-cs.CL | 2023-05-31 |
700 | Translation-Enhanced Multilingual Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Ensemble Adapter (EnsAd), a novel parameter-efficient approach that learns to weigh and consolidate the multilingual text knowledge within the mTTI framework, mitigating the language gap and thus improving mTTI performance. |
Yaoyiran Li; Ching-Yun Chang; Stephen Rawls; Ivan Vulić; Anna Korhonen; | arxiv-cs.CL | 2023-05-30 |
701 | A Corpus for Sentence-level Subjectivity Detection on English News Articles IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop novel annotation guidelines for sentence-level subjectivity detection, which are not limited to language-specific cues. |
FRANCESCO ANTICI et. al. | arxiv-cs.CL | 2023-05-29 |
702 | HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents HaVQA, the first multimodal dataset for visual question-answering (VQA) tasks in the Hausa language. |
SHANTIPRIYA PARIDA et. al. | arxiv-cs.CL | 2023-05-28 |
703 | An Open-Source Gloss-Based Baseline for Spoken to Signed Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an open-source implementation of a text-to-gloss-to-pose-to-video pipeline approach, demonstrating conversion from German to Swiss German Sign Language, French to French Sign Language of Switzerland, and Italian to Italian Sign Language of Switzerland. |
AMIT MORYOSSEF et. al. | arxiv-cs.CL | 2023-05-28 |
704 | Enhancing Translation for Indigenous Languages: Experiments with Multilingual Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes CIC NLP’s submission to the AmericasNLP 2023 Shared Task on machine translation systems for indigenous languages of the Americas. |
ATNAFU LAMBEBO TONJA et. al. | arxiv-cs.CL | 2023-05-27 |
705 | Robustness of Multi-Source MT to Transcription Errors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Automatic speech translation is sensitive to speech recognition errors, but in a multilingual scenario, the same content may be available in various languages via simultaneous interpreting, dubbing or subtitling. In this paper, we hypothesize that leveraging multiple sources will improve translation quality if the sources complement one another in terms of correct information they contain. |
Dominik Macháček; Peter Polák; Ondřej Bojar; Raj Dabre; | arxiv-cs.CL | 2023-05-26 |
706 | Disambiguated Lexically Constrained Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose disambiguated LCNMT (D-LCNMT) to solve the problem. |
JINPENG ZHANG et. al. | arxiv-cs.CL | 2023-05-26 |
707 | Do GPTs Produce Less Literal Translations? IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, there has been relatively little investigation on how such translations differ qualitatively from the translations generated by standard Neural Machine Translation (NMT) models. In this work, we investigate these differences in terms of the literalness of translations produced by the two systems. |
Vikas Raunak; Arul Menezes; Matt Post; Hany Hassan Awadalla; | arxiv-cs.CL | 2023-05-26 |
708 | What About “em”? How Commercial Machine Translation Fails to Handle (Neo-)Pronouns IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: As 3rd-person pronoun usage shifts to include novel forms, e.g., neopronouns, we need more research on identity-inclusive NLP. Exclusion is particularly harmful in one of the most … |
Anne Lauscher; Debora Nozza; Archie Crowley; E. Miltersen; Dirk Hovy; | Annual Meeting of the Association for Computational … | 2023-05-25 |
709 | MTCue: Learning Zero-Shot Control of Extra-Textual Attributes By Leveraging Unstructured Context in Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work introduces MTCue, a novel neural machine translation (NMT) framework that interprets all context (including discrete variables) as text. |
Sebastian Vincent; Robert Flynn; Carolina Scarton; | arxiv-cs.CL | 2023-05-25 |
710 | What About Em? How Commercial Machine Translation Fails to Handle (Neo-)Pronouns Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Wrong pronoun translations can discriminate against marginalized groups, e.g., non-binary individuals (Dev et al., 2021). In this “reality check”, we study how three commercial MT systems translate 3rd-person pronouns. |
Anne Lauscher; Debora Nozza; Archie Crowley; Ehm Miltersen; Dirk Hovy; | arxiv-cs.CL | 2023-05-25 |
711 | Cross-Lingual Knowledge Distillation for Answer Sentence Selection in Low-Resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Cross-Lingual Knowledge Distillation (CLKD) from a strong English AS2 teacher as a method to train AS2 models for low-resource languages in the tasks without the need of labeled data for the target language. |
Shivanshu Gupta; Yoshitomo Matsubara; Ankit Chadha; Alessandro Moschitti; | arxiv-cs.CL | 2023-05-25 |
712 | Eliciting The Translation Ability of Large Language Models Via Multilingual Finetuning with Translation Instructions IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a detailed analysis by finetuning a multilingual pretrained language model, XGLM-7B, to perform multilingual translation following given instructions. |
Jiahuan Li; Hao Zhou; Shujian Huang; Shanbo Cheng; Jiajun Chen; | arxiv-cs.CL | 2023-05-24 |
713 | Leveraging GPT-4 for Automatic Translation Post-Editing IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we formalize the task of direct translation post-editing with Large Language Models (LLMs) and explore the use of GPT-4 to automatically post-edit NMT outputs across several language pairs. |
Vikas Raunak; Amr Sharaf; Yiren Wang; Hany Hassan Awadallah; Arul Menezes; | arxiv-cs.CL | 2023-05-24 |
714 | An Analysis of The Evaluation of The Translation Quality of Neural Machine Translation Application Systems Related Papers Related Patents Related Grants Related Venues Related Experts View |
Shanshan Liu; Wenxiao Zhu; | Appl. Artif. Intell. | 2023-05-23 |
715 | BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To better model the common semantics shared across texts and videos, we introduce a contrastive learning method in the cross-modal encoder. |
LIYAN KANG et. al. | arxiv-cs.CV | 2023-05-23 |
716 | Improving Speech Translation By Fusing Speech and Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we harness the complementary strengths of speech and text, which are disparate modalities. |
WENBIAO YIN et. al. | arxiv-cs.CL | 2023-05-23 |
717 | Non-parametric, Nearest-neighbor-assisted Fine-tuning for Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through qualitative analysis, we found particular improvements when it comes to translating grammatical relations or function words, which results in increased fluency of our model. |
Jiayi Wang; Ke Wang; Yuqi Zhang; Yu Zhao; Pontus Stenetorp; | arxiv-cs.CL | 2023-05-22 |
718 | Neural Machine Translation for Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we survey the NMT for code generation literature, cataloging the variety of methods that have been explored according to input and output representations, model architectures, optimization techniques used, data sets, and evaluation methods. |
Dharma KC; Clayton T. Morrison; | arxiv-cs.CL | 2023-05-22 |
719 | Decomposed Prompting for Machine Translation Between Related Languages Using Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce DecoMT, a novel approach of few-shot prompting that decomposes the translation process into a sequence of word chunk translations. |
Ratish Puduppully; Anoop Kunchukuttan; Raj Dabre; Ai Ti Aw; Nancy F. Chen; | arxiv-cs.CL | 2023-05-22 |
720 | Is Translation Helpful? An Empirical Analysis of Cross-Lingual Transfer in Low-Resource Dialog Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A typical approach is to leverage off-the-shelf machine translation (MT) systems to utilize either the training corpus or developed models from high-resource languages. In this work, we investigate whether it is helpful to utilize MT at all in this task. |
Lei Shen; Shuai Yu; Xiaoyu Shen; | arxiv-cs.CL | 2023-05-21 |
721 | VAKTA-SETU: A Speech-to-Speech Machine Translation Service in Select Indic Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present our deployment-ready Speech-to-Speech Machine Translation (SSMT) system for English-Hindi, English-Marathi, and Hindi-Marathi language pairs. |
SHIVAM MHASKAR et. al. | arxiv-cs.CL | 2023-05-21 |
722 | HalOmi: A Manually Annotated Benchmark for Multilingual Hallucination and Omission Detection in Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we release an annotated dataset for the hallucination and omission phenomena covering 18 translation directions with varying resource levels and scripts. |
DAVID DALE et. al. | arxiv-cs.CL | 2023-05-19 |
723 | NollySenti: Leveraging Transfer Learning and Machine Translation for Nigerian Movie Sentiment Classification IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on the task of sentiment classification for cross domain adaptation. |
Iyanuoluwa Shode; David Ifeoluwa Adelani; Jing Peng; Anna Feldman; | arxiv-cs.CL | 2023-05-18 |
724 | DUB: Discrete Unit Back-translation for Speech Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With DUB, the back-translation technique can successfully be applied on direct ST and obtains an average boost of 5.5 BLEU on MuST-C En-De/Fr/Es. |
Dong Zhang; Rong Ye; Tom Ko; Mingxuan Wang; Yaqian Zhou; | arxiv-cs.CL | 2023-05-18 |
725 | AlignAtt: Using Attention-based Audio-Translation Alignments As A Guide for Simultaneous Speech Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose AlignAtt, a novel policy for simultaneous ST (SimulST) that exploits the attention information to generate source-target alignments that guide the model during inference. |
Sara Papi; Marco Turchi; Matteo Negri; | arxiv-cs.CL | 2023-05-18 |
726 | Exploiting Biased Models to De-bias Text: A Gender-Fair Rewriting Model IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To eliminate the rule-based nature of data creation, we instead propose using machine translation models to create gender-biased text from real gender-fair text via round-trip translation. |
Chantal Amrhein; Florian Schottmann; Rico Sennrich; Samuel Läubli; | arxiv-cs.CL | 2023-05-18 |
727 | Low-resource Multilingual Neural Translation Using Linguistic Feature-based Relevance Mechanisms Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This article investigates approaches to effectively harness source-side linguistic features for low-resource multilingual neural machine translation (MNMT). Previous works focus … |
Abhisek Chakrabarty; Raj Dabre; Chenchen Ding; M. Utiyama; E. Sumita; | ACM Transactions on Asian and Low-Resource Language … | 2023-05-18 |
728 | On The Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we find that failing in encoding discriminative target language signal will lead to off-target and a closer lexical distance (i.e., KL-divergence) between two languages’ vocabularies is related with a higher off-target rate. |
Liang Chen; Shuming Ma; Dongdong Zhang; Furu Wei; Baobao Chang; | arxiv-cs.CL | 2023-05-18 |
729 | ChatGPT Perpetuates Gender Bias in Machine Translation and Ignores Non-Gendered Pronouns: Findings Across Bengali and Five Other Low-Resource Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this multicultural age, language translation is one of the most performed tasks, and it is becoming increasingly AI-moderated and automated. As a novel AI system, ChatGPT claims to be proficient in such translation tasks and in this paper, we put that claim to the test. |
Sourojit Ghosh; Aylin Caliskan; | arxiv-cs.CY | 2023-05-17 |
730 | Searching for Needles in A Haystack: On The Role of Incidental Bilingualism in PaLM’s Translation Capability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a mixed-method approach to measure and understand incidental bilingualism at scale. |
Eleftheria Briakou; Colin Cherry; George Foster; | arxiv-cs.CL | 2023-05-17 |
731 | Searching for Needles in A Haystack: On The Role of Incidental Bilingualism in PaLM’s Translation Capability IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Large, multilingual language models exhibit surprisingly good zero- or few-shot machine translation capabilities, despite having never seen the intentionally-included translation … |
Eleftheria Briakou; Colin Cherry; George F. Foster; | Annual Meeting of the Association for Computational … | 2023-05-17 |
732 | Progressive Translation: Improving Domain Robustness of Neural Machine Translation with Intermediate Sequences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Borrowing techniques from Statistical Machine Translation, we propose intermediate signals which are intermediate sequences from the source-like structure to the target-like structure. |
Chaojun Wang; Yang Liu; Wai Lam; | arxiv-cs.CL | 2023-05-16 |
733 | XPQA: Cross-Lingual Product Question Answering Across 12 Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While existing work on PQA focuses mainly on English, in practice there is need to support multiple customer languages while leveraging product information available in English. To study this practical industrial task, we present xPQA, a large-scale annotated cross-lingual PQA dataset in 12 languages across 9 branches, and report results in (1) candidate ranking, to select the best English candidate containing the information to answer a non-English question; and (2) answer generation, to generate a natural-sounding non-English answer based on the selected English candidate. |
Xiaoyu Shen; Akari Asai; Bill Byrne; Adrià de Gispert; | arxiv-cs.CL | 2023-05-16 |
734 | The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided By Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated particularly by the task of cross-lingual SLU, we demonstrate that the task of speech translation (ST) is a good means of pretraining speech models for end-to-end SLU on both intra- and cross-lingual scenarios. |
Mutian He; Philip N. Garner; | arxiv-cs.CL | 2023-05-16 |
735 | Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Secondly, the knowledge is largely covered by the golden information due to the fact that most top-1 predictions of teachers overlap with ground-truth tokens, which further restricts the potential of KD. To address these issues, we propose a novel method named Top-1 Information Enhanced Knowledge Distillation (TIE-KD). |
SONGMING ZHANG et. al. | arxiv-cs.CL | 2023-05-14 |
736 | Cross-language Information Retrieval for Poetry Form of Literature-based on Machine Transliteration Using CNN Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Transliteration is phonetically translating a language’s words into an international or non-native screenplay. The machine translation process now plays an essential role in … |
R. Jadhav; M. Dhore; | J. Intell. Fuzzy Syst. | 2023-05-13 |
737 | PESTS: Persian_English Cross Lingual Corpus for Semantic Textual Similarity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this article, the corpus of semantic textual similarity between sentences in Persian and English languages has been produced for the first time by using linguistic experts. |
Mohammad Abdous; Poorya Piroozfar; Behrouz Minaei Bidgoli; | arxiv-cs.CL | 2023-05-13 |
738 | Perturbation-based QE: An Explainable, Unsupervised Word-level Quality Estimation Method for Blackbox Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Perturbation-based QE – a word-level Quality Estimation approach that works simply by analyzing MT system output on perturbed input source sentences. |
Tu Anh Dinh; Jan Niehues; | arxiv-cs.CL | 2023-05-12 |
739 | Subword Segmental Machine Translation: Unifying Segmentation and Target Sentence Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To use SSMT during inference we propose dynamic decoding, a text generation algorithm that adapts segmentations as it generates translations. |
Francois Meyer; Jan Buys; | arxiv-cs.CL | 2023-05-11 |
740 | Text-image Matching for Multi-model Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View |
Xiayang Shi; Zhenqiang Yu; Xuhui Wang; Yijun Li; Yufeng Niu; | The Journal of Supercomputing | 2023-05-09 |
741 | Multi-Teacher Knowledge Distillation For Text Image Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel Multi-Teacher Knowledge Distillation (MTKD) method to effectively distillate knowledge into the end-to-end TIMT model from the pipeline model. |
CONG MA et. al. | arxiv-cs.CL | 2023-05-09 |
742 | MultiTACRED: A Multilingual Version of The TAC Relation Extraction Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Relation extraction (RE) is a fundamental task in information extraction, whose extension to multilingual settings has been hindered by the lack of supervised resources comparable in size to large English datasets such as TACRED (Zhang et al., 2017). To address this gap, we introduce the MultiTACRED dataset, covering 12 typologically diverse languages from 9 language families, which is created by machine-translating TACRED instances and automatically projecting their entity annotations. |
Leonhard Hennig; Philippe Thomas; Sebastian Möller; | arxiv-cs.CL | 2023-05-08 |
743 | Label-Free Multi-Domain Machine Translation with Stage-wise Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a label-free multi-domain machine translation model which requires only a few or no domain-annotated data in training and no domain labels in inference. |
Fan Zhang; Mei Tu; Sangha Kim; Song Liu; Jinyao Yan; | arxiv-cs.CL | 2023-05-06 |
744 | Exploring Human-Like Translation Strategy with Large Language Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Compared to typical machine translation that focuses solely on source-to-target mapping, LLM-based translation can potentially mimic the human translation process which might take preparatory steps to ensure high-quality translation. This work explores this possibility by proposing the MAPS framework, which stands for Multi-Aspect Prompting and Selection. |
ZHIWEI HE et. al. | arxiv-cs.CL | 2023-05-06 |
745 | In-context Learning As Maintaining Coherency: A Study of On-the-fly Machine Translation Using Large Language Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The phenomena of in-context learning has typically been thought of as learning from examples. In this work which focuses on Machine Translation, we present a perspective of in-context learning as the desired generation task maintaining coherency with its context, i.e., the prompt examples. |
Suzanna Sia; Kevin Duh; | arxiv-cs.CL | 2023-05-05 |
746 | Unified Model Learning for Various Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although the dataset-specific models have achieved impressive performance, it is cumbersome as each dataset demands a model to be designed, trained, and stored. In this work, we aim to unify these translation tasks into a more general setting. |
YUNLONG LIANG et. al. | arxiv-cs.CL | 2023-05-04 |
747 | Investigating Lexical Sharing in Multilingual Machine Translation for Indian Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate lexical sharing in multilingual machine translation (MT) from Hindi, Gujarati, Nepali into English. |
Sonal Sannigrahi; Rachel Bawden; | arxiv-cs.CL | 2023-05-04 |
748 | Learning Language-Specific Layers for Multilingual Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Language-Specific Transformer Layers (LSLs), which allow us to increase model capacity, while keeping the amount of computation and the number of parameters used in the forward pass constant. |
Telmo Pessoa Pires; Robin M. Schmidt; Yi-Hsiu Liao; Stephan Peitz; | arxiv-cs.CL | 2023-05-04 |
749 | Evaluating The Efficacy of Length-Controllable Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that BLEURT and COMET have the highest correlation with human evaluation and are most suitable as evaluation metrics for length-controllable machine translation. |
HAO CHENG et. al. | arxiv-cs.CL | 2023-05-03 |
750 | Shared Latent Space By Both Languages in Non-Autoregressive Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel latent variable modeling that integrates a dual reconstruction perspective and an advanced hierarchical latent modeling with a shared intermediate latent space across languages. |
DongNyeong Heo; Heeyoul Choi; | arxiv-cs.CL | 2023-05-02 |
751 | SLTUNET: A Simple Unified Model for Sign Language Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose SLTUNET, a simple unified neural model designed to support multiple SLT-related tasks jointly, such as sign-to-gloss, gloss-to-text and sign-to-text translation. |
Biao Zhang; Mathias Müller; Rico Sennrich; | arxiv-cs.CL | 2023-05-02 |
752 | English-Assamese Neural Machine Translation Using Prior Alignment and Pre-trained Language Model IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
SAHINUR RAHMAN LASKAR et. al. | Comput. Speech Lang. | 2023-05-01 |
753 | Cross-lingual Text Reuse Detection at Document Level for English-Urdu Language Pair Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In recent years, the problem of Cross-Lingual Text Reuse Detection (CLTRD) has gained the interest of the research community due to the availability of large digital repositories … |
M. Sharjeel; I. Muneer; S. Nosheen; R. M. A. Nawab; Paul Rayson; | ACM Transactions on Asian and Low-Resource Language … | 2023-05-01 |
754 | Metamorphic Testing of Machine Translation Models Using Back Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Machine translation software has been widely adopted in recent years. The recent advance in deep learning research has massively improved the accuracy and fluency of the … |
Wentao Gao; Jiayuan He; Van-Thuan Pham; | 2023 IEEE/ACM International Workshop on Deep Learning for … | 2023-05-01 |
755 | Low-Resourced Machine Translation for Senegalese Wolof Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a parallel Wolof/French corpus of 123,000 sentences on which we conducted experiments on machine translation models based on Recurrent Neural Networks (RNN) in different data configurations. |
Derguene Mbaye; Moussa Diallo; Thierno Ibrahima Diop; | arxiv-cs.CL | 2023-04-30 |
756 | Cross-lingual Search for E-Commerce Based on Query Translatability and Mixed-Domain Fine-Tuning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Online stores in the US offer a unique scenario for Cross-Lingual Information Retrieval (CLIR) due to the mix of Spanish and English in user queries. Machine Translation (MT) … |
JESUS PEREZ-MARTIN et. al. | Companion Proceedings of the ACM Web Conference 2023 | 2023-04-30 |
757 | LEAPT: Learning Adaptive Prefix-to-Prefix Translation For Simultaneous Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by strategies utilized by human interpreters and wait policies, we propose a novel adaptive prefix-to-prefix training policy called LEAPT, which allows our machine translation model to learn how to translate source sentence prefixes and make use of the future context. |
L. Lin; S. Li; X. Shi; | icassp | 2023-04-27 |
758 | Rethinking The Reasonability of The Test Set for Simultaneous Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we manually annotate a monotonic test set based on the MuST-C English-Chinese test set, denoted as SiMuST-C. |
M. LIU et. al. | icassp | 2023-04-27 |
759 | Lost In Translation: Generating Adversarial Examples Robust to Round-Trip Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a comprehensive study on the robustness of current text adversarial attacks to round-trip translation. |
N. Bhandari; P. -Y. Chen; | icassp | 2023-04-27 |
760 | M3ST: Mix at Three Levels for Speech Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Mix at three levels for Speech Translation (M3ST) method to increase the diversity of the augmented training corpus. |
X. CHENG et. al. | icassp | 2023-04-27 |
761 | Improving Speech-to-Speech Translation Through Unlabeled Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an effective way to utilize the massive existing unlabeled text from different languages to create a large amount of S2ST data to improve S2ST performance by applying various acoustic effects to the generated synthetic data. |
X. -P. NGUYEN et. al. | icassp | 2023-04-27 |
762 | A Corpus-Based Auto-encoder-and-Decoder Machine Translation Using Deep Neural Network for Translation from English to Telugu Language Related Papers Related Patents Related Grants Related Venues Related Experts View |
Mohan Mahanty; B. Vamsi; Dasari Madhavi; | SN Computer Science | 2023-04-26 |
763 | NAIST-SIC-Aligned: An Aligned English-Japanese Simultaneous Interpretation Corpus Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to fill in the gap by introducing NAIST-SIC-Aligned, which is an automatically-aligned parallel English-Japanese SI dataset. |
JINMING ZHAO et. al. | arxiv-cs.CL | 2023-04-23 |
764 | NAIST-SIC-Aligned: Automatically-Aligned English-Japanese Simultaneous Interpretation Corpus Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: It remains a question that how simultaneous interpretation (SI) data affects simultaneous machine translation (SiMT). Research has been limited due to the lack of a large-scale … |
JINMING ZHAO et. al. | ArXiv | 2023-04-23 |
765 | Lost in Translationese? Reducing Translation Effect Using Abstract Meaning Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We compare our AMR-based approach against three other techniques based on machine translation or paraphrase generation. |
Shira Wein; Nathan Schneider; | arxiv-cs.CL | 2023-04-22 |
766 | Improving Speech Translation By Cross-Modal Multi-Grained Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the final model often performs worse on the MT task than the MT model trained alone, which means that the knowledge transfer ability of this method is also limited. To deal with these problems, we propose the FCCL (Fine- and Coarse- Granularity Contrastive Learning) approach for E2E-ST, which makes explicit knowledge transfer through cross-modal multi-grained contrastive learning. |
HAO ZHANG et. al. | arxiv-cs.CL | 2023-04-20 |
767 | On The Principles and Decisions of New Word Translation in Sino-Japan Cross-Border E-Commerce: A Study in The Context of Cross-Cultural Communication Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the context of the rapid development of multimedia and information technology, machine translation plays an indispensable role in cross-border e-commerce between China and … |
Gaowa Sulun; | Int. J. Digit. Multim. Broadcast. | 2023-04-18 |
768 | An Empirical Study of Leveraging Knowledge Distillation for Compressing Multilingual Neural Machine Translation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, works focusing on distilling knowledge from large multilingual neural machine translation (MNMT) models into smaller ones are practically nonexistent, despite the popularity and superiority of MNMT. This paper bridges this gap by presenting an empirical investigation of knowledge distillation for compressing MNMT models. |
Varun Gumma; Raj Dabre; Pratyush Kumar; | arxiv-cs.CL | 2023-04-18 |
769 | STA: An Efficient Data Augmentation Method for Low-resource Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Transformer-based neural machine translation (NMT) has achieved state-of-the-art performance in the NMT paradigm. However, it relies on the availability of copious parallel … |
Fuxue Li; Chuncheng Chi; Hong Yan; Beibei Liu; Mingzhi Shao; | J. Intell. Fuzzy Syst. | 2023-04-17 |
770 | Neural Machine Translation For Low Resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The goal of this paper is to investigate the realm of low resource languages and build a Neural Machine Translation model to achieve state-of-the-art results. |
Vakul Goyle; Parvathy Krishnaswamy; Kannan Girija Ravikumar; Utsa Chattopadhyay; Kartikay Goyle; | arxiv-cs.CL | 2023-04-16 |
771 | TransDocs: Optical Character Recognition with Word to Word Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, I have shown comparative study for pre-trained OCR while using deep learning model using LSTM-based seq2seq architecture with attention for machine translation. |
Abhishek Bamotra; Phani Krishna Uppala; | arxiv-cs.CV | 2023-04-15 |
772 | Learning Homographic Disambiguation Representation for Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach to tackle homographic issues of NMT in the latent space. |
Weixuan Wang; Wei Peng; Qun Liu; | arxiv-cs.CL | 2023-04-12 |
773 | Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we systematically investigate the advantages and challenges of LLMs for MMT by answering two questions: 1) How well do LLMs perform in translating massive languages? |
WENHAO ZHU et. al. | arxiv-cs.CL | 2023-04-10 |
774 | RISC: Generating Realistic Synthetic Bilingual Insurance Contract Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents RISC, an open-source Python package data generator (https://github.com/GRAAL-Research/risc). |
David Beauchemin; Richard Khoury; | arxiv-cs.CL | 2023-04-09 |
775 | How to Design Translation Prompts for ChatGPT: An Empirical Study IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, in this paper, we explore how to assist machine translation with ChatGPT. |
Yuan Gao; Ruili Wang; Feng Hou; | arxiv-cs.CL | 2023-04-04 |
776 | A Neural Attention-Based Encoder-Decoder Approach for English to Bangla Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Machine translation (MT) is the process of translating text from one language to another using bilingual data sets and grammatical rules. Recent works in the field of MT have … |
Abdullah Al Shiam; S. M. Redwan; Humaun Kabir; Jungpil Shin; | Comput. Sci. J. Moldova | 2023-04-01 |
777 | $\varepsilon$ KÚ Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates the performance of massively multilingual neural machine translation (NMT) systems in translating Yorùbá greetings ($\varepsilon$ kú [MASK]), which are a big part of Yorùbá language and culture, into English. To evaluate these models, we present IkiniYorùbá, a Yorùbá-English translation dataset containing some Yorùbá greetings, and sample use cases. |
Idris Akinade; Jesujoba Alabi; David Adelani; Clement Odoje; Dietrich Klakow; | arxiv-cs.CL | 2023-03-31 |
778 | Varepsilon Kú Mask: Integrating Yorùbá Cultural Greetings Into Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper investigates the performance of massively multilingual neural machine translation (NMT) systems in translating Yorùbá greetings (kú mask), which are a big part of … |
Idris Akinade; Jesujoba Oluwadara Alabi; David Ifeoluwa Adelani; Clement Odoje; D. Klakow; | ArXiv | 2023-03-31 |
779 | Sentiment Analysis of Multilingual Dataset of Bahraini Dialects, Arabic, and English Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Sentiment analysis is an application of natural language processing (NLP) that requires a machine learning algorithm and a dataset. In some cases, the dataset availability is … |
Thuraya Omran; Baraa T. Sharef; C. Grosan; Yongming Li; | Data | 2023-03-30 |
780 | Hallucinations in Large Multilingual Translation Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing research on hallucinations has primarily focused on small bilingual models trained on high-resource languages, leaving a gap in our understanding of hallucinations in massively multilingual models across diverse translation scenarios. In this work, we fill this gap by conducting a comprehensive analysis on both the M2M family of conventional neural machine translation models and ChatGPT, a general-purpose large language model (LLM) that can be prompted for translation. |
NUNO M. GUERREIRO et. al. | arxiv-cs.CL | 2023-03-28 |
781 | Translate The Beauty in Songs: Jointly Learning to Align Melody and Translate Lyrics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Lyrics-Melody Translation with Adaptive Grouping (LTAG), a holistic solution to automatic song translation by jointly modeling lyrics translation and lyrics-melody alignment. |
CHENGXI LI et. al. | arxiv-cs.CL | 2023-03-27 |
782 | Bilex Rx: Lexical Data Augmentation for Massively Multilingual Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We test the efficacy of bilingual lexica in a real-world set-up, on 200-language translation models trained on web-crawled text. We present several findings: (1) using lexical data augmentation, we demonstrate sizable performance gains for unsupervised translation; (2) we compare several families of data augmentation, demonstrating that they yield similar improvements, and can be combined for even greater improvements; (3) we demonstrate the importance of carefully curated lexica over larger, noisier ones, especially with larger models; and (4) we compare the efficacy of multilingual lexicon data versus human-translated parallel data. |
Alex Jones; Isaac Caswell; Ishank Saxena; Orhan Firat; | arxiv-cs.CL | 2023-03-27 |
783 | Linguistically Informed ChatGPT Prompts to Enhance Japanese-Chinese Machine Translation: A Case Study on Attributive Clauses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Present-day machine translation tools often fail to accurately translate attributive clauses from Japanese to Chinese. In light of this, this paper investigates the linguistic problem underlying such difficulties, namely how does the semantic role of the modified noun affect the selection of translation patterns for attributive clauses, from a linguistic perspective. |
Wenshi Gu; | arxiv-cs.CL | 2023-03-27 |
784 | Towards Making The Most of ChatGPT for Machine Translation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to further mine ChatGPT’s translation ability by revisiting several aspects: temperature, task information, and domain information, and correspondingly propose an optimal temperature setting and two (simple but effective) prompts: Task-Specific Prompts (TSP) and Domain-Specific Prompts (DSP). |
KEQIN PENG et. al. | arxiv-cs.CL | 2023-03-23 |
785 | Selective Data Augmentation for Robust Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to use an e2e architecture for English-Hindi (en-hi) ST. We use two imperfect machine translation (MT) services to translate Libri-trans en text into hi text. |
Rajul Acharya; Ashish Panda; Sunil Kumar Kopparapu; | arxiv-cs.CL | 2023-03-22 |
786 | LEAPT: Learning Adaptive Prefix-to-prefix Translation For Simultaneous Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by strategies utilized by human interpreters and wait policies, we propose a novel adaptive prefix-to-prefix training policy called LEAPT, which allows our machine translation model to learn how to translate source sentence prefixes and make use of the future context. |
Lei Lin; Shuangtao Li; Xiaodong Shi; | arxiv-cs.CL | 2023-03-21 |
787 | Towards Reliable Neural Machine Translation with Consistency-Aware Meta-Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A contributing factor to this problem is that NMT models trained with the one-to-one paradigm struggle to handle the source diversity phenomenon, where inputs with the same meaning can be expressed differently. In this work, we treat this problem as a bilevel optimization problem and present a consistency-aware meta-learning (CAML) framework derived from the model-agnostic meta-learning (MAML) algorithm to address it. |
Rongxiang Weng; Qiang Wang; Wensen Cheng; Changfeng Zhu; Min Zhang; | arxiv-cs.CL | 2023-03-20 |
788 | Translate Your Gibberish: Black-box Adversarial Attack on Machine Translation Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a simple approach to fool state-of-the-art machine translation tools in the task of translation from Russian to English and vice versa. |
Andrei Chertkov; Olga Tsymboi; Mikhail Pautov; Ivan Oseledets; | arxiv-cs.CL | 2023-03-20 |
789 | On The Scalability of Data Augmentation Techniques for Low-resource Machine Translation Between Chinese and Vietnamese Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Neural Machine Translation (NMT) has constantly been shown to be a standard choice to build a translation system, in both academia and industry. For low-resource language pairs, … |
Huan Vu; Ngoc-Dung Bui; | Journal of Information and Telecommunication | 2023-03-19 |
790 | Contrastive Adversarial Training for Multi-Modal Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The multi-modal machine translation task is to improve translation quality with the help of additional visual input. It is expected to disambiguate or complement semantics while … |
Xin Huang; Jiajun Zhang; Chengqing Zong; | ACM Transactions on Asian and Low-Resource Language … | 2023-03-14 |
791 | ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To relax the dependency on labeled data of downstream tasks, we propose an intuitive and effective zero-shot learning framework, ZeroNLG, which can deal with multiple NLG tasks, including image-to-text (image captioning), video-to-text (video captioning), and text-to-text (neural machine translation), across English, Chinese, German, and French within a unified framework. |
BANG YANG et. al. | arxiv-cs.CL | 2023-03-11 |
792 | A Multi-stack RNN-based Neural Machine Translation Model for English to Pakistan Sign Language Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
U. Farooq; Mohd Shafry Mohd Rahim; Adnan Abid; | Neural Computing and Applications | 2023-03-11 |
793 | GATE: A Challenge Set for Gender-Ambiguous Translation Examples IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent work has led to the development of gender rewriters that generate alternative gender translations on such ambiguous inputs, but such systems are plagued by poor linguistic coverage. To encourage better performance on this task we present and release GATE, a linguistically diverse corpus of gender-ambiguous source sentences along with multiple alternative target language translations. |
Spencer Rarrick; Ranjita Naik; Varun Mathur; Sundar Poudel; Vishal Chowdhary; | arxiv-cs.CL | 2023-03-07 |
794 | Implications of Multi-Word Expressions on English to Bharti Braille Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this paper, we have shown the improvement of English to Bharti Braille machine translation system. We have shown how we can improve a baseline NMT model by adding some … |
Nisheeth Joshi; Pragya Katyayan; | 2023 6th International Conference on Information Systems … | 2023-03-03 |
795 | Rethinking The Reasonability of The Test Set for Simultaneous Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we manually annotate a monotonic test set based on the MuST-C English-Chinese test set, denoted as SiMuST-C. |
MENGGE LIU et. al. | arxiv-cs.CL | 2023-03-02 |
796 | Exploring The Potential of Machine Translation for Generating Named Entity Datasets: A Case Study Between Persian and English Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study focuses on the generation of Persian named entity datasets through the application of machine translation on English datasets. |
Amir Sartipi; Afsaneh Fatemi; | arxiv-cs.CL | 2023-02-19 |
797 | Zero and Few-Shot Localization of Task-Oriented Dialogue Agents with A Distilled Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose automatic methods that use ToD training data in a source language to build a high-quality functioning dialogue agent in another target language that has no training data (i.e. zero-shot) or a small training set (i.e. few-shot). |
Mehrad Moradshahi; Sina J. Semnani; Monica S. Lam; | arxiv-cs.CL | 2023-02-18 |
798 | How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a comprehensive evaluation of GPT models for machine translation, covering various aspects such as quality of different GPT models in comparison with state-of-the-art research and commercial systems, effect of prompting strategies, robustness towards domain shifts and document-level translation. |
AMR HENDY et. al. | arxiv-cs.CL | 2023-02-17 |
799 | Evaluating and Improving The Coreference Capabilities of Machine Translation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we ask: How well do MT models learn coreference resolution from implicit signal? |
Asaf Yehudai; Arie Cattan; Omri Abend; Gabriel Stanovsky; | arxiv-cs.CL | 2023-02-16 |
800 | Document Flattening: Beyond Concatenating Context for Document-Level Neural Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conduct comprehensive experiments and analyses on three benchmark datasets for English-German translation, and validate the effectiveness of two variants of DocFlat. |
Minghao Wu; George Foster; Lizhen Qu; Gholamreza Haffari; | arxiv-cs.CL | 2023-02-15 |
801 | Encoding Sentence Position in Context-Aware Neural Machine Translation with Concatenation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We compare various methods to encode sentence positions into token representations, including novel methods. |
Lorenzo Lupo; Marco Dinarelli; Laurent Besacier; | arxiv-cs.CL | 2023-02-13 |
802 | Approximating to The Real Translation Quality for Neural Machine Translation Via Causal Motivated Methods Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: It is hard to evaluate translations objectively and accurately, which limits the applications of machine translation. In this article, we assume that the above phenomenon is … |
Xuewen Shi; Heyan Huang; Ping Jian; Yi-Kun Tang; | ACM Transactions on Asian and Low-Resource Language … | 2023-02-13 |
803 | Language-Aware Multilingual Machine Translation with Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Finally, we apply intra-distillation to this co-training approach. Combining these two approaches significantly improves MMT performance, outperforming three state-of-the-art SSL methods by a large margin, e.g., 11.3% and 3.7% improvement on an 8-language and a 15-language benchmark compared with MASS, respectively. |
Haoran Xu; Jean Maillard; Vedanuj Goswami; | arxiv-cs.CL | 2023-02-09 |
804 | The Unreasonable Effectiveness of Few-shot Learning for Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning, is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems. |
XAVIER GARCIA et. al. | arxiv-cs.CL | 2023-02-02 |
805 | An Evaluation of Persian-English Machine Translation Datasets with Transformers Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Nowadays, many researchers are focusing their attention on the subject of machine translation (MT). However, Persian machine translation has remained unexplored despite a vast … |
Amir Sartipi; Meghdad Dehghan; Afsaneh Fatemi; | arxiv-cs.CL | 2023-02-01 |
806 | Code Translation with Compiler Representations IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we leverage low-level compiler intermediate representations (IR) to improve code translation. |
MARC SZAFRANIEC et. al. | iclr | 2023-02-01 |
807 | Attention Link: An Efficient Attention-Based Low Resource Machine Translation Architecture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel architecture named as attention link (AL) to help improve transformer models’ performance, especially in low training resources. |
Zeping Min; | arxiv-cs.CL | 2023-02-01 |
808 | Adaptive Machine Translation with Large Language Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work aims to investigate how we can utilize in-context learning to improve real-time adaptive MT. Our extensive experiments show promising results at translation time. |
Yasmin Moslem; Rejwanul Haque; John D. Kelleher; Andy Way; | arxiv-cs.CL | 2023-01-30 |
809 | Gender Neutralization for An Inclusive Machine Translation: from Theoretical Foundations to Open Challenges IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we explore gender-neutral translation (GNT) as a form of gender inclusivity and a goal to be achieved by machine translation (MT) models, which have been found to perpetuate gender bias and discrimination. |
Andrea Piergentili; Dennis Fucci; Beatrice Savoldi; Luisa Bentivogli; Matteo Negri; | arxiv-cs.CL | 2023-01-24 |
810 | Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This report provides a preliminary evaluation of ChatGPT for machine translation, including translation prompt, multilingual translation, and translation robustness. |
WENXIANG JIAO et. al. | arxiv-cs.CL | 2023-01-20 |
811 | Improving Machine Translation with Phrase Pair Injection and Corpus Filtering IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that the combination of Phrase Pair Injection and Corpus Filtering boosts the performance of Neural Machine Translation (NMT) systems. |
Akshay Batheja; Pushpak Bhattacharyya; | arxiv-cs.CL | 2023-01-19 |
812 | Malayalam Natural Language Processing: Challenges in Building A Phrase-Based Statistical Machine Translation System Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Statistical Machine Translation (SMT) is a preferred Machine Translation approach to convert the text in a specific language into another by automatically learning translations … |
M. Sebastian; G. Santhosh Kumar; | ACM Transactions on Asian and Low-Resource Language … | 2023-01-19 |
813 | Machine Translation for Accessible Multi-Language Text Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to leverage those very advances to demonstrate that multi-language analysis is currently accessible to all computational scholars. |
Edward W. Chew; William D. Weisman; Jingying Huang; Seth Frey; | arxiv-cs.CL | 2023-01-19 |
814 | Understanding and Detecting Hallucinations in Neural Machine Translation Via Model Introspection IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Neural sequence generation models are known to hallucinate, by producing outputs that are unrelated to the source text. These hallucinations are potentially harmful, yet it … |
Weijia Xu; Sweta Agrawal; Eleftheria Briakou; Marianna J. Martindale; Marine Carpuat; | arxiv-cs.CL | 2023-01-18 |
815 | Unsupervised Mandarin-Cantonese Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The key contributions of our project include: 1. |
Megan Dare; Valentina Fajardo Diaz; Averie Ho Zoen So; Yifan Wang; Shibingfeng Zhang; | arxiv-cs.CL | 2023-01-10 |
816 | Automatic Standardization of Arabic Dialects for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Carrying out this research could then lead to combining “automatic standardization” software and automatic translation software so that we take the output of the first software and introduce it as input into the second one to obtain at the end a quality machine translation. |
Abidrabbo Alnassan; | arxiv-cs.CL | 2023-01-09 |
817 | Applying Automated Machine Translation to Educational Video Courses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We studied the capability of automated machine translation in the online video education space by automatically translating Khan Academy videos with state-of-the-art translation models and applying text-to-speech synthesis and audio/video synchronization to build engaging videos in target languages. |
Linden Wang; | arxiv-cs.CL | 2023-01-08 |
818 | Building A Parallel Corpus and Training Translation Models Between Luganda and English Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we build a parallel corpus with 41,070 pairwise sentences for Luganda and English which is based on three different open-sourced corpora. |
Richard Kimera; Daniela N. Rim; Heeyoul Choi; | arxiv-cs.CL | 2023-01-06 |
819 | Statistical Machine Translation for Indic Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Different preprocessing approaches are proposed in this paper to handle the noise of the dataset. |
Sudhansu Bala Das; Divyajoti Panda; Tapas Kumar Mishra; Bidyut Kr. Patra; | arxiv-cs.CL | 2023-01-02 |
820 | CCIM: Cross-modal Cross-lingual Interactive Image Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Text image machine translation (TIMT) which translates source language text images into target language texts has attracted intensive attention in recent years. Although the … |
CONG MA et. al. | Conference on Empirical Methods in Natural Language … | 2023-01-01 |
821 | Challenging The State-of-the-art Machine Translation Metrics from A Linguistic Perspective Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We employ a linguistically motivated challenge set in order to evaluate the state-of-the-art machine translation metrics submitted to the Metrics Shared Task of the 8th Conference … |
Eleftherios Avramidis; Shushen Manakhimova; Vivien Macketanz; Sebastian Möller; | Conference on Machine Translation | 2023-01-01 |
822 | Metric Score Landscape Challenge (MSLC23): Understanding Metrics’ Performance on A Wider Landscape of Translation Quality Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The Metric Score Landscape Challenge (MSLC23) dataset aims to gain insight into metric scores on a broader/wider landscape of machine translation (MT) quality. It provides a … |
Chi-kiu Lo; Samuel Larkin; Rebecca Knowles; | Conference on Machine Translation | 2023-01-01 |
823 | Rahul Patil at SemEval-2023 Task 1: V-WSD: Visual Word Sense Disambiguation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: SemEval 2023 Task 1: VWSD. In this paper, we propose an ensemble of two neural network systems that ranks 10 images given a word and limited textual context. We have used OpenAI … |
Rahul Patil; Pinal Patel; Charin Patel; Mangal Verma; | International Workshop on Semantic Evaluation | 2023-01-01 |
824 | Challenges in Rendering Arabic Text to English Using Machine Translation: A Systematic Literature Review Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The Arabic text can be translated into English using a variety of machine translation techniques. The translation of Arabic text into English still poses a number of challenges in … |
S. A. Almaaytah; S. Alzobidy; | IEEE Access | 2023-01-01 |
825 | Machine Translation of Omani Arabic Dialect from Social Media Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Research studies on Machine Translation (MT) between Modern Standard Arabic (MSA) and English are abundant. However, studies on MT between Omani Arabic (OA) dialects and English … |
Khoula Al-Kharusi; Abdurahman AAlAbdulsalam; | ARABICNLP | 2023-01-01 |
826 | Choosing What to Mask: More Informed Masking for Multimodal Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: More informed masking in cross-lingual visual pre-training for multimodal machine translation. … |
Júlia Sato; Helena de Medeiros Caseli; Lucia Specia; | Annual Meeting of the Association for Computational … | 2023-01-01 |
827 | MEE4 and XLsim : IIIT HYD’s Submissions’ for WMT23 Metrics Shared Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents our contributions to the WMT2023 shared metrics task, consisting of two distinct evaluation approaches: a) Unsupervised Metric (MEE4) and b) Supervised Metric … |
Ananya Mukherjee; Manish Shrivastava; | Conference on Machine Translation | 2023-01-01 |
828 | An Automatic Error Detection Method for Machine Translation Results Via Deep Learning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Nowadays, the rapid development of natural language processing has brought great progress for the area of machine translation. Various deep neural network-based machine … |
Weihong Zhang; | IEEE Access | 2023-01-01 |
829 | CCEval: A Representative Evaluation Benchmark for The Chinese-centric Multilingual Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View |
Lianzhang Lou; Xi Yin; Yutao Xie; Yang Xiang; | Conference on Empirical Methods in Natural Language … | 2023-01-01 |
830 | Solving Turkish Math Word Problems By Sequence-to-sequence Encoder-decoder Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Solving math word problems (MWP) is a challenging task due to the semantic gap between natural language texts and mathematical equations. The main purpose of the task is to take … |
Esin Gedik; Tunga Güngör; | Turkish J. Electr. Eng. Comput. Sci. | 2023-01-01 |
831 | Results of WMT23 Metrics Shared Task: Metrics Might Be Guilty But References Are Not Innocent IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents the results of the WMT23 Metrics Shared Task. Participants submitting automatic MT evaluation metrics were asked to score the outputs of the translation … |
MARKUS FREITAG et. al. | Conference on Machine Translation | 2023-01-01 |
832 | Team TheSyllogist at SemEval-2023 Task 3: Language-Agnostic Framing Detection in Multi-Lingual Online News: A Zero-Shot Transfer Approach Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We describe our system for SemEval-2023 Task 3 subtask 2, which focuses on detecting the frames used in a news article in a multi-lingual setup. We propose a multi-lingual approach based … |
Osama Mohammed Afzal; Preslav Nakov; | International Workshop on Semantic Evaluation | 2023-01-01 |
833 | HIT-MI&T Lab’s Submission to Eval4NLP 2023 Shared Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recently, Large Language Models (LLMs) have boosted the research in natural language processing and shown impressive capabilities across numerous domains, including machine … |
RUI ZHANG et. al. | EVAL4NLP | 2023-01-01 |
834 | The Ethics of Machine Translation Post-editing in The Translation Ecosystem Related Papers Related Patents Related Grants Related Venues Related Experts View |
Celia Rico; María Del Mar Sánchez Ramos; | Towards Responsible Machine Translation | 2023-01-01 |
835 | Analyzing Challenges in Neural Machine Translation for Software Localization Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Advancements in Neural Machine Translation (NMT) greatly benefit the software localization industry by decreasing the post-editing time of human annotators. Although the volume of … |
Sai Koneru; Matthias Huck; Miriam Exel; J. Niehues; | Conference of the European Chapter of the Association for … | 2023-01-01 |
836 | Rematchka at NADI 2023 Shared Task: Parameter Efficient Tuning for Dialect Identification and Dialect Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Dialect identification systems play a significant role in various fields and applications as in speech and language technologies, facilitating language education, supporting … |
Reem Abdel-Salam; | ARABICNLP | 2023-01-01 |
837 | Empowering LLM-based Machine Translation with Cultural Awareness IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Traditional neural machine translation (NMT) systems often fail to translate sentences that contain culturally specific information. Most previous NMT methods have incorporated … |
Binwei Yao; Ming Jiang; Diyi Yang; Junjie Hu; | ArXiv | 2023-01-01 |
838 | Multimodal Neural Machine Translation Using Synthetic Images Transformed By Latent Diffusion Model Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This study proposes a new multimodal neural machine translation model using synthetic images transformed by a latent diffusion model. … |
Ryoya Yuasa; Akihiro Tamura; Tomoyuki Kajiwara; Takashi Ninomiya; T. Kato; | Annual Meeting of the Association for Computational … | 2023-01-01 |
839 | The Path to Continuous Domain Adaptation Improvements By HW-TSC for The WMT23 Biomedical Translation Shared Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents the domain adaptation methods adopted by Huawei Translation Service Center (HW-TSC) to train the neural machine translation (NMT) system on the English↔German … |
ZHANGLIN WU et. al. | Conference on Machine Translation | 2023-01-01 |
840 | TTIC’s Submission to WMT-SLT 23 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this paper, we describe TTIC’s submission to WMT 2023 Sign Language Translation task on the Swiss-German Sign Language (DSGS) to German track. Our approach explores the … |
MARCELO SANDOVAL-CASTANEDA et. al. | Conference on Machine Translation | 2023-01-01 |
841 | GTCOM and DLUT’s Neural Machine Translation Systems for WMT23 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents the submission by Global Tone Communication Co., Ltd. and Dalian University of Technology for the WMT23 shared general Machine Translation (MT) task at the … |
Hao Zong; | Conference on Machine Translation | 2023-01-01 |
842 | Multifaceted Challenge Set for Evaluating Machine Translation Performance Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Machine Translation Evaluation is critical to Machine Translation research, as the evaluation results reflect the effectiveness of training strategies. As a result, a fair and … |
XIAOYU CHEN et. al. | Conference on Machine Translation | 2023-01-01 |
843 | RoCS-MT: Robustness Challenge Set for Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: RoCS-MT, a Robust Challenge Set for Machine Translation (MT), is designed to test MT systems’ ability to translate user-generated content (UGC) that displays non-standard … |
Rachel Bawden; Benoît Sagot; | Conference on Machine Translation | 2023-01-01 |
844 | IIIT HYD’s Submission for WMT23 Test-suite Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper summarizes the results of our test suite evaluation on 12 machine translation systems submitted at the Shared Task of the 8th Conference of Machine Translation (WMT23) … |
Ananya Mukherjee; Manish Shrivastava; | Conference on Machine Translation | 2023-01-01 |
845 | IOL Research Machine Translation Systems for WMT23 General Machine Translation Shared Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the IOL Research team’s submission systems for the WMT23 general machine translation shared task. We participated in two language translation directions, … |
Wenbo Zhang; | Conference on Machine Translation | 2023-01-01 |
846 | Linguistically Motivated Evaluation of The 2023 State-of-the-art Machine Translation: Can ChatGPT Outperform NMT? IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper offers a fine-grained analysis of the machine translation outputs in the context of the Shared Task at the 8th Conference of Machine Translation (WMT23). Building on … |
SHUSHEN MANAKHIMOVA et. al. | Conference on Machine Translation | 2023-01-01 |
847 | Exploring Prompt Engineering with GPT Language Models for Document-Level Machine Translation: Insights and Findings IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes Lan-Bridge Translation systems for the WMT 2023 General Translation shared task. We participate in 2 directions: English to and from Chinese. With the … |
Yangjian Wu; Gang Hu; | Conference on Machine Translation | 2023-01-01 |
848 | MUNI-NLP Submission for Czech-Ukrainian Translation Task at WMT23 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The system is trained on officially provided data only. We have heavily filtered all the data to remove machine translated text, Russian text and other noise. We use the DeepNorm … |
Pavel Rychlý; Yuliia Teslia; | Conference on Machine Translation | 2023-01-01 |
849 | Treating General MT Shared Task As A Multi-Domain Adaptation Problem: HW-TSC’s Submission to The WMT23 General MT Shared Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents the submission of Huawei Translate Services Center (HW-TSC) to the WMT23 general machine translation (MT) shared task, in which we participate in … |
ZHANGLIN WU et. al. | Conference on Machine Translation | 2023-01-01 |
850 | Achieving State-of-the-Art Multilingual Translation Model with Minimal Data and Parameters Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This is LanguageX (ZengHuiMT)’s submission to WMT 2023 General Machine Translation task for 13 language directions. We initially employ an encoder-decoder model to train on all 13 … |
Hui Zeng; | Conference on Machine Translation | 2023-01-01 |
851 | The HW-TSC’s Simultaneous Speech-to-Text Translation System for IWSLT 2023 Evaluation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this paper, we present our submission to the IWSLT 2023 Simultaneous Speech-to-Text Translation competition. Our participation involves three language directions: … |
JIAXIN GUO et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
852 | The HW-TSC’s Simultaneous Speech-to-Speech Translation System for IWSLT 2023 Evaluation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this paper, we present our submission to the IWSLT 2023 Simultaneous Speech-to-Speech Translation competition. Our participation involves three language directions: … |
HENGCHAO SHANG et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
853 | The Xiaomi AI Lab’s Speech Translation Systems for IWSLT 2023 Offline Task, Simultaneous Task and Speech-to-Speech Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This system description paper introduces the systems submitted by Xiaomi AI Lab to the three tracks of the IWSLT 2023 Evaluation Campaign, namely the offline speech translation … |
WUWEI HUANG et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
854 | JHU IWSLT 2023 Dialect Speech Translation System Description Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents JHU’s submissions to the IWSLT 2023 dialectal and low-resource track of Tunisian Arabic to English speech translation. The Tunisian dialect lacks formal … |
A. HUSSEIN et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
855 | NAIST Simultaneous Speech-to-speech Translation System for IWSLT 2023 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes NAIST’s submission to the IWSLT 2023 Simultaneous Speech Translation task: English-to-German, Japanese, Chinese speech-to-text translation and … |
RYO FUKUDA et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
856 | Multi-teacher Knowledge Distillation for End-to-End Text Image Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
CONG MA et. al. | IEEE International Conference on Document Analysis and … | 2023-01-01 |
857 | Alleviating Exposure Bias for Neural Machine Translation Via Contextual Augmentation and Self Distillation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In neural machine translation (NMT), most sequence-to-sequence (seq2seq) models are trained only with the teacher-forcing paradigm, where the ground truth history is used to … |
Zhidong Liu; Junhui Li; Muhua Zhu; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2023-01-01 |
858 | Towards Speech to Speech Machine Translation Focusing on Indian Languages Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We introduce an SSMT (Speech to Speech Machine Translation, aka Speech to Speech Video Translation) Pipeline (https://ssmt.iiit.ac.in/ssmtiiith), as a web application for translating … |
Vandan Mujadia; D. Sharma; | Conference of the European Chapter of the Association for … | 2023-01-01 |
859 | Yishu: Yishu at WMT2023 Translation Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper introduces the Dtranx AI translation system, developed for the WMT 2023 Universal Translation Shared Task. Our team participated in two language directions: English to … |
Luo Min; Yixin Tan; Qiulin Chen; | Conference on Machine Translation | 2023-01-01 |
860 | KYB General Machine Translation Systems for WMT23 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes our approach to constructing a neural machine translation system for the WMT 2023 general machine translation shared task. Our model is based on the … |
Pak Ching Li; Yoko Matsuzaki; Shivam Kalkar; | Conference on Machine Translation | 2023-01-01 |
861 | AIST AIRC Submissions to The WMT23 Shared Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the development process of NMT systems that were submitted to the WMT 2023 General Translation task by the team of AIST AIRC. We trained constrained track … |
Matīss Rikters; Makoto Miwa; | Conference on Machine Translation | 2023-01-01 |
862 | PROMT Systems for WMT23 Shared General Translation Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the PROMT submissions for the WMT23 Shared General Translation Task. This year we participated in two directions of the Shared Translation Task: English to … |
Alexander P. Molchanov; Vladislav Kovalenko; | Conference on Machine Translation | 2023-01-01 |
863 | An Enhanced Method for Neural Machine Translation Via Data Augmentation Based on The Self-Constructed English-Chinese Corpus, WCC-EC Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In an era of increasing globalization, the imperative for understanding multilingual texts elevated the role of translation to an everyday necessity. The efficacy of contemporary … |
Jinyi Zhang; Cong Guo; Jiannan Mao; Chong Guo; Tadahiro Matsumoto; | IEEE Access | 2023-01-01 |
864 | Bridging The Gap Between Native Text and Translated Text Through Adversarial Learning: A Case Study on Cross-Lingual Event Extraction Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recent research in cross-lingual learning has found that combining large-scale pretrained multilingual language models with machine translation can yield good performance. We … |
Pengfei Yu; Jonathan May; Heng Ji; | Findings | 2023-01-01 |
865 | A Data Augmentation Method for English-Vietnamese Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The translation quality of machine translation systems depends on the parallel corpus used for training, particularly on the quantity and quality of the corpus. However, building … |
N. Pham; Van Vinh Nguyen; T. Pham; | IEEE Access | 2023-01-01 |
866 | From Inclusive Language to Gender-Neutral Machine Translation IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Gender inclusivity in language has become a central topic of debate and research. Its application in the cross-lingual contexts of human and machine translation (MT), however, … |
Andrea Piergentili; Dennis Fucci; Beatrice Savoldi; L. Bentivogli; Matteo Negri; | ArXiv | 2023-01-01 |
867 | CKDST: Comprehensively and Effectively Distill Knowledge from Machine Translation to End-to-End Speech Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Distilling knowledge from a high-resource task, e.g., machine translation, is an effective way to alleviate the data scarcity problem of end-to-end speech translation. However, … |
YIKUN LEI et. al. | Annual Meeting of the Association for Computational … | 2023-01-01 |
868 | Translating A Low-resource Language Using GPT-3 and A Human-readable Dictionary IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We investigate how well words in the polysynthetic language Inuktitut can be translated by combining dictionary definitions, without use of a neural machine translation model … |
Micha Elsner; Jordan Needle; | Special Interest Group on Computational Morphology and … | 2023-01-01 |
869 | Transformer-Based Neural Machine Translation for Post-OCR Error Correction in Cursive Text Related Papers Related Patents Related Grants Related Venues Related Experts View |
Nehal Yasin; Imran Siddiqi; Momina Moetesum; Sadaf Abdul Rauf; | ICDAR Workshops | 2023-01-01 |
870 | Looking for Traces of Textual Deepfakes in Bulgarian on Social Media Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Textual deepfakes can cause harm, especially on social media. At the moment, there are models trained to detect deepfake messages mainly for the English language, but no research … |
Irina Temnikova; Iva Marinova; Silvia Gargova; Ruslana Margova; I. Koychev; | Recent Advances in Natural Language Processing | 2023-01-01 |
871 | WKU_NLP at SemEval-2023 Task 9: Translation Augmented Multilingual Tweet Intimacy Analysis Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes a system for the SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis. This system consists of a pretrained multilingual masked language model as a text … |
Qinyuan Zheng; | International Workshop on Semantic Evaluation | 2023-01-01 |
872 | Document-Level Neural Machine Translation With Recurrent Context States Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Integrating contextual information into sentence-level neural machine translation (NMT) systems has been proven to be effective in generating fluent and coherent translations. … |
Yue Zhao; Hui Liu; | IEEE Access | 2023-01-01 |
873 | Low-Resource Machine Translation Systems for Indic Languages Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present our submission to the WMT23 shared task in translation between English and Assamese, Khasi, Mizo and Manipuri. All our systems were pretrained on the task of … |
Ivana Kvapilíková; Ondrej Bojar; | Conference on Machine Translation | 2023-01-01 |
874 | MUNI-NLP Systems for Low-resource Indic Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The WMT 2023 Shared Task on Low-Resource Indic Language Translation featured translation to and from Assamese, Khasi, Manipuri, and Mizo on one side and English on the other. We submitted systems … |
Edoardo Signoroni; Pavel Rychlý; | Conference on Machine Translation | 2023-01-01 |
875 | Many-to-Many Multilingual Translation Model for Languages of Indonesia Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Indonesia is home to over 700 languages and most people speak their respective regional languages aside from the lingua franca. In this paper, we focus on the task of multilingual … |
Wilson Wongso; Ananto Joyoadikusumo; Brandon Scott Buana; Derwin Suhartono; | IEEE Access | 2023-01-01 |
876 | IOL Research Machine Translation Systems for WMT23 Low-Resource Indic Language Translation Shared Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the IOL Research team’s submission systems for the WMT23 low-resource Indic language translation shared task. We participated in 4 language pairs, including … |
Wenbo Zhang; | Conference on Machine Translation | 2023-01-01 |
877 | NITS-CNLP Low-Resource Neural Machine Translation Systems of English-Manipuri Language Pair Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the transformer-based Neural Machine translation (NMT) system for the Low-Resource Indic Language Translation task for the English-Manipuri language pair … |
Kshetrimayum Boynao Singh; Avichandra Singh Ningthoujam; Loitongbam Sanayai Meetei; Sivaji Bandyopadhyay; Thoudam Doren Singh; | Conference on Machine Translation | 2023-01-01 |
878 | Findings of The WMT 2023 Biomedical Translation Shared Task: Evaluation of ChatGPT 3.5 As A Comparison System Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present an overview of the Biomedical Translation Task that was part of the Eighth Conference on Machine Translation (WMT23). The aim of the task was the automatic translation … |
MARIANA NEVES et. al. | Conference on Machine Translation | 2023-01-01 |
879 | Findings of The Second WMT Shared Task on Sign Language Translation (WMT-SLT23) IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents the results of the Second WMT Shared Task on Sign Language Translation (WMT-SLT23; https://www.wmt-slt.com/). This shared task is concerned with automatic … |
MATHIAS MÜLLER et. al. | Conference on Machine Translation | 2023-01-01 |
880 | Findings of The 2023 Conference on Machine Translation (WMT23): LLMs Are Here But Not Quite There Yet IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents the results of the General Machine Translation Task organised as part of the 2023 Conference on Machine Translation (WMT). In the general MT task, participants … |
TOM KOCMI et. al. | Conference on Machine Translation | 2023-01-01 |
881 | VARCO-MT: NCSOFT’s WMT’23 Terminology Shared Task Submission Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: A lack of consistency in terminology translation undermines quality of translation from even the best performing neural machine translation (NMT) models, especially in narrow … |
Geon Woo Park; Junghwa Lee; Meiying Ren; Allison Shindell; Yeonsoo Lee; | Conference on Machine Translation | 2023-01-01 |
882 | Lingua Custodia’s Participation at The WMT 2023 Terminology Shared Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents Lingua Custodia’s submission to the WMT23 Terminology shared task. Ensuring precise translation of technical terms plays a pivotal role in … |
Jingshu Liu; Mariam Nakhlé; Gaëtan Caillout; Raheel Qader; | Conference on Machine Translation | 2023-01-01 |
883 | IACS-LRILT: Machine Translation for Low-Resource Indic Languages Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Even though machine translation has seen huge improvements in the last decade, translation quality for Indic languages is still underwhelming, which is attributed to the … |
Dhairya Suman; Atanu Mandal; Santanu Pal; S. Naskar; | Conference on Machine Translation | 2023-01-01 |
884 | GUIT-NLP’s Submission to Shared Task: Low Resource Indic Language Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the submission of the GUIT-NLP team in the “Shared Task: Low Resource Indic Language Translation” focusing on three low-resource language pairs: English-Mizo, … |
Mazida Akhtara Ahmed; Kuwali Talukdar; P. Boruah; Prof. Shikhar Kumar Sarma; Kishore Kashyap; | Conference on Machine Translation | 2023-01-01 |
885 | How Effective Is Machine Translation on Low-resource Code-switching? A Case Study Comparing Human and Automatic Metrics Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents an investigation into the differences between processing monolingual input and code-switching (CSW) input in the context of machine translation (MT). … |
Li Nguyen; Christopher Bryant; Oliver Mayeux; Zheng Yuan; | Annual Meeting of the Association for Computational … | 2023-01-01 |
886 | CUNI at WMT23 General Translation Task: MT and A Genetic Algorithm Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents the contributions of Charles University teams to the WMT23 General translation task (English to Czech and Czech to Ukrainian translation directions). Our main … |
Josef Jon; M. Popel; Ondrej Bojar; | Conference on Machine Translation | 2023-01-01 |
887 | NAIST-NICT WMT’23 General MT Task Submission Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this paper, we describe our NAIST-NICT submission to the WMT’23 English ↔ Japanese general machine translation task. Our system generates diverse translation candidates and … |
HIROYUKI DEGUCHI et. al. | Conference on Machine Translation | 2023-01-01 |
888 | Towards Responsible Machine Translation: Ethical and Legal Considerations in Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View |
Towards Responsible Machine Translation | 2023-01-01 | |
889 | DSHacker at SemEval-2023 Task 3: Genres and Persuasion Techniques Detection with Multilingual Data Augmentation Through Machine Translation and Text Generation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In our article, we present the systems developed for SemEval-2023 Task 3, which aimed to evaluate the ability of Natural Language Processing (NLP) systems to detect genres and … |
Arkadiusz Modzelewski; Witold Sosnowski; M. Wilczynska; A. Wierzbicki; | International Workshop on Semantic Evaluation | 2023-01-01 |
890 | The BIGAI Offline Speech Translation Systems for IWSLT 2023 Evaluation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the BIGAI’s submission to IWSLT 2023 Offline Speech Translation task on three language tracks from English to Chinese, German and Japanese. The end-to-end … |
Zhihang Xie; | International Workshop on Spoken Language Translation | 2023-01-01 |
891 | The MineTrans Systems for IWSLT 2023 Offline Speech Translation and Speech-to-Speech Translation Tasks Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents the MineTrans English-to-Chinese speech translation systems developed for two challenge tracks of IWSLT 2023, i.e., Offline Speech Translation (S2T) and … |
YICHAO DU et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
892 | Multi-Parallel Corpus of North Levantine Arabic Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Low-resource Machine Translation (MT) is characterized by the scarce availability of training data and/or standardized evaluation benchmarks. In the context of Dialectal Arabic, … |
MATEUSZ KRUBIŃSKI et. al. | ARABICNLP | 2023-01-01 |
893 | ANLP-RG at NADI 2023 Shared Task: Machine Translation of Arabic Dialects: A Comparative Study of Transformer Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this paper, we present our findings within the context of the NADI-2023 Shared Task (Subtask 2). Our task involves developing a translation model from the Palestinian, … |
Wiem Derouich; Saméh Kchaou; Rahma Boujelbane; | ARABICNLP | 2023-01-01 |
894 | HW-TSC at IWSLT2023: Break The Quality Ceiling of Offline Track Via Pre-Training and Domain Adaptation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents HW-TSC’s submissions to the IWSLT 2023 Offline Speech Translation task, including speech translation of talks from English to German, Chinese, and Japanese, … |
ZONGYAO LI et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
895 | I2R’s End-to-End Speech Translation System for IWSLT 2023 Offline Shared Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes I2R’s submission to the offline speech translation track for IWSLT 2023. We focus on an end-to-end approach for translation from English audio to German text, … |
M. Huzaifah; Kye Min Tan; Richeng Duan; | International Workshop on Spoken Language Translation | 2023-01-01 |
896 | Speech Translation with Style: AppTek’s Submissions to The IWSLT Subtitling and Formality Tracks in 2023 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: AppTek participated in the subtitling and formality tracks of the IWSLT 2023 evaluation. This paper describes the details of our subtitling pipeline – speech segmentation, speech … |
PARNIA BAHAR et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
897 | Improving Neural Machine Translation Formality Control with Domain Adaptation and Reranking-based Transductive Learning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents Huawei Translation Service Center (HW-TSC)’s submission on the IWSLT 2023 formality control task, which provides two training scenarios: supervised and … |
ZHANGLIN WU et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
898 | Improving Formality-Sensitive Machine Translation Using Data-Centric Approaches and Prompt Engineering Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this paper, we present the KU x Upstage team’s submission for the Special Task on Formality Control on Spoken Language Translation, which involves translating English into four … |
Seugnjun Lee; Hyeonseok Moon; Chanjun Park; Heu-Jeoung Lim; | International Workshop on Spoken Language Translation | 2023-01-01 |
899 | An Evaluation of Source Factors in Concatenation-Based Context-Aware Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We explore the use of source factors in context-aware neural machine translation, specifically concatenation-based models, to improve the translation quality of inter-sentential … |
Harritxu Gete; Thierry Etchegoyhen; | Recent Advances in Natural Language Processing | 2023-01-01 |
900 | UM-DFKI Maltese Speech Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: For the 2023 IWSLT Maltese Speech Translation Task, UM-DFKI jointly presents a cascade solution which achieves 0.6 BLEU. While this is the first time that a Maltese speech … |
A. WILLIAMS et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
901 | An Intelligent Algorithm for Fast Machine Translation of Long English Sentences Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Translation of long sentences in English is a complex problem in machine translation. This work briefly introduced the basic framework of intelligent machine translation algorithm … |
Hengheng He; | Journal of Intelligent Systems | 2023-01-01 |
902 | Is ChatGPT A Good Translator? A Preliminary Study Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This report provides a preliminary evaluation of ChatGPT for machine translation, including translation prompt, multilingual translation, and translation robustness. We adopt the … |
Wenxiang Jiao; Wenxuan Wang; Jen-tse Huang; Xing Wang; Zhaopeng Tu; | ArXiv | 2023-01-01 |
903 | Simultaneous Interpreting As A Noisy Channel: How Much Information Gets Through Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We explore the relationship between information density/surprisal of source and target texts in translation and interpreting in the language pair English-German, looking at the … |
M. Kunilovskaya; Heike Przybyl; Ekaterina Lapshinova-Koltunski; Elke Teich; | Recent Advances in Natural Language Processing | 2023-01-01 |
904 | A Deep Learning-Based Intelligent Quality Detection Model for Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With more and more active international connections, the complex scenes-aware machine translation has been a novel concern in the area of natural language processing. Although … |
Meijuan Chen; | IEEE Access | 2023-01-01 |
905 | CoMix: Guide Transformers to Code-Mix Using POS Structure and Phonetics Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Code-mixing is ubiquitous in multilingual societies, which makes it vital to build models for code-mixed data to power human language interfaces. Existing multilingual transformer … |
Gaurav Arora; S. Merugu; Vivek Sembium; | Annual Meeting of the Association for Computational … | 2023-01-01 |
906 | Layer-Level Progressive Transformer With Modality Difference Awareness for Multi-Modal Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Multi-modal neural machine translation (MNMT) aims to translate sentences from the source language into the target language with the aid of corresponding images. Unfortunately, … |
Junjun Guo; Junjie Ye; Yanchao Xiang; Zheng Yu; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2023-01-01 |
907 | SignBank+: Multilingual Sign Language Translation Dataset Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This work advances the field of sign language machine translation by focusing on dataset quality and simplification of the translation system. We introduce SignBank+, a clean … |
Amit Moryossef; Zifan Jiang; | ArXiv | 2023-01-01 |
908 | Machine Translation Advancements for Low-Resource Indian Languages in WMT23: CFILT-IITB’s Effort for Bridging The Gap Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper is related to the submission of the CFILT-IITB team for the task called IndicMT in WMT23. The paper describes our MT systems submitted to the WMT23 IndicMT shared task. … |
Pranav Gaikwad; Meet Doshi; S. Deoghare; Pushpak Bhattacharyya; | Conference on Machine Translation | 2023-01-01 |
909 | Findings of The WMT 2023 Shared Task on Quality Estimation IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We report the results of the WMT 2023 shared task on Quality Estimation, in which the challenge is to predict the quality of the output of neural machine translation systems at … |
FREDERIC BLAIN et. al. | Conference on Machine Translation | 2023-01-01 |
910 | Findings of The WMT 2023 Shared Task on Automatic Post-Editing Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present the results from the 9th round of the WMT shared task on MT Automatic Post-Editing, which consists of automatically correcting the output of a “black-box” machine … |
PUSHPAK BHATTACHARYYA et. al. | Conference on Machine Translation | 2023-01-01 |
911 | Findings of The WMT 2023 Shared Task on Machine Translation with Terminologies Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The WMT 2023 Terminology Shared Task investigates progress in machine translation of texts with specialized vocabulary. The participants were given the source text and … |
KIRILL SEMENOV et. al. | Conference on Machine Translation | 2023-01-01 |
912 | Findings of The WMT 2023 Shared Task on Low-Resource Indic Language Translation IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents the results of the low-resource Indic language translation task organized alongside the Eighth Conference on Machine Translation (WMT) 2023. In this task, … |
SANTANU PAL et. al. | Conference on Machine Translation | 2023-01-01 |
913 | Visual Prediction Improves Zero-Shot Cross-Modal Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Multimodal machine translation (MMT) systems have been successfully developed in recent years for a few language pairs. However, training such models usually requires tuples of a … |
Tosho Hirasawa; Emanuele Bugliarello; Desmond Elliott; Mamoru Komachi; | Conference on Machine Translation | 2023-01-01 |
914 | Towards Better Evaluation for Formality-Controlled English-Japanese Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this paper we propose a novel approach to automatically classify the level of formality in Japanese text, using three categories (formal, polite, and informal). We introduce a … |
Edison Marrese-Taylor; Pin Chen Wang; Yutaka Matsuo; | Conference on Machine Translation | 2023-01-01 |
915 | SJTU-MTLAB’s Submission to The WMT23 Word-Level Auto Completion Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Word-level auto-completion (WLAC) plays a crucial role in Computer-Assisted Translation. In this paper, we describe the SJTU-MTLAB’s submission to the WMT23 WLAC task. We propose … |
Xingyu Chen; Rui Wang; | Conference on Machine Translation | 2023-01-01 |
916 | HW-TSC 2023 Submission for The Quality Estimation Shared Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Quality estimation (QE) is an essential technique to assess machine translation quality without reference translations. In this paper, we focus on Huawei Translation Services … |
YUANG LI et. al. | Conference on Machine Translation | 2023-01-01 |
917 | Machine Translation with Large Language Models: Prompting, Few-shot Learning, and Fine-tuning with QLoRA IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: While large language models have made remarkable advancements in natural language generation, their potential in machine translation, especially when fine-tuned, remains … |
Xuan Zhang; Navid Rajabi; Kevin Duh; Philipp Koehn; | Conference on Machine Translation | 2023-01-01 |
918 | PRHLT’s Submission to WLAC 2023 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes our submission to the Word-Level AutoCompletion shared task of WMT23. We participated in the English–German and German–English categories. We extended our … |
Ángel Navarro; Miguel Domingo; Francisco Casacuberta; | Conference on Machine Translation | 2023-01-01 |
919 | KnowComp Submission for WMT23 Word-Level AutoCompletion Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The NLP community has recently witnessed the success of Large Language Models (LLMs) across various Natural Language Processing (NLP) tasks. However, the potential of LLMs for … |
Yi Wu; Haochen Shi; Weiqi Wang; Yangqiu Song; | Conference on Machine Translation | 2023-01-01 |
920 | DUTNLP System for The WMT2023 Discourse-Level Literary Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the DUTNLP Lab submission to WMT23 Discourse-Level Literary Translation in the Chinese to English translation direction under unconstrained … |
Anqi Zhao; Kaiyu Huang; Hao Yu; Degen Huang; | Conference on Machine Translation | 2023-01-01 |
921 | TJUNLP:System Description for The WMT23 Literary Task in Chinese to English Translation Direction Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper introduces the overall situation of the Natural Language Processing Laboratory of Tianjin University participating in the WMT23 machine translation evaluation task from … |
Shaolin Zhu; Deyi Xiong; | Conference on Machine Translation | 2023-01-01 |
922 | A Smaller and Better Word Embedding for Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Word embeddings play an important role in Neural Machine Translation (NMT). However, it still has a series of problems such as ignoring the prior knowledge of the association … |
Qi Chen; | IEEE Access | 2023-01-01 |
923 | Breaking The Representation Bottleneck of Chinese Characters: Neural Machine Translation with Stroke Sequence Modeling IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel representation method for Chinese characters to break the bottlenecks, namely StrokeNet, which represents a Chinese character by a Latinized stroke sequence (e.g., … |
Zhijun Wang; Xuebo Liu; Min Zhang; | emnlp | 2022-12-30 |
924 | GuoFeng: A Benchmark for Zero Pronoun Recovery and Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge the data and evaluation gaps, we propose a benchmark testset for target evaluation on Chinese-English ZP translation. |
MINGZHOU XU et. al. | emnlp | 2022-12-30 |
925 | Modeling Consistency Preference Via Lexical Chains for Document-level Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we aim to relieve the issue of lexical translation inconsistency for document-level neural machine translation (NMT) by modeling consistency preference for lexical chains, which consist of repeated words in a source-side document and provide a representation of the lexical consistency structure of the document. |
XINGLIN LYU et. al. | emnlp | 2022-12-30 |
926 | Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The experts note that MT outputs contain not only mistranslations, but also discourse-disrupting errors and stylistic inconsistencies. To address these problems, we train a post-editing model whose output is preferred over normal MT output at a rate of 69% by experts. |
KATHERINE THAI et. al. | emnlp | 2022-12-30 |
927 | DivEMT: Neural Machine Translation Post-Editing Effort Across Typologically Diverse Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce DivEMT, the first publicly available post-editing study of Neural Machine Translation (NMT) over a typologically diverse set of target languages. |
Gabriele Sarti; Arianna Bisazza; Ana Guerberof-Arenas; Antonio Toral; | emnlp | 2022-12-30 |
928 | LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we first propose the Multilingual MMT task by establishing two new Multilingual MMT benchmark datasets covering seven languages. Then, an effective baseline LVP-M3 using visual prompts is proposed to support translations between different languages, which includes three stages (token encoding, language-aware visual prompt generation, and language translation). |
HONGCHENG GUO et. al. | emnlp | 2022-12-30 |
929 | Information-Transport-based Policy for Simultaneous Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we treat the translation as information transport from source to target and accordingly propose an Information-Transport-based Simultaneous Translation (ITST). |
Shaolei Zhang; Yang Feng; | emnlp | 2022-12-30 |
930 | WeTS: A Benchmark for Translation Suggestion IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To break these limitations mentioned above and spur the research in TS, we create a benchmark dataset, called WeTS, which is a golden corpus annotated by expert translators on four translation directions. |
Zhen Yang; Fandong Meng; Yingxue Zhang; Ernan Li; Jie Zhou; | emnlp | 2022-12-30 |
931 | DEMETR: Diagnosing Evaluation Metrics for Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The operations of newer learned metrics (e.g., BLEURT, COMET), which leverage pretrained language models to achieve higher correlations with human quality judgments than BLEU, are opaque in comparison. In this paper, we shed light on the behavior of these learned metrics by creating DEMETR, a diagnostic dataset with 31K English examples (translated from 10 source languages) for evaluating the sensitivity of MT evaluation metrics to 35 different linguistic perturbations spanning semantic, syntactic, and morphological error categories. |
MARZENA KARPINSKA et. al. | emnlp | 2022-12-30 |
932 | Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the challenging problem of performing a generative task in a target language when labeled data is only available in English, using summarization as a case study. |
TU VU et. al. | emnlp | 2022-12-30 |
933 | PreQuEL: Quality Estimation of Machine Translation Outputs in Advance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the task of PreQuEL, Pre-(Quality-Estimation) Learning. |
Shachar Don-Yehiya; Leshem Choshen; Omri Abend; | emnlp | 2022-12-30 |
934 | T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new approach to perform zero-shot cross-modal transfer between speech and text for translation tasks. |
Paul-Ambroise Duquenne; Hongyu Gong; Benoît Sagot; Holger Schwenk; | emnlp | 2022-12-30 |
935 | MT-GenEval: A Counterfactual and Contextual Dataset for Evaluating Gender Accuracy in Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce MT-GenEval, a benchmark for evaluating gender accuracy in translation from English into eight widely-spoken languages. |
ANNA CURREY et. al. | emnlp | 2022-12-30 |
936 | Neural Machine Translation with Contrastive Translation Memories IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Different from previous works that make use of mutually similar but redundant translation memories (TMs), we propose a new retrieval-augmented NMT to model contrastively retrieved translation memories that are holistically similar to the source sentence while individually contrastive to each other providing maximal information gain in three phases. |
Xin Cheng; Shen Gao; Lemao Liu; Dongyan Zhao; Rui Yan; | emnlp | 2022-12-30 |
937 | Entropy-Based Vocabulary Substitution for Incremental Learning in Multilingual Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an entropy-based vocabulary substitution (EVS) method that just needs to walk through new language pairs for incremental learning in a large-scale multilingual data updating while remaining the size of the vocabulary. |
Kaiyu Huang; Peng Li; Jin Ma; Yang Liu; | emnlp | 2022-12-30 |
938 | ConsistTL: Modeling Consistency in Transfer Learning for Low-Resource Neural Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel transfer learning method for NMT, namely ConsistTL, which can continuously transfer knowledge from the parent model during the training of the child model. |
Zhaocong Li; Xuebo Liu; Derek F. Wong; Lidia S. Chao; Min Zhang; | emnlp | 2022-12-30 |
939 | Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In order to enable zero-shot ST, we propose a novel Discrete Cross-Modal Alignment (DCMA) method that employs a shared discrete vocabulary space to accommodate and match both modalities of speech and text. |
CHEN WANG et. al. | emnlp | 2022-12-30 |
940 | Bilingual Synchronization: Restoring Translational Relationships with Editing Operations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We consider here a more general setting which assumes an initial target sequence, that must be transformed into a valid translation of the source, thereby restoring parallelism between source and target. |
Jitao Xu; Josep Crego; François Yvon; | emnlp | 2022-12-30 |
941 | Distill The Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, in this work, we introduce IKD-MMT, a novel MMT framework to support the image-free inference phase via an inversion knowledge distillation scheme. |
Ru Peng; Yawen Zeng; Jake Zhao; | emnlp | 2022-12-30 |
942 | A Domain Specific Parallel Corpus and Enhanced English-Assamese Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Machine translation deals with automatic translation from one natural language to another. Neural machine translation is a widely accepted technique of the corpus-based machine … |
Sahinur Rahman Laskar; Riyanka Manna; Partha Pakray; Sivaji Bandyopadhyay; | Computación y Sistemas | 2022-12-25 |
943 | Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the use of deep Transformer translation model for the CCMT 2022 Chinese-Thai low-resource machine translation task. |
Wenjie Hao; Hongfei Xu; Lingling Mu; Hongying Zan; | arxiv-cs.CL | 2022-12-24 |
944 | Beyond Triplet: Leveraging The Most Data for Multimodal Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: First, they can only utilize triple data (bilingual texts with images), which is scarce; second, current benchmarks are relatively restricted and do not correspond to realistic scenarios. Therefore, this paper correspondingly establishes new methods and new datasets for MMT. |
YAOMING ZHU et. al. | arxiv-cs.CL | 2022-12-20 |
945 | Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a new MMT approach based on a strong text-only MT model, which uses neural adapters, a novel guided self-attention mechanism and which is jointly trained on both visually-conditioned masking and MMT. |
Matthieu Futeral; Cordelia Schmid; Ivan Laptev; Benoît Sagot; Rachel Bawden; | arxiv-cs.CL | 2022-12-20 |
946 | IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation Metrics for Indian Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Indian languages, having over a billion speakers, are linguistically different from English, and to date, there has not been a systematic study of evaluating MT systems from English into Indian languages. In this paper, we fill this gap by creating an MQM dataset consisting of 7000 fine-grained annotations, spanning 5 Indian languages and 7 MT systems, and use it to establish correlations between annotator scores and scores obtained using existing automatic metrics. |
ANANYA B. SAI et. al. | arxiv-cs.CL | 2022-12-20 |
947 | Mu2SLAM: Multitask, Multilingual Speech and Language Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech … |
Yong Cheng; Yu Zhang; Melvin Johnson; Wolfgang Macherey; Ankur Bapna; | ArXiv | 2022-12-19 |
948 | Mu$^{2}$SLAM: Multitask, Multilingual Speech and Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech Recognition (ASR), Automatic Speech Translation (AST) and Machine Translation (MT), in over 100 languages. |
Yong Cheng; Yu Zhang; Melvin Johnson; Wolfgang Macherey; Ankur Bapna; | arxiv-cs.CL | 2022-12-19 |
949 | Controlling Styles in Neural Machine Translation with Activation Prompt Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address both challenges, this paper presents a new benchmark and approach. |
Yifan Wang; Zewei Sun; Shanbo Cheng; Weiguo Zheng; Mingxuan Wang; | arxiv-cs.CL | 2022-12-17 |
950 | AdaTranS: Adapting with Boundary-based Shrinking for End-to-End Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose AdaTranS for end-to-end ST. It adapts the speech features with a new shrinking mechanism to mitigate the length mismatch between speech and text features by predicting word boundaries. |
Xingshan Zeng; Liangyou Li; Qun Liu; | arxiv-cs.CL | 2022-12-17 |
951 | EntityRank: Unsupervised Mining of Bilingual Named Entity Pairs from Parallel Corpora for Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: As Neural Machine Translation (NMT) heavily relies on training data, finding an effective method to help NMT make better use of limited data is of great significance. In this … |
MIN ZHANG et. al. | 2022 IEEE International Conference on Big Data (Big Data) | 2022-12-17 |
952 | Better Datastore, Better Translation: Generating Datastores from Pre-Trained Models for Nearest Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose PRED, a framework that leverages Pre-trained models for Datastores in kNN-MT. |
Jiahuan Li; Shanbo Cheng; Zewei Sun; Mingxuan Wang; Shujian Huang; | arxiv-cs.CL | 2022-12-17 |
953 | An Approach Based on Deep Learning for Indian Sign Language Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Purpose: According to the Indian Sign Language Research and Training Centre (ISLRTC), India has approximately 300 certified human interpreters to help people with hearing loss. This … |
Kinjal Mistree; D. Thakor; Brijesh Bhatt; | Int. J. Intell. Comput. Cybern. | 2022-12-16 |
954 | Fixing MoE Over-Fitting on Low-Resource Languages in Multilingual Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show effective regularization strategies, namely dropout techniques for MoE layers in EOM and FOM, Conditional MoE Routing and Curriculum Learning methods that prevent over-fitting and improve the performance of MoE models on low-resource tasks without adversely affecting high-resource tasks. |
Maha Elbayad; Anna Sun; Shruti Bhosale; | arxiv-cs.CL | 2022-12-14 |
955 | ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we step towards bridging the gap between multilingual NLs and multilingual PLs for large language models (LLMs). |
YEKUN CHAI et. al. | arxiv-cs.CL | 2022-12-13 |
956 | Towards A General Purpose Machine Translation System for Sranantongo Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study we create a general purpose machine translation system for srn. |
Just Zwennicker; David Stap; | arxiv-cs.CL | 2022-12-13 |
957 | End-to-End Speech Translation of Arabic to English Broadcast News Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents our efforts towards the development of the first Broadcast News end-to-end Arabic to English speech translation system. |
Fethi Bougares; Salim Jouili; | arxiv-cs.CL | 2022-12-11 |
958 | M3ST: Mix at Three Levels for Speech Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Mix at three levels for Speech Translation (M^3ST) method to increase the diversity of the augmented training corpus. |
XUXIN CHENG et. al. | arxiv-cs.CL | 2022-12-07 |
959 | Exploring Web-based Translation Resources Applied to Hindi-English Cross-Lingual Information Retrieval Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Internet users perceive a multilingual web but are unfamiliar with it due to communication in their regional language called Cross-Lingual Information Retrieval (CLIR). In CLIR, a … |
V. Sharma; Namita Mittal; Ankit Vidyarthi; Deepak Gupta; | ACM Transactions on Asian and Low-Resource Language … | 2022-12-06 |
960 | Life-long Learning for Multilingual Neural Machine Translation with Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, in one-to-many scenario, we propose a multilingual distillation method to make the new model (student) jointly learn multilingual output from old model (teacher) and new task. |
YANG ZHAO et. al. | arxiv-cs.CL | 2022-12-06 |
961 | Impact of Domain-Adapted Multilingual Neural Machine Translation in The Medical Domain Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We compare the out-of-domain MNMT with the in-domain adapted MNMT. |
Miguel Rios; Raluca-Maria Chereji; Alina Secara; Dragos Ciobanu; | arxiv-cs.CL | 2022-12-05 |
962 | In-context Examples Selection for Machine Translation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to understand the properties of good in-context examples for MT in both in-domain and out-of-domain settings. |
Sweta Agrawal; Chunting Zhou; Mike Lewis; Luke Zettlemoyer; Marjan Ghazvininejad; | arxiv-cs.CL | 2022-12-05 |
963 | Democratizing Neural Machine Translation with OPUS-MT IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents the OPUS ecosystem with a focus on the development of open machine translation models and tools, and their integration into end-user applications, development platforms and professional workflows. |
JÖRG TIEDEMANN et. al. | arxiv-cs.CL | 2022-12-04 |
964 | The RoyalFlush System for The WMT 2022 Efficiency Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes the submission of the RoyalFlush neural machine translation system for the WMT 2022 translation efficiency task. |
BO QIN et. al. | arxiv-cs.CL | 2022-12-03 |
965 | Tackling Low-Resourced Sign Language Translation: UPC at WMT-SLT 22 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper describes the system developed at the Universitat Politècnica de Catalunya for the Workshop on Machine Translation 2022 Sign Language Translation Task, in particular, for the sign-to-text direction. |
Laia Tarrés; Gerard I. Gàllego; Xavier Giró-i-Nieto; Jordi Torres; | arxiv-cs.CL | 2022-12-02 |
966 | CUNI Systems for The WMT22 Czech-Ukrainian Translation Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Charles University submissions to the WMT22 General Translation Shared Task on Czech-Ukrainian and Ukrainian-Czech machine translation. |
Martin Popel; Jindřich Libovický; Jindřich Helcl; | arxiv-cs.CL | 2022-12-01 |
967 | Neural Machine Translation for Kashmiri to English and Hindi Using Pre-trained Embeddings Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Neural Machine Translation (NMT) is one of the advanced approaches of Machine Translation (MT) that has recently gained popularity. A significant amount of parallel corpus is … |
Shailashree K. Sheshadri; Deepa Gupta; M. Costa-jussà; | 2022 OITS International Conference on Information … | 2022-12-01 |
968 | CUNI Systems for The WMT 22 Czech-Ukrainian Translation Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present Charles University submissions to the WMT 22 General Translation Shared Task on Czech-Ukrainian and Ukrainian-Czech machine translation. We present two constrained … |
M. Popel; Jindřich Libovický; Jindřich Helcl; | Conference on Machine Translation | 2022-12-01 |
969 | Domain Mismatch Doesn’t Always Prevent Cross-lingual Transfer Learning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Cross-lingual transfer learning without labeled target language data or parallel text has been surprisingly effective in zero-shot cross-lingual classification, question … |
Daniel Edmiston; Phillip Keung; Noah A. Smith; | International Conference on Language Resources and … | 2022-11-30 |
970 | Sevi: Speech-to-Visualization Through Neural Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Arguably, the most natural way to specify what to visualize is through natural language or speech, similar to our daily search on Google or Apple Siri, leaving to the system the task of reasoning about what to visualize and how. In this demo, we present Sevi, an end-to-end data visualization system that acts as a virtual assistant to allow novices to create visualizations through either natural language or speech. |
Jiawei Tang; Yuyu Luo; Mourad Ouzzani; Guoliang Li; Hongyang Chen; | sigmod | 2022-11-30 |
971 | VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a machine translation system tailored for the task of video dubbing, which directly considers the speech duration of each token in translation, to match the length of source and target speech. |
YIHAN WU et. al. | arxiv-cs.CL | 2022-11-30 |
972 | Findings of The WMT 2022 Shared Task on Translation Suggestion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We report the result of the first edition of the WMT shared task on Translation Suggestion (TS). |
Zhen Yang; Fandong Meng; Yingxue Zhang; Ernan Li; Jie Zhou; | arxiv-cs.CL | 2022-11-29 |
973 | CUNI Submission in WMT22 General Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the CUNI-Bergamot submission for the WMT22 General translation task. |
Josef Jon; Martin Popel; Ondřej Bojar; | arxiv-cs.CL | 2022-11-29 |
974 | CUNI-Bergamot Submission at WMT22 General Translation Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present the CUNI-Bergamot submission for the WMT22 General translation task. We compete in English-Czech direction. Our submission further explores block backtranslation … |
Josef Jon; M. Popel; Ondrej Bojar; | ArXiv | 2022-11-29 |
975 | Domain Mismatch Doesn’t Always Prevent Cross-Lingual Transfer Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that a simple initialization regimen can overcome much of the effect of domain mismatch in cross-lingual transfer. |
Daniel Edmiston; Phillip Keung; Noah A. Smith; | arxiv-cs.CL | 2022-11-29 |
976 | An Interactive Learning Platform with Machine Translation for Practicing Text-Based Conversational English Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Text-based conversation, such as chatting or texting, has become an essential means of written communication. This paper investigates the importance of practicing textual … |
A. Rusli; M. Shishido; | 2022 Joint 12th International Conference on Soft Computing … | 2022-11-29 |
977 | Wolaytta-English Cross-lingual Information Retrieval Using Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The necessity of a system that supports query-based searching with the presence of more than 7,154 world languages also increases as the accessibility of information across … |
Aklilu Thomas Bedecho; Michael Melese Woldeyohannis; | 2022 International Conference on Information and … | 2022-11-28 |
978 | Amharic-Kistanigna Bi-directional Machine Translation Using Deep Learning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Machine translation is an application of NLP, which can be used to translate text from one natural language to another natural language. This research attempted to design … |
Mengistu Kinfe Negia; Rahel Mekonen Tamiru; Million Meshesha; | 2022 International Conference on Information and … | 2022-11-28 |
979 | BJTU-WeChat’s Systems for The WMT22 Chat Translation Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT’22 chat translation task for English-German. Based on the Transformer, we … |
Yunlong Liang; Fandong Meng; Jinan Xu; Yufeng Chen; Jie Zhou; | Conference on Machine Translation | 2022-11-28 |
980 | Summer: WeChat Neural Machine Translation Systems for The WMT22 Biomedical Translation Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces WeChat’s participation in WMT 2022 shared biomedical translation task on Chinese to English. |
Ernan Li; Fandong Meng; Jie Zhou; | arxiv-cs.CL | 2022-11-27 |
981 | BJTU-WeChat’s Systems for The WMT22 Chat Translation Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT’22 chat translation task for English-German. |
Yunlong Liang; Fandong Meng; Jinan Xu; Yufeng Chen; Jie Zhou; | arxiv-cs.CL | 2022-11-27 |
982 | ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic-English IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present our work on collecting ArzEn-ST, a code-switched Egyptian Arabic-English Speech Translation Corpus. This corpus is an extension of the ArzEn speech corpus, which was … |
Injy Hamed; Nizar Habash; S. Abdennadher; Ngoc Thang Vu; | Workshop on Arabic Natural Language Processing | 2022-11-22 |
983 | ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic – English Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we collect translations in both directions, monolingual Egyptian Arabic and monolingual English, forming a three-way speech translation corpus. |
Injy Hamed; Nizar Habash; Slim Abdennadher; Ngoc Thang Vu; | arxiv-cs.CL | 2022-11-21 |
984 | Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose a simple back-translation-style data augmentation method for Mandarin Chinese polyphone disambiguation, utilizing a large amount of unlabeled text data. |
CHUNYU QIANG et. al. | arxiv-cs.SD | 2022-11-17 |
985 | TSMind: Alibaba and Soochow University’s Submission to The WMT22 Translation Suggestion Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes the joint submission of Alibaba and Soochow University, TSMind, to the WMT 2022 Shared Task on Translation Suggestion (TS). |
XIN GE et. al. | arxiv-cs.CL | 2022-11-16 |
986 | MT Metrics Correlate with Human Ratings of Simultaneous Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we leverage the evaluations of candidate systems submitted to the English-German SST task at IWSLT 2022 and conduct an extensive correlation analysis of CR and the aforementioned metrics. |
Dominik Macháček; Ondřej Bojar; Raj Dabre; | arxiv-cs.CL | 2022-11-15 |
987 | Easy Guided Decoding in Providing Suggestions for Interactive Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we utilize the parameterized objective function of neural machine translation (NMT) and propose a novel constrained decoding algorithm, namely Prefix Suffix Guided Decoding (PSGD), to deal with the TS problem without additional training. |
Ke Wang; Xin Ge; Jiayi Wang; Yu Zhao; Yuqi Zhang; | arxiv-cs.CL | 2022-11-13 |
988 | Morphologically Motivated Input Variations and Data Augmentation in Turkish-English Neural Machine Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Success of neural networks in natural language processing has paved the way for neural machine translation (NMT), which rapidly became the mainstream approach in machine … |
Zeynep Yirmibeşoğlu; Tunga Güngör; | ACM Transactions on Asian and Low-Resource Language … | 2022-11-11 |
989 | Grammatical Error Correction: A Survey of The State of The Art IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed with a particular focus on artificial error generation. |
CHRISTOPHER BRYANT et. al. | arxiv-cs.CL | 2022-11-09 |
990 | ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ERNIE-UniX2, a unified cross-lingual cross-modal pre-training framework for both generation and understanding tasks. |
BIN SHAN et. al. | arxiv-cs.CV | 2022-11-09 |
991 | Building User-oriented Personalized Machine Translator Based on User-Generated Textual Content Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Machine Translation (MT) has been a very useful tool to assist multilingual communication and collaboration. In recent years, by taking advantage of the exciting developments of … |
P. ZHANG et. al. | Proceedings of the ACM on Human-Computer Interaction | 2022-11-07 |
992 | InsNet: An Efficient, Flexible, and Performant Insertion-based Text Generation Model IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose InsNet, an expressive insertion-based text generator with efficient training and flexible decoding (parallel or sequential). |
Sidi Lu; Tao Meng; Nanyun Peng; | nips | 2022-11-06 |
993 | Refining Low-Resource Unsupervised Translation By Language Disentanglement of Multilingual Translation Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a simple refinement procedure to separate languages from a pre-trained multilingual UMT model for it to focus on only the target low-resource task. |
Xuan-Phi Nguyen; Shafiq Joty; Kui Wu; Ai Ti Aw; | nips | 2022-11-06 |
994 | MT-GenEval: A Counterfactual and Contextual Dataset for Evaluating Gender Accuracy in Machine Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce MT-GenEval, a benchmark for evaluating gender accuracy in translation from English into eight widely-spoken languages. |
ANNA CURREY et. al. | arxiv-cs.CL | 2022-11-02 |
995 | Analysis of Layer-Wise Training in Direct Speech to Speech Translation Using BI-LSTM Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech-to-speech translation (S2ST) is the process of translation of speech from one language to another. Traditional S2ST systems follow a cascaded approach, where three modules … |
Lalaram Arya; Ayush Agarwal; Jagabandhu Mishra; S. Prasanna; | 2022 25th Conference of the Oriental COCOSDA International … | 2022-11-01 |
996 | Domain Curricula for Code-Switched MT at MixMT 2022 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present our approach and results for the Code-mixed Machine Translation (MixMT) shared task at WMT 2022: the task consists of two subtasks, monolingual to code-mixed machine translation (Subtask-1) and code-mixed to monolingual machine translation (Subtask-2). |
Lekan Raheem; Maab Elrashid; | arxiv-cs.CL | 2022-10-31 |
997 | Translation Comprehension Misunderstandings of English Majors in The Information Era: — A Case Study of Offline English-Chinese Translation Test Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the past, English major students carried heavy dictionaries while learning English. Nowadays, they can obtain more information quickly and efficiently by using electronic … |
Xinshan Zhou; Yang Liu; Chunjie Yin; | Proceedings of the 14th International Conference on … | 2022-10-28 |
998 | Domain Adaptation of Machine Translation with Crowdworkers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a framework that efficiently and effectively collects parallel sentences in a target domain from the web with the help of crowdworkers. |
Makoto Morishita; Jun Suzuki; Masaaki Nagata; | arxiv-cs.CL | 2022-10-27 |
999 | The Effect of Normalization for Bi-directional Amharic-English Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents the first relatively large-scale Amharic-English parallel sentence dataset. |
TADESSE DESTAW BELAY et. al. | arxiv-cs.CL | 2022-10-27 |
1000 | Improving Speech-to-Speech Translation Through Unlabeled Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an effective way to utilize the massive existing unlabeled text from different languages to create a large amount of S2ST data to improve S2ST performance by applying various acoustic effects to the generated synthetic data. |
XUAN-PHI NGUYEN et. al. | arxiv-cs.CL | 2022-10-26 |