Paper Digest: Most Cited Papers on Transformer
The Paper Digest Team extracted all recent Transformer (NLP)-related papers on our radar and generated highlight sentences for them. The results are sorted by impact.
This list is created by the Paper Digest Team. Experience the cutting-edge capabilities of Paper Digest, an innovative AI-powered research platform that empowers you to read articles, write articles, get answers, conduct literature reviews and generate research reports.
Try us today and unlock the full potential of our services for free!
TABLE 1: Paper Digest: Most Cited Papers on Transformer
| # | Paper | Author(s) | Source | Date |
|---|---|---|---|---|
| 1 | Language Models Are Few-Shot Learners IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. |
TOM BROWN et. al. | nips | 2020-11-17 |
| 2 | Language Models Are Few-Shot Learners IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. |
TOM B. BROWN et. al. | arxiv-cs.CL | 2020-05-28 |
| 3 | RoBERTa: A Robustly Optimized BERT Pretraining Approach IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. |
YINHAN LIU et. al. | arxiv-cs.CL | 2019-07-26 |
| 4 | Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. |
ZE LIU et. al. | iccv | 2021-10-08 |
| 5 | Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that use siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity. |
Nils Reimers; Iryna Gurevych; | emnlp | 2019-11-02 |
| 6 | GPT-4 Technical Report IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. |
JOSH ACHIAM et. al. | arxiv-cs.CL | 2023-03-15 |
| 7 | BART: Denoising Sequence-to-Sequence Pre-training For Natural Language Generation, Translation, And Comprehension IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. |
MIKE LEWIS et. al. | acl | 2020-06-20 |
| 8 | BART: Denoising Sequence-to-Sequence Pre-training For Natural Language Generation, Translation, And Comprehension IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. |
MIKE LEWIS et. al. | arxiv-cs.CL | 2019-10-29 |
| 9 | XLNet: Generalized Autoregressive Pretraining For Language Understanding IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. |
ZHILIN YANG et. al. | arxiv-cs.CL | 2019-06-19 |
| 10 | XLNet: Generalized Autoregressive Pretraining for Language Understanding IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. |
ZHILIN YANG et. al. | nips | 2019-11-15 |
| 11 | LoRA: Low-Rank Adaptation of Large Language Models IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Finetuning updates have a low intrinsic rank which allows us to train only the rank decomposition matrices of certain weights, yielding better performance and practical benefits. |
EDWARD J HU et. al. | iclr | 2022-02-08 |
| 12 | LoRA: Low-Rank Adaptation of Large Language Models IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. |
EDWARD J. HU et. al. | arxiv-cs.CL | 2021-06-17 |
| 13 | DistilBERT, A Distilled Version Of BERT: Smaller, Faster, Cheaper And Lighter IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger counterparts. |
Victor Sanh; Lysandre Debut; Julien Chaumond; Thomas Wolf; | arxiv-cs.CL | 2019-10-02 |
| 14 | ALBERT: A Lite BERT For Self-supervised Learning Of Language Representations IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. |
ZHENZHONG LAN et. al. | arxiv-cs.CL | 2019-09-26 |
| 15 | Unsupervised Cross-lingual Representation Learning At Scale IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks. |
ALEXIS CONNEAU et. al. | acl | 2020-06-20 |
| 16 | Unsupervised Cross-lingual Representation Learning At Scale IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We train a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data. |
ALEXIS CONNEAU et. al. | arxiv-cs.CL | 2019-11-05 |
| 17 | Evaluating Large Language Models Trained on Code IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. |
MARK CHEN et. al. | arxiv-cs.LG | 2021-07-07 |
| 18 | Prefix-Tuning: Optimizing Continuous Prompts for Generation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen and instead optimizes a sequence of continuous task-specific vectors, which we call the prefix. |
Xiang Lisa Li; Percy Liang; | acl | 2021-07-26 |
| 19 | Longformer: The Long-Document Transformer IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer. |
Iz Beltagy; Matthew E. Peters; Arman Cohan; | arxiv-cs.CL | 2020-04-10 |
| 20 | Transformer-XL: Attentive Language Models Beyond A Fixed-Length Context IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence. |
ZIHANG DAI et. al. | acl | 2019-07-28 |
| 21 | Transformer-XL: Attentive Language Models Beyond A Fixed-Length Context IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence. |
ZIHANG DAI et. al. | arxiv-cs.LG | 2019-01-09 |
| 22 | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. |
Jiasen Lu; Dhruv Batra; Devi Parikh; Stefan Lee; | nips | 2019-11-15 |
| 23 | Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unlike the recently-proposed Vision Transformer (ViT) that was designed for image classification specifically, we introduce the Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting Transformer to various dense prediction tasks. |
WENHAI WANG et. al. | iccv | 2021-10-08 |
| 24 | Finetuned Language Models Are Zero-Shot Learners IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper explores a simple method for improving the zero-shot learning abilities of language models. |
JASON WEI et. al. | arxiv-cs.CL | 2021-09-03 |
| 25 | Visual Instruction Tuning IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and an LLM for general-purpose visual and language understanding. |
Haotian Liu; Chunyuan Li; Qingyang Wu; Yong Jae Lee; | nips | 2023-10-24 |
| 26 | OPT: Open Pre-trained Transformer Language Models IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. |
SUSAN ZHANG et. al. | arxiv-cs.CL | 2022-05-02 |
| 27 | Sparks of Artificial General Intelligence: Early Experiments with GPT-4 IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. |
SÉBASTIEN BUBECK et. al. | arxiv-cs.CL | 2023-03-22 |
| 28 | The Llama 3 Herd of Models IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a new set of foundation models, called Llama 3. |
AARON GRATTAFIORI et. al. | arxiv-cs.AI | 2024-07-31 |
| 29 | DeBERTa: Decoding-enhanced BERT with Disentangled Attention IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. |
Pengcheng He; Xiaodong Liu; Jianfeng Gao; Weizhu Chen; | arxiv-cs.CL | 2020-06-05 |
| 30 | ELECTRA: Pre-training Text Encoders As Discriminators Rather Than Generators IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As an alternative, we propose a more sample-efficient pre-training task called replaced token detection. |
Kevin Clark; Minh-Thang Luong; Quoc V. Le; Christopher D. Manning; | arxiv-cs.CL | 2020-03-23 |
| 31 | Reformer: The Efficient Transformer IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce two techniques to improve the efficiency of Transformers. |
Nikita Kitaev; Łukasz Kaiser; Anselm Levskaya; | arxiv-cs.LG | 2020-01-13 |
| 32 | Reformer: The Efficient Transformer IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Efficient Transformer with locality-sensitive hashing and reversible layers |
Nikita Kitaev; Lukasz Kaiser; Anselm Levskaya; | iclr | 2019-12-21 |
| 33 | The Pile: An 800GB Dataset Of Diverse Text For Language Modeling IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With this in mind, we present \textit{the Pile}: an 825 GiB English text corpus targeted at training large-scale language models. |
LEO GAO et. al. | arxiv-cs.CL | 2020-12-31 |
| 34 | Point Transformer IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Point Transformer, a deep neural network that operates directly on unordered and unstructured point sets. |
Nico Engel; Vasileios Belagiannis; Klaus Dietmayer; | arxiv-cs.CV | 2020-11-02 |
| 35 | TinyBERT: Distilling BERT For Natural Language Understanding IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that is specially designed for knowledge distillation (KD) of the Transformer-based models. |
XIAOQI JIAO et. al. | emnlp | 2020-11-10 |
| 36 | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose FlashAttention, an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM. |
Tri Dao; Daniel Y. Fu; Stefano Ermon; Atri Rudra; Christopher Ré; | arxiv-cs.LG | 2022-05-27 |
| 37 | Training Compute-Optimal Large Language Models IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. |
JORDAN HOFFMANN et. al. | arxiv-cs.CL | 2022-03-29 |
| 38 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present our techniques for training very large transformer models and implement a simple, efficient intra-layer model parallel approach that enables training transformer models with billions of parameters. |
MOHAMMAD SHOEYBI et. al. | arxiv-cs.CL | 2019-09-17 |
| 39 | VL-BERT: Pre-training Of Generic Visual-Linguistic Representations IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). |
WEIJIE SU et. al. | arxiv-cs.CV | 2019-08-22 |
| 40 | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We believe that the enhanced multi-modal generation capabilities of GPT-4 stem from the utilization of sophisticated large language models (LLM). To examine this phenomenon, we present MiniGPT-4, which aligns a frozen visual encoder with a frozen advanced LLM, Vicuna, using one projection layer. |
Deyao Zhu; Jun Chen; Xiaoqian Shen; Xiang Li; Mohamed Elhoseiny; | iclr | 2024-02-26 |
| 41 | GPT-3: Its Nature, Scope, Limits, and Consequences IF:8 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this commentary, we discuss the nature of reversible and irreversible questions, that is, questions that may enable one to identify the nature of the source of their answers. … |
Luciano Floridi; Massimo Chiriatti; | Minds and Machines | 2020-01-01 |
| 42 | Swin Transformer V2: Scaling Up Capacity and Resolution IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present techniques for scaling Swin Transformer [??] up to 3 billion parameters and making it capable of training with images of up to 1,536×1,536 resolution. |
ZE LIU et. al. | cvpr | 2022-06-07 |
| 43 | Linformer: Self-Attention With Linear Complexity IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we demonstrate that the self-attention mechanism can be approximated by a low-rank matrix. |
Sinong Wang; Belinda Z. Li; Madian Khabsa; Han Fang; Hao Ma; | arxiv-cs.LG | 2020-06-08 |
| 44 | Pre-Trained Image Processing Transformer IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the low-level computer vision task (e.g., denoising, super-resolution and deraining) and develop a new pre-trained model, namely, image processing transformer (IPT). |
HANTING CHEN et. al. | arxiv-cs.CV | 2020-12-01 |
| 45 | TruthfulQA: Measuring How Models Mimic Human Falsehoods IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a benchmark to measure whether a language model is truthful in generating answers to questions. |
Stephanie Lin; Jacob Hilton; Owain Evans; | acl | 2022-05-17 |
| 46 | DialoGPT: Large-Scale Generative Pre-training For Conversational Response Generation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a large, tunable neural conversational response generation model, DialoGPT (dialogue generative pre-trained transformer). |
YIZHE ZHANG et. al. | arxiv-cs.CL | 2019-11-01 |
| 47 | PVTv2: Improved Baselines with Pyramid Vision Transformer IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View |
WENHAI WANG et. al. | ArXiv | 2021-06-25 |
| 48 | PVT V2: Improved Baselines with Pyramid Vision Transformer IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) by adding three designs, including (1) linear complexity attention layer, (2) overlapping patch embedding, and (3) convolutional feed-forward network. |
WENHAI WANG et. al. | arxiv-cs.CV | 2021-06-25 |
| 49 | Decision Transformer: Reinforcement Learning Via Sequence Modeling IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. |
LILI CHEN et. al. | nips | 2021-11-20 |
| 50 | Decision Transformer: Reinforcement Learning Via Sequence Modeling IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. |
LILI CHEN et. al. | arxiv-cs.LG | 2021-06-02 |
| 51 | Generative Pretraining From Pixels IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. |
MARK CHEN et. al. | icml | 2020-07-11 |
| 52 | Transformer in Transformer IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we point out that the attention inside these local patches are also essential for building visual transformers with high performance and we explore a new architecture, namely, Transformer iN Transformer (TNT). |
KAI HAN et. al. | nips | 2021-11-20 |
| 53 | A Primer In BERTology: What We Know About How BERT Works IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Transformer-based models have pushed state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. |
Anna Rogers; Olga Kovaleva; Anna Rumshisky; | arxiv-cs.CL | 2020-02-27 |
| 54 | Text Summarization With Pretrained Encoders IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we showcase how BERT can be usefully applied in text summarization and propose a general framework for both extractive and abstractive models. |
Yang Liu; Mirella Lapata; | emnlp | 2019-11-02 |
| 55 | Scalable Diffusion Models with Transformers IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We train latent diffusion models of images, replacing the commonly-used U-Net backbone with a transformer that operates on latent patches. |
William Peebles; Saining Xie; | iccv | 2023-09-27 |
| 56 | GLM: General Language Model Pretraining with Autoregressive Blank Infilling IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, none of the pretraining frameworks performs the best for all tasks of three main categories including natural language understanding (NLU), unconditional generation, and conditional generation. We propose a General Language Model (GLM) based on autoregressive blank infilling to address this challenge. |
ZHENGXIAO DU et. al. | acl | 2022-05-17 |
| 57 | CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed from the developer-assigned identifiers. |
Yue Wang; Weishi Wang; Shafiq Joty; Steven C. H. Hoi; | arxiv-cs.CL | 2021-09-02 |
| 58 | CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed from the developer-assigned identifiers. |
Yue Wang; Weishi Wang; Shafiq Joty; Steven C.H. Hoi; | emnlp | 2021-11-05 |
| 59 | CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by this, in this paper, we study how to learn multi-scale feature representations in transformer models for image classification. |
Chun-Fu (Richard) Chen; Quanfu Fan; Rameswar Panda; | iccv | 2021-10-08 |
| 60 | CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by this, in this paper, we study how to learn multi-scale feature representations in transformer models for image classification. |
Chun-Fu Chen; Quanfu Fan; Rameswar Panda; | arxiv-cs.CV | 2021-03-27 |
| 61 | Enhancing The Locality And Breaking The Memory Bottleneck Of Transformer On Time Series Forecasting IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to tackle such forecasting problem with Transformer [1]. |
SHIYANG LI et. al. | arxiv-cs.LG | 2019-06-29 |
| 62 | Calibrate Before Use: Improving Few-shot Performance of Language Models IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: GPT-3 can perform numerous tasks when provided a natural language prompt that contains a few training examples. We show that this type of few-shot learning can be unstable: the choice of prompt format, training examples, and even the order of the examples can cause accuracy to vary from near chance to near state-of-the-art. |
Zihao Zhao; Eric Wallace; Shi Feng; Dan Klein; Sameer Singh; | icml | 2021-07-08 |
| 63 | Calibrate Before Use: Improving Few-Shot Performance of Language Models IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We demonstrate that this instability arises from the bias of language models towards predicting certain answers, e.g., those that are placed near the end of the prompt or are common in the pre-training data. |
Tony Z. Zhao; Eric Wallace; Shi Feng; Dan Klein; Sameer Singh; | arxiv-cs.CL | 2021-02-18 |
| 64 | What Makes Good In-Context Examples for GPT-3? IF:8 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: GPT-3 has attracted lots of attention due to its superior performance across a wide range of NLP tasks, especially with its in-context learning abilities. Despite its success, we … |
JIACHANG LIU et. al. | Workshop on Knowledge Extraction and Integration for Deep … | 2021-01-17 |
| 65 | Self-Refine: Iterative Refinement with Self-Feedback IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement. |
AMAN MADAAN et. al. | nips | 2023-10-24 |
| 66 | FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address these problems, we propose to combine Transformer with the seasonal-trend decomposition method, in which the decomposition method captures the global profile of time series while Transformers capture more detailed structures. |
TIAN ZHOU et. al. | icml | 2022-07-15 |
| 67 | MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a simple and effective approach to compress large Transformer (Vaswani et al., 2017) based pre-trained models, termed as deep self-attention distillation. |
WENHUI WANG et. al. | nips | 2020-11-17 |
| 68 | WebGPT: Browser-assisted Question-answering with Human Feedback IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We train and evaluate our models on ELI5, a dataset of questions asked by Reddit users. |
REIICHIRO NAKANO et. al. | arxiv-cs.CL | 2021-12-17 |
| 69 | Locating and Editing Factual Associations in GPT IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. |
Kevin Meng; David Bau; Alex Andonian; Yonatan Belinkov; | arxiv-cs.CL | 2022-02-10 |
| 70 | Efficient Transformers: A Survey IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With the aim of helping the avid researcher navigate this flurry, this paper characterizes a large and thoughtful selection of recent efficiency-flavored X-former models, providing an organized and comprehensive overview of existing work and models across multiple domains. |
Yi Tay; Mostafa Dehghani; Dara Bahri; Donald Metzler; | arxiv-cs.LG | 2020-09-14 |
| 71 | Passage Re-ranking With BERT IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we describe a simple re-implementation of BERT for query-based passage re-ranking. |
Rodrigo Nogueira; Kyunghyun Cho; | arxiv-cs.IR | 2019-01-13 |
| 72 | Pre-Training with Whole Word Masking for Chinese BERT IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to first introduce the whole word masking (wwm) strategy for Chinese BERT, along with a series of Chinese pre-trained language models. |
Yiming Cui; Wanxiang Che; Ting Liu; Bing Qin; Ziqing Yang; | arxiv-cs.CL | 2019-06-19 |
| 73 | CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. |
SHUAI LU et. al. | arxiv-cs.SE | 2021-02-09 |
| 74 | DeBERTaV3: Improving DeBERTa Using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model by replacing mask language modeling (MLM) with replaced token detection (RTD), a more sample-efficient pre-training task. |
Pengcheng He; Jianfeng Gao; Weizhu Chen; | arxiv-cs.CL | 2021-11-18 |
| 75 | MPNet: Masked and Permuted Pre-training for Language Understanding IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose MPNet, a novel pre-training method that inherits the advantages of BERT and XLNet and avoids their limitations. |
Kaitao Song; Xu Tan; Tao Qin; Jianfeng Lu; Tie-Yan Liu; | nips | 2020-11-17 |
| 76 | Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition IF:8 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recurrent sequence-to-sequence models using encoder-decoder architecture have made great progress in speech recognition task. However, they suffer from the drawback of slow … |
Linhao Dong; Shuang Xu; Bo Xu; | 2018 IEEE International Conference on Acoustics, Speech and … | 2018-01-01 |
| 77 | CamemBERT: A Tasty French Language Model IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the feasibility of training monolingual Transformer-based language models for other languages, taking French as an example and evaluating our language models on part-of-speech tagging, dependency parsing, named entity recognition and natural language inference tasks. |
LOUIS MARTIN et. al. | arxiv-cs.CL | 2019-11-10 |
| 78 | CamemBERT: A Tasty French Language Model IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the feasibility of training monolingual Transformer-based language models for other languages, taking French as an example and evaluating our language models on part-of-speech tagging, dependency parsing, named entity recognition and natural language inference tasks. |
LOUIS MARTIN et. al. | acl | 2020-06-20 |
| 79 | It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners IF:7 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous amounts … |
Timo Schick; Hinrich Schütze; | ArXiv | 2020-09-15 |
| 80 | Improving Language Models By Retrieving from Trillions of Tokens IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. |
SEBASTIAN BORGEAUD et. al. | icml | 2022-07-15 |
| 81 | Improving Language Models By Retrieving from Trillions of Tokens IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. |
SEBASTIAN BORGEAUD et. al. | arxiv-cs.CL | 2021-12-08 |
| 82 | StereoSet: Measuring Stereotypical Bias in Pretrained Language Models IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present StereoSet, a large-scale natural English dataset to measure stereotypical biases in four domains: gender, profession, race, and religion. |
Moin Nadeem; Anna Bethke; Siva Reddy; | acl | 2021-07-26 |
| 83 | Medical Transformer: Gated Axial-Attention for Medical Image Segmentation IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module. |
Jeya Maria Jose Valanarasu; Poojan Oza; Ilker Hacihaliloglu; Vishal M. Patel; | arxiv-cs.CV | 2021-02-21 |
| 84 | Recent Advances in Natural Language Processing Via Large Pre-Trained Language Models: A Survey IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a survey of recent work that uses these large language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches. |
BONAN MIN et. al. | arxiv-cs.CL | 2021-11-01 |
| 85 | AraBERT: Transformer-based Model for Arabic Language Understanding IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we pre-trained BERT specifically for the Arabic language in the pursuit of achieving the same success that BERT did for the English language. |
Wissam Antoun; Fady Baly; Hazem Hajj; | arxiv-cs.CL | 2020-02-28 |
| 86 | Unicoder-VL: A Universal Encoder For Vision And Language By Cross-modal Pre-training IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Unicoder-VL, a universal encoder that aims to learn joint representations of vision and language in a pre-training manner. |
GEN LI et. al. | arxiv-cs.CV | 2019-08-16 |
| 87 | CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present CSWin Transformer, an efficient and effective Transformer-based backbone for general-purpose vision tasks. |
XIAOYI DONG et. al. | cvpr | 2022-06-07 |
| 88 | ProtTrans: Towards Cracking The Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing IF:7 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Computational biology and bioinformatics provide vast data gold-mines from protein sequences, ideal for Language Models (LMs) taken from Natural Language Processing (NLP). These … |
AHMED ELNAGGAR et. al. | bioRxiv | 2020-07-12 |
| 89 | BERTweet: A Pre-trained Language Model For English Tweets IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present BERTweet, the first public large-scale pre-trained language model for English Tweets. |
Dat Quoc Nguyen; Thanh Vu; Anh Tuan Nguyen; | arxiv-cs.CL | 2020-05-20 |
| 90 | Meshed-Memory Transformer For Image Captioning IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With the aim of filling this gap, we present M$^2$ – a Meshed Transformer with Memory for Image Captioning. |
Marcella Cornia; Matteo Stefanini; Lorenzo Baraldi; Rita Cucchiara; | arxiv-cs.CV | 2019-12-17 |
| 91 | Mixtral of Experts IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. |
ALBERT Q. JIANG et. al. | arxiv-cs.LG | 2024-01-08 |
| 92 | How Contextual Are Contextualized Word Representations? Comparing The Geometry Of BERT, ELMo, And GPT-2 Embeddings IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This suggests that upper layers of contextualizing models produce more context-specific representations, much like how upper layers of LSTMs produce more task-specific representations. |
Kawin Ethayarajh; | emnlp | 2019-11-02 |
| 93 | Learning RoI Transformer for Oriented Object Detection in Aerial Images IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a RoI Transformer to address these problems. |
Jian Ding; Nan Xue; Yang Long; Gui-Song Xia; Qikai Lu; | cvpr | 2019-06-14 |
| 94 | GPT-NeoX-20B: An Open-Source Autoregressive Language Model IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license. |
SID BLACK et. al. | arxiv-cs.CL | 2022-04-14 |
| 95 | GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While there is emerging work on relieving this pressure via model compression, the applicability and performance of existing compression techniques is limited by the scale and complexity of GPT models. In this paper, we address this challenge, and propose GPTQ, a new one-shot weight quantization method based on approximate second-order information, that is both highly-accurate and highly-efficient. |
Elias Frantar; Saleh Ashkboos; Torsten Hoefler; Dan Alistarh; | arxiv-cs.LG | 2022-10-31 |
| 96 | GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Sid Black; Gao Leo; Phil Wang; Connor Leahy; Stella Biderman; | 2021-01-01 | |
| 97 | Linguistic Knowledge And Transferability Of Contextual Representations IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To shed light on the linguistic knowledge they capture, we study the representations produced by several recent pretrained contextualizers (variants of ELMo, the OpenAI transformer language model, and BERT) with a suite of seventeen diverse probing tasks. |
Nelson F. Liu; Matt Gardner; Yonatan Belinkov; Matthew E. Peters; Noah A. Smith; | arxiv-cs.CL | 2019-03-21 |
| 98 | Linguistic Knowledge And Transferability Of Contextual Representations IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To shed light on the linguistic knowledge they capture, we study the representations produced by several recent pretrained contextualizers (variants of ELMo, the OpenAI transformer language model, and BERT) with a suite of sixteen diverse probing tasks. |
Nelson F. Liu; Matt Gardner; Yonatan Belinkov; Matthew E. Peters; Noah A. Smith,; | naacl | 2019-06-02 |
| 99 | BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose BioGPT, a domain-specific generative Transformer language model pre-trained on large scale biomedical literature. |
RENQIAN LUO et. al. | arxiv-cs.CL | 2022-10-19 |
| 100 | Video Action Transformer Network IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Action Transformer model for recognizing and localizing human actions in video clips. |
Rohit Girdhar; Joao Carreira; Carl Doersch; Andrew Zisserman; | cvpr | 2019-06-14 |
| 101 | Video Action Transformer Network IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Action Transformer model for recognizing and localizing human actions in video clips. |
Rohit Girdhar; João Carreira; Carl Doersch; Andrew Zisserman; | arxiv-cs.CV | 2018-12-06 |
| 102 | Monolithic Transformers for Silicon RF IC Design IF:8 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: A comprehensive review of the electrical performance of passive transformers fabricated in silicon IC technology is presented. Two types of transformer construction are considered … |
J.R. Long; | IEEE Journal of Solid-State Circuits | 2000-01-01 |
| 103 | A Comparative Study On Transformer Vs RNN In Speech Applications IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper focuses on an emergent sequence-to-sequence model called Transformer, which achieves state-of-the-art performance in neural machine translation and other natural language processing applications. |
SHIGEKI KARITA et. al. | arxiv-cs.CL | 2019-09-13 |
| 104 | Capabilities of GPT-4 on Medical Challenge Problems IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a comprehensive evaluation of GPT-4, a state-of-the-art LLM, on medical competency examinations and benchmark datasets. |
Harsha Nori; Nicholas King; Scott Mayer McKinney; Dean Carignan; Eric Horvitz; | arxiv-cs.CL | 2023-03-20 |
| 105 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose and develop a family of language models named \glam (\textbf{G}eneralist \textbf{La}nguage \textbf{M}odel), which uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants. |
NAN DU et. al. | icml | 2022-07-15 |
| 106 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose and develop a family of language models named GLaM (Generalist Language Model), which uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants. |
NAN DU et. al. | arxiv-cs.CL | 2021-12-13 |
| 107 | A Generalization of Transformer Networks to Graphs IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a generalization of transformer neural network architecture for arbitrary graphs. |
Vijay Prakash Dwivedi; Xavier Bresson; | arxiv-cs.LG | 2020-12-17 |
| 108 | LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. |
RENRUI ZHANG et. al. | arxiv-cs.CV | 2023-03-28 |
| 109 | Revisiting Pre-Trained Models For Chinese Natural Language Processing IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we target on revisiting Chinese pre-trained language models to examine their effectiveness in a non-English language and release the Chinese pre-trained language model series to the community. |
YIMING CUI et. al. | emnlp | 2020-11-10 |
| 110 | An Explanation of In-context Learning As Implicit Bayesian Inference IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In-context learning emerges both theoretically and empirically when the pretraining distribution is a mixture distribution, resulting in the language model implicitly performing Bayesian inference in its forward pass. |
Sang Michael Xie; Aditi Raghunathan; Percy Liang; Tengyu Ma; | iclr | 2022-02-08 |
| 111 | LUKE: Deep Contextualized Entity Representations With Entity-aware Self-attention IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose new pretrained contextualized representations of words and entities based on the bidirectional transformer. |
Ikuya Yamada; Akari Asai; Hiroyuki Shindo; Hideaki Takeda; Yuji Matsumoto; | emnlp | 2020-11-12 |
| 112 | Learning Deep Transformer Models For Machine Translation IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for development of the Transformer system, and the other uses deeper language representation but faces the difficulty arising from learning deep networks. Here, we continue the line of research on the latter. |
QIANG WANG et. al. | acl | 2019-07-28 |
| 113 | Learning Deep Transformer Models For Machine Translation IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for the development of the Transformer system, and the other uses deeper language representation but faces the difficulty arising from learning deep networks. |
QIANG WANG et. al. | arxiv-cs.CL | 2019-06-04 |
| 114 | Carbon Emissions and Large Neural Network Training IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We highlight the following opportunities to improve energy efficiency and CO2 equivalent emissions (CO2e): Large but sparsely activated DNNs can consume <1/10th the energy of large, dense DNNs without sacrificing accuracy despite using as many or even more parameters. |
DAVID PATTERSON et. al. | arxiv-cs.LG | 2021-04-21 |
| 115 | AdapterHub: A Framework For Adapting Transformers IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose AdapterHub, a framework that allows dynamic stitching-in of pre-trained adapters for different tasks and languages. |
JONAS PFEIFFER et. al. | arxiv-cs.CL | 2020-07-15 |
| 116 | A Systematic Evaluation of Large Language Models of Code IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the current state-of-the-art code LMs (e.g., Codex (Chen et al., 2021)) are not publicly available, leaving many questions about their model and data design decisions. We aim to fill in some of these blanks through a systematic evaluation of the largest existing models: Codex, GPT-J, GPT-Neo, GPT-NeoX-20B, and CodeParrot, across various programming languages. |
Frank F. Xu; Uri Alon; Graham Neubig; Vincent J. Hellendoorn; | arxiv-cs.PL | 2022-02-26 |
| 117 | Pretrained Transformers for Text Ranking: BERT and Beyond IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. |
Jimmy Lin; Rodrigo Nogueira; Andrew Yates; | arxiv-cs.IR | 2020-10-13 |
| 118 | Point-BERT: Pre-Training 3D Point Cloud Transformers With Masked Point Modeling IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Point-BERT, a novel paradigm for learning Transformers to generalize the concept of BERT onto 3D point cloud. |
XUMIN YU et. al. | cvpr | 2022-06-07 |
| 119 | Reducing Transformer Depth On Demand With Structured Dropout IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Layerdrop, a form of structured dropout that allows you to train one model at training time and prune to any desired depth at test time. You can also |
Angela Fan; Edouard Grave; Armand Joulin; | iclr | 2019-12-21 |
| 120 | Multi-modal Transformer For Video Retrieval IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a multi-modal transformer to jointly encode the different modalities in video, which allows each of them to attend to the others. |
Valentin Gabeur; Chen Sun; Karteek Alahari; Cordelia Schmid; | eccv | 2020-08-21 |
| 121 | Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we perform an extensive analysis of fine-tuned BERT models using second order Hessian information, and we use our results to propose a novel method for quantizing BERT models to ultra low precision. |
SHENG SHEN et. al. | aaai | 2020-02-07 |
| 122 | A Multiscale Visualization Of Attention In The Transformer Model IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To make the model more accessible, we introduce an open-source tool that visualizes attention at multiple scales, each of which provides a unique perspective on the attention mechanism. |
Jesse Vig; | arxiv-cs.HC | 2019-06-12 |
| 123 | VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a framework for learning multimodal representations from unlabeled data using convolution-free Transformer architectures. |
HASSAN AKBARI et. al. | arxiv-cs.CV | 2021-04-22 |
| 124 | KG-BERT: BERT For Knowledge Graph Completion IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to use pre-trained language models for knowledge graph completion. |
Liang Yao; Chengsheng Mao; Yuan Luo; | arxiv-cs.CL | 2019-09-07 |
| 125 | DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation IF:7 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic medical image segmentation has made great progress owing to powerful deep representation learning. Inspired by the success of self-attention mechanism in transformer, … |
AI-JUN LIN et. al. | IEEE Transactions on Instrumentation and Measurement | 2021-06-12 |
| 126 | K-Adapter: Infusing Knowledge Into Pre-Trained Models With Adapters IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this, we propose K-Adapter, a framework that retains the original parameters of the pre-trained model fixed and supports the development of versatile knowledge-infused model. |
RUIZE WANG et. al. | arxiv-cs.CL | 2020-02-05 |
| 127 | Instruction Tuning with GPT-4 IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the first attempt to use GPT-4 to generate instruction-following data for LLM finetuning. |
Baolin Peng; Chunyuan Li; Pengcheng He; Michel Galley; Jianfeng Gao; | arxiv-cs.CL | 2023-04-06 |
| 128 | GPT-GNN: Generative Pre-Training Of Graph Neural Networks IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the GPT-GNN framework to initialize GNNs by generative pre-training. |
Ziniu Hu; Yuxiao Dong; Kuansan Wang; Kai-Wei Chang; Yizhou Sun; | kdd | 2020-08-21 |
| 129 | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. |
Elias Frantar; Dan Alistarh; | arxiv-cs.LG | 2023-01-02 |
| 130 | The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we analyze the latest model, GPT-4V(ision), to deepen the understanding of LMMs. |
ZHENGYUAN YANG et. al. | arxiv-cs.CV | 2023-09-29 |
| 131 | SPECTER: Document-level Representation Learning Using Citation-informed Transformers IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose SPECTER, a new method to generate document-level embedding of scientific papers based on pretraining a Transformer language model on a powerful signal of document-level relatedness: the citation graph. |
Arman Cohan; Sergey Feldman; Iz Beltagy; Doug Downey; Daniel Weld; | acl | 2020-06-20 |
| 132 | SPECTER: Document-level Representation Learning Using Citation-informed Transformers IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose SPECTER, a new method to generate document-level embedding of scientific documents based on pretraining a Transformer language model on a powerful signal of document-level relatedness: the citation graph. |
Arman Cohan; Sergey Feldman; Iz Beltagy; Doug Downey; Daniel S. Weld; | arxiv-cs.CL | 2020-04-15 |
| 133 | Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we bridge the individual video frames and explore the temporal contexts across them via a transformer architecture for robust object tracking. |
Ning Wang; Wengang Zhou; Jie Wang; Houqaing Li; | arxiv-cs.CV | 2021-03-22 |
| 134 | Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we bridge the individual video frames and explore the temporal contexts across them via a transformer architecture for robust object tracking. |
Ning Wang; Wengang Zhou; Jie Wang; Houqiang Li; | cvpr | 2021-06-11 |
| 135 | Q8BERT: Quantized 8Bit BERT IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we show how to perform quantization-aware training during the fine-tuning phase of BERT in order to compress BERT by 4× with minimal accuracy loss. |
Ofir Zafrir; Guy Boudoukh; Peter Izsak; Moshe Wasserblat; | arxiv-cs.CL | 2019-10-14 |
| 136 | FNet: Mixing Tokens with Fourier Transforms IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that Transformer encoder architectures can be sped up, with limited accuracy costs, by replacing the self-attention sublayers with simple linear transformations that mix input tokens. |
James Lee-Thorp; Joshua Ainslie; Ilya Eckstein; Santiago Ontanon; | naacl | 2022-07-09 |
| 137 | Fine-tune BERT For Extractive Summarization IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we describe BERTSUM, a simple variant of BERT, for extractive summarization. |
Yang Liu; | arxiv-cs.CL | 2019-03-25 |
| 138 | Integrating Multimodal Information In Large Pretrained Transformers IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we proposed an attachment to BERT and XLNet called Multimodal Adaptation Gate (MAG). |
WASIFUR RAHMAN et. al. | acl | 2020-06-20 |
| 139 | Integrating Multimodal Information In Large Pretrained Transformers IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we proposed an attachment to BERT and XLNet called Multimodal Adaptation Gate (MAG). |
WASIFUR RAHMAN et. al. | arxiv-cs.LG | 2019-08-15 |
| 140 | Transformer Transducer: A Streamable Speech Recognition Model With Transformer Encoders And RNN-T Loss IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we present an end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system. |
Q. Zhang et al.; | icassp | 2020-04-26 |
| 141 | Mass-Editing Memory in A Transformer IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop MEMIT, a method for directly updating a language model with many memories, demonstrating experimentally that it can scale up to thousands of associations for GPT-J (6B) and GPT-NeoX (20B), exceeding prior work by orders of magnitude. |
Kevin Meng; Arnab Sen Sharma; Alex Andonian; Yonatan Belinkov; David Bau; | arxiv-cs.CL | 2022-10-13 |
| 142 | Mass-Editing Memory in A Transformer IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop MEMIT, a method for directly updating a language model with many memories, demonstrating experimentally that it can scale up to thousands of associations for GPT-J (6B) and GPT-NeoX (20B), exceeding prior work by an order of magnitude. |
Kevin Meng; Arnab Sen Sharma; Alex J Andonian; Yonatan Belinkov; David Bau; | iclr | 2023-02-01 |
| 143 | ChatGPT: Jack of All Trades, Master of None IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the existing studies are mostly non-automated and tested on a very limited scale. In this work, we examined ChatGPT’s capabilities on 25 diverse analytical NLP tasks, most of them subjective even to humans, such as sentiment analysis, emotion recognition, offensiveness, and stance detection. |
JAN KOCOŃ et. al. | arxiv-cs.CL | 2023-02-21 |
| 144 | Masked Language Model Scoring IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In all, PLLs and their associated pseudo-perplexities (PPPLs) enable plug-and-play use of the growing number of pretrained MLMs; e.g., we use a single cross-lingual model to rescore translations in multiple languages. |
Julian Salazar; Davis Liang; Toan Q. Nguyen; Katrin Kirchhoff; | arxiv-cs.CL | 2019-10-31 |
| 145 | The Evolved Transformer IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our goal is to apply NAS to search for a better alternative to the Transformer. |
David So; Quoc Le; Chen Liang; | icml | 2019-05-24 |
| 146 | The Evolved Transformer IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our goal is to apply NAS to search for a better alternative to the Transformer. |
David R. So; Chen Liang; Quoc V. Le; | arxiv-cs.LG | 2019-01-30 |
| 147 | Out of One, Many: Using Language Models to Simulate Human Samples IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose and explore the possibility that language models can be studied as effective proxies for specific human sub-populations in social science research. |
LISA P. ARGYLE et. al. | arxiv-cs.LG | 2022-09-14 |
| 148 | An End-to-End Transformer Model for 3D Object Detection IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose 3DETR, an end-to-end Transformer based object detection model for 3D point clouds. |
Ishan Misra; Rohit Girdhar; Armand Joulin; | iccv | 2021-10-08 |
| 149 | ChatGPT and A New Academic Reality: Artificial Intelligence‐written Research Papers and The Ethics of The Large Language Models in Scholarly Publishing IF:6 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This article discusses OpenAI’s ChatGPT, a generative pre‐trained transformer, which uses natural language processing to fulfill text‐based user requests (i.e., a “chatbot”). The … |
BRADY D. LUND et. al. | Journal of the Association for Information Science and … | 2023-03-10 |
| 150 | Multimodal Learning with Transformers: A Survey IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a comprehensive survey of Transformer techniques oriented at multimodal data. |
Peng Xu; Xiatian Zhu; David A. Clifton; | arxiv-cs.CV | 2022-06-13 |
| 151 | Incorporating Convolution Designs Into Visual Transformers IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To overcome these limitations, we analyze the potential drawbacks when directly borrowing Transformer architectures from NLP. |
KUN YUAN et. al. | iccv | 2021-10-08 |
| 152 | Large Language Models IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Artificial intelligence is making spectacular progress, and one of the best examples is the development of large language models (LLMs) such as OpenAI’s GPT series. In these lectures, written for readers with a background in mathematics or physics, we give a brief history and survey of the state of the art, and describe the underlying transformer architecture in detail. |
Michael R. Douglas; | arxiv-cs.CL | 2023-07-11 |
| 153 | CogVideo: Large-scale Pretraining for Text-to-Video Generation Via Transformers IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present CogVideo, a 9B-parameter transformer for text-to-video generation. |
Wenyi Hong; Ming Ding; Wendi Zheng; Xinghan Liu; Jie Tang; | iclr | 2023-02-01 |
| 154 | VideoGPT: Video Generation Using VQ-VAE and Transformers IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present VideoGPT: a conceptually simple architecture for scaling likelihood based generative modeling to natural videos. |
Wilson Yan; Yunzhi Zhang; Pieter Abbeel; Aravind Srinivas; | arxiv-cs.CV | 2021-04-20 |
| 155 | A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The review covers the basic components and existing pretraining methods used in natural language processing, computer vision, and graph learning. |
CE ZHOU et. al. | arxiv-cs.AI | 2023-02-18 |
| 156 | BLiMP: The Benchmark of Linguistic Minimal Pairs for English IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP), a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English. |
ALEX WARSTADT et. al. | arxiv-cs.CL | 2019-12-02 |
| 157 | BLiMP: A Benchmark of Linguistic Minimal Pairs for English IF:6 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We introduce The Benchmark of Linguistic Minimal Pairs (BLiMP), a challenge set for evaluating the linguistic knowledge of language models (LMs) on major grammatical phenomena in … |
ALEX WARSTADT et. al. | Transactions of the Association for Computational … | 2020-01-01 |
| 158 | Leveraging Pre-trained Checkpoints For Sequence Generation Tasks IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. |
Sascha Rothe; Shashi Narayan; Aliaksei Severyn; | arxiv-cs.CL | 2019-07-29 |
| 159 | A Transformer-Based Siamese Network for Change Detection IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a transformer-based Siamese network architecture (abbreviated by ChangeFormer) for Change Detection (CD) from a pair of co-registered remote sensing images. |
Wele Gedara Chaminda Bandara; Vishal M. Patel; | arxiv-cs.CV | 2022-01-04 |
| 160 | Contextual Transformer Networks for Visual Recognition IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we design a novel Transformer-style module, i.e., Contextual Transformer (CoT) block, for visual recognition. |
Yehao Li; Ting Yao; Yingwei Pan; Tao Mei; | arxiv-cs.CV | 2021-07-26 |
| 161 | FreeLB: Enhanced Adversarial Training For Natural Language Understanding IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel adversarial training algorithm, FreeLB, that promotes higher invariance in the embedding space, by adding adversarial perturbations to word embeddings and minimizing the resultant adversarial risk inside different regions around input samples. |
CHEN ZHU et. al. | arxiv-cs.CL | 2019-09-25 |
| 162 | Vector-quantized Image Modeling with Improved VQGAN IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Pretraining language models with next-token prediction on massive text corpora has delivered phenomenal zero-shot, few-shot, transfer learning and multi-tasking capabilities on both generative and discriminative language tasks. Motivated by this success, we explore a Vector-quantized Image Modeling (VIM) approach that involves pretraining a Transformer to predict rasterized image tokens autoregressively. |
JIAHUI YU et. al. | arxiv-cs.CV | 2021-10-09 |
| 163 | AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce AGIEval, a novel benchmark specifically designed to assess foundation models in the context of human-centric standardized exams, such as college entrance exams, law school admission tests, math competitions, and lawyer qualification tests. |
WANJUN ZHONG et. al. | arxiv-cs.CL | 2023-04-13 |
| 164 | Distilling Task-Specific Knowledge From BERT Into Simple Neural Networks IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to distill knowledge from BERT, a state-of-the-art language representation model, into a single-layer BiLSTM, as well as its siamese counterpart for sentence-pair tasks. |
RAPHAEL TANG et. al. | arxiv-cs.CL | 2019-03-28 |
| 165 | R-Drop: Regularized Dropout for Neural Networks IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a simple consistency training strategy to regularize dropout, namely R-Drop, which forces the output distributions of different sub models generated by dropout to be consistent with each other. |
XIAOBO LIANG et. al. | nips | 2021-11-20 |
| 166 | Visualizing And Measuring The Geometry Of BERT IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. |
ANDY COENEN et. al. | arxiv-cs.LG | 2019-06-06 |
| 167 | Visualizing and Measuring The Geometry of BERT IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. |
EMILY REIF et. al. | nips | 2019-11-15 |
| 168 | What Can Transformers Learn In-Context? A Case Study of Simple Function Classes IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While large language models such as GPT-3 exhibit some ability to perform in-context learning, it is unclear what the relationship is between tasks on which this succeeds and what is present in the training data. To investigate this, we consider the problem of training a model to in-context learn a function class (e.g., linear functions): given data derived from some functions in the class, can we train a model (e.g., a Transformer) to in-context learn most functions from that class? |
Shivam Garg; Dimitris Tsipras; Gregory Valiant; Percy Liang; | nips | 2022-11-06 |
| 169 | Summary of ChatGPT-Related Research and Perspective Towards The Future of Large Language Models IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a comprehensive survey of ChatGPT-related (GPT-3.5 and GPT-4) research, state-of-the-art large language models (LLM) from the GPT series, and their prospective applications across diverse domains. |
YIHENG LIU et. al. | arxiv-cs.CL | 2023-04-04 |
| 170 | UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we present UTNet, a simple yet powerful hybrid Transformer architecture that integrates self-attention into a convolutional neural network for enhancing medical image segmentation. |
Yunhe Gao; Mu Zhou; Dimitris Metaxas; | arxiv-cs.CV | 2021-07-01 |
| 171 | Transformer Models for Text-based Emotion Detection: A Review of BERT-based Approaches IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Francisca Adoma Acheampong; Henry Nunoo-Mensah; Wenyu Chen; | Artif. Intell. Rev. | 2021-01-01 |
| 172 | ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present an efficient and affordable post-training quantization approach to compress large Transformer-based models, termed as ZeroQuant. |
ZHEWEI YAO et. al. | nips | 2022-11-06 |
| 173 | An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For example, the retrieved knowledge might be noisy and irrelevant to the question, and the re-embedded knowledge features during reasoning might deviate from their original meanings in the knowledge base (KB). To address this challenge, we propose PICa, a simple yet effective method that Prompts GPT3 via the use of Image Captions, for knowledge-based VQA. |
ZHENGYUAN YANG et. al. | arxiv-cs.CV | 2021-09-10 |
| 174 | How Can We Know When Language Models Know? On The Calibration of Language Models for Question Answering IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we ask the question how can we know when language models know, with confidence, the answer to a particular query? |
Zhengbao Jiang; Jun Araki; Haibo Ding; Graham Neubig; | arxiv-cs.CL | 2020-12-01 |
| 175 | NExT-GPT: Any-to-Any Multimodal LLM IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To fill the gap, we present an end-to-end general-purpose any-to-any MM-LLM system, NExT-GPT. |
Shengqiong Wu; Hao Fei; Leigang Qu; Wei Ji; Tat-Seng Chua; | arxiv-cs.AI | 2023-09-11 |
| 176 | Span-based Joint Entity and Relation Extraction with Transformer Pre-training IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce SpERT, an attention model for span-based joint entity and relation extraction. |
Markus Eberts; Adrian Ulges; | arxiv-cs.CL | 2019-09-17 |
| 177 | How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a comprehensive evaluation of GPT models for machine translation, covering various aspects such as quality of different GPT models in comparison with state-of-the-art research and commercial systems, effect of prompting strategies, robustness towards domain shifts and document-level translation. |
AMR HENDY et. al. | arxiv-cs.CL | 2023-02-17 |
| 178 | WizardMath: Empowering Mathematical Reasoning for Large Language Models Via Reinforced Evol-Instruct IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present WizardMath, which enhances the mathematical CoT reasoning abilities of LLMs without using external python tools, by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. |
HAIPENG LUO et. al. | arxiv-cs.CL | 2023-08-18 |
| 179 | Mathematical Capabilities of ChatGPT IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. |
SIMON FRIEDER et. al. | nips | 2023-10-24 |
| 180 | Semantics-Aware BERT for Language Understanding IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To promote natural language understanding, we propose to incorporate explicit contextual semantics from pre-trained semantic role labeling, and introduce an improved language representation model, Semantics-aware BERT (SemBERT), which is capable of explicitly absorbing contextual semantics over a BERT backbone. |
ZHUOSHENG ZHANG et. al. | aaai | 2020-02-07 |
| 181 | Deep Entity Matching With Pre-Trained Language Models IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Ditto, a novel entity matching system based on pre-trained Transformer-based language models. |
Yuliang Li; Jinfeng Li; Yoshihiko Suhara; AnHai Doan; Wang-Chiew Tan; | arxiv-cs.DB | 2020-04-01 |
| 182 | Mathematical Capabilities of ChatGPT IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. |
SIMON FRIEDER et. al. | arxiv-cs.LG | 2023-01-31 |
| 183 | MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit impressive problem-solving skills in many tasks and domains, but their ability in mathematical reasoning in visual contexts has not been systematically studied. To bridge this gap, we present MathVista, a benchmark designed to combine challenges from diverse mathematical and visual tasks. |
PAN LU et. al. | iclr | 2024-02-26 |
| 184 | News Summarization and Evaluation in The Era of GPT-3 IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The recent success of prompting large language models like GPT-3 has led to a paradigm shift in NLP research. In this paper, we study its impact on text summarization, focusing on the classic benchmark domain of news summarization. |
Tanya Goyal; Junyi Jessy Li; Greg Durrett; | arxiv-cs.CL | 2022-09-25 |
| 185 | DeeBERT: Dynamic Early Exiting For Accelerating BERT Inference IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a simple but effective method, DeeBERT, to accelerate BERT inference. |
Ji Xin; Raphael Tang; Jaejun Lee; Yaoliang Yu; Jimmy Lin; | acl | 2020-06-20 |
| 186 | DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5, considering diverse perspectives — including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness. |
BOXIN WANG et. al. | arxiv-cs.CL | 2023-06-20 |
| 187 | DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5, considering diverse perspectives – including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness. |
BOXIN WANG et. al. | nips | 2023-10-24 |
| 188 | SMILES-BERT: Large Scale Unsupervised Pre-Training for Molecular Property Prediction IF:6 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With the rapid progress of AI in both academia and industry, Deep Learning has been widely introduced into various areas in drug discovery to accelerate its pace and cut R&D … |
Sheng Wang; Yuzhi Guo; Yuhong Wang; Hongmao Sun; Junzhou Huang; | Proceedings of the 10th ACM International Conference on … | 2019-01-01 |
| 189 | Analyzing The Structure Of Attention In A Transformer Language Model IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the structure of attention in a Transformer language model, the GPT-2 small pretrained model. |
Jesse Vig; Yonatan Belinkov; | arxiv-cs.CL | 2019-06-07 |
| 190 | TransFG: A Transformer Architecture for Fine-Grained Recognition IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Fine-grained visual classification (FGVC) which aims at recognizing objects from subcategories is a very challenging task due to the inherently subtle inter-class differences. Most existing works mainly tackle this problem by reusing the backbone network to extract features of detected discriminative regions. |
JU HE et. al. | aaai | 2022-02-07 |
| 191 | An Empirical Study of Training End-to-End Vision-and-Language Transformers IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present METER, a Multimodal End-to-end TransformER framework, through which we investigate how to design and pre-train a fully transformer-based VL model in an end-to-end manner. |
ZI-YI DOU et. al. | cvpr | 2022-06-07 |
| 192 | How Is ChatGPT’s Behavior Changing Over Time? IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time. |
Lingjiao Chen; Matei Zaharia; James Zou; | arxiv-cs.CL | 2023-07-18 |
| 193 | On The Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that both hypotheses fail to explain the fine-tuning instability. |
Marius Mosbach; Maksym Andriushchenko; Dietrich Klakow; | arxiv-cs.LG | 2020-06-08 |
| 194 | Stabilizing Transformers for Reinforcement Learning IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we demonstrate that the standard transformer architecture is difficult to optimize, which was previously observed in the supervised learning setting but becomes especially pronounced with RL objectives. |
EMILIO PARISOTTO et. al. | icml | 2020-07-11 |
| 195 | COVID-Twitter-BERT: A Natural Language Processing Model To Analyse COVID-19 Content On Twitter IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we release COVID-Twitter-BERT (CT-BERT), a transformer-based model, pretrained on a large corpus of Twitter messages on the topic of COVID-19. |
Martin Müller; Marcel Salathé; Per E Kummervold; | arxiv-cs.CL | 2020-05-15 |
| 196 | ETC: Encoding Long And Structured Inputs In Transformers IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a new Transformer architecture, Extended Transformer Construction (ETC), that addresses two key challenges of standard Transformer architectures, namely scaling input length and encoding structured inputs. |
JOSHUA AINSLIE et. al. | emnlp | 2020-11-12 |
| 197 | 3D Object Detection with Pointformer IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Pointformer, a Transformer backbone designed for 3D point clouds to learn features effectively. |
Xuran Pan; Zhuofan Xia; Shiji Song; Li Erran Li; Gao Huang; | arxiv-cs.CV | 2020-12-21 |
| 198 | Data Augmentation Using Pre-trained Transformer Models IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study different types of transformer based pre-trained models such as auto-regressive models (GPT-2), auto-encoder models (BERT), and seq2seq models (BART) for conditional data augmentation. |
Varun Kumar; Ashutosh Choudhary; Eunah Cho; | arxiv-cs.CL | 2020-03-04 |
| 199 | Transformer Networks For Trajectory Forecasting IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We question the use of the LSTM models and propose the novel use of Transformer Networks for trajectory forecasting. |
Francesco Giuliari; Irtiza Hasan; Marco Cristani; Fabio Galasso; | arxiv-cs.CV | 2020-03-18 |
| 200 | Ignore Previous Prompt: Attack Techniques For Language Models IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: By proposing PromptInject, a prosaic alignment framework for mask-based iterative adversarial prompt composition, we examine how GPT-3, the most widely deployed language model in production, can be easily misaligned by simple handcrafted inputs. |
Fábio Perez; Ian Ribeiro; | arxiv-cs.CL | 2022-11-17 |
| 201 | GPT-3: What’s It Good For? IF:6 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: GPT-3 made the mainstream media headlines this year, generating far more interest than we’d normally expect of a technical advance in NLP. People are fascinated by its ability to … |
R. Dale; | Natural Language Engineering | 2020-12-15 |
| 202 | CERT: Contrastive Self-supervised Learning For Language Understanding IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this issue, we propose CERT: Contrastive self-supervised Encoder Representations from Transformers, which pretrains language representation models using contrastive self-supervised learning at the sentence level. |
Hongchao Fang; Sicheng Wang; Meng Zhou; Jiayuan Ding; Pengtao Xie; | arxiv-cs.CL | 2020-05-16 |
| 203 | Textbooks Are All You Need IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of “textbook quality” data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). |
SURIYA GUNASEKAR et. al. | arxiv-cs.CL | 2023-06-20 |
| 204 | GPT3.int8(): 8-bit Matrix Multiplication for Transformers at Scale IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop methods for Int8 matrix multiplication for transformer multi-layer perceptron (MLP) and attention projection layers, which cut the required memory for inference by half while retaining full precision performance. |
Tim Dettmers; Mike Lewis; Luke Zettlemoyer; | nips | 2022-11-06 |
| 205 | XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding And Generation IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce XGLUE, a new benchmark dataset to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora, and evaluate their performance across a diverse set of cross-lingual tasks. |
YAOBO LIANG et. al. | emnlp | 2020-11-12 |
| 206 | Visual Saliency Transformer IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unlike conventional architectures used in Vision Transformer (ViT), we leverage multi-level token fusion and propose a new token upsampling method under the transformer framework to get high-resolution detection results. |
Nian Liu; Ni Zhang; Kaiyuan Wan; Ling Shao; Junwei Han; | iccv | 2021-10-08 |
| 207 | CoAuthor: Designing A Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we argue that by curating and analyzing large interaction datasets, the HCI community can foster more incisive examinations of LMs’ generative capabilities. |
Mina Lee; Percy Liang; Qian Yang; | arxiv-cs.HC | 2022-01-18 |
| 208 | ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. |
TEAM GLM et. al. | arxiv-cs.CL | 2024-06-18 |
| 209 | I-BERT: Integer-only BERT Quantization IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose I-BERT, a novel quantization scheme for Transformer based models that quantizes the entire inference with integer-only arithmetic. |
Sehoon Kim; Amir Gholami; Zhewei Yao; Michael W. Mahoney; Kurt Keutzer; | icml | 2021-07-08 |
| 210 | Differentially Private Fine-tuning of Language Models IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that by combining recent advances in NLP, parameter-efficiency, privacy accounting, and using larger models, one can privately fine-tune models whose utility approaches that of non-private models. |
DA YU et. al. | iclr | 2022-02-08 |
| 211 | Differentially Private Fine-tuning of Language Models IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. |
DA YU et. al. | arxiv-cs.LG | 2021-10-13 |
| 212 | Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a new Vision Transformer (ViT) architecture Multi-Scale Vision Longformer, which significantly enhances the ViT of [??] |
PENGCHUAN ZHANG et. al. | iccv | 2021-10-08 |
| 213 | Fast Model Editing at Scale IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To enable easy post-hoc editing at scale, we propose Model Editor Networks using Gradient Decomposition (MEND), a collection of small auxiliary editing networks that use a single desired input-output pair to make fast, local edits to a pre-trained model’s behavior. |
Eric Mitchell; Charles Lin; Antoine Bosselut; Chelsea Finn; Christopher D. Manning; | arxiv-cs.LG | 2021-10-21 |
| 214 | Fast Model Editing at Scale IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A computationally efficient approach for learning to edit the behavior of very large pre-trained language models (10 billion+ parameters) |
Eric Mitchell; Charles Lin; Antoine Bosselut; Chelsea Finn; Christopher D Manning; | iclr | 2022-02-08 |
| 215 | SpectralGPT: Spectral Remote Sensing Foundation Model IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While most foundation models are tailored to effectively process RGB images for various visual tasks, there is a noticeable gap in research focused on spectral data, which offers valuable information for scene understanding, especially in remote sensing (RS) applications. To fill this gap, we created for the first time a universal RS foundation model, named SpectralGPT, which is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT). |
DANFENG HONG et. al. | arxiv-cs.CV | 2023-11-13 |
| 216 | DynaBERT: Dynamic BERT with Adaptive Width and Depth IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel dynamic BERT model (abbreviated as DynaBERT), which can flexibly adjust the size and latency by selecting adaptive width and depth. |
LU HOU et. al. | nips | 2020-11-17 |
| 217 | Rethinking Transformer-based Set Prediction for Object Detection IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the causes of the optimization difficulty in the training of DETR. |
Zhiqing Sun; Shengcao Cao; Yiming Yang; Kris Kitani; | arxiv-cs.CV | 2020-11-21 |
| 218 | Rethinking Transformer-Based Set Prediction for Object Detection IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the causes of the optimization difficulty in the training of DETR. |
Zhiqing Sun; Shengcao Cao; Yiming Yang; Kris M. Kitani; | iccv | 2021-10-08 |
| 219 | KUISAIL At SemEval-2020 Task 12: BERT-CNN For Offensive Speech Identification In Social Media IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we describe our approach to utilize pre-trained BERT models with Convolutional Neural Networks for sub-task A of the Multilingual Offensive Language Identification shared task (OffensEval 2020), which is a part of the SemEval 2020. |
Ali Safaya; Moutasem Abdullatif; Deniz Yuret; | arxiv-cs.CL | 2020-07-26 |
| 220 | Pre-training Tasks For Embedding-based Large-scale Retrieval IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conduct a comprehensive study on the embedding-based retrieval models. |
Wei-Cheng Chang; Felix X. Yu; Yin-Wen Chang; Yiming Yang; Sanjiv Kumar; | arxiv-cs.LG | 2020-02-10 |
| 221 | TrOCR: Transformer-Based Optical Character Recognition with Pre-trained Models IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, namely TrOCR, which leverages the Transformer architecture for both image understanding and wordpiece-level text generation. |
MINGHAO LI et. al. | aaai | 2023-06-26 |
| 222 | OLMpics-On What Language Model Pre-training Captures IF:5 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recent success of pre-trained language models (LMs) has spurred widespread interest in the language capabilities that they possess. However, efforts to understand whether LM … |
Alon Talmor; Yanai Elazar; Yoav Goldberg; Jonathan Berant; | Transactions of the Association for Computational … | 2020-01-01 |
| 223 | Lite Transformer With Long-Short Range Attention IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present an efficient mobile NLP architecture, Lite Transformer to facilitate deploying mobile NLP applications on edge devices. |
Zhanghao Wu; Zhijian Liu; Ji Lin; Yujun Lin; Song Han; | arxiv-cs.CL | 2020-04-24 |
| 224 | Large Language Models Are State-of-the-Art Evaluators of Translation Quality IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We describe GEMBA, a GPT-based metric for assessment of translation quality, which works both with a reference translation and without. |
Tom Kocmi; Christian Federmann; | arxiv-cs.CL | 2023-02-28 |
| 225 | CogView2: Faster and Better Text-to-Image Generation Via Hierarchical Transformers IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we put forward a solution based on hierarchical transformers and local parallel autoregressive generation. |
Ming Ding; Wendi Zheng; Wenyi Hong; Jie Tang; | nips | 2022-11-06 |
| 226 | You Only Look at One Sequence: Rethinking Transformer in Vision Through Object Detection IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To answer this question, we present You Only Look at One Sequence (YOLOS), a series of object detection models based on the vanilla Vision Transformer with the fewest possible modifications, region priors, as well as inductive biases of the target task. |
YUXIN FANG et. al. | nips | 2021-11-20 |
| 227 | Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these issues, we propose a novel end-to-end Part-Aware Transformer (PAT) for occluded person Re-ID through diverse part discovery via a transformer encoder-decoder architecture, including a pixel context based transformer encoder and a part prototype based transformer decoder. |
YULIN LI et. al. | cvpr | 2021-06-11 |
| 228 | BERTje: A Dutch BERT Model IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Using the same architecture and parameters, we developed and evaluated a monolingual Dutch BERT model called BERTje. |
WIETSE DE VRIES et. al. | arxiv-cs.CL | 2019-12-19 |
| 229 | Calibration Of Pre-trained Transformers IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We focus on BERT and RoBERTa in this work, and analyze their calibration across three tasks: natural language inference, paraphrase detection, and commonsense reasoning. |
Shrey Desai; Greg Durrett; | emnlp | 2020-11-12 |
| 230 | Transformer for Single Image Super-Resolution IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel Efficient Super-Resolution Transformer (ESRT) for SISR. |
ZHISHENG LU et. al. | arxiv-cs.CV | 2021-08-25 |
| 231 | BERTology Meets Biology: Interpreting Attention in Protein Language Models IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we demonstrate a set of methods for analyzing protein Transformer models through the lens of attention. |
JESSE VIG et. al. | arxiv-cs.CL | 2020-06-26 |
| 232 | BERTology Meets Biology: Interpreting Attention in Protein Language Models IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We analyze the internal representations of protein language models, and show that attention targets structural and functional properties of protein sequences. |
JESSE VIG et. al. | iclr | 2021-01-21 |
| 233 | Few-shot Learning with Multilingual Generative Language Models IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we train multilingual generative language models on a corpus covering a diverse set of languages, and study their few- and zero-shot learning capabilities in a wide range of tasks. |
XI VICTORIA LIN et. al. | emnlp | 2022-12-30 |
| 234 | Pre-training Of Graph Augmented Transformers For Medication Recommendation IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address these challenges, we propose G-BERT, a new model to combine the power of Graph Neural Networks (GNNs) and BERT (Bidirectional Encoder Representations from Transformers) for medical code representation and medication recommendation. |
Junyuan Shang; Tengfei Ma; Cao Xiao; Jimeng Sun; | arxiv-cs.AI | 2019-06-02 |
| 235 | VLN↻BERT: A Recurrent Vision-and-Language BERT for Navigation IF:5 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Accuracy of many visiolinguistic tasks has benefited significantly from the application of vision-and-language (V&L) BERT. However, its application for the task of … |
Yicong Hong; Qi Wu; Yuankai Qi; Cristian Rodriguez-Opazo; Stephen Gould; | 2021 IEEE/CVF Conference on Computer Vision and Pattern … | 2020-11-26 |
| 236 | Multilingual Is Not Enough: BERT For Finnish IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on Finnish and thoroughly evaluate the multilingual BERT model on a range of tasks, comparing it with a new Finnish BERT model trained from scratch. |
ANTTI VIRTANEN et. al. | arxiv-cs.CL | 2019-12-15 |
| 237 | SwinTrack: A Simple and Strong Baseline for Transformer Tracking IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to further unleash the power of Transformer by proposing a simple yet efficient fully-attentional tracker, dubbed SwinTrack, within classic Siamese framework. |
Liting Lin; Heng Fan; Zhipeng Zhang; Yong Xu; Haibin Ling; | nips | 2022-11-06 |
| 238 | A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate their performance on nine natural language understanding (NLU) tasks using 21 datasets. |
JUNJIE YE et. al. | arxiv-cs.CL | 2023-03-18 |
| 239 | An Empirical Analysis of Compute-optimal Large Language Model Training IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. |
JORDAN HOFFMANN et. al. | nips | 2022-11-06 |
| 240 | Learning Video Representations Using Contrastive Bidirectional Transformer IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a self-supervised learning approach for video features that results in significantly improved performance on downstream tasks (such as video classification, captioning and segmentation) compared to existing methods. |
Chen Sun; Fabien Baradel; Kevin Murphy; Cordelia Schmid; | arxiv-cs.LG | 2019-06-13 |
| 241 | Spatial-Spectral Transformer for Hyperspectral Image Classification IF:5 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recently, a great many deep convolutional neural network (CNN)-based methods have been proposed for hyperspectral image (HSI) classification. Although the proposed CNN-based … |
Xin He; Yushi Chen; Zhouhan Lin; | Remote. Sens. | 2021-01-01 |
| 242 | Generative Language Modeling For Automated Theorem Proving IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance. |
Stanislas Polu; Ilya Sutskever; | arxiv-cs.LG | 2020-09-07 |
| 243 | Emergent Analogical Reasoning in Large Language Models IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we performed a direct comparison between human reasoners and a large language model (the text-davinci-003 variant of GPT-3) on a range of analogical tasks, including a novel text-based matrix reasoning task closely modeled on Raven’s Progressive Matrices. |
Taylor Webb; Keith J. Holyoak; Hongjing Lu; | arxiv-cs.AI | 2022-12-18 |
| 244 | Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, LLMs have inherent limitations as they are incapable of accessing up-to-date information (stored on the Web or in task-specific knowledge bases), using external tools, and performing precise mathematical and logical reasoning. In this paper, we present Chameleon, an AI system that mitigates these limitations by augmenting LLMs with plug-and-play modules for compositional reasoning. |
PAN LU et. al. | arxiv-cs.CL | 2023-04-19 |
| 245 | TENER: Adapting Transformer Encoder For Named Entity Recognition IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose TENER, a NER architecture adopting adapted Transformer Encoder to model the character-level features and word-level features. |
Hang Yan; Bocao Deng; Xiaonan Li; Xipeng Qiu; | arxiv-cs.CL | 2019-11-10 |
| 246 | StructBERT: Incorporating Language Structures Into Pre-training For Deep Language Understanding IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the linearization exploration work of Elman, we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. |
WEI WANG et. al. | iclr | 2019-12-21 |
| 247 | StructBERT: Incorporating Language Structures Into Pre-training For Deep Language Understanding IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the linearization exploration work of Elman [8], we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. |
WEI WANG et. al. | arxiv-cs.CL | 2019-08-13 |
| 248 | Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This report provides a preliminary evaluation of ChatGPT for machine translation, including translation prompt, multilingual translation, and translation robustness. |
WENXIANG JIAO et. al. | arxiv-cs.CL | 2023-01-20 |
| 249 | Prompting GPT-3 To Be Reliable IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our core contribution is to establish simple and effective prompts that improve GPT-3’s reliability as it: 1) generalizes out-of-distribution, 2) balances demographic distribution and uses natural language instructions to reduce social biases, 3) calibrates output probabilities, and 4) updates the LLM’s factual knowledge and reasoning chains. |
CHENGLEI SI et. al. | arxiv-cs.CL | 2022-10-17 |
| 250 | Prompting GPT-3 To Be Reliable IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We establish simple and effective prompting methods to make GPT-3 reliable in terms of: robustness, fairness, calibration, factuality. |
CHENGLEI SI et. al. | iclr | 2023-02-01 |
| 251 | The Curse of Recursion: Training on Generated Data Makes Models Forget IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we consider what the future might hold. |
ILIA SHUMAILOV et. al. | arxiv-cs.LG | 2023-05-27 |
| 252 | Large Pre-trained Language Models Contain Human-like Biases of What Is Right and Wrong to Do IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: That is, we show that these norms can be captured geometrically by a direction, which can be computed, e.g., by a PCA, in the embedding space, reflecting well the agreement of phrases to social norms implicitly expressed in the training texts and providing a path for attenuating or even preventing toxic degeneration in LMs. |
Patrick Schramowski; Cigdem Turan; Nico Andersen; Constantin A. Rothkopf; Kristian Kersting; | arxiv-cs.CL | 2021-03-08 |
| 253 | Muppet: Massive Multi-task Representations with Pre-Finetuning IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. |
ARMEN AGHAJANYAN et. al. | emnlp | 2021-11-05 |
| 254 | Star-Transformer IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Star-Transformer, a lightweight alternative by careful sparsification. |
QIPENG GUO et. al. | naacl | 2019-06-02 |
| 255 | U-shape Transformer for Underwater Image Enhancement IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we constructed a large-scale underwater image (LSUI) dataset including 5004 image pairs, and reported a U-shape Transformer network where the transformer model is for the first time introduced to the UIE task. |
Lintao Peng; Chunli Zhu; Liheng Bian; | arxiv-cs.CV | 2021-11-23 |
| 256 | Star-Transformer IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Star-Transformer, a lightweight alternative by careful sparsification. |
QIPENG GUO et. al. | arxiv-cs.CL | 2019-02-25 |
| 257 | Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent As Meta-Optimizers IF:5 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Large pretrained language models have shown surprising In-Context Learning (ICL) ability. With a few demonstration input-label pairs, they can predict the label for an unseen … |
DAMAI DAI et. al. | ArXiv | 2022-12-20 |
| 258 | Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Medprompt, based on a composition of several prompting strategies. |
HARSHA NORI et. al. | arxiv-cs.CL | 2023-11-27 |
| 259 | SwinSUNet: Pure Transformer Network for Remote Sensing Image Change Detection IF:5 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Convolutional neural network (CNN) can extract effective semantic features, so it was widely used for remote sensing image change detection (CD) in the latest years. CNN has … |
Cui Zhang; Liejun Wang; Shuli Cheng; Yongming Li; | IEEE Transactions on Geoscience and Remote Sensing | 2022-01-01 |
| 260 | VT-ADL: A Vision Transformer Network for Image Anomaly Detection and Localization IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a transformer-based image anomaly detection and localization network. |
Pankaj Mishra; Riccardo Verk; Daniele Fornasier; Claudio Piciarelli; Gian Luca Foresti; | arxiv-cs.CV | 2021-04-20 |
| 261 | Improving The Transformer Translation Model With Document-Level Context IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we extend the Transformer model with a new context encoder to represent document-level context, which is then incorporated into the original encoder and decoder. |
JIACHENG ZHANG et. al. | emnlp | 2018-11-02 |
| 262 | Blockwise Self-Attention For Long Document Understanding IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present BlockBERT, a lightweight and efficient BERT model for better modeling long-distance dependencies. |
JIEZHONG QIU et. al. | emnlp | 2020-11-10 |
| 263 | Blockwise Self-Attention For Long Document Understanding IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present BlockBERT, a lightweight and efficient BERT model for better modeling long-distance dependencies. |
JIEZHONG QIU et. al. | arxiv-cs.CL | 2019-11-07 |
| 264 | AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a summary of various useful libraries to work with T-PTLMs. |
Katikapalli Subramanyam Kalyan; Ajit Rajasekharan; Sivanesan Sangeetha; | arxiv-cs.CL | 2021-08-12 |
| 265 | Large Language Models Are Zero-Shot Time Series Forecasters IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Developing this approach, we find that large language models (LLMs) such as GPT-3 and LLaMA-2 can surprisingly zero-shot extrapolate time series at a level comparable to or exceeding the performance of purpose-built time series models trained on the downstream tasks. To facilitate this performance, we propose procedures for effectively tokenizing time series data and converting discrete distributions over tokens into highly flexible densities over continuous values. |
Nate Gruver; Marc Finzi; Shikai Qiu; Andrew Gordon Wilson; | arxiv-cs.LG | 2023-10-11 |
| 266 | Generating Human Motion From Textual Descriptions With Discrete Representations IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate a simple and well-known conditional generative framework based on Vector Quantised-Variational AutoEncoder (VQ-VAE) and Generative Pre-trained Transformer (GPT) for human motion generation from textual descriptions. |
JIANRONG ZHANG et. al. | cvpr | 2023-05-17 |
| 267 | Transformer-based Acoustic Modeling For Hybrid Speech Recognition IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition. |
YONGQIANG WANG et. al. | arxiv-cs.CL | 2019-10-22 |
| 268 | Transformer-Based Acoustic Modeling For Hybrid Speech Recognition IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition. |
Y. Wang et al.; | icassp | 2020-04-26 |
| 269 | GPT Detectors Are Biased Against Non-native English Writers IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we evaluate the performance of several widely-used GPT detectors using writing samples from native and non-native English writers. |
Weixin Liang; Mert Yuksekgonul; Yining Mao; Eric Wu; James Zou; | arxiv-cs.CL | 2023-04-05 |
| 270 | Towards Automated Circuit Discovery for Mechanistic Interpretability IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We automate one of the process’ steps: to identify the circuit that implements the specified behavior in the model’s computational graph. We propose several algorithms and reproduce previous interpretability results to validate them. |
Arthur Conmy; Augustine N. Mavor-Parker; Aengus Lynch; Stefan Heimersheim; Adrià Garriga-Alonso; | arxiv-cs.LG | 2023-04-28 |
| 271 | Black-Box Tuning for Language-Model-as-a-Service IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes the black-box tuning framework to optimize the continuous prompt prepended to the input text via derivative-free optimization. |
Tianxiang Sun; Yunfan Shao; Hong Qian; Xuanjing Huang; Xipeng Qiu; | icml | 2022-07-15 |
| 272 | FlowFormer: A Transformer Architecture for Optical Flow IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce optical Flow transFormer, dubbed as FlowFormer, a transformer-based neural network architecture for learning optical flow. |
ZHAOYANG HUANG et. al. | eccv | 2022-10-19 |
| 273 | Taming Pretrained Transformers For Extreme Multi-label Text Classification IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose X-Transformer, the first scalable approach to fine-tuning deep transformer models for the XMC problem. |
Wei-Cheng Chang; Hsiang-Fu Yu; Kai Zhong; Yiming Yang; Inderjit Dhillon; | arxiv-cs.LG | 2019-05-06 |
| 274 | Taming Pretrained Transformers For Extreme Multi-label Text Classification IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose X-Transformer, the first scalable approach to fine-tuning deep transformer models for the XMC problem. |
Wei-Cheng Chang; Hsiang-Fu Yu; Kai Zhong; Yiming Yang; Inderjit S. Dhillon; | kdd | 2020-08-21 |
| 275 | Supervised Multimodal Bitransformers For Classifying Images And Text IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a supervised multimodal bitransformer model that fuses information from text and image encoders, and obtain state-of-the-art performance on various multimodal classification benchmark tasks, outperforming strong baselines, including on hard test sets specifically designed to measure multimodal performance. |
Douwe Kiela; Suvrat Bhooshan; Hamed Firooz; Ethan Perez; Davide Testuggine; | arxiv-cs.CL | 2019-09-06 |
| 276 | Training with Quantization Noise for Extreme Model Compression IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression methods where the approximations introduced by STE are severe, such as Product Quantization. |
ANGELA FAN et. al. | arxiv-cs.LG | 2020-04-15 |
| 277 | MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by only using self-attention relation distillation for task-agnostic compression of pretrained Transformers. |
Wenhui Wang; Hangbo Bao; Shaohan Huang; Li Dong; Furu Wei; | arxiv-cs.CL | 2020-12-31 |
| 278 | Training with Quantization Noise for Extreme Model Compression IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we extend this approach to work with extreme compression methods where the approximations introduced by STE are severe. |
ANGELA FAN et. al. | iclr | 2021-01-21 |
| 279 | MultiModal-GPT: A Vision and Language Model for Dialogue with Humans IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a vision and language model named MultiModal-GPT to conduct multi-round dialogue with humans. |
TAO GONG et. al. | arxiv-cs.CV | 2023-05-08 |
| 280 | Want To Reduce Labeling Cost? GPT-3 Can Help IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore ways to leverage GPT-3 as a low-cost data labeler to train other models. |
Shuohang Wang; Yang Liu; Yichong Xu; Chenguang Zhu; Michael Zeng; | arxiv-cs.CL | 2021-08-30 |
| 281 | Probing Pretrained Language Models For Lexical Semantics IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a systematic empirical analysis across six typologically diverse languages and five different lexical tasks, addressing the following questions: 1) How do different lexical knowledge extraction strategies (monolingual versus multilingual source LM, out-of-context versus in-context encoding, inclusion of special tokens, and layer-wise averaging) impact performance? |
Ivan Vulić; Edoardo Maria Ponti; Robert Litschko; Goran Glavaš; Anna Korhonen; | emnlp | 2020-11-12 |
| 282 | Image Dehazing Transformer With Transmission-Aware 3D Position Embedding IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The key insight of this study is to investigate how to combine CNN and Transformer for image dehazing. |
CHUN-LE GUO et. al. | cvpr | 2022-06-07 |
| 283 | BOND: BERT-Assisted Open-Domain Named Entity Recognition With Distant Supervision IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this challenge, we propose a new computational framework — BOND, which leverages the power of pre-trained language models (e.g., BERT and RoBERTa) to improve the prediction performance of NER models. |
CHEN LIANG et. al. | kdd | 2020-08-21 |
| 284 | RobBERT: A Dutch RoBERTa-based Language Model IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While previous approaches have used earlier implementations of BERT to train a Dutch version of BERT, we used RoBERTa, a robustly optimized BERT approach, to train a Dutch language model called RobBERT. |
Pieter Delobelle; Thomas Winters; Bettina Berendt; | emnlp | 2020-11-10 |
| 285 | Overview of The Transformer-based Models for NLP Tasks IF:5 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In 2017, Vaswani et al. proposed a new neural network architecture named Transformer. That modern architecture quickly revolutionized the natural language processing world. Models … |
Anthony Gillioz; Jacky Casas; Elena Mugellini; Omar Abou Khaled; | 2020 15th Conference on Computer Science and Information … | 2020-01-01 |
| 286 | Funnel-Transformer: Filtering Out Sequential Redundancy For Efficient Language Processing IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With this intuition, we propose Funnel-Transformer which gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. |
Zihang Dai; Guokun Lai; Yiming Yang; Quoc V. Le; | arxiv-cs.LG | 2020-06-05 |
| 287 | Evaluating The Logical Reasoning Ability of ChatGPT and GPT-4 IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This report analyses multiple logical reasoning datasets, with popular benchmarks like LogiQA and ReClor, and newly-released datasets like AR-LSAT. |
HANMENG LIU et. al. | arxiv-cs.CL | 2023-04-06 |
| 288 | Summary of ChatGPT/GPT-4 Research and Perspective Towards The Future of Large Language Models IF:5 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents a comprehensive survey of ChatGPT and GPT-4, state-of-the-art large language models (LLM) from the GPT series, and their prospective applications across … |
YI-HSUEH LIU et. al. | ArXiv | 2023-01-01 |
| 289 | How Language Model Hallucinations Can Snowball IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A major risk of using language models in practical applications is their tendency to hallucinate incorrect statements. To study this, we construct three question-answering datasets where LMs often state an incorrect answer which is followed by an explanation with at least one incorrect claim. |
Muru Zhang; Ofir Press; William Merrill; Alisa Liu; Noah A. Smith; | icml | 2024-06-12 |
| 290 | Understanding The Behaviors Of BERT In Ranking IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We explore several different ways to leverage the pre-trained BERT and fine-tune it on two ranking tasks: MS MARCO passage reranking and TREC Web Track ad hoc document ranking. |
Yifan Qiao; Chenyan Xiong; Zhenghao Liu; Zhiyuan Liu; | arxiv-cs.IR | 2019-04-16 |
| 291 | HRFormer: High-Resolution Vision Transformer for Dense Predict IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer that produces low-resolution representations and has high memory and computational cost. |
YUHUI YUAN et. al. | nips | 2021-11-20 |
| 292 | HRFormer: High-Resolution Transformer for Dense Prediction IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer that produces low-resolution representations and has high memory and computational cost. |
YUHUI YUAN et. al. | arxiv-cs.CV | 2021-10-18 |
| 293 | Reducing Activation Recomputation in Large Transformer Models IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show how to significantly accelerate training of large transformer models by reducing activation recomputation. |
VIJAY KORTHIKANTI et. al. | arxiv-cs.LG | 2022-05-10 |
| 294 | Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, future superhuman models will behave in complex ways too difficult for humans to reliably evaluate; humans will only be able to weakly supervise superhuman models. We study an analogy to this problem: can weak model supervision elicit the full capabilities of a much stronger model? |
COLLIN BURNS et. al. | arxiv-cs.CL | 2023-12-14 |
| 295 | Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, future superhuman models will behave in complex ways too difficult for humans to reliably evaluate; humans will only be able to *weakly supervise* superhuman models. We study an analogy to this problem: can weak model supervision elicit the full capabilities of a much stronger model? |
COLLIN BURNS et. al. | icml | 2024-06-12 |
| 296 | Multilevel Intelligent Universal Transformer for Medium Voltage Applications IF:5 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The solid-state transformer allows add-on intelligence to enhance power quality compatibility between source and load. It is desired to demonstrate the benefits gained by the use … |
Jih-Sheng Lai; A. Maitra; A. Mansoor; F. Goodman; | Fourtieth IAS Annual Meeting. Conference Record of the 2005 … | 2005-01-01 |
| 297 | Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration IF:5 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The state-of-the-art neural network architecture named Transformer has been used successfully for many sequence-tosequence transformation tasks. The advantage of this architecture … |
SHIGEKI KARITA et. al. | | 2019-01-01 |
| 298 | Is GPT-3 A Good Data Annotator? IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It is therefore natural to wonder whether it can be used to effectively annotate data for NLP tasks. In this paper, we evaluate the performance of GPT-3 as a data annotator by comparing it with traditional data annotation methods and analyzing its output on a range of tasks. |
BOSHENG DING et. al. | acl | 2023-07-08 |
| 299 | Is GPT-3 A Good Data Annotator? IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It is therefore natural to wonder whether it can be used to effectively annotate data for NLP tasks. In this paper, we evaluate the performance of GPT-3 as a data annotator by comparing it with traditional data annotation methods and analyzing its output on a range of tasks. |
BOSHENG DING et. al. | arxiv-cs.CL | 2022-12-20 |
| 300 | Vision Transformer for Small-Size Datasets IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA), which effectively solve the lack of locality inductive bias and enable it to learn from scratch even on small-size datasets. |
Seung Hoon Lee; Seunghyun Lee; Byung Cheol Song; | arxiv-cs.CV | 2021-12-26 |
| 301 | Interpreting and Improving Natural-language Processing (in Machines) with Natural Language-processing (in The Brain) IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose here a novel interpretation approach that relies on the only processing system we have that does understand language: the human brain. |
Mariya Toneva; Leila Wehbe; | nips | 2019-11-15 |
| 302 | Attention Is Not Only A Weight: Analyzing Transformers With Vector Norms IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper shows that attention weights alone are only one of the two factors that determine the output of attention and proposes a norm-based analysis that incorporates the second factor, the norm of the transformed input vectors. |
Goro Kobayashi; Tatsuki Kuribayashi; Sho Yokoi; Kentaro Inui; | emnlp | 2020-11-12 |
| 303 | CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Along with the pseudo labels, a weight-sharing triple-branch transformer framework is proposed to apply self-attention and cross-attention for source/target feature learning and source-target domain alignment, respectively. |
TONGKUN XU et. al. | arxiv-cs.CV | 2021-09-13 |
| 304 | Style Transformer: Unpaired Text Style Transfer Without Disentangled Latent Representation IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the Style Transformer, which makes no assumption about the latent representation of source sentence and equips the power of attention mechanism in Transformer to achieve better style transfer and better content preservation. |
Ning Dai; Jianze Liang; Xipeng Qiu; Xuanjing Huang; | acl | 2019-07-28 |
| 305 | Style Transformer: Unpaired Text Style Transfer Without Disentangled Latent Representation IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the Style Transformer, which makes no assumption about the latent representation of source sentence and equips the power of attention mechanism in Transformer to achieve better style transfer and better content preservation. |
Ning Dai; Jianze Liang; Xipeng Qiu; Xuanjing Huang; | arxiv-cs.CL | 2019-05-14 |
| 306 | Pretraining-Based Natural Language Generation For Text Summarization IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel pretraining-based encoder-decoder framework, which can generate the output sequence based on the input sequence in a two-stage manner. |
Haoyu Zhang; Jianjun Xu; Ji Wang; | arxiv-cs.CL | 2019-02-25 |
| 307 | When Does Pretraining Help?: Assessing Self-supervised Learning for Law and The CaseHOLD Dataset of 53,000+ Legal Holdings IF:5 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: While self-supervised learning has made rapid advances in natural language processing, it remains unclear when researchers should engage in resource-intensive domain-specific … |
Lucia Zheng; Neel Guha; Brandon R. Anderson; Peter Henderson; Daniel E. Ho; | Proceedings of the Eighteenth International Conference on … | 2021-01-01 |
| 308 | TernaryBERT: Distillation-aware Ultra-low Bit BERT IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model. |
WEI ZHANG et. al. | emnlp | 2020-11-12 |
| 309 | P2T: Pyramid Pooling Transformer for Scene Understanding IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, pyramid pooling has not been explored in backbone network design. To bridge this gap, we propose to adapt pyramid pooling to Multi-Head Self-Attention (MHSA) in the vision transformer, simultaneously reducing the sequence length and capturing powerful contextual features. |
Yu-Huan Wu; Yun Liu; Xin Zhan; Ming-Ming Cheng; | arxiv-cs.CV | 2021-06-22 |
| 310 | The Reversal Curse: LLMs Trained on A Is B Fail to Learn B Is A IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is worth noting, however, that if A is B appears in-context, models can deduce the reverse relationship. We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as Uriah Hawthorne is the composer of Abyssal Melodies and showing that they fail to correctly answer Who composed Abyssal Melodies? |
LUKAS BERGLUND et. al. | arxiv-cs.CL | 2023-09-21 |
| 311 | Self-Attention Attribution: Interpreting Information Interactions Inside Transformer IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a self-attention attribution method to interpret the information interactions inside Transformer. |
Yaru Hao; Li Dong; Furu Wei; Ke Xu; | arxiv-cs.CL | 2020-04-23 |
| 312 | Adapt Or Get Left Behind: Domain Adaptation Through BERT Language Model Finetuning For Aspect-Target Sentiment Classification IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Aspect-Target Sentiment Classification (ATSC) is a subtask of Aspect-Based Sentiment Analysis (ABSA), which has many applications e.g. in e-commerce, where data and insights from reviews can be leveraged to create value for businesses and customers. |
Alexander Rietzler; Sebastian Stabinger; Paul Opitz; Stefan Engl; | arxiv-cs.CL | 2019-08-30 |
| 313 | End-to-End Human Object Interaction Detection With HOI Transformer IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose HOI Transformer to tackle human object interaction (HOI) detection in an end-to-end manner. |
CHENG ZOU et. al. | cvpr | 2021-06-11 |
| 314 | Shunted Self-Attention Via Multi-Scale Token Aggregation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Such a constraint inevitably limits the ability of each self-attention layer in capturing multi-scale features, thereby leading to performance degradation in handling images with multiple objects of different scales. To address this issue, we propose a novel and generic strategy, termed shunted self-attention (SSA), that allows ViTs to model the attentions at hybrid scales per attention layer. |
Sucheng Ren; Daquan Zhou; Shengfeng He; Jiashi Feng; Xinchao Wang; | cvpr | 2022-06-07 |
| 315 | TinyStories: How Small Can Language Models Be and Still Speak Coherent English? IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce TinyStories, a synthetic dataset of short stories that only contain words that a typical 3 to 4-year-olds usually understand, generated by GPT-3.5 and GPT-4. |
Ronen Eldan; Yuanzhi Li; | arxiv-cs.CL | 2023-05-12 |
| 316 | MiniCPM-V: A GPT-4V Level MLLM on Your Phone IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present MiniCPM-V, a series of efficient MLLMs deployable on end-side devices. |
YUAN YAO et. al. | arxiv-cs.CV | 2024-08-03 |
| 317 | Cloze-driven Pretraining Of Self-attention Networks IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new approach for pretraining a bi-directional transformer model that provides significant performance gains across a variety of language understanding problems. |
Alexei Baevski; Sergey Edunov; Yinhan Liu; Luke Zettlemoyer; Michael Auli; | emnlp | 2019-11-02 |
| 318 | Plagiarism in The Age of Massive Generative Pre-trained Transformers (GPT-3) IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: As if 2020 was not a peculiar enough year, its fifth month saw the relatively quiet publication of a preprint describing the most powerful natural language processing (NLP) system … |
N Dehouche; | Ethics in Science and Environmental Politics | 2021-01-01 |
| 319 | Text-to-Text Pre-Training for Data-to-Text Tasks IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the pre-train + fine-tune strategy for data-to-text tasks. |
Mihir Kale; Abhinav Rastogi; | arxiv-cs.CL | 2020-05-20 |
| 320 | Few-shot Learning with Multilingual Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we train multilingual generative language models on a corpus covering a diverse set of languages, and study their few- and zero-shot learning capabilities in a wide range of tasks. |
XI VICTORIA LIN et. al. | arxiv-cs.CL | 2021-12-20 |
| 321 | Visformer: The Vision-friendly Transformer IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on these observations, we propose a new architecture named Visformer, which is abbreviated from the ‘Vision-friendly Transformer’. |
ZHENGSU CHEN et. al. | arxiv-cs.CV | 2021-04-26 |
| 322 | PoWER-BERT: Accelerating BERT Inference Via Progressive Word-vector Elimination IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop a novel method, called PoWER-BERT, for improving the inference time of the popular BERT model, while maintaining the accuracy. |
SAURABH GOYAL et. al. | icml | 2020-07-11 |
| 323 | Toward Transformer-Based Object Detection IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The model that we propose, ViT-FRCNN, demonstrates several known properties associated with transformers, including large pretraining capacity and fast fine-tuning performance. |
JOSH BEAL et. al. | arxiv-cs.CV | 2020-12-17 |
| 324 | Intermediate-Task Transfer Learning With Pretrained Language Models: When And Why Does It Work? IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To investigate this, we perform a large-scale study on the pretrained RoBERTa model with 110 intermediate-target task combinations. |
YADA PRUKSACHATKUN et. al. | acl | 2020-06-20 |
| 325 | PoWER-BERT: Accelerating BERT Inference Via Progressive Word-vector Elimination IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop a novel method, called PoWER-BERT, for improving the inference time of the popular BERT model, while maintaining the accuracy. |
SAURABH GOYAL et. al. | arxiv-cs.LG | 2020-01-24 |
| 326 | GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs Via Cipher IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we discover that chat in cipher can bypass the safety alignment techniques of LLMs, which are mainly conducted in natural languages. |
YOULIANG YUAN et. al. | arxiv-cs.CL | 2023-08-12 |
| 327 | Compressing Large-Scale Transformer-Based Models: A Case Study on BERT IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks. |
PRAKHAR GANESH et. al. | arxiv-cs.LG | 2020-02-27 |
| 328 | Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a new transformer-based framework to learn class-specific object localization maps as pseudo labels for weakly supervised semantic segmentation (WSSS). |
Lian Xu; Wanli Ouyang; Mohammed Bennamoun; Farid Boussaid; Dan Xu; | cvpr | 2022-06-07 |
| 329 | VidTr: Video Transformer Without Convolutions IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Video Transformer (VidTr) with separable-attention for video classification. |
YANYI ZHANG et. al. | arxiv-cs.CV | 2021-04-23 |
| 330 | ParsBERT: Transformer-based Model For Persian Language Understanding IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a monolingual BERT for the Persian language (ParsBERT), which shows its state-of-the-art performance compared to other architectures and multilingual models. |
Mehrdad Farahani; Mohammad Gharachorloo; Marzieh Farahani; Mohammad Manthouri; | arxiv-cs.CL | 2020-05-26 |
| 331 | Episodic Transformer for Vision-and-Language Navigation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Episodic Transformer (E.T.), a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions. |
Alexander Pashevich; Cordelia Schmid; Chen Sun; | iccv | 2021-10-08 |
| 332 | Few-shot Training LLMs for Project-specific Code-summarization IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the use of few-shot training with the very large GPT (Generative Pre-trained Transformer) Codex model, and find evidence suggesting that one can significantly surpass state-of-the-art models for code-summarization, leveraging project-specific training. |
Toufique Ahmed; Premkumar Devanbu; | arxiv-cs.SE | 2022-07-09 |
| 333 | ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new traffic representation model called Encrypted Traffic Bidirectional Encoder Representations from Transformer (ET-BERT), which pre-trains deep contextualized datagram-level representation from large-scale unlabeled data. |
XINJIE LIN et. al. | www | 2022-04-29 |
| 334 | KLUE: Korean Language Understanding Evaluation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Korean Language Understanding Evaluation (KLUE) benchmark. |
SUNGJOON PARK et. al. | arxiv-cs.CL | 2021-05-20 |
| 335 | Memory-Efficient Pipeline-Parallel DNN Training IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose PipeDream-2BW, a system that supports memory-efficient pipeline parallelism. |
Deepak Narayanan; Amar Phanishayee; Kaiyu Shi; Xie Chen; Matei Zaharia; | arxiv-cs.LG | 2020-06-16 |
| 336 | Polar Transformer Networks IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We learn feature maps invariant to translation, and equivariant to rotation and scale. |
Carlos Esteves; Christine Allen-Blanchette; Xiaowei Zhou; Kostas Daniilidis; | iclr | 2018-12-04 |
| 337 | A Survey on Visual Transformer IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given its high performance and less need for vision-specific inductive bias, transformer is receiving more and more attention from the computer vision community. In this paper, we review these vision transformer models by categorizing them in different tasks and analyzing their advantages and disadvantages. |
KAI HAN et. al. | arxiv-cs.CV | 2020-12-23 |
| 338 | Learning to Optimize QoS-Constrained Multicast Beamforming with HPE Transformer IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper proposes a deep learning-based approach for the quality-of-service (QoS) constrained multi-group mul-ticast beamforming design. The proposed method consists of a … |
Yang Li; Ya-Feng Liu; | GLOBECOM 2023 – 2023 IEEE Global Communications Conference | 2023-12-04 |
| 339 | A Comparison of Transformer and LSTM Encoder Decoder Models for ASR IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present competitive results using a Transformer encoder-decoder-attention model for end-to-end speech recognition needing less training time compared to a similarly performing … |
Albert Zeyer; Parnia Bahar; Kazuki Irie; Ralf Schlüter; Hermann Ney; | 2019 IEEE Automatic Speech Recognition and Understanding … | 2019-01-01 |
| 340 | SG-Net: Syntax-Guided Machine Reading Comprehension IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose using syntax to guide the text modeling by incorporating explicit syntactic constraints into attention mechanism for better linguistically motivated word representations. |
ZHUOSHENG ZHANG et. al. | aaai | 2020-02-07 |
| 341 | Evaluating Commonsense in Pre-trained Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the commonsense ability of GPT, BERT, XLNet, and RoBERTa by testing them on seven challenging benchmarks, finding that language modeling and its variants are effective objectives for promoting models’ commonsense ability while bi-directional context and larger training set are bonuses. We release a test set, named CATs publicly, for future research. |
Xuhui Zhou; Yue Zhang; Leyang Cui; Dandan Huang; | arxiv-cs.CL | 2019-11-26 |
| 342 | Adversarial Training For Large Neural Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that adversarial pre-training can improve both generalization and robustness. |
XIAODONG LIU et. al. | arxiv-cs.CL | 2020-04-19 |
| 343 | Pure Transformers Are Powerful Graph Learners IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning both in theory and practice. |
JINWOO KIM et. al. | arxiv-cs.LG | 2022-07-06 |
| 344 | Pure Transformers Are Powerful Graph Learners IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that standard Transformers without graph-specific modifications can work well in graph learning both in theory and practice. |
JINWOO KIM et. al. | nips | 2022-11-06 |
| 345 | Towards Making The Most of BERT in Neural Machine Translation IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a concerted training framework (CTnmt) that is the key to integrate the pre-trained LMs to neural machine translation (NMT). |
JIACHENG YANG et. al. | aaai | 2020-02-07 |
| 346 | Towards Making The Most of BERT in Neural Machine Translation IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a concerted training framework (CTNMT) that is the key to integrate the pre-trained LMs to neural machine translation (NMT). |
JIACHENG YANG et. al. | arxiv-cs.CL | 2019-08-14 |
| 347 | T-GSA: Transformer With Gaussian-Weighted Self-Attention For Speech Enhancement IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Transformer with Gaussian-weighted self-attention (T-GSA), whose attention weights are attenuated according to the distance between target and context symbols. |
J. Kim; M. El-Khamy; J. Lee; | icassp | 2020-04-26 |
| 348 | When BERT Plays The Lottery, All Tickets Are Winning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For fine-tuned BERT, we show that (a) it is possible to find subnetworks achieving performance that is comparable with that of the full model, and (b) similarly-sized subnetworks sampled from the rest of the model perform worse. |
Sai Prasanna; Anna Rogers; Anna Rumshisky; | emnlp | 2020-11-12 |
| 349 | MART: Memory-Augmented Recurrent Transformer For Coherent Video Paragraph Captioning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Towards this goal, we propose a new approach called Memory-Augmented Recurrent Transformer (MART), which uses a memory module to augment the transformer architecture. |
JIE LEI et. al. | arxiv-cs.CL | 2020-05-11 |
| 350 | GPT-4o System Card IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. … |
OPENAI AARON HURST et. al. | ArXiv | 2024-10-25 |
| 351 | Language in A Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: CBMs require manually specified concepts and often under-perform their black box counterparts, preventing their broad adoption. We address these shortcomings and are first to show how to construct high-performance CBMs without manual specification of similar accuracy to black box models. |
YUE YANG et. al. | cvpr | 2023-05-17 |
| 352 | Active Example Selection for In-Context Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We formulate example selection for in-context learning as a sequential decision problem, and propose a reinforcement learning algorithm for identifying generalizable policies to select demonstration examples. |
Yiming Zhang; Shi Feng; Chenhao Tan; | emnlp | 2022-12-30 |
| 353 | Optimus: Organizing Sentences Via Pre-trained Modeling Of A Latent Space IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the first large-scale language VAE model Optimus (Organizing sentences via Pre-Trained Modeling of a Universal Space). |
CHUNYUAN LI et. al. | emnlp | 2020-11-12 |
| 354 | Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we revisit the spatial shuffle as an efficient way to build connections among windows. |
ZILONG HUANG et. al. | arxiv-cs.CV | 2021-06-07 |
| 355 | Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explored the potential of Transformer Transducer (T-T) models for first-pass decoding with low latency and fast speed on a large-scale dataset. |
Xie Chen; Yu Wu; Zhenghao Wang; Shujie Liu; Jinyu Li; | arxiv-cs.CL | 2020-10-21 |
| 356 | Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explored the potential of Transformer Transducer (T-T) models for first-pass decoding with low latency and fast speed on a large-scale dataset. |
X. Chen; Y. Wu; Z. Wang; S. Liu; J. Li; | icassp | 2021-05-16 |
| 357 | GasHis-Transformer: A Multi-scale Visual Transformer Approach for Gastric Histopathological Image Detection IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a multi-scale visual transformer model, referred as GasHis-Transformer, is proposed for Gastric Histopathological Image Detection (GHID), which enables the automatic global detection of gastric cancer images. |
HAOYUAN CHEN et. al. | arxiv-cs.CV | 2021-04-29 |
| 358 | Inverse Compositional Spatial Transformer Networks IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we establish a theoretical connection between the classical Lucas & Kanade (LK) algorithm and the emerging topic of Spatial Transformer Networks (STNs). |
Chen-Hsuan Lin; Simon Lucey; | cvpr | 2017-06-17 |
| 359 | Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we conduct an in-depth analysis of GPT-2, which is the most downloaded text generation model on HuggingFace, with over half a million downloads per month. |
HANNAH ROSE KIRK et. al. | nips | 2021-11-20 |
| 360 | FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As an example, we propose FrugalGPT, a simple yet flexible instantiation of LLM cascade which learns which combinations of LLMs to use for different queries in order to reduce cost and improve accuracy. |
Lingjiao Chen; Matei Zaharia; James Zou; | arxiv-cs.LG | 2023-05-09 |
| 361 | Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We focus on generative language models as they are well-suited for extracting biases inherited from training data. |
HANNAH KIRK et. al. | arxiv-cs.CL | 2021-02-08 |
| 362 | Authorship Attribution For Neural Text Generation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, in the context of this Turing Test, we investigate the so-called authorship attribution problem in three versions: (1) given two texts T1 and T2, are both generated by the same method or not? |
Adaku Uchendu; Thai Le; Kai Shu; Dongwon Lee; | emnlp | 2020-11-12 |
| 363 | GPT-Driver: Learning to Drive with GPT IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a simple yet effective approach that can transform the OpenAI GPT-3.5 model into a reliable motion planner for autonomous vehicles. |
Jiageng Mao; Yuxi Qian; Junjie Ye; Hang Zhao; Yue Wang; | arxiv-cs.CV | 2023-10-02 |
| 364 | SMILES Transformer: Pre-trained Molecular Fingerprint For Low Data Drug Discovery IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this issue, we present SMILES Transformer. |
Shion Honda; Shoi Shi; Hiroki R. Ueda; | arxiv-cs.LG | 2019-11-12 |
| 365 | Large Language Models As Tool Makers IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We evaluate our approach across various complex reasoning tasks, including Big-Bench tasks. |
Tianle Cai; Xuezhi Wang; Tengyu Ma; Xinyun Chen; Denny Zhou; | arxiv-cs.LG | 2023-05-26 |
| 366 | Modelling Context And Syntactical Features For Aspect-based Sentiment Analysis IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper explores the grammatical aspect of the sentence and employs the self-attention mechanism for syntactical learning. |
Minh Hieu Phan; Philip O. Ogunbona; | acl | 2020-06-20 |
| 367 | Transformer for Image Quality Assessment IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we investigate the application of Transformer in Image Quality (TRIQ) assessment. |
Junyong You; Jari Korhonen; | arxiv-cs.CV | 2020-12-30 |
| 368 | A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4 IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Large language models (LLMs) are a special class of pretrained language models obtained by scaling model size, pretraining corpus and computation. LLMs, because of their large … |
Katikapalli Subramanyam Kalyan; | ArXiv | 2023-10-04 |
| 369 | Implicit Representations of Meaning in Neural Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In BART and T5 transformer language models, we identify contextual word representations that function as *models of entities and situations* as they evolve throughout a discourse. |
Belinda Z. Li; Maxwell Nye; Jacob Andreas; | acl | 2021-07-26 |
| 370 | Beat The AI: Investigating Adversarial Human Annotation For Reading Comprehension IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we investigate this annotation methodology and apply it in three different settings, collecting a total of 36,000 samples with progressively stronger models in the annotation loop. |
Max Bartolo; Alastair Roberts; Johannes Welbl; Sebastian Riedel; Pontus Stenetorp; | arxiv-cs.CL | 2020-02-01 |
| 371 | Reasoning Over Semantic-Level Graph For Fact Checking IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a method suitable for reasoning about the semantic-level structure of evidence. |
WANJUN ZHONG et. al. | acl | 2020-06-20 |
| 372 | Understanding And Improving Transformer From A Multi-Particle Dynamic System Point Of View IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide a novel perspective towards understanding the architecture: we show that the Transformer can be mathematically interpreted as a numerical Ordinary Differential Equation (ODE) solver for a convection-diffusion equation in a multi-particle dynamic system. |
YIPING LU et. al. | arxiv-cs.LG | 2019-06-06 |
| 373 | GAN-BERT: Generative Adversarial Learning For Robust Text Classification With A Bunch Of Labeled Examples IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose GAN-BERT that ex- tends the fine-tuning of BERT-like architectures with unlabeled data in a generative adversarial setting. |
Danilo Croce; Giuseppe Castellucci; Roberto Basili; | acl | 2020-06-20 |
| 374 | Bailando: 3D Dance Generation By Actor-Critic GPT With Choreographic Memory IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In addition, the generated dance sequence also needs to maintain temporal coherency with different music genres. To tackle these challenges, we propose a novel music-to-dance framework, Bailando, with two powerful components: 1) a choreographic memory that learns to summarize meaningful dancing units from 3D pose sequence to a quantized codebook, 2) an actor-critic Generative Pre-trained Transformer (GPT) that composes these units to a fluent dance coherent to the music. |
LI SIYAO et. al. | cvpr | 2022-06-07 |
| 375 | GPT-4V(ision) Is A Generalist Web Agent, If Grounded IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore the potential of LMMs like GPT-4V as a generalist web agent that can follow natural language instructions to complete tasks on any given website. |
Boyuan Zheng; Boyu Gou; Jihyung Kil; Huan Sun; Yu Su; | icml | 2024-06-12 |
| 376 | A Tensorized Transformer For Language Modeling IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, based on the ideas of tensor decomposition and parameters sharing, we propose a novel self-attention model (namely Multi-linear attention) with Block-Term Tensor Decomposition (BTD). |
XINDIAN MA et. al. | arxiv-cs.CL | 2019-06-24 |
| 377 | A Tensorized Transformer for Language Modeling IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, based on the ideas of tensor decomposition and parameters sharing, we propose a novel self-attention model (namely Multi-linear attention) with Block-Term Tensor Decomposition (BTD). |
XINDIAN MA et. al. | nips | 2019-11-15 |
| 378 | Prometheus: Inducing Fine-grained Evaluation Capability in Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Prometheus, a fully open-source LLM that is on par with GPT-4’s evaluation capabilities when the appropriate reference materials (reference answer, score rubric) are accompanied. |
SEUNGONE KIM et. al. | arxiv-cs.CL | 2023-10-12 |
| 379 | DeID-GPT: Zero-shot Medical Text De-Identification By GPT-4 IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we developed a novel GPT-4-enabled de-identification framework (DeID-GPT) to automatically identify and remove the identifying information. |
ZHENGLIANG LIU et. al. | arxiv-cs.CL | 2023-03-20 |
| 380 | MISSFormer: An Effective Medical Image Segmentation Transformer IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, taking medical image segmentation as an example, we present MISSFormer, an effective and powerful Medical Image Segmentation tranSFormer. |
Xiaohong Huang; Zhifang Deng; Dandan Li; Xueguang Yuan; | arxiv-cs.CV | 2021-09-15 |
| 381 | Video Super-Resolution Transformer IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we make the first attempt to adapt Transformer for VSR. |
Jiezhang Cao; Yawei Li; Kai Zhang; Luc Van Gool; | arxiv-cs.CV | 2021-06-12 |
| 382 | Multi-Agent Reinforcement Learning Is A Sequence Modeling Problem IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel architecture named Multi-Agent Transformer (MAT) that effectively casts cooperative multi-agent reinforcement learning (MARL) into SM problems wherein the task is to map agents’ observation sequence to agents’ optimal action sequence. |
MUNING WEN et. al. | arxiv-cs.MA | 2022-05-30 |
| 383 | Multi-Agent Reinforcement Learning Is A Sequence Modeling Problem IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel architecture named Multi-Agent Transformer (MAT) that effectively casts cooperative multi-agent reinforcement learning (MARL) into SM problems wherein the objective is to map agents’ observation sequences to agents’ optimal action sequences. |
MUNING WEN et. al. | nips | 2022-11-06 |
| 384 | Does Syntax Matter? A Strong Baseline for Aspect-based Sentiment Analysis with RoBERTa IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we firstly compare the induced trees from PTMs and the dependency parsing trees on several popular models for the ABSA task, showing that the induced tree from fine-tuned RoBERTa (FT-RoBERTa) outperforms the parser-provided tree. |
Junqi Dai; Hang Yan; Tianxiang Sun; Pengfei Liu; Xipeng Qiu; | naacl | 2021-05-23 |
| 385 | Glancing Transformer for Non-Autoregressive Neural Machine Translation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the Glancing Language Model (GLM) for single-pass parallel generation models. |
LIHUA QIAN et. al. | acl | 2021-07-26 |
| 386 | Microsoft Translator At WMT 2019: Towards Large-Scale Document-Level Neural Machine Translation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes the Microsoft Translator submissions to the WMT19 news translation shared task for English-German. |
Marcin Junczys-Dowmunt; | arxiv-cs.CL | 2019-07-14 |
| 387 | CharacterBERT: Reconciling ELMo And BERT For Word-Level Open-Vocabulary Representations From Characters IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For these reasons, we propose CharacterBERT, a new variant of BERT that drops the wordpiece system altogether and uses a Character-CNN module instead to represent entire words by consulting their characters. |
HICHAM EL BOUKKOURI et. al. | arxiv-cs.CL | 2020-10-20 |
| 388 | Colorization Transformer IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the Colorization Transformer, a novel approach for diverse high fidelity image colorization based on self-attention. |
Manoj Kumar; Dirk Weissenborn; Nal Kalchbrenner; | arxiv-cs.CV | 2021-02-08 |
| 389 | GPT-NER: Named Entity Recognition Via Large Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is due to the gap between the two tasks, NER and LLMs: the former is a sequence labeling task in nature while the latter is a text-generation model. In this paper, we propose GPT-NER to resolve this issue. |
SHUHE WANG et. al. | arxiv-cs.CL | 2023-04-20 |
| 390 | Can GPT-3 Pass A Writer’s Turing Test? IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Until recently the field of natural language generation relied upon formalized grammar systems, small-scale statistical models, and lengthy sets of heuristic rules. This older … |
Katherine Elkins; Jon Chun; | Journal of Cultural Analytics | 2020-09-14 |
| 391 | Auto-Debias: Debiasing Masked Language Models with Automated Biased Prompts IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an automatic method to mitigate the biases in pretrained language models. |
Yue Guo; Yi Yang; Ahmed Abbasi; | acl | 2022-05-17 |
| 392 | Birds Have Four Legs?! NumerSense: Probing Numerical Commonsense Knowledge Of Pre-Trained Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate whether and to what extent we can induce numerical commonsense knowledge from PTLMs as well as the robustness of this process. |
Bill Yuchen Lin; Seyeon Lee; Rahul Khanna; Xiang Ren; | emnlp | 2020-11-12 |
| 393 | TSTNN: Two-Stage Transformer Based Neural Network for Speech Enhancement in The Time Domain IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a transformer-based architecture, called two-stage transformer neural network (TSTNN) for end-to-end speech denoising in the time domain. |
K. Wang; B. He; W.-P. Zhu; | icassp | 2021-05-16 |
| 394 | AMMU : A Survey of Transformer-based Biomedical Pretrained Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this survey, we start with a brief overview of foundational concepts like self-supervised learning, embedding layer and transformer encoder layers. |
Katikapalli Subramanyam Kalyan; Ajit Rajasekharan; Sivanesan Sangeetha; | arxiv-cs.CL | 2021-04-16 |
| 395 | Finding Universal Grammatical Relations In Multilingual BERT IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by these results, we present an unsupervised analysis method that provides evidence mBERT learns representations of syntactic dependency labels, in the form of clusters which largely agree with the Universal Dependencies taxonomy. |
Ethan A. Chi; John Hewitt; Christopher D. Manning; | acl | 2020-06-20 |
| 396 | Transformers4Rec: Bridging The Gap Between NLP and Sequential / Session-Based Recommendation IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Much of the recent progress in sequential and session-based recommendation has been driven by improvements in model architecture and pretraining techniques originating in the … |
Gabriel de Souza Pereira Moreira; Sara Rabhi; Jeong Min Lee; Ronay Ak; Even Oldridge; | Fifteenth ACM Conference on Recommender Systems | 2021-01-01 |
| 397 | Transformers As Statisticians: Provable In-Context Learning with In-Context Algorithm Selection IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We establish this in theory by explicit constructions, and also observe this phenomenon experimentally. In theory, we construct two general mechanisms for algorithm selection with concrete examples: (1) Pre-ICL testing, where the transformer determines the right task for the given sequence (such as choosing between regression and classification) by examining certain summary statistics of the input sequence; (2) Post-ICL validation, where the transformer selects—among multiple base ICL algorithms (such as ridge regression with multiple regularization strengths)—a near-optimal one for the given sequence using a train-validation split. |
Yu Bai; Fan Chen; Huan Wang; Caiming Xiong; Song Mei; | nips | 2023-10-24 |
| 398 | GPT Takes The Bar Exam IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this research, we document our experimental evaluation of the performance of OpenAI’s `text-davinci-003` model, often referred to as GPT-3.5, on the multistate multiple choice (MBE) section of the exam. |
Michael Bommarito II; Daniel Martin Katz; | arxiv-cs.CL | 2022-12-29 |
| 399 | How Does BERT Answer Questions?: A Layer-Wise Analysis Of Transformer Representations IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to better understand BERT and other Transformer-based models, we present a layer-wise analysis of BERT’s hidden states. |
Betty van Aken; Benjamin Winter; Alexander Löser; Felix A. Gers; | cikm | 2019-11-03 |
| 400 | Tree Transformer: Integrating Tree Structures Into Self-Attention IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes Tree Transformer, which adds an extra constraint to attention heads of the bidirectional Transformer encoder in order to encourage the attention heads to follow tree structures. |
Yau-Shian Wang; Hung-Yi Lee; Yun-Nung Chen; | arxiv-cs.CL | 2019-09-14 |
| 401 | Tree Transformer: Integrating Tree Structures Into Self-Attention IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes Tree Transformer, which adds an extra constraint to attention heads of the bidirectional Transformer encoder in order to encourage the attention heads to follow tree structures. |
Yaushian Wang; Hung-Yi Lee; Yun-Nung Chen; | emnlp | 2019-11-02 |
| 402 | Contrastive Code Representation Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose ContraCode: a contrastive pre-training task that learns code functionality, not form. |
PARAS JAIN et. al. | arxiv-cs.LG | 2020-07-09 |
| 403 | How Does BERT Answer Questions? A Layer-Wise Analysis Of Transformer Representations IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In order to better understand BERT and other Transformer-based models, we present a layer-wise analysis of BERT’s hidden states. |
Betty van Aken; Benjamin Winter; Alexander Löser; Felix A. Gers; | arxiv-cs.CL | 2019-09-11 |
| 404 | Contrastive Code Representation Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose ContraCode: a contrastive pre-training task that learns code functionality, not form. |
PARAS JAIN et. al. | emnlp | 2021-11-05 |
| 405 | Transformer-Patcher: One Mistake Worth One Neuron IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus a preferable solution is to rectify mistakes continually, as soon as they appear. We therefore extend existing Model Editing (ME) into Sequential Model Editing (SME) to help develop more practical editing methods. |
ZEYU HUANG et. al. | arxiv-cs.CL | 2023-01-23 |
| 406 | A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need? IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: After introducing the fundamental techniques, this work focuses on the technological development of various AIGC tasks based on their output type, including text, images, videos, 3D content, etc., which depicts the full potential of ChatGPT’s future. |
CHAONING ZHANG et. al. | arxiv-cs.AI | 2023-03-21 |
| 407 | The Radicalization Risks Of GPT-3 And Advanced Neural Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we expand on our previous research of the potential for abuse of generative language models by assessing GPT-3. |
Kris McGuffie; Alex Newhouse; | arxiv-cs.CY | 2020-09-14 |
| 408 | Efficient Training of BERT By Progressively Stacking IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore an efficient training method for the state-of-the-art bidirectional Transformer (BERT) model. |
LINYUAN GONG et. al. | icml | 2019-05-24 |
| 409 | Thinking About GPT-3 In-Context Learning for Biomedical IE? Think Again IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the first systematic and comprehensive study to compare the few-shot performance of GPT-3 in-context learning with fine-tuning smaller (i.e., BERT-sized) PLMs on two highly representative biomedical information extraction tasks, named entity recognition and relation extraction. |
BERNAL JIMÉNEZ GUTIÉRREZ et. al. | arxiv-cs.CL | 2022-03-16 |
| 410 | Building Extraction with Vision Transformer IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Second, spatial details are not sufficiently preserved during the feature extraction of the Vision Transformer, resulting in an inability to perform fine-grained building segmentation. To handle these issues, we propose a novel Vision Transformer (BuildFormer) with a dual-path structure. |
Libo Wang; Shenghui Fang; Rui Li; Xiaoliang Meng; | arxiv-cs.CV | 2021-11-29 |
| 411 | Few-Shot Generative Conversational Query Rewriting IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a few-shot generative approach to conversational query rewriting. |
SHI YU et. al. | sigir | 2020-07-25 |
| 412 | Improving Large Language Models for Clinical Named Entity Recognition Via Prompt Engineering IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Objective: This study quantifies the capabilities of GPT-3.5 and GPT-4 for clinical named entity recognition (NER) tasks and proposes task-specific prompts to improve their performance. |
YAN HU et. al. | arxiv-cs.CL | 2023-03-28 |
| 413 | ChatGPT for Shaping The Future of Dentistry: The Potential of Multi-Modal Large Language Model IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce two primary LLM deployment methods in dentistry, including automated dental diagnosis and cross-modal dental diagnosis, and examine their potential applications. |
HANYAO HUANG et. al. | arxiv-cs.CL | 2023-03-23 |
| 414 | Low-Resource Languages Jailbreak GPT-4 IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: AI safety training and red-teaming of large language models (LLMs) are measures to mitigate the generation of unsafe content. Our work exposes the inherent cross-lingual … |
Zheng-Xin Yong; Cristina Menghini; Stephen H. Bach; | arxiv-cs.CL | 2023-10-03 |
| 415 | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. |
ZEMING CHEN et. al. | arxiv-cs.CL | 2023-11-27 |
| 416 | Primer: Searching for Efficient Transformers for Language Modeling IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here we aim to reduce the costs of Transformers by searching for a more efficient variant. |
DAVID R. SO et. al. | arxiv-cs.LG | 2021-09-17 |
| 417 | Improving Transformer Optimization Through Better Initialization IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work our contributions are two-fold. We first investigate and empirically validate the source of optimization problems in the encoder-decoder Transformer architecture. We then propose a new weight initialization scheme with theoretical justification, which enables training without warmup or layer normalization. |
Xiao Shi Huang; Felipe Perez; Jimmy Ba; Maksims Volkovs; | icml | 2020-07-11 |
| 418 | HiT: Hierarchical Transformer With Momentum Contrast for Video-Text Retrieval IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach named Hierarchical Transformer (HiT) for video-text retrieval. |
SONG LIU et. al. | iccv | 2021-10-08 |
| 419 | MGPT: Few-Shot Learners Go Multilingual IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces two autoregressive GPT-like models with 1.3 billion and 13 billion parameters trained on 60 languages from 25 language families using Wikipedia and Colossal Clean Crawled Corpus. |
OLEH SHLIAZHKO et. al. | arxiv-cs.CL | 2022-04-15 |
| 420 | Patent Claim Generation By Fine-Tuning OpenAI GPT-2 IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Jieh-Sheng Lee; Jieh Hsiang; | ArXiv | 2019-01-01 |
| 421 | Transformers As Algorithms: Generalization and Stability in In-context Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we formalize in-context learning as an algorithm learning problem where a transformer model implicitly constructs a hypothesis function at inference-time. |
Yingcong Li; M. Emrullah Ildiz; Dimitris Papailiopoulos; Samet Oymak; | arxiv-cs.LG | 2023-01-17 |
| 422 | RealTime QA: What’s The Answer Right Now? IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce REALTIME QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version). |
JUNGO KASAI et. al. | arxiv-cs.CL | 2022-07-27 |
| 423 | Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Branch-Train-Merge (BTM), a communication-efficient algorithm for embarrassingly parallel training of large language models (LLMs). |
MARGARET LI et. al. | arxiv-cs.CL | 2022-08-05 |
| 424 | Transformers As Algorithms: Generalization and Stability in In-context Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we formalize in-context learning as an algorithm learning problem where a transformer model implicitly constructs a hypothesis function at inference-time. |
Yingcong Li; Muhammed Emrullah Ildiz; Dimitris Papailiopoulos; Samet Oymak; | icml | 2023-06-27 |
| 425 | DyLoRA: Parameter-Efficient Tuning of Pre-trained Models Using Dynamic Search-Free Low-Rank Adaptation IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With the ever-growing size of pretrained models (PMs), fine-tuning them has become more expensive and resource-hungry. As a remedy, low-rank adapters (LoRA) keep the main … |
Mojtaba Valipour; Mehdi Rezagholizadeh; I. Kobyzev; A. Ghodsi; | Conference of the European Chapter of the Association for … | 2022-10-14 |
| 426 | RealTime QA: What’s The Answer Right Now? IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce RealTime QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version). |
JUNGO KASAI et. al. | nips | 2023-10-24 |
| 427 | Spectral–Spatial Morphological Attention Transformer for Hyperspectral Image Classification IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In recent years, convolutional neural networks (CNNs) have drawn significant attention for the classification of hyperspectral images (HSIs). Due to their self-attention … |
S. K. ROY et. al. | IEEE Transactions on Geoscience and Remote Sensing | 2023-01-01 |
| 428 | Augmenting Sequential Recommendation with Pseudo-Prior Items Via Reversely Pre-training Transformer IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new framework for Augmenting Sequential Recommendation with Pseudo-prior items (ASReP). |
Zhiwei Liu; Ziwei Fan; Yu Wang; Philip S. Yu; | sigir | 2021-07-13 |
| 429 | Unveiling Security, Privacy, and Ethical Concerns of ChatGPT IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By exploring the upgrade path from GPT-1 to GPT-4, discussing the model’s features, limitations, and potential applications, this study aims to shed light on the potential risks of integrating ChatGPT into our daily lives. |
Xiaodong Wu; Ran Duan; Jianbing Ni; | arxiv-cs.CR | 2023-07-26 |
| 430 | Reframing Human-AI Collaboration for Generating Free-Text Explanations IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We consider the task of generating free-text explanations using human-written examples in a few-shot manner. |
Sarah Wiegreffe; Jack Hessel; Swabha Swayamdipta; Mark Riedl; Yejin Choi; | naacl | 2022-07-09 |
| 431 | A Wavelet-based Differential Transformer Protection IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Summary form only given as follows. Transformer inrush currents were traditionally evaluated by means of Fourier analysis. Such an approach affects the design of transformer … |
M. Gomez-Morante; D.W. Nicoletti; | IEEE Power Engineering Society. 1999 Winter Meeting (Cat. … | 1999-01-01 |
| 432 | CPTR: Full Transformer Network for Image Captioning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider the image captioning task from a new sequence-to-sequence prediction perspective and propose CaPtion TransformeR (CPTR), which takes sequentialized raw images as the input to the Transformer. |
Wei Liu; Sihan Chen; Longteng Guo; Xinxin Zhu; Jing Liu; | arxiv-cs.CV | 2021-01-26 |
| 433 | High-Frequency Transformer Design for Modular Power Conversion From Medium-Voltage AC to 400 VDC IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents a high-frequency modular medium-voltage AC (4160 VAC or 13.8 kVAC) to low-voltage DC (400 VDC) system that is scalable in order to be used for different scale … |
Shishuo Zhao; Qiang Li; Fred C. Lee; Bin Li; | IEEE Transactions on Power Electronics | 2018-01-01 |
| 434 | Mask3D: Mask Transformer for 3D Semantic Instance Segmentation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Building on the successes of recent Transformer-based methods for object detection and image segmentation, we propose the first Transformer-based approach for 3D semantic instance segmentation. |
JONAS SCHULT et. al. | arxiv-cs.CV | 2022-10-06 |
| 435 | Tensor Programs V: Tuning Large Neural Networks Via Zero-Shot Hyperparameter Transfer IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that, in the recently discovered Maximal Update Parametrization (muP), many optimal HPs remain stable even as model size changes. |
GREG YANG et. al. | arxiv-cs.LG | 2022-03-07 |
| 436 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the effect of code on enhancing LLMs’ reasoning capability by introducing different constraints on the Code Usage Frequency of GPT-4 Code Interpreter. |
AOJUN ZHOU et. al. | iclr | 2024-02-26 |
| 437 | When Do You Need Billions of Words of Pretraining Data? IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To explore this question, we adopt five styles of evaluation: classifier probing, information-theoretic probing, unsupervised relative acceptability judgments, unsupervised language model knowledge probing, and fine-tuning on NLU tasks. |
Yian Zhang; Alex Warstadt; Xiaocheng Li; Samuel R. Bowman; | acl | 2021-07-26 |
| 438 | When Do You Need Billions Of Words Of Pretraining Data? IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We adopt four probing methods—classifier probing, information-theoretic probing, unsupervised relative acceptability judgment, and fine-tuning on NLU tasks—and draw learning curves that track the growth of these different measures of linguistic ability with respect to pretraining data volume using the MiniBERTas, a group of RoBERTa models pretrained on 1M, 10M, 100M and 1B words. |
Yian Zhang; Alex Warstadt; Haau-Sing Li; Samuel R. Bowman; | arxiv-cs.CL | 2020-11-10 |
| 439 | Do Attention Heads In BERT Track Syntactic Dependencies? IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We employ two methods—taking the maximum attention weight and computing the maximum spanning tree—to extract implicit dependency relations from the attention weights of each layer/head, and compare them to the ground-truth Universal Dependency (UD) trees. |
Phu Mon Htut; Jason Phang; Shikha Bordia; Samuel R. Bowman; | arxiv-cs.CL | 2019-11-27 |
| 440 | Efficient Transformer for Remote Sensing Image Segmentation IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Semantic segmentation for remote sensing images (RSIs) is widely applied in geological surveys, urban resources management, and disaster monitoring. Recent solutions on remote … |
Zhiyong Xu; Weicun Zhang; Tianxiang Zhang; Zhifang Yang; Jiangyun Li; | Remote. Sens. | 2021-01-01 |
| 441 | TransPath: Transformer-Based Self-supervised Learning for Histopathological Image Classification IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View |
XIYUE WANG et. al. | 2021-01-01 | |
| 442 | InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose InstructUIE, a unified information extraction framework based on instruction tuning, which can uniformly model various information extraction tasks and capture the inter-task dependency. |
XIAO WANG et. al. | arxiv-cs.CL | 2023-04-17 |
| 443 | Visual Autoregressive Modeling: Scalable Image Generation Via Next-Scale Prediction IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine next-scale prediction or next-resolution prediction, diverging from the standard raster-scan next-token prediction. |
Keyu Tian; Yi Jiang; Zehuan Yuan; BINGYUE PENG; Liwei Wang; | nips | 2024-10-07 |
| 444 | Landmark Attention: Random-Access Infinite Context Length for Transformers IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel approach that allows access to the complete context while retaining random-access flexibility, closely resembling running attention on the entire context. |
Amirkeivan Mohtashami; Martin Jaggi; | arxiv-cs.CL | 2023-05-25 |
| 445 | Transformer for Graphs: An Overview from Architecture Perspective IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this survey, we provide a comprehensive review of various Graph Transformer models from the architectural design perspective. |
ERXUE MIN et. al. | arxiv-cs.LG | 2022-02-17 |
| 446 | Position Information in Transformers: An Overview IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this article, we provide an overview and theoretical comparison of existing methods to incorporate position information into Transformer models. |
Philipp Dufter; Martin Schmitt; Hinrich Schütze; | arxiv-cs.CL | 2021-02-22 |
| 447 | A Fast Post-Training Pruning Framework for Transformers IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To retain high accuracy without retraining, we introduce three novel techniques: (i) a lightweight mask search algorithm that finds which heads and filters to prune based on the Fisher information; (ii) mask rearrangement that complements the search algorithm; and (iii) mask tuning that reconstructs the output activations for each layer. |
WOOSUK KWON et. al. | arxiv-cs.CL | 2022-03-29 |
| 448 | A Fast Post-Training Pruning Framework for Transformers IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To retain high accuracy without retraining, we introduce three novel techniques: (i) a lightweight mask search algorithm that finds which heads and filters to prune based on the Fisher information; (ii) mask rearrangement that complements the search algorithm; and (iii) mask tuning that reconstructs the output activations for each layer. |
WOOSUK KWON et. al. | nips | 2022-11-06 |
| 449 | Distilling Knowledge Learned In BERT For Text Generation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel approach, Conditional Masked Language Modeling (C-MLM), to enable the finetuning of BERT on target generation tasks. |
Yen-Chun Chen; Zhe Gan; Yu Cheng; Jingzhou Liu; Jingjing Liu; | arxiv-cs.CL | 2019-11-09 |
| 450 | Outlier Suppression: Pushing The Limit of Low-bit Transformer Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We discover that γ in LayerNorm (LN) acts as a sinful amplifier for the outliers, and the importance of outliers varies greatly, where some outliers provided by a few tokens cover a large area but can be clipped sharply without negative impacts. Motivated by these findings, we propose an outlier suppression framework including two components: Gamma Migration and Token-Wise Clipping. |
XIUYING WEI et. al. | arxiv-cs.LG | 2022-09-27 |
| 451 | On The Effect of Dropping Layers of Pre-trained Transformer Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Transformer-based NLP models are trained using hundreds of millions or even billions of parameters, limiting their applicability in computationally constrained environments. |
Hassan Sajjad; Fahim Dalvi; Nadir Durrani; Preslav Nakov; | arxiv-cs.CL | 2020-04-08 |
| 452 | Adaptive Token Sampling for Efficient Vision Transformers IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although the GFLOPs of a vision transformer can be decreased by reducing the number of tokens in the network, there is no setting that is optimal for all input images. In this work, we therefore introduce a differentiable parameter-free Adaptive Token Sampler (ATS) module, which can be plugged into any existing vision transformer architecture. |
MOHSEN FAYYAZ et. al. | eccv | 2022-10-19 |
| 453 | Trojaning Language Models for Fun and Profit IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we present TROJAN-LM, a new class of trojaning attacks in which maliciously crafted LMs trigger host NLP systems to malfunction in a highly predictable manner. |
Xinyang Zhang; Zheng Zhang; Shouling Ji; Ting Wang; | arxiv-cs.CR | 2020-08-01 |
| 454 | AI Model GPT-3 (dis)informs Us Better Than Humans IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we evaluate whether recruited individuals can distinguish disinformation from accurate information, structured in the form of tweets, and determine whether a tweet is organic or synthetic, i.e., whether it has been written by a Twitter user or by the AI model GPT-3. |
Giovanni Spitale; Nikola Biller-Andorno; Federico Germani; | arxiv-cs.CY | 2023-01-23 |
| 455 | Temporal-Channel Transformer For 3D Lidar-Based Video Object Detection In Autonomous Driving IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new transformer, called Temporal-Channel Transformer, to model the spatial-temporal domain and channel domain relationships for video object detection from Lidar data. |
ZHENXUN YUAN et. al. | arxiv-cs.CV | 2020-11-27 |
| 456 | Memory-assisted Prompt Editing to Improve GPT-3 After Deployment IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: GPT-3 can misinterpret a user’s request, e.g., taking it to mean a homophone while the user intended a synonym. Our goal is to effectively correct such errors via user interactions with the system, but without retraining, which would be prohibitively costly. |
Aman Madaan; Niket Tandon; Peter Clark; Yiming Yang; | arxiv-cs.CL | 2022-01-16 |
| 457 | Former-DFER: Dynamic Facial Expression Recognition Transformer IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: This paper proposes a dynamic facial expression recognition transformer (Former-DFER) for the in-the-wild scenario. Specifically, the proposed Former-DFER mainly consists of a … |
Zengqun Zhao; Qingshan Liu; | Proceedings of the 29th ACM International Conference on … | 2021-01-01 |
| 458 | MiniF2F: A Cross-system Benchmark for Formal Olympiad-level Mathematics IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present miniF2F, a dataset of formal Olympiad-level mathematics problem statements intended to provide a unified cross-system benchmark for neural theorem proving. |
Kunhao Zheng; Jesse Michael Han; Stanislas Polu; | arxiv-cs.AI | 2021-08-31 |
| 459 | ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models Via Contrastive Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this issue, we propose a novel contrastive learning framework ERICA to obtain a deep understanding of the entities and their relations in text. |
YUJIA QIN et. al. | acl | 2021-07-26 |
| 460 | Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose GRACE, a Lifelong Model Editing method, which implements spot-fixes on streaming errors of a deployed model, ensuring minimal impact on unrelated inputs. |
Thomas Hartvigsen; Swami Sankaranarayanan; Hamid Palangi; Yoon Kim; Marzyeh Ghassemi; | nips | 2023-10-24 |
| 461 | ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, unsup-SimCSE trained with these positive pairs is likely biased, tending to consider sentences of the same or similar length as more semantically similar. Through statistical observations, we find that unsup-SimCSE does indeed have this problem. |
XING WU et. al. | arxiv-cs.CL | 2021-09-09 |
| 462 | Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In deep learning, different kinds of deep networks typically need different optimizers, which have to be chosen after multiple trials, making the training process inefficient. To relieve this issue and consistently improve the model training speed across deep networks, we propose the ADAptive Nesterov momentum algorithm, Adan for short. |
Xingyu Xie; Pan Zhou; Huan Li; Zhouchen Lin; Shuicheng Yan; | arxiv-cs.LG | 2022-08-13 |
| 463 | CenterFormer: Center-based Transformer for 3D Object Detection IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose CenterFormer, a center-based transformer network for 3D object detection. |
Zixiang Zhou; Xiangchen Zhao; Yu Wang; Panqu Wang; Hassan Foroosh; | eccv | 2022-10-19 |
| 464 | Personalized Transformer for Explainable Recommendation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this problem, we present a PErsonalized Transformer for Explainable Recommendation (PETER), on which we design a simple and effective learning objective that utilizes the IDs to predict the words in the target explanation, so as to endow the IDs with linguistic meanings and to achieve personalized Transformer. |
Lei Li; Yongfeng Zhang; Li Chen; | acl | 2021-07-26 |
| 465 | DeepInception: Hypnotize Large Language Model to Be Jailbreaker IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, inspired by the authority influence demonstrated in the Milgram experiment, we present a lightweight method to take advantage of the LLMs’ personification capabilities to construct a virtual, nested scene, allowing it to realize an adaptive way to escape the usage control in a normal scenario. |
XUAN LI et. al. | arxiv-cs.LG | 2023-11-06 |
| 466 | Residual Energy-Based Models For Text Generation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that Energy-Based models, when trained on the residual of an auto-regressive language model, can be used effectively and efficiently to generate text. |
Yuntian Deng; Anton Bakhtin; Myle Ott; Arthur Szlam; | iclr | 2019-12-21 |
| 467 | Understanding and Overcoming The Challenges of Efficient Transformer Quantization IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore quantization for transformers. |
Yelysei Bondarenko; Markus Nagel; Tijmen Blankevoort; | emnlp | 2021-11-05 |
| 468 | Mixup-Transformer: Dynamic Data Augmentation For NLP Tasks IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by this line of research, in this paper, we explore i) how to apply mixup to natural language processing tasks since text data can hardly be mixed in the raw format; ii) if mixup is still effective in transformer-based learning models, e.g., BERT. |
LICHAO SUN et. al. | arxiv-cs.CL | 2020-10-05 |
| 469 | Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent As Meta-Optimizers IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explain language models as meta-optimizers and understand in-context learning as implicit finetuning. |
DAMAI DAI et. al. | arxiv-cs.CL | 2022-12-20 |
| 470 | Larger-Scale Transformers for Multilingual Masked Language Modeling IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we present the results of two larger multilingual masked language models, with 3.5B and 10.7B parameters. |
Naman Goyal; Jingfei Du; Myle Ott; Giri Anantharaman; Alexis Conneau; | arxiv-cs.CL | 2021-05-02 |
| 471 | Dual Aggregation Transformer for Image Super-Resolution IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This inspires us to combine the two dimensions in Transformer for a more powerful representation capability. Based on the above idea, we propose a novel Transformer model, Dual Aggregation Transformer (DAT), for image SR. |
ZHENG CHEN et. al. | iccv | 2023-09-27 |
| 472 | Revisiting Relation Extraction in The Era of Large Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address issues inherent to evaluating generative approaches to RE by doing human evaluations, in lieu of relying on exact matching. |
Somin Wadhwa; Silvio Amir; Byron Wallace; | acl | 2023-07-08 |
| 473 | Pretrained Language Models For Sequential Sentence Classification IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that pretrained language models, BERT (Devlin et al., 2018) in particular, can be used for this task to capture contextual dependencies without the need for hierarchical encoding nor a CRF. |
Arman Cohan; Iz Beltagy; Daniel King; Bhavana Dalvi; Daniel S. Weld; | arxiv-cs.CL | 2019-09-09 |
| 474 | Revisiting Relation Extraction in The Era of Large Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address issues inherent to evaluating generative approaches to RE by doing human evaluations, in lieu of relying on exact matching. |
Somin Wadhwa; Silvio Amir; Byron C. Wallace; | arxiv-cs.CL | 2023-05-08 |
| 475 | Pretrained Language Models For Sequential Sentence Classification IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that pretrained language models, BERT (Devlin et al., 2018) in particular, can be used for this task to capture contextual dependencies without the need for hierarchical encoding nor a CRF. |
Arman Cohan; Iz Beltagy; Daniel King; Bhavana Dalvi; Dan Weld; | emnlp | 2019-11-02 |
| 476 | What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we precisely answer this question. |
Jaejun Lee; Raphael Tang; Jimmy Lin; | arxiv-cs.CL | 2019-11-08 |
| 477 | SqueezeBERT: What Can Computer Vision Teach NLP About Efficient Neural Networks? IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we observe that methods such as grouped convolutions have yielded significant speedups for computer vision networks, but many of these techniques have not been adopted by NLP neural network designers. |
Forrest N. Iandola; Albert E. Shaw; Ravi Krishna; Kurt W. Keutzer; | arxiv-cs.CL | 2020-06-19 |
| 478 | To BERT or Not To BERT: Comparing Speech and Language-based Approaches for Alzheimer’s Disease Detection IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Research related to automatically detecting Alzheimer’s disease (AD) is important, given the high prevalence of AD and the high cost of traditional methods. Since AD significantly … |
Aparna Balagopalan; Benjamin Eyre; Frank Rudzicz; Jekaterina Novikova; | 2020-01-01 | |
| 479 | CAT: Cross Attention in Vision Transformer IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new attention mechanism in Transformer, termed Cross Attention, which alternates attention within image patches (rather than over the whole image) to capture local information, and applies attention between image patches divided from single-channel feature maps to capture global information. |
HEZHENG LIN et. al. | arxiv-cs.CV | 2021-06-10 |
| 480 | Towards Improving Adversarial Training of NLP Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a simple and improved vanilla adversarial training process for NLP models, which we name Attacking to Training (A2T). |
Jin Yong Yoo; Yanjun Qi; | arxiv-cs.CL | 2021-09-01 |
| 481 | Gated Transformer Networks for Multivariate Time Series Classification IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explored a simple extension of the current Transformer Networks with gating, named Gated Transformer Networks (GTN) for the multivariate time series classification problem. |
MINGHAO LIU et. al. | arxiv-cs.LG | 2021-03-26 |
| 482 | Thinking Like Transformers IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we aim to change that, proposing a computational model for the transformer-encoder in the form of a programming language. |
Gail Weiss; Yoav Goldberg; Eran Yahav; | arxiv-cs.LG | 2021-06-13 |
| 483 | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this way, we achieve 100% attack success rate — according to GPT-4 as a judge — on Vicuna-13B, Mistral-7B, Phi-3-Mini, Nemotron-4-340B, Llama-2-Chat-7B/13B/70B, Llama-3-Instruct-8B, Gemma-7B, GPT-3.5, GPT-4o, and R2D2 from HarmBench that was adversarially trained against the GCG attack. |
Maksym Andriushchenko; Francesco Croce; Nicolas Flammarion; | arxiv-cs.CR | 2024-04-02 |
| 484 | A Transformer Model for Retrosynthesis IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We describe a Transformer model for a retrosynthetic reaction prediction task. The model is trained on 45 033 experimental reaction examples extracted from USA patents. It can … |
Pavel Karpov; Guillaume Godin; Igor V. Tetko; | 2019-01-01 | |
| 485 | Meta-Transformer: A Unified Framework for Multimodal Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a framework, named Meta-Transformer, that leverages a frozen encoder to perform multimodal perception without any paired multimodal training data. |
YIYUAN ZHANG et. al. | arxiv-cs.CV | 2023-07-20 |
| 486 | Recognizing Emotion Cause in Conversations IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Method: We introduce the task of Recognizing Emotion Cause in CONversations with an accompanying dataset named RECCON, containing over 1,000 dialogues and 10,000 utterance cause-effect pairs. We address the problem of recognizing emotion cause in conversations, define two novel sub-tasks of this problem, and provide a corresponding dialogue-level dataset, along with strong Transformer-based baselines. |
SOUJANYA PORIA et. al. | arxiv-cs.CL | 2020-12-21 |
| 487 | Fine-tuning Pre-Trained Transformer Language Models To Distantly Supervised Relation Extraction IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this gap, we utilize a pre-trained language model, the OpenAI Generative Pre-trained Transformer (GPT) (Radford et al., 2018). |
Christoph Alt; Marc Hübner; Leonhard Hennig; | acl | 2019-07-28 |
| 488 | Fine-tuning Pre-Trained Transformer Language Models To Distantly Supervised Relation Extraction IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this gap, we utilize a pre-trained language model, the OpenAI Generative Pre-trained Transformer (GPT) [Radford et al., 2018]. |
Christoph Alt; Marc Hübner; Leonhard Hennig; | arxiv-cs.CL | 2019-06-19 |
| 489 | Cross-Lingual BERT Transformation For Zero-Shot Dependency Parsing IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Cross-Lingual BERT Transformation (CLBT), a simple and efficient approach to generate cross-lingual contextualized word embeddings based on publicly available pre-trained BERT models (Devlin et al., 2018). |
Yuxuan Wang; Wanxiang Che; Jiang Guo; Yijia Liu; Ting Liu; | emnlp | 2019-11-02 |
| 490 | Planning with Large Language Models for Code Generation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel Transformer decoding algorithm, Planning-Guided Transformer Decoding (PG-TD), that uses a planning algorithm to do lookahead search and guide the Transformer to generate better programs. |
SHUN ZHANG et. al. | arxiv-cs.LG | 2023-03-09 |
| 491 | Do Transformer Modifications Transfer Across Implementations and Applications? IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we comprehensively evaluate many of these modifications in a shared experimental setting that covers most of the common uses of the Transformer in natural language processing. |
SHARAN NARANG et. al. | arxiv-cs.LG | 2021-02-23 |
| 492 | XLM-E: Cross-lingual Language Model Pre-training Via ELECTRA IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce ELECTRA-style tasks to cross-lingual language model pre-training. |
ZEWEN CHI et. al. | acl | 2022-05-17 |
| 493 | GPT (Generative Pre-Trained Transformer) — A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The Generative Pre-trained Transformer (GPT) represents a notable breakthrough in the domain of natural language processing, which is propelling us toward the development of … |
GOKUL YENDURI et. al. | IEEE Access | 2023-05-11 |
| 494 | Application of Geometric Programming to Transformer Design IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper considers the transformer design optimization problem. In its most general form, the design problem requires minimizing the total mass (or cost) of the core and wire … |
R.A. Jabr; | IEEE Transactions on Magnetics | 2005-01-01 |
| 495 | Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Additionally, they do not possess the ability to evaluate based on custom evaluation criteria, focusing instead on general attributes like helpfulness and harmlessness. To address these issues, we introduce Prometheus 2, a more powerful evaluator LM than its predecessor that closely mirrors human and GPT-4 judgements. |
SEUNGONE KIM et. al. | arxiv-cs.CL | 2024-05-02 |
| 496 | Do Transformer Modifications Transfer Across Implementations and Applications? IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we comprehensively evaluate many of these modifications in a shared experimental setting that covers most of the common uses of the Transformer in natural language processing. |
SHARAN NARANG et. al. | emnlp | 2021-11-05 |
| 497 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To achieve this, we introduce HyperCLOVA, a Korean variant of 82B GPT-3 trained on a Korean-centric corpus of 560B tokens. |
BOSEOP KIM et. al. | arxiv-cs.CL | 2021-09-09 |
| 498 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To achieve this, we introduce HyperCLOVA, a Korean variant of 82B GPT-3 trained on a Korean-centric corpus of 560B tokens. |
BOSEOP KIM et. al. | emnlp | 2021-11-05 |
| 499 | Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models And Their Human- And Machine-based Detection IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that a low-skilled threat model can be built just by combining publicly available LMs and show that the produced fake reviews can fool both humans and machines. |
DAVID IFEOLUWA ADELANI et. al. | arxiv-cs.CL | 2019-07-22 |
| 500 | Language Models With Transformers IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore effective Transformer architectures for language modeling, including adding additional LSTM layers to better capture the sequential context while still keeping the computation efficient. |
Chenguang Wang; Mu Li; Alexander J. Smola; | arxiv-cs.CL | 2019-04-20 |