Most Influential ECCV Papers (2025-09 Version)
The European Conference on Computer Vision (ECCV) is one of the top computer vision conferences in the world. The Paper Digest Team analyzes all papers published at ECCV over the years and presents the 15 most influential papers for each year. This ranking is constructed automatically from citations in both research papers and granted patents, and it is updated frequently to reflect the most recent changes. To find the latest version of this list, or the most influential papers from other conferences and journals, please visit the Best Paper Digest page. Note: the most influential papers may or may not include those that won best paper awards. (Version: 2025-09)
To search or review ECCV papers on a specific topic, please use the search by venue (ECCV) and review by venue (ECCV) services. To browse the most productive ECCV authors, ranked by the number of accepted papers, see the most productive ECCV authors grouped by year.
This list is created by the Paper Digest Team. Experience the cutting-edge capabilities of Paper Digest, an innovative AI-powered research platform that empowers you to read articles, write articles, get answers, conduct literature reviews and generate research reports.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Most Influential ECCV Papers (2025-09 Version)
| Year | Rank | Paper | Author(s) |
|---|---|---|---|
| 2024 | 1 | Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection (IF: 8). Highlight: In this paper, we develop an open-set object detector, called Grounding DINO, by marrying Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs such as category names or referring expressions. | Shilong Liu et al. |
| 2024 | 2 | YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information (IF: 8). Highlight: We proposed the concept of programmable gradient information (PGI) to cope with the various changes required by deep networks to achieve multiple objectives. | Chien-Yao Wang; I-Hau Yeh; Hong-Yuan Mark Liao |
| 2024 | 3 | MMBench: Is Your Multi-Modal Model An All-around Player? (IF: 8). Highlight: Meanwhile, subjective benchmarks, such as OwlEval, offer comprehensive evaluations of a model’s abilities by incorporating human labor, which is not scalable and may display significant bias. In response to these challenges, we propose MMBench, a bilingual benchmark for assessing the multi-modal capabilities of VLMs. | Yuan Liu et al. |
| 2024 | 4 | ShareGPT4V: Improving Large Multi-Modal Models with Better Captions (IF: 7). Highlight: In this paper, we delve into the influence of training data on LMMs, uncovering three pivotal findings: 1) Highly detailed captions enable more nuanced vision-language alignment, significantly boosting the performance of LMMs in diverse benchmarks, surpassing outcomes from brief captions or VQA data; 2) Cutting-edge LMMs can be close to the captioning capability of costly human annotators, and open-source LMMs could reach similar quality after lightweight fine-tuning; 3) The performance of LMMs scales with the number of detailed captions, exhibiting remarkable improvements across a range from thousands to millions of captions. | Lin Chen et al. |
| 2024 | 5 | LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation (IF: 6). Highlight: In this paper, we introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. | Jiaxiang Tang et al. |
| 2024 | 6 | Adversarial Diffusion Distillation (IF: 6). Highlight: We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1–4 steps while maintaining high image quality. | Axel Sauer; Dominik Lorenz; Andreas Blattmann; Robin Rombach |
| 2024 | 7 | LLaMA-VID: An Image Is Worth 2 Tokens in Large Language Models (IF: 6). Highlight: In this work, we present a novel method to tackle the token generation challenge in Vision Language Models (VLMs) for video and image understanding, called LLaMA-VID. | Yanwei Li; Chengyao Wang; Jiaya Jia |
| 2024 | 8 | MambaIR: A Simple Baseline for Image Restoration with State-Space Model (IF: 6). Highlight: In this work, we introduce a simple but effective baseline, named MambaIR, which introduces both local enhancement and channel attention to improve the vanilla Mamba. | Hang Guo et al. |
| 2024 | 9 | CoTracker: It Is Better to Track Together (IF: 6). Highlight: We introduce CoTracker, a transformer-based model that tracks a large number of 2D points in long video sequences. | Nikita Karaev et al. |
| 2024 | 10 | DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors (IF: 6). Highlight: Traditional image animation techniques mainly focus on animating natural scenes with stochastic dynamics (e.g. clouds and fluid) or domain-specific motions (e.g. human hair or body motions), which limits their applicability to more general visual content. To overcome this limitation, we explore the synthesis of dynamic content for open-domain images, converting them into animated videos. | Jinbo Xing et al. |
| 2024 | 11 | MathVerse: Does Your Multi-modal LLM Truly See The Diagrams in Visual Math Problems? (IF: 6). Highlight: To this end, we introduce MathVerse, an all-around visual math benchmark designed for an equitable and in-depth evaluation of MLLMs. We meticulously collect 2,612 high-quality, multi-subject math problems with diagrams from publicly available sources. | Renrui Zhang et al. |
| 2024 | 12 | VideoMamba: State Space Model for Efficient Video Understanding (IF: 5). Highlight: Addressing the dual challenges of local redundancy and global dependencies in video understanding, this work innovatively adapts the Mamba to the video domain. | Kunchang Li et al. |
| 2024 | 13 | MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images (IF: 5). Highlight: We introduce MVSplat, an efficient model that, given sparse multi-view images as input, predicts clean feed-forward 3D Gaussians. | Yuedong Chen et al. |
| 2024 | 14 | Grounding Image Matching in 3D with MASt3R (IF: 5). Highlight: In this work, we take a different stance and propose to cast matching as a 3D task with DUSt3R, a recent and powerful 3D reconstruction framework based on Transformers. | Vincent Leroy; Yohann Cabon; Jerome Revaud |
| 2024 | 15 | SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers (IF: 5). Highlight: We present Scalable Interpolant Transformers (SiT), a family of generative models built on the backbone of Diffusion Transformers (DiT). | Nanye Ma et al. |
| 2022 | 1 | Visual Prompt Tuning (IF: 8). Highlight: This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. | Menglin Jia et al. |
| 2022 | 2 | ByteTrack: Multi-Object Tracking By Associating Every Detection Box (IF: 9). Highlight: The objects with low detection scores, e.g. occluded objects, are simply thrown away, which brings non-negligible true object missing and fragmented trajectories. To solve this problem, we present a simple, effective and generic association method, tracking by associating almost every detection box instead of only the high score ones. | Yifu Zhang et al. |
| 2022 | 3 | BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images Via Spatiotemporal Transformers (IF: 8). Highlight: In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. | Zhiqi Li et al. |
| 2022 | 4 | TensoRF: Tensorial Radiance Fields (IF: 8). Highlight: We present TensoRF, a novel approach to model and reconstruct radiance fields. | Anpei Chen; Zexiang Xu; Andreas Geiger; Jingyi Yu; Hao Su |
| 2022 | 5 | Simple Baselines for Image Restoration (IF: 7). Highlight: In this paper, we propose a simple baseline that exceeds the SOTA methods and is computationally efficient. | Liangyu Chen; Xiaojie Chu; Xiangyu Zhang; Jian Sun |
| 2022 | 6 | Exploring Plain Vision Transformer Backbones for Object Detection (IF: 7). Highlight: We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for object detection. | Yanghao Li; Hanzi Mao; Ross Girshick; Kaiming He |
| 2022 | 7 | MaxViT: Multi-axis Vision Transformer (IF: 7). Highlight: In this paper we introduce an efficient and scalable attention model we call multi-axis attention, which consists of two aspects: blocked local and dilated global attention. | Zhengzhong Tu et al. |
| 2022 | 8 | Detecting Twenty-Thousand Classes Using Image-Level Supervision (IF: 7). Highlight: We propose Detic, which simply trains the classifiers of a detector on image classification data and thus expands the vocabulary of detectors to tens of thousands of concepts. | Xingyi Zhou; Rohit Girdhar; Armand Joulin; Philipp Krähenbühl; Ishan Misra |
| 2022 | 9 | A-OKVQA: A Benchmark for Visual Question Answering Using World Knowledge (IF: 7). Highlight: We introduce A-OKVQA, a crowdsourced dataset composed of a diverse set of about 25K questions requiring a broad base of commonsense and world knowledge to answer. | Dustin Schwenk; Apoorv Khandelwal; Christopher Clark; Kenneth Marino; Roozbeh Mottaghi |
| 2022 | 10 | PETR: Position Embedding Transformation for Multi-View 3D Object Detection (IF: 7). Highlight: In this paper, we develop position embedding transformation (PETR) for multi-view 3D object detection. | Yingfei Liu; Tiancai Wang; Xiangyu Zhang; Jian Sun |
| 2022 | 11 | Compositional Visual Generation with Composable Diffusion Models (IF: 7). Highlight: In this paper, we propose an alternative structured approach for compositional generation using diffusion models. | Nan Liu; Shuang Li; Yilun Du; Antonio Torralba; Joshua B. Tenenbaum |
| 2022 | 12 | DualPrompt: Complementary Prompting for Rehearsal-Free Continual Learning (IF: 7). Highlight: In this work, we present a simple yet effective framework, DualPrompt, which learns a tiny set of parameters, called prompt, to properly instruct a pre-trained model to learn tasks arriving sequentially, without buffering past examples. | Zifeng Wang et al. |
| 2022 | 13 | Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework (IF: 7). Highlight: The current popular two-stream, two-stage tracking framework extracts the template and the search region features separately and then performs relation modeling; thus the extracted features lack awareness of the target and have limited target-background discriminability. To tackle this issue, we propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling by bridging the template-search image pairs with bidirectional information flows. | Botao Ye; Hong Chang; Bingpeng Ma; Shiguang Shan; Xilin Chen |
| 2022 | 14 | Make-a-Scene: Scene-Based Text-to-Image Generation with Human Priors (IF: 7). Highlight: While these methods have incrementally improved the generated image fidelity and text relevancy, several pivotal gaps remain unanswered, limiting applicability and quality. We propose a novel text-to-image method that addresses these gaps by (i) enabling a simple control mechanism complementary to text in the form of a scene, (ii) introducing elements that substantially improve the tokenization process by employing domain-specific knowledge over key image regions (faces and salient objects), and (iii) adapting classifier-free guidance for the transformer use case. | Oran Gafni et al. |
| 2022 | 15 | MOTR: End-to-End Multiple-Object Tracking with TRansformer (IF: 7). Highlight: In this paper, we propose MOTR, which extends DETR and introduces “track query” to model the tracked instances in the entire video. | Fangao Zeng et al. |
| 2020 | 1 | End-to-End Object Detection With Transformers (IF: 9). Highlight: We present a new method that views object detection as a direct set prediction. | Nicolas Carion et al. |
| 2020 | 2 | NeRF: Representing Scenes As Neural Radiance Fields For View Synthesis (IF: 9). Highlight: We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. | Ben Mildenhall et al. |
| 2020 | 3 | RAFT: Recurrent All-Pairs Field Transforms For Optical Flow (IF: 8). Highlight: We introduce Recurrent All-Pairs Field Transforms (RAFT), a new deep network architecture for estimating optical flow. | Zachary Teed; Jia Deng |
| 2020 | 4 | Contrastive Multiview Coding (IF: 9). Highlight: We study this hypothesis under the framework of multiview contrastive learning, where we learn a representation that aims to maximize mutual information between different views of the same scene but is otherwise compact. | Yonglong Tian; Dilip Krishnan; Phillip Isola |
| 2020 | 5 | UNITER: UNiversal Image-TExt Representation Learning (IF: 8). Highlight: In this paper, we introduce UNITER, a UNiversal Image-TExt Representation, learned through large-scale pre-training over four image-text datasets (COCO, Visual Genome, Conceptual Captions, and SBU Captions), which can power heterogeneous downstream V+L tasks with joint multimodal embeddings. | Yen-Chun Chen et al. |
| 2020 | 6 | Oscar: Object-Semantics Aligned Pre-training For Vision-Language Tasks (IF: 8). Highlight: While existing methods simply concatenate image region features and text features as input to the model to be pre-trained and use self-attention to learn image-text semantic alignments in a brute force manner, in this paper, we propose a new learning method Oscar, which uses object tags detected in images as anchor points to significantly ease the learning of alignments. | Xiujun Li et al. |
| 2020 | 7 | Object-Contextual Representations For Semantic Segmentation (IF: 8). Highlight: In this paper, we address the semantic segmentation problem with a focus on the context aggregation strategy. | Yuhui Yuan; Xilin Chen; Jingdong Wang |
| 2020 | 8 | Contrastive Learning For Unpaired Image-to-Image Translation (IF: 8). Highlight: We propose a straightforward method for doing so: maximizing mutual information between the two, using a framework based on contrastive learning. | Taesung Park; Alexei A. Efros; Richard Zhang; Jun-Yan Zhu |
| 2020 | 9 | Big Transfer (BiT): General Visual Representation Learning (IF: 8). Highlight: We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT). | Alexander Kolesnikov et al. |
| 2020 | 10 | Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs By Implicitly Unprojecting To 3D (IF: 8). Highlight: We propose a new end-to-end architecture that directly extracts a bird’s-eye-view representation of a scene given image data from an arbitrary number of cameras. | Jonah Philion; Sanja Fidler |
| 2020 | 11 | Tracking Objects As Points (IF: 8). Highlight: In this paper, we present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art. | Xingyi Zhou; Vladlen Koltun; Philipp Krähenbühl |
| 2020 | 12 | Square Attack: A Query-efficient Black-box Adversarial Attack Via Random Search (IF: 8). Highlight: We propose the Square Attack, a score-based black-box $l_2$- and $l_\infty$-adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking. | Maksym Andriushchenko; Francesco Croce; Nicolas Flammarion; Matthias Hein |
| 2020 | 13 | Convolutional Occupancy Networks (IF: 8). Highlight: In this paper, we propose Convolutional Occupancy Networks, a more flexible implicit representation for detailed reconstruction of objects and 3D scenes. | Songyou Peng; Michael Niemeyer; Lars Mescheder; Marc Pollefeys; Andreas Geiger |
| 2020 | 14 | Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data (IF: 8). Highlight: Towards this end, we present Trajectron++, a modular, graph-structured recurrent model that forecasts the trajectories of a general number of diverse agents while incorporating agent dynamics and heterogeneous data (e.g., semantic maps). | Tim Salzmann; Boris Ivanovic; Punarjay Chakravarty; Marco Pavone |
| 2020 | 15 | Single Path One-Shot Neural Architecture Search With Uniform Sampling (IF: 8). Highlight: This work proposes a Single Path One-Shot model to address the challenge in training. | Zichao Guo et al. |
| 2018 | 1 | CBAM: Convolutional Block Attention Module (IF: 9). Highlight: We propose Convolutional Block Attention Module (CBAM), a simple and effective attention module that can be integrated with any feed-forward convolutional neural network. | Sanghyun Woo; Jongchan Park; Joon-Young Lee; In So Kweon |
| 2018 | 2 | Encoder-Decoder With Atrous Separable Convolution For Semantic Image Segmentation (IF: 9). Highlight: In this work, we propose to combine the advantages from both methods. | Liang-Chieh Chen; Yukun Zhu; George Papandreou; Florian Schroff; Hartwig Adam |
| 2018 | 3 | ShuffleNet V2: Practical Guidelines For Efficient CNN Architecture Design (IF: 9). Highlight: Taking these factors into account, this work proposes practical guidelines for efficient network design. | Ningning Ma; Xiangyu Zhang; Hai-Tao Zheng; Jian Sun |
| 2018 | 4 | Image Super-Resolution Using Very Deep Residual Channel Attention Networks (IF: 9). Highlight: To solve these problems, we propose the very deep residual channel attention networks (RCAN). | Yulun Zhang et al. |
| 2018 | 5 | Group Normalization (IF: 9). Highlight: In this paper, we present Group Normalization (GN) as a simple alternative to BN. | Yuxin Wu; Kaiming He |
| 2018 | 6 | CornerNet: Detecting Objects As Paired Keypoints (IF: 9). Highlight: We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolutional neural network. | Hei Law; Jia Deng |
| 2018 | 7 | Multimodal Unsupervised Image-to-image Translation (IF: 9). Highlight: To address this limitation, we propose a Multimodal Unsupervised Image-to-image Translation (MUNIT) framework. | Xun Huang; Ming-Yu Liu; Serge Belongie; Jan Kautz |
| 2018 | 8 | BiSeNet: Bilateral Segmentation Network For Real-time Semantic Segmentation (IF: 9). Highlight: In this paper, we address this dilemma with a novel Bilateral Segmentation Network (BiSeNet). | Changqian Yu et al. |
| 2018 | 9 | Progressive Neural Architecture Search (IF: 9). Highlight: We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms. | Chenxi Liu et al. |
| 2018 | 10 | Image Inpainting For Irregular Holes Using Partial Convolutions (IF: 9). Highlight: We propose to use partial convolutions, where the convolution is masked and renormalized to be conditioned on only valid pixels. | Guilin Liu et al. |
| 2018 | 11 | Unified Perceptual Parsing For Scene Understanding (IF: 8). Highlight: In this paper, we study a new task called Unified Perceptual Parsing, which requires the machine vision systems to recognize as many visual concepts as possible from a given image. | Tete Xiao; Yingcheng Liu; Bolei Zhou; Yuning Jiang; Jian Sun |
| 2018 | 12 | Deep Clustering For Unsupervised Learning Of Visual Features (IF: 9). Highlight: In this work, we present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features. | Mathilde Caron; Piotr Bojanowski; Armand Joulin; Matthijs Douze |
| 2018 | 13 | Simple Baselines For Human Pose Estimation And Tracking (IF: 9). Highlight: This work provides simple and effective baseline methods. | Bin Xiao; Haiping Wu; Yichen Wei |
| 2018 | 14 | Memory Aware Synapses: Learning What (not) To Forget (IF: 8). Highlight: In this paper, we argue that, given the limited model capacity and the unlimited new information to be learned, knowledge has to be preserved or erased selectively. | Rahaf Aljundi; Francesca Babiloni; Mohamed Elhoseiny; Marcus Rohrbach; Tinne Tuytelaars |
| 2018 | 15 | ICNet For Real-Time Semantic Segmentation On High-Resolution Images (IF: 9). Highlight: We focus on the challenging task of real-time semantic segmentation in this paper. | Hengshuang Zhao; Xiaojuan Qi; Xiaoyong Shen; Jianping Shi; Jiaya Jia |