Paper Digest: ICASSP 2025 Papers & Highlights
Note: ICASSP-2025 accepted more than 3,300 papers; this page includes only 300 of them, selected by paper ID in the proceedings. Interested users can read all ~3,300 ICASSP-2025 papers on a separate page, which takes quite some time to load.
To search for papers presented at ICASSP-2025 on a specific topic, use the search by venue (ICASSP-2025) service. To summarize the latest research published at ICASSP-2025 on a specific topic, use the review by venue (ICASSP-2025) service.
We’ve developed a service, ICASSP-2025 Research, that synthesizes the latest findings from ICASSP 2025 into comprehensive reports. For instance, we’ve generated a report on Advances in Accent Softening: Insights from ICASSP 2025 Papers. Interested users can use this service to create tailored reports on other emerging topics.
This curated list was created by the Paper Digest Team. Paper Digest is an AI-powered research platform that delivers personalized, comprehensive updates on the latest research in your field. It also lets you read and write articles, get answers, conduct literature reviews, and generate research reports.
TABLE 1: Paper Digest: ICASSP 2025 Papers & Highlights
# | Paper | Author(s) |
---|---|---|
1 | HypCAD: Geometry-Enhanced Hyperbolic Contrastive Learning for CAD Model Retrieval. Highlight: To address the limitations of Euclidean space and improve CAD model retrieval, this paper introduces HypCAD, a contrastive learning framework in hyperbolic space. | A. Misik; D. Salihu; X. Su; H. Brock; E. Steinbach |
2 | Attention Augmented Structure-centric Bias Mitigation with Feature Disentanglement. Highlight: In this paper, we propose an attention-augmented, structure-centric bias mitigation method, considering that the network architecture can be flexibly manipulated to address a variety of visual features. | X. Hou; Y. Li; S. Wang |
3 | Context-Guided Active Domain Adaptation for Blended Target Domain. Highlight: However, most ADA methods are still mainly oriented to a single target domain and are not applicable to the blended target domain. To tackle these issues, we propose a concise and effective approach named Context-Guided Active Domain Adaptation (CGDA) to achieve active blended-target domain adaptation (BTDA). | Y. Lu; Y. Yang |
4 | Enhancing Graph-based Fraud Detection By Adversarial Confidence Reweighting. Highlight: However, this aggregation process can sometimes introduce noise by incorporating neighbors from different categories, potentially diluting the central node’s representation. To tackle this issue, we introduce an innovative Adversarial Confidence Reweighting (ACR) technique designed to allocate discriminative weights to samples automatically. | J. Gao; J. Cao; S. Qian; W. Guan |
5 | Attention Disentanglement for Semantic Diffusion Modeling in Text-to-Image Generation. Highlight: This paper presents an attention disentanglement method for semantic diffusion that enhances semantic consistency in text-to-image generation. | H. -C. Yu; J. -T. Chien |
6 | Weighted Density for The Win: Accurate Subspace Density Clustering. Highlight: k-clustering typically struggles to detect irregularly distributed clusters due to its natural bias, while density clustering usually cannot adapt well to different datasets and clustering tasks because it is not an oriented optimization process. This paper, therefore, proposes to perform density clustering in dynamically learned subspaces. | M. Peng; Y. Wu; Y. Lu; M. Li; Y. Zhang; Y. -M. Cheung |
7 | FedTLU: Federated Learning with Targeted Layer Updates. Highlight: This paper proposes a targeted layer update strategy for fine-tuning in FL. | J. -I. Park; C. Joe-Wong |
8 | Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio. Highlight: To further boost performance, we propose multiple forms of guidance for prompt learning without annotated labels. | G. Chen; H. Zhang; C. Ding; Z. Chen; X. Di |
9 | Animation Anycolor: Enhancing Line Drawing Colorization with Keypoint Matching. Highlight: These issues are primarily attributed to the inadequate semantic correspondence between the reference images and line drawings. To tackle this problem, we introduce the Animation Anycolor framework. | L. Wang |
10 | In-Context Multitask Learning for Few-shot Fine-tuning of Large Language Models in Traditional Chinese Medicine Tongue Diagnosis. Highlight: The resource-intensive process of creating large-scale labeled datasets calls for efficient training methods on small datasets. Addressing these issues, this paper presents an In-Context Multitask Learning approach to improve few-shot fine-tuning accuracy for Large Language Models (LLMs) in constitution diagnosis. | C. Fu; Z. Fu; S. Yan; X. Lyv; Y. Zhao |
11 | Identification and Correction of Permutation Errors in Compressed Sensing-Based Group Testing. Highlight: This is called ‘permutation noise’ and it presents challenges in determining the signal vector containing p health status values of the participating subjects from the results on n ≪ p pooled tests. In this paper, we present a method to determine the health status values in a manner that is robust to a small number of such permutations. | S. Banerjee; S. Peddabomma; R. Srivastava; J. Saunderson; A. Rajwade |
12 | RestorMamba: An Enhanced Synergistic State Space Model for Image Restoration. Highlight: In this paper, we introduce an image inpainting method based on the State Space Model (SSM), named Restoration Mamba (RestorMamba). | Z. Wang; C. Li; H. Xu; X. Zhu; X. Huang; H. Li |
13 | Improving Knowledge Base Question Answering Via Retrieval Enhancement and Stepwise Reasoning. Highlight: In this paper, we introduce a novel retrieval enhancement and stepwise reasoning (RESR) method, which transforms path retrieval into text semantic understanding to minimize unnecessary interference from path information in the reasoning process, guiding the LLM to generate interpretable reasoning paths rather than directly producing answers. | D. Huang; J. Gao; X. Luo; H. Wu |
14 | MambaInst: Lightweight State Space Model for Real-Time Instance Segmentation. Highlight: In this paper, we propose a lightweight and efficient state-space model-based instance segmentation network named MambaInst, which extracts deep semantic features through a LightSSM Block consisting of gating mechanisms and residual connections to model long-distance spatial dependencies with linear computational complexity. | Z. Wang; C. Li; H. Xu; X. Zhu; X. Huang; H. Li |
15 | Robust Qualitative Data Clustering Via Learnable Multi-Metric Space Fusion. Highlight: Unlike numerical data, whose quantitative values are embedded in a well-defined Euclidean distance space, the distances between qualitative values are naturally unknown and are specially defined for certain data types or tasks. This paper, therefore, proposes a distance metric space fusion framework, which learns to fuse multiple distance metrics to form a statistical information-complete and prior knowledge-comprehensive metric for robust and accurate cluster analysis of qualitative data. | S. Feng; M. Zhao; Z. Huang; Y. Ji; Y. Zhang; Y. -M. Cheung |
16 | Improving Dialect Identification in Indian Languages Using Multimodal Features from Dialect Informed ASR. Highlight: This work introduces a novel multimodal architecture that leverages speech and text features to enhance dialect identification (DID) performance. | Amartyaveer |
17 | Gram: A Large-Scale General EEG Model for Raw Data Classification and Restoration Tasks. Highlight: In this paper, we propose Gram, a large general EEG model for raw EEG data classification and reconstruction tasks. | Z. Li; W. -L. Zheng; B. -L. Lu |
18 | Content-Aware Dynamic Superpixel Segmentation. Highlight: Inspired by the visual attention model based on saliency in the human visual system, we propose a content-aware dynamic superpixel segmentation network. | T. Zhao; B. Peng; Z. Zhang; D. Yang; X. Wu |
19 | Online Optimization of Offloading Video Analytics Tasks to Multiple Edges for Accuracy Maximization. Highlight: We present a detect + track approach with on-device object tracking and edge-assisted object detection. | Y. Liang; S. Zhang; J. Wu |
20 | A Risk Prediction Model for Real Estate Corporations Using High-Target Semantic BERT and Improved GRU. Highlight: This paper proposes HRAGRU, a novel prediction model that forecasts potential risk for real estate enterprises from multimodal data, including news reports, policy updates, and stock information. | X. Ma; P. Zhu; Q. Liu; Z. Wang |
21 | Multi-domain Fusion Network for Underwater Image Enhancement. Highlight: This paper proposes a novel network for underwater optical image enhancement. | J. Zhuang; J. Zhou; Y. Zheng; Y. Chang; S. Mazhar |
22 | KARST: Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission for Visual Classification. Highlight: Despite their advantages, they struggle with limited representation capabilities and misalignment with pre-trained intermediate features. To address these issues, we introduce an innovative Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission (KARST) for various recognition tasks. | Y. Zhu; H. Diao; S. Gao; L. Chen; H. Lu |
23 | Meta-MMD Fusion: Enhancing Cross-Subject Motor Imagery Classification. Highlight: In particular, feature alignment based on deep learning in zero-calibration cross-subject frameworks remains inadequately explored. To address these issues, we propose the Meta-MMD, a novel cross-subject MI classification method that integrates meta-learning and maximum mean discrepancy (MMD) strategies. | M. Chen; C. Qu; J. Pan |
24 | A MoE Multimodal Graph Attention Network Framework for Multimodal Emotion Recognition. Highlight: In this paper, we propose a Mixture of Experts Multimodal Graph Attention Network Framework for Multimodal Emotion Recognition (MMGAT-EMO), which combines Multimodal Graph Attention Network (Multimodal GAT) with Mixture of Experts (MoE). | C. Zhang; Y. Liu; B. Cheng |
25 | Efficient and Expandable Token-Level Approach for Multi-Domain Sensitive Information Classification. Highlight: Additionally, supervised named entity recognition is not feasible for multi-domain sensitive information classification due to numerous type-specific annotations and the scarcity of training data. This work presents ToSIC, a new sensitive information classification approach to tackle these problems. | H. Li; J. Ye; J. Wu; L. Zu |
26 | Follow-Your-MultiPose: Tuning-Free Multi-Character Text-to-Video Generation Via Pose Guidance. Highlight: However, existing approaches mainly focus on single-object video generation with pose guidance, ignoring the realistic situation in which multiple characters appear concurrently in a scene. To tackle this, we propose a novel tuning-free multi-character video generation framework based on separated text and pose guidance. | B. Zhang; Y. Ma; C. Fu; X. Song; Z. Sun; Z. Li |
27 | Role-Specific Reward Design with Large Language Model for StarCraft II. Highlight: Impressed by the remarkable power of LLMs, this paper employs LLMs as role-specific reward designers for playing StarCraft II, making rewards more flexible and task-oriented. | S. Li; H. Lou; X. Zhang; X. Zeng; Z. Shen; T. Li |
28 | WebSurfer: Enhancing LLM Agents with Web-Wise Feedback for Web Navigation. Highlight: They also face inefficiencies in multi-task scenarios due to handcrafted exemplars and encounter error accumulation in long-horizon tasks, exacerbated by web-specific complexities like nested structures and interactive elements. To address these issues, we introduce WebSurfer, a novel web agent designed to filter, learn, and adapt in complex environments. | D. Hu; J. Ge; W. Tang; G. Li; L. Li; B. Wu |
29 | Exploiting Foundation Models for Label-Efficient Few-Shot Learning Via Feature Coupling: A Case Study of Cardiac CT Segmentation. Highlight: The scarcity of labeled data poses a significant challenge for deep learning-based medical image segmentation. To address this, this study introduces the novel Foundation Model-based Few-Shot Segmentation (FM-FSS) paradigm. | W. Chen; C. Li; W. Zhou; Y. Li; T. Guo; Y. Tang |
30 | Filtering Resistant Large Language Model Watermarking Via Style Injection. Highlight: In this paper, we propose a novel filtering-resistant LLM watermarking scheme, which takes advantage of imperceptible text styles to trigger the watermark. | Z. Guo; G. Li; J. Huang; X. Zhang; Z. Qian; S. Li |
31 | CSS: Overcoming Pose and Scene Challenges in Crowd-Sourced 3D Gaussian Splatting. Highlight: We introduce Crowd-Sourced Splatting (CSS), a novel 3D Gaussian Splatting (3DGS) pipeline designed to overcome the challenges of pose-free scene reconstruction using crowd-sourced imagery. | R. Chen |
32 | PDCE: Patch-wise Dynamic Curve Estimation for Low-Light Image Enhancement. Highlight: Traditional CE-based methods struggle with issues such as uniform processing across different regions, static parameter estimation, and lack of effective global semantic enhancement. To address these limitations, we propose a novel unsupervised learning framework, Patch-wise Dynamic Curve Estimation (PDCE), which dynamically adjusts and optimizes enhancement curves according to local patch brightness and the iteration process. | R. Chen; Z. Li; H. Zeng; Y. Liu; T. He; T. Song |
33 | AnimateSketches: Animate Sketches with Instance-Aware Mask. Highlight: In this work, we propose AnimateSketches, a novel optimization-based framework that focuses on animating complex vectorized sketches. | H. Deng; X. Dai; J. Hu; Y. Qi |
34 | Detecting OOD Samples Via Optimal Transport Scoring Function. Highlight: Thus, in this study, we propose a novel score function for OOD detection, named OTOD, based on optimal transport theory. | H. Gao; Z. He; J. Pu |
35 | PGD-Imp: Rethinking and Unleashing Potential of Classic PGD with Dual Strategies for Imperceptible Adversarial Attacks. Highlight: In this paper, we rethink the essence of imperceptible attacks and propose two simple yet effective strategies to unleash the potential of PGD, the common and classical attack, for imperceptibility from an optimization perspective. | J. Li; Z. Yu; Z. He; Z. J. Wang; X. Kang |
36 | Easy, Interpretable, Effective: OpenSMILE for Voice Deepfake Detection. Highlight: In this paper, we demonstrate that attacks in the latest ASVspoof5 dataset (a de facto standard in the field of voice authenticity and deepfake detection) can be identified with surprising accuracy using a small subset of very simple features. | O. Pascu; D. Oneaţă; H. Cucu; N. Müller |
37 | A Divide-and-conquer Approach for Sparse Recovery in High Dimensions. Highlight: This paper presents a Welch bound-based guarantee on the reconstruction error with BCS, revealing that sparse recovery deteriorates with more partitions. To address this performance loss, we propose a data-driven BCS technique that leverages correlation across signal partitions. | A. Bevelander; K. Batselier; N. J. Myers |
38 | Imitating Human Selective Attention Using Dual Policy Network for Scanpath Prediction. Highlight: However, most existing scanpath models ignore the internal sub-stages in visual search and do not imitate human-like selective attention. To bridge this gap, a novel inverse reinforcement learning model with a dual policy network (DPNet) is proposed to accurately predict how humans select and shift their attention at different stages of a task. | K. Zhang; G. Tong; X. Zhang |
39 | GPPT: Gaussian Process-infused Prompt Tuning for Vision-language Models. Highlight: However, the reliability of fine-tuned VLMs in safety-critical scenarios remains a concern due to the under-explored issue of confidence calibration. To address this limitation, we introduce Gaussian Process-infused Prompt Tuning (GPPT), a novel framework that integrates a Gaussian process into the hidden representations of images. | S. Si; H. Sun; J. Gu |
40 | AS-Net: Adaptive Style-aware Network for Handwritten Text Generation. Highlight: In this paper, we propose a novel Adaptive Style-aware Network (AS-Net) to address the challenging HTG task. | Y. Wang; H. Wei; H. Wang |
41 | SmartExp: An Adaptive Data Expansion Strategy for Improving Handwritten Text Recognition. Highlight: We reveal a balancing mechanism in HTR: stronger data expansion should be paired with weaker data augmentation. Informed by these insights, we propose SmartExp, a purely data-driven strategy without introducing any extra data or computational costs. | Y. Wang; H. Wei; S. Sun |
42 | Promoting PLM Fine-Tuning Through Consistency Adversarial Training. Highlight: However, this approach encounters challenges due to task conflicts between auxiliary tasks and specific downstream tasks. To overcome these issues, we introduce a novel strategy termed Consistency Adversarial Training (CAT). | J. Gao; J. Cao; J. Tang |
43 | Algorithm Design for Continual Learning in IoT Networks. Highlight: We propose a polynomial-time algorithm that achieves approximation ratios of $\frac{3}{2}$ for the underparameterized case and $\frac{3}{2} + r^{1-T}$ for the overparameterized case. | S. Hao; L. Duan |
44 | Harnessing Contrastive Learning and Neural Transformation for Time Series Anomaly Detection. Highlight: In this study, we propose a novel approach, CNT, that incorporates a window-based contrastive learning strategy fortified with learnable transformations. | J. Chen; M. Feng; T. S. Wirjanto |
45 | Knowledge Distillation for Image Restoration: Simultaneous Learning from Degraded and Clean Images. Highlight: However, its potential in image-to-image translation, particularly in image restoration, remains underexplored. To address this gap, we propose a Simultaneous Learning Knowledge Distillation (SLKD) framework tailored for model compression in image restoration tasks. | Y. Zhang; D. Yan |
46 | Hierarchical Similarity Loss Enhanced Depth and Structural Fidelity in Monocular RGB-to-Depth Mapping with Adversarial Training. Highlight: Additionally, commercial depth cameras on robots produce incomplete and light-sensitive depth maps, further degrading estimation quality. To surmount these issues, we created a comprehensive dataset of 9,600 RGB-Depth image pairs, capturing a range of indoor scenes under various lighting conditions (dim, normal, and strong lighting) and interference conditions (local strong light interference, specular reflection interference, background similarity interference, and combinations of these factors). | C. Fu |
47 | Global Context MambaVision for EEG-based Emotion Recognition. Highlight: However, existing SSMs face limitations in processing global information due to window constraints. Therefore, this paper introduces a novel Global Context (GC) MambaVision model, which combines the linear time complexity advantage of SSMs with a new type of local-global attention mechanism. | H. Wang; L. Xu; Y. Yu; W. Ding; Y. Xu |
48 | Automated Exposure Mapping for Networked Interference. Highlight: However, the handcrafted neighboring structures defined by such manual schemes struggle to capture the complex and flexible structures exhibited by real-world social networks. To bridge this gap, we propose an Automated Exposure Mapping Network (AEMNet), which captures networked interference conditions automatically with Graph Neural Networks (GNNs) and achieves the mapping with deep embedded clustering. | Y. Mao; H. Wang; Y. Cai; M. Li; J. Wang; W. Yang |
49 | BiMA: Bidimensional Multi-level Attention Embedded Network for Single-frame Infrared Small Target Detection. Abstract: As the depth of the detection network increases, features of small targets become less pronounced, and the model may inadvertently favor background clutter, thereby reducing the … | H. Deng; X. Yin; X. Lan |
50 | Revelio: A Real-World Screen-Camera Communication System with Visually Imperceptible Data Embedding. Highlight: We present ‘Revelio’, a real-world screen-camera communication system leveraging temporal flicker fusion in the OKLAB color space. | A. A. M. Nishar; S. Kudekar; B. Kintzing; A. Ashok |
51 | Bridging The Fairness Gap: Enhancing Pre-trained Models with LLM-Generated Sentences. Highlight: With the rise of large language models and their extensive knowledge, we propose enhancing fairness (Fair-Gender) in PLMs by absorbing coherent, attribute-balanced, and semantically rich sentences. | L. Yu; L. Guo; P. Kuang; F. Zhou |
52 | TGDrag: Adding Semantic Control Into Point-based Image Editing Via Text Guidance. Highlight: However, relying solely on point-based manipulations can lead to unintended outcomes because points inherently lack the user’s semantic intent. To address this issue, we introduce Text-Guided Drag (TGDrag), a novel approach to adding semantic control into point-based image editing by using text prompts to guide the manipulation of handle and target points. | C. Lin; Y. Zhu; Y. Miao; Z. Zhao; S. Liu; C. Shen |
53 | Active Visual Learning for Robots with Dueling Deep Q-Networks and Transformer Encoders. Highlight: Although current research has begun to explore how reinforcement learning can drive robots to actively perceive their environment, it often overlooks the critical role of integrating historical contextual information. To address this, we propose an innovative approach to active vision learning, designed to enhance target detection performance in unknown environments. | H. Zeng; P. Zhang; F. Li; Q. Yi; J. Wang; T. Ye |
54 | Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models. Highlight: In this study, we propose a new approach named Problem Elaboration Prompting (PEP) to enhance the mathematical capacities of LLMs. | H. Liao; J. Tian; S. Hu; Z. Zhu; H. He; Y. Jin |
55 | Efficient Infrared Image Super-Resolution Reconstruction Via Guided Filter Coefficients Estimation with Parallax Attention Mechanism. Highlight: Due to the spectral range mismatch between the images, building an efficient infrared (IR) image super-resolution algorithm suitable for embedded devices remains a significant challenge. | Q. Wu; B. Chen; C. Li; X. Tu; X. Ding; Y. Huang |
56 | HFE-RWKV: High-Frequency Enhanced RWKV Model for Efficient Left Ventricle Segmentation in Pediatric Echocardiograms. Highlight: To address the two challenges, we turn to a novel and efficient basic structure, RWKV, and propose High-Frequency Enhanced RWKV (HFE-RWKV) for accurate and efficient left ventricle segmentation. | Z. Ye; T. Chen; Z. Wang; H. Zhang; L. Zhang |
57 | Practical Radar Sensing Using Two Stage Neural Network for Denoising OTFS Signals. Highlight: Noise contamination affects the performance of OTFS signals in real-world environments, making radar sensing challenging. This work introduces a two-stage approach to tackle this issue. | A. S. Kumar; S. Kalyani |
58 | Gaussian Constrained Diffeomorphic Deformation Network for Panoramic Semantic Segmentation. Highlight: To address the appearance discrepancies between pinhole and panoramic images, we propose Gaussian Constrained Diffeomorphic Deformation Network (GCDDN), which applies a panoramic deformation transformation obtained by Gaussian kernels to the annotated pinhole images. | J. Jiang; J. Zhu; Z. Xu; X. Chen; S. Zhao; H. Yao |
59 | FedDiT: Federated Learning By Distillation Token Enhanced Vision Transformer. Highlight: In this paper, we propose FedDiT, a novel federated learning framework that combines knowledge distillation with vision transformers. | J. Xiao |
60 | Knowledge-Guided Prompt Learning for Deepfake Facial Image Detection. Highlight: However, they usually lack the exploration of prior knowledge and rarely pay attention to the domain shift between training categories (e.g., natural and indoor objects) and testing ones (e.g., fine-grained human facial images), resulting in unsatisfactory detection performance. To address these issues, we propose a novel knowledge-guided prompt learning method for deepfake facial image detection. | H. Wang; C. Deng; Z. Zhao |
61 | Improving Zero-Shot Chinese-English Code-Switching ASR with KNN-CTC and Gated Monolingual Datastores. Highlight: Although there is potential for performance improvement, a kNN-CTC model utilizing a single bilingual datastore can inadvertently introduce undesirable noise from the alternative language. To address this, we propose a novel kNN-CTC-based code-switching ASR (CS-ASR) framework that employs dual monolingual datastores and a gated datastore selection mechanism to reduce noise interference. | J. Zhou |
62 | Unsupervised Hierarchical Dynamic Similarity Hashing for Multimedia Retrieval. Highlight: Although these methods have made significant progress in the field of multimedia retrieval, they still face challenges related to inaccurate similarity measurements and incomplete embedding of relational information. To address these issues, we propose Unsupervised Hierarchical Dynamic Similarity Hashing (UHDSH) for multimedia retrieval. | Y. Chen; Z. Yang; J. Long |
63 | RemoteTrimmer: Adaptive Structural Pruning for Remote Sensing Image Classification. Highlight: To this end, we propose an effective structural pruning approach for remote sensing image classification. | G. Zou |
64 | MonTransformer: Self-Supervised Phonetic to Glyph Conversion Leveraging Positional Context for Traditional Mongolian Texts. Highlight: This paper introduces the first method specifically designed to address the out-of-vocabulary (OOV) word problem in traditional Mongolian script conversion. | C. Zhou; Monghjaya; L. Wu |
65 | M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper. Highlight: In this paper, we propose M2R-Whisper, a novel multi-stage and multi-scale retrieval augmentation approach designed to enhance ASR performance in low-resource settings. | J. Zhou |
66 | V-Fusion: 2D Detection-enhanced Multimodal 3D BEV Object Detection. Highlight: In this paper, we propose V-Fusion, a high-quality 2D detection-enhanced multimodal BEV object detection method. | Z. Li; X. Zhao; J. Bian; B. Liu; W. Li; L. Zhang |
67 | Less Over More: Interference Sample Gradient Purification For Parallel Continual Learning. Highlight: Previous research on PCL ignored inter-task interference, which may hinder knowledge transfer and exacerbate catastrophic forgetting. Therefore, in this paper, we investigate the interference problem of PCL in dynamic multi-task scenarios. | T. Lu; J. Tan; L. Li; F. Hu |
68 | Chat-Driven 3D Human Pose and Shape Editing with Large Language Models. Highlight: In this paper, we propose a novel way to leverage Large Language Models (LLMs) to interactively reconstruct human pose and shape based on a Skinned Multi-Person Linear (SMPL) model. |
F. Zhou; |
69 | A Pre-trained Plug-in Mixture-of-LoRAs Model for Transferable Sequential Recommendation. Highlight: Besides, they are typically built on specific pre-trained model structures, limiting their generalization to different downstream domains and sequential recommenders in practical applications. This paper aims to address these issues by exploring the potential of mixture-of-LoRAs based domain adaptation and an ensemble of sequential recommenders for learning generalizable and transferable domain-adaptive item representations. |
W. Sun; |
70 | Segment-Recurrent Transformer with Multi-Scale Fusion for Long-Term Time Series Forecasting. Highlight: In this paper, we introduce the Segment-Recurrent Transformer (SRTrans), designed to provide a more comprehensive understanding of historical time series dynamics. |
Z. Yang; L. Wei; B. Zhou; X. Tang; R. Li; S. Hu; |
71 | Tip The Scales: Achieving Balance in Adversarial Examples Across Modalities. Highlight: Therefore, generating adversarial examples that can achieve balanced transferability remains a challenging and perplexing problem. In this paper, we propose the InterModality Balanced Attack (MOBA) to address this problem. |
Z. Shi; |
72 | Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification. Highlight: Vision-Language Models for remote sensing have shown promising uses thanks to their extensive pretraining. |
K. El Khoury; |
73 | SUFT: Sparse and Uncertain Fusion Transformers for Multi-Atlas Brain Network Analysis. Highlight: To improve upon these, we propose the Sparse and Uncertain Fusion Transformers (SUFT) for multi-atlas brain network analysis. |
Z. Su; J. Huang; S. Jiang; M. Wang; W. Ding; |
74 | Leveraging Boolean Directivity Embedding for Binaural Target Speaker Extraction. Highlight: In this paper, we propose Boolean Directivity Embedding (BDE) as a new direction feature in order to precisely lock onto the target speaker independent of microphone array configurations for binaural TSE (BiTSE). |
Y. Wang; J. Zhang; C. Jiang; W. Zhang; Z. Ye; L. Dai; |
75 | Integrating Failures in Robot Skill Acquisition with Offline Action-Sequence Diffusion RL. Highlight: However, this significantly reduces the sample efficiency of the method and results in a policy limited by the data collection behavior policy. In this paper, we introduce a vision-language-conditioned action-sequence diffusion policy and an action-sequence diffusion policy learning method with Q-guided refinement for training it. |
H. Wang; L. Qi; Y. Sun; |
76 | Dual-PST: Dual-Branch SpatioTemporal-Planar Network for Video Forgery Detection. Highlight: However, traditional methods struggle to capture local details and temporal dynamics simultaneously, making it difficult to achieve high detection accuracy while maintaining low computational overhead. To address this problem, we propose a Dual-branch SpatioTemporal-Planar Network (Dual-PST) based on the selective state-space model. |
S. Liu; Z. Zhang; J. Duan; J. Cao; A. Zheng; |
77 | A Robust Online Miscalibration Detection and Correction Method for LiDAR-Camera. Highlight: In this letter, we introduce CalibOnline, a novel method for the online detection and correction of miscalibration in multi-sensor setups. |
F. Pan; W. Wang; J. Zhang; |
78 | DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap. Highlight: However, these methods often overlook the bidirectional interactions and inherent noise present in each modality, which can crucially impact the quality and efficacy of cross-modal integration. To address this limitation, we introduce DiffGAP, a novel approach incorporating a lightweight generative module within the contrastive space. |
S. Mo; Z. Chen; F. Bao; J. Zhu; |
79 | Integrating Pause Information with Word Embeddings in Language Models for Alzheimer’s Disease Detection from Spontaneous Speech. Highlight: In this paper, we propose a novel approach to Alzheimer’s disease (AD) detection from spontaneous speech, which incorporates pause information into language models. |
Y. Pu; W. -Q. Zhang; |
80 | Consensus Graph Filter Learning for Multiple Graph Clustering. Highlight: These filters are usually derived from either a specific view or a consensus graph across all views, which limits their effectiveness in fully integrating multi-view information. To address this limitation, we propose a novel method, Consensus Graph Filter Learning for Multiple Graph Clustering (CGFMVC). |
J. Zou; Y. Chen; P. Zhou; C. Wen; L. Du; Y. Qian; |
81 | Advancing Dark Action Recognition Via Modality Fusion and Dark-to-Light Diffusion Model. Highlight: To this end, we propose Modality Fusion Dark-to-Light (MFDL), a two-stage framework to simultaneously enhance the visibility of poorly-lit videos and strengthen recognition performance. |
Y. Wang; Z. Xing; Z. Wu; |
82 | Robust Detection Based on The K-Score Test. Highlight: Within this framework, we introduce a new robust score-type detector. |
K. Todros; |
83 | Training-Free Task Planning By Parsing Language Signals With Common Sense. Highlight: In this paper, we propose a method named Language Signal Parse Tree (LSPT for short) for task planning. |
X. Zhang; W. Wang; S. Chai; X. Wang; X. Fan; |
84 | Keypoint Aware Masked Image Modelling. Highlight: We propose an efficient patch-wise weighting derived from keypoint features which captures the local information and provides better context during SimMIM’s reconstruction phase. |
M. Krishna; A. V. Subramanyam; |
85 | Enhancing Vision: Harmonizing Frequency for Imaging Quality and Perception Accuracy. Highlight: In this work, we demonstrate that independent low-level reconstruction algorithms can simultaneously enhance imaging quality and downstream perception accuracy. |
H. Chen; K. Ma; |
86 | CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Downstream Vision Tasks Under Unknown Degradations. Highlight: However, these approaches are confined to known single degradation scenarios, constraining their practical applicability in unpredictable environments. Therefore, we propose a Chain-of-thought Prompted Adaptive Enhancer, CPA-Enhancer, for enhancing important features crucial for downstream vision tasks under unknown degradations. |
Y. Zhang; Y. Wu; Y. Liu; X. Peng; |
87 | Gated Cross-Attention Network for Depth Completion. Highlight: At the same time, we use the Ray Tune mechanism with the AsyncHyperBandScheduler and the HyperOptSearch algorithm to automatically search for the optimal number of module iterations, which also allows us to achieve performance comparable to state-of-the-art methods. |
X. Jia; S. Jian; Y. Tan; Y. Che; W. Chen; Z. Liang; |
88 | Modulo Sampling and Recovery with Unknown and Time-Varying Folding Parameter. Highlight: In this paper, we address the modulo recovery problem in scenarios where the folding parameter is unknown and potentially time-varying, motivated by practical hardware limitations that make it difficult to maintain a constant and precise folding parameter. |
Y. Kvich; A. Yasar; E. Tasci; R. T. Yazicigil; Y. C. Eldar; |
89 | Synthetic Dataset Generation for String Ensemble Separation. Highlight: Based on dataset generation using a neural synthesis model, we propose an approach that incorporates musical expressions into the dataset generation process using MusicXML files. |
M. Kim; J. Bae; E. Shin; K. Lee; |
90 | PaSTS: Parameter-affined Seasonal-Trend Synthesis for Multi-dimensional Long-Term Time Series Forecasting Within LLM. Highlight: In this paper, we propose PaSTS, a novel framework designed to integrate decomposition methods into LLMs through a specialized temporal synthesis layer, thereby improving predictive accuracy and mitigating overfitting of LLMs in LTSF tasks. |
Q. Lv; J. Ge; Y. Xu; T. Li; L. Li; |
91 | Developing A Multilingual Dataset and Evaluation Metrics for Code-Switching: A Focus on Hong Kong’s Polylingual Dynamics. Highlight: We have introduced a novel evaluation metric called Fidelity to the Original Audio, Accuracy, and Latency (FAL). |
P. Xie; K. Chen; |
92 | Spiking Transformer with Spatial-Temporal Spiking Self-Attention. Highlight: However, spiking self-attention (SSA) focuses solely on the spatial dimension at each time step, overlooking crucial features across the temporal dimension. To address this, we propose the Spatial-Temporal Spiking Self-Attention (STSSA), a spike-driven mechanism that leverages both spatial and temporal information with negligible additional computational overhead. |
Z. Zhou; J. Niu; Y. Zhang; L. Yuan; Y. Zhu; |
93 | SpikingPoint: Rethinking Point As Spike for Efficient 3D Point Cloud Analysis. Highlight: In this work, we introduce SpikingPoint, a pure Spiking Multi-layer Perceptron (MLP) architecture that leverages the low power consumption of SNNs and the computational efficiency of linear layers. |
Z. Zhou; Y. Lu; J. Zhan; G. Luo; Y. Zhu; |
94 | Vision-Language Model Guided Semi-supervised Learning for No-Reference Video Quality Assessment. Highlight: Current no-reference video quality assessment (NR-VQA) algorithms require a large number of human-annotated videos. In this work, we address this problem by specifically designing a dual-model based Semi-supervised Learning (SSL) method for NR-VQA. |
S. Mitra; R. Soundararajan; |
95 | Segue: Side-information Guided Generative Unlearnable Examples for Facial Privacy Protection in Real World. Highlight: To remedy it, we introduce Side-information Guided Generative Unlearnable Examples (Segue). |
Z. Zhang; |
96 | Part in Part Embedding Network for Zero-Shot Learning. Highlight: Meanwhile, local features are represented at various scales across different layers of a neural network, making it hard to capture their local details fully. To address these challenges, we propose a novel part in part embedding network, termed PPEN. |
Z. Zhou; L. Xiao; G. -S. Xie; |
97 | Federated Smoothing ADMM for Quantile Regression with Non-Convex Sparse Penalties. Highlight: Existing methods for penalized quantile regression often struggle with asynchronous operations and multiple updates per node, leading to inconsistencies across these distributed nodes. To address these challenges, we propose the Federated Smoothing ADMM (FSAD) algorithm, which integrates non-convex sparse penalties, specifically the minimax concave penalty (MCP) and the smoothly clipped absolute deviation (SCAD) penalty, to effectively identify significant predictors while retaining sparsity. |
R. Mirzaeifard; S. Werner; |
98 | Tool Playgrounds: A Comprehensive and Analyzable Benchmark for LLM Tool Invocation. Highlight: However, existing benchmarks typically only provide end-to-end scores but lack in-depth analysis and often suffer from issues such as instability. To address this gap, we have meticulously designed the Tool Playgrounds framework, a comprehensive, analyzable, and extensible benchmark. |
Z. Dong; |
99 | Distilling Generative-Discriminative Representations for Very Low-Resolution Face Recognition. Highlight: To achieve more complete knowledge transfer, we propose a generative-discriminative representation distillation approach that combines generative representation with cross-resolution aligned knowledge distillation. |
J. Zhang; W. Guo; B. Liu; R. Shi; Y. Li; S. Ge; |
100 | Spectral Low-Rank Attention with Flow-Based Refinement for Spectral Reconstruction. Highlight: However, these methods encounter computational and memory overheads that scale quadratically with the size of the hyperspectral images (HSIs). To overcome these challenges, we introduce a novel Spectral-wise Low-Rank Attention (SLORA) mechanism that captures inter-spectral consistency in a low-dimensional space, thereby reducing both computational costs and model complexity. |
Y. Wang; Z. Tang; Y. Hu; G. Liu; T. -X. Jiang; |
101 | Scalable Speech Enhancement With Dynamic Channel Pruning. Highlight: To this end, we introduce Dynamic Channel Pruning to the audio domain for the first time and apply it to a custom convolutional architecture for SE. |
R. Miccini; C. Laroche; T. Piechowiak; L. Pezzarossa; |
102 | Joint Space-Time Adaptive Processing and Beamforming Design for Cell-Free ISAC Systems. Highlight: In this paper, we explore cooperative sensing and communication within cell-free integrated sensing and communication (ISAC) systems. |
R. Liu; M. Li; Q. Liu; |
103 | Speech Retrieval-Augmented Generation Without Automatic Speech Recognition. Highlight: While this cascaded pipeline has proven effective in many practical settings, ASR errors can propagate to the retrieval and generation steps. To overcome this limitation, we introduce SpeechRAG, a novel framework designed for open-question answering over spoken data. |
D. J. Min; K. Mundnich; A. Lapastora; E. Soltanmohammadi; S. Ronanki; K. Han; |
104 | GREST: Ghost Targets Removal Algorithm Using Multipath Angle Estimation. Highlight: In this paper, we propose an angle-estimation-based algorithm for the removal of ghost targets, termed GREST (Ghost targets Removal using ESTAR). |
R. Takahashi; P. Wang; |
105 | Separate Estimation of Angular Velocity and Angle for Digital Array Radar. Highlight: In this paper, we propose a method that enables one-dimensional estimation of the angular velocity of a high-speed target by adding a preprocessing step to the received signal of a digital array radar. |
T. Terada; T. Ito; R. Takahashi; |
106 | SSDViT: Exploring Siamese and Self Distillation in ViTs for Generalizable Person Re-identification. Highlight: In this paper, we investigate the generalization ability of ViTs and propose a novel Siamese and Self Distillation Vision Transformer (SSDViT) framework to address the domain generalization (DG) re-ID problem. |
J. Jia; J. Yang; |
107 | Secure Analog Beamforming Design for Wireless Communication Systems With Movable Antennas. Highlight: To solve the resulting non-convex problem, we propose a penalty product manifold (PPM) method, which converts movable antenna (MA) position constraints into a penalty function, reformulating the problem as unconstrained optimization on the product manifold space (PMS). |
W. Xiong; K. Zhong; Z. Xiao; J. Lin; Q. Li; |
108 | BIAWDiff: Enhancing Low-Light Images with Bio-Inspired Attention and Wavelet Diffusion. Highlight: Existing traditional algorithms and deep learning approaches often struggle to balance brightness enhancement and detail preservation, leading to issues such as overexposure, artifacts, and loss of high-frequency details. To address these challenges, we propose a novel method, Bio-Inspired Attention and Wavelet Diffusion (BIAWDiff), that integrates Retinex theory with bio-inspired attention and wavelet-based diffusion models to enhance low-light images. |
Z. Li; S. Yang; H. Yang; X. Tang; F. Wu; F. Xu; |
109 | Uncertainty-Participation Context Consistency Learning for Semi-supervised Semantic Segmentation. Highlight: However, existing consistency regularization methods use only high-certainty pixels, whose prediction confidence surpasses a fixed threshold, for training, failing to fully leverage the potential supervisory information within the network. Therefore, this paper proposes the Uncertainty-participation Context Consistency Learning (UCCL) method to explore richer supervisory signals. |
J. Yin; Y. Chen; Z. Zheng; J. Zhou; Y. Gu; |
110 | One-Shot Face Avatar Generation in A Single Forward Pass with Identity Preservation. Highlight: To address this research gap, we propose a novel one-shot approach, which achieves effective face avatar generation in only a single forward pass. |
Y. Miao; |
111 | Conservative Offline Meta-Reinforcement Learning with Task Similarity Measurement. Highlight: However, offline meta-reinforcement learning (OMRL) faces challenges such as Q-function overestimation and difficulties in inferring tasks correctly and robustly due to distribution discrepancy. In this paper, we introduce ConseRvative q-learning and task similarity mEAsuremenT for Offline meta-Reinforcement learning (CREATOR), a method to address these challenges using only offline datasets, without requiring additional interactions. |
H. Li; J. Liang; L. Li; D. Zeng; |
112 | Agentic Copyright Watermarking Against Adversarial Evidence Forgery with Purification-Agnostic Curriculum Proxy Learning. Highlight: This paper presents several contributions to model watermarking: a self-authenticating black-box watermarking protocol using hash techniques, a study on evidence forgery attacks using adversarial perturbations, a proposed defense involving a purification step to counter adversarial attacks, and a purification-agnostic curriculum proxy learning method to enhance watermark robustness and model performance. |
E. Bao; C. -C. Chang; H. Wang; I. Echizen; |
113 | BP-GPT: Auditory Neural Decoding Using FMRI-prompted LLM. Highlight: In this paper, we introduce a novel method, the Brain Prompt GPT (BP-GPT). |
X. Chen; C. Du; C. Liu; Y. Wang; H. He; |
114 | DS-BTIAN: A Novel Deep-Shallow Bidirectional Transformer Interactive Attention Network for Multimodal Emotion Recognition. Highlight: In this work, we propose a novel Deep-Shallow Bidirectional Transformer Interactive Attention Network (DS-BTIAN) designed for robust multimodal emotion recognition. |
Z. Chen; C. Zhao; Z. Wang; C. Liu; Q. Zheng; C. Zou; |
115 | LiSenNet: Lightweight Sub-band and Dual-Path Modeling for Real-Time Speech Enhancement. Highlight: In this work, we propose a lightweight speech enhancement (SE) network (LiSenNet) for real-time applications. |
H. Yan; J. Zhang; C. Fan; Y. Zhou; P. Liu; |
116 | Enhancing Data-Free Class-Incremental Learning Via Image-Centric Dual Distillation. Highlight: Recent approaches using model inversion have made progress in addressing this issue, yet the suboptimal application of knowledge distillation hampers new task learning, limiting overall model performance. To overcome this, we propose a novel method incorporating image-centric dual distillation, designed to retain more old knowledge while facilitating new knowledge acquisition, thus enhancing DFCIL performance. |
F. Fu; Z. Lu; |
117 | A Hierarchical Compression Technique for 3D Gaussian Splatting Compression. Highlight: In contrast, compression of the GS data itself has hardly been explored. To address this gap, we propose a Hierarchical GS Compression (HGSC) technique. |
H. Huang; W. Huang; Q. Yang; Y. Xu; Z. Li; |
118 | MAID: Model Attribution Via Inverse Diffusion. Highlight: Existing methods either struggle to attribute across multiple frameworks or rely on additional conditions, such as textual descriptions and white-box access to the source model, limiting their practicality and effectiveness in real-world scenarios. To address this gap, we introduce Model Attribution via Inverse Diffusion (MAID), the first framework-agnostic and self-sufficient approach that leverages the source model features extracted by diffusion models, which also works for images generated by GANs. |
L. Zhu; |
119 | Fourth-Order Cumulant Based 3-D Near-Field Underdetermined Parameter Estimation With Exact Spatial Propagation Model. Highlight: Based on the exact spherical wavefront model, an under-determined estimation method for three-dimensional (3-D) parameters of near-field (NF) sources using L-shaped nested arrays is proposed, referred to as the cumulant algorithm. |
L. Jin; H. Chen; J. Fang; W. Liu; Y. Tian; G. Wang; |
120 | Can Large Language Models Grasp Event Signals? Exploring Pure Zero-Shot Event-based Recognition. Highlight: We demonstrate that LLMs, in conjunction with CLIP, can achieve event-based object recognition without additional training or fine-tuning, effectively enabling pure zero-shot event-based recognition. |
Z. Yu; Q. Qu; X. Chen; C. Wang; |
121 | Contrastive Learning Via Randomly Generated Deep Supervision. Highlight: However, this often leads to intra-class collision in a large latent space, compromising the quality of learned representations. To address this issue, we propose a novel contrastive learning method that utilizes randomly generated supervision signals. |
S. Wang; |
122 | Fusion-OSR: Cross-Domain Contrastive Learning with Weibull Calibration for Time Series Open Set Recognition. Abstract: In recent years, numerous Time Series Anomaly Detection methods have emerged, focusing primarily on detecting anomalies within time series, with limited work on open set … |
S. Hu; X. Zhao; S. Hu; X. Gao; |
123 | Learning Markup Language Model for Composite Relationships Extraction. Highlight: In this paper, we adapt pre-trained language models (PLMs) to extract complex relations in RTE, for which we propose a novel learning framework, named MarkET. |
F. Lu; J. Duan; J. Liu; |
124 | A Weakly Supervised Semantic Segmentation Model with Enhanced CLIP Feature Extraction. Highlight: This paper addresses the limitations of the Contrastive Language-Image Pre-training (CLIP) model’s image encoder and proposes a segmentation model WSSS-ECFE with enhanced CLIP feature extraction, aiming to improve the performance of the Weakly Supervised Semantic Segmentation (WSSS) task. |
F. Kong; J. Lu; |
125 | OCTAMamba: A State-Space Model Approach for Precision OCTA Vasculature Segmentation. Highlight: In this study, we propose OCTAMamba, a novel U-shaped network based on the Mamba architecture, designed to accurately segment vasculature in OCTA images. |
S. Zou; Z. Zhang; G. Gao; |
126 | BDCKD: Unlocking The Power of Brownian Distance Covariance in Knowledge Distillation. Highlight: In this study, we propose a comprehensive approach that utilizes Brownian Distance Covariance (BDC) to measure the discrepancy between the logits produced by the teacher and student models. |
G. Lu; H. Yin; Z. Shu; J. Wang; G. Luo; |
127 | VibeGait: Enhancing Structural-Vibration Based Gait Recognition Using Vision. Highlight: In this study, we propose a multi-modal gait recognition system that integrates both vision and structural vibration modalities. |
M. Chakraborty; Chandan; B. Mukhopadhyay; S. Anchal; S. Kar; |
128 | Diverse Collaboration in Multi-Agent Reinforcement Learning Via Self-Adaptive Method. Highlight: However, while this sharing facilitates teamwork, it can also result in agent homogenization, which limits individualized behaviors. To address this issue, we introduce a novel method called Diverse Collaboration in Multi-Agent Reinforcement Learning via Self-Adaptive Method (DC-SA). |
X. Xue; Q. Liu; M. Shi; Y. Jin; |
129 | SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space. Highlight: However, directly cascading existing models can introduce significant interference and reduce video clarity due to limited interaction space in the low-level RGB domain. To solve this, we propose SwapTalk, a unified framework that performs face-swapping and lip synchronization within the same latent VQ-embedding space, known for its editability and fidelity. |
Z. Zhang; |
130 | MPAM-3DGS: Multi-Parametric Adversarial Manipulation for 3D Gaussian Splatting. Highlight: Given that some tasks involve high risks, it is crucial to investigate the adversarial robustness of 3DGS and its downstream tasks, a topic that remains largely unexplored. In this study, we introduce a framework, Multi-Parametric Adversarial Manipulation for 3D Gaussian Splatting (MPAM-3DGS), that enables attacks on 3DGS and its downstream tasks, such as object detection and classification, by perturbing a specified subset of parameters. |
W. Jiang; H. Zhang; W. Wang; Z. Guo; T. Zhang; H. Wang; |
131 | Deep Unfolding of Full Waveform Inversion for Quantitative Ultrasound Imaging. Highlight: This paper introduces a deep unfolding-based approach for Full Waveform Inversion (FWI) in quantitative ultrasound imaging. |
N. Cohen; Y. Kvich; R. Guo; Y. C. Eldar; |
132 | BS-Breath: Respiration Sensing with Cell-free Massive MIMO. Highlight: This paper demonstrates the feasibility of respiration pattern estimation utilizing a communication-centric cell-free massive MIMO OFDM Base Station (BS). |
H. Xiong; R. Beerten; Z. Cui; Y. Miao; S. Pollin; |
133 | Subspace-Based Range-Angle Tracking for Coherent FDA Radar. Highlight: This paper proposes a subspace-based range-angle tracking method, comprising subspace tracking and range-angle estimation. |
Y. Sun; W. -Q. Wang; M. S. Greco; F. Gini; |
134 | Causal FMRI-Mamba: Causal State Space Model for Neural Decoding and Brain Task States Recognition. Highlight: To this end, a novel causal state space model, Causal fMRI-Mamba, is proposed for neural decoding and task state mapping. |
W. Deng; F. Han; Q. Ling; Q. Liu; H. Han; |
135 | Generating Editable Head Avatars with 3D Gaussian GANs. Highlight: We propose a novel approach that enhances the editability and animation control of 3D head avatars by incorporating 3D Gaussian Splatting (3DGS) as an explicit 3D representation. |
G. Li; |
136 | Towards Interactive Deepfake Analysis. Highlight: This paper aims to explore interactive deepfake analysis by performing instruction tuning on multi-modal large language models (MLLMs). |
L. Qin; |
137 | Detecting and Defending Against Adversarial Attacks on Automatic Speech Recognition Via Diffusion Models. Highlight: In this paper, we systematically investigate the use of diffusion models (DMs) for defending against adversarial attacks on sentences and examine the effect of varying the number of forward diffusion steps. |
N. L. Kühne; |
138 | Controllable Forgetting Mechanism for Few-Shot Class-Incremental Learning. Highlight: Fine-tuning the model on novel classes often leads to the phenomenon of catastrophic forgetting, where the accuracy of base classes declines unpredictably and significantly. In this paper, we propose a simple yet effective mechanism to address this challenge by controlling the trade-off between novel and base class accuracy. |
K. Paramonov; M. Ozay; E. Yang; J. Moon; U. Michieli; |
139 | Advancing Active Speaker Detection for Egocentric Videos. Highlight: This paper presents an improved approach to multimodal active speaker detection in egocentric videos, specifically designed to be robust against the rapid movements and motion blur commonly found in such videos. |
J. Huh; |
140 | Rethinking The Fragility and Robustness of Fingerprints of Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite their distinct motivations, we show that both kinds of neural network fingerprints can be evaluated under a modification-scalable framework, which gives rise to a duality between their key metrics. |
F. Li; S. Wang; L. Yang; |
141 | Emotion-aware Structural Enhancement Graph Auto-Encoder for Rumor Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods mainly rely on textual information and event propagation structures, but provocative comments and unreliable interactions increase propagation uncertainty. To address these challenges, we propose an Emotionally-Aware Structural Enhancement Graph Auto-Encoder (EASE-GARD) to improve rumor representations. |
G. Li; Z. Yao; D. Hu; Y. Xu; X. Zhang; H. Lyu; |
142 | Synergistic Spotting and Recognition of Micro-Expression Via Temporal State Transition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel temporal state transition architecture grounded in the state space model, which replaces conventional window-level classification with video-level regression. |
B. Zou; Z. Guo; W. Qin; X. Li; K. Wang; H. Ma; |
143 | Interpreting Deep Neural Network-Based Receiver Under Varying Signal-To-Noise Ratios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel method for interpreting neural networks, focusing on a convolutional neural network-based receiver model. |
M. Tuononen; D. Korpi; V. Hautamäki; |
144 | Efficient Co-clustering Via Anchor-refined Label Spreading Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, when noise is present in the original data, the anchor-refined label spreading mechanism may fail. To address this, we propose an Efficient Co-clustering via Anchor-refined Label Spreading (ECALS), which simultaneously clusters original data and anchors. |
F. Xie; F. Nie; W. Yu; X. Li; |
145 | Multi-Modal Medical Image Fusion Via 3D Manifold Fitting and Dual-Domain Cross-Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite its importance, medical image fusion (MIF) faces two primary challenges: the lack of tailored paradigms for CMSF extraction and insufficient dual exploration of multi-modality and multi-frequency domains. To address these challenges, we propose a novel MIF model in this study. |
Z. Wang; J. Wang; H. Song; J. Feng; H. Duan; |
146 | A Frequency-aware Augmentation Network for Mental Disorders Assessment from Audio Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current research, relying on labor-intensive hand-crafted features or simplistic time-frequency representations, often overlooks critical details by not accounting for the differential impacts of various frequency bands and temporal fluctuations. Therefore, we propose a frequency-aware augmentation network with dynamic convolution for depression and ADHD assessment. |
S. Li; S. Song; R. Nair; S. M. Naqvi; |
147 | Bootstrapping Language-Audio Pre-training for Music Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce BLAP, a model capable of generating high-quality captions for music. |
L. A. Lanzendörfer; C. Pinkl; N. Perraudin; R. Wattenhofer; |
148 | Speaking Without Sound: Multi-speaker Silent Speech Voicing with Facial Inputs Only Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel framework for generating multi-speaker speech without relying on any audible inputs. |
J. Lee; Y. Oh; K. Lee; |
149 | The Sound of Water: Inferring Physical Properties from Pouring Liquids Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Given only the sound of liquid pouring into a container, our objective is to automatically infer physical properties such as the liquid level, the shape and size of the container, the pouring rate, and the time to fill. |
P. Bagad; M. Tapaswi; C. G. M. Snoek; A. Zisserman; |
150 | Realistic Real-Time Talking Head Synthesis with Grid Encoding and Progressive Conditioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce R2-Talker, an efficient and effective framework for real-time talking head synthesis. |
Z. Ye; L. -G. Zhang; D. Zeng; Q. Lu; N. Jiang; |
151 | AKI360: Enabling Highly Interactive 360-degree Video Streaming By Adaptive Keyframe Interval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, reducing the keyframe interval poses greater challenges for network transmission and encoding/decoding overhead. To address these issues, we design and implement AKI360, a 360-degree video streaming system with an Adaptive Keyframe Interval (AKI) mechanism along with Quantization Parameter (QP) adjustments. |
H. Liu; X. Zhang; C. Jia; Y. Li; G. Xie; |
152 | Nonlinear Anisotropic Diffusion-Based Channel Estimation in 5G Wireless Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the context of the fifth-generation new radio downlink scenario, this paper introduces an innovative approach to channel estimation that circumvents the need for a prior dataset. |
K. S. Gahlot; S. Joshi; K. Wang; |
153 | MorphFader: Enabling Fine-grained Controllable Morphing with Text-to-Audio Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose MorphFader, a controllable method for morphing sounds generated by disparate prompts using text-to-audio models. |
P. Kamath; C. Gupta; S. Nanayakkara; |
154 | LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This limitation results in models that are highly sensitive to variations encountered in real-world scenarios. To address this issue, we propose a novel framework, LipGen, which aims to improve model robustness by leveraging speech-driven synthetic visual data, thereby mitigating the constraints of current datasets. |
B. Hao; |
155 | Granularity-Aware Contrastive Learning for Fine-Grained Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For a balanced understanding of coarse and fine-grained distinctions, we propose the Granularity-Aware Contrastive Learning (GACon) framework to improve contrastive learning for fine-grained action recognition. |
H. Zhang; X. Wang; Q. Zhao; |
156 | AD2T: Adversarial Distortion Domain Translation for Robust Watermarking Against Non-differentiable Distortions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle non-differentiable distortions, current methods only train the decoder with distorted images, which breaks the joint optimization of the encoder-decoder, resulting in suboptimal performance. To address this problem, we propose an Adversarial Distortion Domain Translation (AD2T) method by treating the distortion as an image-to-image translation task. |
C. Zhao; H. Ling; J. Chen; H. Fang; Z. Li; S. Xie; |
157 | Adversarial Feature Disentanglement Framework for Voice Pathology Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, a novel adversarial feature disentanglement framework is proposed to achieve the voice pathology detection task by extracting task-oriented features and optimizing feature space. |
Y. Xiong; D. Guo; L. Shen; W. Mo; H. Yang; Y. Lin; |
158 | Enhancing Extrapolation Reasoning on Temporal Knowledge Graphs with Logic Rules and Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Crucially, these methods struggle to generate explicit reasoning paths. To address these gaps, we propose an innovative framework (LogiQ) for extrapolation reasoning on TKGs, steered by temporal logic rules and queries. |
T. Chen; L. Yang; Z. Wang; S. Luo; J. Long; |
159 | Dual-Pyramid Attention Collaborative Network for Oracle Bone Inscription Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The reliance on global feature vectors ignores the subtle differences between different scales and the uneven importance of different regions in the inscriptions. To address this issue, we propose a dual-pyramid attention collaborative network that enables classification models to learn OBI multi-scale attention. |
J. Gao; F. Giunchiglia; T. Zhao; C. Li; H. Xu; |
160 | Multi-Prototype-based Embedding Refinement for Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, traditional linear classifiers, limited by a single learnable weight per class, struggle to capture this finer distinction. To address the above challenges, we propose a Multi-Prototype-based Embedding Refinement method for semi-supervised medical image segmentation. |
Y. Bi; E. Che; Y. Chen; Y. He; J. Qu; |
161 | LSU-NET: Lightweight Automatic Organs Segmentation Network for Medical Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the substantial number of parameters and computational complexity of these models make them less suitable for use in clinical settings with limited computational resources. To address this limitation, we propose a novel Lightweight Shift U-Net (LSU-Net). |
Y. Ding; S. Teng; Z. Li; X. Chen; |
162 | Zero-shot Document Retrieval with Hybrid Pseudo-document Retriever Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a hybrid retriever to further improve the quality of the pseudo-documents and to obtain the relevant information more effectively. |
D. Sun; W. Guo; X. Liu; Y. Zhang; Z. Hou; Z. Li; |
163 | Electrocardiogram Report Generation and Question Answering Via Retrieval-Augmented Self-Supervised Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Interpreting electrocardiograms (ECGs) and generating comprehensive reports remain challenging tasks in cardiology, often requiring specialized expertise and significant time investment. To address these critical issues, we propose ECG-ReGen, a retrieval-based approach for ECG-to-text report generation and question answering. |
J. Tang; T. Xia; Y. Lu; C. Mascolo; A. Saeed; |
164 | Robust Activity Detection for Massive Access Using Covariance-based Matching Pursuit Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a robust activity detection method for grant-free random access using a greedy covariance-learning-based matching pursuit (RCL-MP) algorithm. |
X. Wang; E. Ollila; S. A. Vorobyov; |
165 | A Key to Effective Multi-task Learning: Separate Query Selection for Task-Synergized Handling and Node Utilization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To better address multiple vision problems, we introduce SeTano, an integrated Graph Neural Network (GNN)-based framework. |
S. -Y. Yang; H. -C. Cheng; C. -Y. Wang; J. -C. Wang; C. -Y. Lee; |
166 | Micro-expression Spotting Based on Multi-modal Hierarchical Semantic-guided Deep Fusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing micro-expression spotting (MES) methods predominantly learn from optical flow features while neglecting the detailed information contained in RGB images, and few MES works have explored the optimal interaction fusion of the optical flow modality and RGB modality. To address these issues, we propose a multi-scale layered semantic-guided end-to-end cross-modal fusion framework for MES with a convolutional neural network (CNN)-Transformer, named MESFusion. |
Z. Xie; H. Chang; |
167 | Discriminating Mizo Hunting and War Chants Using Acoustic Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study explores the acoustic characteristics of expressive Hunting chants and War chants of Mizoram, a northeastern state in India. |
E. Ramdinmawii; V. K. Mittal; |
168 | Content and Salient Semantics Collaboration for Cloth-Changing Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we promote cloth-changing person re-identification by leveraging abundant semantics present within pedestrian images, without the need for any auxiliaries. |
Q. Wang; X. Qian; B. Li; L. Chen; Y. Fu; X. Xue; |
169 | GS-PT: Exploiting 3D Gaussian Splatting for Comprehensive Point Cloud Understanding Via Self-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our pipeline utilizes transformers as the backbone for self-supervised pre-training and introduces novel contrastive learning tasks through 3DGS. |
K. Liu; |
170 | Out-of-Distribution Detectors: Not Yet Primed for Practical Deployment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper examines the practical robustness of OoD detectors, taking computer vision tasks as examples and considering natural input perturbations that may come from camera positions and lighting conditions. |
C. Wu; W. Ding; X. Huang; S. Bensalem; |
171 | Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a unified framework for Singing Voice Synthesis (SVS) and Conversion (SVC), addressing the limitations of existing approaches in cross-domain SVS/SVC, poor output musicality, and scarcity of singing data. |
S. Dai; Y. Wang; R. B. Dannenberg; Z. Jin; |
172 | Instance-wise Feature Acquisition with Classifier Selection Option for Structured Data Instances Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method that sequentially acquires features and selects a classifier for label assignment in data instances of related variables. |
S. P. Ekanayake; D. Zois; |
173 | Cross-Component Residual Prediction for Geometry-Based Point Cloud Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the significant impact of cross-component prediction in traditional image and video coding, we investigate and present our pioneering work on cross-component residual prediction for RAHT in G-PCC. |
B. Vishwanath; Y. Xu; K. Zhang; L. Zhang; |
174 | Selective Attention Merging for Low Resource Tasks: A Case Study of Child ASR Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While Speech Foundation Models (SFMs) excel in various speech tasks, their performance for low-resource tasks such as child Automatic Speech Recognition (ASR) is hampered by limited pretraining data. To address this, we explore different model merging techniques to leverage knowledge from models trained on larger, more diverse speech corpora. |
N. B. Shankar; Z. Wang; E. Eren; A. Alwan; |
175 | Interference-Resilient Hybrid Multi-Antenna ARQ Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes combining CCA-based interference excision and HARQ-I. |
N. D. Sidiropoulos; Y. Tang; |
176 | Physics-Informed Neural Networks for Ocean Acoustic Field Prediction with Envelope Smoothing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a physics-informed neural network (PINN) with the Helmholtz equation as a physics constraint, enhancing prediction accuracy with scarce data. |
Y. Park; P. Gerstoft; S. Yoon; W. Seong; |
177 | Sharpness-Aware Minimization with Adaptive Regularization for Training Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose SAM with Adaptive Regularization (SAMAR), which introduces a flexible sharpness ratio rule to update the regularization parameter dynamically. |
J. Zou; X. Deng; T. Sun; |
178 | Lead Instrument Detection from Multitrack Music Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel approach to lead instrument detection in multitrack music audio by crafting expertly annotated datasets and designing a novel framework that integrates a self-supervised learning model with a track-wise, frame-level attention-based classifier. |
L. Ou; Y. Takahashi; Y. Wang; |
179 | Debiased Training For Semi-supervised Sound Event Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, traditional semi-supervised learning methods can lead to training instability and confirmation bias because of potentially incorrect pseudo labels. To address this issue, we propose debiased training, a novel approach to reduce the inherent bias of pseudo labels. |
S. Xiao; X. Zhang; P. Zhang; Y. Yan; |
180 | Hierarchical Proxy Learning for Cloth-Changing Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Cloth-changing person re-identification is quite challenging due to the large intra-person variance and small inter-person variance caused by clothing changes. To address these issues, in this work we propose a Hierarchical Proxy Learning (HPL) framework to extract clothes-irrelevant and person-invariant features. |
C. Yu; X. Liu; J. Dai; P. Zhang; H. Lu; |
181 | A Cost-effective Solution for Remote Sensing Image Segmentation Via Train/Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: 2) Target-data Train-time Fine-tuning: We propose a joint positive and negative learning (JPNL) algorithm that adds both positive and negative samples to effectively learn domain-invariant knowledge from noisy pseudo-labeled target data. |
W. Chen; |
182 | Fooling The Forgers: A Multi-Stage Framework for Audio Deepfake Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel approach for audio deepfake detection using Generative Adversarial Networks (GANs) and contrastive learning in a multi-stage detection framework. |
G. S. Kashyap; |
183 | Breaking Through The Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we thoroughly investigate the spike property of CTC outputs and further propose the conjecture that adjacent frames to non-blank spikes carry semantic information beneficial to the model. Building on this, we propose the Spike Window Decoding algorithm, which greatly improves the inference speed by making the number of frames decoded in WFST linearly related to the number of spiking frames in the CTC output, while guaranteeing the recognition performance. |
W. Zhang; |
184 | Graph Structure Learning Via Transfer Entropy for Multivariate Time Series Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes GSLTE, a graph structure learning method for multivariate time series anomaly detection (MTAD). |
M. Liu; Y. Wang; X. Zhou; Y. Wang; |
185 | Can AI See What We Can’t? Leveraging Deep Learning and Multi-Temporal Satellite Data to Revolutionize Crop Type Mapping and Yield Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an approach that combines advanced deep learning algorithms with Sentinel-2 and MODIS satellite data for improving the accuracy of crop type mapping and yield prediction. |
G. S. Kashyap; |
186 | Exploring Temporal Constraints for Unsupervised Iris Motion Tracking in AS-OCT Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Temporal Constraint-based Tracking Morph (TCTMorph) for estimating iris trajectory in long-term AS-OCT videos. |
L. Hu; |
187 | Local Statistics for Generative Image Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we highlight the effectiveness of Bayer pattern and local statistics in distinguishing digital camera images from diffusion model (DM)-generated images. |
Y. J. Wong; T. K. Ng; |
188 | Modality Modulation and Dual Consistency for Multi-Modality Semi-Supervised Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: (2) The use of generative methods to leverage unlabeled data may not be reliable for semi-supervised learning (SSL). To address these challenges, we propose Modality Modulation Dual Consistency, dubbed MM-DC. |
Y. Chen; Z. Yang; D. Xiong; Y. Zhang; |
189 | Enhancing Emotion Reasoning for Image Multi-Emotion Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing research primarily focuses on analyzing image features, which are limited to the perceptual level, leading to a superficial understanding of emotions. To address this gap, we propose an Emotional Reasoning Chain (EReC) based on a multimodal large language model, which learns both perception and reasoning abilities for multi-emotion prediction. |
B. Wang; |
190 | Multi-view Feature Discrepancy Attack for Single Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by traditional military camouflage, we propose a texture pattern attack method with a similar implementation, called the Multi-View Feature Discrepancy Attack (MFDA). |
Z. Li; W. Zhimin; Y. Wang; |
191 | TopoRefine: Iterative Refinement with Reasoning Topology As High-Level Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel method named TopoRefine, which integrates the reasoning topology within the model’s outputs as reliable high-level feedback. |
H. Liao; S. Hu; Z. Zhu; H. He; Y. Jin; |
192 | Retinex-Based Self-Conditioned Diffusion Model for Low-Light Image Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the issue, we propose Retinex-Based Self-Conditioned Diffusion Models, dubbed RSCDM, which utilize self-conditioned illumination representation learning and representation guidance enhancement to generate high-quality images. |
J. Zhang; Z. Li; J. Zhang; Y. Wang; |
193 | Faithful Self-Refinement in Mathematical Reasoning Via Progressive Back-Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel framework called Progressive Back-Translation refinement (PBT). |
H. Liao; Z. Zhu; S. Hu; H. He; Y. Jin; |
194 | Map-Guided Few-Shot Audio-Visual Acoustics Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To sufficiently exploit the provided few-shot data for accurate acoustic modeling, we present a map-guided framework by constructing acoustic-related visual semantic feature maps of the scenes. |
D. Huang; K. Lin; P. Chen; Q. Du; |
195 | Unveiling Local Well-posedness Influence for Cross-modal Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we introduce a mask-based local well-posedness modeling (MLWM) strategy, including text-based entity masking (TEM), text-based attribute-specific masking (TAM), and image-based appearance masking (IAM), which consider, collaboratively and in phases, image prompting-based text entities, image prompting-based text attributes, and text prompting-based appearance inference contrast, respectively. |
Y. Yang; G. -N. Dong; A. Zhu; M. Ni; Y. Li; |
196 | G-Depth: An Efficient Graph Method for Robust Depth Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The absence of these data can mislead the network into overfitting incorrect patterns, ultimately diminishing the performance of depth completion models. To tackle these challenges, we propose G-Depth, a method that incorporates Graph Neural Networks (GNNs) into a two-branch backbone. |
Z. Huang; Y. Chen; A. Aiersilan; L. Li; |
197 | Complementary Graph Learning and Prompt-based Cross-modal Generation for Missing-modality Fake News Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel fake news detection approach named Complementary Graph learning and Prompt-based cross-modal Generation network (CG-PG), which contains two main modules: a complementary graph learning module and a prompt-based cross-modal generation module. |
F. Wu; R. Zhou; C. Hu; Q. Huang; X. -Y. Jing; |
198 | ComplexDec: A Domain-robust High-fidelity Neural Audio Codec with Complex Spectrum Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Neural audio codecs have been widely adopted in audio-generative tasks because their compact and discrete representations are suitable for both large-language-model-style and … |
Y. -C. Wu; D. Marković; S. Krenn; I. D. Gebru; A. Richard; |
199 | Explainable Detection of Alzheimer’s Disease Through Analysis of Human Behavior in Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop a computer vision-based method to detect AD using behavioral data collected from the Timed Up and Go (TUG) test and the Cookie Theft (CT) picture description task. |
B. -H. Huang; P. -C. Kuo; L. Huang; C. -J. Hu; C. -Y. Chen; |
200 | A Robust Distributed Recurrent Neural Network for Multi-Agent Consensus Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the control accuracy of RNN-based systems can be compromised by noise interference, and there has been little research on RNN-based control in disturbed multi-agent systems. To address this, we developed an enhanced Distributed RNN (DRNN) structure and proposed a Novel DRNN-based Control Protocol (NDRNN-CP). |
Y. Li; |
201 | Signal Processing Challenges in Automotive Radar Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As automotive radars continue to proliferate, there is a continuous need for improved performance and several critical problems that need to be solved. All of this is driving research across industry and academia. |
S. Rao; R. Narasimha; S. Sun; |
202 | STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Text-to-video generation has made remarkable progress over the past year, but the absence of harmonious audio in generated video limits its broader applications. In this paper, we propose Semantic and Temporal Aligned Video-to-Audio (STA-V2A), an approach that enhances audio generation from videos by extracting both local temporal and global semantic video features and combining these refined video features with text as cross-modal guidance. |
Y. Ren; |
203 | Improving Acoustic Scene Classification in Low-Resource Conditions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores ASC in low-resource conditions and proposes a novel model, DS-FlexiNet, which combines depthwise separable convolutions from MobileNetV2 with ResNet-inspired residual connections for a balance of efficiency and accuracy. |
Z. Chen; Y. -F. Shao; Y. Ma; M. Wei; L. Zhang; W. -Q. Zhang; |
204 | Spectral Enhancement and Pseudo-Anchor Guidance for Infrared-Visible Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current studies rely on unsupervised modality transformations and inefficient embedding constraints to bridge the spectral differences between infrared and visible images, which limits their potential performance. To tackle the limitations of the above approaches, this paper introduces a simple yet effective Spectral Enhancement and Pseudo-anchor Guidance Network, named SEPG-Net. |
Y. Ge; Z. Chen; Z. Wang; J. Kang; M. Zhang; |
205 | WMRE: Enhancing Distant Supervised Relation Extraction with Word-level Multi-instance Learning and Multi-hierarchical Feature Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Distant supervised relation extraction (DSRE) obtains large amounts of data cost-effectively by aligning a knowledge base with natural text, but it also introduces noisy data. Existing … |
X. Ying; X. Xie; T. Xu; Y. Zhao; Z. Meng; M. Zhao; |
206 | SkeletonMix: A Mixup-Based Data Augmentation Framework for Skeleton-Based Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address the problem by proposing a comprehensive data augmentation framework named SkeletonMix, which contains a pair sample selection module for mixup and random augmentations tailored for skeleton modality. |
Z. Zhang; H. Zhou; Q. Liu; Y. Wang; |
207 | Exploring The Interpretability of EEG-Inception Convolutional Neural Networks for Epilepsy Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the challenges above, this paper proposes a novel approach: an epilepsy prediction framework that combines EEG-Inception Convolutional Neural Networks (EICNN) with Feature Pattern Interpretability Post-processing (FPIP). |
G. Zhang; T. Wang; J. Guo; Z. Yang; Y. Wu; G. Kang; |
208 | Self-Optimization Training for Weakly Supervised Image Manipulation Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing fully supervised methods require large amounts of costly pixel-level annotations, whereas weakly supervised methods often fall short in localization performance due to their inability to accurately localize tampered regions with precise boundaries. To tackle this issue, we propose a Self-Optimization Weakly Supervised Localization (SO-WSL) framework, which consists of two main components: a Pseudo-Label Generator (PLG) and a Self Iterative Optimization (SIO) module. |
Z. Zhu; J. Li; Y. Wen; |
209 | DCD-MUSIC: Deep-Learning-Aided Cascaded Differentiable MUSIC Algorithm for Near-Field Localization of Multiple Sources Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces deep-learning-aided cascaded differentiable MUSIC (DCD-MUSIC) that augments MUSIC near-field localization with dedicated deep neural networks (DNNs), allowing it to operate reliably and interpretably. |
A. Gast; L. Le Magoarou; N. Shlezinger; |
210 | A Reinforcement Learning Agent Controlled Multi-branch Small Object Detection Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate the issues, we propose a multi-branch small object detection framework with a regular-scale detection branch and a small-scale detection branch. |
J. Hong; Y. Long; Y. Luo; L. Hua; J. Long; Q. Qi; |
211 | HieClip: Hierarchical CLIP with Explicit Alignment for Zero-Shot Anomaly Detection. Highlight: In this paper, we propose the hierarchical alignment CLIP (HieClip) framework to achieve hierarchical alignment between images and text. |
L. Hua; X. Su; Y. Luo; S. You; J. Long; |
212 | UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition. Highlight: However, training large ASR models from scratch remains costly. To address this issue, we introduce UME, a novel method that efficiently Upcycles pretrained dense ASR checkpoints into larger Mixture-of-Experts (MoE) architectures. |
L. Fu; S. Yu; S. Li; L. Fan; Y. Wu; X. He; |
213 | Extending MPR for Locating A Moving Object Based on TDOA and FDOA. Highlight: We propose a Maximum Likelihood Estimator (MLE) implemented by Gauss-Newton (GN) iteration based on the extended MPR to estimate the position and velocity of the source, and we test its performance under the new MPR. |
B. Tang; Y. Sun; X. Heng; Y. Yang; L. Chen; |
214 | Seek and Solve Reasoning for Table Question Answering. Highlight: This paper reveals that the reasoning process during task simplification may be more valuable than the simplified tasks themselves and aims to improve TQA performance by leveraging LLMs’ reasoning capabilities. We propose a Seek-and-Solve pipeline that instructs the LLM to first seek relevant information and then answer questions, integrating these two stages at the reasoning level into a coherent Seek-and-Solve Chain of Thought (SS-CoT). |
R. Jiang; C. Wang; W. Deng; |
215 | TS-Net: Assembling Task-specific Features from Multiple Feature Levels for Multi-task Learning. Highlight: In this paper, we explore the impact of multilevel features on different tasks and propose a novel level-assembling MTL architecture named TS-Net. |
C. Liu; Z. Wan; P. Wang; X. Wang; X. Fan; |
216 | Hybrid Content Caching Empowered By AIGC in Wireless Networks. Highlight: We propose a novel approach that integrates artificial intelligence-generated content (AIGC) into base station (BS) operations. |
D. Xu; L. Duan; H. Zhu; |
217 | Reduced Effectiveness of Kolmogorov-Arnold Networks on Functions with Noise. Highlight: We propose an oversampling technique combined with denoising to alleviate the impact of noise. |
H. Shen; C. Zeng; J. Wang; Q. Wang; |
218 | Learned Approximated Optimization for Rapid Low-Complexity Hybrid Beamforming Design. Highlight: In this work, we provide approximations to the optimization process of hybrid precoders based on projected gradient ascent, which drastically reduce the computational complexity. |
A. Milstein; T. Yablonka; N. Shlezinger; |
219 | ECG-guided Individual Identification Via PPG. Highlight: Specifically, a novel cross-modal knowledge distillation framework is implemented to propagate discriminative knowledge from the ECG modality to the PPG modality without incurring additional computational demands at the inference phase. |
R. Wei; H. Chen; K. Yao; C. Yang; J. Wang; C. Li; |
220 | Atom-Constrained Maximum Likelihood Gridless DOA with Wirtinger Gradients. Highlight: This approach enables the use of Wirtinger gradients for DOA estimation. |
P. Gerstoft; Y. Park; |
221 | Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data. Highlight: In this work, we present a simple yet effective automatic process for creating speech-text pair data that carefully injects speech paralinguistic understanding abilities into SLMs while preserving the inherent language capabilities of the text-based LLM. |
K. -H. Lu; |
222 | MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders. Highlight: However, the pre-trained audio encoder has limited capacity to capture features for new tasks and datasets. To address this, we propose to incorporate mixtures of ‘weak’ encoders (MoWE) into the AudioLLM framework. |
W. Zhang; |
223 | DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning. Highlight: While automated audio captioning (AAC) has made notable progress, traditional fully supervised AAC models still face two critical challenges: the need for expensive audio-text pair data for training and performance degradation when transferring across domains. To overcome these limitations, we present DRCap, a data-efficient and flexible zero-shot audio captioning system that requires text-only data for training and can quickly adapt to new domains without additional fine-tuning. |
X. Li; |
224 | Identity-Agnostic Learning for Deepfake Face Detection. Highlight: This paper introduces a novel training approach called Identity-Agnostic Learning (IAL) for deepfake face detection. |
X. Zhou; Z. Deng; Q. Zhao; |
225 | VN-GT: Optimizing Virtual Network Deployment Via Game Theory. Highlight: Most existing studies have focused solely on the defender’s perspective, resulting in overly idealistic solutions that are ineffective in real-world scenarios. To address this, we propose VN-GT, a game-theoretic model that optimizes virtual network deployment by considering both attackers and defenders. |
W. Wang; |
226 | MUPO-Net: A Multilevel Dual-domain Progressive Enhancement Network with Embedded Attention for CT Metal Artifact Reduction. Highlight: In this paper, we propose a multilevel dual-domain progressive enhancement network with embedded attention for MAR, termed MUPO-Net. |
X. Yao; J. Tan; Z. Deng; D. Xiong; Q. Zhao; M. Wu; |
227 | Semantic-oriented Visual Prompt Learning for Class Incremental Learning. Highlight: Prior studies reveal that PEFT methods’ extended parameters do not directly contribute to semantic perception, which limits performance when category and domain gaps are significant. To address this, we propose semantic-oriented visual prompt learning (SVPL), which enhances semantic perception and improves task-specific knowledge extraction. |
S. Guo; |
228 | Efficient Hierarchical Domain Adaptive Thermal Infrared Tracking. Highlight: However, the domain discrepancy between TIR and RGB images limits effective utilization of RGB features, significantly degrading TIR tracking performance. To solve this challenge, we propose a hierarchical domain adaptation model to transfer useful pre-trained RGB features into TIR tracking more effectively and efficiently. |
Q. Li; K. Tan; Q. Liu; D. Yuan; X. Li; Y. Liu; |
229 | DASSL: Domain Agnostic Self-Supervised Learning with Multiple Missing Information Reconstruction Branches. Highlight: In this work, we introduce a novel Domain Agnostic Self-Supervised Learning framework called DASSL, which learns superior high-level feature representations of samples by reconstructing the samples’ missing information in the representation space. |
J. Fang; |
230 | Harnessing Dimensional Contrast and Information Compensation for Sentence Embedding Enhancement. Highlight: However, this approach can lead to over-compression and dimensional contamination from noisy data augmentation and unconstrained ICL processes. To mitigate these issues, we propose a novel enhancement method, MSSE, which incorporates an Information Compensation Mechanism (ICM) and a Dimensional-Level Contrastive Learning Mechanism (DCM). |
K. He; |
231 | Dual Encoders for Diffusion-based Image Inpainting. Highlight: Specifically, we utilize dual encoders, a Convolutional Neural Network (CNN) encoder and the pre-trained Variational AutoEncoder (VAE) encoder, to encode masked images. |
D. Zheng; K. Deng; J. Wang; L. Shen; |
232 | Hypergraph-Based Dynamic Graph Node Classification. Highlight: Existing methods based on RNNs and self-attention only aggregate features of the same node across different time slices, which cannot adequately address and capture the diverse dynamic changes in dynamic graphs. Therefore, we propose a novel model named Hypergraph-Based Multi-granularity Dynamic Graph Node Classification (HYDG). |
X. Ma; C. Zhao; M. Shao; Y. Lin; |
233 | Bidirectional Reference Image Quality Assessment Via Content-Quality Correlation Modeling. Abstract: The emphasis on no-reference image quality assessment has often overshadowed the significance of Full-Reference Image Quality Assessment (FR-IQA), which generally better reflects … |
B. Hu; W. Chen; C. Li; J. Leng; W. Li; X. Gao; |
234 | PDSeg: Patch-Wise Distillation and Controllable Image Generation for Weakly-Supervised Histopathology Tissue Segmentation. Highlight: Inspired by the recent success of the teacher-student strategy in various vision tasks, we present a transformer-based weakly supervised framework that distills knowledge from a CNN teacher. |
W. -H. Li; Y. -H. Hsieh; H. -F. Yang; C. -S. Chen; |
235 | Redefining Well Exposedness for Locally Adaptive Multi-Exposure Fusion. Highlight: In this paper, we revisit the fundamentals to define a novel well-exposedness function along with a tile-based processing approach. |
P. Arya; S. Kumar; A. Agarwal; N. Yenneti; N. Pai; |
236 | GEGA: Graph Convolutional Networks and Evidence Retrieval Guided Attention for Enhanced Document-level Relation Extraction. Highlight: Moreover, another challenge in DocRE is the insufficient extraction of complex cross-relations between long-distance entities, which is often due to an inadequate understanding of long-distance semantic text. To overcome these challenges, we propose GEGA, a novel model for DocRE. |
Y. Mao; X. Chen; P. Liu; T. Cui; Z. Yue; Z. Li; |
237 | Learn from Balance: Rectifying Knowledge Transfer for Long-Tailed Scenarios. Highlight: In this paper, we propose a novel framework called Knowledge Rectification Distillation (KRDistill) to address the imbalanced knowledge inherited from the teacher network by incorporating balanced category priors. |
X. Huang; J. Tang; X. Zheng; J. Zhou; W. Yu; N. Jiang; |
238 | Low-Complexity Own Voice Reconstruction for Hearables with An In-Ear Microphone. Highlight: Given the limited computational resources available on hearables, in this paper we propose low-complexity variants of an OVR system based on the frequency and time joint non-linear filter (FT-JNF) architecture and investigate the required amount of device-specific recorded signals for effective data augmentation and fine-tuning. |
M. Ohlenbusch; C. Rollwage; S. Doclo; |
239 | Global Tropical Cyclone Intensity Forecasting with Multi-modal Multi-scale Causal Autoregressive Model. Highlight: Current methods predominantly rely on limited spatiotemporal information from ERA5 data and neglect the causal relationships between these physical variables, failing to fully capture the spatial and temporal patterns required for intensity forecasting. To address this issue, we propose a Multi-modal multi-Scale Causal AutoRegressive model (MSCAR), which is the first model that combines causal relationships with large-scale multimodal data for global TC intensity autoregressive forecasting. |
X. Wang; K. Chen; L. Liu; T. Han; B. Li; L. Bai; |
240 | An Adaptive Framework for Multi-View Clustering Leveraging Conditional Entropy Optimization. Highlight: Despite significant advancements, existing MVC methods struggle with effectively quantifying the consistency and complementarity among views, and are particularly susceptible to the adverse effects of noisy views, known as the Noisy-View Drawback (NVD). To address these challenges, we propose CE-MVC, a novel framework that integrates an adaptive weighting algorithm with a parameter-decoupled deep model. |
L. Li; Y. He; C. -M. Pun; |
241 | Multi-label Body Constitution Recognition Via Dual Transform MLP-like Architecture Using Tongue Images. Abstract: According to the traditional Chinese medicine composite constitution theory, body constitution recognition is modeled as a unique task of multi-label problems using tongue images. … |
M. Zhang; G. Wen; P. Yang; |
242 | Graph-Driven Insights: Enhancing Stock Market Prediction with Relational Temporal Dynamics. Highlight: This paper proposes a novel approach to predicting stock returns based on Relational Temporal Graph Neural Networks (RTGNN). |
R. Jia; K. Yang; D. Cheng; L. Han; Y. Liang; |
243 | Training Better Embedding With Perturbed Data Augmentation for Automatic Singing Quality Assessment. Highlight: In this paper, we incorporate perturbed augmented data strategies into the training of a singing quality assessment (SQA) network without additional annotations. |
P. -W. Chen; V. -W. Soo; |
244 | RetinaStereo: Dynamic-Volume Stereo Matching Network. Highlight: Existing stereo matching techniques often struggle with detailing subtle objects on depth edges. To alleviate this problem, we introduce the Dynamic-Range Disparity Initialization module, which integrates three complementary branches: the dynamic dense volume for localized disparity sampling, the sparse global volume for encoding search center information, and the background static volume with skip connections for enhancing depth edge accuracy. |
X. Liao; |
245 | One-step Incomplete Multi-view Clustering Based on Bipartite Graph Learning. Highlight: Practical applications may contain some missing instances, which require Incomplete Multi-View Clustering (IMVC) methods to handle them. In this paper, we propose a novel method named One-step Incomplete Multi-View Clustering based on Bipartite Graph Learning (OIMVC-BGL) which aims to solve the above problems. |
M. Li; H. Lin; H. Xu; Z. Wang; X. Zhu; X. Huang; |
246 | Loudspeaker Beamforming to Enhance Speech Recognition Performance of Voice Driven Applications. Highlight: In this paper we propose a robust loudspeaker beamforming algorithm which is used to enhance the performance of voice driven applications in scenarios where the loudspeakers introduce the majority of the noise, e.g. when music is playing loudly. |
D. de Groot; B. Karslioglu; O. Scharenborg; J. Martinez; |
247 | Deep Feedback Cancellation for Hearing Aids with Improved System Stability and Sound Quality. Highlight: In this work, we introduce a novel approach inspired by traditional adaptive filtering and deep learning techniques to achieve significantly faster convergence and lower steady-state errors at the same time. |
E. Lydaki; Z. -H. Tan; J. Jensen; M. Guo; |
248 | Towards Efficient Deep Hashing Retrieval: Condensing Your Data Via Feature-Embedding Matching. Highlight: In this paper, we explore the effect of mainstream dataset condensation methods for deep hashing retrieval and propose IEM (Information-intensive feature-Embedding Matching), which is centered on distribution matching and incorporates model and data augmentation techniques to further enhance the features of the hashing space. |
T. Feng; J. Zhang; H. Liu; Z. Wang; S. Pang; |
249 | Latent Watermarking of Audio Generative Models. Highlight: In response, we introduce a method that watermarks latent audio generative models by directly watermarking their training data. |
R. S. Roman; P. Fernandez; A. Deleforge; Y. Adi; R. Serizel; |
250 | Volatile MAB-based Configuration Selection for Offloading Video Analytics Tasks to Edges. Abstract: The demand for video analytics is increasing rapidly. Due to the limited computational and network resources on edge servers, adjusting video configurations such as resolution and … |
Y. Liang; S. Zhang; J. Wu; |
251 | Distributed IRSs Mitigate Spatial Wideband & Beam Split Effects. Highlight: This paper proposes a distributed IRS design that aids in naturally combating the SW/BSP effects by parallelizing the spatial delays across the multiple IRSs. |
L. Yashvanth; C. R. Murthy; B. D. Rao; |
252 | Towards Feature-Consistent Parameter Collaboration for Personalized Federated Learning. Highlight: This paper introduces FedFPC, a PFL method that allows effective and robust parameter-wise collaboration to achieve superior performance. |
X. Lu; J. Li; Y. Zhang; W. Wang; |
253 | Try Before You Buy: Solving Multi-Model Complex Tasks By Model Competitions. Highlight: However, heuristically binding one model to one subtask may generate a less satisfying subtask result, thereby affecting the overall performance. Therefore, we propose CompeMLLM, which introduces an innovative method of dynamic orchestration of the workflows. |
Y. Zhao; |
254 | Enhanced Corneal Endothelial Cell Segmentation Via Frequency-Selected Residual Fourier Diffusion Models. Highlight: This is further compounded by labor-intensive manual annotations and a lack of large annotated datasets. To address these issues, we introduce a novel two-stage framework using Denoising Diffusion Probabilistic Models (DDPMs) for generating training pairs of corneal endothelial cell images. |
T. Wang; X. Nan; Y. Wang; Y. Yan; Z. Gao; J. Liu; |
255 | Spatially-Aware Cross-Modal Contrastive Learning for Low-Shot HSI Classification. Highlight: Existing self-supervised methods for HSI data focus predominantly on spectral attributes, neglecting the spatial details crucial for effective HSI classification. To address this, we introduce the Cross-Modal Spatial Contrastive (CM-SCON) framework, a novel self-supervised approach that employs co-registered, unlabeled HSI and LiDAR data. |
A. Vasim; P. Kashyap; S. Choudhury; B. Banerjee; |
256 | Sampling Nonsmooth Log-Concave Densities: A Comparative Study of Primal-Dual Based Proposal Distributions. Highlight: Sampling from a distribution on the real d-space, whose density is nonsmooth and log-concave, is a computational issue that often arises in Machine Learning and Statistics. Langevin-based Hastings-Metropolis methods were proposed: they extend the Unadjusted Langevin Algorithm by using proximal methods to define a smoothed version of the density of interest. |
J. Chevallier; G. Fort; |
257 | A Spectrum-enhanced Attention Model for Semantic Segmentation of Remote Sensing Images. Highlight: In this paper, we propose a Spectrum-Enhanced Network (SPENet) that leverages the Frequency Transformer Block (FTB) to capture rich spectral context. |
X. Li; |
258 | When Does Visual Prompting Outperform Linear Probing for Vision-Language Models? A Likelihood Perspective. Highlight: We propose a log-likelihood ratio (LLR) approach to analyze the comparative benefits of visual prompting and linear probing. |
H. -A. Tsao; L. Hsiung; P. -Y. Chen; T. -Y. Ho; |
259 | Accelerating Convergence in Bounding Box Regression with A Refined IoU Loss Function. Highlight: (ii) There is a spatial imbalance caused by the disproportionate influence of anchor boxes with minimal overlap with the ground truth boxes. To resolve these two challenges, this paper proposes a novel loss function termed Fast-IoU, designed to swiftly and precisely measure the overlap area and aspect ratio in BBR. |
E. Chai; X. Li; T. Cui; Z. Lu; F. B. Tesema; |
260 | Dual Trajectory Revised Diffusion Model for Time Series Forecasting. Highlight: In this paper, we propose a novel Dual Trajectory Revised Diffusion Model (TimeDTR) for time-series forecasting, which leverages an unconventional conditioning strategy to incorporate the historical information into both forward and backward trajectories in the diffusion model. |
Z. Hu; |
261 | Ambisonics Binaural Rendering Via Masked Magnitude Least Squares. Highlight: Magnitude Least Squares rendering became the de facto standard for this, which discards high-frequency interaural phase information in favor of reducing magnitude errors. Building upon this idea, we suggest Masked Magnitude Least Squares, which optimizes the Ambisonics coefficients with a neural network and employs a spatio-spectral weighting mask to control the accuracy of the magnitude reconstruction. |
O. Berebi; F. Brinkmann; S. Weinzierl; B. Rafaely; |
262 | Rethinking Dual-Stream Super-Resolution for Enhancing Remote Sensing Object Detection. Highlight: We have re-evaluated existing dual-stream learning frameworks and identified their limitations in focusing on small objects. To address this issue, we propose a Dual-Stream Object Detection (DSOD) framework, which incorporates a Feature Fusion Guidance Module (FFGM). |
A. Luo; K. Hu; K. Jiang; |
263 | Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions. Highlight: In this work, we present a pioneering effort to investigate the capability of LLMs in transcribing speech in multi-talker environments, following versatile instructions related to multi-talker automatic speech recognition (ASR), target talker ASR, and ASR based on specific talker attributes such as sex, occurrence order, language, and keywords spoken. |
L. Meng; |
264 | SSFMamba: Spatial-Spectral Fusion State Space Model for Pansharpening. Highlight: Thus, we propose spatial-spectral fusion state space model (SSFMamba), which consists of multi-scale spatial-wise visual state space (MSpa-VSS) block, bi-directional spectral-wise visual state space (BSpe-VSS) block, and gated spatial-spectral fusion (GSSF) block. |
M. Ma; M. Zhao; Y. Jiang; X. Li; W. Zhang; |
265 | MSA-ITEI: A Novel Method for Multimodal Analysis of Social Media Stickers. Highlight: Current research is limited by these characteristics and the lack of datasets. To address this gap, we introduce MSA-ITEI: Multimodal Sticker Analysis through Image, Text, and underlying Emotions and Intentions. |
Y. Shi; F. Kong; |
266 | Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC. Highlight: Our visualization reveals that CTC guides the encoder to represent different speakers in distinct temporal regions of acoustic embeddings. Leveraging this insight, we propose a novel Speaker-Aware CTC (SACTC) training objective, based on the Bayes risk CTC framework. |
J. Kang; |
267 | Efficient Streaming LLM for Speech Recognition. Abstract: Recent works have shown that prompting large language models with audio encodings can unlock speech recognition capabilities. However, existing techniques do not scale … |
J. Jia; |
268 | Efficient Co-Approximate Parallel Compressive Depth Reconstruction on FPGA. Highlight: A new co-approximate framework for a parallel approximate compressive depth reconstruction engine on FPGA is proposed, using ℓ1 solvers based on proximal gradient descent (PGD), with instrumented frequency and voltage scaling during the iterative optimization process. |
Y. Wu; J. McAllister; |
269 | Modular Prompt Learning Improves Vision-Language Models. Highlight: In this work, we propose Modular Prompt Learning (MPL) that is designed to promote the preservation of information contained in the inserted prompts. |
Z. Huang; T. Pedapati; P. -Y. Chen; J. Gao; |
270 | First-order State Space Model for Lightweight Image Super-resolution. Highlight: In this paper, we introduce the First-order State Space Model (FSSM) to improve the original Mamba module, enhancing performance by incorporating token correlations. |
Y. Zhu; X. Zhang; Y. Lu; G. Yang; F. Fang; G. Zhang; |
271 | Leveraging Registers in Vision Transformers for Robust Adaptation. Highlight: In this paper, we examine the utility of register token embeddings in providing additional features for improving generalization and anomaly rejection. |
S. Yellapragada; |
272 | Segment Any Bone in CT with Partial Supervision. Highlight: In this work, we explore a less studied problem setting that assumes only partially labeled bone CT data. |
T. Liang; X. Li; Y. Peng; M. Xu; |
273 | Whitening Effects for ML-DoA Estimation Using A Sparse Representation of Array Covariance. Highlight: Using the whitened model, we recently showed equivalence between sparse DoA estimators and the ML estimator, thereby enabling efficient implementation of ML DoA estimation under white Gaussian noise. In this work, the noise pre-whitening transform is shown to significantly improve the sparse problem conditioning by spatially decorrelating the dictionary vectors associated with source directions, thus simplifying the implementation of ML DoA estimation with a sparse representation. |
T. Aussaguès; A. Ferréol; A. Delmer; P. Larzabal; |
274 | Towards Fully Test-Time Adaptation Via Variance Balancing and Semantic Augmentation. Highlight: Traditional methods like entropy minimization primarily focus on reducing uncertainty in output predictions, yet often overlook the diversity in target prediction results, which is critical for unbalanced classes in complex datasets. To address this, our study introduces a new method named Variance Balancing and Semantic Augmentation (VBSA). |
H. Su; B. Wang; D. Liu; J. Li; C. -B. Feng; C. -M. Vong; |
275 | Enhancing Multimodal Emotion Recognition Through Multi-Granularity Cross-Modal Alignment. Highlight: Such a narrow focus not only limits model performance but also fails to address the complexity and ambiguity inherent in emotional expressions. In response, this paper introduces a Multi-Granularity Cross-Modal Alignment (MGCMA) framework, distinguished by its comprehensive approach encompassing distribution-based, instance-based, and token-based alignment modules. |
X. Wang; S. Zhao; H. Sun; H. Wang; J. Zhou; Y. Qin; |
276 | SpeechCaps: Advancing Instruction-Based Universal Speech Models with Multi-Talker Speaking Style Captioning. Highlight: This paper introduces a multi-talker speaking style captioning task to enhance the understanding of speaker and prosodic information. |
C. -Y. Huang; M. -H. Shih; K. -H. Lu; C. -Y. Hsiao; H. -Y. Lee; |
277 | FasterGold-DETR: An Efficient End-to-End Fire Detection Model Via Gather-and-Distribute Mechanism. Highlight: However, the performance of current YOLO-based detection models is limited by NMS, and DETR-based detection models struggle with real-time performance. To address these challenges, a new fire detection model, FasterGold-DETR, is proposed. |
C. Liu; F. Wu; L. Shi; |
278 | Sparse-VQ Transformer: An FFN-Free Framework with Vector Quantization for Enhanced Time Series. Highlight: Yet, time series data is uniquely challenging due to significant distribution shifts and intrinsic noise levels. To address these two challenges, we introduce the Sparse Vector Quantized FFN-Free Transformer (Sparse-VQ). |
Y. Zhao; T. Zhou; C. Chen; L. Sun; Y. Qian; R. Jin; |
279 | DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification. Highlight: In this study, we propose DiffAttack, a novel timbre-reserved adversarial attack approach, that exploits the capability of a diffusion-based voice conversion (DiffVC) model to generate adversarial fake audio with distinct target speaker attribution. |
Q. Wang; J. Yao; Z. Sun; P. Guo; L. Xie; J. H. L. Hansen; |
280 | SlotFusion: Object-Centric Audiovisual Feature Fusion with Slot Attention for Remote Sensing Scene Recognition. Highlight: In this paper, we introduce an object-centric feature fusion method named SlotFusion. |
F. Han; T. Yu; L. Zhang; L. Si; Y. Zhang; |
281 | Adversarial Knowledge Transfer for Black-Box Model Inversion Attack. Highlight: We propose a new black-box model inversion attack, Label-Controlled Adversarial Knowledge Transfer (L-AdKT). |
X. Liu; Z. Lin; Y. Jiang; Q. Yan; |
282 | Efficient Object Placement Via LLM and Diffusion Model. Highlight: In this work, we leverage the LLM to predict the coordinates of the added object with the help of user instruction. |
W. Liu; L. Wang; J. Sun; |
283 | Audio Array-Based 3D UAV Trajectory Estimation with LiDAR Pseudo-Labeling. Highlight: In response, this paper introduces a novel framework that utilizes audio array for 3D UAV trajectory estimation. |
A. H. -X. Lei; T. Deng; H. Wang; J. Yang; S. Yuan; |
284 | A Novel Network for Short-Term Wind Speed Prediction: Mitigating Distribution Shift and Feature Loss. Highlight: In this paper, we propose the Distribution Shift and Feature Decoupling Network (DSFD-Net), which addresses the issue of distributional shifts occurring both within the input series and between the input and predicted series through a distribution matching model and distribution mapping module, respectively. |
M. Yu; S. Dong; X. Li; Z. Shang; Y. Sun; Z. Liu; |
285 | Indoor Sensing with Measurements. Highlight: Many of these require indoor sensing capabilities, which can be realized by exploiting the perturbation in the indoor channel. In this work, we conduct an indoor channel measurement campaign to study these perturbations and develop AI-based algorithms for estimating sensing parameters. |
V. Yajnanarayana; P. Geuer; S. Dwivedi; |
286 | TAME: Temporal Audio-based Mamba for Enhanced Drone Trajectory Estimation and Classification. Highlight: The increasing prevalence of compact UAVs has introduced significant risks to public safety, while traditional drone detection systems are often bulky and costly. To address these challenges, we present TAME, the Temporal Audio-based Mamba for Enhanced Drone Trajectory Estimation and Classification. |
Z. Xiao; H. Hu; G. Xu; J. He; |
287 | Pruning for Sparse Diffusion Models Based on Gradient Flow. Highlight: Thus we propose an iterative pruning method based on gradient flow, including the gradient flow pruning process and the gradient flow pruning criterion. |
B. Wan; T. Zheng; Z. Chen; Y. Wang; J. Wang; |
288 | UniFaceGAN: High-Quality 3D Face Editing With A Unified Latent Space. Highlight: In this paper, we introduce UniFaceGAN, a novel framework for 3D facial editing, leveraging a unified latent space to facilitate diverse and user-friendly 3D facial manipulation. |
J. Wei; Z. Zhang; R. Liao; D. Gao; |
289 | Diffusion Augmentation Sub-center Modeling for Unsupervised Anomalous Sound Detection with Partially Attribute-Unavailable Conditions. Highlight: Additionally, limited sample diversity in the target domain further hinders learning robust discriminative features. To address these challenges, we propose a diffusion augmentation sub-center modeling (DASM) approach for embedding learning. |
J. Yin; Y. Gao; W. Zhang; T. Wang; M. Zhang; |
290 | Speech Enhancement Using Continuous Embeddings of Neural Audio Codec. Highlight: In this work, we propose a novel, efficient SE approach by leveraging the pre-quantization output of a pretrained NAC encoder. |
H. Li; J. Q. Yip; T. Fan; E. S. Chng; |
291 | LLM-Guided Dual-Branch Diffusion Model for Fine-Grained Motion Synthesis. Highlight: In this paper, we introduce LGDDM, an LLM-Guided Dual-branch Diffusion Model, which relies on the LLM’s ability to decompose text and provide fine-grained and interpretable guidance to generate motions. |
W. Wang; D. Gao; X. Liu; |
292 | Feature Refinement Decomposition and Relation Preference Enhancement for Remote Sensing Change Detection. Highlight: In this paper, we propose a novel feature refinement methodology guided by relation preferences, specifically designed for RSCD tasks. |
W. Zheng; |
293 | Pre-training with Synthetic Patterns for Audio. Highlight: In this paper, we propose to pre-train audio encoders using synthetic patterns instead of real audio data. |
Y. Ishikawa; T. Komatsu; Y. Aoki; |
294 | Partial Reconstruction Error for Deepfake Detection. Highlight: In this paper, we propose Partial Reconstruction Error to perform deepfake detection based on the reconstruction of masked regions in an image. |
Y. Zhang; Z. Meng; B. Peng; J. Dong; B. Chu; W. Wang; |
295 | Leveraging Out-of-Domain Noise for Unsupervised Domain Adaptation in Speech Enhancement. Highlight: In this study, we introduce PHA-ReMixIT, a novel approach for leveraging out-of-domain (OOD) noise signals to enhance unsupervised domain adaptation in SE. |
Y. Liao; H. Guan; S. Wei; Y. Long; |
296 | Birds of A Feather: Learning to Retrieve Dance Poses From Music Via Ground-Truth Annotation Lifting. Highlight: However, we found that the mainstream choreography dataset lacks discriminative power in terms of 3D pose and shape annotations across different dance genres, hindering the model’s ability to learn effective mappings, which in turn reduces retrieval performance. To address the issue, we propose LiftNet, a deep-net model that uses dance genres as guidance to lift 3D pose and shape annotations, making them more discriminative and easier for the downstream retrieval model to learn. |
B. -W. Tseng; W. -L. Wei; J. -C. Lin; |
297 | SimulTron: On-Device Simultaneous Speech to Speech Translation. Highlight: However, achieving accurate, real-time translation through mobile devices remains a major challenge. We introduce SimulTron, a novel S2ST architecture designed to tackle this task. |
A. Agranovich; |
298 | Image Compressive Sensing With Adaptive Sampling By Median Filtering. Highlight: (2) Traditional CNNs struggle to capture broader contextual priors during iterative recovery. In this paper, we propose a novel network ASMFNet to solve the above two issues. |
Y. Wu; C. Hui; R. Liao; S. Liu; D. Zhao; |
299 | Exploring Kolmogorov-Arnold Networks for Realistic Image Sharpness Assessment. Highlight: This study introduces the Taylor series-based KAN (TaylorKAN). |
S. Yu; Z. Chen; Z. Yang; J. Gu; B. Feng; Q. Sun; |
300 | Improving 5G Positioning Through Signal-to-Noise Ratio Recognition Training. Highlight: In this paper, we demonstrate that integrating SNR recognition training with the positioning network can enhance localization performance. |
W. Zheng; |
This table only includes 300 papers, selected based on paper ID in the proceedings. To continue with the full list (~3,300 papers), please visit Paper Digest: ICASSP-2025 (Full List).