Paper Digest: ICASSP 2024 Papers & Highlights
Note: ICASSP-2024 accepted more than 2,700 papers; this page includes only 300 of them, selected by paper ID in the proceedings. Interested readers can view all ~2,700 ICASSP-2024 papers on a separate page, which takes quite some time to load.
To search for papers presented at ICASSP-2024 on a specific topic, please make use of the search by venue (ICASSP-2024) service. To summarize the latest research published at ICASSP-2024 on a specific topic, you can utilize the review by venue (ICASSP-2024) service.
This list is created by the Paper Digest Team.
TABLE 1: Paper Digest: ICASSP 2024 Papers & Highlights
# | Paper | Author(s) |
---|---|---|
1 | FastMandarin: Efficient Local Modeling for Natural Mandarin Speech Synthesis Highlight: Attention-based speech synthesis methods often suffer from dispersed attention across the entire input sequence, resulting in poor local modeling and unnatural Mandarin synthesized speech. To address these issues, we present FastMandarin, a rapid and natural Mandarin speech synthesis framework that employs two explicit methods to enhance local modeling and improve pronunciation representation. | C. Jiang; Y. Gao; H. Jin; L. Pan; W. W. Y. Ng; |
2 | Ultra Low Complexity Deep Learning Based Noise Suppression Highlight: This paper introduces an innovative method for reducing the computational complexity of deep neural networks in real-time speech enhancement on resource-constrained devices. | S. S. Shetu; S. Chakrabarty; O. Thiergart; E. Mabande; |
3 | Binaural Rendering of Heterogeneous Sound Sources with Extent Highlight: In this paper, we propose an approach for binaural rendering of heterogeneously extended sound sources. | C. Anemüller; O. Thiergart; E. A. P. Habets; |
4 | NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping Highlight: A shortcoming of the LACE model is, however, that quality quickly saturates when the model size is scaled up. To mitigate this problem, we propose a novel adaptive temporal shaping module that adds high temporal resolution to the LACE model, resulting in the Non-Linear Adaptive Coding Enhancer (NoLACE). | J. Büthe; A. Mustafa; J.-M. Valin; K. Helwani; M. M. Goodwin; |
5 | Music Source Separation With Band-Split RoPE Transformer Highlight: In this paper, we propose a novel frequency-domain approach (called BS-RoFormer) based on a Band-Split RoPE Transformer architecture. | W.-T. Lu; J.-C. Wang; Q. Kong; Y.-N. Hung; |
6 | MTDiffusion: Multi-Task Diffusion Model With Dual-Unet for Foley Sound Generation Highlight: Combining Dual-Unet and Variational AutoEncoders with Residual Vector Quantizer, we propose a multi-task diffusion model (MTDiffusion), which can generate foley sound audio for a given label. | A. Qi; X. Xie; J. Wang; |
7 | Audio-Free Prompt Tuning for Language-Audio Models Highlight: In this work, by leveraging the modality alignment in CLAP, we propose an efficient audio-free prompt tuning scheme aimed at optimizing a few prompt tokens from texts instead of audios, which regularizes the model space to avoid overfitting the seen classes as well. | Y. Li; X. Wang; H. Liu; |
8 | RVAE-EM: Generative Speech Dereverberation Based On Recurrent Variational Auto-Encoder And Convolutive Transfer Function Highlight: In this work, we propose a generative dereverberation method. | P. Wang; X. Li; |
9 | A Practical Online Multichannel Dereverberation Approach with Data-Reuse Technique Abstract: One of the most effective online dereverberation algorithms is the weighted prediction error (WPE) method and its improved version, switching WPE (SwWPE). This paper proposes a … | W. Huang; C. Xue; J. Feng; W. B. Kleijn; |
10 | An Active Noise Control System Based On Soundfield Interpolation Using A Physics-Informed Neural Network Highlight: This paper designs a feasible monitoring microphone arrangement placed outside the ROI, providing a user with more freedom of movement. | Y. A. Zhang; F. Ma; T. D. Abhayapala; P. N. Samarasinghe; A. Bastine; |
11 | Directional Gain Based Noise Covariance Matrix Estimation for MVDR Beamforming Highlight: This paper is devoted to the problem of noise covariance matrix (NCM) estimation. | F. Zhang; C. Pan; J. Benesty; J. Chen; |
12 | Noisy-ArcMix: Additive Noisy Angular Margin Loss Combined With Mixup For Anomalous Sound Detection Highlight: In this paper, we propose a training technique aimed at ensuring intra-class compactness and increasing the angle gap between normal and anomalous samples. | S. Choi; J.-W. Choi; |
13 | MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model with Multi-Task Finetuning Highlight: In this paper, we propose to apply a self-supervised learning model pre-trained on large-scale unlabeled music data and finetune it on IPT detection tasks. | D. Li; |
14 | Music Auto-Tagging with Robust Music Representation Learned Via Domain Adversarial Training Highlight: This study proposes a method inspired by speech-related tasks to enhance music auto-tagging performance in noisy settings. | H. Joung; K. Lee; |
15 | An Explainable Proxy Model for Multilabel Audio Segmentation Highlight: In this paper, we propose an explainable multilabel segmentation model that solves speech activity (SAD), music (MD), noise (ND), and overlapped speech detection (OSD) simultaneously. | T. Mariotte; A. Almudévar; M. Tahon; A. Ortega; |
16 | Pre-Echo Reduction in Transform Audio Coding Via Temporal Envelope Control with Machine Learning Based Estimation Highlight: This paper proposes a new method for pre-echo reduction in transform-based audio coding by controlling the temporal envelope of the waveform. | J.-W. Kim; B. Jo; S. Beack; H. Park; |
17 | Semantic Proximity Alignment: Towards Human Perception-Consistent Audio Tagging By Aligning with Label Text Description Highlight: In this paper, we explore the impact of training audio tagging models with auxiliary text descriptions of sound events. | W. Liu; Y. Ren; |
18 | GASS: Generalizing Audio Source Separation with Large-Scale Data Highlight: We assess GASS models on a diverse set of tasks. | J. Pons; X. Liu; S. Pascual; J. Serrà; |
19 | GaP-Aug: Gamma Patch-Wise Correction Augmentation Method for Respiratory Sound Classification Highlight: Nevertheless, the challenge persists due to the scarcity of abnormal samples, and the distinct characteristics between low-pitched and discontinuous crackles and high-pitched and continuous wheezes. In this study, we proposed a novel augmentation method, namely gamma patch-wise correction augmentation, which directly operates on spectrograms to handle these two challenges. | A.-Y. Chang; |
20 | Conjugate Gradient Based Adaptive Algorithm for Nonlinear AEC Highlight: Hence, in this paper, we propose a conjugate gradient (CG)-based algorithm referred to as nonlinear improved sparse conjugate algorithm. | S. Burra; A. Kar; M. G. Christensen; |
21 | Online Target Sound Extraction with Knowledge Distillation from Partially Non-Causal Teacher Highlight: Simply converting the non-causal TSE model architecture to a causal one leads to significant performance degradation. To mitigate this problem, we propose using Knowledge Distillation (KD) from a non-causal teacher to a causal student for TSE. | K. Wakayama; |
22 | SuperCodec: A Neural Speech Codec with Selective Back-Projection Network Highlight: In this study, we introduce SuperCodec, a neural speech codec that achieves state-of-the-art performance at low bitrates. | Y. Zheng; W. Tu; L. Xiao; X. Xu; |
23 | BAE-Net: A Low Complexity and High Fidelity Bandwidth-Adaptive Neural Network for Speech Super-Resolution Highlight: In this paper, we propose a streaming adaptive bandwidth extension solution dubbed BAE-Net, which is suitable to handle the low-resolution speech with unknown and varying effective bandwidth. | G. Yu; |
24 | Array Geometry Optimization for Region-of-Interest Near-Field Beamforming Highlight: This paper presents an array geometry optimization approach for near-field beamforming. | R. Moisseev; G. Itzhak; I. Cohen; |
25 | Retrieval-Augmented Text-to-Audio Generation Highlight: We refer to this problem as long-tailed text-to-audio generation. To address this issue, we propose a simple retrieval-augmented approach for TTA models. | Y. Yuan; H. Liu; X. Liu; Q. Huang; M. D. Plumbley; W. Wang; |
26 | LightCodec: A High Fidelity Neural Audio Codec with Low Computation Complexity Highlight: In this paper, a low-complexity audio codec is proposed. | L. Xu; J. Wang; J. Zhang; X. Xie; |
27 | FunCodec: A Fundamental, Reproducible and Integrable Open-Source Toolkit for Neural Speech Codec Highlight: This paper presents FunCodec, a fundamental neural speech codec toolkit, which is an extension of the open-source speech processing toolkit FunASR. | Z. Du; S. Zhang; K. Hu; S. Zheng; |
28 | VRDMG: Vocal Restoration Via Diffusion Posterior Sampling with Multiple Guidance Highlight: In this paper, we identify potential issues that degrade the performance of current DPS-based methods and introduce ways to mitigate them, inspired by diverse diffusion guidance techniques including the RePaint (RP) strategy and Pseudoinverse-Guided Diffusion Models (ΠGDM). | C. Hernandez-Olivan; |
29 | Permutation-Alignment Method Using Manifold Optimization for Frequency-Domain Blind Source Separation Highlight: This study proposes a permutation-alignment method for frequency-domain blind source separation. | S. Emura; |
30 | Two-Stage Acoustic Echo Cancellation Network with Dual-Path Alignment Highlight: Furthermore, most deep learning-based methods use single-stage networks, the limited learning capability of which hinders the performance under harsh echo conditions. To address these challenges, this paper proposes a two-stage system with dual-path alignment. | Z. Jiang; H. Li; N. Zheng; |
31 | Real-Time Low-Latency Music Source Separation Using Hybrid Spectrogram-TasNet Highlight: However, there has been little attention given to how these neural networks can be adapted for real-time low-latency applications, which could be helpful for hearing aids, remixing audio streams and live shows. In this paper, we investigate the various challenges involved in adapting current demixing models in the literature for this use case. | S. Venkatesh; A. Benilov; P. Coleman; F. Roskam; |
32 | Binaural Room Transfer Function Interpolation Via System Inversion Highlight: This paper is concerned with the spatial interpolation of Binaural Room Transfer Functions (BRTFs). | A. Emthyas; S. V. Amengual Garí; E. De Sena; |
33 | Leveraging Sound Localization to Improve Continuous Speaker Separation Highlight: This paper presents a novel multi-channel approach for continuous speaker separation based on multi-input multi-output (MIMO) complex spectral mapping. | H. Taherian; A. Pandey; D. Wong; B. Xu; D. Wang; |
34 | Single and Few-Step Diffusion for Generative Speech Enhancement Highlight: This results in a slow inference process and causes discretization errors that accumulate over the sampling trajectory. In this paper, we address these limitations through a two-stage training approach. | B. Lay; J.-M. Lemercier; J. Richter; T. Gerkmann; |
35 | Can Synthetic Data Boost The Training of Deep Acoustic Vehicle Counting Networks? Highlight: In this work, we propose a novel approach to acoustic vehicle counting by developing: i) a traffic noise simulation framework to synthesize realistic vehicle pass-by events; ii) a strategy to mix synthetic and real data to train a deep-learning model for traffic counting. | S. Damiano; L. Bondi; S. Ghaffarzadegan; A. Guntoro; T. van Waterschoot; |
36 | Zero- and Few-Shot Sound Event Localization and Detection Highlight: To tackle the assignment problem in overlapping cases, we propose an embed-ACCDOA model, which is trained to output track-wise CLAP embedding and corresponding activity-coupled Cartesian direction-of-arrival (ACCDOA). | K. Shimada; |
37 | Active Noise Control Over A Large Region with Multiple Spherical Microphone Arrays In Wave Domain Highlight: In this paper, we proposed a wave domain adaptive ANC algorithm using the joint information from multiple error spherical microphone arrays (SMAs) on the boundary of the ROI. | X. Tang; J. A. Zhang; T. Abhayapala; |
38 | U2R: Underwater Ultrasonic Reflection Wave Dataset Toward Pose-Invariant Material Recognition Highlight: In this work, we introduce a novel dataset comprising reflected wave components collected from objects made of various materials and observed from various angles. | M. Kono; |
39 | Sector-Based Interference Cancellation for Robust Keyword Spotting Applications Using An Informed MPDR Beamformer Highlight: A low-complexity, sector-based interference cancellation approach is proposed for voice-controlled devices, e.g., smart speakers. | G. Milano; O. Thiergart; E. A. P. Habets; |
40 | Music Source Separation Based on A Lightweight Deep Learning Framework (DTTNet: Dual-Path TFC-TDF UNet) Highlight: In our paper, we introduce a novel and lightweight architecture called DTTNet, which is based on Dual-Path Module and Time-Frequency Convolutions Time-Distributed Fully-connected UNet (TFC-TDF UNet). | J. Chen; S. Vekkot; P. Shukla; |
41 | Low-Latency Speech Enhancement Via Speech Token Generation Highlight: In this paper, we focus on the low-latency scenario and regard speech enhancement as a speech generation problem conditioned on the noisy signal, where we generate clean speech instead of identifying and removing noises. | H. Xue; X. Peng; Y. Lu; |
42 | Ambisonics Networks – The Effect of Radial Functions Regularization Highlight: This paper aims to investigate the impact of different ways of regularization on Deep Neural Network (DNN) training and performance. | B. Shaybet; A. Kumar; V. Tourbabin; B. Rafaely; |
43 | On The Effect Of Data-Augmentation On Local Embedding Properties In The Contrastive Learning Of Music Audio Representations Highlight: In this work we show that when learning audio representations on music datasets via contrastive learning, musical properties that are typically homogeneous within a track (e.g., key and tempo) are reflected in the locality of neighborhoods in the resulting embedding space. | M. C. McCallum; M. E. P. Davies; F. Henkel; J. Kim; S. E. Sandberg; |
44 | Meta-AF Echo Cancellation for Improved Keyword Spotting Highlight: We focus on classification tasks, where we introduce a novel training methodology that harnesses self-supervision and classifier feedback. | J. Casebeer; J. Wu; P. Smaragdis; |
45 | Binaural Speech Enhancement Using Deep Complex Convolutional Transformer Networks Highlight: This paper presents a binaural speech enhancement method using a complex convolutional neural network with an encoder-decoder architecture and a complex multi-head attention transformer. | V. Tokala; E. Grinstein; M. Brookes; S. Doclo; J. Jensen; P. A. Naylor; |
46 | Similar But Faster: Manipulation of Tempo in Music Audio Embeddings for Tempo Prediction and Search Highlight: We propose tempo translation functions that allow for efficient manipulation of tempo within a pre-existing embedding space whilst maintaining other properties such as genre. | M. C. McCallum; F. Henkel; J. Kim; S. E. Sandberg; M. E. P. Davies; |
47 | Multi-Modal Continual Pre-Training For Audio Encoders Highlight: In this work, we propose to combine CL methods with several audio encoder pre-training methods. | G. Kim; H.-H. Wu; L. Bondi; B. Liu; |
48 | Multi-Dimensional Speech Quality Assessment in Crowdsourcing Highlight: We create a crowd-sourcing implementation of a multi-dimensional subjective test following the scales from P.804 and extend it to include reverberation, the speech signal, and overall quality. | B. Naderi; R. Cutler; N.-C. Ristea; |
49 | Neural Ambisonics Encoding For Compact Irregular Microphone Arrays Highlight: This paper proposes a method for Ambisonics encoding that uses a deep neural network (DNN) to estimate a signal transform from microphone inputs to Ambisonics signals. | M. Heikkinen; A. Politis; T. Virtanen; |
50 | A Transformer Approach for Polyphonic Audio-to-Score Transcription Highlight: While current state-of-the-art methods, which rely on Convolutional Recurrent Neural Networks trained with the Connectionist Temporal Classification loss function, have shown promising results under constrained circumstances, these approaches still exhibit fundamental limitations, especially when dealing with complex sequence modeling tasks, such as polyphonic music. To address these conditions, this work introduces an alternative learning scheme based on a Transformer decoder, specifically tailored for A2S by incorporating a two-dimensional positional encoding to preserve frequency-time relationships when processing the audio signal. | M. Alfaro-Contreras; A. Ríos-Vila; J. J. Valero-Mas; J. Calvo-Zaragoza; |
51 | Advancing Acoustic Howling Suppression Through Recursive Training of Neural Networks Highlight: In this paper, we introduce a novel training framework designed to comprehensively address the acoustic howling issue by examining its fundamental formation process. | H. Zhang; Y. Zhang; M. Yu; D. Yu; |
52 | Multi-Level Graph Learning For Audio Event Classification And Human-Perceived Annoyance Rating Prediction Highlight: To create annoyance-related monitoring, this paper proposes a graph-based model to identify AEs in a soundscape, and explore relations between diverse AEs and human-perceived annoyance rating (AR). | Y. Hou; Q. Ren; S. Song; Y. Song; W. Wang; D. Botteldooren; |
53 | Unsupervised Multi-Channel Separation And Adaptation Highlight: We use MixIT to train a model on far-field microphone array recordings of overlapping reverberant and noisy speech from the AMI Corpus. | C. Han; K. Wilson; S. Wisdom; J. R. Hershey; |
54 | Quantifying The Effect Of Simulator-Based Data Augmentation For Speech Recognition On Augmented Reality Glasses Highlight: For deployment in real environments, such systems, however, need to be able to separate the speech of interest from noise and other speakers. In this paper, we evaluate the effectiveness of leveraging a room simulator to generate large amounts of simulated training data for such front-end sound separation models, to complement the ideal, but costly, collection of real-world data recorded on the device. | R. Arakawa; M. Parvaix; C. Lai; H. Erdogan; A. Olwal; |
55 | Comparison Of Frequency-Fusion Mechanisms For Binaural Direction-Of-Arrival Estimation For Multiple Speakers Highlight: For a binaural hearing aid setup, in this paper we propose an interaural time difference (ITD)-based speaker-grouped frequency fusion mechanism. | D. Fejgin; E. Hadad; S. Gannot; Z. Koldovsky; S. Doclo; |
56 | Improving Acoustic Echo Cancellation for Voice Assistants Using Neural Echo Suppression and Multi-Microphone Noise Reduction Highlight: In this work we propose to combine a single microphone AEC system, consisting of an adaptive linear filter (linear AEC) and a neural echo suppressor (NES), with an adaptive filter developed for multi-microphone noise reduction, called Cleaner. | J. Heitkaemper; |
57 | MDX-GAN: Enhancing Perceptual Quality in Multi-Class Source Separation Via Adversarial Training Highlight: In this paper, we propose MDX-GAN, an efficient and high-fidelity audio source separator based on MDX-Net for multiple sound classes. | K. Chen; J. Su; Z. Jin; |
58 | Quantifying Spatial Audio Quality Impairment Highlight: By using a combination of least-square optimization and heuristics, we propose a signal decomposition method to isolate the spatial error, in terms of interchannel gain leakages and changes in relative delays, from a processed signal. | K. N. Watcharasupat; A. Lerch; |
59 | A Closer Look at Wav2vec2 Embeddings for On-Device Single-Channel Speech Enhancement Highlight: However, their utility in speech enhancement systems is yet to be firmly established, and perhaps slightly misunderstood. In this paper, we investigate the uses of SSL representations for single-channel speech enhancement in challenging conditions and establish the impact they can have on the enhancement task. | R. Shankar; K. Tan; B. Xu; A. Kumar; |
60 | A Computationally Efficient Semi-Blind Source Separation Approach for Nonlinear Echo Cancellation Based on An Element-Wise Iterative Source Steering Highlight: However, because of the introduction of CTF approximation and nonlinear expansion, this algorithm becomes computationally very expensive, which makes it difficult to implement in embedded systems. Thus, we attempt in this paper to improve this IP-based algorithm, thereby developing an element-wise iterative source steering (EISS) algorithm. | K. Lu; X. Wang; T. Ueda; S. Makino; J. Chen; |
61 | Resource-Efficient Separation Transformer Highlight: Our main contribution is the development of the Resource-Efficient Separation Transformer (RE-SepFormer), a self-attention-based architecture that reduces the computational burden in two ways. | L. Della Libera; C. Subakan; M. Ravanelli; S. Cornell; F. Lepoutre; F. Grondin; |
62 | Spiking Structured State Space Model for Monaural Speech Enhancement Highlight: To address these, we introduce the Spiking Structured State Space Model (SpikingS4). | Y. Du; X. Liu; Y. Chua; |
63 | Learning from Taxonomy: Multi-Label Few-Shot Classification for Everyday Sound Recognition Highlight: In this work, we introduce an ontology-aware framework to train multi-label few-shot audio networks with both relative and absolute relationships in an audio taxonomy. | J. Liang; H. Phan; E. Benetos; |
64 | Differential Beamforming with Null Constraints for Spherical Microphone Arrays Highlight: It presents a novel design approach based on the null constraints formed from the desired directivity pattern. | X. Zhao; X. Luo; G. Huang; J. Chen; J. Benesty; |
65 | A Deep Representation Learning-Based Speech Enhancement Method Using Complex Convolution Recurrent Variational Autoencoder Highlight: Building upon our preliminary framework, this paper introduces a novel approach for SE using deep complex convolutional recurrent networks with a VAE (DCCRN-VAE). | Y. Xiang; J. Tian; X. Hu; X. Xu; Z. Yin; |
66 | Stereophonic Music Source Separation with Spatially-Informed Bridging Band-Split Network Highlight: This paper presents a spatially-informed MSS method using a bridging band-split neural network that incorporates both spatial and spectral information. | Y. Yang; H. Li; X. Wang; W. Zhang; S. Makino; J. Chen; |
67 | Ultra-Low Delay Lossless Compression of Higher Order Ambisonics Highlight: We present a new lossless, ultra-low delay HOA codec, optimized to take advantage of both the spatial and temporal redundancies in HOA data, while maintaining an extremely low delay. | M. Namazi; K. Rose; |
68 | Stack-and-Delay: A New Codebook Pattern for Music Generation Highlight: In this paper we compare different decoding strategies that aim to understand what codes can be decoded in parallel without penalizing the quality too much. | G. Le Lan; |
69 | DDD: A Perceptually Superior Low-Response-Time DNN-Based Declipper Highlight: In this work, we introduce DDD (Demucs-Discriminator-Declipper), a real-time-capable speech-declipping deep neural network (DNN) that requires less response time by design. | J. Yi; J. Koo; K. Lee; |
70 | Remixed2Remixed: Domain Adaptation for Speech Enhancement By Noise2Noise Learning with Remixing Highlight: This paper proposes a domain adaptation method for speech enhancement called Remixed2Remixed. | L. Li; S. Seki; |
71 | Enhancing Violin Fingering Generation Through Audio-Symbolic Fusion Highlight: Current deep-learning-based models, relying solely on symbolic data, are able to generate playable fingerings but struggle to capture the personal nuances of musical performance, which only lie in the audio data. To address this limitation, we introduce a novel model that incorporates both audio and symbolic data, allowing users to upload music scores and their corresponding violinist recordings to obtain personalized fingerings related to the audio data. | W.-Y. Lin; Y.-C. F. Wang; L. Su; |
72 | On HRTF Notch Frequency Prediction Using Anthropometric Features and Neural Networks Highlight: This paper describes the prediction of N1 frequency from pinna anthropometry using a neural model. | L. Arbel; I. Ananthabhotla; Z. Ben-Hur; D. L. Alon; B. Rafaely; |
73 | TF-SepNet: An Efficient 1D Kernel Design in CNNs for Low-Complexity Acoustic Scene Classification Highlight: Inspired by the time-frequency nature of audio signals, we propose TF-SepNet, a CNN architecture that separates the feature processing along the time and frequency dimensions. | Y. Cai; P. Zhang; S. Li; |
74 | Enriching Music Descriptions with A Finetuned-LLM and Metadata for Text-to-Music Retrieval Highlight: However, users also articulate a need to explore music that shares similarities with their favorite tracks or artists, such as "I need a similar track to Superstition by Stevie Wonder." To address these concerns, this paper proposes an improved Text-to-Music Retrieval model, denoted as TTMR++, which utilizes rich text descriptions generated with a finetuned large language model and metadata. | S. Doh; M. Lee; D. Jeong; J. Nam; |
75 | Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model Highlight: This study proposes a multi-task pseudo-label learning (MPL)-based non-intrusive speech quality assessment model called MTQ-Net. | R. E. Zezario; B.-R. Brian Bai; C.-S. Fuh; H.-M. Wang; Y. Tsao; |
76 | ODAQ: Open Dataset of Audio Quality Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Research into the prediction and analysis of perceived audio quality is hampered by the scarcity of openly available datasets of audio signals accompanied by corresponding subjective quality scores. To address this problem, we present the Open Dataset of Audio Quality (ODAQ), a new dataset containing the results of a MUSHRA listening test conducted with expert listeners from 2 international laboratories. |
M. Torcoli; |
77 | Generalized SpecAugment Via Multi-Rectangle Inverse Masking For Acoustic Scene Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the multi-rectangle inverse masking (MRIM), an extension and generalization of the traditional SpecAugment technique, for acoustic scene classification. |
P. M. Byun; J. -H. Chang; |
78 | A Flexible Online Framework for Projection-Based STFT Phase Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to extend RTISI, an existing online (frame-by-frame) variant of the Griffin-Lim algorithm, into a flexible framework that enables straightforward online implementation of any algorithm based on iterative projections. |
T. Peer; S. Welker; J. Kolhoff; T. Gerkmann; |
79 | Non-Intrusive Speech Quality Assessment with Multi-Task Learning Based on Tensor Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the concept of kernel method, which maps features into high dimensional space through dot product, in order to enhance the extraction of relationships among all feature points. |
H. Liu; M. Liu; J. Wang; X. Xie; L. Yang; |
80 | Blind Estimation of Audio Effects Using An Auto-Encoder Approach and Differentiable Digital Signal Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This involves knowing the exact implementation of the AFXs used for the process. In this work, we propose an alternative solution that eliminates the requirement for knowing this implementation. |
C. Peladeau; G. Peeters; |
81 | Intelligent Cardiac Auscultation for Murmur Detection Via Parallel-Attentive Models with Uncertainty Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a heart murmur detection method based on a parallel-attentive model, which consists of two branches: One is based on a self-attention module and the other one is based on a convolutional network. |
Z. Zhang; T. Pang; J. Han; B. W. Schuller; |
82 | Enhancing Audio Generation Diversity with Visual Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a clustering-based method, leveraging visual information to guide the model in generating distinct audio content within each category. |
Z. Xie; B. Li; X. Xu; M. Wu; K. Yu; |
83 | Determined BSS By Combination of IVA and DNN Via Proximal Average Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel approach for determined blind source separation (BSS) assisted by deep neural network (DNN). |
K. Matsumoto; K. Yatabe; |
84 | MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose MOS-FAD, where MOS can be leveraged at two key points in FAD: training data selection and model fusion. |
W. Zhou; |
85 | Cross-Triggering Issue in Audio Event Detection and Mitigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, little if any attention has been paid to this problem in the AED research community. In this work, we tackle this problem via a regularization approach. |
H. Phan; |
86 | Efficient Functional Link Adaptive Filters Based On Nearest Kronecker Product Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel class of FLAFs based on the nearest Kronecker product (NKP) decomposition. |
A. Nezamdoust; M. Huemer; A. Uncini; D. Comminiello; |
87 | Single-Channel Blind Dereverberation Based on Rank-1 Matrix Lifting in Time-Frequency Domain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, for better single-channel dereverberation, we propose to simultaneously estimate the source signal and the room impulse response (RIR) instead of only predicting reverberation. |
F. Yohena; K. Yatabe; |
88 | Active Learning for Sound Event Classification Using Bayesian Neural Networks with Gaussian Variational Posterior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose the Gaussian-dense active learning (GDAL) algorithm to train a sound event classifier. |
S. Shishkin; D. Hollosi; S. Goetze; S. Doclo; |
89 | Snore Sound Features Based on Percussive Enhancing and Positional Encoding Combined with Multi-Task Learning for OSAHS Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose novel sound features for the classification of OSA, hypopnea and normal snores. |
A. Hu; |
90 | On The Role of Room Acoustics in Audio Presentation Attack Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a method based on convolutional neural networks (CNNs) to estimate the spectral standard deviation directly from a speech signal, leading to a zero-shot PAD approach. |
N. D. Gaubitch; D. Looney; |
91 | Fine-Tune The Pretrained ATST Model for Sound Event Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study the fine-tuning method of the pretrained models for SED. |
N. Shao; X. Li; X. Li; |
92 | Class-Incremental Learning for Multi-Label Audio Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a method for class-incremental learning of potentially overlapping sounds for solving a sequence of multi-label audio classification tasks. |
M. Mulimani; A. Mesaros; |
93 | Estimation of Impulse Responses for A Moving Source Using Optimal Transport Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to leverage the information shared between the closely spaced source positions by means of an optimal transport regularizer when estimating IRs from noisy input-output relations. |
D. Sundström; F. Elvander; A. Jakobsson; |
94 | SSL-Net: A Synergistic Spectral and Learning-Based Network for Efficient Bird Sound Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we present an efficient and general framework called SSL-Net, which combines spectral and learned features to identify different bird sounds. |
Y. Yang; K. Zhou; N. Trigoni; A. Markham; |
95 | One-Epoch Training with Single Test Sample in Test Time for Better Generalization of Cough-Based Covid-19 Detection Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, in practical application, the model has to make a prediction for the current test audio without any prior information, which requires good model generalization under limited training data. To address the above issues, we adopt a test-time training framework to achieve a cough-based COVID-19 detection model with better generalizability. |
J. Shen; |
96 | SyncFusion: Multimodal Onset-Synchronized Video-to-Audio Foley Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a system to extract repetitive action onsets from a video, which are then used – in conjunction with audio or textual embeddings – to condition a diffusion model trained to generate a new synchronized sound effects audio track. |
M. Comunità; R. F. Gramaccioni; E. Postolache; E. Rodolà; D. Comminiello; J. D. Reiss; |
97 | Multi-View MidiVAE: Fusing Track- and Bar-View Representations for Long Multi-Track Symbolic Music Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose Multi-view MidiVAE, as one of the pioneers in VAE methods that effectively model and generate long multi-track symbolic music. |
Z. Lin; |
98 | A Fully Differentiable Model for Unsupervised Singing Voice Separation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to extend this framework and to build a fully differentiable model by integrating a multipitch estimator and a novel differentiable assignment module within the core model. |
G. Richard; P. Chouteau; B. Torres; |
99 | Structure-Informed Positional Encoding for Music Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, multi-scale hierarchical structure is a distinctive feature of music signals. To leverage this information, we propose a structure-informed positional encoding framework for music generation with Transformers. |
M. Agarwal; C. Wang; G. Richard; |
100 | Adapting Pitch-Based Self Supervised Learning Models for Tempo Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the applicability of two successful pitch-based SSL models, SPICE and PESTO, for the purpose of tempo estimation. |
A. Gagneré; S. Essid; G. Peeters; |
101 | Consistent and Relevant: Rethink The Query Embedding in General Sound Separation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present CaRE-SEP, a consistent and relevant embedding network for general sound separation to encourage a comprehensive reconsideration of query usage in audio separation. |
Y. Wang; |
102 | An Efficient Temporary Deepfake Location Approach Based Embeddings for Partially Spoofed Audio Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Partially spoofed audio detection is a challenging task, lying in the need to accurately locate the authenticity of audio at the frame level. To address this issue, we propose a fine-grained partially spoofed audio detection method, namely Temporal Deepfake Location (TDL), which can effectively capture information of both features and locations. |
Y. Xie; H. Cheng; Y. Wang; L. Ye; |
103 | GTCRN: A Speech Enhancement Model Requiring Ultralow Computational Resources Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Grouped Temporal Convolutional Recurrent Network (GTCRN), which incorporates grouped strategies to efficiently simplify a competitive model, DPCRN. |
X. Rong; T. Sun; X. Zhang; Y. Hu; C. Zhu; J. Lu; |
104 | On The Choice of The Optimal Temporal Support for Audio Classification with Pre-Trained Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the influence of the TS for well-established or emerging pre-trained embeddings, chosen to represent different types of architectures and learning paradigms. |
A. Quelennec; M. Olvera; G. Peeters; S. Essid; |
105 | Cognitive Virtual Sensing Technique for Feedforward Active Noise Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, instances where noise characteristics and primary paths exhibit variations lead to a noticeable decline in performance for the conventional VS technique. To address this challenge, we propose the cognitive VS technique in this paper. |
R. Xie; A. Tu; C. Shi; S. Elliott; H. Li; L. Zhang; |
106 | SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce SpecDiff-GAN, a neural vocoder based on HiFi-GAN, which was initially devised for speech synthesis from mel spectrogram. |
T. Baoueb; H. Liu; M. Fontaine; J. Le Roux; G. Richard; |
107 | Personalized Neural Speech Codec Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a personalized neural speech codec, envisioning that personalization can reduce the model complexity or improve perceptual speech quality. |
I. Jang; H. Yang; W. Lim; S. Beack; M. Kim; |
108 | A Unified Loss Function to Tackle Inter-Class and Intra-Class Data Imbalance in Sound Event Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a unified loss function (ULF), which adeptly addresses both the inter-class imbalance and intra-class imbalance simultaneously. |
Y. Zhang; R. Togneri; D. Huang; |
109 | An Empirical Study on The Impact of Positional Encoding in Transformer-Based Monaural Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we perform a comprehensive empirical study evaluating five positional encoding methods, i.e., Sinusoidal and learned absolute position embedding (APE), T5-RPE, KERPLE, as well as the Transformer without positional encoding (No-Pos), across both causal and noncausal configurations. |
Q. Zhang; |
110 | Speech Enhancement in Hearing Aids Using Target Speech Presence Estimation Based on A Delayed Remote Microphone Signal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, we propose a practically operational method to use an RM with HAs in the presence of wireless transmission delays. |
V. Sathyapriyan; M. S. Pedersen; M. Brookes; J. Østergaard; P. A. Naylor; J. Jensen; |
111 | NOMAD: Unsupervised Learning of Perceptual Embeddings For Speech Enhancement and Non-Matching Reference Audio Quality Assessment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents NOMAD (Non-Matching Audio Distance), a differentiable perceptual similarity metric that measures the distance of a degraded signal against non-matching references. |
A. Ragano; J. Skoglund; A. Hines; |
112 | NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the neural infinite impulse response filter field (NIIRF) method that instead estimates the coefficients of cascaded IIR filters. |
Y. Masuyama; |
113 | Contrastive Loss Based Frame-Wise Feature Disentanglement for Polyphonic Sound Event Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To solve the problem, we propose a disentangled feature learning framework to learn a category-specific representation. |
Y. Guan; |
114 | Unrestricted Global Phase Bias-Aware Single-Channel Speech Enhancement with Conformer-Based Metric GAN Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we found that the human ear cannot sensitively perceive the difference between a precise phase spectrum and a biased phase (BP) spectrum. Therefore, we propose an optimization method of phase reconstruction, allowing freedom on the global-phase bias instead of reconstructing the precise phase spectrum. |
S. Zhang; Z. Qiu; D. Takeuchi; N. Harada; S. Makino; |
115 | Low Bitrate Loss Resilience Scheme for A Speech Enhancing Neural Codec Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we use a neural speech codec designed end-to-end, encompassing a versatile set of features ranging from efficient low-bitrate speech coding and decoding to advanced functionalities such as noise removal, dereverberation, and packet loss concealment. |
M. Kolundžija; M. Kavalekalam; I. Balić; M. Mao; R. Casas; |
116 | Unsupervised Pitch-Timbre Disentanglement of Musical Instruments Using A Jacobian Disentangled Sequential Autoencoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A main challenge in unsupervised disentanglement using autoencoders is that strong regularisation, while necessary for consistent disentanglement, comes at the expense of accurate data reconstruction. To address this, we introduce a teacher-student framework that incorporates a variational sequential autoencoder and a Jacobian constraint that regularises the variation of observations relative to latent factors. |
Y. -J. Luo; S. Ewert; S. Dixon; |
117 | Three-Dimensional Sound Wave Propagation Reproduction By CE-FDTD Simulation Applying Actual Radiation Characteristics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, the technique of applying captured acoustic signals for dense grid arrangement in the CE-FDTD method has not been considered. In this paper, we propose a hardware and software system that captures the radiation characteristics for a dense grid arrangement and applies them in the CE-FDTD method while controlling the sound wave propagation with non-propagation region. |
S. Okubo; T. Horiuchi; |
118 | A Steered Response Power Approach with Bilinear Prediction-Based Trade-Off Prewhitening for Speaker Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It presents an improved steered response power (SRP) approach with low-complexity and trade-off prewhitening. |
Z. Wang; H. He; J. Chen; J. Benesty; Y. Yu; |
119 | High Resolution Guitar Transcription Via Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the use of a high-resolution piano transcription model to train a new guitar transcription model. |
X. Riley; D. Edwards; S. Dixon; |
120 | Effect of Target Signals and Delays on Spatially Selective Active Noise Control for Open-Fitting Hearables Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we systematically investigate the influence of delays in different target signals on the ANC performance and provide an intuitive explanation for how the system obtains the desired signal. |
T. Xiao; S. Doclo; |
121 | Max-AST: Combining Convolution, Local and Global Self-Attentions for Audio Event Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our approach leverages convolution, local window-attention, and global grid-attention in all the transformer blocks. |
T. Alex; S. Ahmed; A. Mustafa; M. Awais; P. J. Jackson; |
122 | TIA: A Teaching Intonation Assessment Dataset in Real Teaching Situations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In order to test the validity of the dataset, this paper proposes a teaching intonation assessment model (TIAM) based on low-level and deep-level features of speech. |
S. Liu; C. Zhang; B. Li; N. Qin; H. Cheng; H. Zhang; |
123 | A Scalable Sparse Transformer Model for Singing Melody Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple yet effective scalable sparse transformer for singing melody extraction. |
S. Yu; J. Liu; Y. Yu; W. Li; |
124 | AudioSR: Versatile Audio Super-Resolution at Scale Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a diffusion-based generative model, AudioSR, that is capable of performing robust audio super-resolution on versatile audio types, including sound effects, music, and speech. |
H. Liu; K. Chen; Q. Tian; W. Wang; M. D. Plumbley; |
125 | Investigating Personalization Methods in Text to Music Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate the personalization of text-to-music diffusion models in a few-shot setting. |
M. Plitsis; T. Kouzelis; G. Paraskevopoulos; V. Katsouros; Y. Panagakis; |
126 | Learning Ontology Informed Representations with Constraints for Acoustic Event Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most open datasets contain meta information on the hierarchy of labels, which can be utilized for building robust AED systems. Our study aims at injecting this domain knowledge by enforcing ontology-informed constraints upon the output space. |
A. Raina; S. I. Sheikh; V. Arora; |
127 | A Detailed Audio-Text Data Simulation Pipeline Using Single-Event Sounds Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on the analysis, we propose an automatic pipeline for curating audio-text pairs with rich details. |
X. Xu; X. Xu; Z. Xie; P. Zhang; M. Wu; K. Yu; |
128 | Performance and Energy Balance: A Comprehensive Study of State-of-the-Art Sound Event Detection Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an analysis focused on SED systems based on the challenge submissions. |
F. Ronchini; R. Serizel; |
129 | Microphone Subset Selection for The Weighted Prediction Error Algorithm Using A Group Sparsity Penalty Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using the popular convex relaxation method, in this paper we propose to perform microphone subset selection for the weighted prediction error (WPE) multi-channel dereverberation algorithm by introducing a group sparsity penalty on the prediction filter coefficients. |
A. Lohmann; T. van Waterschoot; J. Bitzer; S. Doclo; |
130 | HRTF Recommendation Based on The Predicted Binaural Colouration Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper is concerned with the task of recommending one HRTF set out of an entire database based on the listener’s own anthropometric features alone. |
N. Marggraf-Turley; M. Lovedee-Turner; E. De Sena; |
131 | ByteHum: Fast and Accurate Query-by-Humming in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the ByteHum system, a fast and efficient humming retrieval system which is capable of searching against large-scale databases built on raw song audios without the need for extensive preprocessing. |
X. Du; P. Zou; M. Liu; X. Liang; M. Chu; B. Zhu; |
132 | STEMGEN: A Music Generation Model That Listens Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present an alternative paradigm for producing music generation models that can listen and respond to musical context. |
J. D. Parker; |
133 | Perceptually-Motivated Spatial Audio Codec for Higher-Order Ambisonics Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The number of channels required to deliver high spatial resolution Ambisonic audio, however, can be prohibitive for low-bandwidth applications. Therefore, this paper proposes a compression codec, which is based upon the parametric higher-order Directional Audio Coding (HO-DirAC) model. |
C. Hold; L. McCormack; A. Politis; V. Pulkki; |
134 | Joint Music and Language Attention Models for Zero-Shot Music Tagging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a zero-shot music tagging system modeled by a joint music and language attention (JMLA) model to address the open-set music tagging problem. |
X. Du; Z. Yu; J. Lin; B. Zhu; Q. Kong; |
135 | SPATIALCODEC: Neural Spatial Speech Coding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we address the challenge of encoding speech captured by a microphone array using deep learning techniques with the aim of preserving and accurately reconstructing crucial spatial cues embedded in multi-channel recordings. |
Z. Xu; Y. Xu; V. Kothapally; H. Wang; M. Yang; D. Yu; |
136 | AutoPrep: An Automatic Preprocessing Framework for In-The-Wild Speech Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the other hand, human annotation of speech data is both time-consuming and costly. To address this issue, we introduce an automatic in-the-wild speech data preprocessing framework (AutoPrep) in this paper, which is designed to enhance speech quality, generate speaker labels, and produce transcriptions automatically. |
J. Yu; |
137 | An Experimental Comparison of Multi-View Self-Supervised Methods for Music Tagging Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we expand the scope of pretext tasks applied to music by investigating and comparing the performance of new self-supervised methods for music tagging. |
G. Meseguer-Brocal; D. Desblancs; R. Hennequin; |
138 | Ainur: Harmonizing Speed and Quality in Deep Music Generation Through Lyrics-Audio Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the domain of music generation, prevailing methods focus on text-to-music tasks, predominantly relying on diffusion models. |
G. Concialdi; A. Koudounas; E. Pastor; B. Di Eugenio; E. Baralis; |
139 | Parody Detection Using Source-Target Attention with Teacher-Forced Lyrics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an approach to detect parodies in singing voices, analyzing attention weights derived from an encoder-decoder-based automatic speech recognition (ASR) model. |
T. Ariga; Y. Higuchi; K. Hayasaka; N. Okamoto; T. Ogawa; |
140 | Generation or Replication: Auscultating Audio Latent Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we make an initial attempt at understanding the inner workings of audio latent diffusion models by investigating how their audio outputs compare with the training data, similar to how a doctor auscultates a patient by listening to the sounds of their organs. |
D. Bralios; |
141 | RECAP: Retrieval-Augmented Audio Captioning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present RECAP (REtrieval-Augmented Audio CAPtioning), a novel and effective audio captioning system that generates captions conditioned on an input audio and other captions similar to the audio retrieved from a datastore. |
S. Ghosh; S. Kumar; C. K. Reddy Evuru; R. Duraiswami; D. Manocha; |
142 | Bass Accompaniment Generation Via Latent Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To provide control over the timbre of generated samples, we introduce a technique to ground the latent space to a user-provided reference style during diffusion sampling. |
M. Pasini; M. Grachten; S. Lattner; |
143 | Learning Speaker-Listener Mutual Head Orientation By Leveraging HRTF and Voice Directivity on Headphones Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a convolutional neural network model that, given a binaural speech recording, can predict the orientation of both speaker and listener with respect to the line joining the two. |
H. Takawale; N. Roy; |
144 | Unsupervised Harmonic Parameter Estimation Using Differentiable DSP and Spectral Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, jointly training pitch estimators and synthesizers is a challenge when using standard audio-to-audio reconstruction loss, leading to reliance on external pitch trackers. To address this issue, we propose using a spectral loss function inspired by optimal transportation theory that minimizes the displacement of spectral energy. |
B. Torres; G. Peeters; G. Richard; |
145 | Parameter Efficient Audio Captioning with Faithful Guidance Using Audio-Text Shared Latent Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite performance improvements, such models frequently suffer from hallucination and a large memory footprint, making them challenging to deploy on edge devices. In this paper, we address both these issues for the application of automated audio captioning. |
A. K. Sridhar; Y. Guo; E. Visser; R. Mahfuz; |
146 | Fine-Grained Engine Fault Sound Event Detection Using Multimodal Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we apply SED to engine fault detection by introducing a multimodal SED framework that detects fine-grained engine faults of automobile engines using audio and accelerometer-recorded vibration. |
D. Fedorishin; L. Forte; P. Schneider; S. Setlur; V. Govindaraju; |
147 | Hyperbolic Distance-Based Speech Separation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the task of hierarchical distance-based speech separation defined on a hyperbolic manifold. |
D. Petermann; M. Kim; |
148 | DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study introduces DPM-TSE, a generative method based on diffusion probabilistic modeling (DPM) for Target Sound Extraction (TSE), to achieve both cleaner target renderings as well as improved separability from unwanted sounds. |
J. Hai; H. Wang; D. Yang; K. Thakkar; N. Dehak; M. Elhilali; |
149 | Binaural Angular Separation Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a neural network model that can separate target speech sources from interfering sources at different angular regions using two microphones. |
Y. Yang; G. Sung; S. -F. Shih; H. Erdogan; C. Lee; M. Grundmann; |
150 | MusicLDM: Enhancing Novelty in Text-to-music Generation Using Beat-Synchronous Mixup Strategies Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation. |
K. Chen; Y. Wu; H. Liu; M. Nezhurina; T. Berg-Kirkpatrick; S. Dubnov; |
151 | Exploring Meta Information for Audio-Based Zero-Shot Bird Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study investigates how meta-information can improve zero-shot audio classification, utilising bird species as an example case study due to the availability of rich and diverse meta-data. |
A. Gebhard; A. Triantafyllopoulos; T. Bez; L. Christ; A. Kathan; B. W. Schuller; |
152 | Scalable and Efficient Speech Enhancement Using Modified Cold Diffusion: A Residual Learning Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce flexibility to the supervised learning-based speech enhancement framework to achieve scalable and efficient speech enhancement (SESE). |
M. Kim; T. Kristjansson; |
153 | Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present SpatialScaper, a library for SELD data simulation and augmentation. |
I. R. Roman; C. Ick; S. Ding; A. S. Roman; B. McFee; J. P. Bello; |
154 | A Foundation Model for Music Informatics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we conduct an in-depth comparative study among various foundation model variants, examining key determinants such as model architectures, tokenization methods, temporal resolution, data, and model scalability. This research aims to bridge the existing knowledge gap by elucidating how these individual factors contribute to the success of foundation models in music informatics. |
M. Won; Y. -N. Hung; D. Le; |
155 | From RIR to BRIR: A Sparse Recovery Beamforming Approach for Virtual Binaural Sound Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a Binaural Room Impulse Response (BRIR) calculation method based on HOM recordings of Room Impulse Responses (RIRs) to improve binaural sound rendering. |
H. Sun; H. Y. Zhu; M. T. D. Nguyen; V. Nguyen; C. -T. Lin; C. T. Jin; |
156 | Active Noise Control Over 3D Space with A Dynamic Noise Source Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this design, which utilizes the remote microphone technique using an observation filter (OF) for noise field modelling, has a drawback that the system is not robust against variations in the noise source location. In this paper, we address this drawback and present an improved spatial ANC system that overcomes this limitation. |
H. Sun; C. T. Jin; T. Abhayapala; P. Samarasinghe; |
157 | Investigating Self-Supervised Deep Representations for EEG-Based Auditory Attention Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we undertake a comprehensive investigation into the performance of linear decoders across 12 deep and 2 shallow representations, applied to EEG data from multiple studies spanning 57 subjects and multiple languages. |
K. Thakkar; J. Hai; M. Elhilali; |
158 | Quantization Noise Masking in Perceptual Neural Audio Coder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel training strategy to incorporate the PAM into the NAC more accurately. |
S. Shin; J. Byun; J. Sung; S. Beack; Y. Park; |
159 | Generative De-Quantization for Neural Speech Codec Via Latent Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to separate the representation learning and information reconstruction tasks. |
H. Yang; I. Jang; M. Kim; |
160 | Piano Transcription with Harmonic Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose piano harmonic attention, a mask self-attention, for better capturing harmonic features. |
R. Wu; X. Wang; Y. Li; W. Xu; W. Cheng; |
161 | Dual-Path Minimum-Phase and All-Pass Decomposition Network for Single Channel Speech Dereverberation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a dual-path neural network structure to separately process minimum-phase and all-pass components of single channel speech. |
X. Liu; S. -J. Chen; J. H. L. Hansen; |
162 | A Dual-Path Framework with Frequency-and-Time Excited Network for Anomalous Sound Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an automated dual-path framework that learns prominent frequency and temporal patterns for diverse machine types. |
Y. Zhang; J. Liu; Y. Tian; H. Liu; M. Li; |
163 | First-Shot Unsupervised Anomalous Sound Detection with Unknown Anomalies Estimated By Metadata-Assisted Audio Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new framework for the first-shot unsupervised ASD, where metadata-assisted audio generation is used to estimate unknown anomalies, by utilising the available machine information (i.e., metadata and sound data) to fine-tune a text-to-audio generation model for generating the anomalous sounds that contain unique acoustic characteristics accounting for each different machine type. |
H. Zhang; |
164 | SCNet: Sparse Compression Network for Music Source Separation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose SCNet, a novel frequency-domain network to explicitly split the spectrogram of the mixture into several subbands and introduce a sparsity-based encoder to model different frequency bands. |
W. Tong; |
165 | Exploring Self-supervised Contrastive Learning of Spatial Sound Event Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we present a simple multi-channel framework for contrastive learning (MC-SimCLR) to encode ‘what’ and ‘where’ of spatial audios. |
X. Jiang; C. Han; Y. A. Li; N. Mesgarani; |
166 | SynthTab: Leveraging Synthesized Data for Guitar Tablature Transcription Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing GTT datasets are quite limited in size and scope, rendering models trained on them prone to overfitting and incapable of generalizing to out-of-domain data. In order to address this issue, we present a methodology for synthesizing large-scale GTT audio using commercial acoustic and electric guitar plugins. |
Y. Zang; Y. Zhong; F. Cwitkowitz; Z. Duan; |
167 | Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Timbre-Trap, a novel framework which unifies music transcription and audio reconstruction by exploiting the strong separability between pitch and timbre. |
F. Cwitkowitz; |
168 | A Comparative Analysis of Poetry Reading Audio: Singing, Narrating, or Somewhere in Between? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop signal processing methods, which are tailored to capture the unique acoustic characteristics of poetry reading based on their silence patterns, temporal variations of local pitch, and beat stability. |
K. Choi; M. Kim; |
169 | A Hybrid Deep-Online Learning Based Method for Active Noise Control in Wave Domain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a hybrid deep-online learning based spatial ANC system which combines online learning with pre-trained deep neural networks. |
D. Wu; X. Wu; T. Qu; |
170 | Towards High Resolution Weather Monitoring With Sound Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper harnesses acoustic data to identify variations in rain, wind and air temperature at different thresholds, with rain being the most successfully predicted. |
E. B. Çoban; M. Perra; M. I. Mandel; |
171 | Stealthy Backdoor Attack Towards Federated Automatic Speaker Verification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Federated Stealthy Backdoor Attack method (FedSBA). |
L. Zhang; L. Liu; D. Meng; J. Wang; S. Hu; |
172 | Transferable Models for Bioacoustics with Human Language Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose BioLingual, a new model for bioacoustics based on contrastive language-audio pretraining. |
D. Robinson; A. Robinson; L. Akrapongpisak; |
173 | Robust DoA Estimation from Deep Acoustic Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our study highlights the relevance of acoustic imaging for DoAE tasks. |
A. S. Roman; I. R. Roman; J. P. Bello; |
174 | Exploring Large Scale Pre-Trained Models for Robust Machine Anomalous Sound Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Machine anomalous sound detection is a useful technique for various applications, but it often suffers from poor generalization due to the challenges of data collection and complex acoustic environment. To address this issue, we propose a robust machine anomalous sound detection model that leverages self-supervised pre-trained models on large-scale speech data. |
B. Han; |
175 | Adapting Frechet Audio Distance for Generative Music Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose reducing sample size bias by extrapolating scores towards an infinite sample size. |
A. Gui; H. Gamper; S. Braun; D. Emmanouilidou; |
176 | Sparse Sound Field Representation Using Complex Orthogonal Matching Pursuit Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the possible sparse representations of the sound field recorded by multiple microphones in reverberant environments. |
S. Xu; J. A. Zhang; T. D. Abhayapala; A. Bastine; W. -T. Lai; P. N. Samarasinghe; |
177 | Attention Is All You Need For Blind Room Volume Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With the recent trend of self-attention mechanisms, this paper introduces a purely attention-based model to blindly estimate room volumes based on single-channel noisy speech signals. |
C. Wang; M. Jia; M. Li; C. Bao; W. Jin; |
178 | Plug-and-Play MVDR Beamforming for Speech Separation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the recent advance in integrating physics-based and data-driven approaches, this paper introduces a novel speech separation framework. |
C. Chang; Z. Yang; J. Chen; |
179 | Broadband Personal Sound Zone Control in The Presence of Nonlinearities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new signal model for personal sound zone control that takes into consideration any nonlinear behaviour that may arise from the loudspeakers. |
S. S. Bhattacharjee; |
180 | Tempo Estimation As Fully Self-Supervised Binary Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given that annotating tempo is time-consuming and requires certain musical expertise, few publicly available data sources exist to train machine learning models for this task. Towards alleviating this issue, we propose a fully self-supervised approach that does not rely on any human labeled data. |
F. Henkel; J. Kim; M. C. McCallum; S. E. Sandberg; M. E. P. Davies; |
181 | AAT: Adapting Audio Transformer for Various Acoustics Recognition Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Other fine-tuning methods either struggle to address this issue or fail to achieve matching performance. Therefore, we conducted a comprehensive analysis of existing fine-tuning methods and proposed an efficient fine-tuning approach based on Adapter tuning, namely AAT. |
Y. Liang; H. Lin; S. Qiu; Y. Zhang; |
182 | MIR-MLPop: A Multilingual Pop Music Dataset with Time-Aligned Lyrics and Audio Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MIR-MLPop, a publicly available multilingual pop music dataset designed for automatic lyrics transcription and lyrics alignment in polyphonic music. |
J. -Y. Wang; C. -C. Wang; C. -I. Leong; J. -S. R. Jang; |
183 | A Real-Time Lyrics Alignment System Using Chroma and Phonetic Features for Classical Vocal Performance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a real-time lyrics alignment system for classical vocal performances with two contributions. First, we improve the lyrics alignment algorithm by finding an optimal combination of chromagram and phonetic posteriorgram (PPG) that capture melodic and phonetics features of the singing voice, respectively. |
J. Park; S. Yong; T. Kwon; J. Nam; |
184 | Improving Speech Attenuation in Headphones Using Harmonic Model Decomposition and Multiple-Frequency ANC Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The non-causality is due to the delay incurred by, e.g., digital processing or acoustic propagation paths. To deal with this, we propose a new feedforward ANC system for headphone applications, HMD-ANC, which improves voiced speech attenuation. |
Y. Iotov; S. M. Nørholm; P. J. McCutcheon; M. G. Christensen; |
185 | Noise-Aware Speech Separation with Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a noise-aware SS (NASS) method, which aims to improve the speech quality for separated signals under noisy conditions. |
Z. Zhang; C. Chen; H. -H. Chen; X. Liu; Y. Hu; E. S. Chng; |
186 | Efficient High-Performance Bark-Scale Neural Network for Residual Echo and Noise Suppression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For the application in dedicated communication devices, such as speakerphones, hands-free car systems, or smartphones, efficiency plays a major role along with performance. In this context, we present an efficient, high-performance hybrid joint acoustic echo control and noise suppression system, whereby our main contribution is the post-filter NN, performing both noise and residual echo suppression. |
E. Seidel; P. Mowlaee; T. Fingscheidt; |
187 | Few-Shot Anomalous Sound Detection Based on Anomaly Map Estimation Using Pseudo Abnormal Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel anomalous sound detection method based on anomaly map estimation. |
R. Tanaka; S. Tamura; |
188 | Improving Target Sound Extraction with Timestamp Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a timestamp knowledge distillation (TKD) method that adopts privileged knowledge distillation to enhance the performance of deep neural network (DNN)-based target sound extraction (TSE). |
D. Kim; M. -S. Baek; Y. Kim; J. -H. Chang; |
189 | Class: Continual Learning Approach for Speech Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although SSL-based deep learning models have been shown to produce better representations than their supervised counterparts when trained naively, their effectiveness diminishes when the model learns different tasks sequentially. To address this problem, we propose a continual learning framework called CLASS, which incorporates continual learning (CL) and self-supervised pretraining (SSP) to improve BWE performance. |
D. Kim; Y. Kim; J. -H. Chang; |
190 | Multi-Scale Permutation Entropy for Audio Deepfake Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we employ the multi-scale permutation entropy (MPE) in audio deepfake detection, which could help measure the complexity and detect the dynamic characteristics of audio signals at different scales. |
C. Wang; J. He; J. Yi; J. Tao; C. Y. Zhang; X. Zhang; |
191 | 6DoF SELD: Sound Event Localization and Detection Using Microphones and Motion Tracking Sensors on Self-Motioning Human Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We aim to perform sound event localization and detection (SELD) using wearable equipment for a moving human, such as a pedestrian. |
M. Yasuda; S. Saito; A. Nakayama; N. Harada; |
192 | From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to optimize AST training by linking to the resolution in the time-axis. |
J. Feng; M. H. Erol; J. Son Chung; A. Senocak; |
193 | Speech Foundation Models on Intelligibility Prediction for Hearing-Impaired Listeners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the SFM paradigm has been significantly less explored for applications of interest to the speech perception community. In this paper we present a systematic evaluation of 10 SFMs on one such application: Speech intelligibility prediction. |
S. Cuervo; R. Marxer; |
194 | Microphone Conversion: Mitigating Device Variability in Sound Event Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we introduce a new augmentation technique to enhance the resilience of sound event classification (SEC) systems against device variability through the use of CycleGAN. |
M. Ryu; H. Oh; S. Lee; H. Park; |
195 | Stethoscope-Guided Supervised Contrastive Learning for Cross-Domain Adaptation on Respiratory Sound Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: When a significant distribution shift occurs within the test dataset or in a practical scenario, it can substantially decrease the performance. To tackle this issue, we introduce cross-domain adaptation techniques, which transfer the knowledge from a source domain to a distinct target domain. |
J. -W. Kim; S. Bae; W. -Y. Cho; B. Lee; H. -Y. Jung; |
196 | Regularized Contrastive Pre-Training for Few-Shot Bioacoustic Sound Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In our study, we introduce a regularization to supervised contrastive loss, to learn non redundant features that exhibit effective transferability to few-shot tasks involving the detection of animal sounds not encountered during the training phase. |
I. Moummad; N. Farrugia; R. Serizel; |
197 | Crowdsourced Multilingual Speech Intelligibility Testing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Standards and recommendations are yet to be defined, and publicly available multilingual test materials are lacking. In response to this challenge, we propose an approach for a crowdsourced intelligibility assessment. |
L. Lechler; K. Wojcicki; |
198 | Audio Prompt Tuning for Universal Sound Separation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose audio prompt tuning (APT), a simple yet effective approach to enhance existing USS systems. |
Y. Liu; |
199 | Selecting N-Lowest Scores for Training MOS Prediction Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the hypothesis, we propose the more reliable representative value Nlow-MOS, the mean of the N-lowest opinion scores. |
Y. Kondo; H. Kameoka; K. Tanaka; T. Kaneko; |
200 | Audio Difference Learning for Audio Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study introduces a novel training paradigm, audio difference learning, for improving audio captioning. |
T. Komatsu; Y. Fujita; K. Takeda; T. Toda; |
201 | Small-Footprint Automatic Speech Recognition System Using Two-Stage Transfer Learning Based Symmetrized Ternary Weight Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional automatic speech recognition (ASR) models face challenges when deployed on edge devices due to their high computational requirements and storage demands. To address this issue, we present a novel ASR system specifically designed for edge applications, encompassing both keyword spotting (KWS) and speaker verification (SV) functionalities with on chip learning for speaker registration. |
X. Zhang; H. Kou; C. Xia; H. Cai; B. Liu; |
202 | Tracking Beyond The Unambiguous Range with Modulo Single-Photon Lidar Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, we propose an interpolation and denoising method that operates directly over the modulo samples. |
S. Fernández-Menduiña; J. Rapp; H. Mansour; M. Greiff; K. Parsons; |
203 | Modulo Sampling and Recovery in Shift-Invariant Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The recovery of high dynamic range shift-invariant (SI) signals presents a significant challenge in the realm of analog-to-digital converters (ADCs). A critical aspect in this … |
Y. Kvich; Y. C. Eldar; |
204 | Text2Avatar: Text to 3d Human Avatar Generation with Codebook-Driven Body Controllable Attribute Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, achieving multi-attribute controllable and realistic 3D human avatar generation is still challenging due to feature coupling and the scarcity of realistic 3D human avatar datasets. To address these issues, we propose Text2Avatar, which can generate realistic-style 3D avatars based on the coupled text prompts. |
C. Gong; |
205 | The Joint Grid-Free DOA and Polarization Estimation Algorithm Based on Atomic Norm Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the issue of estimation accuracy degradation caused by off-grid in compressed sensing-based direction of arrival (DOA) estimation algorithms for polarized sensitive arrays, this paper proposes a joint estimation algorithm for two-dimensional DOA and polarization parameters based on the atomic norm minimization (ANM) theory for orthogonal dipole array. |
T. Chen; M. Li; Z. Liu; |
206 | A Learning-Based System for Automatic Intentional Non-Adherence Detection from Dosing Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel learning-based system combining vision and metadata for detecting potentially deceptive dosing videos. |
S. Feng; X. Lu; D. K. Desai; L. Guan; |
207 | MaDE: Multi-Scale Decision Enhancement for Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, this paper introduces a novel methodology, termed Multi-scale Decision Enhancement (MaDE), anchored by a dual-wise bisimulation framework for pre-training agent encoders. |
J. Ruan; R. Xie; X. Xiong; S. Xu; B. Xu; |
208 | Encoder-Minimal and Decoder-Minimal Framework for Remote Sensing Image Dehazing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose RSHazeNet, an encoder-minimal and decoder-minimal framework for efficient remote sensing image dehazing. |
Y. Wen; T. Gao; Z. Li; J. Zhang; T. Chen; |
209 | An Error Self-Corrected DOA Estimation Model for Sparse Array Based on ANM Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a model based on atomic norm minimization for sparse array, which corrects the amplitude-phase errors and estimates direction-of-arrival parameters. |
T. Chen; Q. An; M. Li; |
210 | UAV Operation Time Minimization for Wireless-Powered Data Collection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose jointly optimizing the UAV’s trajectory and time allocation per GN to reduce operation time. |
Y. Zhang; D. Mishra; H. H. Gharakheili; D. Wing Kwan Ng; |
211 | Dicetrack: Lightweight Dice Classification on Resource-Constrained Platforms with Optimized Deep Learning Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We focus on optimizing MobileNet for seamless ESP32 deployment and propose two novel ultra-lightweight models, Separable Convolutional Layers with Quantization Network (SCLQNet) and Binarized Neural Network (BNNet), for dice classification. |
C. El Zeinaty; G. Herrou; W. Hamidouche; D. Menard; |
212 | MMCOUNT: Stationary Crowd Counting System Based on Commodity Millimeter-Wave Radar Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose that people’s respiration and natural fidgeting (restless behavior) carry valuable information, which could be captured by millimeter (mmWave) radar. |
K. Hu; H. Liao; M. Li; F. Wang; |
213 | Crowd Modeling and Control Via Cooperative Adaptive Filtering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a crowd modeling and motion control approach that employs diffusion adaptation within an adaptive network. |
Z. Wan; S. Sanei; |
214 | Deep Learning AMR Model Inference Acceleration with CFU for Edge Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This research investigates a promising approach to enhance the processing capabilities of edge devices by using the RISC-V Custom Function Unit (CFU) extension to offload computation to dedicated accelerator hardware. |
P. Hilei; M. Petruk; I. Korotkyi; O. Farenyuk; |
215 | Real-Time Stereo Speech Enhancement with Spatial-Cue Preservation Based on Dual-Path Structure Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a real-time, multichannel speech enhancement algorithm which maintains the spatial cues of stereo recordings including two speech sources. |
M. Togami; J. -M. Valin; K. Helwani; R. Giri; U. Isik; M. M. Goodwin; |
216 | SERC-GCN: Speech Emotion Recognition In Conversation Using Graph Convolutional Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Meanwhile, recent text-based emotion recognition methods have effectively shifted towards emotion recognition in conversation (ERC) that considers conversational context. Motivated by this shift, here we propose SERC-GCN, a method for speech emotion recognition in conversation (SERC) that predicts a speaker’s emotional state by incorporating conversational context, speaker interactions, and temporal dependencies between utterances. |
D. Chandola; E. Altarawneh; M. Jenkin; M. Papagelis; |
217 | Sensing-Assisted Distributed User Scheduling and Beamforming in Multi-Cell MmWave Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the emerging integrated sensing and communication (ISAC) technique for mmWave systems, we propose in this paper a new distributed user scheduling and beamforming framework with a small signaling overhead. |
T. Cai; L. Li; T. -H. Chang; |
218 | Enhanced Deep Reinforcement Learning for Parcel Singulation in Non-Stationary Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we frame the parcel singulation issue as a Markov Decision Process with a variable state space dimension, addressed through a deep reinforcement learning (RL) algorithm complemented by a State Space Standardization Module (S3). |
J. Shen; H. Lu; H. Zhang; S. Lyu; Y. Lu; |
219 | Unsupervised Human Activity Recognition Via Large Language Models and Iterative Evolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome the challenge above, we propose a novel method named LLMIE-UHAR that leverages LLMs and Iterative Evolution to realize Unsupervised HAR. |
J. Gao; Y. Zhang; Y. Chen; T. Zhang; B. Tang; X. Wang; |
220 | ANM-Based Source Localization Under Mixed Field Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a mixed field source localization algorithm based on atomic norm minimization (ANM) under low snapshots signal. |
T. Chen; Z. Liu; L. Zhan; |
221 | Reinforcement Learning Compensated Filter for Multi-Agents Cooperative Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In complex and unreliable environments, these algorithms often yield poor localization results. To address these issues, this paper proposes a multi-agent collaborative localization algorithm based on reinforcement learning compensation filtering to tackle localization problems in complex environments and improve the robustness and accuracy of the localization algorithm. |
R. Wang; J. Sun; C. Xu; R. Li; S. Duan; X. Zhang; |
222 | Quantum Ranging Enhanced TDoA Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we consider localization with quantum-based TDoA measurements. |
E. He; Y. Yang; C. Wu; |
223 | Contactless Radar Heart Rate Variability Monitoring Via Deep Spatio-Temporal Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing works neglect heartbeat-driven body surface motions spreading across the entire body with spatial variations, which limits their accuracy in identifying fine-grid consecutive heartbeat timings and overall HRV performance. In this paper, we propose to exploit the entire body reflections and model the inherent spatial-temporal relationship between these reflections and heartbeats by deep neural network for contactless HRV monitoring. |
H. Wang; |
224 | Quantum Inspired Image Augmentation Applicable to Waveguides and Optical Image Transfer Via Anderson Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a quantum inspired image augmentation protocol which is applicable to classical images and, in principle, due to its known quantum formulation applicable to quantum systems and quantum machine learning in the future. |
N. Palaiodimopoulos; V. F. Rey; M. Tschöpe; C. Jörg; P. Lukowicz; M. Kiefer-Emmanouilidis; |
225 | Political Tweet Sentiment Analysis for Public Opinion Polling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose two innovative methods for employing tweet sentiment analysis results for public opinion polling. |
A. Kaimakamidis; I. Pitas; |
226 | Enhanced Axle-Based Vehicle Classification Using Angle-Based Micro-Doppler Signature Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study introduces an angle-based micro-Doppler analysis using Frequency Modulated Continuous Wave (FMCW) radar tailored for axle-based vehicle classification. |
V. R. J. Deville; C. M. Lievers; J. H. Manton; |
227 | Applying Hybrid Quantum LSTM for Indoor Localization Based on RSSI Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the superior performance of quantum algorithms, we explore Quantum LSTM (QLSTM) for indoor localization, leveraging a variational quantum circuit (VQC). |
S. F. Chien; D. Chieng; S. Y. C. Chen; C. C. Zarakovitis; H. S. Lim; Y. H. Xu; |
228 | Optimizing Trading Strategies in Quantitative Markets Using Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, we introduce two novel multi-agent RL (MARL) methods: CPPI-MADDPG and TIPP-MADDPG, tailored for probing strategic trading within quantitative markets. |
H. Zhang; Z. Shi; Y. Hu; W. Ding; E. E. Kuruoğlu; X. -P. Zhang; |
229 | Motif-Matching Based Sub-Braingraph Level Networks for Noisy Resting-State fMRI Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, several graph-based methods have been proposed for modeling the braingraph of brain disorders and brain disorders diagnosis. |
Y. Zhang; X. Liu; Z. Zhang; |
230 | Detecting Continuous Gravitational Waves Using Generated Training Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As participants, we present our approach in this paper. |
J. Herrmann; |
231 | Hardware-Limited Time Constant Estimation Using A Weighted Linear Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work discusses estimating the time constant of a decaying exponential’s ADC samples using a simple weighted linear regression and describes the on-chip implementation of the regression on a low-cost, low-power microprocessor. |
T. Yuan; F. Maksimovic; D. C. Burnett; K. S. J. Pister; |
232 | Joint Transmit Precoders and Passive Reflection Beamformer Design in IRS-Aided IoT Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This necessitates the need for efficient preprocessing of each SN’s observations to combat wireless fading effects and optimize transmit power utilization. In this context, this paper presents a novel approach that jointly designs the transmit precoding matrix (TPM) for IoT SNs and optimizes the phase reflection matrix (PRM) for the IRS. |
K. P. Rajput; L. Wu; M. R. Bhavani Shankar; P. K. Varshney; |
233 | RobustTSVar: A Robust Time Series Variance Estimation Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The time warping of the periodicity makes it further complicated. To deal with these challenges, we propose a robust algorithm called RobustTSVar, based on quantile regression for robust variance estimation. |
Z. Zhou; L. Yang; Q. Wen; L. Sun; |
234 | RoFi: Robust WiFi Intrusion Detection Via Distribution Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose RoFi, a robust WiFi intrusion detection system which can handle more complex scenarios. |
X. Wang; |
235 | Digital Task-Oriented Communication with Hardware-Limited Task-Based Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our goal is to minimize the mean squared error (MSE) between the transmitted and received task-relevant signals to achieve optimal task performance under a certain bit budget. |
W. Hu; Y. Yang; Y. C. Eldar; C. Feng; C. Guo; |
236 | Automotive Radar Interference Mitigation Via SINR Maximization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The resulting ghost target interference will reduce the sensitivity of the radar sensor and increase the false alarm rate. To tackle this problem, in this paper, we make full use of two characteristics of interference to achieve ghost target interference mitigation in the Doppler domain. |
S. Yang; |
237 | A Low-Latency FFT-IFFT Cascade Architecture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses the design of a partly-parallel cascaded FFT-IFFT architecture that does not require any intermediate buffer. |
K. K. Parhi; |
238 | Cuffless Blood Pressure Estimation Using Magnetic Flux In A Ring Form Factor Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide an end-to-end analysis to assess the performance of our smart rings in estimating BP. |
S. A. G. Asgar; K. Sel; A. Paul; R. I. Pettigrew; R. Jafari; |
239 | UNeC: Unsupervised Exploring In Controllable Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The omnipresence of uncontrollable states, akin to noise, can impede the efficacy of unsupervised exploration. To address this issue, we introduce the UNeC framework, standing for UNsupervised Exploring in Controllable Space. |
X. Xiong; L. Meng; J. Ruan; S. Xu; B. Xu; |
240 | MAML-Based 24-Hour Personalized Blood Pressure Estimation from Wrist Photoplethysmography Signals in Free-Living Context Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A model agnostic meta learning (MAML)-based 24-hour personalized approach is proposed in this paper for BP estimation using wrist photoplethysmography (PPG) signals from smart watch in free-living context. |
J. -Y. Yang; C. -I. Ho; P. -Y. Tsai; H. -J. Lin; T. -D. Wang; |
241 | Aerial-IRS-Assisted Load Balancing In Downlink Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a model-free approach based on adaptive particle swarm optimization (APSO) and blind beamforming, which recovers the solution from random explorations of the solution space. |
S. Ren; B. Huang; X. Li; K. Shen; |
242 | Multi-Layer Relation Knowledge Distillation For Fingerprint Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduced a multi-layer relation knowledge distillation (MRKD). |
Y. -M. Chiu; C. -T. Chiu; D. -H. Luo; |
243 | A Concept for A SLAM Back End Hardware Accelerator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This research aims to develop energy-efficient hardware accelerators for Simultaneous Location And Mapping (SLAM) back end applications by employing algorithm-hardware codesign. |
T. Henningson; S. Adalbjörnsson; A. Berkeman; C. Drougge; X. Erickson; A. Hunt; |
244 | Practical Challenge and Solution for IRS-Aided Indoor Localization System Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an IRS-aided localization system using omnidirectional antennas and reveals two critical challenges in practical deployment. |
G. Zhang; D. Zhang; H. Deng; Y. Wu; F. Zhan; Y. Chen; |
245 | sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel SNN-based VAD model, referred to as sVAD, which features an auditory encoder with an SNN-based attention mechanism. |
Q. Yang; Q. Liu; N. Li; M. Ge; Z. Song; H. Li; |
246 | Spiking-LEAF: A Learnable Auditory Front-End for Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, their performance in speech processing remains limited due to the lack of an effective auditory front-end. To address this limitation, we introduce Spiking-LEAF, a learnable auditory front-end meticulously designed for SNN-based speech processing. |
Z. Song; J. Wu; M. Zhang; M. Z. Shou; H. Li; |
247 | Application of SNNs Model Based On Multi-Dimensional Attention In Drone Radio Frequency Signal Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Spiking Neural Networks (SNNs) are attracting attention due to their energy efficiency and importance in neuromorphic computing. Therefore, we propose an SNN-based method for classifying drone RF signals in complex electromagnetic environments. |
Z. Si; C. Liu; J. Liu; Y. Zhou; |
248 | Differentiable Quantum Architecture Search For Job Shop Scheduling Problem Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we redefine the operation pool and extend DQAS to a framework JSSP-DQAS by evaluating circuits to generate circuits for JSSP automatically. |
Y. Sun; J. Liu; Y. Ma; V. Tresp; |
249 | Low-Complexity GLRT Based Quickest Detection With Unknown Parameters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a drift-oriented GLRT (D-GLRT) algorithm is proposed to avoid the repeated enumerations and reduce the storage burden. |
P. Wang; Q. He; |
250 | Towards Enabling DPOAE Estimation on Single-Speaker Earbuds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces an innovative approach to trigger DPOAEs using single-speaker earbuds. |
I. Shahid; K. Al-Naimi; T. Dang; Y. Liu; F. Kawsar; A. Montanari; |
251 | Efficient 3D Position Estimation in Badminton Scene Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: 2) The distribution of 3D spheres and humans in motion space differs significantly, making it difficult to employ a unified method. To address these issues, we propose a dual-view method that circumvents the occlusion problems associated with single-view methods while reducing the cost of multi-view methods. |
B. Han; L. Han; |
252 | F1-EV Score: Measuring The Likelihood of Estimating A Good Decision Threshold for Semi-Supervised Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, F1-EV, a novel threshold-independent performance measure for ASD systems that also includes the likelihood of estimating a good decision threshold, is proposed and motivated using specific toy examples. |
K. Wilkinghoff; K. Imoto; |
253 | SoundLoCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present SoundLoCD, a novel text-to-sound generation framework, which incorporates a LoRA-based conditional discrete contrastive latent diffusion model. |
X. Niu; J. Zhang; C. Walder; C. P. Martin; |
254 | StofNet: Super-Resolution Time of Flight Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper highlights the potential of modern super-resolution techniques to learn varying surroundings for a reliable and accurate ToF detection. |
C. Hahne; M. Hayoz; R. Sznitman; |
255 | Semi-Supervised Sound Event Detection with Local and Global Consistency Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we design a Local and Global Consistency (LGC) regularization scheme to enhance the model on both label- and feature-level. |
Y. Li; X. Wang; H. Liu; R. Tao; L. Yan; K. Ouchi; |
256 | Self-Supervised Learning for Anomalous Sound Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, feature exchange (FeatEx), a simple yet effective SSL approach for ASD, is proposed. |
K. Wilkinghoff; |
257 | “It’s Okay to Be Uncommon”: Quantizing Sound Event Detection Networks on Hardware Accelerators with Uncommon Sub-Byte Support Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we identify a new class of neural network accelerators (e.g., NE16 on GAP9) that allows network weights to be quantized to different common (e.g., 8 bits) and uncommon bit-widths (e.g., 3 bits). |
Y. Wu; X. Quan; M. R. Izadi; C. -C. J. Huang; |
258 | Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Text-to-music generation (T2M-Gen) faces a major obstacle due to the scarcity of large-scale publicly available music datasets with natural language captions. To address this, we propose the Music Understanding LLaMA (MU-LLaMA), capable of answering music-related questions and generating captions for music files. |
S. Liu; A. S. Hussain; C. Sun; Y. Shan; |
259 | CED: Consistent Ensemble Distillation for Audio Tagging Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes CED, a simple training framework that distils student models from large teacher ensembles with consistent teaching. |
H. Dinkel; Y. Wang; Z. Yan; J. Zhang; Y. Wang; |
260 | Semi-Blind Estimation of Direct-to-Reverberant Energy Ratio Using Residual Energy Test Statistics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method for semi-blind estimation of direct-to-reverberant energy ratio (DRR) using a number of statistical features of the residual energy test (RENT), a recently proposed direct-path metric that evaluates the dominance of a direct or diffuse component in an ambisonics recording. |
A. Gökçe; H. Hacıhabiboğlu; |
261 | DJCM: A Deep Joint Cascade Model for Singing Voice Separation and Vocal Pitch Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the other hand, naive joint learning methods simply add the losses of both tasks, possibly leading to a misalignment between the distinct objectives of each task. To solve these problems, we propose a Deep Joint Cascade Model (DJCM) for singing voice separation and vocal pitch estimation. |
H. Wei; X. Cao; W. Xu; T. Dan; Y. Chen; |
262 | Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users Using Intermediate ASR Features and Human Memory Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work combines the use of Whisper ASR decoder layer representations as neural network input features with an exemplar-based, psychologically motivated model of human memory to predict human intelligibility ratings for hearing-aid users. |
R. Mogridge; |
263 | Vocal Fold Dynamics for Automatic Detection of Amyotrophic Lateral Sclerosis from Voice Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we represent voices with algorithmically estimated vocal fold dynamics from physical models of phonation. |
J. Zhang; R. Singh; |
264 | Improving Audio Captioning Models with Fine-Grained Audio Features, Text Embedding Supervision, and LLM Mix-Up Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Following the macro-trend of applied machine learning research, in this work, we strive to improve the performance of seq2seq AAC models by extensively leveraging pretrained models and large language models (LLMs). |
S. -L. Wu; |
265 | Localizing Acoustic Energy in Sound Field Synthesis By Directionally Weighted Exterior Radiation Suppression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A method for synthesizing the desired sound field while suppressing the exterior radiation power with directional weighting is proposed. |
Y. Tomita; S. Koyama; H. Saruwatari; |
266 | SPGM: Prioritizing Local Features for Enhanced Speech Separation Performance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, we propose the Single-Path Global Modulation (SPGM) block to replace inter-blocks. |
J. Q. Yip; |
267 | Voice Toxicity Detection Using Multi-Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel end-to-end multi-task learning (MTL) paradigm for audio-based toxicity detection, addressing the challenges associated with existing automatic speech recognition (ASR) and text-based systems. |
M. Kumar Nandwana; |
268 | Natural Language Supervision For General-Purpose Audio Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Contrastive Language-Audio Pretraining model that is pretrained with a diverse collection of 4.6M audio-text pairs employing two innovative encoders for Zero-Shot inference. |
B. Elizalde; S. Deshmukh; H. Wang; |
269 | Enhancing Note-Level Singing Transcription Model with Unlabeled and Weakly Labeled Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, this field suffers from a severe data scarcity problem. To address this issue, we developed a singing transcription model based on wav2vec 2.0, a pretrained speech representation model. |
Y. Qiu; J. Zhang; Y. Shan; J. Zhou; |
270 | Simultaneous Interior and Exterior Sound Field Synthesis Using Cylindrical and Spherical Loudspeaker Arrays Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In actual implementation, the reproduced sound field is limited by the array’s finite length. To overcome this limitation, this paper proposes a sound field synthesis method using a linear loudspeaker array and multiple distributed spherical loudspeaker arrays. |
Y. Sasaki; Y. Nakayama; |
271 | Multi-CMGAN+/+: Leveraging Multi-Objective Speech Quality Metric Prediction for Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, a non-intrusive multi-metric prediction approach is introduced, wherein a model trained on artificial labelled data using inference of an adversarially trained metric prediction neural network. |
G. Close; W. Ravenscroft; T. Hain; S. Goetze; |
272 | Soft Dynamic Time Warping with Variable Step Weights Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extend SDTW to allow for variable step weights and provide efficient dynamic programming algorithms for the forward and backward passes. |
J. Zeitler; M. Krause; M. Müller; |
273 | ScoreDec: A Phase-Preserving High-Fidelity Audio Codec with A Generalized Score-Based Diffusion Post-Filter Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve human-level naturalness with a reasonable bitrate, preserve the original phase, and get rid of the tricky and opaque GAN training, we develop a score-based diffusion post-filter (SPF) in the complex spectral domain and combine our previous AudioDec with the SPF to propose ScoreDec, which can be trained using only spectral and score-matching losses. |
Y. -C. Wu; D. Marković; S. Krenn; I. D. Gebru; A. Richard; |
274 | Learning Audio Concepts from Counterfactual Natural Language Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study introduces causal reasoning and counterfactual analysis in the audio domain. |
A. Vosoughi; L. Bondi; H. -H. Wu; C. Xu; |
275 | Training Audio Captioning Models Without Audio Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The creation of these audio-caption pairs is costly, resulting in general data scarcity for the task. In this work, we address this major limitation and propose an approach to train AAC systems using only text. |
S. Deshmukh; B. Elizalde; D. Emmanouilidou; B. Raj; R. Singh; H. Wang; |
276 | CORN: Co-Trained Full- and No-Reference Speech Quality Assessment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Both the FR and NR approaches exhibit advantages and drawbacks relative to each other. In this paper, we present a novel framework called CORN that amalgamates these dual approaches, concurrently training both FR and NR models together. |
P. Manocha; D. Williamson; A. Finkelstein; |
277 | Multi-Channel MOSRA: Mean Opinion Score and Room Acoustics Estimation Using Simulated Data and A Teacher Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Following our hypothesis that a model may benefit from multi-channel training, we develop a multi-channel model for joint MOS and room acoustics prediction (MOSRA) for five channels in parallel. |
J. Coldenhoff; A. Harper; P. Kendrick; T. Stojkovic; M. Cernak; |
278 | Unsupervised Acoustic Scene Mapping Based on Acoustic Features and Dimensionality Reduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce an unsupervised data-driven approach that exploits the natural structure of the data. |
I. Cohen; S. Gannot; O. Lindenbaum; |
279 | Bringing The Discussion of Minima Sharpness to The Audio Domain: A Filter-Normalised Evaluation for Acoustic Scene Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The correlation between the sharpness of loss minima and generalisation in the context of deep neural networks has been subject to discussion for a long time. Whilst mostly investigated in the context of selected benchmark data sets in the area of computer vision, we explore this aspect for the acoustic scene classification task of the DCASE2020 challenge data. |
M. Milling; A. Triantafyllopoulos; I. Tsangko; S. D. N. Rampp; B. W. Schuller; |
280 | BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose BEAt tracking Streaming Transformer (BEAST), an online joint beat and downbeat tracking system based on the streaming Transformer. |
C. -C. Chang; L. Su; |
281 | Improving Acoustic Echo Cancellation By Exploring Speech and Echo Affinity with Multi-Head Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose DCA-Net, a dual-branch cross-attention neural network, to improve AEC performance by exploring the affinities between speech and echo in the representation space. |
Y. Zhang; X. Xu; W. Tu; |
282 | ASPED: An Audio Dataset for Detecting Pedestrians Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the new audio analysis task of pedestrian detection and present a new large-scale dataset for this task. |
P. Seshadri; C. Han; B. -W. Koo; N. Posner; S. Guhathakurta; A. Lerch; |
283 | Environmental Sound Synthesis from Vocal Imitations and Sound Event Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a framework for environmental sound synthesis from vocal imitations and sound event labels based on a framework of a vector quantized encoder and the Tacotron2 decoder. |
Y. Okamoto; K. Imoto; S. Takamichi; R. Nagase; T. Fukumori; Y. Yamashita; |
284 | Multi-Microphone Noise Data Augmentation for DNN-Based Own Voice Reconstruction for Hearables in Noisy Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate several noise data augmentation techniques based on measured transfer functions to simulate multi-microphone noise. |
M. Ohlenbusch; C. Rollwage; S. Doclo; |
285 | 3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the TSE task using microphone array and introduce a novel three-stage solution that systematically decouples the process: First, a neural network is trained to estimate the direction of the target speaker. |
S. He; J. Liu; H. Li; Y. Yang; F. Chen; X. Zhang; |
286 | Improving Music Source Separation with SIMO Stereo Band-Split RNN Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extend BSRNN to single-input-multi-output (SIMO) and stereo mode where all tracks are jointly extracted with a same network that supports stereo signal modeling. |
Y. Luo; R. Gu; |
287 | A Study of Multichannel Spatiotemporal Features and Knowledge Distillation on Robust Target Speaker Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to improve the robustness of TSE, in this work we propose several new multichannel spatiotemporal features to represent the discriminability of the target speaker. |
Y. Wang; |
288 | Resource-Constrained Stereo Singing Voice Cancellation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of stereo singing voice cancellation, a subtask of music source separation, whose goal is to estimate an instrumental background from a stereo mix. |
C. Borrelli; J. Rae; D. Basaran; M. McVicar; M. Souden; M. Mauch; |
289 | Unsupervised Learning Based End-to-End Delayless Generative Fixed-Filter Active Noise Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Labelling noise data can be resource-intensive and may introduce some biases. In this paper, we propose an unsupervised-GFANC approach to simplify the 1D CNN training process and enhance its practicality. |
Z. Luo; D. Shi; X. Shen; W. -S. Gan; |
290 | Boosting Unknown-Number Speaker Separation with Transformer Decoder-Based Attractor Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel speech separation model designed to separate mixtures with an unknown number of speakers. |
Y. Lee; S. Choi; B. -Y. Kim; Z. -Q. Wang; S. Watanabe; |
291 | SRCodec: Split-Residual Vector Quantization for Neural Speech Codec Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose SRCodec, a neural speech codec that relies on a fully convolutional encoder/decoder network with specifically proposed split-residual vector quantization. |
Y. Zheng; W. Tu; L. Xiao; X. Xu; |
292 | A Light-Weight State Detection Model for Kalman-Filter-Based Acoustic Feedback Cancellation with Rapid Recovery from Abrupt Path Changes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, the Kalman filter with a light-weight state detection model (KF-SD) is proposed to effectively improve the robustness of AFC against abrupt path changes. |
H. Guo; X. Le; K. Chen; J. Lu; |
293 | Phase Reconstruction in Single Channel Speech Enhancement Based on Phase Gradients and Estimated Clean-Speech Amplitudes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, using purely synthetic phase in speech enhancement yields unnatural-sounding output. Therefore we derive a closed-form phase estimate that combines the synthetic phase with that of the enhanced speech, yielding more natural output. |
Y. Song; N. Madhu; |
294 | Anomalous Sound Detection By Feature-Level Anomaly Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose AudDSR, a simulation-based anomaly detection method that learns to detect anomalies without additional annotated data and instead focuses on a discrete feature space sampling method for an anomaly simulation process. |
V. Zavrtanik; M. Marolt; M. Kristan; D. Skočaj; |
295 | Generating Stereophonic Music with Single-Stage Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The recent success of audio language models (LMs) has revolutionized the field of neural music generation. Among all audio LM approaches, MusicGen has demonstrated the success of … |
X. Li; |
296 | Reconstruction of Sound Field Through Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a data-driven generative model for reconstructing the magnitude of acoustic fields in rooms with a focus on the modal frequency range. |
F. Miotello; L. Comanducci; M. Pezzoli; A. Bernardini; F. Antonacci; A. Sarti; |
297 | DP-MAE: A Dual-Path Masked Autoencoder Based Self-Supervised Learning Method for Anomalous Sound Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel general-purpose audio representation learning method named Dual-Path Masked AutoEncoder (DPMAE) for anomalous sound detection (ASD) task. |
Z. -L. Liu; Y. Song; X. -M. Zeng; L. -R. Dai; I. McLoughlin; |
298 | A Lightweight Hybrid Multi-Channel Speech Extraction System with Directional Voice Activity Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a hybrid system that can more effectively integrate the generalized sidelobe canceller (GSC) and a lightweight post-filtering model under the assistance of spatial speaker activity information provided by a directional voice activity detection (DVAD) module. |
T. Sun; T. Lei; X. Zhang; Y. Hu; C. Zhu; J. Lu; |
299 | String Sound Synthesizer On GPU-Accelerated Finite Difference Scheme Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces a nonlinear string sound synthesizer, based on a finite difference simulation of the dynamic behavior of strings under various excitations. |
J. W. Lee; M. Jun Choi; K. Lee; |
300 | SMMA-Net: An Audio Clue-Based Target Speaker Extraction Network with Spectrogram Matching and Mutual Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a deep neural network with spectrogram matching and mutual attention (SMMA-Net) for audio clue-based target speaker extraction (TSE). |
Y. Hu; H. Xu; Z. Guo; H. Huang; L. He; |
This table only includes 300 papers selected based on paper id in proceedings. To continue with the full list (~2,700 papers), please visit Paper Digest: ICASSP-2024 (Full List).