A list of speech researchers, ordered by number of relevant publications, intended to help identify potential academic supervisors.
Report exported at 2024-10-16 04:11:58.
Export parameters: --year_start 2019 --year_end 2024 --year_shift 1 --author_start_year 1900 --exclude_venue SSW,ASRU,IWSLT,SLT --n_pubs 20 --rank_start 0 --rank_end 200 --output speech_rankings.html
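The ordering described above can be sketched as follows. This is a minimal illustration, not the exporter's actual code: the function names `split_venue` and `rank_authors` are hypothetical, and reading `--year_start`, `--year_end`, and `--exclude_venue` as an inclusive year range and a list of venues to skip is an assumption. The remaining export parameters are not modeled.

```python
from collections import Counter

# Sample records in the (venue+year, authors, title) shape used by the listing.
# The first three are copied from the listing (one repeated on purpose);
# the last is a made-up placeholder to exercise the venue filter.
records = [
    ("TASLP2024",
     "Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe 0001, ",
     "End-to-End Speech Recognition: A Survey."),
    ("ICASSP2024",
     "Siddhant Arora, George Saon, Shinji Watanabe 0001, Brian Kingsbury, ",
     "Semi-Autoregressive Streaming ASR with Label Context."),
    ("ICASSP2024",
     "Siddhant Arora, George Saon, Shinji Watanabe 0001, Brian Kingsbury, ",
     "Semi-Autoregressive Streaming ASR with Label Context."),
    ("SLT2022", "Placeholder Author, ", "Placeholder title."),  # hypothetical entry
]


def split_venue(tag):
    """Split a tag such as 'ICASSP2024' into ('ICASSP', 2024).

    Assumes the last four characters are always the year.
    """
    return tag[:-4], int(tag[-4:])


def rank_authors(records, year_start, year_end, exclude_venues=()):
    """Count publications per author and return (name, count) pairs, highest first."""
    counts = Counter()
    seen = set()
    for tag, authors, title in records:
        venue, year = split_venue(tag)
        # Assumed meaning of the flags: inclusive year range, venues to skip.
        if venue in exclude_venues or not (year_start <= year <= year_end):
            continue
        # The listing repeats a paper under each ranked co-author; count it once.
        if (tag, title) in seen:
            continue
        seen.add((tag, title))
        # Author lines end with a trailing comma, so drop empty fragments.
        for name in authors.split(","):
            name = name.strip()
            if name:
                counts[name] += 1
    return counts.most_common()


ranking = rank_authors(
    records, 2019, 2024, exclude_venues={"SSW", "ASRU", "IWSLT", "SLT"}
)
for name, n_pubs in ranking:
    print(n_pubs, name)
```

Names are kept exactly as listed, including numeric suffixes such as `0001`, so that authors who share a name are not merged.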
TASLP2024
Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe 0001, 
End-to-End Speech Recognition: A Survey.
TASLP2024
Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe 0001, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis.
TASLP2024
Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li 0001, Abdelrahman Mohamed, Shinji Watanabe 0001, Hung-yi Lee, 
A Large-Scale Evaluation of Speech Foundation Models.
ICASSP2024
Siddhant Arora, George Saon, Shinji Watanabe 0001, Brian Kingsbury, 
Semi-Autoregressive Streaming ASR with Label Context.
ICASSP2024
Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-Weon Jung, Yichen Lu, Soumi Maiti, Roshan S. Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe 0001, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang, 
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.
ICASSP2024
William Chen, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe 0001, 
Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing.
ICASSP2024
Kwanghee Choi, Jee-Weon Jung, Shinji Watanabe 0001, 
Understanding Probe Behaviors Through Variational Bounds of Mutual Information.
ICASSP2024
Samuele Cornell, Jee-Weon Jung, Shinji Watanabe 0001, Stefano Squartini, 
One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition.
ICASSP2024
Ruizhe Huang, Xiaohui Zhang 0007, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe 0001, Daniel Povey, Sanjeev Khudanpur, 
Less Peaky and More Accurate CTC Forced Alignment by Label Priors.
ICASSP2024
Chien-Yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan S. Sharma, Shinji Watanabe 0001, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee, 
Dynamic-SUPERB: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark For Speech.
ICASSP2024
Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe 0001, Sanjeev Khudanpur, 
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization.
ICASSP2024
Amir Hussein, Dorsa Zeinali, Ondrej Klejch, Matthew Wiesner, Brian Yan, Shammur Absar Chowdhury, Ahmed Ali 0002, Shinji Watanabe 0001, Sanjeev Khudanpur, 
Speech Collage: Code-Switched Audio Generation by Collaging Monolingual Corpora.
ICASSP2024
Jee-Weon Jung, Roshan S. Sharma, William Chen, Bhiksha Raj, Shinji Watanabe 0001, 
AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models.
ICASSP2024
Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe 0001, Yong Man Ro, 
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens.
ICASSP2024
Doyeop Kwak, Jaemin Jung, Kihyun Nam, Youngjoon Jang, Jee-Weon Jung, Shinji Watanabe 0001, Joon Son Chung, 
VoxMM: Rich Transcription of Conversations in the Wild.
ICASSP2024
Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhongqiu Wang, Shinji Watanabe 0001, 
Boosting Unknown-Number Speaker Separation with Transformer Decoder-Based Attractor.
ICASSP2024
Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, Shinji Watanabe 0001, 
HuBERTopic: Enhancing Semantic Representation of HuBERT Through Self-Supervision Utilizing Topic Model.
ICASSP2024
Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-Weon Jung, Xuankai Chang, Shinji Watanabe 0001, 
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks.
ICASSP2024
Salvador Medina, Sarah L. Taylor, Carsten Stoll, Gareth Edwards, Alex Hauptmann 0001, Shinji Watanabe 0001, Iain A. Matthews, 
PhISANet: Phonetically Informed Speech Animation Network.
ICASSP2024
Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe 0001, Karen Livescu, 
Generative Context-Aware Fine-Tuning of Self-Supervised Speech Models.
TASLP2024
Shujie Hu, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, Guinan Li, Yi Wang, Mingyu Cui, Tianzi Wang, Helen Meng, Xunying Liu, 
Self-Supervised ASR Models and Features for Dysarthric and Elderly Speech Recognition.
TASLP2024
Jingbei Li, Sipan Li, Ping Chen, Luwen Zhang, Yi Meng, Zhiyong Wu 0001, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang 0002, 
Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing.
TASLP2024
Dongchao Yang, Songxiang Liu, Rongjie Huang, Chao Weng, Helen Meng, 
InstructTTS: Modelling Expressive TTS in Discrete Latent Space With Natural Language Style Prompt.
ICASSP2024
Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction.
ICASSP2024
Xueyuan Chen, Xi Wang 0016, Shaofei Zhang, Lei He 0005, Zhiyong Wu 0001, Xixin Wu, Helen Meng, 
StyleSpeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis.
ICASSP2024
Qiaochu Huang, Xu He, Boshi Tang, Haolin Zhuang, Liyang Chen, Shuochen Gao, Zhiyong Wu 0001, Haozhi Huang 0004, Helen Meng, 
Enhancing Expressiveness in Dance Generation Via Integrating Frequency and Music Style Information.
ICASSP2024
Shun Lei, Yixuan Zhou 0002, Liyang Chen, Dan Luo, Zhiyong Wu 0001, Xixin Wu, Shiyin Kang, Tao Jiang, Yahui Zhou, Yuxing Han 0001, Helen Meng, 
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts.
ICASSP2024
Zhe Li, Man-Wai Mak, Helen Mei-Ling Meng, 
Dual Parameter-Efficient Fine-Tuning for Speaker Representation Via Speaker Prompt Tuning and Adapters.
ICASSP2024
Zhiwei Lin, Jun Chen 0024, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju, Fan Fan, Shiyin Kang, Zhiyong Wu 0001, Helen Meng, 
Multi-View MidiVAE: Fusing Track- and Bar-View Representations for Long Multi-Track Symbolic Music Generation.
ICASSP2024
Hui Lu, Xixin Wu, Haohan Guo, Songxiang Liu, Zhiyong Wu 0001, Helen Meng, 
Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations.
ICASSP2024
Binzhu Sha, Xu Li 0015, Zhiyong Wu 0001, Ying Shan, Helen Meng, 
Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion.
ICASSP2024
Yuejiao Wang, Xixin Wu, Disong Wang, Lingwei Meng, Helen Meng, 
UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization.
ICASSP2024
Haiwei Xue, Sicheng Yang, Zhensong Zhang, Zhiyong Wu 0001, Minglei Li 0001, Zonghong Dai, Helen Meng, 
Conversational Co-Speech Gesture Generation via Modeling Dialog Intention, Emotion, and Context with Diffusion Models.
ICML2024
Dongchao Yang, Jinchuan Tian, Xu Tan 0003, Rongjie Huang, Songxiang Liu, Haohan Guo, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian 0002, Zhou Zhao, Xixin Wu, Helen M. Meng, 
UniAudio: Towards Universal Audio Generation with Large Language Models.
TASLP2023
Haohan Guo, Fenglong Xie, Xixin Wu, Frank K. Soong, Helen Meng, 
MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS.
TASLP2023
Shun Lei, Yixuan Zhou 0002, Liyang Chen, Zhiyong Wu 0001, Xixin Wu, Shiyin Kang, Helen Meng, 
MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis.
TASLP2023
Guinan Li, Jiajun Deng, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Mingyu Cui, Helen Meng, Xunying Liu, 
Audio-Visual End-to-End Multi-Channel Speech Separation, Dereverberation and Recognition.
TASLP2023
Xixin Wu, Hui Lu, Kun Li 0003, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Hiformer: Sequence Modeling Networks With Hierarchical Attention Mechanisms.
TASLP2023
Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, Helen Meng, 
Meta-Generalization for Domain-Invariant Speaker Verification.
ICASSP2023
Jun Chen 0024, Wei Rao, Zilin Wang, Jiuxin Lin, Zhiyong Wu 0001, Yannan Wang, Shidong Shang, Helen Meng, 
Inter-Subnet: Speech Enhancement with Subband Interaction.
SpeechComm2024
Shuai Wang 0016, Zhengyang Chen, Bing Han, Hongji Wang, Chengdong Liang, Binbin Zhang, Xu Xiang, Wen Ding, Johan Rohdin, Anna Silnova, Yanmin Qian, Haizhou Li 0001, 
Advancing speaker embedding learning: Wespeaker toolkit for research and production.
TASLP2024
Lei Liu, Li Liu 0036, Haizhou Li 0001, 
Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition.
TASLP2024
Tianchi Liu 0004, Kong Aik Lee, Qiongqiong Wang, Haizhou Li 0001, 
Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification.
TASLP2024
Rui Liu 0008, Berrak Sisman, Guanglai Gao, Haizhou Li 0001, 
Controllable Accented Text-to-Speech Synthesis With Fine and Coarse-Grained Intensity Rendering.
TASLP2024
Congcong Sun, Hui Tian 0002, Peng Tian, Haizhou Li 0001, Zhenxing Qian, 
Multi-Agent Deep Learning for the Detection of Multiple Speech Steganography Methods.
TASLP2024
Wupeng Wang, Zexu Pan, Xinke Li, Shuai Wang 0016, Haizhou Li 0001, 
Speech Separation With Pretrained Frontend to Minimize Domain Mismatch.
TASLP2024
Koichiro Yoshino, Yun-Nung Chen, Paul A. Crook, Satwik Kottur, Jinchao Li, Behnam Hedayatnia, Seungwhan Moon, Zhengcong Fei, Zekang Li, Jinchao Zhang, Yang Feng 0004, Jie Zhou 0016, Seokhwan Kim, Yang Liu 0004, Di Jin, Alexandros Papangelis, Karthik Gopalakrishnan 0001, Dilek Hakkani-Tur, Babak Damavandi, Alborz Geramifard, Chiori Hori, Ankit Shah, Chen Zhang 0020, Haizhou Li 0001, João Sedoc, Luis F. D'Haro, Rafael E. Banchs, Alexander Rudnicky, 
Overview of the Tenth Dialog System Technology Challenge: DSTC10.
TASLP2024
Mingyang Zhang 0003, Yi Zhou 0020, Yi Ren 0006, Chen Zhang 0020, Xiang Yin 0006, Haizhou Li 0001, 
RefXVC: Cross-Lingual Voice Conversion With Enhanced Reference Leveraging.
TASLP2024
Xuehao Zhou, Mingyang Zhang 0003, Yi Zhou 0020, Zhizheng Wu 0001, Haizhou Li 0001, 
Accented Text-to-Speech Synthesis With Limited Data.
ICASSP2024
Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li 0001, 
LOCSELECT: Target Speaker Localization with an Auditory Selective Hearing Mechanism.
ICASSP2024
Sho Inoue, Kun Zhou 0003, Shuai Wang 0016, Haizhou Li 0001, 
Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis.
ICASSP2024
Yidi Jiang, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian, Haizhou Li 0001, 
Prompt-Driven Target Speech Diarization.
ICASSP2024
Junjie Li, Ruijie Tao, Zexu Pan, Meng Ge, Shuai Wang 0016, Haizhou Li 0001, 
Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-Talker Speech.
ICASSP2024
Yi Ma, Kong Aik Lee, Ville Hautamäki, Meng Ge, Haizhou Li 0001, 
Gradient Weighting for Speaker Verification in Extremely Low Signal-to-Noise Ratio.
ICASSP2024
Zeyang Song, Jibin Wu, Malu Zhang, Mike Zheng Shou, Haizhou Li 0001, 
Spiking-Leaf: A Learnable Auditory Front-End for Spiking Neural Networks.
ICASSP2024
Shuai Wang 0016, Qibing Bai, Qi Liu 0018, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li 0001, 
Leveraging in-the-wild Data for Effective Self-supervised Pretraining in Speaker Recognition.
ICASSP2024
Qu Yang, Qianhui Liu, Nan Li, Meng Ge, Zeyang Song, Haizhou Li 0001, 
SVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks.
AAAI2024
Rui Liu 0008, Yifan Hu, Yi Ren 0006, Xiang Yin 0006, Haizhou Li 0001, 
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling.
AAAI2024
Jiadong Wang, Zexu Pan, Malu Zhang, Robby T. Tan, Haizhou Li 0001, 
Restoring Speaking Lips from Occlusion for Audio-Visual Speech Recognition.
SpeechComm2023
Buddhi Wickramasinghe, Eliathamby Ambikairajah, Vidhyasaharan Sethu, Julien Epps, Haizhou Li 0001, Ting Dang, 
DNN controlled adaptive front-end for replay attack detection systems.
SpeechComm2024
Li Zhang 0106, Ning Jiang, Qing Wang 0039, Yue Li, Quan Lu, Lei Xie 0001, 
Whisper-SV: Adapting Whisper for low-data-resource speaker verification.
TASLP2024
Tao Li, Zhichao Wang 0002, Xinfa Zhu, Jian Cong, Qiao Tian, Yuping Wang, Lei Xie 0001, 
U-Style: Cascading U-Nets With Multi-Level Speaker and Style Modeling for Zero-Shot Voice Cloning.
TASLP2024
Qijie Shao, Pengcheng Guo, Jinghao Yan, Pengfei Hu 0004, Lei Xie 0001, 
Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition.
TASLP2024
Zhichao Wang 0002, Liumeng Xue, Qiuqiang Kong, Lei Xie 0001, Yuanzhe Chen, Qiao Tian, Yuping Wang, 
Multi-Level Temporal-Channel Speaker Retrieval for Zero-Shot Voice Conversion.
TASLP2024
Kun Wei, Bei Li, Hang Lv 0001, Quan Lu, Ning Jiang, Lei Xie 0001, 
Conversational Speech Recognition by Learning Audio-Textual Cross-Modal Contextual Representation.
TASLP2024
Jixun Yao, Qing Wang 0039, Pengcheng Guo, Ziqian Ning, Lei Xie 0001, 
Distinctive and Natural Speaker Anonymization via Singular Value Transformation-Assisted Matrix.
TASLP2024
Xinfa Zhu, Yi Lei, Tao Li, Yongmao Zhang, Hongbin Zhou, Heng Lu 0004, Lei Xie 0001, 
METTS: Multilingual Emotional Text-to-Speech by Cross-Speaker and Cross-Lingual Emotion Transfer.
ICASSP2024
Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu 0004, Shuai Wang, Jixun Yao, Lei Xie 0001, Mengxiao Bi, 
DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion.
ICASSP2024
He Wang, Pengcheng Guo, Pan Zhou, Lei Xie 0001, 
MLCA-AVSR: Multi-Layer Cross Attention Fusion Based Audio-Visual Speech Recognition.
ICASSP2024
Ziqian Wang, Xinfa Zhu, Zihan Zhang, Yuanjun Lv, Ning Jiang, Guoqing Zhao, Lei Xie 0001, 
SELM: Speech Enhancement using Discrete Tokens and Language Models.
ICASSP2024
Jixun Yao, Yuguang Yang 0005, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, Jingjing Yin, Hongbin Zhou, Heng Lu 0004, Lei Xie 0001, 
PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts.
ACL2024
Zhichao Wang 0002, Yuanzhe Chen, Xinsheng Wang, Lei Xie 0001, Yuping Wang, 
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion.
TASLP2023
Tao Li, Chenxu Hu, Jian Cong, Xinfa Zhu, Jingbei Li, Qiao Tian, Yuping Wang, Lei Xie 0001, 
DiCLET-TTS: Diffusion Model Based Cross-Lingual Emotion Transfer for Text-to-Speech - A Study Between English and Mandarin.
TASLP2023
Zhichao Wang 0002, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie 0001, Qiao Tian, Yuping Wang, 
MSM-VC: High-Fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-Scale Style Modeling.
TASLP2023
Qing Wang 0039, Jixun Yao, Li Zhang 0106, Pengcheng Guo, Lei Xie 0001, 
Timbre-Reserved Adversarial Attack in Speaker Identification.
ICASSP2023
Mingshuai Liu, Shubo Lv, Zihan Zhang, Runduo Han, Xiang Hao, Xianjun Xia, Li Chen, Yijian Xiao, Lei Xie 0001, 
Two-Stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge.
ICASSP2023
Ziqian Ning, Qicong Xie, Pengcheng Zhu 0004, Zhichao Wang 0002, Liumeng Xue, Jixun Yao, Lei Xie 0001, Mengxiao Bi, 
Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features.
ICASSP2023
Kun Song, Yongmao Zhang, Yi Lei, Jian Cong, Hanzhao Li, Lei Xie 0001, Gang He, Jinfeng Bai, 
DSPGAN: A GAN-Based Universal Vocoder for High-Fidelity TTS by Time-Frequency Domain Supervision from DSP.
ICASSP2023
Zhichao Wang 0002, Xinsheng Wang, Lei Xie 0001, Yuanzhe Chen, Qiao Tian, Yuping Wang, 
Delivering Speaking Style in Low-Resource Voice Conversion with Multi-Factor Constraints.
ICASSP2023
Xiaopeng Yan, Yindi Yang, Zhihao Guo, Liangliang Peng, Lei Xie 0001, 
The NPU-Elevoc Personalized Speech Enhancement System for ICASSP2023 DNS Challenge.
SpeechComm2024
Shuai Wang 0016, Zhengyang Chen, Bing Han, Hongji Wang, Chengdong Liang, Binbin Zhang, Xu Xiang, Wen Ding, Johan Rohdin, Anna Silnova, Yanmin Qian, Haizhou Li 0001, 
Advancing speaker embedding learning: Wespeaker toolkit for research and production.
TASLP2024
Zhengyang Chen, Bing Han, Shuai Wang 0016, Yanmin Qian, 
Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer.
TASLP2024
Xun Gong 0005, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, Rui Zhao 0017, Xie Chen 0001, Yanmin Qian, 
Advanced Long-Content Speech Recognition With Factorized Neural Transducer.
TASLP2024
Bing Han, Zhengyang Chen, Yanmin Qian, 
Self-Supervised Learning With Cluster-Aware-DINO for High-Performance Robust Speaker Verification.
TASLP2024
Jiahong Li, Chenda Li, Yifei Wu, Yanmin Qian, 
Unified Cross-Modal Attention: Robust Audio-Visual Speech Recognition and Beyond.
TASLP2024
Bei Liu, Haoyu Wang 0007, Yanmin Qian, 
Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization.
TASLP2024
Wei Wang 0010, Yanmin Qian, 
Universal Cross-Lingual Data Generation for Low Resource ASR.
ICASSP2024
Bing Han, Zhiqiang Lv, Anbai Jiang, Wen Huang 0004, Zhengyang Chen, Yufeng Deng, Jiawei Ding, Cheng Lu 0007, Wei-Qiang Zhang 0001, Pingyi Fan, Jia Liu 0001, Yanmin Qian, 
Exploring Large Scale Pre-Trained Models for Robust Machine Anomalous Sound Detection.
ICASSP2024
Wen Huang 0004, Bing Han, Shuai Wang 0016, Zhengyang Chen, Yanmin Qian, 
Robust Cross-Domain Speaker Verification with Multi-Level Domain Adapters.
ICASSP2024
Yidi Jiang, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian, Haizhou Li 0001, 
Prompt-Driven Target Speech Diarization.
ICASSP2024
Hang Shao, Bei Liu, Yanmin Qian, 
One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models.
ICASSP2024
Shuai Wang 0016, Qibing Bai, Qi Liu 0018, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li 0001, 
Leveraging in-the-wild Data for Effective Self-supervised Pretraining in Speaker Recognition.
ICASSP2024
Linfeng Yu, Wangyou Zhang, Chenpeng Du, Leying Zhang, Zheng Liang, Yanmin Qian, 
Generation-Based Target Speech Extraction with Speech Discretization and Vocoder.
ICASSP2024
Wangyou Zhang, Jee-weon Jung, Yanmin Qian, 
Improving Design of Input Condition Invariant Speech Enhancement.
TASLP2023
Bei Liu, Zhengyang Chen, Yanmin Qian, 
Depth-First Neural Architecture With Attentive Feature Fusion for Efficient Speaker Verification.
ICASSP2023
Xun Gong 0005, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, Rui Zhao 0017, Xie Chen 0001, Yanmin Qian, 
LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer.
ICASSP2023
Xun Gong 0005, Wei Wang 0010, Hang Shao, Xie Chen 0001, Yanmin Qian, 
Factorized AED: Factorized Attention-Based Encoder-Decoder for Text-Only Domain Adaptive ASR.
ICASSP2023
Bing Han, Zhengyang Chen, Yanmin Qian, 
Exploring Binary Classification Loss for Speaker Verification.
ICASSP2023
Jiahong Li, Chenda Li, Yifei Wu, Yanmin Qian, 
Robust Audio-Visual ASR with Unified Cross-Modal Attention.
ICASSP2023
Chenda Li, Yao Qian, Zhuo Chen 0006, Dongmei Wang, Takuya Yoshioka, Shujie Liu 0001, Yanmin Qian, Michael Zeng 0001, 
Target Sound Extraction with Variable Cross-Modality Clues.
TASLP2024
Jiaming Cheng, Ruiyu Liang, Lin Zhou 0001, Li Zhao 0003, Chengwei Huang, Björn W. Schuller, 
Residual Fusion Probabilistic Knowledge Distillation for Speech Enhancement.
TASLP2024
Ruiyu Liang, Yue Xie, Jiaming Cheng, Cong Pang, Björn W. Schuller, 
A Non-Invasive Speech Quality Evaluation Algorithm for Hearing Aids With Multi-Head Self-Attention and Audiogram-Based Features.
ICASSP2024
Zhongren Dong, Zixing Zhang 0001, Weixiang Xu, Jing Han 0010, Jianjun Ou, Björn W. Schuller, 
HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech.
ICASSP2024
Xiangheng He, Junjie Chen, Björn W. Schuller, 
Task Selection and Assignment for Multi-Modal Multi-Task Dialogue Act Classification with Non-Stationary Multi-Armed Bandits.
ICASSP2024
Cheng Lu 0005, Yuan Zong, Hailun Lian, Yan Zhao, Björn W. Schuller, Wenming Zheng, 
Improving Speaker-Independent Speech Emotion Recognition using Dynamic Joint Distribution Adaptation.
ICASSP2024
Manuel Milling, Andreas Triantafyllopoulos, Iosif Tsangko, Simon David Noel Rampp, Björn Wolfgang Schuller, 
Bringing the Discussion of Minima Sharpness to the Audio Domain: A Filter-Normalised Evaluation for Acoustic Scene Classification.
ICASSP2024
Liyizhe Peng, Zixing Zhang 0001, Tao Pang, Jing Han 0010, Huan Zhao 0003, Hao Chen, Björn W. Schuller, 
Customising General Large Language Models for Specialised Emotion Recognition Tasks.
ICASSP2024
Yong Wang, Cheng Lu 0005, Hailun Lian, Yan Zhao, Björn W. Schuller, Yuan Zong, Wenming Zheng, 
Speech Swin-Transformer: Exploring a Hierarchical Transformer with Shifted Windows for Speech Emotion Recognition.
ICASSP2024
Yan Zhao, Jincen Wang, Cheng Lu 0005, Sunan Li, Björn W. Schuller, Yuan Zong, Wenming Zheng, 
Emotion-Aware Contrastive Adaptation Network for Source-Free Cross-Corpus Speech Emotion Recognition.
ICASSP2023
Felix Burkhardt, Anna Derington, Matthias Kahlau, Klaus R. Scherer, Florian Eyben, Björn W. Schuller, 
Masking Speech Contents by Random Splicing: is Emotional Expression Preserved?
ICASSP2023
Yi Chang 0004, Zhao Ren, Thanh Tam Nguyen, Kun Qian 0003, Björn W. Schuller, 
Knowledge Transfer for on-Device Speech Emotion Recognition With Neural Structured Learning.
ICASSP2023
Najla D. Al Futaisi, Alejandrina Cristià, Björn W. Schuller, 
Hearttoheart: The Arts of Infant Versus Adult-Directed Speech Classification.
ICASSP2023
Shuo Liu 0012, Adria Mallol-Ragolta, Björn W. Schuller, 
COVID-19 Detection from Speech in Noisy Conditions.
ICASSP2023
Zhao Ren, Thanh Tam Nguyen, Yi Chang 0004, Björn W. Schuller, 
Fast Yet Effective Speech Emotion Recognition with Self-Distillation.
ICASSP2023
Georgios Rizos, Rafael A. Calvo, Björn W. Schuller, 
Positive-Pair Redundancy Reduction Regularisation for Speech-Based Asthma Diagnosis Prediction.
ICASSP2023
Meishu Song, Andreas Triantafyllopoulos, Zijiang Yang 0007, Hiroki Takeuchi, Toru Nakamura, Akifumi Kishi, Tetsuro Ishizawa, Kazuhiro Yoshiuchi, Xin Jing, Vincent Karas, Zhonghao Zhao, Kun Qian 0003, Bin Hu 0001, Björn W. Schuller, Yoshiharu Yamamoto, 
Daily Mental Health Monitoring from Speech: A Real-World Japanese Dataset and Multitask Learning Analysis.
ICASSP2023
Panagiotis Tzirakis, Alice Baird, Jeffrey A. Brooks, Christopher Gagne, Lauren Kim, Michael Opara, Christopher B. Gregory, Jacob Metrick, Garrett Boseck, Vineet Tiruvadi, Björn W. Schuller, Dacher Keltner, Alan Cowen, 
Large-Scale Nonverbal Vocalization Detection Using Transformers.
ICASSP2023
Xinzhou Xu, Jun Deng, Zixing Zhang 0001, Zhen Yang, Björn W. Schuller, 
Zero-Shot Speech Emotion Recognition Using Generative Learning with Reconstructed Prototypes.
ICASSP2023
Yongzi Yu, Wanyong Qiu, Chen Quan, Kun Qian 0003, Zhihua Wang, Yu Ma, Bin Hu 0001, Björn W. Schuller, Yoshiharu Yamamoto, 
Federated Intelligent Terminals Facilitate Stuttering Monitoring.
ICASSP2023
Ziping Zhao 0001, Huan Wang, Haishuai Wang, Björn W. Schuller, 
Hierarchical Network with Decoupled Knowledge Distillation for Speech Emotion Recognition.
TASLP2024
Kai-Wei Chang, Haibin Wu, Yu-Kai Wang, Yuan-Kuei Wu, Hua Shen, Wei-Cheng Tseng, Iu-thing Kang, Shang-wen Li 0001, Hung-Yi Lee, 
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks.
TASLP2024
Shensian Syu, Juncheng Xie, Hung-yi Lee, 
Improving Non-Autoregressive Translation Quality With Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC.
TASLP2024
Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li 0001, Abdelrahman Mohamed, Shinji Watanabe 0001, Hung-yi Lee, 
A Large-Scale Evaluation of Speech Foundation Models.
ICASSP2024
Kevin Everson, Yile Gu, Chao-Han Huck Yang, Prashanth Gurunath Shivakumar, Guan-Ting Lin, Jari Kolehmainen, Ivan Bulyko, Ankur Gandhe, Shalini Ghosh, Wael Hamza, Hung-Yi Lee, Ariya Rastrow, Andreas Stolcke, 
Towards ASR Robust Spoken Language Understanding Through in-Context Learning with Word Confusion Networks.
ICASSP2024
Chien-Yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan S. Sharma, Shinji Watanabe 0001, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee, 
Dynamic-SUPERB: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark For Speech.
ICASSP2024
Kuan-Po Huang, Chih-Kai Yang, Yu-Kuan Fu, Ewan Dunbar, Hung-Yi Lee, 
Zero Resource Code-Switched Speech Benchmark Using Speech Utterance Pairs for Multiple Spoken Languages.
ICASSP2024
Chyi-Jiunn Lin, Guan-Ting Lin, Yung-Sung Chuang, Wei-Lun Wu, Shang-Wen Li 0001, Abdelrahman Mohamed, Hung-Yi Lee, Lin-Shan Lee, 
SpeechDPR: End-To-End Spoken Passage Retrieval For Open-Domain Spoken Question Answering.
ICASSP2024
Guan-Ting Lin, Prashanth Gurunath Shivakumar, Ankur Gandhe, Chao-Han Huck Yang, Yile Gu, Shalini Ghosh, Andreas Stolcke, Hung-Yi Lee, Ivan Bulyko, 
Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue.
ICASSP2024
Yuan Tseng, Layne Berry, Yiting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Poyao Huang 0001, Chun-Mao Lai, Shang-Wen Li 0001, David Harwath, Yu Tsao 0001, Abdelrahman Mohamed, Chi-Luen Feng, Hung-Yi Lee, 
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.
ICASSP2024
Haibin Wu, Heng-Cheng Kuo, Yu Tsao 0001, Hung-Yi Lee, 
Scalable Ensemble-Based Detection Method Against Adversarial Attacks For Speaker Verification.
ACL2024
Guan-Ting Lin, Cheng-Han Chiang, Hung-yi Lee, 
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations.
ACL-Findings2024
Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan S. Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe 0001, 
On the Evaluation of Speech Foundation Models for Spoken Language Understanding.
TASLP2023
Yun-Yen Chuang, Hung-Min Hsu, Kevin Lin, Ray-I Chang, Hung-Yi Lee, 
MetaEx-GAN: Meta Exploration to Improve Natural Language Generation via Generative Adversarial Networks.
TASLP2023
Po-Chun Hsu, Da-Rong Liu, Andy T. Liu, Hung-yi Lee, 
Parallel Synthesis for Autoregressive Speech Generation.
ICASSP2023
Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-Yi Lee, David Harwath, 
M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval.
ICASSP2023
Hsuan-Jui Chen, Yen Meng, Hung-yi Lee, 
Once-for-All Sequence Compression for Self-Supervised Speech Models.
ICASSP2023
Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola García, Hung-Yi Lee, Shinji Watanabe 0001, Sanjeev Khudanpur, 
EURO: ESPnet Unsupervised ASR Open-Source Toolkit.
ICASSP2023
Chan-Jan Hsu, Ho-Lam Chung, Hung-Yi Lee, Yu Tsao 0001, 
T5lephone: Bridging Speech and Text Self-Supervised Models for Spoken Language Understanding Via Phoneme Level T5.
ICASSP2023
Sung-Feng Huang, Chia-Ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, Hung-Yi Lee, 
Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning.
ICASSP2023
Kuan-Po Huang, Tzu-hsun Feng, Yu-Kuan Fu, Tsu-Yuan Hsu, Po-Chieh Yen, Wei-Cheng Tseng, Kai-Wei Chang, Hung-Yi Lee, 
Ensemble Knowledge Distillation of Self-Supervised Speech Models.
TASLP2024
Shujie Hu, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, Guinan Li, Yi Wang, Mingyu Cui, Tianzi Wang, Helen Meng, Xunying Liu, 
Self-Supervised ASR Models and Features for Dysarthric and Elderly Speech Recognition.
TASLP2024
Zengrui Jin, Mengzhe Geng, Jiajun Deng, Tianzi Wang, Shujie Hu, Guinan Li, Xunying Liu, 
Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition.
ICASSP2024
Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction.
ICASSP2024
Jiajun Deng, Xurong Xie, Guinan Li, Mingyu Cui, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Zhaoqing Li, Xunying Liu, 
Towards High-Performance and Low-Latency Feature-Based Speaker Adaptation of Conformer Speech Recognition Systems.
ICASSP2024
Zengrui Jin, Xurong Xie, Tianzi Wang, Mengzhe Geng, Jiajun Deng, Guinan Li, Shujie Hu, Xunying Liu, 
Towards Automatic Data Augmentation for Disordered Speech Recognition.
ICASSP2024
Huimeng Wang, Zengrui Jin, Mengzhe Geng, Shujie Hu, Guinan Li, Tianzi Wang, Haoning Xu, Xunying Liu, 
Enhancing Pre-Trained ASR System Fine-Tuning for Dysarthric Speech Recognition Using Adversarial Data Augmentation.
TASLP2023
Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Guinan Li, Shujie Hu, Xunying Liu, 
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems.
TASLP2023
Guinan Li, Jiajun Deng, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Mingyu Cui, Helen Meng, Xunying Liu, 
Audio-Visual End-to-End Multi-Channel Speech Separation, Dereverberation and Recognition.
TASLP2023
Xixin Wu, Hui Lu, Kun Li 0003, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Hiformer: Sequence Modeling Networks With Hierarchical Attention Mechanisms.
ICASSP2023
Shujie Hu, Xurong Xie, Zengrui Jin, Mengzhe Geng, Yi Wang, Mingyu Cui, Jiajun Deng, Xunying Liu, Helen Meng, 
Exploring Self-Supervised Pre-Trained ASR Models for Dysarthric and Elderly Speech Recognition.
ICASSP2023
Zengrui Jin, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shujie Hu, Jiajun Deng, Guinan Li, Xunying Liu, 
Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition.
ICASSP2023
Jinchao Li, Kaitao Song, Junan Li, Bo Zheng, Dongsheng Li 0002, Xixin Wu, Xunying Liu, Helen Meng, 
Leveraging Pretrained Representations With Task-Related Keywords for Alzheimer's Disease Detection.
ICASSP2023
Jinchao Li, Xixin Wu, Kaitao Song, Dongsheng Li 0002, Xunying Liu, Helen Meng, 
A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition.
ICASSP2023
Xurong Xie, Xunying Liu, Hui Chen 0020, Hongan Wang, 
Unsupervised Model-Based Speaker Adaptation of End-To-End Lattice-Free MMI Model for Speech Recognition.
Interspeech2023
Mingyu Cui, Jiawen Kang 0002, Jiajun Deng, Xi Yin 0010, Yutao Xie, Xie Chen 0001, Xunying Liu, 
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems.
Interspeech2023
Jiajun Deng, Guinan Li, Xurong Xie, Zengrui Jin, Mingyu Cui, Tianzi Wang, Shujie Hu, Mengzhe Geng, Xunying Liu, 
Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems.
Interspeech2023
Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Jiajun Deng, Mingyu Cui, Guinan Li, Jianwei Yu, Xurong Xie, Xunying Liu, 
Use of Speech Impairment Severity for Dysarthric Speech Recognition.
Interspeech2023
Mengzhe Geng, Xurong Xie, Rongfeng Su, Jianwei Yu, Zengrui Jin, Tianzi Wang, Shujie Hu, Zi Ye 0001, Helen Meng, Xunying Liu, 
On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition.
Interspeech2023
Shujie Hu, Xurong Xie, Mengzhe Geng, Mingyu Cui, Jiajun Deng, Guinan Li, Tianzi Wang, Helen Meng, Xunying Liu, 
Exploiting Cross-Domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition.
Interspeech2023
Zhaoqing Li, Tianzi Wang, Jiajun Deng, Junhao Xu, Shoukang Hu, Xunying Liu, 
Lossless 4-bit Quantization of Architecture Compressed Conformer ASR Systems on the 300-hr Switchboard Corpus.
TASLP2024
Hao Zhang, Yixuan Zhang 0005, Meng Yu 0003, Dong Yu 0001, 
Enhanced Acoustic Howling Suppression via Hybrid Kalman Filter and Deep Learning Models.
ICASSP2024
Muqiao Yang, Chunlei Zhang, Yong Xu 0004, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu 0001, 
uSee: Unified Speech Enhancement And Editing with Conditional Diffusion Models.
ICML2024
Manjie Xu, Chenxing Li, Duzhen Zhang, Dan Su 0002, Wei Liang, Dong Yu 0001, 
Prompt-guided Precise Audio Editing with Diffusion Models.
ACL2024
Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Jinchuan Tian, Zhenhui Ye, Luping Liu, Zehan Wang 0001, Ziyue Jiang 0001, Xuankai Chang, Jiatong Shi, Chao Weng, Zhou Zhao, Dong Yu 0001, 
Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.
ACL2024
Yongxin Zhu 0003, Dan Su 0002, Liqiang He, Linli Xu, Dong Yu 0001, 
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer.
TASLP2023
Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu 0001, 
Unsupervised TTS Acoustic Modeling for TTS With Conditional Disentangled Sequential VAE.
TASLP2023
Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu 0001, 
Integrating Lattice-Free MMI Into End-to-End Speech Recognition.
TASLP2023
Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu 0001, 
Diffsound: Discrete Diffusion Model for Text-to-Sound Generation.
Interspeech2023
Yong Xu 0004, Vinay Kothapally, Meng Yu 0003, Shixiong Zhang, Dong Yu 0001, 
Zoneformer: On-device Neural Beamformer For In-car Multi-zone Speech Separation, Enhancement and Echo Cancellation.
Interspeech2023
Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu 0001, Shinji Watanabe 0001, 
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction.
Interspeech2023
Wei Xiao, Wenzhe Liu, Meng Wang, Shan Yang, Yupeng Shi, Yuyong Kang, Dan Su 0002, Shidong Shang, Dong Yu 0001, 
Multi-mode Neural Speech Coding Based on Deep Generative Networks.
Interspeech2023
Yuping Yuan, Zhao You, Shulin Feng, Dan Su 0002, Yanchun Liang 0001, Xiaohu Shi, Dong Yu 0001, 
Compressed MoE ASR Model Based on Knowledge Distillation and Quantization.
Interspeech2023
Hao Zhang, Meng Yu 0003, Yuzhong Wu, Tao Yu, Dong Yu 0001, 
Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression.
Interspeech2023
Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu 0001, Zhao You, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation.
EMNLP2023
Dian Yu 0001, Xiaoyang Wang, Wanshun Chen, Nan Du, Longyue Wang, Haitao Mi, Dong Yu 0001, 
More Than Spoken Words: Nonverbal Message Extraction and Generation.
ACL-Findings2023
Rongjie Huang, Chunlei Zhang, Yi Ren 0006, Zhou Zhao, Dong Yu 0001, 
Prosody-TTS: Improving Prosody with Masked Autoencoder and Conditional Diffusion Model For Expressive Text-to-Speech.
ICASSP2022
Jiachen Lian, Chunlei Zhang, Dong Yu 0001, 
Robust Disentangled Variational Speech Representation Learning for Zero-Shot Voice Conversion.
ICASSP2022
Songxiang Liu, Shan Yang, Dan Su 0002, Dong Yu 0001, 
Referee: Towards Reference-Free Cross-Speaker Style Transfer with Low-Quality Data for Expressive Speech Synthesis.
ICASSP2022
Dongpeng Ma, Yiwen Wang, Liqiang He, Mingjie Jin, Dan Su 0002, Dong Yu 0001, 
DP-DWA: Dual-Path Dynamic Weight Attention Network With Streaming DFSMN-SAN For Automatic Speech Recognition.
ICASSP2022
Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu 0003, Zhenyu Tang 0001, Dinesh Manocha, Dong Yu 0001, 
FAST-RIR: Fast Neural Diffuse Room Impulse Response Generator.
TASLP2024
Jingbei Li, Sipan Li, Ping Chen, Luwen Zhang, Yi Meng, Zhiyong Wu 0001, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang 0002, 
Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing.
ICASSP2024
Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction.
ICASSP2024
Xueyuan Chen, Xi Wang 0016, Shaofei Zhang, Lei He 0005, Zhiyong Wu 0001, Xixin Wu, Helen Meng, 
StyleSpeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis.
ICASSP2024
Qiaochu Huang, Xu He, Boshi Tang, Haolin Zhuang, Liyang Chen, Shuochen Gao, Zhiyong Wu 0001, Haozhi Huang 0004, Helen Meng, 
Enhancing Expressiveness in Dance Generation Via Integrating Frequency and Music Style Information.
ICASSP2024
Shun Lei, Yixuan Zhou 0002, Liyang Chen, Dan Luo, Zhiyong Wu 0001, Xixin Wu, Shiyin Kang, Tao Jiang, Yahui Zhou, Yuxing Han 0001, Helen Meng, 
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts.
ICASSP2024
Xingda Li, Fan Zhuo, Dan Luo, Jun Chen 0024, Shiyin Kang, Zhiyong Wu 0001, Tao Jiang, Yang Li, Han Fang, Yahui Zhou, 
Generating Stereophonic Music with Single-Stage Language Models.
ICASSP2024
Zhiwei Lin, Jun Chen 0024, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju, Fan Fan, Shiyin Kang, Zhiyong Wu 0001, Helen Meng, 
Multi-View MidiVAE: Fusing Track- and Bar-View Representations for Long Multi-Track Symbolic Music Generation.
ICASSP2024
Hui Lu, Xixin Wu, Haohan Guo, Songxiang Liu, Zhiyong Wu 0001, Helen Meng, 
Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations.
ICASSP2024
Binzhu Sha, Xu Li 0015, Zhiyong Wu 0001, Ying Shan, Helen Meng, 
Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion.
ICASSP2024
Haiwei Xue, Sicheng Yang, Zhensong Zhang, Zhiyong Wu 0001, Minglei Li 0001, Zonghong Dai, Helen Meng, 
Conversational Co-Speech Gesture Generation via Modeling Dialog Intention, Emotion, and Context with Diffusion Models.
ICASSP2024
Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu 0001, 
FreeTalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness.
AAAI2024
Yaoxun Xu, Hangting Chen, Jianwei Yu, Qiaochu Huang, Zhiyong Wu 0001, Shi-Xiong Zhang, Guangzhi Li, Yi Luo 0004, Rongzhi Gu, 
SECap: Speech Emotion Captioning with Large Language Model.
TASLP2023
Shun Lei, Yixuan Zhou 0002, Liyang Chen, Zhiyong Wu 0001, Xixin Wu, Shiyin Kang, Helen Meng, 
MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis.
TASLP2023
Xixin Wu, Hui Lu, Kun Li 0003, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Hiformer: Sequence Modeling Networks With Hierarchical Attention Mechanisms.
ICASSP2023
Jun Chen 0024, Wei Rao, Zilin Wang, Jiuxin Lin, Zhiyong Wu 0001, Yannan Wang, Shidong Shang, Helen Meng, 
Inter-Subnet: Speech Enhancement with Subband Interaction.
ICASSP2023
Jun Chen 0024, Yupeng Shi, Wenzhe Liu, Wei Rao, Shulin He, Andong Li, Yannan Wang, Zhiyong Wu 0001, Shidong Shang, Chengshi Zheng, 
Gesper: A Unified Framework for General Speech Restoration.
ICASSP2023
Jie Chen, Xingchen Song, Zhendong Peng, Binbin Zhang, Fuping Pan, Zhiyong Wu 0001, 
LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech.
ICASSP2023
Shun Lei, Yixuan Zhou 0002, Liyang Chen, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis.
ICASSP2023
Jiuxin Lin, Xinyu Cai, Heinrich Dinkel, Jun Chen 0024, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Zhiyong Wu 0001, Yujun Wang, Helen Meng, 
AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction.
ICASSP2023
Xingchen Song, Di Wu 0061, Zhiyong Wu 0001, Binbin Zhang, Yuekai Zhang, Zhendong Peng, Wenpeng Li, Fuping Pan, Changbao Zhu, 
TrimTail: Low-Latency Streaming ASR with Simple But Effective Spectrogram-Level Length Penalty.
TASLP2024
Xun Gong 0005, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, Rui Zhao 0017, Xie Chen 0001, Yanmin Qian, 
Advanced Long-Content Speech Recognition With Factorized Neural Transducer.
TASLP2024
Xiaofei Wang 0007, Manthan Thakker, Zhuo Chen 0006, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu 0001, Jinyu Li 0001, Takuya Yoshioka, 
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
TASLP2024
Tianrui Wang, Long Zhou, Ziqiang Zhang, Yu Wu 0012, Shujie Liu 0001, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Furu Wei, 
VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation.
TASLP2024
Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu 0012, Shuo Ren, Shujie Liu 0001, Zhuoyuan Yao, Xun Gong 0005, Li-Rong Dai 0001, Jinyu Li 0001, Furu Wei, 
SpeechLM: Enhanced Speech Pre-Training With Unpaired Textual Data.
ICASSP2024
Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Naoyuki Kanda, Jinyu Li 0001, Yashesh Gaur, 
Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation.
ICASSP2024
Yiming Wang, Jinyu Li 0001, 
ResidualTransformer: Residual Low-Rank Learning With Weight-Sharing For Transformer Layers.
ICASSP2024
Jian Wu 0027, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao 0017, Zhuo Chen 0006, Jinyu Li 0001, 
T-SOT FNT: Streaming Multi-Talker ASR with Text-Only Domain Adaptation Capability.
ICASSP2024
Mu Yang, Naoyuki Kanda, Xiaofei Wang 0009, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li 0001, Takuya Yoshioka, 
Diarist: Streaming Speech Translation with Speaker Diarization.
ICML2024
Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan 0003, Detai Xin, Dongchao Yang, Eric Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu 0001, Tao Qin 0001, Xiangyang Li 0001, Wei Ye 0004, Shikun Zhang, Jiang Bian 0002, Lei He 0005, Jinyu Li 0001, Sheng Zhao, 
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
ICASSP2023
Zhuo Chen 0006, Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiaofei Wang 0009, Takuya Yoshioka, Jinyu Li 0001, Sunit Sivasankaran, Sefik Emre Eskimez, 
Speech Separation with Large-Scale Self-Supervised Learning.
ICASSP2023
Ruchao Fan, Yiming Wang, Yashesh Gaur, Jinyu Li 0001, 
CTCBERT: Advancing Hidden-Unit BERT with CTC Objectives.
ICASSP2023
Xun Gong 0005, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, Rui Zhao 0017, Xie Chen 0001, Yanmin Qian, 
LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer.
ICASSP2023
Zili Huang, Zhuo Chen 0006, Naoyuki Kanda, Jian Wu 0027, Yiming Wang, Jinyu Li 0001, Takuya Yoshioka, Xiaofei Wang 0009, Peidong Wang, 
Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition.
ICASSP2023
Naoyuki Kanda, Jian Wu 0027, Xiaofei Wang 0009, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.
ICASSP2023
Xiaoqiang Wang 0006, Yanqing Liu, Jinyu Li 0001, Sheng Zhao, 
Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation.
ICASSP2023
Kun Wei, Long Zhou, Ziqiang Zhang, Liping Chen, Shujie Liu 0001, Lei He 0005, Jinyu Li 0001, Furu Wei, 
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation.
ICASSP2023
Jian Wu 0027, Zhuo Chen 0006, Min Hu, Xiong Xiao, Jinyu Li 0001, 
Speaker Change Detection For Transformer Transducer ASR.
ICASSP2023
Muqiao Yang, Naoyuki Kanda, Xiaofei Wang 0009, Jian Wu 0027, Sunit Sivasankaran, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Simulating Realistic Speech Overlaps Improves Multi-Talker ASR.
ICASSP2023
Rui Zhao 0017, Jian Xue, Partha Parthasarathy, Veljko Miljanic, Jinyu Li 0001, 
Fast and Accurate Factorized Neural Transducer for Text Adaption of End-to-End Speech Recognition Models.
Interspeech2023
Yuang Li, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, 
Accelerating Transducers through Adjacent Token Merging.
TASLP2024
Hassan Taherian, DeLiang Wang, 
Multi-Channel Conversational Speaker Separation via Neural Diarization.
ICASSP2024
Vahid Ahmadi Kalkhorani, Anurag Kumar 0003, Ke Tan 0001, Buye Xu, DeLiang Wang, 
Audiovisual Speaker Separation with Full- and Sub-Band Modeling in the Time-Frequency Domain.
ICASSP2024
Hassan Taherian, Ashutosh Pandey 0004, Daniel Wong, Buye Xu, DeLiang Wang, 
Leveraging Sound Localization to Improve Continuous Speaker Separation.
TASLP2023
Ashutosh Pandey 0004, DeLiang Wang, 
Attentive Training: A New Training Framework for Speech Enhancement.
TASLP2023
Yixuan Zhang 0005, Heming Wang, DeLiang Wang, 
F0 Estimation and Voicing Detection With Cascade Architecture in Noisy Speech.
ICASSP2023
Hassan Taherian, DeLiang Wang, 
Multi-Resolution Location-Based Training for Multi-Channel Continuous Speech Separation.
ICASSP2023
Heming Wang, Yao Qian, Hemin Yang, Naoyuki Kanda, Peidong Wang, Takuya Yoshioka, Xiaofei Wang 0009, Yiming Wang, Shujie Liu 0001, Zhuo Chen 0006, DeLiang Wang, Michael Zeng 0001, 
DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks.
ICASSP2023
Heming Wang, DeLiang Wang, 
Cross-Domain Diffusion Based Speech Enhancement for Very Noisy Speech.
Interspeech2023
Vahid Ahmadi Kalkhorani, Anurag Kumar 0003, Ke Tan 0001, Buye Xu, DeLiang Wang, 
Time-domain Transformer-based Audiovisual Speaker Separation.
Interspeech2023
Hassan Taherian, Ashutosh Pandey 0004, Daniel Wong, Buye Xu, DeLiang Wang, 
Multi-input Multi-output Complex Spectral Mapping for Speaker Separation.
Interspeech2023
Yufeng Yang, Ashutosh Pandey 0004, DeLiang Wang, 
Time-Domain Speech Enhancement for Robust Automatic Speech Recognition.
TASLP2022
Ashutosh Pandey 0004, DeLiang Wang, 
Self-Attending RNN for Speech Enhancement to Improve Cross-Corpus Generalization.
TASLP2022
Hassan Taherian, Ke Tan 0001, DeLiang Wang, 
Multi-Channel Talker-Independent Speaker Separation Through Location-Based Training.
TASLP2022
Ke Tan 0001, Zhong-Qiu Wang, DeLiang Wang, 
Neural Spectrospatial Filtering.
TASLP2022
Heming Wang, DeLiang Wang, 
Neural Cascade Architecture With Triple-Domain Loss for Speech Enhancement.
TASLP2022
Heming Wang, Xueliang Zhang 0001, DeLiang Wang, 
Fusing Bone-Conduction and Air-Conduction Sensors for Complex-Domain Speech Enhancement.
TASLP2022
Hao Zhang, DeLiang Wang, 
Neural Cascade Architecture for Multi-Channel Acoustic Echo Suppression.
ICASSP2022
Ashutosh Pandey 0004, Buye Xu, Anurag Kumar 0003, Jacob Donley, Paul Calamia, DeLiang Wang, 
TPARN: Triple-Path Attentive Recurrent Network for Time-Domain Multichannel Speech Enhancement.
ICASSP2022
Hassan Taherian, Ke Tan 0001, DeLiang Wang, 
Location-Based Training for Multi-Channel Talker-Independent Speaker Separation.
ICASSP2022
Heming Wang, Yao Qian, Xiaofei Wang 0009, Yiming Wang, Chengyi Wang 0002, Shujie Liu 0001, Takuya Yoshioka, Jinyu Li 0001, DeLiang Wang, 
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.
TASLP2024
Hang Chen, Qing Wang 0008, Jun Du, Bao-Cai Yin, Jia Pan, Chin-Hui Lee 0001, 
Optimizing Audio-Visual Speech Enhancement Using Multi-Level Distortion Measures for Audio-Visual Speech Recognition.
TASLP2024
Zilu Guo, Qing Wang 0008, Jun Du, Jia Pan, Qing-Feng Liu, Chin-Hui Lee 0001, 
A Variance-Preserving Interpolation Approach for Diffusion Models With Applications to Single Channel Speech Enhancement and Recognition.
ICASSP2024
Hanbo Cheng, Jun Du, Pengfei Hu 0006, Jiefeng Ma, Zhenrong Zhang, Mobai Xue, 
Viewing Writing as Video: Optical Flow based Multi-Modal Handwritten Mathematical Expression Recognition.
ICASSP2024
Feng Ma, Yanhui Tu, Maokui He, Ruoyu Wang 0029, Shutong Niu, Lei Sun 0010, Zhongfu Ye, Jun Du, Jia Pan, Chin-Hui Lee 0001, 
A Spatial Long-Term Iterative Mask Estimation Approach for Multi-Channel Speaker Diarization and Speech Recognition.
ICASSP2024
Haotian Wang, Jun Du, Yusheng Dai, Chin-Hui Lee 0001, Yuling Ren, Yu Liu, 
Improving Multi-Modal Emotion Recognition Using Entropy-Based Fusion and Pruning-Based Network Architecture Optimization.
ICASSP2024
Minghui Wu, Haitao Tang, Jiahuan Fan, Ruoyu Wang, Hang Chen, Yanyong Zhang, Jun Du, Hengshun Zhou, Lei Sun, Xin Fang, Tian Gao, Genshun Wan, Jia Pan, Jianqing Gao, 
Implicit Enhancement of Target Speaker in Speaker-Adaptive ASR through Efficient Joint Optimization.
ICASSP2024
Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang 0029, Hongbo Lan, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao, 
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
ICASSP2024
Gaobin Yang, Maokui He, Shutong Niu, Ruoyu Wang 0029, Yanyan Yue, Shuangqing Qian, Shilong Wu, Jun Du, Chin-Hui Lee 0001, 
Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture.
SpeechComm2023
Shi Cheng, Jun Du, Shutong Niu, Alejandrina Cristià, Xin Wang 0037, Qing Wang 0008, Chin-Hui Lee 0001, 
Using iterative adaptation and dynamic mask for child speech extraction under real-world multilingual conditions.
SpeechComm2023
Li Chai 0002, Hang Chen, Jun Du, Qing-Feng Liu, Chin-Hui Lee 0001, 
Space-and-speaker-aware acoustic modeling with effective data augmentation for recognition of multi-array conversational speech.
TASLP2023
Mao-Kui He, Jun Du, Qing-Feng Liu, Chin-Hui Lee 0001, 
ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding.
TASLP2023
Shutong Niu, Jun Du, Lei Sun 0010, Yu Hu 0003, Chin-Hui Lee 0001, 
QDM-SSD: Quality-Aware Dynamic Masking for Separation-Based Speaker Diarization.
TASLP2023
Jie Zhang 0042, Rui Tao, Jun Du, Li-Rong Dai 0001, 
SDW-SWF: Speech Distortion Weighted Single-Channel Wiener Filter for Noise Reduction.
ICASSP2023
Hang Chen, Shilong Wu, Yusheng Dai, Zhe Wang, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge.
ICASSP2023
Shutong Niu, Jun Du, Qing Wang 0008, Li Chai 0002, Huaxin Wu, Zhaoxu Nian, Lei Sun 0010, Yi Fang, Jia Pan, Chin-Hui Lee 0001, 
An Experimental Study on Sound Event Localization and Detection Under Realistic Testing Conditions.
ICASSP2023
Ruoyu Wang 0029, Jun Du, Tian Gao, 
Quantum Transfer Learning Using the Large-Scale Unsupervised Pre-Trained Model WavLM-Large for Synthetic Speech Detection.
ICASSP2023
Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization And Recognition.
ICASSP2023
Chenyue Zhang, Hang Chen, Jun Du, Bao-Cai Yin, Jia Pan, Chin-Hui Lee 0001, 
Incorporating Visual Information Reconstruction into Progressive Learning for Optimizing Audio-Visual Speech Enhancement.
Interspeech2023
Zilu Guo, Jun Du, Chin-Hui Lee 0001, Yu Gao, Wenbin Zhang, 
Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement.
Interspeech2023
Shutong Niu, Jun Du, Maokui He, Chin-Hui Lee 0001, Baoxiang Li, Jiakui Li, 
Unsupervised Adaptation with Quality-Aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization.
TASLP2024
Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe 0001, 
End-to-End Speech Recognition: A Survey.
ICASSP2024
Junwen Bai, Bo Li 0028, Qiujia Li, Tara N. Sainath, Trevor Strohman, 
Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR.
ICASSP2024
Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li 0028, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal, 
USM-Lite: Quantization and Sparsity Aware Fine-Tuning for Speech Recognition with Universal Speech Models.
ICASSP2024
Shefali Garg, Zhouyuan Huo, Khe Chai Sim, Suzan Schwartz, Mason Chua, Alëna Aksënova, Tsendsuren Munkhdalai, Levi King, Darryl Wright, Zion Mengesha, Dongseong Hwang, Tara N. Sainath, Françoise Beaufays, Pedro Moreno Mengibar, 
Improving Speech Recognition for African American English with Audio Classification.
ICASSP2024
W. Ronny Huang, Cyril Allauzen, Tongzhou Chen, Kilol Gupta, Ke Hu, James Qin, Yu Zhang 0033, Yongqiang Wang, Shuo-Yiin Chang, Tara N. Sainath, 
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study.
ICASSP2024
Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno 0001, 
Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models.
ICASSP2024
Haozhe Shan, Albert Gu, Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara N. Sainath, 
Augmenting Conformers With Structured State-Space Sequence Models For Online Speech Recognition.
ICASSP2024
Khe Chai Sim, Zhouyuan Huo, Tsendsuren Munkhdalai, Nikhil Siddhartha, Adam Stooke, Zhong Meng, Bo Li 0028, Tara N. Sainath, 
A Comparison of Parameter-Efficient ASR Domain Adaptation Methods for Universal Speech and Language Models.
NAACL2024
Weiran Wang, Rohit Prabhavalkar, Haozhe Shan, Zhong Meng, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li 0028, James Qin, Xingyu Cai, Adam Stooke, Chengjian Zheng, Yanzhang He, Tara N. Sainath, Pedro Moreno Mengibar, 
Massive End-to-end Speech Recognition Models with Time Reduction.
ICASSP2023
Rami Botros, Rohit Prabhavalkar, Johan Schalkwyk, Ciprian Chelba, Tara N. Sainath, Françoise Beaufays, 
Lego-Features: Exporting Modular Encoder Features for Streaming and Deliberation ASR.
ICASSP2023
Shuo-Yiin Chang, Chao Zhang 0031, Tara N. Sainath, Bo Li 0028, Trevor Strohman, 
Context-Aware End-to-End ASR Using Self-Attentive Embedding and Tensor Fusion.
ICASSP2023
Steven M. Hernandez, Ding Zhao, Shaojin Ding, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He, Ian McGraw, 
Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models.
ICASSP2023
Ke Hu, Tara N. Sainath, Bo Li 0028, Nan Du 0002, Yanping Huang, Andrew M. Dai, Yu Zhang 0033, Rodrigo Cabrera, Zhifeng Chen, Trevor Strohman, 
Massively Multilingual Shallow Fusion with Large Language Models.
ICASSP2023
W. Ronny Huang, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, David Rybach, Robert David, Rohit Prabhavalkar, Cyril Allauzen, Cal Peyser, Trevor D. Strohman, 
E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model.
ICASSP2023
Zhouyuan Huo, Khe Chai Sim, Bo Li 0028, Dongseong Hwang, Tara N. Sainath, Trevor Strohman, 
Resource-Efficient Transfer Learning from Speech Foundation Model Using Hierarchical Feature Fusion.
ICASSP2023
Bo Li 0028, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang 0033, Wei Han 0002, Trevor Strohman, Françoise Beaufays, 
Efficient Domain Adaptation for Speech Foundation Models.
ICASSP2023
Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang 0033, Bo Li 0028, Andrew Rosenberg, Bhuvana Ramabhadran, 
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition.
ICASSP2023
Cal Peyser, Michael Picheny, Kyunghyun Cho, Rohit Prabhavalkar, W. Ronny Huang, Tara N. Sainath, 
A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale.
ICASSP2023
Tara N. Sainath, Rohit Prabhavalkar, Diamantino Caseiro, Pat Rondon, Cyril Allauzen, 
Improving Contextual Biasing with Text Injection.
ICASSP2023
Weiran Wang, Ding Zhao, Shaojin Ding, Hao Zhang 0010, Shuo-Yiin Chang, David Rybach, Tara N. Sainath, Yanzhang He, Ian McGraw, Shankar Kumar, 
Multi-Output RNN-T Joint Networks for Multi-Task Learning of ASR and Auxiliary Tasks.
TASLP2024
Cunhang Fan, Mingming Ding, Jianhua Tao 0001, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Zhao Lv, 
Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection.
ICASSP2024
Yong Ren, Tao Wang 0074, Jiangyan Yi, Le Xu, Jianhua Tao 0001, Chu Yuan Zhang, Junzuo Zhou, 
Fewer-Token Neural Speech Codec with Time-Invariant Codes.
ICASSP2024
Chenglong Wang, Jiayi He, Jiangyan Yi, Jianhua Tao 0001, Chu Yuan Zhang, Xiaohui Zhang 0006, 
Multi-Scale Permutation Entropy for Audio Deepfake Detection.
ICASSP2024
Mingyu Xu, Zheng Lian, Bin Liu 0041, Zerui Chen, Jianhua Tao 0001, 
Pseudo Labels Regularization for Imbalanced Partial-Label Learning.
AAAI2024
Xiaohui Zhang 0006, Jiangyan Yi, Chenglong Wang, Chu Yuan Zhang, Siding Zeng, Jianhua Tao 0001, 
What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection.
SpeechComm2023
Jiangyan Yi, Jianhua Tao 0001, Ye Bai, Zhengkun Tian, Cunhang Fan, 
Transfer knowledge for punctuation prediction via adversarial training.
TASLP2023
Jiangyan Yi, Jianhua Tao 0001, Ruibo Fu, Tao Wang 0074, Chu Yuan Zhang, Chenglong Wang, 
Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings.
ICASSP2023
Guanjun Li, Wei Xue, Wenju Liu, Jiangyan Yi, Jianhua Tao 0001, 
GCC-Speaker: Target Speaker Localization with Optimal Speaker-Dependent Weighting in Multi-Speaker Scenarios.
ICASSP2023
Jinlong Xue, Yayue Deng, Fengping Wang, Ya Li, Yingming Gao, Jianhua Tao 0001, Jianqing Sun, Jiaen Liang, 
M2-CTTS: End-to-End Multi-Scale Multi-Modal Conversational Text-to-Speech Synthesis.
Interspeech2023
Haiyang Sun, Zheng Lian, Bin Liu 0041, Ying Li, Jianhua Tao 0001, Licai Sun, Cong Cai, Meng Wang, Yuan Cheng, 
EmotionNAS: Two-stream Neural Architecture Search for Speech Emotion Recognition.
Interspeech2023
Chenglong Wang, Jiangyan Yi, Jianhua Tao 0001, Chu Yuan Zhang, Shuai Zhang 0014, Xun Chen, 
Detection of Cross-Dataset Fake Audio Based on Prosodic and Pronunciation Features.
Interspeech2023
Chenglong Wang, Jiangyan Yi, Jianhua Tao 0001, Chu Yuan Zhang, Shuai Zhang 0014, Ruibo Fu, Xun Chen, 
TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection.
Interspeech2023
Ruiteng Zhang, Jianguo Wei, Xugang Lu, Yongwei Li, Junhai Xu, Di Jin 0001, Jianhua Tao 0001, 
SOT: Self-supervised Learning-Assisted Optimal Transport for Unsupervised Adaptive Speech Emotion Recognition.
ICML2023
Xiaohui Zhang 0006, Jiangyan Yi, Jianhua Tao 0001, Chenglong Wang, Chu Yuan Zhang, 
Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection.
SpeechComm2022
Wenhuan Lu, Xinyue Zhao, Na Guo, Yongwei Li, Jianguo Wei, Jianhua Tao 0001, Jianwu Dang 0001, 
One-shot emotional voice conversion based on feature separation.
TASLP2022
Tao Wang 0074, Ruibo Fu, Jiangyan Yi, Jianhua Tao 0001, Zhengqi Wen, 
NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.
TASLP2022
Tao Wang 0074, Jiangyan Yi, Ruibo Fu, Jianhua Tao 0001, Zhengqi Wen, 
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing.
ICASSP2022
Cong Cai, Bin Liu 0041, Jianhua Tao 0001, Zhengkun Tian, Jiahao Lu, Kexin Wang, 
End-to-End Network Based on Transformer for Automatic Detection of Covid-19.
ICASSP2022
Ya Li, Mingyue Niu, Ziping Zhao 0001, Jianhua Tao 0001, 
Automatic Depression Level Assessment from Speech By Long-Term Global Information Embedding.
ICASSP2022
Tao Wang 0074, Jiangyan Yi, Liqun Deng, Ruibo Fu, Jianhua Tao 0001, Zhengqi Wen, 
Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing.
SpeechComm2024
Yuqin Lin, Jianwu Dang 0001, Longbiao Wang, Sheng Li 0010, Chenchen Ding, 
Disordered speech recognition considering low resources and abnormal articulation.
SpeechComm2024
Nan Li, Longbiao Wang, Meng Ge, Masashi Unoki, Sheng Li 0010, Jianwu Dang 0001, 
Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network.
TASLP2024
Cheng Gong, Xin Wang 0037, Erica Cooper, Dan Wells, Longbiao Wang, Jianwu Dang 0001, Korin Richmond, Junichi Yamagishi, 
ZMM-TTS: Zero-Shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-Supervised Discrete Speech Representations.
TASLP2024
Rui Liu 0008, Yifan Hu, Haolin Zuo, Zhaojie Luo, Longbiao Wang, Guanglai Gao, 
Text-to-Speech for Low-Resource Agglutinative Language With Morphology-Aware Language Model Pre-Training.
TASLP2024
Xiao Wei, Yuhang Li, Yuke Si, Longbiao Wang, Xiaobao Wang, Jianwu Dang 0001, 
A Prompt-Based Hierarchical Pipeline for Cross-Domain Slot Filling.
ICASSP2024
Chunyu Qiang, Hao Li, Yixin Tian, Ruibo Fu, Tao Wang 0074, Longbiao Wang, Jianwu Dang 0001, 
Learning Speech Representation from Contrastive Token-Acoustic Pretraining.
ICASSP2024
Chunyu Qiang, Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang 0001, 
High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models.
ICASSP2024
Linjuan Zhang, Kong Aik Lee, Lin Zhang, Longbiao Wang, Baoning Niu, 
CPAUG: Refining Copy-Paste Augmentation for Speech Anti-Spoofing.
TASLP2023
Yuqin Lin, Longbiao Wang, Yanbing Yang, Jianwu Dang 0001, 
CFDRN: A Cognition-Inspired Feature Decomposition and Recombination Network for Dysarthric Speech Recognition.
TASLP2023
Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, Helen Meng, 
Meta-Generalization for Domain-Invariant Speaker Verification.
ICASSP2023
Hui Chen, Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, 
Self-Supervised Audio-Visual Speaker Representation with Co-Meta Learning.
ICASSP2023
Yuhao Liu, Cheng Gong, Longbiao Wang, Xixin Wu, Qiuyu Liu, Jianwu Dang 0001, 
VF-Taco2: Towards Fast and Lightweight Synthesis for Autoregressive Models with Variation Autoencoder and Feature Distillation.
ICASSP2023
Xiaohui Liu, Meng Liu, Longbiao Wang, Kong Aik Lee, Hanyi Zhang, Jianwu Dang 0001, 
Leveraging Positional-Related Local-Global Dependency for Synthetic Speech Detection.
ICASSP2023
Meng Liu, Kong Aik Lee, Longbiao Wang, Hanyi Zhang, Chang Zeng, Jianwu Dang 0001, 
Cross-Modal Audio-Visual Co-Learning for Text-Independent Speaker Verification.
ICASSP2023
Haoyu Lu, Nan Li, Tongtong Song, Longbiao Wang, Jianwu Dang 0001, Xiaobao Wang, Shiliang Zhang, 
Speech and Noise Dual-Stream Spectrogram Refine Network With Speech Distortion Loss For Robust Speech Recognition.
ICASSP2023
Hao Shi, Masato Mimura, Longbiao Wang, Jianwu Dang 0001, Tatsuya Kawahara, 
Time-Domain Speech Enhancement Assisted by Multi-Resolution Frequency Encoder and Decoder.
ICASSP2023
Yao Sun, Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, 
Noise-Disentanglement Metric Learning for Robust Speaker Verification.
ICASSP2023
Yiwei Wei, Shaozu Yuan, Meng Chen 0006, Longbiao Wang, 
Enhancing Multimodal Alignment with Momentum Augmentation for Dense Video Captioning.
Interspeech2023
Yanjie Fu, Meng Ge, Honglong Wang, Nan Li, Haoran Yin, Longbiao Wang, Gaoyan Zhang, Jianwu Dang 0001, Chengyun Deng, Fei Wang, 
Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation.
Interspeech2023
Junjie Li, Meng Ge, Zexu Pan, Rui Cao, Longbiao Wang, Jianwu Dang 0001, Shiliang Zhang, 
Rethinking the Visual Cues in Audio-Visual Speaker Extraction.
SpeechComm2024
Yuqin Lin, Jianwu Dang 0001, Longbiao Wang, Sheng Li 0010, Chenchen Ding, 
Disordered speech recognition considering low resources and abnormal articulation.
SpeechComm2024
Nan Li, Longbiao Wang, Meng Ge, Masashi Unoki, Sheng Li 0010, Jianwu Dang 0001, 
Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network.
TASLP2024
Cheng Gong, Xin Wang 0037, Erica Cooper, Dan Wells, Longbiao Wang, Jianwu Dang 0001, Korin Richmond, Junichi Yamagishi, 
ZMM-TTS: Zero-Shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-Supervised Discrete Speech Representations.
TASLP2024
Xiao Wei, Yuhang Li, Yuke Si, Longbiao Wang, Xiaobao Wang, Jianwu Dang 0001, 
A Prompt-Based Hierarchical Pipeline for Cross-Domain Slot Filling.
ICASSP2024
Chunyu Qiang, Hao Li, Yixin Tian, Ruibo Fu, Tao Wang 0074, Longbiao Wang, Jianwu Dang 0001, 
Learning Speech Representation from Contrastive Token-Acoustic Pretraining.
ICASSP2024
Chunyu Qiang, Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang 0001, 
High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models.
TASLP2023
Yuqin Lin, Longbiao Wang, Yanbing Yang, Jianwu Dang 0001, 
CFDRN: A Cognition-Inspired Feature Decomposition and Recombination Network for Dysarthric Speech Recognition.
TASLP2023
Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, Helen Meng, 
Meta-Generalization for Domain-Invariant Speaker Verification.
ICASSP2023
Hui Chen, Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, 
Self-Supervised Audio-Visual Speaker Representation with Co-Meta Learning.
ICASSP2023
Zhongjie Li, Bin Zhao, Gaoyan Zhang, Jianwu Dang 0001, 
Brain Network Features Differentiate Intentions from Different Emotional Expressions of the Same Text.
ICASSP2023
Yuhao Liu, Cheng Gong, Longbiao Wang, Xixin Wu, Qiuyu Liu, Jianwu Dang 0001, 
VF-Taco2: Towards Fast and Lightweight Synthesis for Autoregressive Models with Variation Autoencoder and Feature Distillation.
ICASSP2023
Xiaohui Liu, Meng Liu, Longbiao Wang, Kong Aik Lee, Hanyi Zhang, Jianwu Dang 0001, 
Leveraging Positional-Related Local-Global Dependency for Synthetic Speech Detection.
ICASSP2023
Meng Liu, Kong Aik Lee, Longbiao Wang, Hanyi Zhang, Chang Zeng, Jianwu Dang 0001, 
Cross-Modal Audio-Visual Co-Learning for Text-Independent Speaker Verification.
ICASSP2023
Haoyu Lu, Nan Li, Tongtong Song, Longbiao Wang, Jianwu Dang 0001, Xiaobao Wang, Shiliang Zhang, 
Speech and Noise Dual-Stream Spectrogram Refine Network With Speech Distortion Loss For Robust Speech Recognition.
ICASSP2023
Hao Shi, Masato Mimura, Longbiao Wang, Jianwu Dang 0001, Tatsuya Kawahara, 
Time-Domain Speech Enhancement Assisted by Multi-Resolution Frequency Encoder and Decoder.
ICASSP2023
Yao Sun, Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, 
Noise-Disentanglement Metric Learning for Robust Speaker Verification.
Interspeech2023
Yanjie Fu, Meng Ge, Honglong Wang, Nan Li, Haoran Yin, Longbiao Wang, Gaoyan Zhang, Jianwu Dang 0001, Chengyun Deng, Fei Wang, 
Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation.
Interspeech2023
Junjie Li, Meng Ge, Zexu Pan, Rui Cao, Longbiao Wang, Jianwu Dang 0001, Shiliang Zhang, 
Rethinking the Visual Cues in Audio-Visual Speaker Extraction.
Interspeech2023
Yuhang Li, Xiao Wei, Yuke Si, Longbiao Wang, Xiaobao Wang, Jianwu Dang 0001, 
Improving Zero-shot Cross-domain Slot Filling via Transformer-based Slot Semantics Fusion.
Interspeech2023
Zhongjie Li, Gaoyan Zhang, Longbiao Wang, Jianwu Dang 0001, 
Discrimination of the Different Intents Carried by the Same Text Through Integrating Multimodal Information.
TASLP2024
Yuchen Hu, Chen Chen 0075, Qiushi Zhu, Eng Siong Chng, 
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR.
TASLP2024
Linhui Sun, Shuo Yuan, Aifei Gong, Lei Ye, Eng Siong Chng, 
Dual-Branch Modeling Based on State-Space Model for Speech Enhancement.
ICASSP2024
Weiguang Chen, Tran The Anh, Xionghu Zhong, Eng Siong Chng, 
Enhancing Low-Latency Speaker Diarization with Spatial Dictionary Learning.
ICASSP2024
Dianwen Ng, Chong Zhang 0003, Ruixi Zhang, Yukun Ma, Fabian Ritter Gutierrez, Trung Hieu Nguyen 0001, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma 0001, 
Are Soft Prompts Good Zero-Shot Learners for Speech Recognition?
ICASSP2024
Duc-Tuan Truong, Ruijie Tao, Jia Qi Yip, Kong Aik Lee, Eng Siong Chng, 
Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification.
ICASSP2024
Jia Qi Yip, Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang 0003, Hao Wang 0199, Trung Hieu Nguyen 0001, Kun Zhou 0003, Dianwen Ng, Eng Siong Chng, Bin Ma 0001, 
SPGM: Prioritizing Local Features for Enhanced Speech Separation Performance.
ICASSP2024
Zizheng Zhang, Chen Chen 0075, Hsin-Hung Chen, Xiang Liu, Yuchen Hu, Eng Siong Chng, 
Noise-Aware Speech Separation with Contrastive Learning.
ICASSP2024
Heqing Zou, Meng Shen 0002, Yuchen Hu, Chen Chen 0075, Eng Siong Chng, Deepu Rajan, 
Cross-Modality and Within-Modality Regularization for Audio-Visual Deepfake Detection.
ICLR2024
Chen Chen 0075, Ruizhe Li 0001, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Engsiong Chng, Chao-Han Huck Yang, 
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition.
ICLR2024
Yuchen Hu, Chen Chen 0075, Chao-Han Huck Yang, Ruizhe Li 0001, Chao Zhang 0031, Pin-Yu Chen, Engsiong Chng, 
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition.
ICASSP2023
Chen Chen 0075, Yuchen Hu, Weiwei Weng, Eng Siong Chng, 
Metric-Oriented Speech Enhancement Using Diffusion Probabilistic Model.
ICASSP2023
Chen Chen 0075, Yuchen Hu, Heqing Zou, Linhui Sun, Eng Siong Chng, 
Unsupervised Noise Adaptation Using Data Simulation.
ICASSP2023
Yuchen Hu, Chen Chen 0075, Ruizhe Li 0001, Qiushi Zhu, Eng Siong Chng, 
Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition.
ICASSP2023
Yuchen Hu, Chen Chen 0075, Heqing Zou, Xionghu Zhong, Eng Siong Chng, 
Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation.
ICASSP2023
Dianwen Ng, Ruixi Zhang, Jia Qi Yip, Zhao Yang, Jinjie Ni, Chong Zhang 0003, Yukun Ma, Chongjia Ni, Eng Siong Chng, Bin Ma 0001, 
De'hubert: Disentangling Noise in a Self-Supervised Model for Robust Speech Recognition.
ICASSP2023
Dianwen Ng, Ruixi Zhang, Jia Qi Yip, Chong Zhang 0003, Yukun Ma, Trung Hieu Nguyen 0001, Chongjia Ni, Eng Siong Chng, Bin Ma 0001, 
Contrastive Speech Mixup for Low-Resource Keyword Spotting.
ICASSP2023
Shangeth Rajaa, Kriti Anandan, Swaraj Dalmia, Tarun Gupta, Eng Siong Chng, 
Improving Spoken Language Identification with Map-Mix.
ICASSP2023
Alexey Sholokhov, Nikita Kuzmin, Kong Aik Lee, Eng Siong Chng, 
Probabilistic Back-ends for Online Speaker Recognition and Clustering.
ICASSP2023
Yuhang Yang, Haihua Xu, Hao Huang 0009, Eng Siong Chng, Sheng Li 0010, 
Speech-Text Based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition.
Interspeech2023
Chen Chen 0075, Chao-Han Huck Yang, Kai Li, Yuchen Hu, Pin-Jui Ku, Eng Siong Chng, 
A Neural State-Space Modeling Approach to Efficient Speech Separation.
TASLP2024
Xuenan Xu, Zeyu Xie, Mengyue Wu, Kai Yu 0004, 
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning.
ICASSP2024
Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen 0001, Kai Yu 0004, 
VoiceFlow: Efficient Text-To-Speech with Rectified Flow Matching.
ICASSP2024
Junjie Li, Yiwei Guo, Xie Chen 0001, Kai Yu 0004, 
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention.
ICASSP2024
Tao Liu, Chenpeng Du, Shuai Fan 0005, Feilong Chen, Kai Yu 0004, 
DiffDub: Person-Generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-Encoder.
ICASSP2024
Sen Liu, Yiwei Guo, Xie Chen 0001, Kai Yu 0004, 
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations.
ICASSP2024
Feiyu Shen, Yiwei Guo, Chenpeng Du, Xie Chen 0001, Kai Yu 0004, 
Acoustic BPE for Speech Generation with Discrete Tokens.
ICASSP2024
Zeyu Xie, Baihan Li, Xuenan Xu, Mengyue Wu, Kai Yu 0004, 
Enhancing Audio Generation Diversity with Visual Information.
ICASSP2024
Xuenan Xu, Xiaohang Xu 0004, Zeyu Xie, Pingyue Zhang, Mengyue Wu, Kai Yu 0004, 
A Detailed Audio-Text Data Simulation Pipeline Using Single-Event Sounds.
ICASSP2024
Hongshen Xu, Ruisheng Cao, Su Zhu, Sheng Jiang, Hanchong Zhang, Lu Chen 0002, Kai Yu 0004, 
A Birgat Model for Multi-Intent Spoken Language Understanding with Hierarchical Semantic Frames.
ICASSP2024
Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu 0004, Daniel Povey, Xie Chen 0001, 
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS.
AAAI2024
Chenpeng Du, Yiwei Guo, Feiyu Shen, Zhijun Liu, Zheng Liang, Xie Chen 0001, Shuai Wang 0016, Hui Zhang, Kai Yu 0004, 
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding.
TASLP2023
Chenpeng Du, Yiwei Guo, Xie Chen 0001, Kai Yu 0004, 
Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature.
TASLP2023
Wenbin Jiang, Kai Yu 0004, 
Speech Enhancement With Integration of Neural Homomorphic Synthesis and Spectral Masking.
ICASSP2023
Chenpeng Du, Yiwei Guo, Feiyu Shen, Kai Yu 0004, 
Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge.
ICASSP2023
Yiwei Guo, Chenpeng Du, Xie Chen 0001, Kai Yu 0004, 
Emodiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance.
ICASSP2023
Guangwei Li, Xuenan Xu, Lingfeng Dai, Mengyue Wu, Kai Yu 0004, 
Diverse and Vivid Sound Generation from Text Descriptions.
ICASSP2023
Tao Liu, Zhengyang Chen, Yanmin Qian, Kai Yu 0004, 
Multi-Speaker End-to-End Multi-Modal Speaker Diarization System for the MISP 2022 Challenge.
ICASSP2023
Zhijun Liu, Yiwei Guo, Kai Yu 0004, 
DiffVoice: Text-to-Speech with Latent Diffusion.
Interspeech2023
Wenbin Jiang, Fei Wen, Yifan Zhang, Kai Yu 0004, 
UnSE: Unsupervised Speech Enhancement Using Optimal Transport.
Interspeech2023
Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu 0004, Xie Chen 0001, 
Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation.
TASLP2024
Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda, 
Pretraining and Adaptation Techniques for Electrolaryngeal Speech Recognition.
TASLP2024
Rui Wang, Li Li 0063, Tomoki Toda, 
Dual-Channel Target Speaker Extraction Based on Conditional Variational Autoencoder and Directional Information.
ICASSP2024
Jiajun He, Xiaohan Shi, Xingfeng Li 0001, Tomoki Toda, 
MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction.
ICASSP2024
Tatsuya Komatsu, Yusuke Fujita, Kazuya Takeda, Tomoki Toda, 
Audio Difference Learning for Audio Captioning.
ICASSP2024
Yamato Ohtani, Takuma Okamoto, Tomoki Toda, Hisashi Kawai, 
FIRNet: Fundamental Frequency Controllable Fast Neural Vocoder With Trainable Finite Impulse Response Filter.
ICASSP2024
Takuma Okamoto, Yamato Ohtani, Tomoki Toda, Hisashi Kawai, 
ConvNeXt-TTS and ConvNeXt-VC: ConvNeXt-Based Fast End-to-End Sequence-to-Sequence Text-to-Speech and Voice Conversion.
ICASSP2024
Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda, 
Electrolaryngeal Speech Intelligibility Enhancement through Robust Linguistic Encoders.
TASLP2023
Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Hisashi Kawai, 
Harmonic-Net: Fundamental Frequency and Speech Rate Controllable Fast Neural Vocoder.
TASLP2023
Chao Xie, Tomoki Toda, 
Noisy-to-Noisy Voice Conversion Under Variations of Noisy Condition.
TASLP2023
Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda, 
High-Fidelity and Pitch-Controllable Neural Vocoder Based on Unified Source-Filter Networks.
ICASSP2023
Takuya Fujimura, Tomoki Toda, 
Analysis of Noisy-Target Training for DNN-Based Speech Enhancement.
ICASSP2023
Kazuhiro Kobayashi, Tomoki Hayashi, Tomoki Toda, 
Low-Latency Electrolaryngeal Speech Enhancement Based on FastSpeech2-Based Voice Conversion and Self-Supervised Speech Representation.
ICASSP2023
Atsushi Miyashita, Tomoki Toda, 
Representation of Vocal Tract Length Transformation Based on Group Theory.
ICASSP2023
Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda, 
Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition.
ICASSP2023
Ryuichi Yamamoto, Reo Yoneyama, Tomoki Toda, 
NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit.
ICASSP2023
Yusuke Yasuda, Tomoki Toda, 
Text-To-Speech Synthesis Based on Latent Variable Conversion Using Diffusion Probabilistic Model and Variational Autoencoder.
ICASSP2023
Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda, 
Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder.
Interspeech2023
Yeonjong Choi, Chao Xie, Tomoki Toda, 
Reverberation-Controllable Voice Conversion Using Reverberation Time Estimator.
Interspeech2023
Cheng-Hung Hu, Yusuke Yasuda, Tomoki Toda, 
Preference-based training framework for automatic speech quality assessment using deep neural network.
Interspeech2023
Takuma Okamoto, Tomoki Toda, Hisashi Kawai, 
E2E-S2S-VC: End-To-End Sequence-To-Sequence Voice Conversion.
TASLP2024
Cheng Gong, Xin Wang 0037, Erica Cooper, Dan Wells, Longbiao Wang, Jianwu Dang 0001, Korin Richmond, Junichi Yamagishi, 
ZMM-TTS: Zero-Shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-Supervised Discrete Speech Representations.
TASLP2024
Michele Panariello, Natalia A. Tomashenko, Xin Wang 0037, Xiaoxiao Miao, Pierre Champion, Hubert Nourtel, Massimiliano Todisco, Nicholas W. D. Evans, Emmanuel Vincent 0001, Junichi Yamagishi, 
The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation.
ICASSP2024
Xin Wang 0037, Junichi Yamagishi, 
Can Large-Scale Vocoded Spoofed Data Improve Speech Spoofing Countermeasure with a Self-Supervised Front End?
ICASSP2024
Wanying Ge, Xin Wang 0037, Junichi Yamagishi, Massimiliano Todisco, Nicholas W. D. Evans, 
Spoofing Attack Augmentation: Can Differently-Trained Attack Models Improve Generalisation?
ICASSP2024
Xiaoxiao Miao, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Nicholas W. D. Evans, Massimiliano Todisco, Jean-François Bonastre, Mickael Rouvier, 
Synvox2: Towards A Privacy-Friendly Voxceleb2 Dataset.
TASLP2023
Xuechen Liu, Xin Wang 0037, Md. Sahidullah, Jose Patino 0001, Héctor Delgado, Tomi Kinnunen, Massimiliano Todisco, Junichi Yamagishi, Nicholas W. D. Evans, Andreas Nautsch, Kong Aik Lee, 
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild.
TASLP2023
Xiaoxiao Miao, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Natalia A. Tomashenko, 
Speaker Anonymization Using Orthogonal Householder Neural Network.
TASLP2023
Lin Zhang, Xin Wang 0037, Erica Cooper, Nicholas W. D. Evans, Junichi Yamagishi, 
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance.
ICASSP2023
Haoyu Li, Yun Liu, Junichi Yamagishi, 
Joint Noise Reduction and Listening Enhancement for Full-End Speech Enhancement.
ICASSP2023
Paul-Gauthier Noé, Xiaoxiao Miao, Xin Wang 0037, Junichi Yamagishi, Jean-François Bonastre, Driss Matrouf, 
Hiding Speaker's Sex in Speech Using Zero-Evidence Speaker Representation in an Analysis/Synthesis Pipeline.
ICASSP2023
Xuan Shi, Erica Cooper, Xin Wang 0037, Junichi Yamagishi, Shrikanth Narayanan, 
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?
ICASSP2023
Xin Wang 0037, Junichi Yamagishi, 
Spoofed Training Data for Speech Spoofing Countermeasure Can Be Efficiently Created Using Neural Vocoders.
Interspeech2023
Erica Cooper, Junichi Yamagishi, 
Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech.
Interspeech2023
Hieu-Thi Luong, Junichi Yamagishi, 
Controlling Multi-Class Human Vocalization Generation via a Simple Segment-based Labeling Scheme.
Interspeech2023
Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang 0037, Xuechen Liu, Md. Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas W. D. Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung, 
Towards Single Integrated Spoofing-aware Speaker Verification Embeddings.
Interspeech2023
Chang Zeng, Xin Wang 0037, Xiaoxiao Miao, Erica Cooper, Junichi Yamagishi, 
Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms.
Interspeech2023
Lin Zhang, Xin Wang 0037, Erica Cooper, Nicholas W. D. Evans, Junichi Yamagishi, 
Range-Based Equal Error Rate for Spoof Localization.
TASLP2022
Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi, 
Optimizing Tandem Speaker Verification and Anti-Spoofing Systems.
TASLP2022
Xuan Shi, Erica Cooper, Junichi Yamagishi, 
Use of Speaker Recognition Approaches for Learning and Evaluating Embedding Representations of Musical Instrument Sounds.
TASLP2022
Brij Mohan Lal Srivastava, Mohamed Maouche, Md. Sahidullah, Emmanuel Vincent 0001, Aurélien Bellet, Marc Tommasi, Natalia A. Tomashenko, Xin Wang 0037, Junichi Yamagishi, 
Privacy and Utility of X-Vector Based Speaker Anonymization.
TASLP2024
Tsubasa Ochiai, Kazuma Iwamoto, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, 
Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance.
ICASSP2024
Takanori Ashihara, Marc Delcroix, Takafumi Moriya, Kohei Matsuura, Taichi Asami, Yusuke Ijima, 
What Do Self-Supervised Speech and Speaker Models Learn? New Findings from a Cross Model Layer-Wise Analysis.
ICASSP2024
William Chen, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe 0001, 
Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing.
ICASSP2024
Kenichi Fujita, Hiroshi Sato, Takanori Ashihara, Hiroki Kanagawa, Marc Delcroix, Takafumi Moriya, Yusuke Ijima, 
Noise-Robust Zero-Shot Text-to-Speech Synthesis Conditioned on Self-Supervised Speech-Representation Model with Adapters.
ICASSP2024
Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, 
How Does End-To-End Speech Recognition Training Impact Speech Enhancement Artifacts?
ICASSP2024
Dominik Klement, Mireia Díez, Federico Landini, Lukás Burget, Anna Silnova, Marc Delcroix, Naohiro Tawara, 
Discriminative Training of VBx Diarization.
ICASSP2024
Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Shoko Araki, Jan Cernocký, 
Target Speech Extraction with Pre-Trained Self-Supervised Learning Models.
TASLP2023
Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Yasunori Ohishi, Shoko Araki, 
SoundBeam: Target Sound Extraction Conditioned on Sound-Class Labels and Enrollment Clues for Increased Performance and Continuous Learning.
TASLP2023
Thilo von Neumann, Keisuke Kinoshita, Christoph Böddeker, Marc Delcroix, Reinhold Haeb-Umbach, 
Segment-Less Continuous Speech Separation of Meetings: Training and Evaluation Criteria.
ICASSP2023
Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Roshan S. Sharma, Kohei Matsuura, Shinji Watanabe 0001, 
Speech Summarization of Long Spoken Document: Improving Memory Efficiency of Speech/Text Encoders.
ICASSP2023
Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix, Ryo Masumura, 
Leveraging Large Text Corpora For End-To-End Speech Summarization.
ICASSP2023
Thilo von Neumann, Christoph Böddeker, Keisuke Kinoshita, Marc Delcroix, Reinhold Haeb-Umbach, 
On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech Recognition Systems.
ICASSP2023
Atsunori Ogawa, Takafumi Moriya, Naoyuki Kamo, Naohiro Tawara, Marc Delcroix, 
Iterative Shallow Fusion of Backward Language Model for End-To-End Speech Recognition.
Interspeech2023
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix, Yukinori Honma, 
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Interspeech2023
Marc Delcroix, Naohiro Tawara, Mireia Díez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukás Burget, Shoko Araki, 
Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization.
Interspeech2023
Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani, 
Target Speech Extraction with Conditional Diffusion Model.
Interspeech2023
Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, 
Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization.
Interspeech2023
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami, 
Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data.
Interspeech2023
Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo, 
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss.
ICASSP2022
Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe 0001, 
Integrating Multiple ASR Systems into NLP Backend with Attention Fusion.
ICASSP2024
Chowdam Venkata Thirumala Kumar, Tanuka Bhattacharjee, Seena Vengalil, Saraswati Nashi, Madassu Keerthipriya, Yamini Belur, Atchayaram Nalini, Prasanta Kumar Ghosh, 
Spectral Analysis of Vowels and Fricatives at Varied Levels of Dysarthria Severity for Amyotrophic Lateral Sclerosis.
ICASSP2023
Tanuka Bhattacharjee, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Prasanta Kumar Ghosh, 
Exploring the Role of Fricatives in Classifying Healthy Subjects and Patients with Amyotrophic Lateral Sclerosis and Parkinson's Disease.
ICASSP2023
Tanuka Bhattacharjee, Chowdam Venkata Thirumala Kumar, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Prasanta Kumar Ghosh, 
Static and Dynamic Source and Filter Cues for Classification of Amyotrophic Lateral Sclerosis Patients and Healthy Subjects.
ICASSP2023
Abhayjeet Singh, Amala Nagireddi, Deekshitha G, Jesuraja Bandekar, Roopa R., Sandhya Badiger, Sathvik Udupa, Prasanta Kumar Ghosh, Hema A. Murthy, Heiga Zen, Pranaw Kumar, Kamal Kant, Amol Bole, Bira Chandra Singh, Keiichi Tokuda, Mark Hasegawa-Johnson, Philipp Olbrich, 
Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech.
ICASSP2023
Sathvik Udupa, Prasanta Kumar Ghosh, 
Real-Time MRI Video Synthesis from Time Aligned Phonemes with Sequence-to-Sequence Networks.
ICASSP2023
Sathvik Udupa, C. Siddarth, Prasanta Kumar Ghosh, 
Improved Acoustic-to-Articulatory Inversion Using Representations from Pretrained Self-Supervised Learning Models.
Interspeech2023
Jesuraja Bandekar, Sathvik Udupa, Prasanta Kumar Ghosh, 
Exploring a classification approach using quantised articulatory movements for acoustic to articulatory inversion.
Interspeech2023
Varun Belagali, M. V. Achuth Rao, Prasanta Kumar Ghosh, 
Weakly supervised glottis segmentation in high-speed videoendoscopy using bounding box labels.
Interspeech2023
Tanuka Bhattacharjee, Anjali Jayakumar, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Prasanta Kumar Ghosh, 
Transfer Learning to Aid Dysarthria Severity Classification for Patients with Amyotrophic Lateral Sclerosis.
Interspeech2023
Siddarth Chandrasekar, Arvind Ramesh, Tilak Purohit, Prasanta Kumar Ghosh, 
A Study on the Importance of Formant Transitions for Stop-Consonant Classification in VCV Sequence.
Interspeech2023
Shelly Jain, Priyanshi Pal, Anil Kumar Vuppala, Prasanta Kumar Ghosh, Chiranjeevi Yarra, 
An Investigation of Indian Native Language Phonemic Influences on L2 English Pronunciations.
Interspeech2023
Chowdam Venkata Thirumala Kumar, Tanuka Bhattacharjee, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Prasanta Kumar Ghosh, 
Classification of Multi-class Vowels and Fricatives From Patients Having Amyotrophic Lateral Sclerosis with Varied Levels of Dysarthria Severity.
Interspeech2023
Mohammad Shaique Solanki, Ashutosh Bharadwaj, Jeevan Kylash, Prasanta Kumar Ghosh, 
Do Vocal Breath Sounds Encode Gender Cues for Automatic Gender Classification?
SpeechComm2022
Chiranjeevi Yarra, Prasanta Kumar Ghosh, 
Automatic syllable stress detection under non-parallel label and data condition.
ICASSP2022
Aravind Illa, Aanish Nair, Prasanta Kumar Ghosh, 
The impact of cross language on acoustic-to-articulatory inversion and its influence on articulatory speech synthesis.
ICASSP2022
Abinay Reddy Naini, Bhavuk Singhal, Prasanta Kumar Ghosh, 
Dual Attention Pooling Network for Recording Device Classification Using Neutral and Whispered Speech.
ICASSP2022
Anwesha Roy, Varun Belagali, Prasanta Kumar Ghosh, 
An Error Correction Scheme for Improved Air-Tissue Boundary in Real-Time MRI Video for Speech Production.
Interspeech2022
Anish Bhanushali, Grant Bridgman, Deekshitha G, Prasanta Kumar Ghosh, Pratik Kumar, Saurabh Kumar, Adithya Raj Kolladath, Nithya Ravi, Aaditeshwar Seth, Ashish Seth, Abhayjeet Singh, Vrunda N. Sukhadia, Srinivasan Umesh, Sathvik Udupa, Lodagala V. S. V. Durga Prasad, 
Gram Vaani ASR Challenge on spontaneous telephone speech recordings in regional variations of Hindi.
Interspeech2022
Anwesha Roy, Varun Belagali, Prasanta Kumar Ghosh, 
Air tissue boundary segmentation using regional loss in real-time Magnetic Resonance Imaging video for speech production.
Interspeech2022
C. Siddarth, Sathvik Udupa, Prasanta Kumar Ghosh, 
Watch Me Speak: 2D Visualization of Human Mouth during Speech.
TASLP2024
Hang Chen, Qing Wang 0008, Jun Du, Bao-Cai Yin, Jia Pan, Chin-Hui Lee 0001, 
Optimizing Audio-Visual Speech Enhancement Using Multi-Level Distortion Measures for Audio-Visual Speech Recognition.
TASLP2024
Zilu Guo, Qing Wang 0008, Jun Du, Jia Pan, Qing-Feng Liu, Chin-Hui Lee 0001, 
A Variance-Preserving Interpolation Approach for Diffusion Models With Applications to Single Channel Speech Enhancement and Recognition.
ICASSP2024
Feng Ma, Yanhui Tu, Maokui He, Ruoyu Wang 0029, Shutong Niu, Lei Sun 0010, Zhongfu Ye, Jun Du, Jia Pan, Chin-Hui Lee 0001, 
A Spatial Long-Term Iterative Mask Estimation Approach for Multi-Channel Speaker Diarization and Speech Recognition.
ICASSP2024
Haotian Wang, Jun Du, Yusheng Dai, Chin-Hui Lee 0001, Yuling Ren, Yu Liu, 
Improving Multi-Modal Emotion Recognition Using Entropy-Based Fusion and Pruning-Based Network Architecture Optimization.
ICASSP2024
Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang 0029, Hongbo Lan, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao, 
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
ICASSP2024
Gaobin Yang, Maokui He, Shutong Niu, Ruoyu Wang 0029, Yanyan Yue, Shuangqing Qian, Shilong Wu, Jun Du, Chin-Hui Lee 0001, 
Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture.
ICASSP2024
Hao Yen, Sabato Marco Siniscalchi, Chin-Hui Lee 0001, 
Boosting End-to-End Multilingual Phoneme Recognition Through Exploiting Universal Speech Attributes Constraints.
SpeechComm2023
Shi Cheng, Jun Du, Shutong Niu, Alejandrina Cristià, Xin Wang 0037, Qing Wang 0008, Chin-Hui Lee 0001, 
Using iterative adaptation and dynamic mask for child speech extraction under real-world multilingual conditions.
SpeechComm2023
Li Chai 0002, Hang Chen, Jun Du, Qing-Feng Liu, Chin-Hui Lee 0001, 
Space-and-speaker-aware acoustic modeling with effective data augmentation for recognition of multi-array conversational speech.
TASLP2023
Mao-Kui He, Jun Du, Qing-Feng Liu, Chin-Hui Lee 0001, 
ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding.
TASLP2023
Shutong Niu, Jun Du, Lei Sun 0010, Yu Hu 0003, Chin-Hui Lee 0001, 
QDM-SSD: Quality-Aware Dynamic Masking for Separation-Based Speaker Diarization.
ICASSP2023
Hang Chen, Shilong Wu, Yusheng Dai, Zhe Wang, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge.
ICASSP2023
Shutong Niu, Jun Du, Qing Wang 0008, Li Chai 0002, Huaxin Wu, Zhaoxu Nian, Lei Sun 0010, Yi Fang, Jia Pan, Chin-Hui Lee 0001, 
An Experimental Study on Sound Event Localization and Detection Under Realistic Testing Conditions.
ICASSP2023
Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition.
ICASSP2023
Chao-Han Huck Yang, Bo Li 0028, Yu Zhang 0033, Nanxin Chen, Tara N. Sainath, Sabato Marco Siniscalchi, Chin-Hui Lee 0001, 
A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition.
ICASSP2023
Chenyue Zhang, Hang Chen, Jun Du, Bao-Cai Yin, Jia Pan, Chin-Hui Lee 0001, 
Incorporating Visual Information Reconstruction into Progressive Learning for Optimizing Audio-Visual Speech Enhancement.
Interspeech2023
Zilu Guo, Jun Du, Chin-Hui Lee 0001, Yu Gao, Wenbin Zhang, 
Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement.
Interspeech2023
Pin-Jui Ku, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee 0001, 
A Multi-dimensional Deep Structured State Space Approach to Speech Enhancement Using Small-footprint Models.
Interspeech2023
Shutong Niu, Jun Du, Maokui He, Chin-Hui Lee 0001, Baoxiang Li, Jiakui Li, 
Unsupervised Adaptation with Quality-Aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization.
Interspeech2023
Haotian Wang, Jun Du, Hengshun Zhou, Chin-Hui Lee 0001, Yuling Ren, Jiangjiang Zhao, 
A Multiple-Teacher Pruning Based Self-Distillation (MT-PSD) Approach to Model Compression for Audio-Visual Wake Word Spotting.
TASLP2024
Vinay Kothapally, John H. L. Hansen, 
Monaural Speech Dereverberation Using Deformable Convolutional Networks.
TASLP2024
Nursadul Mamun, John H. L. Hansen, 
Speech Enhancement for Cochlear Implant Recipients Using Deep Complex Convolution Transformer With Frequency Transformation.
ICASSP2024
John H. L. Hansen, Aditya Joglekar, Meena M. Chandra Shekar, Szu-Jui Chen, Xi Liu, 
Fearless Steps Apollo: Team Communications Based Community Resource Development for Science, Technology, Education, and Historical Preservation.
ICASSP2024
Taylor Lawson, John H. L. Hansen, 
Situational Signal Processing with Ecological Momentary Assessment: Leveraging Environmental Context for Cochlear Implant Users.
ICASSP2024
Xi Liu, Szu-Jui Chen, John H. L. Hansen, 
Dual-Path Minimum-Phase and All-Pass Decomposition Network for Single Channel Speech Dereverberation.
ICASSP2024
Mufan Sang, John H. L. Hansen, 
Efficient Adapter Tuning of Pre-Trained Speech Models for Automatic Speaker Verification.
ICASSP2024
Meena M. Chandra Shekar, John H. L. Hansen, 
Apollo's Unheard Voices: Graph Attention Networks for Speaker Diarization and Clustering for Fearless Steps Apollo Collection.
SpeechComm2023
Midia Yousefi, John H. L. Hansen, 
Single-channel speech separation using soft-minimum permutation invariant training.
TASLP2023
Shahram Ghorbani, John H. L. Hansen, 
Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech.
TASLP2023
Wei Xia, John H. L. Hansen, 
Attention and DCT Based Global Context Modeling for Text-Independent Speaker Recognition.
ICASSP2023
Iván López-Espejo, Ram C. M. C. Shekar, Zheng-Hua Tan, Jesper Jensen 0001, John H. L. Hansen, 
Filterbank Learning for Noise-Robust Small-Footprint Keyword Spotting.
ICASSP2023
Mufan Sang, Yong Zhao 0008, Gang Liu 0001, John H. L. Hansen, Jian Wu 0027, 
Improving Transformer-Based Networks with Locality for Automatic Speaker Verification.
Interspeech2023
Nursadul Mamun, John H. L. Hansen, 
CFTNet: Complex-valued Frequency Transformation Network for Speech Enhancement.
Interspeech2023
Meena M. Chandra Shekar, John H. L. Hansen, 
Speaker Tracking using Graph Attention Networks with Varying Duration Utterances across Multi-Channel Naturalistic Data: Fearless Steps Apollo-11 Audio Corpus.
Interspeech2023
Ram C. M. C. Shekar, Mu Yang, Kevin Hirschi, Stephen D. Looney, Okim Kang, John H. L. Hansen, 
Assessment of Non-Native Speech Intelligibility using Wav2vec2-based Mispronunciation Detection and Multi-level Goodness of Pronunciation Transformer.
Interspeech2023
Jiamin Xie, John H. L. Hansen, 
MixRep: Hidden Representation Mixup for Low-Resource Speech Recognition.
Interspeech2023
Mu Yang, Ram C. M. C. Shekar, Okim Kang, John H. L. Hansen, 
What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model.
SpeechComm2022
Rasa Lileikyte, Dwight Irvin, John H. L. Hansen, 
Assessing child communication engagement and statistical speech patterns for American English via speech recognition in naturalistic active learning spaces.
TASLP2022
Vinay Kothapally, John H. L. Hansen, 
SkipConvGAN: Monaural Speech Dereverberation Using Generative Adversarial Networks via Complex Time-Frequency Masking.
TASLP2022
Zhenyu Wang, John H. L. Hansen, 
Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition.
TASLP2024
Syu-Siang Wang, Jia-Yang Chen, Bo-Ren Bai, Shih-Hau Fang, Yu Tsao 0001, 
Unsupervised Face-Masked Speech Enhancement Using Generative Adversarial Networks With Human-in-the-Loop Assessment Metrics.
ICASSP2024
Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-Based ASR.
ICASSP2024
Yuan Tseng, Layne Berry, Yiting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Poyao Huang 0001, Chun-Mao Lai, Shang-Wen Li 0001, David Harwath, Yu Tsao 0001, Abdelrahman Mohamed, Chi-Luen Feng, Hung-Yi Lee, 
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.
ICASSP2024
Haibin Wu, Heng-Cheng Kuo, Yu Tsao 0001, Hung-Yi Lee, 
Scalable Ensemble-Based Detection Method Against Adversarial Attacks For Speaker Verification.
ICASSP2024
Ryandhimas E. Zezario, Bo-Ren Brian Bai, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao 0001, 
Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model.
ICLR2024
Szu-Wei Fu, Kuo-Hsuan Hung, Yu Tsao 0001, Yu-Chiang Frank Wang, 
Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech.
TASLP2023
Yen-Ju Lu, Chia-Yu Chang, Cheng Yu, Ching-Feng Liu, Jeih-weih Hung, Shinji Watanabe 0001, Yu Tsao 0001, 
Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information.
TASLP2023
Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen 0011, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao 0001, 
Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features.
ICASSP2023
Chan-Jan Hsu, Ho-Lam Chung, Hung-Yi Lee, Yu Tsao 0001, 
T5lephone: Bridging Speech and Text Self-Supervised Models for Spoken Language Understanding Via Phoneme Level T5.
ICASSP2023
Hsin-Yi Lin, Huan-Hsin Tseng, Yu Tsao 0001, 
On the Robustness of Non-Intrusive Speech Quality Model by Adversarial Examples.
Interspeech2023
Hsin-Hao Chen 0006, Yung-Lun Chien, Ming-Chi Yen, Shu-Wei Tsai, Tai-Shih Chi, Hsin-Min Wang, Yu Tsao 0001, 
Mandarin Electrolaryngeal Speech Voice Conversion using Cross-domain Features.
Interspeech2023
Li-Wei Chen, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao 0001, Hsin-Min Wang, 
A Training and Inference Strategy Using Noisy and Enhanced Speech as Target for Speech Enhancement without Clean Speech.
Interspeech2023
Yung-Lun Chien, Hsin-Hao Chen 0006, Ming-Chi Yen, Shu-Wei Tsai, Hsin-Min Wang, Yu Tsao 0001, Tai-Shih Chi, 
Audio-Visual Mandarin Electrolaryngeal Speech Voice Conversion.
Interspeech2023
Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao 0001, 
Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition.
ICLR2023
Chi-Chang Lee, Yu Tsao 0001, Hsin-Min Wang, Chu-Song Chen, 
D4AM: A General Denoising Framework for Downstream Acoustic Models.
TASLP2022
Shang-Yi Chuang, Hsin-Min Wang, Yu Tsao 0001, 
Improved Lite Audio-Visual Speech Enhancement.
TASLP2022
Yu-Chen Lin, Cheng Yu, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao 0001, Tei-Wei Kuo, 
SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points.
ICASSP2022
Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao 0001, 
MetricGAN-U: Unsupervised Speech Enhancement/ Dereverberation Based Only on Noisy/ Reverberated Speech.
ICASSP2022
Yu-Chen Lin, Tsun-An Hsieh, Kuo-Hsuan Hung, Cheng Yu, Harinath Garudadri, Yu Tsao 0001, Tei-Wei Kuo, 
Speech Recovery For Real-World Self-Powered Intermittent Devices.
ICASSP2022
Guan-Ting Lin, Chan-Jan Hsu, Da-Rong Liu, Hung-Yi Lee, Yu Tsao 0001, 
Analyzing The Robustness of Unsupervised Speech Recognition.
TASLP2024
Chenfeng Miao, Qingying Zhu, Minchuan Chen, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
EfficientTTS 2: Variational End-to-End Text-to-Speech Synthesis and Voice Conversion.
ICASSP2024
Yimin Deng, Huaizhen Tang, Xulong Zhang 0001, Ning Cheng 0001, Jing Xiao 0006, Jianzong Wang, 
Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval.
ICASSP2024
Haobin Tang, Xulong Zhang 0001, Ning Cheng 0001, Jing Xiao 0006, Jianzong Wang, 
ED-TTS: Multi-Scale Emotion Modeling Using Cross-Domain Emotion Diarization for Emotional Speech Synthesis.
ICASSP2024
Zeyu Yang, Minchuan Chen, Yanping Li, Wei Hu, Shaojun Wang, Jing Xiao 0006, Zijian Li, 
ESVC: Combining Adaptive Style Fusion and Multi-Level Feature Disentanglement for Expressive Singing Voice Conversion.
ICASSP2024
Yong Zhang, Hanzhang Li, Zhitao Li, Ning Cheng 0001, Ming Li, Jing Xiao 0006, Jianzong Wang, 
Leveraging Biases in Large Language Models: "bias-kNN" for Effective Few-Shot Learning.
ICASSP2024
Ziyang Zhuang, Kun Zou, Chenfeng Miao, Ming Fang, Tao Wei, Zijian Li, Wei Hu, Shaojun Wang, Jing Xiao 0006, 
Improving Attention-Based End-to-End Speech Recognition by Monotonic Alignment Attention Matrix Reconstruction.
ICML2024
Chenfeng Miao, Qingying Zhu, Minchuan Chen, Wei Hu, Zijian Li, Shaojun Wang, Jing Xiao 0006, 
DFlow: A Generative Model Combining Denoising AutoEncoder and Normalizing Flow for High Fidelity Waveform Generation.
ICASSP2023
Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Xiaoyang Qu, Jing Xiao 0006, 
Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General Sound Classification.
ICASSP2023
Ganghui Ru, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Improving Music Genre Classification from multi-modal Properties of Music and Genre Correlations Perspective.
ICASSP2023
Huaizhen Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Learning Speech Representations with Flexible Hidden Feature Dimensions.
ICASSP2023
Huaizhen Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
VQ-CL: Learning Disentangled Speech Representations with Contrastive Learning and Vector Quantization.
ICASSP2023
Haobin Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis.
ICASSP2023
Xulong Zhang 0001, Haobin Tang, Jianzong Wang, Ning Cheng 0001, Jian Luo, Jing Xiao 0006, 
Dynamic Alignment Mask CTC: Improved Mask CTC With Aligned Cross Entropy.
ICASSP2023
Kexin Zhu, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Improving EEG-based Emotion Recognition by Fusing Time-Frequency and Spatial Representations.
Interspeech2023
Minchuan Chen, Chenfeng Miao, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
Exploring multi-task learning and data augmentation in dementia detection with self-supervised pretrained models.
Interspeech2023
Jiaxin Fan, Yong Zhang, Hanzhang Li, Jianzong Wang, Zhitao Li, Sheng Ouyang, Ning Cheng 0001, Jing Xiao 0006, 
Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism.
Interspeech2023
Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao 0006, 
SVVAD: Personal Voice Activity Detection for Speaker Verification.
Interspeech2023
Yifu Sun, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Kaiyu Hu, Jing Xiao 0006, 
Investigation of Music Emotion Recognition Based on Segmented Semi-Supervised Learning.
Interspeech2023
Fengyun Tan, Chaofeng Feng, Tao Wei, Shuai Gong, Jinqiang Leng, Wei Chu, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
Improving End-to-End Modeling For Mandarin-English Code-Switching Using Lightweight Switch-Routing Mixture-of-Experts.
Interspeech2023
Haobin Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis.
ICASSP2024
Tiantian Feng, Rajat Hebbar, Shrikanth Narayanan, 
TRUST-SER: On The Trustworthiness Of Fine-Tuning Pre-Trained Speech Embeddings For Speech Emotion Recognition.
ICASSP2024
Tiantian Feng, Shrikanth Narayanan, 
Foundation Model Assisted Automatic Speech Emotion Recognition: Transcribing, Annotating, and Augmenting.
ICASSP2024
Yoonsoo Nam, Adam Lehavi, Daniel Yang, Digbalay Bose, Swabha Swayamdipta, Shrikanth Narayanan, 
Does Video Summarization Require Videos? Quantifying the Effectiveness of Language in Video Summarization.
ICASSP2024
Shanti Stewart, Kleanthis Avramidis, Tiantian Feng, Shrikanth Narayanan, 
Emotion-Aligned Contrastive Learning Between Images and Music.
ICASSP2024
Anfeng Xu, Kevin Huang, Tiantian Feng, Helen Tager-Flusberg, Shrikanth Narayanan, 
Audio-Visual Child-Adult Speaker Classification in Dyadic Interactions.
ICASSP2023
Nikolaos Antoniou, Athanasios Katsamanis, Theodoros Giannakopoulos, Shrikanth Narayanan, 
Designing and Evaluating Speech Emotion Recognition Systems: A Reality Check Case Study with IEMOCAP.
ICASSP2023
Digbalay Bose, Rajat Hebbar, Krishna Somandepalli, Shrikanth Narayanan, 
Contextually-Rich Human Affect Perception Using Multimodal Scene Information.
ICASSP2023
Georgios Chochlakis, Gireesh Mahajan, Sabyasachee Baruah, Keith Burghardt, Kristina Lerman, Shrikanth Narayanan, 
Using Emotion Embeddings to Transfer Knowledge between Emotions, Languages, and Annotation Formats.
ICASSP2023
Rajat Hebbar, Digbalay Bose, Krishna Somandepalli, Veena Vijai, Shrikanth Narayanan, 
A Dataset for Audio-Visual Sound Event Detection in Movies.
ICASSP2023
Xuan Shi, Erica Cooper, Xin Wang 0037, Junichi Yamagishi, Shrikanth Narayanan, 
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural Midi-to-Audio Synthesis Systems?
ICASSP2023
Tuo Zhang, Tiantian Feng, Samiul Alam, Sunwoo Lee, Mi Zhang 0002, Shrikanth S. Narayanan, Salman Avestimehr, 
FedAudio: A Federated Learning Benchmark for Audio Tasks.
Interspeech2023
Reed Blaylock, Shrikanth Narayanan, 
Beatboxing Kick Drum Kinematics.
Interspeech2023
Rimita Lahiri, Tiantian Feng, Rajat Hebbar, Catherine Lord, So Hyun Kim, Shrikanth Narayanan, 
Robust Self Supervised Speech Embeddings for Child-Adult Classification in Interactions involving Children with Autism.
Interspeech2023
Thomas Melistas, Lefteris Kapelonis, Nikolaos Antoniou, Petros Mitseas, Dimitris Sgouropoulos, Theodoros Giannakopoulos, Athanasios Katsamanis, Shrikanth Narayanan, 
Cross-Lingual Features for Alzheimer's Dementia Detection from Speech.
Interspeech2023
Shrikanth Narayanan, 
Bridging Speech Science and Technology - Now and Into the Future.
Interspeech2023
Anfeng Xu, Rajat Hebbar, Rimita Lahiri, Tiantian Feng, Lindsay Butler, Lue Shen, Helen Tager-Flusberg, Shrikanth Narayanan, 
Understanding Spoken Language Development of Children with ASD Using Pre-trained Speech Embeddings.
ICASSP2022
Tiantian Feng, Hanieh Hashemi, Murali Annavaram, Shrikanth S. Narayanan, 
Enhancing Privacy Through Domain Adaptive Noise Injection For Speech Emotion Recognition.
Interspeech2022
Tiantian Feng, Shrikanth Narayanan, 
Semi-FedSER: Semi-supervised Learning for Speech Emotion Recognition On Federated Learning using Multiview Pseudo-Labeling.
Interspeech2022
Tiantian Feng, Raghuveer Peri, Shrikanth Narayanan, 
User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition on Federated Learning.
Interspeech2022
Nikolaos Flemotomos, Shrikanth Narayanan, 
Multimodal Clustering with Role Induced Constraints for Speaker Diarization.