A list of researchers in the area of speech ordered by the number of relevant publications, for the purpose of identifying potential academic supervisors.
Report exported at 2024-10-16 04:11:58.
Export parameters: --year_start 2019 --year_end 2024 --year_shift 1 --author_start_year 1900 --exclude_venue SSW,ASRU,IWSLT,SLT --n_pubs 20 --rank_start 0 --rank_end 200 --output speech_rankings.html
TASLP2024
Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe 0001, 
End-to-End Speech Recognition: A Survey.
TASLP2024
Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe 0001, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis.
TASLP2024
Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li 0001, Abdelrahman Mohamed, Shinji Watanabe 0001, Hung-yi Lee, 
A Large-Scale Evaluation of Speech Foundation Models.
ICASSP2024
Siddhant Arora, George Saon, Shinji Watanabe 0001, Brian Kingsbury, 
Semi-Autoregressive Streaming ASR with Label Context.
ICASSP2024
Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-Weon Jung, Yichen Lu, Soumi Maiti, Roshan S. Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe 0001, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang, 
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.
ICASSP2024
William Chen, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe 0001, 
Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing.
ICASSP2024
Kwanghee Choi, Jee-Weon Jung, Shinji Watanabe 0001, 
Understanding Probe Behaviors Through Variational Bounds of Mutual Information.
ICASSP2024
Samuele Cornell, Jee-Weon Jung, Shinji Watanabe 0001, Stefano Squartini, 
One Model to Rule Them All ? Towards End-to-End Joint Speaker Diarization and Speech Recognition.
ICASSP2024
Ruizhe Huang, Xiaohui Zhang 0007, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe 0001, Daniel Povey, Sanjeev Khudanpur, 
Less Peaky and More Accurate CTC Forced Alignment by Label Priors.
ICASSP2024
Chien-Yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan S. Sharma, Shinji Watanabe 0001, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee, 
Dynamic-Superb: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark For Speech.
ICASSP2024
Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe 0001, Sanjeev Khudanpur, 
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization.
ICASSP2024
Amir Hussein, Dorsa Zeinali, Ondrej Klejch, Matthew Wiesner, Brian Yan, Shammur Absar Chowdhury, Ahmed Ali 0002, Shinji Watanabe 0001, Sanjeev Khudanpur, 
Speech Collage: Code-Switched Audio Generation by Collaging Monolingual Corpora.
ICASSP2024
Jee-Weon Jung, Roshan S. Sharma, William Chen, Bhiksha Raj, Shinji Watanabe 0001, 
AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models.
ICASSP2024
Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe 0001, Yong Man Ro, 
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens.
ICASSP2024
Doyeop Kwak, Jaemin Jung, Kihyun Nam, Youngjoon Jang, Jee-Weon Jung, Shinji Watanabe 0001, Joon Son Chung, 
VoxMM: Rich Transcription of Conversations in the Wild.
ICASSP2024
Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhongqiu Wang, Shinji Watanabe 0001, 
Boosting Unknown-Number Speaker Separation with Transformer Decoder-Based Attractor.
ICASSP2024
Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, Shinji Watanabe 0001, 
Hubertopic: Enhancing Semantic Representation of Hubert Through Self-Supervision Utilizing Topic Model.
ICASSP2024
Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-Weon Jung, Xuankai Chang, Shinji Watanabe 0001, 
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks.
ICASSP2024
Salvador Medina, Sarah L. Taylor, Carsten Stoll, Gareth Edwards, Alex Hauptmann 0001, Shinji Watanabe 0001, Iain A. Matthews, 
PhISANet: Phonetically Informed Speech Animation Network.
ICASSP2024
Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe 0001, Karen Livescu, 
Generative Context-Aware Fine-Tuning of Self-Supervised Speech Models.
TASLP2024
Shujie Hu, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, Guinan Li, Yi Wang, Mingyu Cui, Tianzi Wang, Helen Meng, Xunying Liu, 
Self-Supervised ASR Models and Features for Dysarthric and Elderly Speech Recognition.
TASLP2024
Jingbei Li, Sipan Li, Ping Chen, Luwen Zhang, Yi Meng, Zhiyong Wu 0001, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang 0002, 
Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing.
TASLP2024
Dongchao Yang, Songxiang Liu, Rongjie Huang, Chao Weng, Helen Meng, 
InstructTTS: Modelling Expressive TTS in Discrete Latent Space With Natural Language Style Prompt.
ICASSP2024
Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction.
ICASSP2024
Xueyuan Chen, Xi Wang 0016, Shaofei Zhang, Lei He 0005, Zhiyong Wu 0001, Xixin Wu, Helen Meng, 
Stylespeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis.
ICASSP2024
Qiaochu Huang, Xu He, Boshi Tang, Haolin Zhuang, Liyang Chen, Shuochen Gao, Zhiyong Wu 0001, Haozhi Huang 0004, Helen Meng, 
Enhancing Expressiveness in Dance Generation Via Integrating Frequency and Music Style Information.
ICASSP2024
Shun Lei, Yixuan Zhou 0002, Liyang Chen, Dan Luo, Zhiyong Wu 0001, Xixin Wu, Shiyin Kang, Tao Jiang, Yahui Zhou, Yuxing Han 0001, Helen Meng, 
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts.
ICASSP2024
Zhe Li, Man-Wai Mak, Helen Mei-Ling Meng, 
Dual Parameter-Efficient Fine-Tuning for Speaker Representation Via Speaker Prompt Tuning and Adapters.
ICASSP2024
Zhiwei Lin, Jun Chen 0024, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju, Fan Fan, Shiyin Kang, Zhiyong Wu 0001, Helen Meng, 
Multi-View Midivae: Fusing Track- and Bar-View Representations for Long Multi-Track Symbolic Music Generation.
ICASSP2024
Hui Lu, Xixin Wu, Haohan Guo, Songxiang Liu, Zhiyong Wu 0001, Helen Meng, 
Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations.
ICASSP2024
Binzhu Sha, Xu Li 0015, Zhiyong Wu 0001, Ying Shan, Helen Meng, 
Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion.
ICASSP2024
Yuejiao Wang, Xixin Wu, Disong Wang, Lingwei Meng, Helen Meng, 
UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization.
ICASSP2024
Haiwei Xue, Sicheng Yang, Zhensong Zhang, Zhiyong Wu 0001, Minglei Li 0001, Zonghong Dai, Helen Meng, 
Conversational Co-Speech Gesture Generation via Modeling Dialog Intention, Emotion, and Context with Diffusion Models.
ICML2024
Dongchao Yang, Jinchuan Tian, Xu Tan 0003, Rongjie Huang, Songxiang Liu, Haohan Guo, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian 0002, Zhou Zhao, Xixin Wu, Helen M. Meng, 
UniAudio: Towards Universal Audio Generation with Large Language Models.
TASLP2023
Haohan Guo, Fenglong Xie, Xixin Wu, Frank K. Soong, Helen Meng, 
MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS.
TASLP2023
Shun Lei, Yixuan Zhou 0002, Liyang Chen, Zhiyong Wu 0001, Xixin Wu, Shiyin Kang, Helen Meng, 
MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis.
TASLP2023
Guinan Li, Jiajun Deng, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Mingyu Cui, Helen Meng, Xunying Liu, 
Audio-Visual End-to-End Multi-Channel Speech Separation, Dereverberation and Recognition.
TASLP2023
Xixin Wu, Hui Lu, Kun Li 0003, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Hiformer: Sequence Modeling Networks With Hierarchical Attention Mechanisms.
TASLP2023
Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, Helen Meng, 
Meta-Generalization for Domain-Invariant Speaker Verification.
ICASSP2023
Jun Chen 0024, Wei Rao, Zilin Wang, Jiuxin Lin, Zhiyong Wu 0001, Yannan Wang, Shidong Shang, Helen Meng, 
Inter-Subnet: Speech Enhancement with Subband Interaction.
SpeechComm2024
Shuai Wang 0016, Zhengyang Chen, Bing Han, Hongji Wang, Chengdong Liang, Binbin Zhang, Xu Xiang, Wen Ding, Johan Rohdin, Anna Silnova, Yanmin Qian, Haizhou Li 0001, 
Advancing speaker embedding learning: Wespeaker toolkit for research and production.
TASLP2024
Lei Liu, Li Liu 0036, Haizhou Li 0001, 
Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition.
TASLP2024
Tianchi Liu 0004, Kong Aik Lee, Qiongqiong Wang, Haizhou Li 0001, 
Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification.
TASLP2024
Rui Liu 0008, Berrak Sisman, Guanglai Gao, Haizhou Li 0001, 
Controllable Accented Text-to-Speech Synthesis With Fine and Coarse-Grained Intensity Rendering.
TASLP2024
Congcong Sun, Hui Tian 0002, Peng Tian, Haizhou Li 0001, Zhenxing Qian, 
Multi-Agent Deep Learning for the Detection of Multiple Speech Steganography Methods.
TASLP2024
Wupeng Wang, Zexu Pan, Xinke Li, Shuai Wang 0016, Haizhou Li 0001, 
Speech Separation With Pretrained Frontend to Minimize Domain Mismatch.
TASLP2024
Koichiro Yoshino, Yun-Nung Chen, Paul A. Crook, Satwik Kottur, Jinchao Li, Behnam Hedayatnia, Seungwhan Moon, Zhengcong Fei, Zekang Li, Jinchao Zhang, Yang Feng 0004, Jie Zhou 0016, Seokhwan Kim, Yang Liu 0004, Di Jin, Alexandros Papangelis, Karthik Gopalakrishnan 0001, Dilek Hakkani-Tur, Babak Damavandi, Alborz Geramifard, Chiori Hori, Ankit Shah, Chen Zhang 0020, Haizhou Li 0001, João Sedoc, Luis F. D'Haro, Rafael E. Banchs, Alexander Rudnicky, 
Overview of the Tenth Dialog System Technology Challenge: DSTC10.
TASLP2024
Mingyang Zhang 0003, Yi Zhou 0020, Yi Ren 0006, Chen Zhang 0020, Xiang Yin 0006, Haizhou Li 0001, 
RefXVC: Cross-Lingual Voice Conversion With Enhanced Reference Leveraging.
TASLP2024
Xuehao Zhou, Mingyang Zhang 0003, Yi Zhou 0020, Zhizheng Wu 0001, Haizhou Li 0001, 
Accented Text-to-Speech Synthesis With Limited Data.
ICASSP2024
Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li 0001, 
LOCSELECT: Target Speaker Localization with an Auditory Selective Hearing Mechanism.
ICASSP2024
Sho Inoue, Kun Zhou 0003, Shuai Wang 0016, Haizhou Li 0001, 
Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis.
ICASSP2024
Yidi Jiang, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian, Haizhou Li 0001, 
Prompt-Driven Target Speech Diarization.
ICASSP2024
Junjie Li, Ruijie Tao, Zexu Pan, Meng Ge, Shuai Wang 0016, Haizhou Li 0001, 
Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-Talker Speech.
ICASSP2024
Yi Ma, Kong Aik Lee, Ville Hautamäki, Meng Ge, Haizhou Li 0001, 
Gradient Weighting for Speaker Verification in Extremely Low Signal-to-Noise Ratio.
ICASSP2024
Zeyang Song, Jibin Wu, Malu Zhang, Mike Zheng Shou, Haizhou Li 0001, 
Spiking-Leaf: A Learnable Auditory Front-End for Spiking Neural Networks.
ICASSP2024
Shuai Wang 0016, Qibing Bai, Qi Liu 0018, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li 0001, 
Leveraging in-the-wild Data for Effective Self-supervised Pretraining in Speaker Recognition.
ICASSP2024
Qu Yang, Qianhui Liu, Nan Li, Meng Ge, Zeyang Song, Haizhou Li 0001, 
SVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks.
AAAI2024
Rui Liu 0008, Yifan Hu, Yi Ren 0006, Xiang Yin 0006, Haizhou Li 0001, 
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling.
AAAI2024
Jiadong Wang, Zexu Pan, Malu Zhang, Robby T. Tan, Haizhou Li 0001, 
Restoring Speaking Lips from Occlusion for Audio-Visual Speech Recognition.
SpeechComm2023
Buddhi Wickramasinghe, Eliathamby Ambikairajah, Vidhyasaharan Sethu, Julien Epps, Haizhou Li 0001, Ting Dang, 
DNN controlled adaptive front-end for replay attack detection systems.
SpeechComm2024
Li Zhang 0106, Ning Jiang, Qing Wang 0039, Yue Li, Quan Lu, Lei Xie 0001, 
Whisper-SV: Adapting Whisper for low-data-resource speaker verification.
TASLP2024
Tao Li, Zhichao Wang 0002, Xinfa Zhu, Jian Cong, Qiao Tian, Yuping Wang, Lei Xie 0001, 
U-Style: Cascading U-Nets With Multi-Level Speaker and Style Modeling for Zero-Shot Voice Cloning.
TASLP2024
Qijie Shao, Pengcheng Guo, Jinghao Yan, Pengfei Hu 0004, Lei Xie 0001, 
Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition.
TASLP2024
Zhichao Wang 0002, Liumeng Xue, Qiuqiang Kong, Lei Xie 0001, Yuanzhe Chen, Qiao Tian, Yuping Wang, 
Multi-Level Temporal-Channel Speaker Retrieval for Zero-Shot Voice Conversion.
TASLP2024
Kun Wei, Bei Li, Hang Lv 0001, Quan Lu, Ning Jiang, Lei Xie 0001, 
Conversational Speech Recognition by Learning Audio-Textual Cross-Modal Contextual Representation.
TASLP2024
Jixun Yao, Qing Wang 0039, Pengcheng Guo, Ziqian Ning, Lei Xie 0001, 
Distinctive and Natural Speaker Anonymization via Singular Value Transformation-Assisted Matrix.
TASLP2024
Xinfa Zhu, Yi Lei, Tao Li, Yongmao Zhang, Hongbin Zhou, Heng Lu 0004, Lei Xie 0001, 
METTS: Multilingual Emotional Text-to-Speech by Cross-Speaker and Cross-Lingual Emotion Transfer.
ICASSP2024
Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu 0004, Shuai Wang, Jixun Yao, Lei Xie 0001, Mengxiao Bi, 
Dualvc 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion.
ICASSP2024
He Wang, Pengcheng Guo, Pan Zhou, Lei Xie 0001, 
MLCA-AVSR: Multi-Layer Cross Attention Fusion Based Audio-Visual Speech Recognition.
ICASSP2024
Ziqian Wang, Xinfa Zhu, Zihan Zhang, Yuanjun Lv, Ning Jiang, Guoqing Zhao, Lei Xie 0001, 
SELM: Speech Enhancement using Discrete Tokens and Language Models.
ICASSP2024
Jixun Yao, Yuguang Yang 0005, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, Jingjing Yin, Hongbin Zhou, Heng Lu 0004, Lei Xie 0001, 
Promptvc: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts.
ACL2024
Zhichao Wang 0002, Yuanzhe Chen, Xinsheng Wang, Lei Xie 0001, Yuping Wang, 
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion.
TASLP2023
Tao Li, Chenxu Hu, Jian Cong, Xinfa Zhu, Jingbei Li, Qiao Tian, Yuping Wang, Lei Xie 0001, 
DiCLET-TTS: Diffusion Model Based Cross-Lingual Emotion Transfer for Text-to-Speech - A Study Between English and Mandarin.
TASLP2023
Zhichao Wang 0002, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie 0001, Qiao Tian, Yuping Wang, 
MSM-VC: High-Fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-Scale Style Modeling.
TASLP2023
Qing Wang 0039, Jixun Yao, Li Zhang 0106, Pengcheng Guo, Lei Xie 0001, 
Timbre-Reserved Adversarial Attack in Speaker Identification.
ICASSP2023
Mingshuai Liu, Shubo Lv, Zihan Zhang, Runduo Han, Xiang Hao, Xianjun Xia, Li Chen, Yijian Xiao, Lei Xie 0001, 
Two-Stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge.
ICASSP2023
Ziqian Ning, Qicong Xie, Pengcheng Zhu 0004, Zhichao Wang 0002, Liumeng Xue, Jixun Yao, Lei Xie 0001, Mengxiao Bi, 
Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features.
ICASSP2023
Kun Song, Yongmao Zhang, Yi Lei, Jian Cong, Hanzhao Li, Lei Xie 0001, Gang He, Jinfeng Bai, 
DSPGAN: A Gan-Based Universal Vocoder for High-Fidelity TTS by Time-Frequency Domain Supervision from DSP.
ICASSP2023
Zhichao Wang 0002, Xinsheng Wang, Lei Xie 0001, Yuanzhe Chen, Qiao Tian, Yuping Wang, 
Delivering Speaking Style in Low-Resource Voice Conversion with Multi-Factor Constraints.
ICASSP2023
Xiaopeng Yan, Yindi Yang, Zhihao Guo, Liangliang Peng, Lei Xie 0001, 
The NPU-Elevoc Personalized Speech Enhancement System for Icassp2023 DNS Challenge.
SpeechComm2024
Shuai Wang 0016, Zhengyang Chen, Bing Han, Hongji Wang, Chengdong Liang, Binbin Zhang, Xu Xiang, Wen Ding, Johan Rohdin, Anna Silnova, Yanmin Qian, Haizhou Li 0001, 
Advancing speaker embedding learning: Wespeaker toolkit for research and production.
TASLP2024
Zhengyang Chen, Bing Han, Shuai Wang 0016, Yanmin Qian, 
Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer.
TASLP2024
Xun Gong 0005, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, Rui Zhao 0017, Xie Chen 0001, Yanmin Qian, 
Advanced Long-Content Speech Recognition With Factorized Neural Transducer.
TASLP2024
Bing Han, Zhengyang Chen, Yanmin Qian, 
Self-Supervised Learning With Cluster-Aware-DINO for High-Performance Robust Speaker Verification.
TASLP2024
Jiahong Li, Chenda Li, Yifei Wu, Yanmin Qian, 
Unified Cross-Modal Attention: Robust Audio-Visual Speech Recognition and Beyond.
TASLP2024
Bei Liu, Haoyu Wang 0007, Yanmin Qian, 
Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization.
TASLP2024
Wei Wang 0010, Yanmin Qian, 
Universal Cross-Lingual Data Generation for Low Resource ASR.
ICASSP2024
Bing Han, Zhiqiang Lv, Anbai Jiang, Wen Huang 0004, Zhengyang Chen, Yufeng Deng, Jiawei Ding, Cheng Lu 0007, Wei-Qiang Zhang 0001, Pingyi Fan, Jia Liu 0001, Yanmin Qian, 
Exploring Large Scale Pre-Trained Models for Robust Machine Anomalous Sound Detection.
ICASSP2024
Wen Huang 0004, Bing Han, Shuai Wang 0016, Zhengyang Chen, Yanmin Qian, 
Robust Cross-Domain Speaker Verification with Multi-Level Domain Adapters.
ICASSP2024
Yidi Jiang, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian, Haizhou Li 0001, 
Prompt-Driven Target Speech Diarization.
ICASSP2024
Hang Shao, Bei Liu, Yanmin Qian, 
One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models.
ICASSP2024
Shuai Wang 0016, Qibing Bai, Qi Liu 0018, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li 0001, 
Leveraging in-the-wild Data for Effective Self-supervised Pretraining in Speaker Recognition.
ICASSP2024
Linfeng Yu, Wangyou Zhang, Chenpeng Du, Leying Zhang, Zheng Liang, Yanmin Qian, 
Generation-Based Target Speech Extraction with Speech Discretization and Vocoder.
ICASSP2024
Wangyou Zhang, Jee-weon Jung, Yanmin Qian, 
Improving Design of Input Condition Invariant Speech Enhancement.
TASLP2023
Bei Liu, Zhengyang Chen, Yanmin Qian, 
Depth-First Neural Architecture With Attentive Feature Fusion for Efficient Speaker Verification.
ICASSP2023
Xun Gong 0005, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, Rui Zhao 0017, Xie Chen 0001, Yanmin Qian, 
LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer.
ICASSP2023
Xun Gong 0005, Wei Wang 0010, Hang Shao, Xie Chen 0001, Yanmin Qian, 
Factorized AED: Factorized Attention-Based Encoder-Decoder for Text-Only Domain Adaptive ASR.
ICASSP2023
Bing Han, Zhengyang Chen, Yanmin Qian, 
Exploring Binary Classification Loss for Speaker Verification.
ICASSP2023
Jiahong Li, Chenda Li, Yifei Wu, Yanmin Qian, 
Robust Audio-Visual ASR with Unified Cross-Modal Attention.
ICASSP2023
Chenda Li, Yao Qian, Zhuo Chen 0006, Dongmei Wang, Takuya Yoshioka, Shujie Liu 0001, Yanmin Qian, Michael Zeng 0001, 
Target Sound Extraction with Variable Cross-Modality Clues.
TASLP2024
Jiaming Cheng, Ruiyu Liang, Lin Zhou 0001, Li Zhao 0003, Chengwei Huang, Björn W. Schuller, 
Residual Fusion Probabilistic Knowledge Distillation for Speech Enhancement.
TASLP2024
Ruiyu Liang, Yue Xie, Jiaming Cheng, Cong Pang, Björn W. Schuller, 
A Non-Invasive Speech Quality Evaluation Algorithm for Hearing Aids With Multi-Head Self-Attention and Audiogram-Based Features.
ICASSP2024
Zhongren Dong, Zixing Zhang 0001, Weixiang Xu, Jing Han 0010, Jianjun Ou, Björn W. Schuller, 
HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech.
ICASSP2024
Xiangheng He, Junjie Chen, Björn W. Schuller, 
Task Selection and Assignment for Multi-Modal Multi-Task Dialogue Act Classification with Non-Stationary Multi-Armed Bandits.
ICASSP2024
Cheng Lu 0005, Yuan Zong, Hailun Lian, Yan Zhao, Björn W. Schuller, Wenming Zheng, 
Improving Speaker-Independent Speech Emotion Recognition using Dynamic Joint Distribution Adaptation.
ICASSP2024
Manuel Milling, Andreas Triantafyllopoulos, Iosif Tsangko, Simon David Noel Rampp, Björn Wolfgang Schuller, 
Bringing the Discussion of Minima Sharpness to the Audio Domain: A Filter-Normalised Evaluation for Acoustic Scene Classification.
ICASSP2024
Liyizhe Peng, Zixing Zhang 0001, Tao Pang, Jing Han 0010, Huan Zhao 0003, Hao Chen, Björn W. Schuller, 
Customising General Large Language Models for Specialised Emotion Recognition Tasks.
ICASSP2024
Yong Wang, Cheng Lu 0005, Hailun Lian, Yan Zhao, Björn W. Schuller, Yuan Zong, Wenming Zheng, 
Speech Swin-Transformer: Exploring a Hierarchical Transformer with Shifted Windows for Speech Emotion Recognition.
ICASSP2024
Yan Zhao, Jincen Wang, Cheng Lu 0005, Sunan Li, Björn W. Schuller, Yuan Zong, Wenming Zheng, 
Emotion-Aware Contrastive Adaptation Network for Source-Free Cross-Corpus Speech Emotion Recognition.
ICASSP2023
Felix Burkhardt, Anna Derington, Matthias Kahlau, Klaus R. Scherer, Florian Eyben, Björn W. Schuller, 
Masking Speech Contents by Random Splicing: is Emotional Expression Preserved?
ICASSP2023
Yi Chang 0004, Zhao Ren, Thanh Tam Nguyen, Kun Qian 0003, Björn W. Schuller, 
Knowledge Transfer for on-Device Speech Emotion Recognition With Neural Structured Learning.
ICASSP2023
Najla D. Al Futaisi, Alejandrina Cristià, Björn W. Schuller, 
Hearttoheart: The Arts of Infant Versus Adult-Directed Speech Classification.
ICASSP2023
Shuo Liu 0012, Adria Mallol-Ragolta, Björn W. Schuller, 
COVID-19 Detection from Speech in Noisy Conditions.
ICASSP2023
Zhao Ren, Thanh Tam Nguyen, Yi Chang 0004, Björn W. Schuller, 
Fast Yet Effective Speech Emotion Recognition with Self-Distillation.
ICASSP2023
Georgios Rizos, Rafael A. Calvo, Björn W. Schuller, 
Positive-Pair Redundancy Reduction Regularisation for Speech-Based Asthma Diagnosis Prediction.
ICASSP2023
Meishu Song, Andreas Triantafyllopoulos, Zijiang Yang 0007, Hiroki Takeuchi, Toru Nakamura, Akifumi Kishi, Tetsuro Ishizawa, Kazuhiro Yoshiuchi, Xin Jing, Vincent Karas, Zhonghao Zhao, Kun Qian 0003, Bin Hu 0001, Björn W. Schuller, Yoshiharu Yamamoto, 
Daily Mental Health Monitoring from Speech: A Real-World Japanese Dataset and Multitask Learning Analysis.
ICASSP2023
Panagiotis Tzirakis, Alice Baird, Jeffrey A. Brooks, Christopher Gagne, Lauren Kim, Michael Opara, Christopher B. Gregory, Jacob Metrick, Garrett Boseck, Vineet Tiruvadi, Björn W. Schuller, Dacher Keltner, Alan Cowen, 
Large-Scale Nonverbal Vocalization Detection Using Transformers.
ICASSP2023
Xinzhou Xu, Jun Deng, Zixing Zhang 0001, Zhen Yang, Björn W. Schuller, 
Zero-Shot Speech Emotion Recognition Using Generative Learning with Reconstructed Prototypes.
ICASSP2023
Yongzi Yu, Wanyong Qiu, Chen Quan, Kun Qian 0003, Zhihua Wang, Yu Ma, Bin Hu 0001, Björn W. Schuller, Yoshiharu Yamamoto, 
Federated Intelligent Terminals Facilitate Stuttering Monitoring.
ICASSP2023
Ziping Zhao 0001, Huan Wang, Haishuai Wang, Björn W. Schuller, 
Hierarchical Network with Decoupled Knowledge Distillation for Speech Emotion Recognition.
TASLP2024
Kai-Wei Chang, Haibin Wu, Yu-Kai Wang, Yuan-Kuei Wu, Hua Shen, Wei-Cheng Tseng, Iu-thing Kang, Shang-wen Li 0001, Hung-Yi Lee, 
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks.
TASLP2024
Shensian Syu, Juncheng Xie, Hung-yi Lee, 
Improving Non-Autoregressive Translation Quality With Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC.
TASLP2024
Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li 0001, Abdelrahman Mohamed, Shinji Watanabe 0001, Hung-yi Lee, 
A Large-Scale Evaluation of Speech Foundation Models.
ICASSP2024
Kevin Everson, Yile Gu, Chao-Han Huck Yang, Prashanth Gurunath Shivakumar, Guan-Ting Lin, Jari Kolehmainen, Ivan Bulyko, Ankur Gandhe, Shalini Ghosh, Wael Hamza, Hung-Yi Lee, Ariya Rastrow, Andreas Stolcke, 
Towards ASR Robust Spoken Language Understanding Through in-Context Learning with Word Confusion Networks.
ICASSP2024
Chien-Yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan S. Sharma, Shinji Watanabe 0001, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee, 
Dynamic-Superb: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark For Speech.
ICASSP2024
Kuan-Po Huang, Chih-Kai Yang, Yu-Kuan Fu, Ewan Dunbar, Hung-Yi Lee, 
Zero Resource Code-Switched Speech Benchmark Using Speech Utterance Pairs for Multiple Spoken Languages.
ICASSP2024
Chyi-Jiunn Lin, Guan-Ting Lin, Yung-Sung Chuang, Wei-Lun Wu, Shang-Wen Li 0001, Abdelrahman Mohamed, Hung-Yi Lee, Lin-Shan Lee, 
SpeechDPR: End-To-End Spoken Passage Retrieval For Open-Domain Spoken Question Answering.
ICASSP2024
Guan-Ting Lin, Prashanth Gurunath Shivakumar, Ankur Gandhe, Chao-Han Huck Yang, Yile Gu, Shalini Ghosh, Andreas Stolcke, Hung-Yi Lee, Ivan Bulyko, 
Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue.
ICASSP2024
Yuan Tseng, Layne Berry, Yiting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Poyao Huang 0001, Chun-Mao Lai, Shang-Wen Li 0001, David Harwath, Yu Tsao 0001, Abdelrahman Mohamed, Chi-Luen Feng, Hung-Yi Lee, 
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.
ICASSP2024
Haibin Wu, Heng-Cheng Kuo, Yu Tsao 0001, Hung-Yi Lee, 
Scalable Ensemble-Based Detection Method Against Adversarial Attacks For Speaker Verification.
ACL2024
Guan-Ting Lin, Cheng-Han Chiang, Hung-yi Lee, 
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations.
ACL-Findings2024
Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan S. Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe 0001, 
On the Evaluation of Speech Foundation Models for Spoken Language Understanding.
TASLP2023
Yun-Yen Chuang, Hung-Min Hsu, Kevin Lin, Ray-I Chang, Hung-Yi Lee, 
MetaEx-GAN: Meta Exploration to Improve Natural Language Generation via Generative Adversarial Networks.
TASLP2023
Po-Chun Hsu, Da-Rong Liu, Andy T. Liu, Hung-yi Lee, 
Parallel Synthesis for Autoregressive Speech Generation.
ICASSP2023
Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-Yi Lee, David Harwath, 
M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval.
ICASSP2023
Hsuan-Jui Chen, Yen Meng, Hung-yi Lee, 
Once-for-All Sequence Compression for Self-Supervised Speech Models.
ICASSP2023
Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola García, Hung-Yi Lee, Shinji Watanabe 0001, Sanjeev Khudanpur, 
Euro: Espnet Unsupervised ASR Open-Source Toolkit.
ICASSP2023
Chan-Jan Hsu, Ho-Lam Chung, Hung-Yi Lee, Yu Tsao 0001, 
T5lephone: Bridging Speech and Text Self-Supervised Models for Spoken Language Understanding Via Phoneme Level T5.
ICASSP2023
Sung-Feng Huang, Chia-Ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, Hung-Yi Lee, 
Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning.
ICASSP2023
Kuan-Po Huang, Tzu-hsun Feng, Yu-Kuan Fu, Tsu-Yuan Hsu, Po-Chieh Yen, Wei-Cheng Tseng, Kai-Wei Chang, Hung-Yi Lee, 
Ensemble Knowledge Distillation of Self-Supervised Speech Models.
TASLP2024
Shujie Hu, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, Guinan Li, Yi Wang, Mingyu Cui, Tianzi Wang, Helen Meng, Xunying Liu, 
Self-Supervised ASR Models and Features for Dysarthric and Elderly Speech Recognition.
TASLP2024
Zengrui Jin, Mengzhe Geng, Jiajun Deng, Tianzi Wang, Shujie Hu, Guinan Li, Xunying Liu, 
Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition.
ICASSP2024
Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction.
ICASSP2024
Jiajun Deng, Xurong Xie, Guinan Li, Mingyu Cui, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Zhaoqing Li, Xunying Liu, 
Towards High-Performance and Low-Latency Feature-Based Speaker Adaptation of Conformer Speech Recognition Systems.
ICASSP2024
Zengrui Jin, Xurong Xie, Tianzi Wang, Mengzhe Geng, Jiajun Deng, Guinan Li, Shujie Hu, Xunying Liu, 
Towards Automatic Data Augmentation for Disordered Speech Recognition.
ICASSP2024
Huimeng Wang, Zengrui Jin, Mengzhe Geng, Shujie Hu, Guinan Li, Tianzi Wang, Haoning Xu, Xunying Liu, 
Enhancing Pre-Trained ASR System Fine-Tuning for Dysarthric Speech Recognition Using Adversarial Data Augmentation.
TASLP2023
Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Guinan Li, Shujie Hu, Xunying Liu, 
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems.
TASLP2023
Guinan Li, Jiajun Deng, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Mingyu Cui, Helen Meng, Xunying Liu, 
Audio-Visual End-to-End Multi-Channel Speech Separation, Dereverberation and Recognition.
TASLP2023
Xixin Wu, Hui Lu, Kun Li 0003, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Hiformer: Sequence Modeling Networks With Hierarchical Attention Mechanisms.
ICASSP2023
Shujie Hu, Xurong Xie, Zengrui Jin, Mengzhe Geng, Yi Wang, Mingyu Cui, Jiajun Deng, Xunying Liu, Helen Meng, 
Exploring Self-Supervised Pre-Trained ASR Models for Dysarthric and Elderly Speech Recognition.
ICASSP2023
Zengrui Jin, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shujie Hu, Jiajun Deng, Guinan Li, Xunying Liu, 
Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition.
ICASSP2023
Jinchao Li, Kaitao Song, Junan Li, Bo Zheng, Dongsheng Li 0002, Xixin Wu, Xunying Liu, Helen Meng, 
Leveraging Pretrained Representations With Task-Related Keywords for Alzheimer's Disease Detection.
ICASSP2023
Jinchao Li, Xixin Wu, Kaitao Song, Dongsheng Li 0002, Xunying Liu, Helen Meng, 
A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition.
ICASSP2023
Xurong Xie, Xunying Liu, Hui Chen 0020, Hongan Wang, 
Unsupervised Model-Based Speaker Adaptation of End-To-End Lattice-Free MMI Model for Speech Recognition.
Interspeech2023
Mingyu Cui, Jiawen Kang 0002, Jiajun Deng, Xi Yin 0010, Yutao Xie, Xie Chen 0001, Xunying Liu, 
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems.
Interspeech2023
Jiajun Deng, Guinan Li, Xurong Xie, Zengrui Jin, Mingyu Cui, Tianzi Wang, Shujie Hu, Mengzhe Geng, Xunying Liu, 
Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems.
Interspeech2023
Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Jiajun Deng, Mingyu Cui, Guinan Li, Jianwei Yu, Xurong Xie, Xunying Liu, 
Use of Speech Impairment Severity for Dysarthric Speech Recognition.
Interspeech2023
Mengzhe Geng, Xurong Xie, Rongfeng Su, Jianwei Yu, Zengrui Jin, Tianzi Wang, Shujie Hu, Zi Ye 0001, Helen Meng, Xunying Liu, 
On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition.
Interspeech2023
Shujie Hu, Xurong Xie, Mengzhe Geng, Mingyu Cui, Jiajun Deng, Guinan Li, Tianzi Wang, Helen Meng, Xunying Liu, 
Exploiting Cross-Domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition.
Interspeech2023
Zhaoqing Li, Tianzi Wang, Jiajun Deng, Junhao Xu, Shoukang Hu, Xunying Liu, 
Lossless 4-bit Quantization of Architecture Compressed Conformer ASR Systems on the 300-hr Switchboard Corpus.
TASLP2024
Hao Zhang, Yixuan Zhang 0005, Meng Yu 0003, Dong Yu 0001, 
Enhanced Acoustic Howling Suppression via Hybrid Kalman Filter and Deep Learning Models.
ICASSP2024
Muqiao Yang, Chunlei Zhang, Yong Xu 0004, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu 0001, 
uSee: Unified Speech Enhancement And Editing with Conditional Diffusion Models.
ICML2024
Manjie Xu, Chenxing Li, Duzhen Zhang, Dan Su 0002, Wei Liang, Dong Yu 0001, 
Prompt-guided Precise Audio Editing with Diffusion Models.
ACL2024
Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Jinchuan Tian, Zhenhui Ye, Luping Liu, Zehan Wang 0001, Ziyue Jiang 0001, Xuankai Chang, Jiatong Shi, Chao Weng, Zhou Zhao, Dong Yu 0001, 
Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.
ACL2024
Yongxin Zhu 0003, Dan Su 0002, Liqiang He, Linli Xu, Dong Yu 0001, 
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer.
TASLP2023
Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu 0001, 
Unsupervised TTS Acoustic Modeling for TTS With Conditional Disentangled Sequential VAE.
TASLP2023
Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu 0001, 
Integrating Lattice-Free MMI Into End-to-End Speech Recognition.
TASLP2023
Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu 0001, 
Diffsound: Discrete Diffusion Model for Text-to-Sound Generation.
Interspeech2023
Yong Xu 0004, Vinay Kothapally, Meng Yu 0003, Shixiong Zhang, Dong Yu 0001, 
Zoneformer: On-device Neural Beamformer For In-car Multi-zone Speech Separation, Enhancement and Echo Cancellation.
Interspeech2023
Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu 0001, Shinji Watanabe 0001, 
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction.
Interspeech2023
Wei Xiao, Wenzhe Liu, Meng Wang, Shan Yang, Yupeng Shi, Yuyong Kang, Dan Su 0002, Shidong Shang, Dong Yu 0001, 
Multi-mode Neural Speech Coding Based on Deep Generative Networks.
Interspeech2023
Yuping Yuan, Zhao You, Shulin Feng, Dan Su 0002, Yanchun Liang 0001, Xiaohu Shi, Dong Yu 0001, 
Compressed MoE ASR Model Based on Knowledge Distillation and Quantization.
Interspeech2023
Hao Zhang, Meng Yu 0003, Yuzhong Wu, Tao Yu, Dong Yu 0001, 
Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression.
Interspeech2023
Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu 0001, Zhao You, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation.
EMNLP2023
Dian Yu 0001, Xiaoyang Wang, Wanshun Chen, Nan Du, Longyue Wang, Haitao Mi, Dong Yu 0001, 
More Than Spoken Words: Nonverbal Message Extraction and Generation.
ACL-Findings2023
Rongjie Huang, Chunlei Zhang, Yi Ren 0006, Zhou Zhao, Dong Yu 0001, 
Prosody-TTS: Improving Prosody with Masked Autoencoder and Conditional Diffusion Model For Expressive Text-to-Speech.
ICASSP2022
Jiachen Lian, Chunlei Zhang, Dong Yu 0001, 
Robust Disentangled Variational Speech Representation Learning for Zero-Shot Voice Conversion.
ICASSP2022
Songxiang Liu, Shan Yang, Dan Su 0002, Dong Yu 0001, 
Referee: Towards Reference-Free Cross-Speaker Style Transfer with Low-Quality Data for Expressive Speech Synthesis.
ICASSP2022
Dongpeng Ma, Yiwen Wang, Liqiang He, Mingjie Jin, Dan Su 0002, Dong Yu 0001, 
DP-DWA: Dual-Path Dynamic Weight Attention Network With Streaming DFSMN-SAN For Automatic Speech Recognition.
ICASSP2022
Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu 0003, Zhenyu Tang 0001, Dinesh Manocha, Dong Yu 0001, 
Fast-Rir: Fast Neural Diffuse Room Impulse Response Generator.
TASLP2024
Jingbei Li, Sipan Li, Ping Chen, Luwen Zhang, Yi Meng, Zhiyong Wu 0001, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang 0002, 
Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing.
ICASSP2024
Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction.
ICASSP2024
Xueyuan Chen, Xi Wang 0016, Shaofei Zhang, Lei He 0005, Zhiyong Wu 0001, Xixin Wu, Helen Meng, 
Stylespeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis.
ICASSP2024
Qiaochu Huang, Xu He, Boshi Tang, Haolin Zhuang, Liyang Chen, Shuochen Gao, Zhiyong Wu 0001, Haozhi Huang 0004, Helen Meng, 
Enhancing Expressiveness in Dance Generation Via Integrating Frequency and Music Style Information.
ICASSP2024
Shun Lei, Yixuan Zhou 0002, Liyang Chen, Dan Luo, Zhiyong Wu 0001, Xixin Wu, Shiyin Kang, Tao Jiang, Yahui Zhou, Yuxing Han 0001, Helen Meng, 
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts.
ICASSP2024
Xingda Li, Fan Zhuo, Dan Luo, Jun Chen 0024, Shiyin Kang, Zhiyong Wu 0001, Tao Jiang, Yang Li, Han Fang, Yahui Zhou, 
Generating Stereophonic Music with Single-Stage Language Models.
ICASSP2024
Zhiwei Lin, Jun Chen 0024, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju, Fan Fan, Shiyin Kang, Zhiyong Wu 0001, Helen Meng, 
Multi-View Midivae: Fusing Track- and Bar-View Representations for Long Multi-Track Symbolic Music Generation.
ICASSP2024
Hui Lu, Xixin Wu, Haohan Guo, Songxiang Liu, Zhiyong Wu 0001, Helen Meng, 
Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations.
ICASSP2024
Binzhu Sha, Xu Li 0015, Zhiyong Wu 0001, Ying Shan, Helen Meng, 
Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion.
ICASSP2024
Haiwei Xue, Sicheng Yang, Zhensong Zhang, Zhiyong Wu 0001, Minglei Li 0001, Zonghong Dai, Helen Meng, 
Conversational Co-Speech Gesture Generation via Modeling Dialog Intention, Emotion, and Context with Diffusion Models.
ICASSP2024
Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu 0001, 
FreeTalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness.
AAAI2024
Yaoxun Xu, Hangting Chen, Jianwei Yu, Qiaochu Huang, Zhiyong Wu 0001, Shi-Xiong Zhang, Guangzhi Li, Yi Luo 0004, Rongzhi Gu, 
SECap: Speech Emotion Captioning with Large Language Model.
TASLP2023
Shun Lei, Yixuan Zhou 0002, Liyang Chen, Zhiyong Wu 0001, Xixin Wu, Shiyin Kang, Helen Meng, 
MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis.
TASLP2023
Xixin Wu, Hui Lu, Kun Li 0003, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Hiformer: Sequence Modeling Networks With Hierarchical Attention Mechanisms.
ICASSP2023
Jun Chen 0024, Wei Rao, Zilin Wang, Jiuxin Lin, Zhiyong Wu 0001, Yannan Wang, Shidong Shang, Helen Meng, 
Inter-Subnet: Speech Enhancement with Subband Interaction.
ICASSP2023
Jun Chen 0024, Yupeng Shi, Wenzhe Liu, Wei Rao, Shulin He, Andong Li, Yannan Wang, Zhiyong Wu 0001, Shidong Shang, Chengshi Zheng, 
Gesper: A Unified Framework for General Speech Restoration.
ICASSP2023
Jie Chen, Xingchen Song, Zhendong Peng, Binbin Zhang, Fuping Pan, Zhiyong Wu 0001, 
LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech.
ICASSP2023
Shun Lei, Yixuan Zhou 0002, Liyang Chen, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis.
ICASSP2023
Jiuxin Lin, Xinyu Cai, Heinrich Dinkel, Jun Chen 0024, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Zhiyong Wu 0001, Yujun Wang, Helen Meng, 
Av-Sepformer: Cross-Attention Sepformer for Audio-Visual Target Speaker Extraction.
ICASSP2023
Xingchen Song, Di Wu 0061, Zhiyong Wu 0001, Binbin Zhang, Yuekai Zhang, Zhendong Peng, Wenpeng Li, Fuping Pan, Changbao Zhu, 
TrimTail: Low-Latency Streaming ASR with Simple But Effective Spectrogram-Level Length Penalty.
TASLP2024
Xun Gong 0005, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, Rui Zhao 0017, Xie Chen 0001, Yanmin Qian, 
Advanced Long-Content Speech Recognition With Factorized Neural Transducer.
TASLP2024
Xiaofei Wang 0007, Manthan Thakker, Zhuo Chen 0006, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu 0001, Jinyu Li 0001, Takuya Yoshioka, 
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
TASLP2024
Tianrui Wang, Long Zhou, Ziqiang Zhang, Yu Wu 0012, Shujie Liu 0001, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Furu Wei, 
VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation.
TASLP2024
Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu 0012, Shuo Ren, Shujie Liu 0001, Zhuoyuan Yao, Xun Gong 0005, Li-Rong Dai 0001, Jinyu Li 0001, Furu Wei, 
SpeechLM: Enhanced Speech Pre-Training With Unpaired Textual Data.
ICASSP2024
Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Naoyuki Kanda, Jinyu Li 0001, Yashesh Gaur, 
Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation.
ICASSP2024
Yiming Wang, Jinyu Li 0001, 
Residualtransformer: Residual Low-Rank Learning With Weight-Sharing For Transformer Layers.
ICASSP2024
Jian Wu 0027, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao 0017, Zhuo Chen 0006, Jinyu Li 0001, 
T-SOT FNT: Streaming Multi-Talker ASR with Text-Only Domain Adaptation Capability.
ICASSP2024
Mu Yang, Naoyuki Kanda, Xiaofei Wang 0009, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li 0001, Takuya Yoshioka, 
Diarist: Streaming Speech Translation with Speaker Diarization.
ICML2024
Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan 0003, Detai Xin, Dongchao Yang, Eric Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu 0001, Tao Qin 0001, Xiangyang Li 0001, Wei Ye 0004, Shikun Zhang, Jiang Bian 0002, Lei He 0005, Jinyu Li 0001, Sheng Zhao, 
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
ICASSP2023
Zhuo Chen 0006, Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiaofei Wang 0009, Takuya Yoshioka, Jinyu Li 0001, Sunit Sivasankaran, Sefik Emre Eskimez, 
Speech Separation with Large-Scale Self-Supervised Learning.
ICASSP2023
Ruchao Fan, Yiming Wang, Yashesh Gaur, Jinyu Li 0001, 
CTCBERT: Advancing Hidden-Unit BERT with CTC Objectives.
ICASSP2023
Xun Gong 0005, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, Rui Zhao 0017, Xie Chen 0001, Yanmin Qian, 
LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer.
ICASSP2023
Zili Huang, Zhuo Chen 0006, Naoyuki Kanda, Jian Wu 0027, Yiming Wang, Jinyu Li 0001, Takuya Yoshioka, Xiaofei Wang 0009, Peidong Wang, 
Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition.
ICASSP2023
Naoyuki Kanda, Jian Wu 0027, Xiaofei Wang 0009, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.
ICASSP2023
Xiaoqiang Wang 0006, Yanqing Liu, Jinyu Li 0001, Sheng Zhao, 
Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation.
ICASSP2023
Kun Wei, Long Zhou, Ziqiang Zhang, Liping Chen, Shujie Liu 0001, Lei He 0005, Jinyu Li 0001, Furu Wei, 
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation.
ICASSP2023
Jian Wu 0027, Zhuo Chen 0006, Min Hu, Xiong Xiao, Jinyu Li 0001, 
Speaker Change Detection For Transformer Transducer ASR.
ICASSP2023
Muqiao Yang, Naoyuki Kanda, Xiaofei Wang 0009, Jian Wu 0027, Sunit Sivasankaran, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Simulating Realistic Speech Overlaps Improves Multi-Talker ASR.
ICASSP2023
Rui Zhao 0017, Jian Xue, Partha Parthasarathy, Veljko Miljanic, Jinyu Li 0001, 
Fast and Accurate Factorized Neural Transducer for Text Adaption of End-to-End Speech Recognition Models.
Interspeech2023
Yuang Li, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, 
Accelerating Transducers through Adjacent Token Merging.
TASLP2024
Hassan Taherian, DeLiang Wang, 
Multi-Channel Conversational Speaker Separation via Neural Diarization.
ICASSP2024
Vahid Ahmadi Kalkhorani, Anurag Kumar 0003, Ke Tan 0001, Buye Xu, DeLiang Wang, 
Audiovisual Speaker Separation with Full- and Sub-Band Modeling in the Time-Frequency Domain.
ICASSP2024
Hassan Taherian, Ashutosh Pandey 0004, Daniel Wong, Buye Xu, DeLiang Wang, 
Leveraging Sound Localization to Improve Continuous Speaker Separation.
TASLP2023
Ashutosh Pandey 0004, DeLiang Wang, 
Attentive Training: A New Training Framework for Speech Enhancement.
TASLP2023
Yixuan Zhang 0005, Heming Wang, DeLiang Wang, 
$F0$ Estimation and Voicing Detection With Cascade Architecture in Noisy Speech.
ICASSP2023
Hassan Taherian, DeLiang Wang, 
Multi-Resolution Location-Based Training for Multi-Channel Continuous Speech Separation.
ICASSP2023
Heming Wang, Yao Qian, Hemin Yang, Naoyuki Kanda, Peidong Wang, Takuya Yoshioka, Xiaofei Wang 0009, Yiming Wang, Shujie Liu 0001, Zhuo Chen 0006, DeLiang Wang, Michael Zeng 0001, 
DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks.
ICASSP2023
Heming Wang, DeLiang Wang, 
Cross-Domain Diffusion Based Speech Enhancement for Very Noisy Speech.
Interspeech2023
Vahid Ahmadi Kalkhorani, Anurag Kumar 0003, Ke Tan 0001, Buye Xu, DeLiang Wang, 
Time-domain Transformer-based Audiovisual Speaker Separation.
Interspeech2023
Hassan Taherian, Ashutosh Pandey 0004, Daniel Wong, Buye Xu, DeLiang Wang, 
Multi-input Multi-output Complex Spectral Mapping for Speaker Separation.
Interspeech2023
Yufeng Yang, Ashutosh Pandey 0004, DeLiang Wang, 
Time-Domain Speech Enhancement for Robust Automatic Speech Recognition.
TASLP2022
Ashutosh Pandey 0004, DeLiang Wang, 
Self-Attending RNN for Speech Enhancement to Improve Cross-Corpus Generalization.
TASLP2022
Hassan Taherian, Ke Tan 0001, DeLiang Wang, 
Multi-Channel Talker-Independent Speaker Separation Through Location-Based Training.
TASLP2022
Ke Tan 0001, Zhong-Qiu Wang, DeLiang Wang, 
Neural Spectrospatial Filtering.
TASLP2022
Heming Wang, DeLiang Wang, 
Neural Cascade Architecture With Triple-Domain Loss for Speech Enhancement.
TASLP2022
Heming Wang, Xueliang Zhang 0001, DeLiang Wang, 
Fusing Bone-Conduction and Air-Conduction Sensors for Complex-Domain Speech Enhancement.
TASLP2022
Hao Zhang, DeLiang Wang, 
Neural Cascade Architecture for Multi-Channel Acoustic Echo Suppression.
ICASSP2022
Ashutosh Pandey 0004, Buye Xu, Anurag Kumar 0003, Jacob Donley, Paul Calamia, DeLiang Wang, 
TPARN: Triple-Path Attentive Recurrent Network for Time-Domain Multichannel Speech Enhancement.
ICASSP2022
Hassan Taherian, Ke Tan 0001, DeLiang Wang, 
Location-Based Training for Multi-Channel Talker-Independent Speaker Separation.
ICASSP2022
Heming Wang, Yao Qian, Xiaofei Wang 0009, Yiming Wang, Chengyi Wang 0002, Shujie Liu 0001, Takuya Yoshioka, Jinyu Li 0001, DeLiang Wang, 
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.
TASLP2024
Hang Chen, Qing Wang 0008, Jun Du, Bao-Cai Yin, Jia Pan, Chin-Hui Lee 0001, 
Optimizing Audio-Visual Speech Enhancement Using Multi-Level Distortion Measures for Audio-Visual Speech Recognition.
TASLP2024
Zilu Guo, Qing Wang 0008, Jun Du, Jia Pan, Qing-Feng Liu, Chin-Hui Lee 0001, 
A Variance-Preserving Interpolation Approach for Diffusion Models With Applications to Single Channel Speech Enhancement and Recognition.
ICASSP2024
Hanbo Cheng, Jun Du, Pengfei Hu 0006, Jiefeng Ma, Zhenrong Zhang, Mobai Xue, 
Viewing Writing as Video: Optical Flow based Multi-Modal Handwritten Mathematical Expression Recognition.
ICASSP2024
Feng Ma, Yanhui Tu, Maokui He, Ruoyu Wang 0029, Shutong Niu, Lei Sun 0010, Zhongfu Ye, Jun Du, Jia Pan, Chin-Hui Lee 0001, 
A Spatial Long-Term Iterative Mask Estimation Approach for Multi-Channel Speaker Diarization and Speech Recognition.
ICASSP2024
Haotian Wang, Jun Du, Yusheng Dai, Chin-Hui Lee 0001, Yuling Ren, Yu Liu, 
Improving Multi-Modal Emotion Recognition Using Entropy-Based Fusion and Pruning-Based Network Architecture Optimization.
ICASSP2024
Minghui Wu, Haitao Tang, Jiahuan Fan, Ruoyu Wang, Hang Chen, Yanyong Zhang, Jun Du, Hengshun Zhou, Lei Sun, Xin Fang, Tian Gao, Genshun Wan, Jia Pan, Jianqing Gao, 
Implicit Enhancement of Target Speaker in Speaker-Adaptive ASR through Efficient Joint Optimization.
ICASSP2024
Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang 0029, Hongbo Lan, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao, 
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
ICASSP2024
Gaobin Yang, Maokui He, Shutong Niu, Ruoyu Wang 0029, Yanyan Yue, Shuangqing Qian, Shilong Wu, Jun Du, Chin-Hui Lee 0001, 
Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture.
SpeechComm2023
Shi Cheng, Jun Du, Shutong Niu, Alejandrina Cristià, Xin Wang 0037, Qing Wang 0008, Chin-Hui Lee 0001, 
Using iterative adaptation and dynamic mask for child speech extraction under real-world multilingual conditions.
SpeechComm2023
Li Chai 0002, Hang Chen, Jun Du, Qing-Feng Liu, Chin-Hui Lee 0001, 
Space-and-speaker-aware acoustic modeling with effective data augmentation for recognition of multi-array conversational speech.
TASLP2023
Mao-Kui He, Jun Du, Qing-Feng Liu, Chin-Hui Lee 0001, 
ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding.
TASLP2023
Shutong Niu, Jun Du, Lei Sun 0010, Yu Hu 0003, Chin-Hui Lee 0001, 
QDM-SSD: Quality-Aware Dynamic Masking for Separation-Based Speaker Diarization.
TASLP2023
Jie Zhang 0042, Rui Tao, Jun Du, Li-Rong Dai 0001, 
SDW-SWF: Speech Distortion Weighted Single-Channel Wiener Filter for Noise Reduction.
ICASSP2023
Hang Chen, Shilong Wu, Yusheng Dai, Zhe Wang, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge.
ICASSP2023
Shutong Niu, Jun Du, Qing Wang 0008, Li Chai 0002, Huaxin Wu, Zhaoxu Nian, Lei Sun 0010, Yi Fang, Jia Pan, Chin-Hui Lee 0001, 
An Experimental Study on Sound Event Localization and Detection Under Realistic Testing Conditions.
ICASSP2023
Ruoyu Wang 0029, Jun Du, Tian Gao, 
Quantum Transfer Learning Using the Large-Scale Unsupervised Pre-Trained Model WavLM-Large for Synthetic Speech Detection.
ICASSP2023
Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization And Recognition.
ICASSP2023
Chenyue Zhang, Hang Chen, Jun Du, Bao-Cai Yin, Jia Pan, Chin-Hui Lee 0001, 
Incorporating Visual Information Reconstruction into Progressive Learning for Optimizing Audio-Visual Speech Enhancement.
Interspeech2023
Zilu Guo, Jun Du, Chin-Hui Lee 0001, Yu Gao, Wenbin Zhang, 
Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement.
Interspeech2023
Shutong Niu, Jun Du, Maokui He, Chin-Hui Lee 0001, Baoxiang Li, Jiakui Li, 
Unsupervised Adaptation with Quality-Aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization.
TASLP2024
Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe 0001, 
End-to-End Speech Recognition: A Survey.
ICASSP2024
Junwen Bai, Bo Li 0028, Qiujia Li, Tara N. Sainath, Trevor Strohman, 
Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR.
ICASSP2024
Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li 0028, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal, 
USM-Lite: Quantization and Sparsity Aware Fine-Tuning for Speech Recognition with Universal Speech Models.
ICASSP2024
Shefali Garg, Zhouyuan Huo, Khe Chai Sim, Suzan Schwartz, Mason Chua, Alëna Aksënova, Tsendsuren Munkhdalai, Levi King, Darryl Wright, Zion Mengesha, Dongseong Hwang, Tara N. Sainath, Françoise Beaufays, Pedro Moreno Mengibar, 
Improving Speech Recognition for African American English with Audio Classification.
ICASSP2024
W. Ronny Huang, Cyril Allauzen, Tongzhou Chen, Kilol Gupta, Ke Hu, James Qin, Yu Zhang 0033, Yongqiang Wang, Shuo-Yiin Chang, Tara N. Sainath, 
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study.
ICASSP2024
Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno 0001, 
Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models.
ICASSP2024
Haozhe Shan, Albert Gu, Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara N. Sainath, 
Augmenting Conformers With Structured State-Space Sequence Models For Online Speech Recognition.
ICASSP2024
Khe Chai Sim, Zhouyuan Huo, Tsendsuren Munkhdalai, Nikhil Siddhartha, Adam Stooke, Zhong Meng, Bo Li 0028, Tara N. Sainath, 
A Comparison of Parameter-Efficient ASR Domain Adaptation Methods for Universal Speech and Language Models.
NAACL2024
Weiran Wang, Rohit Prabhavalkar, Haozhe Shan, Zhong Meng, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li 0028, James Qin, Xingyu Cai, Adam Stooke, Chengjian Zheng, Yanzhang He, Tara N. Sainath, Pedro Moreno Mengibar, 
Massive End-to-end Speech Recognition Models with Time Reduction.
ICASSP2023
Rami Botros, Rohit Prabhavalkar, Johan Schalkwyk, Ciprian Chelba, Tara N. Sainath, Françoise Beaufays, 
Lego-Features: Exporting Modular Encoder Features for Streaming and Deliberation ASR.
ICASSP2023
Shuo-Yiin Chang, Chao Zhang 0031, Tara N. Sainath, Bo Li 0028, Trevor Strohman, 
Context-Aware End-to-End ASR Using Self-Attentive Embedding and Tensor Fusion.
ICASSP2023
Steven M. Hernandez, Ding Zhao, Shaojin Ding, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He, Ian McGraw, 
Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models.
ICASSP2023
Ke Hu, Tara N. Sainath, Bo Li 0028, Nan Du 0002, Yanping Huang, Andrew M. Dai, Yu Zhang 0033, Rodrigo Cabrera, Zhifeng Chen, Trevor Strohman, 
Massively Multilingual Shallow Fusion with Large Language Models.
ICASSP2023
W. Ronny Huang, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, David Rybach, Robert David, Rohit Prabhavalkar, Cyril Allauzen, Cal Peyser, Trevor D. Strohman, 
E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model.
ICASSP2023
Zhouyuan Huo, Khe Chai Sim, Bo Li 0028, Dongseong Hwang, Tara N. Sainath, Trevor Strohman, 
Resource-Efficient Transfer Learning from Speech Foundation Model Using Hierarchical Feature Fusion.
ICASSP2023
Bo Li 0028, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang 0033, Wei Han 0002, Trevor Strohman, Françoise Beaufays, 
Efficient Domain Adaptation for Speech Foundation Models.
ICASSP2023
Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang 0033, Bo Li 0028, Andrew Rosenberg, Bhuvana Ramabhadran, 
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition.
ICASSP2023
Cal Peyser, Michael Picheny, Kyunghyun Cho, Rohit Prabhavalkar, W. Ronny Huang, Tara N. Sainath, 
A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale.
ICASSP2023
Tara N. Sainath, Rohit Prabhavalkar, Diamantino Caseiro, Pat Rondon, Cyril Allauzen, 
Improving Contextual Biasing with Text Injection.
ICASSP2023
Weiran Wang, Ding Zhao, Shaojin Ding, Hao Zhang 0010, Shuo-Yiin Chang, David Rybach, Tara N. Sainath, Yanzhang He, Ian McGraw, Shankar Kumar, 
Multi-Output RNN-T Joint Networks for Multi-Task Learning of ASR and Auxiliary Tasks.
TASLP2024
Cunhang Fan, Mingming Ding, Jianhua Tao 0001, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Zhao Lv, 
Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection.
ICASSP2024
Yong Ren, Tao Wang 0074, Jiangyan Yi, Le Xu, Jianhua Tao 0001, Chu Yuan Zhang, Junzuo Zhou, 
Fewer-Token Neural Speech Codec with Time-Invariant Codes.
ICASSP2024
Chenglong Wang, Jiayi He, Jiangyan Yi, Jianhua Tao 0001, Chu Yuan Zhang, Xiaohui Zhang 0006, 
Multi-Scale Permutation Entropy for Audio Deepfake Detection.
ICASSP2024
Mingyu Xu, Zheng Lian, Bin Liu 0041, Zerui Chen, Jianhua Tao 0001, 
Pseudo Labels Regularization for Imbalanced Partial-Label Learning.
AAAI2024
Xiaohui Zhang 0006, Jiangyan Yi, Chenglong Wang, Chu Yuan Zhang, Siding Zeng, Jianhua Tao 0001, 
What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection.
SpeechComm2023
Jiangyan Yi, Jianhua Tao 0001, Ye Bai, Zhengkun Tian, Cunhang Fan, 
Transfer knowledge for punctuation prediction via adversarial training.
TASLP2023
Jiangyan Yi, Jianhua Tao 0001, Ruibo Fu, Tao Wang 0074, Chu Yuan Zhang, Chenglong Wang, 
Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings.
ICASSP2023
Guanjun Li, Wei Xue, Wenju Liu, Jiangyan Yi, Jianhua Tao 0001, 
GCC-Speaker: Target Speaker Localization with Optimal Speaker-Dependent Weighting in Multi-Speaker Scenarios.
ICASSP2023
Jinlong Xue, Yayue Deng, Fengping Wang, Ya Li, Yingming Gao, Jianhua Tao 0001, Jianqing Sun, Jiaen Liang, 
M2-CTTS: End-to-End Multi-Scale Multi-Modal Conversational Text-to-Speech Synthesis.
Interspeech2023
Haiyang Sun, Zheng Lian, Bin Liu 0041, Ying Li, Jianhua Tao 0001, Licai Sun, Cong Cai, Meng Wang, Yuan Cheng, 
EmotionNAS: Two-stream Neural Architecture Search for Speech Emotion Recognition.
Interspeech2023
Chenglong Wang, Jiangyan Yi, Jianhua Tao 0001, Chu Yuan Zhang, Shuai Zhang 0014, Xun Chen, 
Detection of Cross-Dataset Fake Audio Based on Prosodic and Pronunciation Features.
Interspeech2023
Chenglong Wang, Jiangyan Yi, Jianhua Tao 0001, Chu Yuan Zhang, Shuai Zhang 0014, Ruibo Fu, Xun Chen, 
TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection.
Interspeech2023
Ruiteng Zhang, Jianguo Wei, Xugang Lu, Yongwei Li, Junhai Xu, Di Jin 0001, Jianhua Tao 0001, 
SOT: Self-supervised Learning-Assisted Optimal Transport for Unsupervised Adaptive Speech Emotion Recognition.
ICML2023
Xiaohui Zhang 0006, Jiangyan Yi, Jianhua Tao 0001, Chenglong Wang, Chu Yuan Zhang, 
Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection.
SpeechComm2022
Wenhuan Lu, Xinyue Zhao, Na Guo, Yongwei Li, Jianguo Wei, Jianhua Tao 0001, Jianwu Dang 0001, 
One-shot emotional voice conversion based on feature separation.
TASLP2022
Tao Wang 0074, Ruibo Fu, Jiangyan Yi, Jianhua Tao 0001, Zhengqi Wen, 
NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.
TASLP2022
Tao Wang 0074, Jiangyan Yi, Ruibo Fu, Jianhua Tao 0001, Zhengqi Wen, 
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing.
ICASSP2022
Cong Cai, Bin Liu 0041, Jianhua Tao 0001, Zhengkun Tian, Jiahao Lu, Kexin Wang, 
End-to-End Network Based on Transformer for Automatic Detection of Covid-19.
ICASSP2022
Ya Li, Mingyue Niu, Ziping Zhao 0001, Jianhua Tao 0001, 
Automatic Depression Level Assessment from Speech By Long-Term Global Information Embedding.
ICASSP2022
Tao Wang 0074, Jiangyan Yi, Liqun Deng, Ruibo Fu, Jianhua Tao 0001, Zhengqi Wen, 
Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing.
SpeechComm2024
Yuqin Lin, Jianwu Dang 0001, Longbiao Wang, Sheng Li 0010, Chenchen Ding, 
Disordered speech recognition considering low resources and abnormal articulation.
SpeechComm2024
Nan Li, Longbiao Wang, Meng Ge, Masashi Unoki, Sheng Li 0010, Jianwu Dang 0001, 
Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network.
TASLP2024
Cheng Gong, Xin Wang 0037, Erica Cooper, Dan Wells, Longbiao Wang, Jianwu Dang 0001, Korin Richmond, Junichi Yamagishi, 
ZMM-TTS: Zero-Shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-Supervised Discrete Speech Representations.
TASLP2024
Rui Liu 0008, Yifan Hu, Haolin Zuo, Zhaojie Luo, Longbiao Wang, Guanglai Gao, 
Text-to-Speech for Low-Resource Agglutinative Language With Morphology-Aware Language Model Pre-Training.
TASLP2024
Xiao Wei, Yuhang Li, Yuke Si, Longbiao Wang, Xiaobao Wang, Jianwu Dang 0001, 
A Prompt-Based Hierarchical Pipeline for Cross-Domain Slot Filling.
ICASSP2024
Chunyu Qiang, Hao Li, Yixin Tian, Ruibo Fu, Tao Wang 0074, Longbiao Wang, Jianwu Dang 0001, 
Learning Speech Representation from Contrastive Token-Acoustic Pretraining.
ICASSP2024
Chunyu Qiang, Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang 0001, 
High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models.
ICASSP2024
Linjuan Zhang, Kong Aik Lee, Lin Zhang, Longbiao Wang, Baoning Niu, 
CPAUG: Refining Copy-Paste Augmentation for Speech Anti-Spoofing.
TASLP2023
Yuqin Lin, Longbiao Wang, Yanbing Yang, Jianwu Dang 0001, 
CFDRN: A Cognition-Inspired Feature Decomposition and Recombination Network for Dysarthric Speech Recognition.
TASLP2023
Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, Helen Meng, 
Meta-Generalization for Domain-Invariant Speaker Verification.
ICASSP2023
Hui Chen, Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, 
Self-Supervised Audio-Visual Speaker Representation with Co-Meta Learning.
ICASSP2023
Yuhao Liu, Cheng Gong, Longbiao Wang, Xixin Wu, Qiuyu Liu, Jianwu Dang 0001, 
VF-Taco2: Towards Fast and Lightweight Synthesis for Autoregressive Models with Variation Autoencoder and Feature Distillation.
ICASSP2023
Xiaohui Liu, Meng Liu, Longbiao Wang, Kong Aik Lee, Hanyi Zhang, Jianwu Dang 0001, 
Leveraging Positional-Related Local-Global Dependency for Synthetic Speech Detection.
ICASSP2023
Meng Liu, Kong Aik Lee, Longbiao Wang, Hanyi Zhang, Chang Zeng, Jianwu Dang 0001, 
Cross-Modal Audio-Visual Co-Learning for Text-Independent Speaker Verification.
ICASSP2023
Haoyu Lu, Nan Li, Tongtong Song, Longbiao Wang, Jianwu Dang 0001, Xiaobao Wang, Shiliang Zhang, 
Speech and Noise Dual-Stream Spectrogram Refine Network With Speech Distortion Loss For Robust Speech Recognition.
ICASSP2023
Hao Shi, Masato Mimura, Longbiao Wang, Jianwu Dang 0001, Tatsuya Kawahara, 
Time-Domain Speech Enhancement Assisted by Multi-Resolution Frequency Encoder and Decoder.
ICASSP2023
Yao Sun, Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, 
Noise-Disentanglement Metric Learning for Robust Speaker Verification.
ICASSP2023
Yiwei Wei, Shaozu Yuan, Meng Chen 0006, Longbiao Wang, 
Enhancing Multimodal Alignment with Momentum Augmentation for Dense Video Captioning.
Interspeech2023
Yanjie Fu, Meng Ge, Honglong Wang, Nan Li, Haoran Yin, Longbiao Wang, Gaoyan Zhang, Jianwu Dang 0001, Chengyun Deng, Fei Wang, 
Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation.
Interspeech2023
Junjie Li, Meng Ge, Zexu Pan, Rui Cao, Longbiao Wang, Jianwu Dang 0001, Shiliang Zhang, 
Rethinking the Visual Cues in Audio-Visual Speaker Extraction.
SpeechComm2024
Yuqin Lin, Jianwu Dang 0001, Longbiao Wang, Sheng Li 0010, Chenchen Ding, 
Disordered speech recognition considering low resources and abnormal articulation.
SpeechComm2024
Nan Li, Longbiao Wang, Meng Ge, Masashi Unoki, Sheng Li 0010, Jianwu Dang 0001, 
Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network.
TASLP2024
Cheng Gong, Xin Wang 0037, Erica Cooper, Dan Wells, Longbiao Wang, Jianwu Dang 0001, Korin Richmond, Junichi Yamagishi, 
ZMM-TTS: Zero-Shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-Supervised Discrete Speech Representations.
TASLP2024
Xiao Wei, Yuhang Li, Yuke Si, Longbiao Wang, Xiaobao Wang, Jianwu Dang 0001, 
A Prompt-Based Hierarchical Pipeline for Cross-Domain Slot Filling.
ICASSP2024
Chunyu Qiang, Hao Li, Yixin Tian, Ruibo Fu, Tao Wang 0074, Longbiao Wang, Jianwu Dang 0001, 
Learning Speech Representation from Contrastive Token-Acoustic Pretraining.
ICASSP2024
Chunyu Qiang, Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang 0001, 
High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models.
TASLP2023
Yuqin Lin, Longbiao Wang, Yanbing Yang, Jianwu Dang 0001, 
CFDRN: A Cognition-Inspired Feature Decomposition and Recombination Network for Dysarthric Speech Recognition.
TASLP2023
Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, Helen Meng, 
Meta-Generalization for Domain-Invariant Speaker Verification.
ICASSP2023
Hui Chen, Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, 
Self-Supervised Audio-Visual Speaker Representation with Co-Meta Learning.
ICASSP2023
Zhongjie Li, Bin Zhao, Gaoyan Zhang, Jianwu Dang 0001, 
Brain Network Features Differentiate Intentions from Different Emotional Expressions of the Same Text.
ICASSP2023
Yuhao Liu, Cheng Gong, Longbiao Wang, Xixin Wu, Qiuyu Liu, Jianwu Dang 0001, 
VF-Taco2: Towards Fast and Lightweight Synthesis for Autoregressive Models with Variation Autoencoder and Feature Distillation.
ICASSP2023
Xiaohui Liu, Meng Liu, Longbiao Wang, Kong Aik Lee, Hanyi Zhang, Jianwu Dang 0001, 
Leveraging Positional-Related Local-Global Dependency for Synthetic Speech Detection.
ICASSP2023
Meng Liu, Kong Aik Lee, Longbiao Wang, Hanyi Zhang, Chang Zeng, Jianwu Dang 0001, 
Cross-Modal Audio-Visual Co-Learning for Text-Independent Speaker Verification.
ICASSP2023
Haoyu Lu, Nan Li, Tongtong Song, Longbiao Wang, Jianwu Dang 0001, Xiaobao Wang, Shiliang Zhang, 
Speech and Noise Dual-Stream Spectrogram Refine Network With Speech Distortion Loss For Robust Speech Recognition.
ICASSP2023
Hao Shi, Masato Mimura, Longbiao Wang, Jianwu Dang 0001, Tatsuya Kawahara, 
Time-Domain Speech Enhancement Assisted by Multi-Resolution Frequency Encoder and Decoder.
ICASSP2023
Yao Sun, Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, 
Noise-Disentanglement Metric Learning for Robust Speaker Verification.
Interspeech2023
Yanjie Fu, Meng Ge, Honglong Wang, Nan Li, Haoran Yin, Longbiao Wang, Gaoyan Zhang, Jianwu Dang 0001, Chengyun Deng, Fei Wang, 
Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation.
Interspeech2023
Junjie Li, Meng Ge, Zexu Pan, Rui Cao, Longbiao Wang, Jianwu Dang 0001, Shiliang Zhang, 
Rethinking the Visual Cues in Audio-Visual Speaker Extraction.
Interspeech2023
Yuhang Li, Xiao Wei, Yuke Si, Longbiao Wang, Xiaobao Wang, Jianwu Dang 0001, 
Improving Zero-shot Cross-domain Slot Filling via Transformer-based Slot Semantics Fusion.
Interspeech2023
Zhongjie Li, Gaoyan Zhang, Longbiao Wang, Jianwu Dang 0001, 
Discrimination of the Different Intents Carried by the Same Text Through Integrating Multimodal Information.
TASLP2024
Yuchen Hu, Chen Chen 0075, Qiushi Zhu, Eng Siong Chng, 
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR.
TASLP2024
Linhui Sun, Shuo Yuan, Aifei Gong, Lei Ye, Eng Siong Chng, 
Dual-Branch Modeling Based on State-Space Model for Speech Enhancement.
ICASSP2024
Weiguang Chen, Tran The Anh, Xionghu Zhong, Eng Siong Chng, 
Enhancing Low-Latency Speaker Diarization with Spatial Dictionary Learning.
ICASSP2024
Dianwen Ng, Chong Zhang 0003, Ruixi Zhang, Yukun Ma, Fabian Ritter Gutierrez, Trung Hieu Nguyen 0001, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma 0001, 
Are Soft Prompts Good Zero-Shot Learners for Speech Recognition?
ICASSP2024
Duc-Tuan Truong, Ruijie Tao, Jia Qi Yip, Kong Aik Lee, Eng Siong Chng, 
Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification.
ICASSP2024
Jia Qi Yip, Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang 0003, Hao Wang 0199, Trung Hieu Nguyen 0001, Kun Zhou 0003, Dianwen Ng, Eng Siong Chng, Bin Ma 0001, 
SPGM: Prioritizing Local Features for Enhanced Speech Separation Performance.
ICASSP2024
Zizheng Zhang, Chen Chen 0075, Hsin-Hung Chen, Xiang Liu, Yuchen Hu, Eng Siong Chng, 
Noise-Aware Speech Separation with Contrastive Learning.
ICASSP2024
Heqing Zou, Meng Shen 0002, Yuchen Hu, Chen Chen 0075, Eng Siong Chng, Deepu Rajan, 
Cross-Modality and Within-Modality Regularization for Audio-Visual Deepfake Detection.
ICLR2024
Chen Chen 0075, Ruizhe Li 0001, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Engsiong Chng, Chao-Han Huck Yang, 
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition.
ICLR2024
Yuchen Hu, Chen Chen 0075, Chao-Han Huck Yang, Ruizhe Li 0001, Chao Zhang 0031, Pin-Yu Chen, Engsiong Chng, 
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition.
ICASSP2023
Chen Chen 0075, Yuchen Hu, Weiwei Weng, Eng Siong Chng, 
Metric-Oriented Speech Enhancement Using Diffusion Probabilistic Model.
ICASSP2023
Chen Chen 0075, Yuchen Hu, Heqing Zou, Linhui Sun, Eng Siong Chng, 
Unsupervised Noise Adaptation Using Data Simulation.
ICASSP2023
Yuchen Hu, Chen Chen 0075, Ruizhe Li 0001, Qiushi Zhu, Eng Siong Chng, 
Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition.
ICASSP2023
Yuchen Hu, Chen Chen 0075, Heqing Zou, Xionghu Zhong, Eng Siong Chng, 
Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation.
ICASSP2023
Dianwen Ng, Ruixi Zhang, Jia Qi Yip, Zhao Yang, Jinjie Ni, Chong Zhang 0003, Yukun Ma, Chongjia Ni, Eng Siong Chng, Bin Ma 0001, 
De'HuBERT: Disentangling Noise in a Self-Supervised Model for Robust Speech Recognition.
ICASSP2023
Dianwen Ng, Ruixi Zhang, Jia Qi Yip, Chong Zhang 0003, Yukun Ma, Trung Hieu Nguyen 0001, Chongjia Ni, Eng Siong Chng, Bin Ma 0001, 
Contrastive Speech Mixup for Low-Resource Keyword Spotting.
ICASSP2023
Shangeth Rajaa, Kriti Anandan, Swaraj Dalmia, Tarun Gupta, Eng Siong Chng, 
Improving Spoken Language Identification with Map-Mix.
ICASSP2023
Alexey Sholokhov, Nikita Kuzmin, Kong Aik Lee, Eng Siong Chng, 
Probabilistic Back-ends for Online Speaker Recognition and Clustering.
ICASSP2023
Yuhang Yang, Haihua Xu, Hao Huang 0009, Eng Siong Chng, Sheng Li 0010, 
Speech-Text Based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition.
Interspeech2023
Chen Chen 0075, Chao-Han Huck Yang, Kai Li, Yuchen Hu, Pin-Jui Ku, Eng Siong Chng, 
A Neural State-Space Modeling Approach to Efficient Speech Separation.
TASLP2024
Xuenan Xu, Zeyu Xie, Mengyue Wu, Kai Yu 0004, 
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning.
ICASSP2024
Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen 0001, Kai Yu 0004, 
VoiceFlow: Efficient Text-To-Speech with Rectified Flow Matching.
ICASSP2024
Junjie Li, Yiwei Guo, Xie Chen 0001, Kai Yu 0004, 
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention.
ICASSP2024
Tao Liu, Chenpeng Du, Shuai Fan 0005, Feilong Chen, Kai Yu 0004, 
DiffDub: Person-Generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-Encoder.
ICASSP2024
Sen Liu, Yiwei Guo, Xie Chen 0001, Kai Yu 0004, 
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations.
ICASSP2024
Feiyu Shen, Yiwei Guo, Chenpeng Du, Xie Chen 0001, Kai Yu 0004, 
Acoustic BPE for Speech Generation with Discrete Tokens.
ICASSP2024
Zeyu Xie, Baihan Li, Xuenan Xu, Mengyue Wu, Kai Yu 0004, 
Enhancing Audio Generation Diversity with Visual Information.
ICASSP2024
Xuenan Xu, Xiaohang Xu 0004, Zeyu Xie, Pingyue Zhang, Mengyue Wu, Kai Yu 0004, 
A Detailed Audio-Text Data Simulation Pipeline Using Single-Event Sounds.
ICASSP2024
Hongshen Xu, Ruisheng Cao, Su Zhu, Sheng Jiang, Hanchong Zhang, Lu Chen 0002, Kai Yu 0004, 
A BiRGAT Model for Multi-Intent Spoken Language Understanding with Hierarchical Semantic Frames.
ICASSP2024
Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu 0004, Daniel Povey, Xie Chen 0001, 
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS.
AAAI2024
Chenpeng Du, Yiwei Guo, Feiyu Shen, Zhijun Liu, Zheng Liang, Xie Chen 0001, Shuai Wang 0016, Hui Zhang, Kai Yu 0004, 
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding.
TASLP2023
Chenpeng Du, Yiwei Guo, Xie Chen 0001, Kai Yu 0004, 
Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature.
TASLP2023
Wenbin Jiang, Kai Yu 0004, 
Speech Enhancement With Integration of Neural Homomorphic Synthesis and Spectral Masking.
ICASSP2023
Chenpeng Du, Yiwei Guo, Feiyu Shen, Kai Yu 0004, 
Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge.
ICASSP2023
Yiwei Guo, Chenpeng Du, Xie Chen 0001, Kai Yu 0004, 
EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance.
ICASSP2023
Guangwei Li, Xuenan Xu, Lingfeng Dai, Mengyue Wu, Kai Yu 0004, 
Diverse and Vivid Sound Generation from Text Descriptions.
ICASSP2023
Tao Liu, Zhengyang Chen, Yanmin Qian, Kai Yu 0004, 
Multi-Speaker End-to-End Multi-Modal Speaker Diarization System for the MISP 2022 Challenge.
ICASSP2023
Zhijun Liu, Yiwei Guo, Kai Yu 0004, 
DiffVoice: Text-to-Speech with Latent Diffusion.
Interspeech2023
Wenbin Jiang, Fei Wen, Yifan Zhang, Kai Yu 0004, 
UnSE: Unsupervised Speech Enhancement Using Optimal Transport.
Interspeech2023
Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu 0004, Xie Chen 0001, 
Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation.
TASLP2024
Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda, 
Pretraining and Adaptation Techniques for Electrolaryngeal Speech Recognition.
TASLP2024
Rui Wang, Li Li 0063, Tomoki Toda, 
Dual-Channel Target Speaker Extraction Based on Conditional Variational Autoencoder and Directional Information.
ICASSP2024
Jiajun He, Xiaohan Shi, Xingfeng Li 0001, Tomoki Toda, 
MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction.
ICASSP2024
Tatsuya Komatsu, Yusuke Fujita, Kazuya Takeda, Tomoki Toda, 
Audio Difference Learning for Audio Captioning.
ICASSP2024
Yamato Ohtani, Takuma Okamoto, Tomoki Toda, Hisashi Kawai, 
FIRNet: Fundamental Frequency Controllable Fast Neural Vocoder With Trainable Finite Impulse Response Filter.
ICASSP2024
Takuma Okamoto, Yamato Ohtani, Tomoki Toda, Hisashi Kawai, 
ConvNeXt-TTS and ConvNeXt-VC: ConvNeXt-Based Fast End-to-End Sequence-to-Sequence Text-to-Speech and Voice Conversion.
ICASSP2024
Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda, 
Electrolaryngeal Speech Intelligibility Enhancement through Robust Linguistic Encoders.
TASLP2023
Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Hisashi Kawai, 
Harmonic-Net: Fundamental Frequency and Speech Rate Controllable Fast Neural Vocoder.
TASLP2023
Chao Xie, Tomoki Toda, 
Noisy-to-Noisy Voice Conversion Under Variations of Noisy Condition.
TASLP2023
Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda, 
High-Fidelity and Pitch-Controllable Neural Vocoder Based on Unified Source-Filter Networks.
ICASSP2023
Takuya Fujimura, Tomoki Toda, 
Analysis of Noisy-Target Training for DNN-Based Speech Enhancement.
ICASSP2023
Kazuhiro Kobayashi, Tomoki Hayashi, Tomoki Toda, 
Low-Latency Electrolaryngeal Speech Enhancement Based on FastSpeech2-Based Voice Conversion and Self-Supervised Speech Representation.
ICASSP2023
Atsushi Miyashita, Tomoki Toda, 
Representation of Vocal Tract Length Transformation Based on Group Theory.
ICASSP2023
Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda, 
Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition.
ICASSP2023
Ryuichi Yamamoto, Reo Yoneyama, Tomoki Toda, 
NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit.
ICASSP2023
Yusuke Yasuda, Tomoki Toda, 
Text-To-Speech Synthesis Based on Latent Variable Conversion Using Diffusion Probabilistic Model and Variational Autoencoder.
ICASSP2023
Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda, 
Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder.
Interspeech2023
Yeonjong Choi, Chao Xie, Tomoki Toda, 
Reverberation-Controllable Voice Conversion Using Reverberation Time Estimator.
Interspeech2023
Cheng-Hung Hu, Yusuke Yasuda, Tomoki Toda, 
Preference-based training framework for automatic speech quality assessment using deep neural network.
Interspeech2023
Takuma Okamoto, Tomoki Toda, Hisashi Kawai, 
E2E-S2S-VC: End-To-End Sequence-To-Sequence Voice Conversion.
TASLP2024
Cheng Gong, Xin Wang 0037, Erica Cooper, Dan Wells, Longbiao Wang, Jianwu Dang 0001, Korin Richmond, Junichi Yamagishi, 
ZMM-TTS: Zero-Shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-Supervised Discrete Speech Representations.
TASLP2024
Michele Panariello, Natalia A. Tomashenko, Xin Wang 0037, Xiaoxiao Miao, Pierre Champion, Hubert Nourtel, Massimiliano Todisco, Nicholas W. D. Evans, Emmanuel Vincent 0001, Junichi Yamagishi, 
The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation.
ICASSP2024
Xin Wang 0037, Junichi Yamagishi, 
Can Large-Scale Vocoded Spoofed Data Improve Speech Spoofing Countermeasure with a Self-Supervised Front End?
ICASSP2024
Wanying Ge, Xin Wang 0037, Junichi Yamagishi, Massimiliano Todisco, Nicholas W. D. Evans, 
Spoofing Attack Augmentation: Can Differently-Trained Attack Models Improve Generalisation?
ICASSP2024
Xiaoxiao Miao, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Nicholas W. D. Evans, Massimiliano Todisco, Jean-François Bonastre, Mickael Rouvier, 
SynVox2: Towards a Privacy-Friendly VoxCeleb2 Dataset.
TASLP2023
Xuechen Liu, Xin Wang 0037, Md. Sahidullah, Jose Patino 0001, Héctor Delgado, Tomi Kinnunen, Massimiliano Todisco, Junichi Yamagishi, Nicholas W. D. Evans, Andreas Nautsch, Kong Aik Lee, 
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild.
TASLP2023
Xiaoxiao Miao, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Natalia A. Tomashenko, 
Speaker Anonymization Using Orthogonal Householder Neural Network.
TASLP2023
Lin Zhang, Xin Wang 0037, Erica Cooper, Nicholas W. D. Evans, Junichi Yamagishi, 
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance.
ICASSP2023
Haoyu Li, Yun Liu, Junichi Yamagishi, 
Joint Noise Reduction and Listening Enhancement for Full-End Speech Enhancement.
ICASSP2023
Paul-Gauthier Noé, Xiaoxiao Miao, Xin Wang 0037, Junichi Yamagishi, Jean-François Bonastre, Driss Matrouf, 
Hiding Speaker's Sex in Speech Using Zero-Evidence Speaker Representation in an Analysis/Synthesis Pipeline.
ICASSP2023
Xuan Shi, Erica Cooper, Xin Wang 0037, Junichi Yamagishi, Shrikanth Narayanan, 
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?
ICASSP2023
Xin Wang 0037, Junichi Yamagishi, 
Spoofed Training Data for Speech Spoofing Countermeasure Can Be Efficiently Created Using Neural Vocoders.
Interspeech2023
Erica Cooper, Junichi Yamagishi, 
Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech.
Interspeech2023
Hieu-Thi Luong, Junichi Yamagishi, 
Controlling Multi-Class Human Vocalization Generation via a Simple Segment-based Labeling Scheme.
Interspeech2023
Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang 0037, Xuechen Liu, Md. Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas W. D. Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung, 
Towards Single Integrated Spoofing-aware Speaker Verification Embeddings.
Interspeech2023
Chang Zeng, Xin Wang 0037, Xiaoxiao Miao, Erica Cooper, Junichi Yamagishi, 
Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms.
Interspeech2023
Lin Zhang, Xin Wang 0037, Erica Cooper, Nicholas W. D. Evans, Junichi Yamagishi, 
Range-Based Equal Error Rate for Spoof Localization.
TASLP2022
Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi, 
Optimizing Tandem Speaker Verification and Anti-Spoofing Systems.
TASLP2022
Xuan Shi, Erica Cooper, Junichi Yamagishi, 
Use of Speaker Recognition Approaches for Learning and Evaluating Embedding Representations of Musical Instrument Sounds.
TASLP2022
Brij Mohan Lal Srivastava, Mohamed Maouche, Md. Sahidullah, Emmanuel Vincent 0001, Aurélien Bellet, Marc Tommasi, Natalia A. Tomashenko, Xin Wang 0037, Junichi Yamagishi, 
Privacy and Utility of X-Vector Based Speaker Anonymization.
TASLP2024
Tsubasa Ochiai, Kazuma Iwamoto, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, 
Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance.
ICASSP2024
Takanori Ashihara, Marc Delcroix, Takafumi Moriya, Kohei Matsuura, Taichi Asami, Yusuke Ijima, 
What Do Self-Supervised Speech and Speaker Models Learn? New Findings from a Cross Model Layer-Wise Analysis.
ICASSP2024
William Chen, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe 0001, 
Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing.
ICASSP2024
Kenichi Fujita, Hiroshi Sato, Takanori Ashihara, Hiroki Kanagawa, Marc Delcroix, Takafumi Moriya, Yusuke Ijima, 
Noise-Robust Zero-Shot Text-to-Speech Synthesis Conditioned on Self-Supervised Speech-Representation Model with Adapters.
ICASSP2024
Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, 
How Does End-To-End Speech Recognition Training Impact Speech Enhancement Artifacts?
ICASSP2024
Dominik Klement, Mireia Díez, Federico Landini, Lukás Burget, Anna Silnova, Marc Delcroix, Naohiro Tawara, 
Discriminative Training of VBx Diarization.
ICASSP2024
Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Shoko Araki, Jan Cernocký, 
Target Speech Extraction with Pre-Trained Self-Supervised Learning Models.
TASLP2023
Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Yasunori Ohishi, Shoko Araki, 
SoundBeam: Target Sound Extraction Conditioned on Sound-Class Labels and Enrollment Clues for Increased Performance and Continuous Learning.
TASLP2023
Thilo von Neumann, Keisuke Kinoshita, Christoph Böddeker, Marc Delcroix, Reinhold Haeb-Umbach, 
Segment-Less Continuous Speech Separation of Meetings: Training and Evaluation Criteria.
ICASSP2023
Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Roshan S. Sharma, Kohei Matsuura, Shinji Watanabe 0001, 
Speech Summarization of Long Spoken Document: Improving Memory Efficiency of Speech/Text Encoders.
ICASSP2023
Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix, Ryo Masumura, 
Leveraging Large Text Corpora For End-To-End Speech Summarization.
ICASSP2023
Thilo von Neumann, Christoph Böddeker, Keisuke Kinoshita, Marc Delcroix, Reinhold Haeb-Umbach, 
On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech Recognition Systems.
ICASSP2023
Atsunori Ogawa, Takafumi Moriya, Naoyuki Kamo, Naohiro Tawara, Marc Delcroix, 
Iterative Shallow Fusion of Backward Language Model for End-To-End Speech Recognition.
Interspeech2023
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix, Yukinori Honma, 
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Interspeech2023
Marc Delcroix, Naohiro Tawara, Mireia Díez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukás Burget, Shoko Araki, 
Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization.
Interspeech2023
Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani, 
Target Speech Extraction with Conditional Diffusion Model.
Interspeech2023
Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, 
Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization.
Interspeech2023
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami, 
Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data.
Interspeech2023
Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo, 
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss.
ICASSP2022
Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe 0001, 
Integrating Multiple ASR Systems into NLP Backend with Attention Fusion.
ICASSP2024
Chowdam Venkata Thirumala Kumar, Tanuka Bhattacharjee, Seena Vengalil, Saraswati Nashi, Madassu Keerthipriya, Yamini Belur, Atchayaram Nalini, Prasanta Kumar Ghosh, 
Spectral Analysis of Vowels and Fricatives at Varied Levels of Dysarthria Severity for Amyotrophic Lateral Sclerosis.
ICASSP2023
Tanuka Bhattacharjee, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Prasanta Kumar Ghosh, 
Exploring the Role of Fricatives in Classifying Healthy Subjects and Patients with Amyotrophic Lateral Sclerosis and Parkinson's Disease.
ICASSP2023
Tanuka Bhattacharjee, Chowdam Venkata Thirumala Kumar, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Prasanta Kumar Ghosh, 
Static and Dynamic Source and Filter Cues for Classification of Amyotrophic Lateral Sclerosis Patients and Healthy Subjects.
ICASSP2023
Abhayjeet Singh, Amala Nagireddi, Deekshitha G, Jesuraja Bandekar, Roopa R., Sandhya Badiger, Sathvik Udupa, Prasanta Kumar Ghosh, Hema A. Murthy, Heiga Zen, Pranaw Kumar, Kamal Kant, Amol Bole, Bira Chandra Singh, Keiichi Tokuda, Mark Hasegawa-Johnson, Philipp Olbrich, 
Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech.
ICASSP2023
Sathvik Udupa, Prasanta Kumar Ghosh, 
Real-Time MRI Video Synthesis from Time Aligned Phonemes with Sequence-to-Sequence Networks.
ICASSP2023
Sathvik Udupa, C. Siddarth, Prasanta Kumar Ghosh, 
Improved Acoustic-to-Articulatory Inversion Using Representations from Pretrained Self-Supervised Learning Models.
Interspeech2023
Jesuraja Bandekar, Sathvik Udupa, Prasanta Kumar Ghosh, 
Exploring a classification approach using quantised articulatory movements for acoustic to articulatory inversion.
Interspeech2023
Varun Belagali, M. V. Achuth Rao, Prasanta Kumar Ghosh, 
Weakly supervised glottis segmentation in high-speed videoendoscopy using bounding box labels.
Interspeech2023
Tanuka Bhattacharjee, Anjali Jayakumar, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Prasanta Kumar Ghosh, 
Transfer Learning to Aid Dysarthria Severity Classification for Patients with Amyotrophic Lateral Sclerosis.
Interspeech2023
Siddarth Chandrasekar, Arvind Ramesh, Tilak Purohit, Prasanta Kumar Ghosh, 
A Study on the Importance of Formant Transitions for Stop-Consonant Classification in VCV Sequence.
Interspeech2023
Shelly Jain, Priyanshi Pal, Anil Kumar Vuppala, Prasanta Kumar Ghosh, Chiranjeevi Yarra, 
An Investigation of Indian Native Language Phonemic Influences on L2 English Pronunciations.
Interspeech2023
Chowdam Venkata Thirumala Kumar, Tanuka Bhattacharjee, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Prasanta Kumar Ghosh, 
Classification of Multi-class Vowels and Fricatives From Patients Having Amyotrophic Lateral Sclerosis with Varied Levels of Dysarthria Severity.
Interspeech2023
Mohammad Shaique Solanki, Ashutosh Bharadwaj, Jeevan Kylash, Prasanta Kumar Ghosh, 
Do Vocal Breath Sounds Encode Gender Cues for Automatic Gender Classification?
SpeechComm2022
Chiranjeevi Yarra, Prasanta Kumar Ghosh, 
Automatic syllable stress detection under non-parallel label and data condition.
ICASSP2022
Aravind Illa, Aanish Nair, Prasanta Kumar Ghosh, 
The impact of cross language on acoustic-to-articulatory inversion and its influence on articulatory speech synthesis.
ICASSP2022
Abinay Reddy Naini, Bhavuk Singhal, Prasanta Kumar Ghosh, 
Dual Attention Pooling Network for Recording Device Classification Using Neutral and Whispered Speech.
ICASSP2022
Anwesha Roy, Varun Belagali, Prasanta Kumar Ghosh, 
An Error Correction Scheme for Improved Air-Tissue Boundary in Real-Time MRI Video for Speech Production.
Interspeech2022
Anish Bhanushali, Grant Bridgman, Deekshitha G, Prasanta Kumar Ghosh, Pratik Kumar, Saurabh Kumar, Adithya Raj Kolladath, Nithya Ravi, Aaditeshwar Seth, Ashish Seth, Abhayjeet Singh, Vrunda N. Sukhadia, Srinivasan Umesh, Sathvik Udupa, Lodagala V. S. V. Durga Prasad, 
Gram Vaani ASR Challenge on spontaneous telephone speech recordings in regional variations of Hindi.
Interspeech2022
Anwesha Roy, Varun Belagali, Prasanta Kumar Ghosh, 
Air tissue boundary segmentation using regional loss in real-time Magnetic Resonance Imaging video for speech production.
Interspeech2022
C. Siddarth, Sathvik Udupa, Prasanta Kumar Ghosh, 
Watch Me Speak: 2D Visualization of Human Mouth during Speech.
TASLP2024
Hang Chen, Qing Wang 0008, Jun Du, Bao-Cai Yin, Jia Pan, Chin-Hui Lee 0001, 
Optimizing Audio-Visual Speech Enhancement Using Multi-Level Distortion Measures for Audio-Visual Speech Recognition.
TASLP2024
Zilu Guo, Qing Wang 0008, Jun Du, Jia Pan, Qing-Feng Liu, Chin-Hui Lee 0001, 
A Variance-Preserving Interpolation Approach for Diffusion Models With Applications to Single Channel Speech Enhancement and Recognition.
ICASSP2024
Feng Ma, Yanhui Tu, Maokui He, Ruoyu Wang 0029, Shutong Niu, Lei Sun 0010, Zhongfu Ye, Jun Du, Jia Pan, Chin-Hui Lee 0001, 
A Spatial Long-Term Iterative Mask Estimation Approach for Multi-Channel Speaker Diarization and Speech Recognition.
ICASSP2024
Haotian Wang, Jun Du, Yusheng Dai, Chin-Hui Lee 0001, Yuling Ren, Yu Liu, 
Improving Multi-Modal Emotion Recognition Using Entropy-Based Fusion and Pruning-Based Network Architecture Optimization.
ICASSP2024
Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang 0029, Hongbo Lan, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao, 
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
ICASSP2024
Gaobin Yang, Maokui He, Shutong Niu, Ruoyu Wang 0029, Yanyan Yue, Shuangqing Qian, Shilong Wu, Jun Du, Chin-Hui Lee 0001, 
Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture.
ICASSP2024
Hao Yen, Sabato Marco Siniscalchi, Chin-Hui Lee 0001, 
Boosting End-to-End Multilingual Phoneme Recognition Through Exploiting Universal Speech Attributes Constraints.
SpeechComm2023
Shi Cheng, Jun Du, Shutong Niu, Alejandrina Cristià, Xin Wang 0037, Qing Wang 0008, Chin-Hui Lee 0001, 
Using iterative adaptation and dynamic mask for child speech extraction under real-world multilingual conditions.
SpeechComm2023
Li Chai 0002, Hang Chen, Jun Du, Qing-Feng Liu, Chin-Hui Lee 0001, 
Space-and-speaker-aware acoustic modeling with effective data augmentation for recognition of multi-array conversational speech.
TASLP2023
Mao-Kui He, Jun Du, Qing-Feng Liu, Chin-Hui Lee 0001, 
ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding.
TASLP2023
Shutong Niu, Jun Du, Lei Sun 0010, Yu Hu 0003, Chin-Hui Lee 0001, 
QDM-SSD: Quality-Aware Dynamic Masking for Separation-Based Speaker Diarization.
ICASSP2023
Hang Chen, Shilong Wu, Yusheng Dai, Zhe Wang, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge.
ICASSP2023
Shutong Niu, Jun Du, Qing Wang 0008, Li Chai 0002, Huaxin Wu, Zhaoxu Nian, Lei Sun 0010, Yi Fang, Jia Pan, Chin-Hui Lee 0001, 
An Experimental Study on Sound Event Localization and Detection Under Realistic Testing Conditions.
ICASSP2023
Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition.
ICASSP2023
Chao-Han Huck Yang, Bo Li 0028, Yu Zhang 0033, Nanxin Chen, Tara N. Sainath, Sabato Marco Siniscalchi, Chin-Hui Lee 0001, 
A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition.
ICASSP2023
Chenyue Zhang, Hang Chen, Jun Du, Bao-Cai Yin, Jia Pan, Chin-Hui Lee 0001, 
Incorporating Visual Information Reconstruction into Progressive Learning for Optimizing audio-visual Speech Enhancement.
Interspeech2023
Zilu Guo, Jun Du, Chin-Hui Lee 0001, Yu Gao, Wenbin Zhang, 
Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement.
Interspeech2023
Pin-Jui Ku, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee 0001, 
A Multi-dimensional Deep Structured State Space Approach to Speech Enhancement Using Small-footprint Models.
Interspeech2023
Shutong Niu, Jun Du, Maokui He, Chin-Hui Lee 0001, Baoxiang Li, Jiakui Li, 
Unsupervised Adaptation with Quality-Aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization.
Interspeech2023
Haotian Wang, Jun Du, Hengshun Zhou, Chin-Hui Lee 0001, Yuling Ren, Jiangjiang Zhao, 
A Multiple-Teacher Pruning Based Self-Distillation (MT-PSD) Approach to Model Compression for Audio-Visual Wake Word Spotting.
TASLP2024
Vinay Kothapally, John H. L. Hansen, 
Monaural Speech Dereverberation Using Deformable Convolutional Networks.
TASLP2024
Nursadul Mamun, John H. L. Hansen, 
Speech Enhancement for Cochlear Implant Recipients Using Deep Complex Convolution Transformer With Frequency Transformation.
ICASSP2024
John H. L. Hansen, Aditya Joglekar, Meena M. Chandra Shekar, Szu-Jui Chen, Xi Liu, 
Fearless Steps Apollo: Team Communications Based Community Resource Development for Science, Technology, Education, and Historical Preservation.
ICASSP2024
Taylor Lawson, John H. L. Hansen, 
Situational Signal Processing with Ecological Momentary Assessment: Leveraging Environmental Context for Cochlear Implant Users.
ICASSP2024
Xi Liu, Szu-Jui Chen, John H. L. Hansen, 
Dual-Path Minimum-Phase and All-Pass Decomposition Network for Single Channel Speech Dereverberation.
ICASSP2024
Mufan Sang, John H. L. Hansen, 
Efficient Adapter Tuning of Pre-Trained Speech Models for Automatic Speaker Verification.
ICASSP2024
Meena M. Chandra Shekar, John H. L. Hansen, 
Apollo's Unheard Voices: Graph Attention Networks for Speaker Diarization and Clustering for Fearless Steps Apollo Collection.
SpeechComm2023
Midia Yousefi, John H. L. Hansen, 
Single-channel speech separation using soft-minimum permutation invariant training.
TASLP2023
Shahram Ghorbani, John H. L. Hansen, 
Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech.
TASLP2023
Wei Xia, John H. L. Hansen, 
Attention and DCT Based Global Context Modeling for Text-Independent Speaker Recognition.
ICASSP2023
Iván López-Espejo, Ram C. M. C. Shekar, Zheng-Hua Tan, Jesper Jensen 0001, John H. L. Hansen, 
Filterbank Learning for Noise-Robust Small-Footprint Keyword Spotting.
ICASSP2023
Mufan Sang, Yong Zhao 0008, Gang Liu 0001, John H. L. Hansen, Jian Wu 0027, 
Improving Transformer-Based Networks with Locality for Automatic Speaker Verification.
Interspeech2023
Nursadul Mamun, John H. L. Hansen, 
CFTNet: Complex-valued Frequency Transformation Network for Speech Enhancement.
Interspeech2023
Meena M. Chandra Shekar, John H. L. Hansen, 
Speaker Tracking using Graph Attention Networks with Varying Duration Utterances across Multi-Channel Naturalistic Data: Fearless Steps Apollo-11 Audio Corpus.
Interspeech2023
Ram C. M. C. Shekar, Mu Yang, Kevin Hirschi, Stephen D. Looney, Okim Kang, John H. L. Hansen, 
Assessment of Non-Native Speech Intelligibility using Wav2vec2-based Mispronunciation Detection and Multi-level Goodness of Pronunciation Transformer.
Interspeech2023
Jiamin Xie, John H. L. Hansen, 
MixRep: Hidden Representation Mixup for Low-Resource Speech Recognition.
Interspeech2023
Mu Yang, Ram C. M. C. Shekar, Okim Kang, John H. L. Hansen, 
What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model.
SpeechComm2022
Rasa Lileikyte, Dwight Irvin, John H. L. Hansen, 
Assessing child communication engagement and statistical speech patterns for American English via speech recognition in naturalistic active learning spaces.
TASLP2022
Vinay Kothapally, John H. L. Hansen, 
SkipConvGAN: Monaural Speech Dereverberation Using Generative Adversarial Networks via Complex Time-Frequency Masking.
TASLP2022
Zhenyu Wang, John H. L. Hansen, 
Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition.
TASLP2024
Syu-Siang Wang, Jia-Yang Chen, Bo-Ren Bai, Shih-Hau Fang, Yu Tsao 0001, 
Unsupervised Face-Masked Speech Enhancement Using Generative Adversarial Networks With Human-in-the-Loop Assessment Metrics.
ICASSP2024
Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-Based ASR.
ICASSP2024
Yuan Tseng, Layne Berry, Yiting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Poyao Huang 0001, Chun-Mao Lai, Shang-Wen Li 0001, David Harwath, Yu Tsao 0001, Abdelrahman Mohamed, Chi-Luen Feng, Hung-Yi Lee, 
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.
ICASSP2024
Haibin Wu, Heng-Cheng Kuo, Yu Tsao 0001, Hung-Yi Lee, 
Scalable Ensemble-Based Detection Method Against Adversarial Attacks For Speaker Verification.
ICASSP2024
Ryandhimas E. Zezario, Bo-Ren Brian Bai, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao 0001, 
Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model.
ICLR2024
Szu-Wei Fu, Kuo-Hsuan Hung, Yu Tsao 0001, Yu-Chiang Frank Wang, 
Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech.
TASLP2023
Yen-Ju Lu, Chia-Yu Chang, Cheng Yu, Ching-Feng Liu, Jeih-weih Hung, Shinji Watanabe 0001, Yu Tsao 0001, 
Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information.
TASLP2023
Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen 0011, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao 0001, 
Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features.
ICASSP2023
Chan-Jan Hsu, Ho-Lam Chung, Hung-Yi Lee, Yu Tsao 0001, 
T5lephone: Bridging Speech and Text Self-Supervised Models for Spoken Language Understanding Via Phoneme Level T5.
ICASSP2023
Hsin-Yi Lin, Huan-Hsin Tseng, Yu Tsao 0001, 
On the Robustness of Non-Intrusive Speech Quality Model by Adversarial Examples.
Interspeech2023
Hsin-Hao Chen 0006, Yung-Lun Chien, Ming-Chi Yen, Shu-Wei Tsai, Tai-Shih Chi, Hsin-Min Wang, Yu Tsao 0001, 
Mandarin Electrolaryngeal Speech Voice Conversion using Cross-domain Features.
Interspeech2023
Li-Wei Chen, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao 0001, Hsin-Min Wang, 
A Training and Inference Strategy Using Noisy and Enhanced Speech as Target for Speech Enhancement without Clean Speech.
Interspeech2023
Yung-Lun Chien, Hsin-Hao Chen 0006, Ming-Chi Yen, Shu-Wei Tsai, Hsin-Min Wang, Yu Tsao 0001, Tai-Shih Chi, 
Audio-Visual Mandarin Electrolaryngeal Speech Voice Conversion.
Interspeech2023
Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao 0001, 
Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition.
ICLR2023
Chi-Chang Lee, Yu Tsao 0001, Hsin-Min Wang, Chu-Song Chen, 
D4AM: A General Denoising Framework for Downstream Acoustic Models.
TASLP2022
Shang-Yi Chuang, Hsin-Min Wang, Yu Tsao 0001, 
Improved Lite Audio-Visual Speech Enhancement.
TASLP2022
Yu-Chen Lin, Cheng Yu, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao 0001, Tei-Wei Kuo, 
SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points.
ICASSP2022
Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao 0001, 
MetricGAN-U: Unsupervised Speech Enhancement/ Dereverberation Based Only on Noisy/ Reverberated Speech.
ICASSP2022
Yu-Chen Lin, Tsun-An Hsieh, Kuo-Hsuan Hung, Cheng Yu, Harinath Garudadri, Yu Tsao 0001, Tei-Wei Kuo, 
Speech Recovery For Real-World Self-Powered Intermittent Devices.
ICASSP2022
Guan-Ting Lin, Chan-Jan Hsu, Da-Rong Liu, Hung-Yi Lee, Yu Tsao 0001, 
Analyzing The Robustness of Unsupervised Speech Recognition.
TASLP2024
Chenfeng Miao, Qingying Zhu, Minchuan Chen, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
EfficientTTS 2: Variational End-to-End Text-to-Speech Synthesis and Voice Conversion.
ICASSP2024
Yimin Deng, Huaizhen Tang, Xulong Zhang 0001, Ning Cheng 0001, Jing Xiao 0006, Jianzong Wang, 
Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval.
ICASSP2024
Haobin Tang, Xulong Zhang 0001, Ning Cheng 0001, Jing Xiao 0006, Jianzong Wang, 
ED-TTS: Multi-Scale Emotion Modeling Using Cross-Domain Emotion Diarization for Emotional Speech Synthesis.
ICASSP2024
Zeyu Yang, Minchuan Chen, Yanping Li, Wei Hu, Shaojun Wang, Jing Xiao 0006, Zijian Li, 
ESVC: Combining Adaptive Style Fusion and Multi-Level Feature Disentanglement for Expressive Singing Voice Conversion.
ICASSP2024
Yong Zhang, Hanzhang Li, Zhitao Li, Ning Cheng 0001, Ming Li, Jing Xiao 0006, Jianzong Wang, 
Leveraging Biases in Large Language Models: "bias-kNN" for Effective Few-Shot Learning.
ICASSP2024
Ziyang Zhuang, Kun Zou, Chenfeng Miao, Ming Fang, Tao Wei, Zijian Li, Wei Hu, Shaojun Wang, Jing Xiao 0006, 
Improving Attention-Based End-to-End Speech Recognition by Monotonic Alignment Attention Matrix Reconstruction.
ICML2024
Chenfeng Miao, Qingying Zhu, Minchuan Chen, Wei Hu, Zijian Li, Shaojun Wang, Jing Xiao 0006, 
DFlow: A Generative Model Combining Denoising AutoEncoder and Normalizing Flow for High Fidelity Waveform Generation.
ICASSP2023
Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Xiaoyang Qu, Jing Xiao 0006, 
Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General Sound Classification.
ICASSP2023
Ganghui Ru, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Improving Music Genre Classification from multi-modal Properties of Music and Genre Correlations Perspective.
ICASSP2023
Huaizhen Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Learning Speech Representations with Flexible Hidden Feature Dimensions.
ICASSP2023
Huaizhen Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
VQ-CL: Learning Disentangled Speech Representations with Contrastive Learning and Vector Quantization.
ICASSP2023
Haobin Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis.
ICASSP2023
Xulong Zhang 0001, Haobin Tang, Jianzong Wang, Ning Cheng 0001, Jian Luo, Jing Xiao 0006, 
Dynamic Alignment Mask CTC: Improved Mask CTC With Aligned Cross Entropy.
ICASSP2023
Kexin Zhu, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Improving EEG-based Emotion Recognition by Fusing Time-Frequency and Spatial Representations.
Interspeech2023
Minchuan Chen, Chenfeng Miao, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
Exploring multi-task learning and data augmentation in dementia detection with self-supervised pretrained models.
Interspeech2023
Jiaxin Fan, Yong Zhang, Hanzhang Li, Jianzong Wang, Zhitao Li, Sheng Ouyang, Ning Cheng 0001, Jing Xiao 0006, 
Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism.
Interspeech2023
Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao 0006, 
SVVAD: Personal Voice Activity Detection for Speaker Verification.
Interspeech2023
Yifu Sun, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Kaiyu Hu, Jing Xiao 0006, 
Investigation of Music Emotion Recognition Based on Segmented Semi-Supervised Learning.
Interspeech2023
Fengyun Tan, Chaofeng Feng, Tao Wei, Shuai Gong, Jinqiang Leng, Wei Chu, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
Improving End-to-End Modeling For Mandarin-English Code-Switching Using Lightweight Switch-Routing Mixture-of-Experts.
Interspeech2023
Haobin Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis.
ICASSP2024
Tiantian Feng, Rajat Hebbar, Shrikanth Narayanan, 
TRUST-SER: On The Trustworthiness Of Fine-Tuning Pre-Trained Speech Embeddings For Speech Emotion Recognition.
ICASSP2024
Tiantian Feng, Shrikanth Narayanan, 
Foundation Model Assisted Automatic Speech Emotion Recognition: Transcribing, Annotating, and Augmenting.
ICASSP2024
Yoonsoo Nam, Adam Lehavi, Daniel Yang, Digbalay Bose, Swabha Swayamdipta, Shrikanth Narayanan, 
Does Video Summarization Require Videos? Quantifying the Effectiveness of Language in Video Summarization.
ICASSP2024
Shanti Stewart, Kleanthis Avramidis, Tiantian Feng, Shrikanth Narayanan, 
Emotion-Aligned Contrastive Learning Between Images and Music.
ICASSP2024
Anfeng Xu, Kevin Huang, Tiantian Feng, Helen Tager-Flusberg, Shrikanth Narayanan, 
Audio-Visual Child-Adult Speaker Classification in Dyadic Interactions.
ICASSP2023
Nikolaos Antoniou, Athanasios Katsamanis, Theodoros Giannakopoulos, Shrikanth Narayanan, 
Designing and Evaluating Speech Emotion Recognition Systems: A Reality Check Case Study with IEMOCAP.
ICASSP2023
Digbalay Bose, Rajat Hebbar, Krishna Somandepalli, Shrikanth Narayanan, 
Contextually-Rich Human Affect Perception Using Multimodal Scene Information.
ICASSP2023
Georgios Chochlakis, Gireesh Mahajan, Sabyasachee Baruah, Keith Burghardt, Kristina Lerman, Shrikanth Narayanan, 
Using Emotion Embeddings to Transfer Knowledge between Emotions, Languages, and Annotation Formats.
ICASSP2023
Rajat Hebbar, Digbalay Bose, Krishna Somandepalli, Veena Vijai, Shrikanth Narayanan, 
A Dataset for Audio-Visual Sound Event Detection in Movies.
ICASSP2023
Xuan Shi, Erica Cooper, Xin Wang 0037, Junichi Yamagishi, Shrikanth Narayanan, 
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?
ICASSP2023
Tuo Zhang, Tiantian Feng, Samiul Alam, Sunwoo Lee, Mi Zhang 0002, Shrikanth S. Narayanan, Salman Avestimehr, 
FedAudio: A Federated Learning Benchmark for Audio Tasks.
Interspeech2023
Reed Blaylock, Shrikanth Narayanan, 
Beatboxing Kick Drum Kinematics.
Interspeech2023
Rimita Lahiri, Tiantian Feng, Rajat Hebbar, Catherine Lord, So Hyun Kim, Shrikanth Narayanan, 
Robust Self Supervised Speech Embeddings for Child-Adult Classification in Interactions involving Children with Autism.
Interspeech2023
Thomas Melistas, Lefteris Kapelonis, Nikolaos Antoniou, Petros Mitseas, Dimitris Sgouropoulos, Theodoros Giannakopoulos, Athanasios Katsamanis, Shrikanth Narayanan, 
Cross-Lingual Features for Alzheimer's Dementia Detection from Speech.
Interspeech2023
Shrikanth Narayanan, 
Bridging Speech Science and Technology - Now and Into the Future.
Interspeech2023
Anfeng Xu, Rajat Hebbar, Rimita Lahiri, Tiantian Feng, Lindsay Butler, Lue Shen, Helen Tager-Flusberg, Shrikanth Narayanan, 
Understanding Spoken Language Development of Children with ASD Using Pre-trained Speech Embeddings.
ICASSP2022
Tiantian Feng, Hanieh Hashemi, Murali Annavaram, Shrikanth S. Narayanan, 
Enhancing Privacy Through Domain Adaptive Noise Injection For Speech Emotion Recognition.
Interspeech2022
Tiantian Feng, Shrikanth Narayanan, 
Semi-FedSER: Semi-supervised Learning for Speech Emotion Recognition On Federated Learning using Multiview Pseudo-Labeling.
Interspeech2022
Tiantian Feng, Raghuveer Peri, Shrikanth Narayanan, 
User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition on Federated Learning.
Interspeech2022
Nikolaos Flemotomos, Shrikanth Narayanan, 
Multimodal Clustering with Role Induced Constraints for Speaker Diarization.
ICASSP2024
Yu Gu, Qiushi Zhu, Guangzhi Lei, Chao Weng, Dan Su 0002, 
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis.
ICML2024
Manjie Xu, Chenxing Li, Duzhen Zhang, Dan Su 0002, Wei Liang, Dong Yu 0001, 
Prompt-guided Precise Audio Editing with Diffusion Models.
ACL2024
Yongxin Zhu 0003, Dan Su 0002, Liqiang He, Linli Xu, Dong Yu 0001, 
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer.
Interspeech2023
Wei Xiao, Wenzhe Liu, Meng Wang, Shan Yang, Yupeng Shi, Yuyong Kang, Dan Su 0002, Shidong Shang, Dong Yu 0001, 
Multi-mode Neural Speech Coding Based on Deep Generative Networks.
Interspeech2023
Yuping Yuan, Zhao You, Shulin Feng, Dan Su 0002, Yanchun Liang 0001, Xiaohu Shi, Dong Yu 0001, 
Compressed MoE ASR Model Based on Knowledge Distillation and Quantization.
Interspeech2023
Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu 0001, Zhao You, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation.
AAAI2023
Yi Lei, Shan Yang, Xinsheng Wang, Qicong Xie, Jixun Yao, Lei Xie 0001, Dan Su 0002, 
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis.
ICASSP2022
Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu 0001, Helen Meng, Chao Weng, Dan Su 0002, 
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.
ICASSP2022
Songxiang Liu, Shan Yang, Dan Su 0002, Dong Yu 0001, 
Referee: Towards Reference-Free Cross-Speaker Style Transfer with Low-Quality Data for Expressive Speech Synthesis.
ICASSP2022
Dongpeng Ma, Yiwen Wang, Liqiang He, Mingjie Jin, Dan Su 0002, Dong Yu 0001, 
DP-DWA: Dual-Path Dynamic Weight Attention Network With Streaming DFSMN-SAN For Automatic Speech Recognition.
ICASSP2022
Xiaoyi Qin, Na Li 0012, Chao Weng, Dan Su 0002, Ming Li 0026, 
Simple Attention Module Based Speaker Verification with Iterative Noisy Label Detection.
ICASSP2022
Disong Wang, Shan Yang, Dan Su 0002, Xunying Liu, Dong Yu 0001, Helen Meng, 
VCVTS: Multi-Speaker Video-to-Speech Synthesis Via Cross-Modal Knowledge Transfer from Voice Conversion.
ICASSP2022
Zhao You, Shulin Feng, Dan Su 0002, Dong Yu 0001, 
SpeechMoE2: Mixture-of-Experts Model with Improved Routing.
ICASSP2022
Naijun Zheng, Na Li 0012, Xixin Wu, Lingwei Meng, Jiawen Kang 0002, Haibin Wu, Chao Weng, Dan Su 0002, Helen Meng, 
The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
ICASSP2022
Naijun Zheng, Na Li 0012, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
Multi-Channel Speaker Diarization Using Spatial Features for Meetings.
Interspeech2022
Yi Lei, Shan Yang, Jian Cong, Lei Xie 0001, Dan Su 0002, 
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion.
Interspeech2022
Xiaoyi Qin, Na Li 0012, Chao Weng, Dan Su 0002, Ming Li 0026, 
Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings.
Interspeech2022
Liumeng Xue, Shan Yang, Na Hu, Dan Su 0002, Lei Xie 0001, 
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers.
Interspeech2022
Yixuan Zhou 0002, Changhe Song, Jingbei Li, Zhiyong Wu 0001, Yanyao Bian, Dan Su 0002, Helen Meng, 
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis.
Interspeech2022
Yixuan Zhou 0002, Changhe Song, Xiang Li 0105, Luwen Zhang, Zhiyong Wu 0001, Yanyao Bian, Dan Su 0002, Helen Meng, 
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis.
ICASSP2024
SooHwan Eom, Eunseop Yoon, Hee Suk Yoon, Chanwoo Kim 0001, Mark Hasegawa-Johnson, Chang D. Yoo, 
AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition.
ICASSP2024
Heting Gao, Mark Hasegawa-Johnson, Chang D. Yoo, 
G2PU: Grapheme-To-Phoneme Transducer with Speech Units.
ICASSP2024
Liming Wang, Mark Hasegawa-Johnson, Chang D. Yoo, 
Unsupervised Speech Recognition with N-skipgram and Positional Unigram Matching.
ICML2024
Heting Gao, Kaizhi Qian, Junrui Ni, Chuang Gan, Mark A. Hasegawa-Johnson, Shiyu Chang, Yang Zhang 0001, 
Speech Self-Supervised Learning Using Diffusion Model Synthetic Data.
ICASSP2023
Abhayjeet Singh, Amala Nagireddi, Deekshitha G, Jesuraja Bandekar, Roopa R., Sandhya Badiger, Sathvik Udupa, Prasanta Kumar Ghosh, Hema A. Murthy, Heiga Zen, Pranaw Kumar, Kamal Kant, Amol Bole, Bira Chandra Singh, Keiichi Tokuda, Mark Hasegawa-Johnson, Philipp Olbrich, 
Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech.
ICASSP2023
Zhongweiyang Xu, Xulin Fan, Mark Hasegawa-Johnson, 
Dual-Path Cross-Modal Attention for Better Audio-Visual Speech Extraction.
Interspeech2023
Wonjune Kang, Mark Hasegawa-Johnson, Deb Roy, 
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions.
Interspeech2023
Jialu Li 0002, Mark Hasegawa-Johnson, Nancy L. McElwain, 
Towards Robust Family-Infant Audio Analysis Based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio.
Interspeech2023
Eunseop Yoon, Hee Suk Yoon, Dhananjaya Gowda, SooHwan Eom, Daehyeok Kim, John B. Harvill, Heting Gao, Mark Hasegawa-Johnson, Chanwoo Kim 0001, Chang D. Yoo, 
Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction.
Interspeech2023
Wanyue Zhai, Mark Hasegawa-Johnson, 
Wav2ToBI: a new approach to automatic ToBI transcription.
ACL2023
Liming Wang, Mark Hasegawa-Johnson, Chang Dong Yoo, 
A Theory of Unsupervised Speech Recognition.
ACL-Findings2023
Liming Wang, Junrui Ni, Heting Gao, Jialu Li 0002, Kai Chieh Chang, Xulin Fan, Junkai Wu, Mark Hasegawa-Johnson, Chang Dong Yoo, 
Listen, Decipher and Sign: Toward Unsupervised Speech-to-Sign Language Recognition.
ACL-Findings2023
Eunseop Yoon, Hee Suk Yoon, John B. Harvill, Mark Hasegawa-Johnson, Chang Dong Yoo, 
INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition.
SpeechComm2022
Heting Gao, Xiaoxuan Wang, Sunghun Kang, Rusty Mina, Dias Issa, John B. Harvill, Leda Sari, Mark Hasegawa-Johnson, Chang D. Yoo, 
Seamless equal accuracy ratio for inclusive CTC speech recognition.
TASLP2022
Jialu Li 0002, Mark Hasegawa-Johnson, 
Autosegmental Neural Nets 2.0: An Extensive Study of Training Synchronous and Asynchronous Phones and Tones for Under-Resourced Tonal Languages.
ICASSP2022
Chak Ho Chan, Kaizhi Qian, Yang Zhang 0001, Mark Hasegawa-Johnson, 
SpeechSplit2.0: Unsupervised Speech Disentanglement for Voice Conversion without Tuning Autoencoder Bottlenecks.
ICASSP2022
John B. Harvill, Yash R. Wani, Moitreya Chatterjee, Mustafa Alam, David G. Beiser, David Chestek, Mark Hasegawa-Johnson, Narendra Ahuja, 
Detection of Covid-19 from Joint Time and Frequency Analysis of Speech, Breathing and Cough Audio.
Interspeech2022
Heting Gao, Junrui Ni, Kaizhi Qian, Yang Zhang 0001, Shiyu Chang, Mark Hasegawa-Johnson, 
WavPrompt: Towards Few-Shot Spoken Language Understanding with Frozen Language Models.
Interspeech2022
John B. Harvill, Mark Hasegawa-Johnson, Chang D. Yoo, 
Frame-Level Stutter Detection.
Interspeech2022
Mahir Morshed, Mark Hasegawa-Johnson, 
Cross-lingual articulatory feature information transfer for speech recognition using recurrent progressive neural networks.
ICASSP2024
Alexander H. Liu, Sung-Lin Yeh, James R. Glass, 
Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective.
NAACL2024
Heng-Jui Chang, James R. Glass, 
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces.
ICASSP2023
Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas 0001, Rogério Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James R. Glass, 
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval.
Interspeech2023
Yuan Gong 0001, Sameer Khurana, Leonid Karlinsky, James R. Glass, 
Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers.
Interspeech2023
Heng-Jui Chang, Alexander H. Liu, James R. Glass, 
Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering.
Interspeech2023
Andrew Rouditchenko, Sameer Khurana, Samuel Thomas 0001, Rogério Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James R. Glass, 
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages.
NeurIPS2023
Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass, 
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning.
ICLR2023
Yuan Gong 0001, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James R. Glass, 
Contrastive Audio-Visual Masked Autoencoder.
ICASSP2022
Yuan Gong 0001, Ziyi Chen, Iek-Heng Chu, Peng Chang 0002, James R. Glass, 
Transformer-Based Multi-Aspect Multi-Granularity Non-Native English Speaker Pronunciation Assessment.
ICASSP2022
Yuan Gong 0001, Jin Yu, James R. Glass, 
VocalSound: A Dataset for Improving Human Vocal Sounds Recognition.
ICASSP2022
R'mani Haulcy, Katerina Placek, Brian Tracey, Adam P. Vogel, James R. Glass, 
Repetition Assessment for Speech and Language Disorders: A Study of the Logopenic Variant of Primary Progressive Aphasia.
ICASSP2022
Sameer Khurana, Antoine Laurent, James R. Glass, 
Magic Dust for Cross-Lingual Adaptation of Monolingual Wav2vec-2.0.
ICASSP2022
Cheng-I Jeff Lai, Erica Cooper, Yang Zhang 0001, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David D. Cox, James R. Glass, 
On the Interplay between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis.
Interspeech2022
Alexander H. Liu, Cheng-I Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James R. Glass, 
Simple and Effective Unsupervised Speech Synthesis.
AAAI2022
Yuan Gong 0001, Cheng-I Lai, Yu-An Chung, James R. Glass, 
SSAST: Self-Supervised Audio Spectrogram Transformer.
TASLP2021
Yuan Gong 0001, Yu-An Chung, James R. Glass, 
PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation.
ICASSP2021
Yu-An Chung, Yonatan Belinkov, James R. Glass, 
Similarity Analysis of Self-Supervised Speech Representations.
ICASSP2021
Cheng-I Lai, Yung-Sung Chuang, Hung-Yi Lee, Shang-Wen Li 0001, James R. Glass, 
Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining.
Interspeech2021
Yuan Gong 0001, Yu-An Chung, James R. Glass, 
AST: Audio Spectrogram Transformer.
Interspeech2021
R'mani Haulcy, James R. Glass, 
CLAC: A Speech Corpus of Healthy English Speakers.
ICASSP2024
W. Ronny Huang, Cyril Allauzen, Tongzhou Chen, Kilol Gupta, Ke Hu, James Qin, Yu Zhang 0033, Yongqiang Wang, Shuo-Yiin Chang, Tara N. Sainath, 
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study.
ICASSP2023
Ke Hu, Tara N. Sainath, Bo Li 0028, Nan Du 0002, Yanping Huang, Andrew M. Dai, Yu Zhang 0033, Rodrigo Cabrera, Zhifeng Chen, Trevor Strohman, 
Massively Multilingual Shallow Fusion with Large Language Models.
ICASSP2023
Dongseong Hwang, Khe Chai Sim, Yu Zhang 0033, Trevor Strohman, 
Comparison of Soft and Hard Target RNN-T Distillation for Large-Scale ASR.
ICASSP2023
Bo Li 0028, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang 0033, Wei Han 0002, Trevor Strohman, Françoise Beaufays, 
Efficient Domain Adaptation for Speech Foundation Models.
ICASSP2023
Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang 0033, Bo Li 0028, Andrew Rosenberg, Bhuvana Ramabhadran, 
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition.
ICASSP2023
Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang 0033, Ankur Bapna, Andrew Rosenberg, Bhuvana Ramabhadran, 
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech.
ICASSP2023
Yongqiang Wang, Zhehuai Chen, Chengjian Zheng, Yu Zhang 0033, Wei Han 0002, Parisa Haghani, 
Accelerating RNN-T Training and Inference Using CTC Guidance.
ICASSP2023
Gary Wang, Kyle Kastner, Ankur Bapna, Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang 0033, 
Understanding Shared Speech-Text Representations.
ICASSP2023
Chao-Han Huck Yang, Bo Li 0028, Yu Zhang 0033, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman, 
From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition.
ICASSP2023
Chao-Han Huck Yang, Bo Li 0028, Yu Zhang 0033, Nanxin Chen, Tara N. Sainath, Sabato Marco Siniscalchi, Chin-Hui Lee 0001, 
A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition.
Interspeech2023
Zih-Ching Chen, Chao-Han Huck Yang, Bo Li 0028, Yu Zhang 0033, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath, 
How to Estimate Model Transferability of Pre-Trained Speech Models?
Interspeech2023
Ke Hu, Bo Li 0028, Tara N. Sainath, Yu Zhang 0033, Françoise Beaufays, 
Mixture-of-Expert Conformer for Streaming Multilingual ASR.
Interspeech2023
Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang 0033, Wei Han 0002, Ankur Bapna, 
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus.
ICML2023
Yong Cheng, Yu Zhang 0033, Melvin Johnson, Wolfgang Macherey, Ankur Bapna, 
Mu2SLAM: Multitask, Multilingual Speech and Language Models.
ICASSP2022
Junwen Bai, Bo Li 0028, Yu Zhang 0033, Ankur Bapna, Nikhil Siddhartha, Khe Chai Sim, Tara N. Sainath, 
Joint Unsupervised and Supervised Training for Multilingual ASR.
ICASSP2022
Zhehuai Chen, Yu Zhang 0033, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Gary Wang, 
TTS4Pretrain 2.0: Advancing the use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses.
ICASSP2022
Bo Li 0028, Ruoming Pang, Yu Zhang 0033, Tara N. Sainath, Trevor Strohman, Parisa Haghani, Yun Zhu, Brian Farris, Neeraj Gaur, Manasa Prasad, 
Massively Multilingual ASR: A Lifelong Learning Solution.
ICASSP2022
Qiujia Li, Yu Zhang 0033, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland, 
Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition.
ICASSP2022
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang 0001, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.
ICASSP2022
Joel Shor, Aren Jansen, Wei Han 0002, Daniel S. Park, Yu Zhang 0033, 
Universal Paralinguistic Speech Representations Using Self-Supervised Conformers.
TASLP2024
Yang Ai, Zhen-Hua Ling, 
Low-Latency Neural Speech Phase Prediction Based on Parallel Estimation Architecture and Anti-Wrapping Losses for Speech Generation Tasks.
TASLP2024
Zhaoci Liu, Liping Chen, Ya-Jun Hu, Zhen-Hua Ling, Jia Pan, 
PE-Wav2vec: A Prosody-Enhanced Speech Model for Self-Supervised Prosody Learning in TTS.
TASLP2024
Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling, 
Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement.
ICASSP2024
Shihao Chen, Liping Chen, Jie Zhang 0042, Kong-Aik Lee, Zhenhua Ling, Lirong Dai 0001, 
Adversarial Speech for Voice Privacy Protection from Personalized Speech Generation.
ICASSP2024
Liping Chen, Kong Aik Lee, Wu Guo, Zhen-Hua Ling, 
Modeling Pseudo-Speaker Uncertainty in Voice Anonymization.
ICASSP2024
Kangdi Mei, Zhaoci Liu, Hui-Peng Du, Hengyu Li, Yang Ai, Liping Chen, Zhenhua Ling, 
Considering Temporal Connection between Turns for Conversational Speech Synthesis.
ICASSP2024
Qing-Tian Xu, Jie Zhang 0042, Zhen-Hua Ling, 
An End-to-End EEG Channel Selection Method with Residual Gumbel Softmax for Brain-Assisted Speech Enhancement.
ACL-Findings2024
Qian Wang, Jia-Chen Gu, Zhen-Hua Ling, 
X-ACE: Explainable and Multi-factor Audio Captioning Evaluation.
TASLP2023
Yang Ai, Zhen-Hua Ling, 
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra.
TASLP2023
Chang Liu, Zhen-Hua Ling, Ling-Hui Chen, 
Pronunciation Dictionary-Free Multilingual Speech Synthesis Using Learned Phonetic Representations.
ICASSP2023
Yang Ai, Zhen-Hua Ling, 
Neural Speech Phase Prediction Based on Parallel Estimation Architecture and Anti-Wrapping Losses.
ICASSP2023
Kangdi Mei, Xinyun Ding, Yinlong Liu, Zhiqiang Guo, Feiyang Xu, Xin Li 0064, Tuya Naren, Jiahong Yuan, Zhenhua Ling, 
The USTC System for ADReSS-M Challenge.
ICASSP2023
Jing-Xuan Zhang, Genshun Wan, Zhen-Hua Ling, Jia Pan, Jianqing Gao, Cong Liu 0006, 
Self-Supervised Audio-Visual Speech Representations Learning by Multimodal Self-Distillation.
ICASSP2023
Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling, 
Speech Reconstruction from Silent Tongue and Lip Articulation by Pseudo Target Generation and Domain Adversarial Training.
Interspeech2023
Jie Zhang 0042, Qing-Tian Xu, Qiu-Shi Zhu, Zhen-Hua Ling, 
BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions.
Interspeech2023
Zhaoci Liu, Zhen-Hua Ling, Ya-Jun Hu, Jia Pan, Jin-Wei Wang, Yun-Di Wu, 
Speech Synthesis with Self-Supervisedly Learnt Prosodic Representations.
Interspeech2023
Ye-Xin Lu, Yang Ai, Zhen-Hua Ling, 
MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra.
Interspeech2023
Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling, 
Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation.
EMNLP-Findings2023
Yue Chen, Tianwei He, Hongbin Zhou, Jia-Chen Gu, Heng Lu 0002, Zhen-Hua Ling, 
Symbolization, Prompt, and Classification: A Framework for Implicit Speaker Identification in Novels.
TASLP2022
Yang Ai, Zhen-Hua Ling, Wei-Lu Wu, Ang Li, 
Denoising-and-Dereverberation Hierarchical Neural Vocoder for Statistical Parametric Speech Synthesis.
TASLP2024
Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu 0012, Shuo Ren, Shujie Liu 0001, Zhuoyuan Yao, Xun Gong 0005, Li-Rong Dai 0001, Jinyu Li 0001, Furu Wei, 
SpeechLM: Enhanced Speech Pre-Training With Unpaired Textual Data.
ICASSP2024
Shihao Chen, Liping Chen, Jie Zhang 0042, Kong-Aik Lee, Zhenhua Ling, Lirong Dai 0001, 
Adversarial Speech for Voice Privacy Protection from Personalized Speech Generation.
ICASSP2024
Jianwei Cui, Yu Gu, Chao Weng, Jie Zhang 0042, Liping Chen, Lirong Dai 0001, 
SiFiSinger: A High-Fidelity End-to-End Singing Voice Synthesizer Based on Source-Filter Model.
ICASSP2024
Yichi Wang, Jie Zhang 0042, Shihao Chen, Weitai Zhang, Zhongyi Ye, Xinyuan Zhou, Lirong Dai 0001, 
A Study of Multichannel Spatiotemporal Features and Knowledge Distillation on Robust Target Speaker Extraction.
ICASSP2024
Weitai Zhang, Hanyi Zhang, Chenxuan Liu, Zhongyi Ye, Xinyuan Zhou, Chao Lin, Lirong Dai 0001, 
Pre-Trained Acoustic-and-Textual Modeling for End-To-End Speech-To-Text Translation.
AAAI2024
Qiushi Zhu, Jie Zhang 0042, Yu Gu, Yuchen Hu, Lirong Dai 0001, 
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation.
TASLP2023
Jie Zhang 0042, Rui Tao, Jun Du, Li-Rong Dai 0001, 
SDW-SWF: Speech Distortion Weighted Single-Channel Wiener Filter for Noise Reduction.
TASLP2023
Qiu-Shi Zhu, Jie Zhang 0042, Ziqiang Zhang, Li-Rong Dai 0001, 
A Joint Speech Enhancement and Self-Supervised Representation Learning Framework for Noise-Robust Speech Recognition.
ICASSP2023
Hang-Rui Hu, Yan Song 0001, Jian-Tao Zhang, Li-Rong Dai 0001, Ian McLoughlin 0001, Zhu Zhuo, Yu Zhou, Yu-Hong Li, Hui Xue, 
Stargan-vc Based Cross-Domain Data Augmentation for Speaker Verification.
ICASSP2023
Haitao Xu, Liangfa Wei, Jie Zhang 0042, Jianming Yang, Yannan Wang, Tian Gao, Xin Fang, Li-Rong Dai 0001, 
A Multi-Scale Feature Aggregation Based Lightweight Network for Audio-Visual Speech Enhancement.
ICASSP2023
Qiu-Shi Zhu, Long Zhou, Jie Zhang 0042, Shujie Liu 0001, Yu-Chen Hu, Li-Rong Dai 0001, 
Robust Data2VEC: Noise-Robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning.
Interspeech2023
Kang Li, Yan Song 0001, Ian McLoughlin 0001, Lin Liu 0017, Jin Li, Li-Rong Dai 0001, 
Fine-tuning Audio Spectrogram Transformer with Task-aware Adapters for Sound Event Detection.
Interspeech2023
Mohan Shi, Zhihao Du, Qian Chen 0003, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang 0042, Li-Rong Dai 0001, 
CASA-ASR: Context-Aware Speaker-Attributed ASR.
Interspeech2023
Mohan Shi, Yuchun Shu, Lingyun Zuo, Qian Chen 0003, Shiliang Zhang, Jie Zhang 0042, Li-Rong Dai 0001, 
Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction.
Interspeech2023
Jingyuan Wang, Jie Zhang 0042, Li-Rong Dai 0001, 
Real-Time Causal Spectro-Temporal Voice Activity Detection Based on Convolutional Encoding and Residual Decoding.
Interspeech2023
Xiao-Min Zeng, Yan Song 0001, Ian McLoughlin 0001, Lin Liu 0017, Li-Rong Dai 0001, 
Robust Prototype Learning for Anomalous Sound Detection.
ICASSP2022
Han Chen, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu 0017, 
Self-Supervised Representation Learning for Unsupervised Anomalous Sound Detection Under Domain Shift.
ICASSP2022
Xing-Yu Chen, Qiu-Shi Zhu, Jie Zhang 0042, Li-Rong Dai 0001, 
Supervised and Self-Supervised Pretraining Based Covid-19 Detection Using Acoustic Breathing/Cough/Speech Signals.
ICASSP2022
Hang-Rui Hu, Yan Song 0001, Ying Liu, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu 0017, 
Domain Robust Deep Embedding Learning for Speaker Recognition.
ICASSP2022
Yuxuan Xi, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu 0017, 
Frontend Attributes Disentanglement for Speech Emotion Recognition.
SpeechComm2024
Shikha Baghel, Shreyas Ramoji, Somil Jain, Pratik Roy Chowdhuri, Prachi Singh, Deepu Vijayasenan, Sriram Ganapathy, 
Summary of the DISPLACE challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments.
TASLP2024
Varun Krishna, Tarun Sai, Sriram Ganapathy, 
Representation Learning With Hidden Unit Clustering for Low Resource Speech Applications.
TASLP2024
Anurenjan Purushothaman, Debottam Dutta, Rohit Kumar, Sriram Ganapathy, 
Speech Dereverberation With Frequency Domain Autoregressive Modeling.
ICASSP2024
Shikhar Bharadwaj, Min Ma, Shikhar Vashishth, Ankur Bapna, Sriram Ganapathy, Vera Axelrod, Siddharth Dalmia, Wei Han, Yu Zhang, Daan van Esch, Sandy Ritchie, Partha Talukdar, Jason Riesa, 
Multimodal Modeling for Spoken Language Identification.
ICASSP2024
Soumya Dutta, Sriram Ganapathy, 
Zero Shot Audio To Audio Emotion Transfer With Speaker Disentanglement.
ICASSP2023
Prachi Singh, Amrit Kaul, Sriram Ganapathy, 
Supervised Hierarchical Clustering Using Graph Neural Networks for Speaker Diarization.
Interspeech2023
Shikha Baghel, Shreyas Ramoji, Sidharth, Ranjana H, Prachi Singh, Somil Jain, Pratik Roy Chowdhuri, Kaustubh Kulkarni, Swapnil Padhi, Deepu Vijayasenan, Sriram Ganapathy, 
The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments.
Interspeech2023
Akshara Soman, Vidhi Sinha, Sriram Ganapathy, 
Enhancing the EEG Speech Match Mismatch Tasks With Word Boundaries.
Interspeech2023
Shikhar Vashishth, Shikhar Bharadwaj, Sriram Ganapathy, Ankur Bapna, Min Ma, Wei Han 0002, Vera Axelrod, Partha Talukdar, 
Label Aware Speech Representation Learning For Language Identification.
EMNLP2023
Darshan Prabhu, Preethi Jyothi, Sriram Ganapathy, Vinit Unni, 
Accented Speech Recognition With Accent-specific Codebooks.
ICASSP2022
Soumya Dutta, Sriram Ganapathy, 
Multimodal Transformer with Learnable Frontend and Self Attention for Emotion Recognition.
ICASSP2022
Varun Krishna, Sriram Ganapathy, 
Self Supervised Representation Learning with Deep Clustering for Acoustic Unit Discovery from Raw Speech.
ICASSP2022
Rohit Kumar, Anurenjan Purushothaman, Anirudh Sreeram, Sriram Ganapathy, 
End-To-End Speech Recognition with Joint Dereverberation of Sub-Band Autoregressive Envelopes.
ICASSP2022
Neeraj Kumar Sharma 0001, Srikanth Raj Chetupalli, Debarpan Bhattacharya, Debottam Dutta, Pravin Mote, Sriram Ganapathy, 
The Second Dicova Challenge: Dataset and Performance Analysis for Diagnosis of Covid-19 Using Acoustics.
Interspeech2022
Shrutina Agarwal, Naoya Takahashi, Sriram Ganapathy, 
Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer.
Interspeech2022
Tarun Sai Bandarupalli, Shakti Rath, Nirmesh Shah, Naoyuki Onoe, Sriram Ganapathy, 
Semi-supervised Acoustic and Language Modeling for Hindi ASR.
Interspeech2022
Debarpan Bhattacharya, Debottam Dutta, Neeraj Kumar Sharma 0001, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori, Suhail K. K, Sadhana Gonuguntla, Murali Alagesan, 
Coswara: A website application enabling COVID-19 screening by analysing respiratory sound samples and health symptoms.
Interspeech2022
Debarpan Bhattacharya, Debottam Dutta, Neeraj Kumar Sharma 0001, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori, Suhail K. K, Sadhana Gonuguntla, Murali Alagesan, 
Analyzing the impact of SARS-CoV-2 variants on respiratory sound signals.
Interspeech2022
Srikanth Raj Chetupalli, Sriram Ganapathy, 
Speaker conditioned acoustic modeling for multi-speaker conversational ASR.
Interspeech2022
Debottam Dutta, Debarpan Bhattacharya, Sriram Ganapathy, Amir Hossein Poorjam, Deepak Mittal, Maneesh Singh 0001, 
Acoustic Representation Learning on Breathing and Speech Signals for COVID-19 Detection.
TASLP2024
Magdalena Rybicka, Jesús Villalba 0001, Thomas Thebaud, Najim Dehak, Konrad Kowalczyk, 
End-to-End Neural Speaker Diarization With Non-Autoregressive Attractors.
Interspeech2023
Jesús Villalba 0001, Jonas Borgstrom, Maliha Jahan, Saurabh Kataria, Leibny Paola García, Pedro A. Torres-Carrasquillo, Najim Dehak, 
Advances in Language Recognition in Low Resource African Languages: The JHU-MIT Submission for NIST LRE22.
Interspeech2023
Saurabhchand Bhati, Jesús Villalba 0001, Laureano Moro-Velázquez, Thomas Thebaud, Najim Dehak, 
Segmental SpeechCLIP: Utilizing Pretrained Image-text Models for Audio-Visual Learning.
Interspeech2023
Anna Favaro, Tianyu Cao 0003, Thomas Thebaud, Jesús Villalba 0001, Ankur A. Butala, Najim Dehak, Laureano Moro-Velázquez, 
Do Phonatory Features Display Robustness to Characterize Parkinsonian Speech Across Corpora?
Interspeech2023
Saurabh Kataria, Jesús Villalba 0001, Laureano Moro-Velázquez, Thomas Thebaud, Najim Dehak, 
Self-FiLM: Conditioning GANs with self-supervised representations for bandwidth extension based speaker recognition.
Interspeech2023
Helin Wang, Thomas Thebaud, Jesús Villalba 0001, Myra Sydnor, Becky Lammers, Najim Dehak, Laureano Moro-Velázquez, 
DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model.
Interspeech2022
Jaejin Cho, Raghavendra Pappagari, Piotr Zelasko, Laureano Moro-Velázquez, Jesús Villalba 0001, Najim Dehak, 
Non-contrastive self-supervised learning of utterance-level speech representations.
Interspeech2022
Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesús Villalba 0001, Sanjeev Khudanpur, Najim Dehak, 
Defense against Adversarial Attacks on Hybrid Speech Recognition System using Adversarial Fine-tuning with Denoiser.
Interspeech2022
Sonal Joshi, Saurabh Kataria, Jesús Villalba 0001, Najim Dehak, 
AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification.
Interspeech2022
Saurabh Kataria, Jesús Villalba 0001, Laureano Moro-Velázquez, Najim Dehak, 
Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification.
Interspeech2022
Magdalena Rybicka, Jesús Villalba 0001, Najim Dehak, Konrad Kowalczyk, 
End-to-End Neural Speaker Diarization with an Iterative Refinement of Non-Autoregressive Attention-based Attractors.
Interspeech2022
Yiwen Shao, Jesús Villalba 0001, Sonal Joshi, Saurabh Kataria, Sanjeev Khudanpur, Najim Dehak, 
Chunking Defense for Adversarial Attacks on ASR.
ICASSP2021
Nanxin Chen, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak, 
Focus on the Present: A Regularization Method for the ASR Source-Target Attention Layer.
ICASSP2021
Jaejin Cho, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak, 
Improving Reconstruction Loss Based Speaker Embedding in Unsupervised and Semi-Supervised Scenarios.
ICASSP2021
Siyuan Feng 0001, Piotr Zelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak, 
How Phonotactics Affect Multilingual and Zero-Shot ASR Performance.
ICASSP2021
Saurabh Kataria, Jesús Villalba 0001, Najim Dehak, 
Perceptual Loss Based Speech Denoising with an Ensemble of Audio Pattern Recognition and Self-Supervised Models.
ICASSP2021
Raghavendra Pappagari, Jesús Villalba 0001, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak, 
CopyPaste: An Augmentation Method for Speech Emotion Recognition.
ICASSP2021
Liming Wang, Xinsheng Wang, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak, 
Align or attend? Toward More Efficient and Accurate Spoken Word Discovery Using Speech-to-Image Retrieval.
Interspeech2021
Saurabhchand Bhati, Jesús Villalba 0001, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak, 
Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation.
Interspeech2021
Nanxin Chen, Piotr Zelasko, Laureano Moro-Velázquez, Jesús Villalba 0001, Najim Dehak, 
Align-Denoise: Single-Pass Non-Autoregressive Speech Recognition.
TASLP2024
Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang 0001, 
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research.
AAAI2024
Xuxin Cheng, Zhihong Zhu, Hongxiang Li, Yaowei Li, Xianwei Zhuang, Yuexian Zou, 
Towards Multi-Intent Spoken Language Understanding via Hierarchical Attention and Optimal Transport.
AAAI2024
Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren 0006, Yuexian Zou, Zhou Zhao, Shinji Watanabe 0001, 
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
ACL2024
Xianwei Zhuang, Xuxin Cheng, Liming Liang, Yuxin Xie, Zhichang Wang, Zhiqi Huang, Yuexian Zou, 
PCAD: Towards ASR-Robust Spoken Language Understanding via Prototype Calibration and Asymmetric Decoupling.
ACL-Findings2024
Xuxin Cheng, Zhihong Zhu, Bang Yang, Xianwei Zhuang, Hongxiang Li, Yuexian Zou, 
Cyclical Contrastive Learning Based on Geodesic for Zero-shot Cross-lingual Spoken Language Understanding.
ACL-Findings2024
Xuxin Cheng, Zhihong Zhu, Xianwei Zhuang, Zhanpeng Chen, Zhiqi Huang, Yuexian Zou, 
MoE-SLU: Towards ASR-Robust Spoken Language Understanding via Mixture-of-Experts.
TASLP2023
Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu 0001, 
Integrating Lattice-Free MMI Into End-to-End Speech Recognition.
TASLP2023
Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu 0001, 
Diffsound: Discrete Diffusion Model for Text-to-Sound Generation.
ICASSP2023
Xuxin Cheng, Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Yuexian Zou, 
M3ST: Mix at Three Levels for Speech Translation.
ICASSP2023
Xuxin Cheng, Zhihong Zhu, Hongxiang Li, Yaowei Li, Yuexian Zou, 
SSVMR: Saliency-Based Self-Training for Video-Music Retrieval.
ICASSP2023
Tengtao Song, Nuo Chen 0001, Ji Jiang, Zhihong Zhu, Yuexian Zou, 
Improving Retrieval-Based Dialogue System Via Syntax-Informed Attention.
ICASSP2023
Zhihong Zhu, Weiyuan Xu, Xuxin Cheng, Tengtao Song, Yuexian Zou, 
A Dynamic Graph Interactive Framework with Label-Semantic Injection for Spoken Language Understanding.
Interspeech2023
Xuxin Cheng, Ziyu Yao 0001, Zhihong Zhu, Yaowei Li, Hongxiang Li, Yuexian Zou, 
C²A-SLU: Cross and Contrastive Attention for Improving ASR Robustness in Spoken Language Understanding.
Interspeech2023
Xuxin Cheng, Wanshi Xu, Ziyu Yao 0001, Zhihong Zhu, Yaowei Li, Hongxiang Li, Yuexian Zou, 
FC-MTLF: A Fine- and Coarse-grained Multi-Task Learning Framework for Cross-Lingual Spoken Language Understanding.
Interspeech2023
Xuxin Cheng, Zhihong Zhu, Ziyu Yao 0001, Hongxiang Li, Yaowei Li, Yuexian Zou, 
GhostT5: Generate More Features with Cheap Operations to Improve Textless Spoken Question Answering.
Interspeech2023
Yifei Xin, Dongchao Yang, Yuexian Zou, 
Background-aware Modeling for Weakly Supervised Sound Event Detection.
Interspeech2023
Yifei Xin, Yuexian Zou, 
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions.
Interspeech2023
Dongchao Yang, Songxiang Liu, Helin Wang, Jianwei Yu, Chao Weng, Yuexian Zou, 
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS.
Interspeech2023
Zhihong Zhu, Xuxin Cheng, Dongsheng Chen, Zhiqi Huang, Hongxiang Li, Yuexian Zou, 
Mix before Align: Towards Zero-shot Cross-lingual Sentiment Analysis via Soft-Mix and Multi-View Learning.
ACL-Findings2023
Xuxin Cheng, Bowen Cao, Qichen Ye, Zhihong Zhu, Hongxiang Li, Yuexian Zou, 
ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding.
ICASSP2024
Jingyu Li, Tan Lee 0001, 
Efficient Black-Box Speaker Verification Model Adaptation With Reprogramming And Backend Learning.
ICASSP2024
Wei Liu, Ying Qin, Zhiyuan Peng, Tan Lee 0001, 
Sparsely Shared Lora on Whisper for Child Speech Recognition.
ICASSP2024
Yusheng Tian, Jingyu Li, Tan Lee 0001, 
Creating Personalized Synthetic Voices from Articulation Impaired Speech Using Augmented Reconstruction Loss.
TASLP2023
Guangyan Zhang, Ying Qin, Wenjie Zhang, Jialun Wu, Mei Li, Yutao Gai, Feijun Jiang, Tan Lee 0001, 
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis Based on Disentanglement Between Prosody and Timbre.
ICASSP2023
Jingyu Li, Yusheng Tian, Tan Lee 0001, 
Convolution-Based Channel-Frequency Attention for Text-Independent Speaker Verification.
ICASSP2023
Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li 0119, Zejun Ma, Tan Lee 0001, 
Leveraging Phone-Level Linguistic-Acoustic Similarity For Utterance-Level Pronunciation Scoring.
ICASSP2023
Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li 0119, Zejun Ma, Tan Lee 0001, 
An ASR-Free Fluency Scoring Approach with Self-Supervised Learning.
ICASSP2023
Zhiyuan Peng, Mingjie Shao, Xuanji He, Xu Li, Tan Lee 0001, Ke Ding, Guanglu Wan, 
Covariance Regularization for Probabilistic Linear Discriminant Analysis.
Interspeech2023
Jingyu Li, Wei Liu, Zhaoyang Zhang 0001, Jiong Wang, Tan Lee 0001, 
Model Compression for DNN-based Speaker Verification Using Weight Quantization.
Interspeech2023
Wei Liu, Zhiyuan Peng, Tan Lee 0001, 
CoMFLP: Correlation Measure Based Fast Search on ASR Layer Pruning.
Interspeech2023
Si Ioi Ng, Cymie Wing-Yee Ng, Tan Lee 0001, 
A Study on Using Duration and Formant Features in Automatic Detection of Speech Sound Disorder in Children.
Interspeech2023
Dehua Tao, Tan Lee 0001, Harold Chui, Sarah Luk, 
A Study on Prosodic Entrainment in Relation to Therapist Empathy in Counseling Conversation.
Interspeech2023
Yusheng Tian, Guangyan Zhang, Tan Lee 0001, 
Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models.
Interspeech2023
Yujia Xiao, Shaofei Zhang, Xi Wang 0016, Xu Tan 0003, Lei He 0005, Sheng Zhao, Frank K. Soong, Tan Lee 0001, 
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading.
TASLP2022
Shuiyang Mao, P. C. Ching, Tan Lee 0001, 
Enhancing Segment-Based Speech Emotion Recognition by Iterative Self-Learning.
ICASSP2022
Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan 0003, Sheng Zhao, Tan Lee 0001, 
A Study on the Efficacy of Model Pre-Training In Developing Neural Text-to-Speech System.
Interspeech2022
Jonathan Him Nok Lee, Dehua Tao, Harold Chui, Tan Lee 0001, Sarah Luk, Nicolette Wing Tung Lee, Koonkan Fung, 
Durational Patterning at Discourse Boundaries in Relation to Therapist Empathy in Psychotherapy.
Interspeech2022
Jingyu Li, Wei Liu, Tan Lee 0001, 
EDITnet: A Lightweight Network for Unsupervised Domain Adaptation in Speaker Verification.
Interspeech2022
Si Ioi Ng, Cymie Wing-Yee Ng, Jiarui Wang, Tan Lee 0001, 
Automatic Detection of Speech Sound Disorder in Child Speech Using Posterior-based Speaker Representations.
Interspeech2022
Zhiyuan Peng, Xuanji He, Ke Ding, Tan Lee 0001, Guanglu Wan, 
Unifying Cosine and PLDA Back-ends for Speaker Verification.
TASLP2024
Federico Landini, Mireia Díez, Themos Stafylakis, Lukás Burget, 
DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors.
ICASSP2024
Karel Benes, Martin Kocour, Lukás Burget, 
Hystoc: Obtaining Word Confidences for Fusion of End-To-End ASR Systems.
ICASSP2024
Jiangyu Han, Federico Landini, Johan Rohdin, Mireia Díez, Lukás Burget, Yuhang Cao, Heng Lu, Jan Cernocký, 
Diacorrect: Error Correction Back-End for Speaker Diarization.
ICASSP2024
Dominik Klement, Mireia Díez, Federico Landini, Lukás Burget, Anna Silnova, Marc Delcroix, Naohiro Tawara, 
Discriminative Training of VBx Diarization.
ICASSP2023
Sofoklis Kakouros, Themos Stafylakis, Ladislav Mosner, Lukás Burget, 
Speech-Based Emotion Recognition with Self-Supervised Models Using Attentive Channel-Wise Correlations and Label Smoothing.
ICASSP2023
Federico Landini, Mireia Díez, Alicia Lozano-Diez, Lukás Burget, 
Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization.
ICASSP2023
Junyi Peng, Themos Stafylakis, Rongzhi Gu, Oldrich Plchot, Ladislav Mosner, Lukás Burget, Jan Cernocký, 
Parameter-Efficient Transfer Learning of Pre-Trained Transformer Models for Speaker Verification Using Adapters.
ICASSP2023
Anna Silnova, Niko Brümmer, Albert Swart, Lukás Burget, 
Toroidal Probabilistic Spherical Discriminant Analysis.
Interspeech2023
Marc Delcroix, Naohiro Tawara, Mireia Díez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukás Burget, Shoko Araki, 
Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization.
Interspeech2023
Pavel Matejka, Anna Silnova, Josef Slavícek, Ladislav Mosner, Oldrich Plchot, Michal Klco, Junyi Peng, Themos Stafylakis, Lukás Burget, 
Description and Analysis of ABC Submission to NIST LRE 2022.
Interspeech2023
Ladislav Mosner, Oldrich Plchot, Junyi Peng, Lukás Burget, Jan Cernocký, 
Multi-Channel Speech Separation with Cross-Attention and Beamforming.
Interspeech2023
Junyi Peng, Oldrich Plchot, Themos Stafylakis, Ladislav Mosner, Lukás Burget, Jan Cernocký, 
Improving Speaker Verification with Self-Pretrained Transformer Models.
TASLP2022
Lucas Ondel, Bolaji Yusuf, Lukás Burget, Murat Saraçlar, 
Non-Parametric Bayesian Subspace Models for Acoustic Unit Discovery.
ICASSP2022
Jiangyu Han, Yanhua Long, Lukás Burget, Jan Cernocký, 
DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation and Extraction.
ICASSP2022
Ladislav Mosner, Oldrich Plchot, Lukás Burget, Jan Honza Cernocký, 
Multisv: Dataset for Far-Field Multi-Channel Speaker Verification.
ICASSP2022
Lucas Ondel, Léa-Marie Lam-Yee-Mui, Martin Kocour, Caio Filippo Corro, Lukás Burget, 
GPU-Accelerated Forward-Backward Algorithm with Application to Lattice-Free MMI.
Interspeech2022
Murali Karthick Baskar, Tim Herzig, Diana Nguyen, Mireia Díez, Tim Polzehl, Lukás Burget, Jan Cernocký, 
Speaker adaptation for Wav2vec2 based dysarthric ASR.
Interspeech2022
Niko Brümmer, Albert Swart, Ladislav Mosner, Anna Silnova, Oldrich Plchot, Themos Stafylakis, Lukás Burget, 
Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings.
Interspeech2022
Martin Kocour, Katerina Zmolíková, Lucas Ondel, Jan Svec, Marc Delcroix, Tsubasa Ochiai, Lukás Burget, Jan Cernocký, 
Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model.
Interspeech2022
Federico Landini, Alicia Lozano-Diez, Mireia Díez, Lukás Burget, 
From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization.
ICASSP2024
Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction.
ICASSP2024
Xueyuan Chen, Xi Wang 0016, Shaofei Zhang, Lei He 0005, Zhiyong Wu 0001, Xixin Wu, Helen Meng, 
Stylespeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis.
ICASSP2024
Shun Lei, Yixuan Zhou 0002, Liyang Chen, Dan Luo, Zhiyong Wu 0001, Xixin Wu, Shiyin Kang, Tao Jiang, Yahui Zhou, Yuxing Han 0001, Helen Meng, 
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts.
ICASSP2024
Hui Lu, Xixin Wu, Haohan Guo, Songxiang Liu, Zhiyong Wu 0001, Helen Meng, 
Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations.
ICASSP2024
Yuejiao Wang, Xixin Wu, Disong Wang, Lingwei Meng, Helen Meng, 
UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization.
ICML2024
Dongchao Yang, Jinchuan Tian, Xu Tan 0003, Rongjie Huang, Songxiang Liu, Haohan Guo, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian 0002, Zhou Zhao, Xixin Wu, Helen M. Meng, 
UniAudio: Towards Universal Audio Generation with Large Language Models.
TASLP2023
Haohan Guo, Fenglong Xie, Xixin Wu, Frank K. Soong, Helen Meng, 
MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS.
TASLP2023
Shun Lei, Yixuan Zhou 0002, Liyang Chen, Zhiyong Wu 0001, Xixin Wu, Shiyin Kang, Helen Meng, 
MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis.
TASLP2023
Xixin Wu, Hui Lu, Kun Li 0003, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Hiformer: Sequence Modeling Networks With Hierarchical Attention Mechanisms.
ICASSP2023
Jinchao Li, Kaitao Song, Junan Li, Bo Zheng, Dongsheng Li 0002, Xixin Wu, Xunying Liu, Helen Meng, 
Leveraging Pretrained Representations With Task-Related Keywords for Alzheimer's Disease Detection.
ICASSP2023
Jinchao Li, Xixin Wu, Kaitao Song, Dongsheng Li 0002, Xunying Liu, Helen Meng, 
A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition.
ICASSP2023
Yuhao Liu, Cheng Gong, Longbiao Wang, Xixin Wu, Qiuyu Liu, Jianwu Dang 0001, 
VF-Taco2: Towards Fast and Lightweight Synthesis for Autoregressive Models with Variation Autoencoder and Feature Distillation.
ICASSP2023
Lingwei Meng, Jiawen Kang 0002, Mingyu Cui, Yuejiao Wang, Xixin Wu, Helen Meng, 
A Sidecar Separator Can Convert A Single-Talker Speech Recognition System to A Multi-Talker One.
Interspeech2023
Yunxiang Li, Pengfei Liu 0003, Xixin Wu, Helen Meng, 
PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts.
Interspeech2023
Lingwei Meng, Jiawen Kang 0002, Mingyu Cui, Haibin Wu, Xixin Wu, Helen Meng, 
Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator.
Interspeech2023
Helen Meng, Brian Mak, Man-Wai Mak, Helene H. Fung, Xianmin Gong, Timothy C. Y. Kwok, Xunying Liu, Vincent C. T. Mok, Patrick C. M. Wong, Jean Woo, Xixin Wu, Ka Ho Wong, Sean Shensheng Xu, Naijun Zheng, Ranzo Huang, Jiawen Kang 0002, Xiaoquan Ke, Junan Li, Jinchao Li, Yi Wang, 
Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders.
ICASSP2022
Hang Su, Danyang Zhao, Long Dang, Minglei Li 0001, Xixin Wu, Xunying Liu, Helen Meng, 
A Multitask Learning Framework for Speaker Change Detection with Content Information from Unsupervised Speech Decomposition.
ICASSP2022
Disong Wang, Songxiang Liu, Xixin Wu, Hui Lu, Lifa Sun, Xunying Liu, Helen Meng, 
Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation.
ICASSP2022
Xixin Wu, Shoukang Hu, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Neural Architecture Search for Speech Emotion Recognition.
ICASSP2022
Haibin Wu, Bo Zheng, Xu Li 0015, Xixin Wu, Hung-Yi Lee, Helen Meng, 
Characterizing the Adversarial Vulnerability of Speech self-Supervised Learning.
ICASSP2024
Shengpeng Ji, Jialong Zuo, Minghui Fang 0002, Ziyue Jiang 0004, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao, 
TextrolSpeech: A Text Style Control Speech Corpus with Codec Language Text-to-Speech Models.
ICML2024
Rongjie Huang, Ruofan Hu, Yongqi Wang, Zehan Wang 0001, Xize Cheng, Ziyue Jiang 0001, Zhenhui Ye, Dongchao Yang, Luping Liu, Peng Gao 0007, Zhou Zhao, 
InstructSpeech: Following Speech Editing Instructions via Large Language Models.
ICML2024
Dongchao Yang, Jinchuan Tian, Xu Tan 0003, Rongjie Huang, Songxiang Liu, Haohan Guo, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian 0002, Zhou Zhao, Xixin Wu, Helen M. Meng, 
UniAudio: Towards Universal Audio Generation with Large Language Models.
ICLR2024
Ziyue Jiang 0001, Jinglin Liu, Yi Ren 0006, Jinzheng He, Zhenhui Ye, Shengpeng Ji, Qian Yang, Chen Zhang 0020, Pengfei Wei 0001, Chunfeng Wang, Xiang Yin 0006, Zejun Ma, Zhou Zhao, 
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis.
AAAI2024
Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren 0006, Yuexian Zou, Zhou Zhao, Shinji Watanabe 0001, 
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
AAAI2024
Yu Zhang 0126, Rongjie Huang, Ruiqi Li, Jinzheng He, Yan Xia 0006, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao, 
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis.
ACL2024
Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Jinchuan Tian, Zhenhui Ye, Luping Liu, Zehan Wang 0001, Ziyue Jiang 0001, Xuankai Chang, Jiatong Shi, Chao Weng, Zhou Zhao, Dong Yu 0001, 
Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.
ACL2024
Shengpeng Ji, Ziyue Jiang 0001, Hanting Wang, Jialong Zuo, Zhou Zhao, 
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech.
ACL2024
Songju Lei, Xize Cheng, Mengjiao Lyu, Jianqiao Hu, Jintao Tan, Runlin Liu, Lingyu Xiong, Tao Jin 0004, Xiandong Li, Zhou Zhao, 
Uni-Dubbing: Zero-Shot Speech Synthesis from Visual Articulation.
ACL2024
Ruiqi Li, Yu Zhang 0126, Yongqi Wang, Zhiqing Hong, Rongjie Huang, Zhou Zhao, 
Robust Singing Voice Transcription Serves Synthesis.
ACL2024
Qian Yang, Jin Xu, Wenrui Liu, Yunfei Chu, Ziyue Jiang 0001, Xiaohuan Zhou, Yichong Leng, Yuanjun Lv, Zhou Zhao, Chang Zhou, Jingren Zhou, 
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension.
NAACL2024
Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, Ruiqi Li, Wenrui Liu, Fuming You, Tao Jin, Zhou Zhao, 
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt.
ACL-Findings2024
Xize Cheng, Rongjie Huang, Linjun Li, Zehan Wang 0001, Tao Jin 0004, Aoxiong Yin, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao, 
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation.
ACL-Findings2024
Ruiqi Li, Rongjie Huang, Yongqi Wang, Zhiqing Hong, Zhou Zhao, 
Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion.
ACL-Findings2024
Huadai Liu, Rongjie Huang, Jinzheng He, Gang Sun, Ran Shen, Xize Cheng, Zhou Zhao, 
Wav2SQL: Direct Generalizable Speech-To-SQL Parsing.
ICASSP2023
Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen 0003, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren 0006, Zhou Zhao, 
Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG).
ICASSP2023
Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen 0003, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren 0006, Zhou Zhao, 
MUG: A General Meeting Understanding and Generation Benchmark.
ICML2023
Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren 0006, Luping Liu, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin 0006, Zhou Zhao, 
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models.
NeurIPS2023
Haoyi Duan, Yan Xia 0006, Mingze Zhou, Li Tang, Jieming Zhu, Zhou Zhao, 
Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks.
ICLR2023
Rongjie Huang, Jinglin Liu, Huadai Liu, Yi Ren 0006, Lichao Zhang, Jinzheng He, Zhou Zhao, 
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation.
ICASSP2024
Hangting Chen, Jianwei Yu, Chao Weng, 
Complexity Scaling for Speech Denoising.
ICASSP2024
Shuai Wang 0016, Qibing Bai, Qi Liu 0018, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li 0001, 
Leveraging in-the-wild Data for Effective Self-supervised Pretraining in Speaker Recognition.
ICASSP2024
Jianwei Yu, Hangting Chen, Yanyao Bian, Xiang Li, Yi Luo, Jinchuan Tian, Mengyang Liu, Jiayi Jiang, Shuai Wang, 
AutoPrep: An Automatic Preprocessing Framework for In-The-Wild Speech Data.
AAAI2024
Yaoxun Xu, Hangting Chen, Jianwei Yu, Qiaochu Huang, Zhiyong Wu 0001, Shi-Xiong Zhang, Guangzhi Li, Yi Luo 0004, Rongzhi Gu, 
SECap: Speech Emotion Captioning with Large Language Model.
TASLP2023
Yi Luo 0004, Jianwei Yu, 
Music Source Separation With Band-Split RNN.
TASLP2023
Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu 0001, 
Integrating Lattice-Free MMI Into End-to-End Speech Recognition.
TASLP2023
Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu 0001, 
Diffsound: Discrete Diffusion Model for Text-to-Sound Generation.
ICASSP2023
Jianwei Yu, Hangting Chen, Yi Luo 0004, Rongzhi Gu, Weihua Li, Chao Weng, 
TSpeech-AI System Description to the 5th Deep Noise Suppression (DNS) Challenge.
ICASSP2023
Jianwei Yu, Yi Luo 0004, 
Efficient Monaural Speech Enhancement with Universal Sample Rate Band-Split RNN.
Interspeech2023
Yi Luo 0004, Jianwei Yu, 
FRA-RIR: Fast Random Approximation of the Image-source Method.
Interspeech2023
Hangting Chen, Jianwei Yu, Yi Luo 0004, Rongzhi Gu, Weihua Li, Zhuocheng Lu, Chao Weng, 
Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression.
Interspeech2023
Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Jiajun Deng, Mingyu Cui, Guinan Li, Jianwei Yu, Xurong Xie, Xunying Liu, 
Use of Speech Impairment Severity for Dysarthric Speech Recognition.
Interspeech2023
Mengzhe Geng, Xurong Xie, Rongfeng Su, Jianwei Yu, Zengrui Jin, Tianzi Wang, Shujie Hu, Zi Ye 0001, Helen Meng, Xunying Liu, 
On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition.
Interspeech2023
Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu 0001, Shinji Watanabe 0001, 
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction.
Interspeech2023
Dongchao Yang, Songxiang Liu, Helin Wang, Jianwei Yu, Chao Weng, Yuexian Zou, 
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS.
Interspeech2023
Jianwei Yu, Hangting Chen, Yi Luo 0004, Rongzhi Gu, Chao Weng, 
High Fidelity Speech Enhancement with Band-split RNN.
TASLP2022
Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
ICASSP2022
Guinan Li, Jianwei Yu, Jiajun Deng, Xunying Liu, Helen Meng, 
Audio-Visual Multi-Channel Speech Separation, Dereverberation and Recognition.
ICASSP2022
Junhao Xu, Jianwei Yu, Xunying Liu, Helen Meng, 
Mixed Precision DNN Quantization for Overlapped Speech Separation and Recognition.
ICASSP2022
Naijun Zheng, Na Li 0012, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
Multi-Channel Speaker Diarization Using Spatial Features for Meetings.
ICASSP2024
Ruizhe Huang, Xiaohui Zhang 0007, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe 0001, Daniel Povey, Sanjeev Khudanpur, 
Less Peaky and More Accurate CTC Forced Alignment by Label Priors.
ICASSP2024
Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe 0001, Sanjeev Khudanpur, 
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization.
ICASSP2024
Amir Hussein, Dorsa Zeinali, Ondrej Klejch, Matthew Wiesner, Brian Yan, Shammur Absar Chowdhury, Ahmed Ali 0002, Shinji Watanabe 0001, Sanjeev Khudanpur, 
Speech Collage: Code-Switched Audio Generation by Collaging Monolingual Corpora.
ICASSP2024
Hexin Liu, Leibny Paola Garcia, Xiangyu Zhang, Andy W. H. Khong, Sanjeev Khudanpur, 
Enhancing Code-Switching Speech Recognition With Interactive Language Biases.
TASLP2023
Desh Raj, Daniel Povey, Sanjeev Khudanpur, 
SURT 2.0: Advances in Transducer-Based Multi-Talker Speech Recognition.
ICASSP2023
Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola García, Hung-Yi Lee, Shinji Watanabe 0001, Sanjeev Khudanpur, 
EURO: ESPnet Unsupervised ASR Open-Source Toolkit.
ICASSP2023
Zili Huang, Desh Raj, Paola García 0001, Sanjeev Khudanpur, 
Adapting Self-Supervised Models to Multi-Talker Speech Recognition Using Speaker Embeddings.
ICASSP2023
Ruizhe Huang, Matthew Wiesner, Leibny Paola García-Perera, Daniel Povey, Jan Trmal, Sanjeev Khudanpur, 
Building Keyword Search System from End-To-End ASR Systems.
ICASSP2023
Hexin Liu, Haihua Xu, Leibny Paola García, Andy W. H. Khong, Yi He, Sanjeev Khudanpur, 
Reducing Language Confusion for Code-Switching Speech Recognition with Token-Level Language Diarization.
Interspeech2023
Yi Han Victoria Chua, Hexin Liu, Leibny Paola García, Fei Ting Woon, Jinyi Wong, Xiangyu Zhang, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels, Suzy J. Styles, 
MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization.
Interspeech2023
Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola García, Daniel Povey, Sanjeev Khudanpur, 
Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts.
Interspeech2023
Desh Raj, Daniel Povey, Sanjeev Khudanpur, 
GPU-accelerated Guided Source Separation for Meeting Transcription.
Interspeech2023
Suzy J. Styles, Yi Han Victoria Chua, Fei Ting Woon, Hexin Liu, Leibny Paola García, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels, 
Investigating model performance in language identification: beyond simple error statistics.
Interspeech2023
Cihan Xiao, Henry Li Xinyuan, Jinyi Yang, Dongji Gao, Matthew Wiesner, Kevin Duh, Sanjeev Khudanpur, 
HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation.
ICASSP2022
Zili Huang, Shinji Watanabe 0001, Shu-Wen Yang, Paola García 0001, Sanjeev Khudanpur, 
Investigating Self-Supervised Learning for Speech Enhancement and Separation.
ICASSP2022
Matthew Wiesner, Desh Raj, Sanjeev Khudanpur, 
Injecting Text and Cross-Lingual Supervision in Few-Shot Learning from Self-Supervised Models.
Interspeech2022
Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesús Villalba 0001, Sanjeev Khudanpur, Najim Dehak, 
Defense against Adversarial Attacks on Hybrid Speech Recognition System using Adversarial Fine-tuning with Denoiser.
Interspeech2022
Hexin Liu, Leibny Paola García-Perera, Andy W. H. Khong, Suzy J. Styles, Sanjeev Khudanpur, 
PHO-LID: A Unified Model Incorporating Acoustic-Phonetic and Phonotactic Information for Language Identification.
Interspeech2022
Yiwen Shao, Jesús Villalba 0001, Sonal Joshi, Saurabh Kataria, Sanjeev Khudanpur, Najim Dehak, 
Chunking Defense for Adversarial Attacks on ASR.
ICASSP2021
Hang Lv 0001, Zhehuai Chen, Hainan Xu, Daniel Povey, Lei Xie 0001, Sanjeev Khudanpur, 
An Asynchronous WFST-Based Decoder for Automatic Speech Recognition.
TASLP2024
Danwei Cai, Ming Li 0026, 
Leveraging ASR Pretrained Conformers for Speaker Verification Through Transfer Learning and Knowledge Distillation.
TASLP2024
Xiaoyi Qin, Na Li 0012, Shufei Duan, Ming Li 0026, 
Investigating Long-Term and Short-Term Time-Varying Speaker Verification.
ICASSP2024
Zexin Cai, Ming Li 0026, 
Invertible Voice Conversion with Parallel Data.
ICASSP2024
Weiqing Wang, Danwei Cai, Ming Cheng, Ming Li 0026, 
Joint Inference of Speaker Diarization and ASR with Multi-Stage Information Sharing.
TASLP2023
Xiaoyi Qin, Danwei Cai, Ming Li 0026, 
Robust Multi-Channel Far-Field Speaker Verification Under Different In-Domain Data Availability Scenarios.
ICASSP2023
Danwei Cai, Zexin Cai, Ming Li 0026, 
Identifying Source Speakers for Voice Conversion Based Spoofing Attacks on Speaker Verification Systems.
ICASSP2023
Zexin Cai, Weiqing Wang, Ming Li 0026, 
Waveform Boundary Detection for Partially Spoofed Audio.
ICASSP2023
Danwei Cai, Weiqing Wang, Ming Li 0026, Rui Xia, Chuanzeng Huang, 
Pretraining Conformer with ASR for Speaker Verification.
ICASSP2023
Ming Cheng, Haoxu Wang, Ziteng Wang, Qiang Fu 0001, Ming Li 0026, 
The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP 2022 Challenge.
ICASSP2023
Ming Cheng, Weiqing Wang, Yucong Zhang, Xiaoyi Qin, Ming Li 0026, 
Target-Speaker Voice Activity Detection Via Sequence-to-Sequence Prediction.
ICASSP2023
Haoxu Wang, Ming Cheng, Qiang Fu 0001, Ming Li 0026, 
The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis.
ICASSP2023
Xingming Wang, Hao Wu, Chen Ding, Chuanzeng Huang, Ming Li 0026, 
Exploring Universal Singing Speech Language Identification Using Self-Supervised Learning Based Front-End Features.
Interspeech2023
Xingming Wang, Bang Zeng, Hongbin Suo, Yulong Wan, Ming Li 0026, 
Robust Audio Anti-spoofing Countermeasure with Joint Training of Front-end and Back-end Models.
Interspeech2023
Bang Zeng, Hongbin Suo, Yulong Wan, Ming Li 0026, 
SEF-Net: Speaker Embedding Free Target Speaker Extraction Network.
Interspeech2023
Yucong Zhang, Hongbin Suo, Yulong Wan, Ming Li 0026, 
Outlier-aware Inlier Modeling and Multi-scale Scoring for Anomalous Sound Detection via Multitask Learning.
TASLP2022
Danwei Cai, Weiqing Wang, Ming Li 0026, 
Incorporating Visual Information in Audio Based Self-Supervised Speaker Recognition.
TASLP2022
Weiqing Wang, Qingjian Lin, Danwei Cai, Ming Li 0026, 
Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization.
ICASSP2022
Ming Cheng, Haoxu Wang, Yechen Wang, Ming Li 0026, 
The DKU Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge.
ICASSP2022
Qingjian Lin, Lin Yang, Xuyang Wang, Xiaoyi Qin, Junjie Wang, Ming Li 0026, 
Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification.
ICASSP2022
Xiaoyi Qin, Na Li 0012, Chao Weng, Dan Su 0002, Ming Li 0026, 
Simple Attention Module Based Speaker Verification with Iterative Noisy Label Detection.
TASLP2024
Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe 0001, 
End-to-End Speech Recognition: A Survey.
ICASSP2024
Zijian Yang, Wei Zhou 0043, Ralf Schlüter, Hermann Ney, 
On the Relation Between Internal Language Model and Sequence Discriminative Training for Neural Transducers.
ICASSP2024
Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney, 
Chunked Attention-Based Encoder-Decoder Model for Streaming Speech Recognition.
ICASSP2023
Zijian Yang, Wei Zhou 0043, Ralf Schlüter, Hermann Ney, 
Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers.
ICASSP2023
Wei Zhou 0043, Haotian Wu, Jingjing Xu, Mohammad Zeineldeen, Christoph Lüscher, Ralf Schlüter, Hermann Ney, 
Enhancing and Adversarial: Improve ASR with Speaker Labels.
Interspeech2023
Wei Zhou 0043, Eugen Beck, Simon Berger, Ralf Schlüter, Hermann Ney, 
RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition.
Interspeech2023
Simon Berger, Peter Vieting, Christoph Böddeker, Ralf Schlüter, Reinhold Haeb-Umbach, 
Mixture Encoder for Joint Speech Separation and Recognition.
Interspeech2023
Tina Raissi, Christoph Lüscher, Moritz Gunz, Ralf Schlüter, Hermann Ney, 
Competitive and Resource Efficient Factored Hybrid HMM Systems are Simpler Than You Think.
ICASSP2022
Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney, 
Improving Factored Hybrid HMM Acoustic Modeling without State Tying.
ICASSP2022
Nils-Philipp Wynands, Wilfried Michel, Jan Rosendahl, Ralf Schlüter, Hermann Ney, 
Efficient Sequence Training of Attention Models Using Approximative Recombination.
ICASSP2022
Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Wilfried Michel, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney, 
Conformer-Based Hybrid ASR System For Switchboard Dataset.
ICASSP2022
Wei Zhou 0043, Zuoyun Zheng, Ralf Schlüter, Hermann Ney, 
On Language Model Integration for RNN Transducer Based Speech Recognition.
Interspeech2022
Felix Meyer, Wilfried Michel, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney, 
Automatic Learning of Subword Dependent Model Scales.
Interspeech2022
Zijian Yang, Yingbo Gao, Alexander Gerstenberger, Jintao Jiang, Ralf Schlüter, Hermann Ney, 
Self-Normalized Importance Sampling for Neural Language Modeling.
Interspeech2022
Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney, 
Improving the Training Recipe for a Robust Conformer-based Hybrid Model.
Interspeech2022
Wei Zhou 0043, Wilfried Michel, Ralf Schlüter, Hermann Ney, 
Efficient Training of Neural Transducer for Speech Recognition.
ICASSP2021
Wei Zhou 0043, Simon Berger, Ralf Schlüter, Hermann Ney, 
Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition.
Interspeech2021
Yingbo Gao, David Thulke, Alexander Gerstenberger, Khoa Viet Tran, Ralf Schlüter, Hermann Ney, 
On Sampling-Based Training Criteria for Neural Language Modeling.
Interspeech2021
Yu Qiao 0005, Wei Zhou 0043, Elma Kerz, Ralf Schlüter, 
The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech.
Interspeech2021
Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney, 
Investigating Methods to Improve Language Model Integration for Attention-Based Encoder-Decoder ASR Models.
TASLP2024
David Thulke, Nico Daheim, Christian Dugast, Hermann Ney, 
Task-Oriented Document-Grounded Dialog Systems by HLTPR@RWTH for DSTC9 and DSTC10.
ICASSP2024
Zijian Yang, Wei Zhou 0043, Ralf Schlüter, Hermann Ney, 
On the Relation Between Internal Language Model and Sequence Discriminative Training for Neural Transducers.
ICASSP2024
Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney, 
Chunked Attention-Based Encoder-Decoder Model for Streaming Speech Recognition.
ICASSP2023
Zijian Yang, Wei Zhou 0043, Ralf Schlüter, Hermann Ney, 
Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers.
ICASSP2023
Wei Zhou 0043, Haotian Wu, Jingjing Xu, Mohammad Zeineldeen, Christoph Lüscher, Ralf Schlüter, Hermann Ney, 
Enhancing and Adversarial: Improve ASR with Speaker Labels.
Interspeech2023
Wei Zhou 0043, Eugen Beck, Simon Berger, Ralf Schlüter, Hermann Ney, 
RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition.
Interspeech2023
Tina Raissi, Christoph Lüscher, Moritz Gunz, Ralf Schlüter, Hermann Ney, 
Competitive and Resource Efficient Factored Hybrid HMM Systems are Simpler Than You Think.
ICASSP2022
Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney, 
Improving Factored Hybrid HMM Acoustic Modeling without State Tying.
ICASSP2022
Nils-Philipp Wynands, Wilfried Michel, Jan Rosendahl, Ralf Schlüter, Hermann Ney, 
Efficient Sequence Training of Attention Models Using Approximative Recombination.
ICASSP2022
Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Wilfried Michel, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney, 
Conformer-Based Hybrid ASR System For Switchboard Dataset.
ICASSP2022
Wei Zhou 0043, Zuoyun Zheng, Ralf Schlüter, Hermann Ney, 
On Language Model Integration for RNN Transducer Based Speech Recognition.
Interspeech2022
Felix Meyer, Wilfried Michel, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney, 
Automatic Learning of Subword Dependent Model Scales.
Interspeech2022
Zijian Yang, Yingbo Gao, Alexander Gerstenberger, Jintao Jiang, Ralf Schlüter, Hermann Ney, 
Self-Normalized Importance Sampling for Neural Language Modeling.
Interspeech2022
Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney, 
Improving the Training Recipe for a Robust Conformer-based Hybrid Model.
Interspeech2022
Wei Zhou 0043, Wilfried Michel, Ralf Schlüter, Hermann Ney, 
Efficient Training of Neural Transducer for Speech Recognition.
EMNLP2022
Viet Anh Khoa Tran, David Thulke, Yingbo Gao, Christian Herold, Hermann Ney, 
Does Joint Training Really Help Cascaded Speech Translation?
ICASSP2021
Wei Zhou 0043, Simon Berger, Ralf Schlüter, Hermann Ney, 
Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition.
Interspeech2021
Yingbo Gao, David Thulke, Alexander Gerstenberger, Khoa Viet Tran, Ralf Schlüter, Hermann Ney, 
On Sampling-Based Training Criteria for Neural Language Modeling.
Interspeech2021
Hermann Ney, 
Forty Years of Speech and Language Processing: From Bayes Decision Rule to Deep Learning.
Interspeech2021
Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney, 
Investigating Methods to Improve Language Model Integration for Attention-Based Encoder-Decoder ASR Models.
SpeechComm2024
Detai Xin, Shinnosuke Takamichi, Hiroshi Saruwatari, 
JNV corpus: A corpus of Japanese nonverbal vocalizations with diverse phrases and emotions.
TASLP2024
Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe 0001, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis.
ICASSP2024
Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari, 
Diversity-Based Core-Set Selection for Text-to-Speech with Linguistic and Acoustic Features.
ICASSP2024
Shinnosuke Takamichi, Hiroki Maeda, Joonyong Park, Daisuke Saito, Hiroshi Saruwatari, 
Do Learned Speech Symbols Follow Zipf's Law?
ICASSP2024
Yoshihide Tomita, Shoichi Koyama, Hiroshi Saruwatari, 
Localizing Acoustic Energy in Sound Field Synthesis by Directionally Weighted Exterior Radiation Suppression.
TASLP2023
Takumi Abe, Shoichi Koyama, Natsuki Ueno, Hiroshi Saruwatari, 
Amplitude Matching for Multizone Sound Field Control.
TASLP2023
Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, Kazunobu Kondo, 
PoP-IDLMA: Product-of-Prior Independent Deeply Learned Matrix Analysis for Multichannel Music Source Separation.
ICASSP2023
Tomohiko Nakamura, Shinnosuke Takamichi, Naoko Tanji, Satoru Fukayama, Hiroshi Saruwatari, 
jaCappella Corpus: A Japanese a Cappella Vocal Ensemble Corpus.
ICASSP2023
Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Detai Xin, Hiroshi Saruwatari, 
MID-Attribute Speaker Generation Using Optimal-Transport-Based Interpolation of Gaussian Mixture Models.
ICASSP2023
Detai Xin, Sharath Adavanne, Federico Ang, Ashish Kulkarni, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Improving Speech Prosody of Audiobook Text-To-Speech Synthesis with Acoustic and Textual Contexts.
ICASSP2023
Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, Hiroshi Saruwatari, 
Duration-Aware Pause Insertion Using Pre-Trained Language Model for Multi-Speaker Text-To-Speech.
Interspeech2023
Joonyong Park, Shinnosuke Takamichi, Tomohiko Nakamura, Kentaro Seki, Detai Xin, Hiroshi Saruwatari, 
How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics.
Interspeech2023
Yuki Saito, Eiji Iimori, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari, 
CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center.
Interspeech2023
Yuki Saito, Shinnosuke Takamichi, Eiji Iimori, Kentaro Tachibana, Hiroshi Saruwatari, 
ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings.
Interspeech2023
Yota Ueda, Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Hiroshi Saruwatari, 
HumanDiffusion: diffusion model using perceptual gradients.
Interspeech2023
Detai Xin, Shinnosuke Takamichi, Ai Morimatsu, Hiroshi Saruwatari, 
Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus.
IJCAI2023
Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe 0001, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining.
TASLP2022
Juliano G. C. Ribeiro, Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari, 
Region-to-Region Kernel Interpolation of Acoustic Transfer Functions Constrained by Physical Properties.
Interspeech2022
Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari, 
Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.
Interspeech2022
Yuto Nishimura, Yuki Saito, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari, 
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History.
TASLP2024
Yifan Chen, Gaofeng Cheng, Runyan Yang, Pengyuan Zhang, Yonghong Yan 0002, 
Interrelate Training and Clustering for Online Speaker Diarization.
TASLP2024
Han Zhu 0004, Gaofeng Cheng, Jindong Wang 0001, Wenxin Hou, Pengyuan Zhang, Yonghong Yan 0002, 
Boosting Cross-Domain Speech Recognition With Self-Supervision.
ICASSP2024
Jingze Lu, Yuxiang Zhang, Wenchao Wang, Zengqiang Shang, Pengyuan Zhang, 
One-Class Knowledge Distillation for Spoofing Speech Detection.
ICASSP2024
Yuxiang Zhang, Jingze Lu, Zengqiang Shang, Wenchao Wang, Pengyuan Zhang, 
Improving Short Utterance Anti-Spoofing with AASIST2.
SpeechComm2023
Feng Dang, Hangting Chen, Qi Hu, Pengyuan Zhang, Yonghong Yan 0002, 
First coarse, fine afterward: A lightweight two-stage complex approach for monaural speech enhancement.
TASLP2023
Yuxiang Zhang, Zhuo Li, Jingze Lu, Hua Hua, Wenchao Wang, Pengyuan Zhang, 
The Impact of Silence on Speech Anti-Spoofing.
TASLP2023
Han Zhu 0004, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, Yonghong Yan 0002, 
Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition.
ICASSP2023
Zhenduo Zhao, Zhuo Li, Wenchao Wang, Pengyuan Zhang, 
PCF: ECAPA-TDNN with Progressive Channel Fusion for Speaker Verification.
TASLP2022
Changfeng Gao, Gaofeng Cheng, Ta Li, Pengyuan Zhang, Yonghong Yan 0002, 
Self-Supervised Pre-Training for Attention-Based Encoder-Decoder ASR Model.
ICASSP2022
Feng Dang, Hangting Chen, Pengyuan Zhang, 
DPT-FSNet: Dual-Path Transformer Based Full-Band and Sub-Band Fusion Network for Speech Enhancement.
ICASSP2022
Keqi Deng, Songjun Cao, Yike Zhang, Long Ma, Gaofeng Cheng, Ji Xu, Pengyuan Zhang, 
Improving CTC-Based Speech Recognition Via Knowledge Transferring from Pre-Trained Language Models.
ICASSP2022
Keqi Deng, Zehui Yang, Shinji Watanabe 0001, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang, 
Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models.
Interspeech2022
Yifan Chen, Yifan Guo, Qingxuan Li, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan 0002, 
Interrelate Training and Searching: A Unified Online Clustering Framework for Speaker Diarization.
Interspeech2022
Hangting Chen, Yi Yang 0057, Feng Dang, Pengyuan Zhang, 
Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output.
Interspeech2022
Chengxin Chen, Pengyuan Zhang, 
CTA-RNN: Channel and Temporal-wise Attention RNN leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition.
Interspeech2022
Yukun Liu, Ta Li, Pengyuan Zhang, Yonghong Yan 0002, 
NAS-SCAE: Searching Compact Attention-based Encoders For End-to-end Automatic Speech Recognition.
Interspeech2022
Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie 0001, Yonghong Yan 0002, 
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset.
Interspeech2022
Lingxuan Ye, Gaofeng Cheng, Runyan Yang, Zehui Yang, Sanli Tian, Pengyuan Zhang, Yonghong Yan 0002, 
Improving Recognition of Out-of-vocabulary Words in E2E Code-switching ASR by Fusing Speech Generation Methods.
Interspeech2022
Yuxiang Zhang, Zhuo Li, Wenchao Wang, Pengyuan Zhang, 
SASV Based on Pre-trained ASV System and Integrated Scoring Module.
Interspeech2022
Xueshuai Zhang, Jiakun Shen, Jun Zhou 0024, Pengyuan Zhang, Yonghong Yan 0002, Zhihua Huang, Yanfen Tang, Yu Wang, Fujie Zhang, Shaoxing Zhang, Aijun Sun, 
Robust Cough Feature Extraction and Classification Method for COVID-19 Cough Detection Based on Vocalization Characteristics.
TASLP2024
Tianchi Liu 0004, Kong Aik Lee, Qiongqiong Wang, Haizhou Li 0001, 
Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification.
TASLP2024
Xuechen Liu, Md. Sahidullah, Kong Aik Lee, Tomi Kinnunen, 
Generalizing Speaker Verification for Spoof Awareness in the Embedding Space.
ICASSP2024
Shihao Chen, Liping Chen, Jie Zhang 0042, Kong-Aik Lee, Zhenhua Ling, Lirong Dai 0001, 
Adversarial Speech for Voice Privacy Protection from Personalized Speech Generation.
ICASSP2024
Liping Chen, Kong Aik Lee, Wu Guo, Zhen-Hua Ling, 
Modeling Pseudo-Speaker Uncertainty in Voice Anonymization.
ICASSP2024
Yi Ma, Kong Aik Lee, Ville Hautamäki, Meng Ge, Haizhou Li 0001, 
Gradient Weighting for Speaker Verification in Extremely Low Signal-to-Noise Ratio.
ICASSP2024
Duc-Tuan Truong, Ruijie Tao, Jia Qi Yip, Kong Aik Lee, Eng Siong Chng, 
Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification.
ICASSP2024
Linjuan Zhang, Kong Aik Lee, Lin Zhang, Longbiao Wang, Baoning Niu, 
CPAUG: Refining Copy-Paste Augmentation for Speech Anti-Spoofing.
TASLP2023
Xuechen Liu, Xin Wang 0037, Md. Sahidullah, Jose Patino 0001, Héctor Delgado, Tomi Kinnunen, Massimiliano Todisco, Junichi Yamagishi, Nicholas W. D. Evans, Andreas Nautsch, Kong Aik Lee, 
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild.
TASLP2023
Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li 0001, 
Self-Supervised Training of Speaker Encoder With Multi-Modal Diverse Positive Pairs.
TASLP2023
Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, Helen Meng, 
Meta-Generalization for Domain-Invariant Speaker Verification.
ICASSP2023
Hui Chen, Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, 
Self-Supervised Audio-Visual Speaker Representation with Co-Meta Learning.
ICASSP2023
Xiaohui Liu, Meng Liu, Longbiao Wang, Kong Aik Lee, Hanyi Zhang, Jianwu Dang 0001, 
Leveraging Positional-Related Local-Global Dependency for Synthetic Speech Detection.
ICASSP2023
Meng Liu, Kong Aik Lee, Longbiao Wang, Hanyi Zhang, Chang Zeng, Jianwu Dang 0001, 
Cross-Modal Audio-Visual Co-Learning for Text-Independent Speaker Verification.
ICASSP2023
Alexey Sholokhov, Nikita Kuzmin, Kong Aik Lee, Eng Siong Chng, 
Probabilistic Back-ends for Online Speaker Recognition and Clustering.
ICASSP2023
Yao Sun, Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, 
Noise-Disentanglement Metric Learning for Robust Speaker Verification.
ICASSP2023
Ruijie Tao, Kong Aik Lee, Zhan Shi, Haizhou Li 0001, 
Speaker Recognition with Two-Step Multi-Modal Deep Cleansing.
ICASSP2023
Qiongqiong Wang, Kong Aik Lee, Tianchi Liu 0004, 
Incorporating Uncertainty from Speaker Embedding Estimation to Speaker Verification.
Interspeech2023
Xuechen Liu, Md. Sahidullah, Kong Aik Lee, Tomi Kinnunen, 
Speaker-Aware Anti-spoofing.
Interspeech2023
Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang 0037, Xuechen Liu, Md. Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas W. D. Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung, 
Towards Single Integrated Spoofing-aware Speaker Verification Embeddings.
NeurIPS2023
Tianchi Liu 0004, Kong Aik Lee, Qiongqiong Wang, Haizhou Li 0001, 
Disentangling Voice and Content with Self-Supervision for Speaker Recognition.
ICASSP2024
Shaoshi Ling, Yuxuan Hu, Shuangbei Qian, Guoli Ye, Yao Qian, Yifan Gong 0001, Ed Lin, Michael Zeng 0001, 
Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition.
ICASSP2022
Liang Lu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Endpoint Detection for Streaming End-to-End Multi-Talker ASR.
ICASSP2022
Guoli Ye, Vadim Mazalov, Jinyu Li 0001, Yifan Gong 0001, 
Have Best of Both Worlds: Two-Pass Hybrid and E2E Cascading Framework for Speech Recognition.
Interspeech2022
Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li 0001, Xie Chen 0001, Yu Wu 0012, Yifan Gong 0001, 
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.
TASLP2021
Peidong Wang, Zhuo Chen 0006, DeLiang Wang, Jinyu Li 0001, Yifan Gong 0001, 
Speaker Separation Using Speaker Inventories and Estimated Speech.
ICASSP2021
Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu 0001, Xie Chen 0001, Jinyu Li 0001, Yifan Gong 0001, 
Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.
ICASSP2021
Eric Sun, Liang Lu 0001, Zhong Meng, Yifan Gong 0001, 
Sequence-Level Self-Teaching Regularization.
ICASSP2021
Jeremy Heng Meng Wong, Dimitrios Dimitriadis, Ken'ichi Kumatani, Yashesh Gaur, George Polovets, Partha Parthasarathy, Eric Sun, Jinyu Li 0001, Yifan Gong 0001, 
Ensemble Combination between Different Time Segmentations.
ICASSP2021
Jeremy Heng Meng Wong, Xiong Xiao, Yifan Gong 0001, 
Hidden Markov Model Diarisation with Speaker Location Information.
ICASSP2021
Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu 0001, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.
Interspeech2021
Liang Lu 0001, Naoyuki Kanda, Jinyu Li 0001, Yifan Gong 0001, 
Streaming Multi-Talker Speech Recognition with Joint Speaker Identification.
Interspeech2021
Liang Lu 0001, Zhong Meng, Naoyuki Kanda, Jinyu Li 0001, Yifan Gong 0001, 
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer.
Interspeech2021
Yan Huang 0028, Guoli Ye, Jinyu Li 0001, Yifan Gong 0001, 
Rapid Speaker Adaptation for Conformer Transducer: Attention and Bias Are All You Need.
Interspeech2021
Yan Deng, Rui Zhao 0017, Zhong Meng, Xie Chen 0001, Bing Liu, Jinyu Li 0001, Yifan Gong 0001, Lei He 0005, 
Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS.
Interspeech2021
Vikas Joshi, Amit Das 0007, Eric Sun, Rupesh R. Mehta, Jinyu Li 0001, Yifan Gong 0001, 
Multiple Softmax Architecture for Streaming Multilingual End-to-End ASR Systems.
Interspeech2021
Zhong Meng, Yu Wu 0012, Naoyuki Kanda, Liang Lu 0001, Xie Chen 0001, Guoli Ye, Eric Sun, Jinyu Li 0001, Yifan Gong 0001, 
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.
Interspeech2021
Eric Sun, Jinyu Li 0001, Zhong Meng, Yu Wu 0012, Jian Xue, Shujie Liu 0001, Yifan Gong 0001, 
Improving Multilingual Transformer Transducer Models by Reducing Language Confusions.
ICASSP2020
Yan Huang 0028, Yifan Gong 0001, 
Acoustic Model Adaptation for Presentation Transcription and Intelligent Meeting Assistant Systems.
ICASSP2020
Yan Huang 0028, Lei He 0005, Wenning Wei, William Gale, Jinyu Li 0001, Yifan Gong 0001, 
Using Personalized Speech Synthesis and Neural Language Generator for Rapid Speaker Adaptation.
ICASSP2020
Hu Hu, Rui Zhao 0017, Jinyu Li 0001, Liang Lu 0001, Yifan Gong 0001, 
Exploring Pre-Training with Alignments for RNN Transducer Based End-to-End Speech Recognition.
ICASSP2024
Ruizhe Huang, Xiaohui Zhang 0007, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe 0001, Daniel Povey, Sanjeev Khudanpur, 
Less Peaky and More Accurate CTC Forced Alignment by Label Priors.
ICASSP2024
Wei Kang 0006, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Yifan Yang, Liyong Guo, Long Lin, Daniel Povey, 
Libriheavy: A 50,000 Hours ASR Corpus with Punctuation Casing and Context.
ICASSP2024
Xiaoyu Yang, Wei Kang 0006, Zengwei Yao, Yifan Yang, Liyong Guo, Fangjun Kuang, Long Lin, Daniel Povey, 
PromptASR for Contextualized ASR with Controllable Style.
ICASSP2024
Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu 0004, Daniel Povey, Xie Chen 0001, 
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS.
ICLR2024
Zengwei Yao, Liyong Guo, Xiaoyu Yang, Wei Kang 0006, Fangjun Kuang, Yifan Yang, Zengrui Jin, Long Lin, Daniel Povey, 
Zipformer: A faster and better encoder for automatic speech recognition.
TASLP2023
Desh Raj, Daniel Povey, Sanjeev Khudanpur, 
SURT 2.0: Advances in Transducer-Based Multi-Talker Speech Recognition.
TASLP2023
Han Zhu 0004, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, Yonghong Yan 0002, 
Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition.
ICASSP2023
Liyong Guo, Xiaoyu Yang, Quandong Wang, Yuxiang Kong, Zengwei Yao, Fan Cui, Fangjun Kuang, Wei Kang 0006, Long Lin, Mingshuang Luo, Piotr Zelasko, Daniel Povey, 
Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation.
ICASSP2023
Ruizhe Huang, Matthew Wiesner, Leibny Paola García-Perera, Daniel Povey, Jan Trmal, Sanjeev Khudanpur, 
Building Keyword Search System from End-To-End ASR Systems.
ICASSP2023
Wei Kang 0006, Zengwei Yao, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Long Lin, Piotr Zelasko, Daniel Povey, 
Delay-Penalized Transducer for Low-Latency Streaming ASR.
Interspeech2023
Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola García, Daniel Povey, Sanjeev Khudanpur, 
Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts.
Interspeech2023
Desh Raj, Daniel Povey, Sanjeev Khudanpur, 
GPU-accelerated Guided Source Separation for Meeting Transcription.
Interspeech2023
Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang 0006, Fangjun Kuang, Long Lin, Xie Chen 0001, Daniel Povey, 
Blank-regularized CTC for Frame Skipping in Neural Transducer.
Interspeech2023
Zengwei Yao, Wei Kang 0006, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Yifan Yang, Long Lin, Daniel Povey, 
Delay-penalized CTC Implemented Based on Finite State Transducer.
Interspeech2022
Fangjun Kuang, Liyong Guo, Wei Kang 0006, Long Lin, Mingshuang Luo, Zengwei Yao, Daniel Povey, 
Pruned RNN-T for fast, memory-efficient ASR training.
ICASSP2021
Hang Lv 0001, Zhehuai Chen, Hainan Xu, Daniel Povey, Lei Xie 0001, Sanjeev Khudanpur, 
An Asynchronous WFST-Based Decoder for Automatic Speech Recognition.
ICASSP2021
Kyu Jeong Han, Jing Pan, Venkata Krishna Naveen Tadala, Tao Ma, Dan Povey, 
Multistream CNN for Robust Acoustic Modeling.
ICASSP2021
Ke Li 0018, Daniel Povey, Sanjeev Khudanpur, 
A Parallelizable Lattice Rescoring Strategy with Neural Language Models.
ICASSP2021
Yiming Wang 0006, Hang Lv 0001, Daniel Povey, Lei Xie 0001, Sanjeev Khudanpur, 
Wake Word Detection with Streaming Transformers.
Interspeech2021
Guoguo Chen, Shuzhou Chai, Guan-Bo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su 0002, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe 0001, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Zhao You, Zhiyong Yan, 
GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio.
TASLP2024
Xiaofei Wang 0007, Manthan Thakker, Zhuo Chen 0006, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu 0001, Jinyu Li 0001, Takuya Yoshioka, 
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
TASLP2024
Tianrui Wang, Long Zhou, Ziqiang Zhang, Yu Wu 0012, Shujie Liu 0001, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Furu Wei, 
VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation.
ICASSP2024
Jian Wu 0027, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao 0017, Zhuo Chen 0006, Jinyu Li 0001, 
T-SOT FNT: Streaming Multi-Talker ASR with Text-Only Domain Adaptation Capability.
ICASSP2023
Zhuo Chen 0006, Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiaofei Wang 0009, Takuya Yoshioka, Jinyu Li 0001, Sunit Sivasankaran, Sefik Emre Eskimez, 
Speech Separation with Large-Scale Self-Supervised Learning.
ICASSP2023
Quchen Fu, Szu-Wei Fu, Yaran Fan, Yu Wu 0012, Zhuo Chen 0006, Jayant Gupchup, Ross Cutler, 
Real-Time Speech Interruption Analysis: from Cloud to Client Deployment.
ICASSP2023
Zili Huang, Zhuo Chen 0006, Naoyuki Kanda, Jian Wu 0027, Yiming Wang, Jinyu Li 0001, Takuya Yoshioka, Xiaofei Wang 0009, Peidong Wang, 
Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition.
ICASSP2023
Naoyuki Kanda, Jian Wu 0027, Xiaofei Wang 0009, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.
ICASSP2023
Chenda Li, Yao Qian, Zhuo Chen 0006, Dongmei Wang, Takuya Yoshioka, Shujie Liu 0001, Yanmin Qian, Michael Zeng 0001, 
Target Sound Extraction with Variable Cross-Modality Clues.
ICASSP2023
Heming Wang, Yao Qian, Hemin Yang, Naoyuki Kanda, Peidong Wang, Takuya Yoshioka, Xiaofei Wang 0009, Yiming Wang, Shujie Liu 0001, Zhuo Chen 0006, DeLiang Wang, Michael Zeng 0001, 
DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks.
ICASSP2023
Jian Wu 0027, Zhuo Chen 0006, Min Hu, Xiong Xiao, Jinyu Li 0001, 
Speaker Change Detection For Transformer Transducer ASR.
ICASSP2023
Muqiao Yang, Naoyuki Kanda, Xiaofei Wang 0009, Jian Wu 0027, Sunit Sivasankaran, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Simulating Realistic Speech Overlaps Improves Multi-Talker ASR.
Interspeech2023
Chenda Li, Yao Qian, Zhuo Chen 0006, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng 0001, 
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers.
Interspeech2023
Midia Yousefi, Naoyuki Kanda, Dongmei Wang, Zhuo Chen 0006, Xiaofei Wang 0009, Takuya Yoshioka, 
Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach.
ICML2023
Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Shujie Liu 0001, Daniel Tompkins, Zhuo Chen 0006, Wanxiang Che, Xiangzhan Yu, Furu Wei, 
BEATs: Audio Pre-Training with Acoustic Tokenizers.
TASLP2022
Chenda Li, Zhuo Chen 0006, Yanmin Qian, 
Dual-Path Modeling With Memory Embedding Model for Continuous Speech Separation.
ICASSP2022
Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Zhengyang Chen, Zhuo Chen 0006, Shujie Liu 0001, Jian Wu 0027, Yao Qian, Furu Wei, Jinyu Li 0001, Xiangzhan Yu, 
UniSpeech-SAT: Universal Speech Representation Learning With Speaker Aware Pre-Training.
ICASSP2022
Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang 0009, Zhuo Chen 0006, Xuedong Huang 0001, 
Personalized Speech Enhancement: New Models and Comprehensive Evaluation.
ICASSP2022
Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.
ICASSP2022
Desh Raj, Liang Lu 0001, Zhuo Chen 0006, Yashesh Gaur, Jinyu Li 0001, 
Continuous Streaming Multi-Talker ASR with Dual-Path Transducers.
ICASSP2022
Yixuan Zhang 0005, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li 0001, 
Continuous Speech Separation with Recurrent Selective Attention Network.
ICASSP2023
Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix, Ryo Masumura, 
Leveraging Large Text Corpora For End-To-End Speech Summarization.
ICASSP2023
Saki Mizuno, Nobukatsu Hojo, Satoshi Kobashikawa, Ryo Masumura, 
Next-Speaker Prediction Based on Non-Verbal Information in Multi-Party Video Conversation.
ICASSP2023
Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, 
Improving Scheduled Sampling for Neural Transducer-Based ASR.
ICASSP2023
Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Hiroshi Sato, Taiga Yamane, Takanori Ashihara, Kohei Matsuura, Takafumi Moriya, 
Leveraging Language Embeddings for Cross-Lingual Self-Supervised Speech Representation Learning.
Interspeech2023
Nobukatsu Hojo, Saki Mizuno, Satoshi Kobashikawa, Ryo Masumura, Mana Ihori, Hiroshi Sato, Tomohiro Tanaka, 
Audio-Visual Praise Estimation for Conversational Video based on Synchronization-Guided Multimodal Transformer.
Interspeech2023
Mana Ihori, Hiroshi Sato, Tomohiro Tanaka, Ryo Masumura, Saki Mizuno, Nobukatsu Hojo, 
Transcribing Speech as Spoken and Written Dual Text Using an Autoregressive Model.
Interspeech2023
Yuki Kitagishi, Naohiro Tawara, Atsunori Ogawa, Ryo Masumura, Taichi Asami, 
What are differences? Comparing DNN and Human by Their Performance and Characteristics in Speaker Age Estimation.
Interspeech2023
Naoki Makishima, Keita Suzuki, Satoshi Suzuki, Atsushi Ando, Ryo Masumura, 
Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-level Timestamp Prediction.
Interspeech2023
Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando, 
End-to-End Joint Target and Non-Target Speakers ASR.
Interspeech2023
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami, 
Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data.
Interspeech2023
Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo, 
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss.
ICASSP2022
Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix, Takahiro Shinozaki, 
Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.
Interspeech2022
Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura, 
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data.
Interspeech2022
Ryo Masumura, Yoshihiro Yamazaki, Saki Mizuno, Naoki Makishima, Mana Ihori, Mihiro Uchida, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Shota Orihashi, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando, 
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.
Interspeech2022
Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari, 
Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.
Interspeech2022
Fumio Nihei, Ryo Ishii, Yukiko I. Nakano, Kyosuke Nishida, Ryo Masumura, Atsushi Fukayama, Takao Nakamura, 
Dialogue Acts Aided Important Utterance Detection Based on Multiparty and Multimodal Information.
Interspeech2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.
Interspeech2022
Akihiko Takashima, Ryo Masumura, Atsushi Ando, Yoshihiro Yamazaki, Mihiro Uchida, Shota Orihashi, 
Interactive Co-Learning with Cross-Modal Transformer for Audio-Visual Emotion Recognition.
Interspeech2022
Tomohiro Tanaka, Ryo Masumura, Hiroshi Sato, Mana Ihori, Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, 
Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks.
ICASSP2021
Atsushi Ando, Ryo Masumura, Hiroshi Sato, Takafumi Moriya, Takanori Ashihara, Yusuke Ijima, Tomoki Toda, 
Speech Emotion Recognition Based on Listener Adaptive Models.
TASLP2024
Xiaofei Wang 0007, Manthan Thakker, Zhuo Chen 0006, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu 0001, Jinyu Li 0001, Takuya Yoshioka, 
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
ICASSP2024
Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Midia Yousefi, Takuya Yoshioka, Jian Wu, 
Profile-Error-Tolerant Target-Speaker Voice Activity Detection.
ICASSP2024
Jian Wu 0027, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao 0017, Zhuo Chen 0006, Jinyu Li 0001, 
T-SOT FNT: Streaming Multi-Talker ASR with Text-Only Domain Adaptation Capability.
ICASSP2024
Mu Yang, Naoyuki Kanda, Xiaofei Wang 0009, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li 0001, Takuya Yoshioka, 
Diarist: Streaming Speech Translation with Speaker Diarization.
NAACL-Findings2024
Ziyi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu 0001, Dongdong Chen 0001, Yao Qian, Xuemei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao 0004, Yu Shi 0001, Lu Yuan, Takuya Yoshioka, Michael Zeng 0001, Xuedong Huang 0001, 
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data.
ICASSP2023
Zhuo Chen 0006, Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiaofei Wang 0009, Takuya Yoshioka, Jinyu Li 0001, Sunit Sivasankaran, Sefik Emre Eskimez, 
Speech Separation with Large-Scale Self-Supervised Learning.
ICASSP2023
Zili Huang, Zhuo Chen 0006, Naoyuki Kanda, Jian Wu 0027, Yiming Wang, Jinyu Li 0001, Takuya Yoshioka, Xiaofei Wang 0009, Peidong Wang, 
Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition.
ICASSP2023
Naoyuki Kanda, Jian Wu 0027, Xiaofei Wang 0009, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.
ICASSP2023
Chenda Li, Yao Qian, Zhuo Chen 0006, Dongmei Wang, Takuya Yoshioka, Shujie Liu 0001, Yanmin Qian, Michael Zeng 0001, 
Target Sound Extraction with Variable Cross-Modality Clues.
ICASSP2023
Hassan Taherian, Sefik Emre Eskimez, Takuya Yoshioka, 
Breaking the Trade-Off in Personalized Speech Enhancement With Cross-Task Knowledge Distillation.
ICASSP2023
Heming Wang, Yao Qian, Hemin Yang, Naoyuki Kanda, Peidong Wang, Takuya Yoshioka, Xiaofei Wang 0009, Yiming Wang, Shujie Liu 0001, Zhuo Chen 0006, DeLiang Wang, Michael Zeng 0001, 
DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks.
ICASSP2023
Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu 0027, 
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-To-End Neural Diarization.
ICASSP2023
Muqiao Yang, Naoyuki Kanda, Xiaofei Wang 0009, Jian Wu 0027, Sunit Sivasankaran, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Simulating Realistic Speech Overlaps Improves Multi-Talker ASR.
Interspeech2023
Sefik Emre Eskimez, Takuya Yoshioka, Alex Ju, Min Tang, Tanel Pärnamaa, Huaming Wang, 
Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation.
Interspeech2023
Naoyuki Kanda, Takuya Yoshioka, Yang Liu, 
Factual Consistency Oriented Speech Recognition.
Interspeech2023
Chenda Li, Yao Qian, Zhuo Chen 0006, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng 0001, 
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers.
Interspeech2023
Midia Yousefi, Naoyuki Kanda, Dongmei Wang, Zhuo Chen 0006, Xiaofei Wang 0009, Takuya Yoshioka, 
Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach.
ICASSP2022
Harishchandra Dubey, Vishak Gopal, Ross Cutler, Ashkan Aazami, Sergiy Matusevych, Sebastian Braun, Sefik Emre Eskimez, Manthan Thakker, Takuya Yoshioka, Hannes Gamper, Robert Aichner, 
ICASSP 2022 Deep Noise Suppression Challenge.
ICASSP2022
Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang 0009, Zhuo Chen 0006, Xuedong Huang 0001, 
Personalized Speech Enhancement: New Models and Comprehensive Evaluation.
ICASSP2022
Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.
TASLP2024
Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki, Shoji Makino, 
Blind and Spatially-Regularized Online Joint Optimization of Source Separation, Dereverberation, and Noise Reduction.
TASLP2023
Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Yasunori Ohishi, Shoko Araki, 
SoundBeam: Target Sound Extraction Conditioned on Sound-Class Labels and Enrollment Clues for Increased Performance and Continuous Learning.
TASLP2023
Thilo von Neumann, Keisuke Kinoshita, Christoph Böddeker, Marc Delcroix, Reinhold Haeb-Umbach, 
Segment-Less Continuous Speech Separation of Meetings: Training and Evaluation Criteria.
TASLP2023
Hiroshi Sawada, Rintaro Ikeshita, Keisuke Kinoshita, Tomohiro Nakatani, 
Multi-Frame Full-Rank Spatial Covariance Analysis for Underdetermined Blind Source Separation and Dereverberation.
ICASSP2023
Thilo von Neumann, Christoph Böddeker, Keisuke Kinoshita, Marc Delcroix, Reinhold Haeb-Umbach, 
On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech Recognition Systems.
ICASSP2022
Naoyuki Kamo, Rintaro Ikeshita, Keisuke Kinoshita, Tomohiro Nakatani, 
Importance of Switch Optimization Criterion in Switching WPE Dereverberation.
ICASSP2022
Keisuke Kinoshita, Marc Delcroix, Tomoharu Iwata, 
Tight Integration Of Neural- And Clustering-Based Diarization Through Deep Unfolding Of Infinite Gaussian Mixture Model.
ICASSP2022
Thilo von Neumann, Keisuke Kinoshita, Christoph Böddeker, Marc Delcroix, Reinhold Haeb-Umbach, 
SA-SDR: A Novel Loss Function for Separation of Meeting Style Data.
ICASSP2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo, Takafumi Moriya, 
Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition.
ICASSP2022
Hiroshi Sawada, Rintaro Ikeshita, Keisuke Kinoshita, Tomohiro Nakatani, 
Multi-Frame Full-Rank Spatial Covariance Analysis for Underdetermined BSS in Reverberant Environments.
Interspeech2022
Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani, 
Listen only to me! How well can target speech extraction handle false alarms?
Interspeech2022
Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Christoph Böddeker, Reinhold Haeb-Umbach, 
Utterance-by-utterance overlap-aware neural diarization with Graph-PIT.
Interspeech2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.
ICASSP2021
Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach, 
Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.
ICASSP2021
Marc Delcroix, Katerina Zmolíková, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani, 
Speaker Activity Driven Neural Speech Extraction.
ICASSP2021
Keisuke Kinoshita, Marc Delcroix, Naohiro Tawara, 
Integrating End-to-End Neural and Clustering-Based Diarization: Getting the Best of Both Worlds.
ICASSP2021
Chenda Li, Zhuo Chen 0006, Yi Luo 0004, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe 0001, Yanmin Qian, 
Dual-Path Modeling for Long Recording Speech Separation in Meetings.
ICASSP2021
Julio Wissing, Benedikt T. Boenninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura, 
Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain.
Interspeech2021
Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Shoko Araki, 
Few-Shot Learning of New Sound Classes for Target Sound Extraction.
Interspeech2021
Cong Han, Yi Luo 0004, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe 0001, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen 0006, 
Continuous Speech Separation Using Speaker Inventory for Long Recording.
SpeechComm2024
Cunhang Fan, Jun Xue, Shunbo Dong, Mingming Ding, Jiangyan Yi, Jinpeng Li, Zhao Lv, 
Subband fusion of complex spectrogram for fake speech detection.
TASLP2024
Cunhang Fan, Mingming Ding, Jianhua Tao 0001, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Zhao Lv, 
Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection.
ICASSP2024
Yong Ren, Tao Wang 0074, Jiangyan Yi, Le Xu, Jianhua Tao 0001, Chu Yuan Zhang, Junzuo Zhou, 
Fewer-Token Neural Speech Codec with Time-Invariant Codes.
ICASSP2024
Chenglong Wang, Jiayi He, Jiangyan Yi, Jianhua Tao 0001, Chu Yuan Zhang, Xiaohui Zhang 0006, 
Multi-Scale Permutation Entropy for Audio Deepfake Detection.
AAAI2024
Xiaohui Zhang 0006, Jiangyan Yi, Chenglong Wang, Chu Yuan Zhang, Siding Zeng, Jianhua Tao 0001, 
What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection.
SpeechComm2023
Jiangyan Yi, Jianhua Tao 0001, Ye Bai, Zhengkun Tian, Cunhang Fan, 
Transfer knowledge for punctuation prediction via adversarial training.
TASLP2023
Jiangyan Yi, Jianhua Tao 0001, Ruibo Fu, Tao Wang 0074, Chu Yuan Zhang, Chenglong Wang, 
Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings.
ICASSP2023
Guanjun Li, Wei Xue, Wenju Liu, Jiangyan Yi, Jianhua Tao 0001, 
GCC-Speaker: Target Speaker Localization with Optimal Speaker-Dependent Weighting in Multi-Speaker Scenarios.
ICASSP2023
Jun Xue, Cunhang Fan, Jiangyan Yi, Chenglong Wang, Zhengqi Wen, Dan Zhang, Zhao Lv, 
Learning From Yourself: A Self-Distillation Method For Fake Speech Detection.
Interspeech2023
Chenglong Wang, Jiangyan Yi, Jianhua Tao 0001, Chu Yuan Zhang, Shuai Zhang 0014, Xun Chen, 
Detection of Cross-Dataset Fake Audio Based on Prosodic and Pronunciation Features.
Interspeech2023
Chenglong Wang, Jiangyan Yi, Jianhua Tao 0001, Chu Yuan Zhang, Shuai Zhang 0014, Ruibo Fu, Xun Chen, 
TO-RawNet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection.
ICML2023
Xiaohui Zhang 0006, Jiangyan Yi, Jianhua Tao 0001, Chenglong Wang, Chu Yuan Zhang, 
Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection.
TASLP2022
Tao Wang 0074, Ruibo Fu, Jiangyan Yi, Jianhua Tao 0001, Zhengqi Wen, 
NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.
TASLP2022
Tao Wang 0074, Jiangyan Yi, Ruibo Fu, Jianhua Tao 0001, Zhengqi Wen, 
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing.
ICASSP2022
Tao Wang 0074, Jiangyan Yi, Liqun Deng, Ruibo Fu, Jianhua Tao 0001, Zhengqi Wen, 
Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing.
Interspeech2022
Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Jianhua Tao 0001, Yu Ting Yeung, Liqun Deng, 
Reducing multilingual context confusion for end-to-end code-switching automatic speech recognition.
TASLP2021
Ye Bai, Jiangyan Yi, Jianhua Tao 0001, Zhengkun Tian, Zhengqi Wen, Shuai Zhang 0014, 
Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT.
TASLP2021
Ye Bai, Jiangyan Yi, Jianhua Tao 0001, Zhengqi Wen, Zhengkun Tian, Shuai Zhang 0014, 
Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data.
TASLP2021
Cunhang Fan, Jiangyan Yi, Jianhua Tao 0001, Zhengkun Tian, Bin Liu 0041, Zhengqi Wen, 
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition.
ICASSP2021
Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao 0001, Zhengqi Wen, 
Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition.
TASLP2024
Dongchao Yang, Songxiang Liu, Rongjie Huang, Chao Weng, Helen Meng, 
InstructTTS: Modelling Expressive TTS in Discrete Latent Space With Natural Language Style Prompt.
ICASSP2024
Hangting Chen, Jianwei Yu, Chao Weng, 
Complexity Scaling for Speech Denoising.
ICASSP2024
Jianwei Cui, Yu Gu, Chao Weng, Jie Zhang 0042, Liping Chen, Lirong Dai 0001, 
SiFiSinger: A High-Fidelity End-to-End Singing Voice Synthesizer Based on Source-Filter Model.
ICASSP2024
Yu Gu, Qiushi Zhu, Guangzhi Lei, Chao Weng, Dan Su 0002, 
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis.
ICASSP2024
Andong Li, Rilin Chen, Yu Gu, Chao Weng, Dan Su, 
Opine: Leveraging an Optimization-Inspired Deep Unfolding Method for Multi-Channel Speech Enhancement.
ACL2024
Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Jinchuan Tian, Zhenhui Ye, Luping Liu, Zehan Wang 0001, Ziyue Jiang 0001, Xuankai Chang, Jiatong Shi, Chao Weng, Zhou Zhao, Dong Yu 0001, 
Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.
TASLP2023
Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu 0001, 
Integrating Lattice-Free MMI Into End-to-End Speech Recognition.
TASLP2023
Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu 0001, 
Diffsound: Discrete Diffusion Model for Text-to-Sound Generation.
ICASSP2023
Jianwei Yu, Hangting Chen, Yi Luo 0004, Rongzhi Gu, Weihua Li, Chao Weng, 
TSpeech-AI System Description to the 5th Deep Noise Suppression (DNS) Challenge.
Interspeech2023
Xiang Li 0105, Songxiang Liu, Max W. Y. Lam, Zhiyong Wu 0001, Chao Weng, Helen Meng, 
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model.
Interspeech2023
Hangting Chen, Jianwei Yu, Yi Luo 0004, Rongzhi Gu, Weihua Li, Zhuocheng Lu, Chao Weng, 
Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression.
Interspeech2023
Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu 0001, Shinji Watanabe 0001, 
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction.
Interspeech2023
Dongchao Yang, Songxiang Liu, Helin Wang, Jianwei Yu, Chao Weng, Yuexian Zou, 
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS.
Interspeech2023
Jianwei Yu, Hangting Chen, Yi Luo 0004, Rongzhi Gu, Chao Weng, 
High Fidelity Speech Enhancement with Band-split RNN.
ICASSP2022
Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu 0001, Helen Meng, Chao Weng, Dan Su 0002, 
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.
ICASSP2022
Xiaoyi Qin, Na Li 0012, Chao Weng, Dan Su 0002, Ming Li 0026, 
Simple Attention Module Based Speaker Verification with Iterative Noisy Label Detection.
ICASSP2022
Brian Yan, Chunlei Zhang, Meng Yu 0003, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe 0001, Dong Yu 0001, 
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.
ICASSP2022
Chunlei Zhang, Jiatong Shi, Chao Weng, Meng Yu 0003, Dong Yu 0001, 
Towards End-to-End Speaker Diarization with Generalized Neural Speaker Clustering.
ICASSP2022
Naijun Zheng, Na Li 0012, Xixin Wu, Lingwei Meng, Jiawen Kang 0002, Haibin Wu, Chao Weng, Dan Su 0002, Helen Meng, 
The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
ICASSP2022
Naijun Zheng, Na Li 0012, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
Multi-Channel Speaker Diarization Using Spatial Features for Meetings.
ICASSP2024
Zhiyun Fan, Linhao Dong, Jun Zhang 0066, Lu Lu 0015, Zejun Ma, 
SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR.
ICASSP2024
Yerbolat Khassanov, Zhipeng Chen, Tianfeng Chen, Tze Yuang Chong, Wei Li, Lu Lu, Zejun Ma, 
Extending Multilingual ASR to New Languages Using Supplementary Encoder and Decoder Components.
ICASSP2024
Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan 0019, Wei Li 0119, Lu Lu 0015, Zejun Ma, Chao Zhang 0031, 
Extending Large Language Models for Speech and Audio Captioning.
ICASSP2024
Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan 0019, Wei Li 0119, Lu Lu 0015, Zejun Ma, Chao Zhang 0031, 
Connecting Speech Encoder and Large Language Model for ASR.
ICML2024
Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan 0019, Wei Li 0119, Lu Lu 0015, Zejun Ma, Yuxuan Wang 0002, Chao Zhang 0031, 
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models.
ICLR2024
Ziyue Jiang 0001, Jinglin Liu, Yi Ren 0006, Jinzheng He, Zhenhui Ye, Shengpeng Ji, Qian Yang, Chen Zhang 0020, Pengfei Wei 0001, Chunfeng Wang, Xiang Yin 0006, Zejun Ma, Zhou Zhao, 
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis.
ICLR2024
Qianqian Dong, Zhiying Huang, Qi Tian 0001, Chen Xu 0008, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li 0001, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu 0015, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang 0002, 
PolyVoice: Language Models for Speech to Speech Translation.
ICASSP2023
Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li 0119, Zejun Ma, Tan Lee 0001, 
Leveraging Phone-Level Linguistic-Acoustic Similarity For Utterance-Level Pronunciation Scoring.
ICASSP2023
Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li 0119, Zejun Ma, Tan Lee 0001, 
An ASR-Free Fluency Scoring Approach with Self-Supervised Learning.
ICASSP2023
Rao Ma, Xiaobo Wu, Jin Qiu, Yanan Qin, Haihua Xu, Peihao Wu, Zejun Ma, 
Internal Language Model Estimation Based Adaptive Language Model Fusion for Domain Adaptation.
ICASSP2023
Chunfeng Wang, Peisong Huang, Yuxiang Zou, Haoyu Zhang, Shichao Liu 0003, Xiang Yin 0006, Zejun Ma, 
LiteG2P: A Fast, Light and High Accuracy Model for Grapheme-to-Phoneme Conversion.
Interspeech2023
Xianzhao Chen, Yist Y. Lin, Kang Wang, Yi He, Zejun Ma, 
Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition.
Interspeech2023
Zhipeng Chen, Haihua Xu, Yerbolat Khassanov, Yi He, Lu Lu, Zejun Ma, Ji Wu 0002, 
Knowledge Distillation Approach for Efficient Internal Language Model Estimation.
Interspeech2023
Yahuan Cong, Haoyu Zhang, Haopeng Lin, Shichao Liu 0003, Chunfeng Wang, Yi Ren 0006, Xiang Yin 0006, Zejun Ma, 
GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech.
Interspeech2023
Zhiyun Fan, Linhao Dong, Chen Shen 0011, Zhenlin Liang, Jun Zhang 0066, Lu Lu 0015, Zejun Ma, 
Language-specific Boundary Learning for Improving Mandarin-English Code-switching Speech Recognition.
Interspeech2023
Kaiqi Fu, Shaojun Gao, Shuju Shi, Xiaohai Tian, Wei Li 0119, Zejun Ma, 
Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring.
Interspeech2023
Lu Huang, Boyu Li, Jun Zhang 0066, Lu Lu 0015, Zejun Ma, 
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer.
Interspeech2023
Yist Y. Lin, Tao Han, Haihua Xu, Van Tung Pham, Yerbolat Khassanov, Tze Yuang Chong, Yi He, Lu Lu 0015, Zejun Ma, 
Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition.
Interspeech2023
Shuju Shi, Kaiqi Fu, Yiwei Gu, Xiaohai Tian, Shaojun Gao, Wei Li 0119, Zejun Ma, 
Disentangling the Contribution of Non-native Speech in Automated Pronunciation Assessment.
Interspeech2023
Kun Song, Yi Ren 0006, Yi Lei, Chunfeng Wang, Kun Wei, Lei Xie 0001, Xiang Yin 0006, Zejun Ma, 
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation.
TASLP2024
Sara Atito Ali Ahmed 0001, Muhammad Awais 0001, Wenwu Wang 0001, Mark D. Plumbley, Josef Kittler, 
ASiT: Local-Global Audio Spectrogram Vision Transformer for Event Classification.
TASLP2024
Yuanbo Hou, Bo Kang, Andrew Mitchell, Wenwu Wang 0001, Jian Kang 0002, Dick Botteldooren, 
Cooperative Scene-Event Modelling for Acoustic Scene Classification.
TASLP2024
Haohe Liu, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Qiao Tian, Yuping Wang, Wenwu Wang 0001, Yuxuan Wang 0002, Mark D. Plumbley, 
AudioLDM 2: Learning Holistic Audio Generation With Self-Supervised Pretraining.
TASLP2024
Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang 0001, 
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research.
ICASSP2024
Yaru Chen, Ruohao Guo, Xubo Liu, Peipei Wu, Guangyao Li, Zhenbo Li, Wenwu Wang 0001, 
CM-PIE: Cross-Modal Perception for Interactive-Enhanced Audio-Visual Video Parsing.
ICASSP2024
Haiyan Lan, Qiaoxi Zhu, Jian Guan 0001, Yuming Wei, Wenwu Wang 0001, 
Hierarchical Metadata Information Constrained Self-Supervised Learning for Anomalous Sound Detection under Domain Shift.
ICASSP2024
Yi Yuan, Haohe Liu, Xubo Liu, Qiushi Huang, Mark D. Plumbley, Wenwu Wang 0001, 
Retrieval-Augmented Text-to-Audio Generation.
AAAI2024
Haohe Liu, Xubo Liu, Qiuqiang Kong, Wenwu Wang 0001, Mark D. Plumbley, 
Learning Temporal Resolution in Spectrogram for Audio Classification.
TASLP2023
Yi Li, Yang Sun 0003, Wenwu Wang 0001, Syed Mohsen Naqvi, 
U-Shaped Transformer With Frequency-Band Aware Attention for Speech Enhancement.
TASLP2023
Weitao Yuan, Shengbei Wang, Jianming Wang, Masashi Unoki, Wenwu Wang 0001, 
Unsupervised Deep Unfolded Representation Learning for Singing Voice Separation.
TASLP2023
Yiming Zhang, Hong Yu 0006, Ruoyi Du, Zheng-Hua Tan, Wenwu Wang 0001, Zhanyu Ma, Yuan Dong, 
ACTUAL: Audio Captioning With Caption Feature Space Regularization.
ICASSP2023
Yuanbo Hou, Yun Wang, Wenwu Wang 0001, Dick Botteldooren, 
GCT: Gated Contextual Transformer for Sequential Audio Tagging.
ICASSP2023
Weitao Yuan, Yuren Bian, Shengbei Wang, Masashi Unoki, Wenwu Wang 0001, 
An Improved Optimal Transport Kernel Embedding Method with Gating Mechanism for Singing Voice Separation and Speaker Identification.
Interspeech2023
Yuanbo Hou, Siyang Song, Cheng Luo, Andrew Mitchell, Qiaoqiao Ren, Weicheng Xie 0001, Jian Kang 0002, Wenwu Wang 0001, Dick Botteldooren, 
Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning.
Interspeech2023
Jinhua Liang, Xubo Liu, Haohe Liu, Huy Phan, Emmanouil Benetos, Mark D. Plumbley, Wenwu Wang 0001, 
Adapting Language-Audio Models as Few-Shot Audio Learners.
Interspeech2023
Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang 0006, H. Lilian Tang, Mark D. Plumbley, Volkan Kiliç, Wenwu Wang 0001, 
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention.
Interspeech2023
Haohe Liu, Qiuqiang Kong, Xubo Liu, Xinhao Mei, Wenwu Wang 0001, Mark D. Plumbley, 
Ontology-aware Learning and Evaluation for Audio Tagging.
Interspeech2023
Jianyuan Sun, Xubo Liu, Xinhao Mei, Volkan Kiliç, Mark D. Plumbley, Wenwu Wang 0001, 
Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning.
ICML2023
Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo P. Mandic, Wenwu Wang 0001, Mark D. Plumbley, 
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models.
ICASSP2022
Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang 0001, 
Diverse Audio Captioning Via Adversarial Training.
TASLP2024
Xun Gong 0005, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, Rui Zhao 0017, Xie Chen 0001, Yanmin Qian, 
Advanced Long-Content Speech Recognition With Factorized Neural Transducer.
TASLP2024
Xiaofei Wang 0007, Manthan Thakker, Zhuo Chen 0006, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu 0001, Jinyu Li 0001, Takuya Yoshioka, 
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
TASLP2024
Tianrui Wang, Long Zhou, Ziqiang Zhang, Yu Wu 0012, Shujie Liu 0001, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Furu Wei, 
VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation.
TASLP2024
Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu 0012, Shuo Ren, Shujie Liu 0001, Zhuoyuan Yao, Xun Gong 0005, Li-Rong Dai 0001, Jinyu Li 0001, Furu Wei, 
SpeechLM: Enhanced Speech Pre-Training With Unpaired Textual Data.
ICASSP2023
Yan Deng, Long Zhou, Yuanhao Yi, Shujie Liu 0001, Lei He 0005, 
Prosody-Aware SpeechT5 for Expressive Neural TTS.
ICASSP2023
Xun Gong 0005, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, Rui Zhao 0017, Xie Chen 0001, Yanmin Qian, 
LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer.
ICASSP2023
Chenda Li, Yao Qian, Zhuo Chen 0006, Dongmei Wang, Takuya Yoshioka, Shujie Liu 0001, Yanmin Qian, Michael Zeng 0001, 
Target Sound Extraction with Variable Cross-Modality Clues.
ICASSP2023
Heming Wang, Yao Qian, Hemin Yang, Naoyuki Kanda, Peidong Wang, Takuya Yoshioka, Xiaofei Wang 0009, Yiming Wang, Shujie Liu 0001, Zhuo Chen 0006, DeLiang Wang, Michael Zeng 0001, 
Data2vec-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks.
ICASSP2023
Kun Wei, Long Zhou, Ziqiang Zhang, Liping Chen, Shujie Liu 0001, Lei He 0005, Jinyu Li 0001, Furu Wei, 
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation.
ICASSP2023
Haibin Yu, Yuxuan Hu, Yao Qian, Ma Jin, Linquan Liu, Shujie Liu 0001, Yu Shi 0001, Yanmin Qian, Edward Lin, Michael Zeng 0001, 
Code-Switching Text Generation and Injection in Mandarin-English ASR.
ICASSP2023
Qiu-Shi Zhu, Long Zhou, Jie Zhang 0042, Shujie Liu 0001, Yu-Chen Hu, Li-Rong Dai 0001, 
Robust Data2vec: Noise-Robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning.
Interspeech2023
Youngdo Ahn, Chengyi Wang 0002, Yu Wu 0012, Jong Won Shin, Shujie Liu 0001, 
GRAVO: Learning to Generate Relevant Audio from Visual Features with Noisy Online Videos.
Interspeech2023
Yuang Li, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, 
Accelerating Transducers through Adjacent Token Merging.
Interspeech2023
Peidong Wang, Eric Sun, Jian Xue, Yu Wu 0012, Long Zhou, Yashesh Gaur, Shujie Liu 0001, Jinyu Li 0001, 
LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers.
ICML2023
Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Shujie Liu 0001, Daniel Tompkins, Zhuo Chen 0006, Wanxiang Che, Xiangzhan Yu, Furu Wei, 
BEATs: Audio Pre-Training with Acoustic Tokenizers.
NeurIPS2023
Chenyang Le, Yao Qian, Long Zhou, Shujie Liu 0001, Yanmin Qian, Michael Zeng 0001, Xuedong Huang 0001, 
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation.
ICASSP2022
Zhengyang Chen, Sanyuan Chen, Yu Wu 0012, Yao Qian, Chengyi Wang 0002, Shujie Liu 0001, Yanmin Qian, Michael Zeng 0001, 
Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification.
ICASSP2022
Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Zhengyang Chen, Zhuo Chen 0006, Shujie Liu 0001, Jian Wu 0027, Yao Qian, Furu Wei, Jinyu Li 0001, Xiangzhan Yu, 
UniSpeech-SAT: Universal Speech Representation Learning With Speaker Aware Pre-Training.
ICASSP2022
Rui Wang 0073, Junyi Ao, Long Zhou, Shujie Liu 0001, Zhihua Wei 0001, Tom Ko, Qing Li 0001, Yu Zhang 0006, 
Multi-View Self-Attention Based Transformer for Speaker Recognition.
ICASSP2022
Heming Wang, Yao Qian, Xiaofei Wang 0009, Yiming Wang, Chengyi Wang 0002, Shujie Liu 0001, Takuya Yoshioka, Jinyu Li 0001, DeLiang Wang, 
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.
TASLP2024
Yifan Chen, Gaofeng Cheng, Runyan Yang, Pengyuan Zhang, Yonghong Yan 0002, 
Interrelate Training and Clustering for Online Speaker Diarization.
TASLP2024
Han Zhu 0004, Gaofeng Cheng, Jindong Wang 0001, Wenxin Hou, Pengyuan Zhang, Yonghong Yan 0002, 
Boosting Cross-Domain Speech Recognition With Self-Supervision.
SpeechComm2023
Feng Dang, Hangting Chen, Qi Hu, Pengyuan Zhang, Yonghong Yan 0002, 
First coarse, fine afterward: A lightweight two-stage complex approach for monaural speech enhancement.
TASLP2023
Han Zhu 0004, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, Yonghong Yan 0002, 
Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition.
TASLP2022
Gaofeng Cheng, Haoran Miao, Runyan Yang, Keqi Deng, Yonghong Yan 0002, 
ETEH: Unified Attention-Based End-to-End ASR and KWS Architecture.
TASLP2022
Keqi Deng, Gaofeng Cheng, Runyan Yang, Yonghong Yan 0002, 
Alleviating ASR Long-Tailed Problem by Decoupling the Learning of Representation and Classification.
TASLP2022
Changfeng Gao, Gaofeng Cheng, Ta Li, Pengyuan Zhang, Yonghong Yan 0002, 
Self-Supervised Pre-Training for Attention-Based Encoder-Decoder ASR Model.
Interspeech2022
Yifan Chen, Yifan Guo, Qingxuan Li, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan 0002, 
Interrelate Training and Searching: A Unified Online Clustering Framework for Speaker Diarization.
Interspeech2022
Zehan Li, Haoran Miao, Keqi Deng, Gaofeng Cheng, Sanli Tian, Ta Li, Yonghong Yan 0002, 
Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies.
Interspeech2022
Yukun Liu, Ta Li, Pengyuan Zhang, Yonghong Yan 0002, 
NAS-SCAE: Searching Compact Attention-based Encoders For End-to-end Automatic Speech Recognition.
Interspeech2022
Sanli Tian, Keqi Deng, Zehan Li, Lingxuan Ye, Gaofeng Cheng, Ta Li, Yonghong Yan 0002, 
Knowledge Distillation For CTC-based Speech Recognition Via Consistent Acoustic Representation Learning.
Interspeech2022
Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie 0001, Yonghong Yan 0002, 
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset.
Interspeech2022
Lingxuan Ye, Gaofeng Cheng, Runyan Yang, Zehui Yang, Sanli Tian, Pengyuan Zhang, Yonghong Yan 0002, 
Improving Recognition of Out-of-vocabulary Words in E2E Code-switching ASR by Fusing Speech Generation Methods.
Interspeech2022
Xueshuai Zhang, Jiakun Shen, Jun Zhou 0024, Pengyuan Zhang, Yonghong Yan 0002, Zhihua Huang, Yanfen Tang, Yu Wang, Fujie Zhang, Shaoxing Zhang, Aijun Sun, 
Robust Cough Feature Extraction and Classification Method for COVID-19 Cough Detection Based on Vocalization Characteristics.
Interspeech2022
Han Zhu 0004, Li Wang, Gaofeng Cheng, Jindong Wang 0001, Pengyuan Zhang, Yonghong Yan 0002, 
Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR.
Interspeech2022
Han Zhu 0004, Jindong Wang 0001, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan 0002, 
Decoupled Federated Learning for ASR with Non-IID Data.
SpeechComm2021
Danyang Liu, Ji Xu, Pengyuan Zhang, Yonghong Yan 0002, 
A unified system for multilingual speech recognition and language identification.
TASLP2021
Longbiao Cheng, Xingwei Sun, Dingding Yao, Junfeng Li, Yonghong Yan 0002, 
Estimation Reliability Function Assisted Sound Source Localization With Enhanced Steering Vector Phase Difference.
TASLP2021
Runyan Yang, Gaofeng Cheng, Haoran Miao, Ta Li, Pengyuan Zhang, Yonghong Yan 0002, 
Keyword Search Using Attention-Based End-to-End ASR and Frame-Synchronous Phoneme Alignments.
ICASSP2021
Keqi Deng, Gaofeng Cheng, Haoran Miao, Pengyuan Zhang, Yonghong Yan 0002, 
History Utterance Embedding Transformer LM for Speech Recognition.
TASLP2024
Sei Ueno, Akinobu Lee, Tatsuya Kawahara, 
Refining Synthesized Speech Using Speaker Information and Phone Masking for Data Augmentation of Speech Recognition.
ICASSP2024
Yuan Gao, Hao Shi, Chenhui Chu, Tatsuya Kawahara, 
Enhancing Two-Stage Finetuning for Speech Emotion Recognition Using Adapters.
ICASSP2024
Hao Shi, Kazuki Shimada, Masato Hirano, Takashi Shibuya 0001, Yuichiro Koyama, Zhi Zhong, Shusuke Takahashi, Tatsuya Kawahara, Yuki Mitsufuji, 
Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders.
ICASSP2024
Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li 0010, Raj Dabre, Yi Zhao, Tatsuya Kawahara, 
MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction.
TASLP2023
Hirofumi Inaguma, Tatsuya Kawahara, 
Alignment Knowledge Distillation for Online Streaming Attention-Based Speech Recognition.
ICASSP2023
Soky Kak, Sheng Li 0010, Chenhui Chu, Tatsuya Kawahara, 
Domain and Language Adaptation Using Heterogeneous Datasets for Wav2vec2.0-Based Speech Recognition of Low-Resource Language.
ICASSP2023
Hao Shi, Masato Mimura, Longbiao Wang, Jianwu Dang 0001, Tatsuya Kawahara, 
Time-Domain Speech Enhancement Assisted by Multi-Resolution Frequency Encoder and Decoder.
Interspeech2023
Yuan Gao, Chenhui Chu, Tatsuya Kawahara, 
Two-stage Finetuning of Wav2vec 2.0 for Speech Emotion Recognition with ASR and Gender Pretraining.
Interspeech2023
Jaeyoung Lee, Masato Mimura, Tatsuya Kawahara, 
Embedding Articulatory Constraints for Low-resource Speech Recognition Based on Large Pre-trained Model.
TASLP2022
Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Mathieu Fontaine 0002, Kazuyoshi Yoshii, Tatsuya Kawahara, 
Autoregressive Moving Average Jointly-Diagonalizable Spatial Covariance Analysis for Joint Source Separation and Dereverberation.
ICASSP2022
Sei Ueno, Tatsuya Kawahara, 
Phone-Informed Refinement of Synthesized Mel Spectrogram for Data Augmentation in Speech Recognition.
ICASSP2022
Heran Zhang, Masato Mimura, Tatsuya Kawahara, Kenkichi Ishizuka, 
Selective Multi-Task Learning For Speech Emotion Recognition Using Corpora Of Different Styles.
Interspeech2022
Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara, 
Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM.
Interspeech2022
Soky Kak, Sheng Li 0010, Masato Mimura, Chenhui Chu, Tatsuya Kawahara, 
Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism.
Interspeech2022
Seiya Kawano, Muteki Arioka, Akishige Yuguchi, Kenta Yamamoto, Koji Inoue, Tatsuya Kawahara, Satoshi Nakamura 0001, Koichiro Yoshino, 
Multimodal Persuasive Dialogue Corpus using Teleoperated Android.
Interspeech2022
Jumon Nozaki, Tatsuya Kawahara, Kenkichi Ishizuka, Taiichi Hashimoto, 
End-to-end Speech-to-Punctuated-Text Recognition.
Interspeech2022
Hao Shi, Longbiao Wang, Sheng Li 0010, Jianwu Dang 0001, Tatsuya Kawahara, 
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction.
ICASSP2021
Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe 0001, 
ORTHROS: non-autoregressive end-to-end speech translation with dual-decoder.
Interspeech2021
Hirofumi Inaguma, Tatsuya Kawahara, 
StableEmit: Selection Probability Discount for Reducing Emission Latency of Streaming Monotonic Attention ASR.
Interspeech2021
Hirofumi Inaguma, Tatsuya Kawahara, 
VAD-Free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording.
TASLP2024
Cheng Gong, Xin Wang 0037, Erica Cooper, Dan Wells, Longbiao Wang, Jianwu Dang 0001, Korin Richmond, Junichi Yamagishi, 
ZMM-TTS: Zero-Shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-Supervised Discrete Speech Representations.
TASLP2024
Michele Panariello, Natalia A. Tomashenko, Xin Wang 0037, Xiaoxiao Miao, Pierre Champion, Hubert Nourtel, Massimiliano Todisco, Nicholas W. D. Evans, Emmanuel Vincent 0001, Junichi Yamagishi, 
The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation.
ICASSP2024
Xin Wang 0037, Junichi Yamagishi, 
Can Large-Scale Vocoded Spoofed Data Improve Speech Spoofing Countermeasure with a Self-Supervised Front End?
ICASSP2024
Wanying Ge, Xin Wang 0037, Junichi Yamagishi, Massimiliano Todisco, Nicholas W. D. Evans, 
Spoofing Attack Augmentation: Can Differently-Trained Attack Models Improve Generalisation?
ICASSP2024
Lauri Juvela, Xin Wang 0037, 
Collaborative Watermarking for Adversarial Speech Synthesis.
ICASSP2024
Xiaoxiao Miao, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Nicholas W. D. Evans, Massimiliano Todisco, Jean-François Bonastre, Mickael Rouvier, 
SynVox2: Towards A Privacy-Friendly VoxCeleb2 Dataset.
SpeechComm2023
Shi Cheng, Jun Du, Shutong Niu, Alejandrina Cristià, Xin Wang 0037, Qing Wang 0008, Chin-Hui Lee 0001, 
Using iterative adaptation and dynamic mask for child speech extraction under real-world multilingual conditions.
TASLP2023
Xuechen Liu, Xin Wang 0037, Md. Sahidullah, Jose Patino 0001, Héctor Delgado, Tomi Kinnunen, Massimiliano Todisco, Junichi Yamagishi, Nicholas W. D. Evans, Andreas Nautsch, Kong Aik Lee, 
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild.
TASLP2023
Xiaoxiao Miao, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Natalia A. Tomashenko, 
Speaker Anonymization Using Orthogonal Householder Neural Network.
TASLP2023
Lin Zhang, Xin Wang 0037, Erica Cooper, Nicholas W. D. Evans, Junichi Yamagishi, 
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance.
ICASSP2023
Paul-Gauthier Noé, Xiaoxiao Miao, Xin Wang 0037, Junichi Yamagishi, Jean-François Bonastre, Driss Matrouf, 
Hiding Speaker's Sex in Speech Using Zero-Evidence Speaker Representation in an Analysis/Synthesis Pipeline.
ICASSP2023
Xuan Shi, Erica Cooper, Xin Wang 0037, Junichi Yamagishi, Shrikanth Narayanan, 
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?
ICASSP2023
Xin Wang 0037, Junichi Yamagishi, 
Spoofed Training Data for Speech Spoofing Countermeasure Can Be Efficiently Created Using Neural Vocoders.
Interspeech2023
Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang 0037, Xuechen Liu, Md. Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas W. D. Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung, 
Towards Single Integrated Spoofing-aware Speaker Verification Embeddings.
Interspeech2023
Chang Zeng, Xin Wang 0037, Xiaoxiao Miao, Erica Cooper, Junichi Yamagishi, 
Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms.
Interspeech2023
Lin Zhang, Xin Wang 0037, Erica Cooper, Nicholas W. D. Evans, Junichi Yamagishi, 
Range-Based Equal Error Rate for Spoof Localization.
TASLP2022
Brij Mohan Lal Srivastava, Mohamed Maouche, Md. Sahidullah, Emmanuel Vincent 0001, Aurélien Bellet, Marc Tommasi, Natalia A. Tomashenko, Xin Wang 0037, Junichi Yamagishi, 
Privacy and Utility of X-Vector Based Speaker Anonymization.
ICASSP2022
Xin Wang 0037, Junichi Yamagishi, 
Estimating the Confidence of Speech Spoofing Countermeasure.
ICASSP2022
Chang Zeng, Xin Wang 0037, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi, 
Attention Back-End for Automatic Speaker Verification with Multiple Enrollment Utterances.
Interspeech2022
Xiaoxiao Miao, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Natalia A. Tomashenko, 
Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions.
ICASSP2024
Peng-Jen Chen, Bowen Shi, Kelvin Niu, Ann Lee 0001, Wei-Ning Hsu, 
M2BART: Multilingual and Multimodal Encoder-Decoder Pre-Training for Any-to-Any Machine Translation.
ICLR2024
Alexander H. Liu, Matthew Le 0001, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu, 
Generative Pre-training for Speech with Flow Matching.
ACL2024
HyoJung Han, Mohamed Anwar, Juan Pino 0001, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang, 
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception.
ICASSP2023
Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed, 
Continual Learning for On-Device Speech Recognition Using Disentangled Conformers.
ICASSP2023
Ali Elkahky, Wei-Ning Hsu, Paden Tomasello, Tu Anh Nguyen, Robin Algayres, Yossi Adi, Jade Copet, Emmanuel Dupoux, Abdelrahman Mohamed, 
Do Coarser Units Benefit Cluster Prediction-Based Speech Pre-Training?
ICASSP2023
Maryam Fazel-Zarandi, Wei-Ning Hsu, 
Cocktail HuBERT: Generalized Self-Supervised Pre-Training for Mixture and Single-Source Speech.
Interspeech2023
Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino 0001, Changhan Wang, 
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation.
Interspeech2023
Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux, 
Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis.
ICML2023
Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli, 
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language.
NeurIPS2023
Matthew Le 0001, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, Wei-Ning Hsu, 
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale.
NeurIPS2023
Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass, 
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning.
ACL2023
Changhan Wang, Hirofumi Inaguma, Peng-Jen Chen, Ilia Kulikov, Yun Tang 0002, Wei-Ning Hsu, Michael Auli, Juan Pino 0001, 
Simple and Effective Unsupervised Speech Translation.
ACL-Findings2023
Peng-Jen Chen, Kevin Tran, Yilin Yang, Jingfei Du, Justine Kao, Yu-An Chung, Paden Tomasello, Paul-Ambroise Duquenne, Holger Schwenk, Hongyu Gong, Hirofumi Inaguma, Sravya Popuri, Changhan Wang, Juan Pino 0001, Wei-Ning Hsu, Ann Lee 0001, 
Speech-to-Speech Translation for a Real-world Unwritten Language.
EMNLP-Findings2023
Ju-Chieh Chou, Chung-Ming Chien, Wei-Ning Hsu, Karen Livescu, Arun Babu, Alexis Conneau, Alexei Baevski, Michael Auli, 
Toward Joint Language Modeling for Speech Units and Text.
Interspeech2022
Alexander H. Liu, Cheng-I Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James R. Glass, 
Simple and Effective Unsupervised Speech Synthesis.
Interspeech2022
Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino 0001, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee 0001, 
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation.
Interspeech2022
Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed, 
Robust Self-Supervised Audio-Visual Speech Recognition.
Interspeech2022
Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu, 
Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT.
Interspeech2022
Apoorv Vyas, Wei-Ning Hsu, Michael Auli, Alexei Baevski, 
On-demand compute reduction with stochastic wav2vec 2.0.
ICML2022
Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli, 
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language.
TASLP2024
Jaesung Huh, Joon Son Chung, Arsha Nagrani, Andrew Brown 0006, Jee-weon Jung, Daniel Garcia-Romero, Andrew Zisserman, 
The VoxCeleb Speaker Recognition Challenge: A Retrospective.
ICASSP2024
Junseok Ahn, Youngjoon Jang, Joon Son Chung, 
SlowFast Network for Continuous Sign Language Recognition.
ICASSP2024
Hee-Soo Heo, Kihyun Nam, Bong-Jin Lee, Youngki Kwon, Minjae Lee, You Jin Kim, Joon Son Chung, 
Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification.
ICASSP2024
Chaeyoung Jung, Suyeon Lee, Kihyun Nam, Kyeongha Rho, You Jin Kim, Youngjoon Jang, Joon Son Chung, 
TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning.
ICASSP2024
Doyeop Kwak, Jaemin Jung, Kihyun Nam, Youngjoon Jang, Jee-Weon Jung, Shinji Watanabe 0001, Joon Son Chung, 
VoxMM: Rich Transcription of Conversations in the Wild.
ICASSP2024
Suyeon Lee, Chaeyoung Jung, Youngjoon Jang, Jaehun Kim, Joon Son Chung, 
Seeing Through The Conversation: Audio-Visual Speech Separation Based on Diffusion Model.
ICASSP2024
Yeonghyeon Lee, Inmo Yeon, Juhan Nam, Joon Son Chung, 
VoiceLDM: Text-to-Speech with Environmental Context.
ICASSP2024
Tan Dat Nguyen, Ji-Hoon Kim, Youngjoon Jang, Jaehun Kim, Joon Son Chung, 
FreGrad: Lightweight and Fast Frequency-Aware Diffusion Vocoder.
ICASSP2024
Jongbhin Woo, Hyeonggon Ryu, Arda Senocak, Joon Son Chung, 
Speech Guided Masked Image Modeling for Visually Grounded Speech.
ICML2024
Jongsuk Kim, Hyeongkeun Lee, Kyeongha Rho, Junmo Kim, Joon Son Chung, 
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning.
AAAI2024
Ji-Hoon Kim, Jaehun Kim, Joon Son Chung, 
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos.
ICASSP2023
Jee-Weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown 0006, Youngki Kwon, Shinji Watanabe 0001, Joon Son Chung, 
In Search of Strong Embedding Extractors for Speaker Diarisation.
ICASSP2023
Jaemin Jung, Youkyum Kim, Jihwan Park, Youshin Lim, Byeong-Yeol Kim, Youngjoon Jang, Joon Son Chung, 
Metric Learning for User-Defined Keyword Spotting.
ICASSP2023
You Jin Kim, Hee-Soo Heo, Jee-Weon Jung, Youngki Kwon, Bong-Jin Lee, Joon Son Chung, 
Advancing the Dimensionality Reduction of Speaker Embeddings for Speaker Diarisation: Disentangling Noise and Informing Speech Activity.
ICASSP2023
Jiyoung Lee, Joon Son Chung, Soo-Whan Chung, 
Imaginary Voice: Face-Styled Diffusion Model for Text-to-Speech.
ICASSP2023
Hyeonggon Ryu, Arda Senocak, In So Kweon, Joon Son Chung, 
Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples.
Interspeech2023
Jiu Feng, Mehmet Hamza Erol, Joon Son Chung, Arda Senocak, 
FlexiAST: Flexibility is What AST Needs.
Interspeech2023
Hee-Soo Heo, Jee-weon Jung, Jingu Kang, Youngki Kwon, Bong-Jin Lee, You Jin Kim, Joon Son Chung, 
Curriculum Learning for Self-supervised Speaker Verification.
Interspeech2023
Kihyun Nam, Youkyum Kim, Jaesung Huh, Hee-Soo Heo, Jee-weon Jung, Joon Son Chung, 
Disentangled Representation Learning for Multilingual Speaker Recognition.
ICASSP2022
Jee-weon Jung, Hee-Soo Heo, Hemlata Tak, Hye-jin Shim, Joon Son Chung, Bong-Jin Lee, Ha-Jin Yu, Nicholas W. D. Evans, 
AASIST: Audio Anti-Spoofing Using Integrated Spectro-Temporal Graph Attention Networks.
TASLP2024
Magdalena Rybicka, Jesús Villalba 0001, Thomas Thebaud, Najim Dehak, Konrad Kowalczyk, 
End-to-End Neural Speaker Diarization With Non-Autoregressive Attractors.
Interspeech2023
Jesús Villalba 0001, Jonas Borgstrom, Maliha Jahan, Saurabh Kataria, Leibny Paola García, Pedro A. Torres-Carrasquillo, Najim Dehak, 
Advances in Language Recognition in Low Resource African Languages: The JHU-MIT Submission for NIST LRE22.
Interspeech2023
Saurabhchand Bhati, Jesús Villalba 0001, Laureano Moro-Velázquez, Thomas Thebaud, Najim Dehak, 
Segmental SpeechCLIP: Utilizing Pretrained Image-text Models for Audio-Visual Learning.
Interspeech2023
Anna Favaro, Tianyu Cao 0003, Thomas Thebaud, Jesús Villalba 0001, Ankur A. Butala, Najim Dehak, Laureano Moro-Velázquez, 
Do Phonatory Features Display Robustness to Characterize Parkinsonian Speech Across Corpora?
Interspeech2023
Saurabh Kataria, Jesús Villalba 0001, Laureano Moro-Velázquez, Thomas Thebaud, Najim Dehak, 
Self-FiLM: Conditioning GANs with self-supervised representations for bandwidth extension based speaker recognition.
Interspeech2023
Helin Wang, Thomas Thebaud, Jesús Villalba 0001, Myra Sydnor, Becky Lammers, Najim Dehak, Laureano Moro-Velázquez, 
DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model.
Interspeech2022
Jaejin Cho, Raghavendra Pappagari, Piotr Zelasko, Laureano Moro-Velázquez, Jesús Villalba 0001, Najim Dehak, 
Non-contrastive self-supervised learning of utterance-level speech representations.
Interspeech2022
Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesús Villalba 0001, Sanjeev Khudanpur, Najim Dehak, 
Defense against Adversarial Attacks on Hybrid Speech Recognition System using Adversarial Fine-tuning with Denoiser.
Interspeech2022
Sonal Joshi, Saurabh Kataria, Jesús Villalba 0001, Najim Dehak, 
AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification.
Interspeech2022
Saurabh Kataria, Jesús Villalba 0001, Laureano Moro-Velázquez, Najim Dehak, 
Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification.
Interspeech2022
Magdalena Rybicka, Jesús Villalba 0001, Najim Dehak, Konrad Kowalczyk, 
End-to-End Neural Speaker Diarization with an Iterative Refinement of Non-Autoregressive Attention-based Attractors.
Interspeech2022
Yiwen Shao, Jesús Villalba 0001, Sonal Joshi, Saurabh Kataria, Sanjeev Khudanpur, Najim Dehak, 
Chunking Defense for Adversarial Attacks on ASR.
ICASSP2021
Nanxin Chen, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak, 
Focus on the Present: A Regularization Method for the ASR Source-Target Attention Layer.
ICASSP2021
Jaejin Cho, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak, 
Improving Reconstruction Loss Based Speaker Embedding in Unsupervised and Semi-Supervised Scenarios.
ICASSP2021
Saurabh Kataria, Jesús Villalba 0001, Najim Dehak, 
Perceptual Loss Based Speech Denoising with an Ensemble of Audio Pattern Recognition and Self-Supervised Models.
ICASSP2021
Raghavendra Pappagari, Jesús Villalba 0001, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak, 
CopyPaste: An Augmentation Method for Speech Emotion Recognition.
Interspeech2021
Saurabhchand Bhati, Jesús Villalba 0001, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak, 
Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation.
Interspeech2021
Nanxin Chen, Piotr Zelasko, Laureano Moro-Velázquez, Jesús Villalba 0001, Najim Dehak, 
Align-Denoise: Single-Pass Non-Autoregressive Speech Recognition.
Interspeech2021
Saurabh Kataria, Jesús Villalba 0001, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak, 
Deep Feature CycleGANs: Speaker Identity Preserving Non-Parallel Microphone-Telephone Domain Adaptation for Speaker Verification.
Interspeech2021
Raghavendra Pappagari, Jaejin Cho, Sonal Joshi, Laureano Moro-Velázquez, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak, 
Automatic Detection and Assessment of Alzheimer Disease Using Speech and Language Technologies in Low-Resource Scenarios.
TASLP2024
Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki, Shoji Makino, 
Blind and Spatially-Regularized Online Joint Optimization of Source Separation, Dereverberation, and Noise Reduction.
TASLP2023
Hiroshi Sawada, Rintaro Ikeshita, Keisuke Kinoshita, Tomohiro Nakatani, 
Multi-Frame Full-Rank Spatial Covariance Analysis for Underdetermined Blind Source Separation and Dereverberation.
Interspeech2023
Shoko Araki, Ayako Yamamoto, Tsubasa Ochiai, Kenichi Arai, Atsunori Ogawa, Tomohiro Nakatani, Toshio Irino, 
Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine.
Interspeech2023
Marc Delcroix, Naohiro Tawara, Mireia Díez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukás Burget, Shoko Araki, 
Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization.
Interspeech2023
Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani, 
Target Speech Extraction with Conditional Diffusion Model.
ICASSP2022
Naoyuki Kamo, Rintaro Ikeshita, Keisuke Kinoshita, Tomohiro Nakatani, 
Importance of Switch Optimization Criterion in Switching WPE Dereverberation.
ICASSP2022
Hiroshi Sawada, Rintaro Ikeshita, Keisuke Kinoshita, Tomohiro Nakatani, 
Multi-Frame Full-Rank Spatial Covariance Analysis for Underdetermined BSS in Reverberant Environments.
Interspeech2022
Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani, 
Listen only to me! How well can target speech extraction handle false alarms?
TASLP2021
Nobutaka Ito, Rintaro Ikeshita, Hiroshi Sawada, Tomohiro Nakatani, 
A Joint Diagonalization Based Efficient Approach to Underdetermined Blind Audio Source Separation Using the Multichannel Wiener Filter.
ICASSP2021
Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach, 
Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.
ICASSP2021
Marc Delcroix, Katerina Zmolíková, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani, 
Speaker Activity Driven Neural Speech Extraction.
ICASSP2021
Julio Wissing, Benedikt T. Bönninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura, 
Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain.
Interspeech2021
Christopher Schymura, Benedikt T. Bönninghoff, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa, 
PILOT: Introducing Transformers for Probabilistic Sound Event Localization.
Interspeech2021
Ayako Yamamoto, Toshio Irino, Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, 
Comparison of Remote Experiments Using Crowdsourcing and Laboratory Experiments on Speech Intelligibility.
SpeechComm2020
Katsuhiko Yamamoto, Toshio Irino, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani, 
GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech.
TASLP2020
Tomohiro Nakatani, Christoph Böddeker, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Reinhold Haeb-Umbach, 
Jointly Optimal Denoising, Dereverberation, and Source Separation.
ICASSP2020
Marc Delcroix, Tsubasa Ochiai, Katerina Zmolíková, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki, 
Improving Speaker Discrimination of Target Speech Extraction With Time-Domain SpeakerBeam.
ICASSP2020
Rintaro Ikeshita, Tomohiro Nakatani, Shoko Araki, 
Overdetermined Independent Vector Analysis.
ICASSP2020
Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani, 
Tackling Real Noisy Reverberant Meetings with All-Neural Source Separation, Counting, and Diarization System.
ICASSP2020
Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, 
Improving Noise Robust Automatic Speech Recognition with Single-Channel Time-Domain Enhancement Network.
TASLP2024
Yang Li 0116, Cheng Yu, Guangzhi Sun, Weiqin Zu, Zheng Tian 0002, Ying Wen 0001, Wei Pan 0004, Chao Zhang 0031, Jun Wang 0012, Yang Yang 0001, Fanglei Sun, 
Cross-Utterance Conditioned VAE for Speech Generation.
TASLP2024
Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland, 
Graph Neural Networks for Contextual ASR With the Tree-Constrained Pointer Generator.
ICASSP2024
Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan 0019, Wei Li 0119, Lu Lu 0015, Zejun Ma, Chao Zhang 0031, 
Extending Large Language Models for Speech and Audio Captioning.
ICASSP2024
Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang 0031, 
Can Whisper Perform Speech-Based In-Context Learning?
ICASSP2024
Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan 0019, Wei Li 0119, Lu Lu 0015, Zejun Ma, Chao Zhang 0031, 
Connecting Speech Encoder and Large Language Model for ASR.
ICASSP2024
Qiuming Zhao, Guangzhi Sun, Chao Zhang 0031, Mingxing Xu, Thomas Fang Zheng, 
Enhancing Quantised End-to-End ASR Models Via Personalisation.
ICML2024
Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan 0019, Wei Li 0119, Lu Lu 0015, Zejun Ma, Yuxuan Wang 0002, Chao Zhang 0031, 
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models.
ICLR2024
Yuchen Hu, Chen Chen 0075, Chao-Han Huck Yang, Ruizhe Li 0001, Chao Zhang 0031, Pin-Yu Chen, Engsiong Chng, 
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition.
ACL-Findings2024
Guangzhi Sun, Shutong Feng, Dongcheng Jiang, Chao Zhang 0031, Milica Gasic, Philip C. Woodland, 
Speech-based Slot Filling using Large Language Models.
SpeechComm2023
Qiujia Li, Chao Zhang 0031, Philip C. Woodland, 
Combining hybrid DNN-HMM ASR systems with attention-based models using lattice rescoring.
TASLP2023
Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland, 
Minimising Biasing Word Errors for Contextual ASR With the Tree-Constrained Pointer Generator.
TASLP2023
Ya-Jie Zhang, Chao Zhang 0031, Wei Song, Zhengchen Zhang, Youzheng Wu, Xiaodong He 0001, 
Prosody Modelling With Pre-Trained Cross-Utterance Representations for Improved Speech Synthesis.
ICASSP2023
Shuo-Yiin Chang, Chao Zhang 0031, Tara N. Sainath, Bo Li 0028, Trevor Strohman, 
Context-Aware end-to-end ASR Using Self-Attentive Embedding and Tensor Fusion.
ICASSP2023
Evonne P. C. Lee, Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland, 
Spectral Clustering-Aware Learning of Embeddings for Speaker Diarisation.
ICASSP2023
Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland, 
End-to-End Spoken Language Understanding with Tree-Constrained Pointer Generator.
ICASSP2023
Wen Wu, Chao Zhang 0031, Philip C. Woodland, 
Self-Supervised Representations in Speech-Based Depression Detection.
ICASSP2023
Chao Zhang 0031, Bo Li 0028, Tara N. Sainath, Trevor Strohman, Shuo-Yiin Chang, 
UML: A Universal Monolingual Output Layer For Multilingual ASR.
Interspeech2023
Dongcheng Jiang, Chao Zhang 0031, Philip C. Woodland, 
A Neural Time Alignment Module for End-to-End Automatic Speech Recognition.
Interspeech2023
Ziyang Ma, Zhisheng Zheng, Guanrou Yang, Yu Wang 0027, Chao Zhang 0031, Xie Chen 0001, 
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation.
Interspeech2023
Guangzhi Sun, Xianrui Zheng, Chao Zhang 0031, Philip C. Woodland, 
Can Contextual Biasing Remain Effective with Whisper and GPT-2?
ICASSP2024
Siddhant Arora, George Saon, Shinji Watanabe 0001, Brian Kingsbury, 
Semi-Autoregressive Streaming ASR with Label Context.
ICASSP2024
A F. M. Saif, Xiaodong Cui, Han Shen, Songtao Lu, Brian Kingsbury, Tianyi Chen, 
Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization.
ICASSP2023
Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas 0001, Rogério Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James R. Glass, 
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval.
ICASSP2023
Vishal Sunder, Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Brian Kingsbury, Eric Fosler-Lussier, 
Fine-Grained Textual Knowledge Transfer to Improve RNN Transducers for Speech Recognition and Understanding.
ICASSP2023
Samuel Thomas 0001, Hong-Kwang Jeff Kuo, George Saon, Brian Kingsbury, 
Multi-Speaker Data Augmentation for Improved end-to-end Automatic Speech Recognition.
Interspeech2023
Xiaodong Cui, George Saon, Brian Kingsbury, 
Improving RNN Transducer Acoustic Models for English Conversational Speech Recognition.
Interspeech2023
Andrew Rouditchenko, Sameer Khurana, Samuel Thomas 0001, Rogério Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James R. Glass, 
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages.
Interspeech2023
Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Brian Kingsbury, 
ConvKT: Conversation-Level Knowledge Transfer for Context Aware End-to-End Spoken Language Understanding.
ICASSP2022
Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas 0001, Boaz Carmeli, Ron Hoory, Brian Kingsbury, 
A New Data Augmentation Method for Intent Classification Enhancement and its Application on Spoken Conversation Datasets.
ICASSP2022
Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas 0001, Brian Kingsbury, George Saon, 
Improving End-to-end Models for Set Prediction in Spoken Language Understanding.
ICASSP2022
Vishal Sunder, Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Jatin Ganhotra, Brian Kingsbury, Eric Fosler-Lussier, 
Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding.
ICASSP2022
Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Brian Kingsbury, George Saon, 
Towards Reducing the Need for Speech Training Data to Build Spoken Language Understanding Systems.
ICASSP2022
Samuel Thomas 0001, Brian Kingsbury, George Saon, Hong-Kwang Jeff Kuo, 
Integrating Text Inputs for Training and Adapting RNN Transducer ASR Models.
Interspeech2022
Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata, 
Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing.
Interspeech2022
Andrea Fasoli, Chia-Yu Chen, Mauricio J. Serrano, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Kailash Gopalakrishnan, 
Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization.
Interspeech2022
Takashi Fukuda, Samuel Thomas 0001, Masayuki Suzuki, Gakuto Kurata, George Saon, Brian Kingsbury, 
Global RNN Transducer Models For Multi-dialect Speech Recognition.
Interspeech2022
Jiatong Shi, George Saon, David Haws, Shinji Watanabe 0001, Brian Kingsbury, 
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States.
Interspeech2022
Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas 0001, Hong-Kwang Kuo, Brian Kingsbury, 
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems.
TASLP2021
Xiaodong Cui, Wei Zhang 0022, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, David S. Kung 0001, 
Asynchronous Decentralized Distributed Training of Acoustic Models.
ICASSP2021
Samuel Thomas 0001, Hong-Kwang Jeff Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, 
RNN Transducer Models for Spoken Language Understanding.
TASLP2024
Christoph Böddeker, Aswin Shanmugam Subramanian, Gordon Wichern, Reinhold Haeb-Umbach, Jonathan Le Roux, 
TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings.
ICASSP2024
Teysir Baoueb, Haocheng Liu, Mathieu Fontaine 0002, Jonathan Le Roux, Gaël Richard, 
SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis.
ICASSP2024
Haocheng Liu, Teysir Baoueb, Mathieu Fontaine 0002, Jonathan Le Roux, Gaël Richard, 
GLA-GRAD: A Griffin-Lim Extended Waveform Generation Diffusion Model.
ICASSP2024
Zexu Pan, Gordon Wichern, François G. Germain, Sameer Khurana, Jonathan Le Roux, 
NeuroHeed+: Improving Neuro-Steered Speaker Extraction with Joint Auditory Attention Detection.
TASLP2023
Darius Petermann, Gordon Wichern, Aswin Shanmugam Subramanian, Zhong-Qiu Wang, Jonathan Le Roux, 
Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks.
TASLP2023
Zhong-Qiu Wang, Gordon Wichern, Shinji Watanabe 0001, Jonathan Le Roux, 
STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency.
ICASSP2023
Rohith Aralikatti, Christoph Böddeker, Gordon Wichern, Aswin Shanmugam Subramanian, Jonathan Le Roux, 
Reverberation as Supervision For Speech Separation.
ICASSP2023
Darius Petermann, Gordon Wichern, Aswin Shanmugam Subramanian, Jonathan Le Roux, 
Hyperbolic Audio Source Separation.
ICASSP2023
Hao Yen, François G. Germain, Gordon Wichern, Jonathan Le Roux, 
Cold Diffusion for Speech Enhancement.
Interspeech2023
Chiori Hori, Puyuan Peng, David Harwath, Xinyu Liu, Kei Ota, Siddarth Jain, Radu Corcodel, Devesh K. Jha, Diego Romeres, Jonathan Le Roux, 
Style-transfer based Speech and Audio-visual Scene understanding for Robot Action Sequence Acquisition from Videos.
ICASSP2022
Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, 
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR.
ICASSP2022
Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori, 
Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy.
ICASSP2022
Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, 
Sequence Transduction with Graph-Based Supervision.
ICASSP2022
Darius Petermann, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux, 
The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks.
Interspeech2022
Chiori Hori, Takaaki Hori, Jonathan Le Roux, 
Low-Latency Online Streaming VideoQA Using Audio-Visual Transformers.
Interspeech2022
Efthymios Tzinis, Gordon Wichern, Aswin Shanmugam Subramanian, Paris Smaragdis, Jonathan Le Roux, 
Heterogeneous Target Speech Separation.
TASLP2021
Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux, 
Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation.
ICASSP2021
Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training.
ICASSP2021
Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Capturing Multi-Resolution Context by Dilated Self-Attention.
ICASSP2021
Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Semi-Supervised Speech Recognition Via Graph-Based Temporal Classification.
ICASSP2024
Yimin Deng, Huaizhen Tang, Xulong Zhang 0001, Ning Cheng 0001, Jing Xiao 0006, Jianzong Wang, 
Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval.
ICASSP2024
Haobin Tang, Xulong Zhang 0001, Ning Cheng 0001, Jing Xiao 0006, Jianzong Wang, 
ED-TTS: Multi-Scale Emotion Modeling Using Cross-Domain Emotion Diarization for Emotional Speech Synthesis.
ICASSP2024
Yong Zhang, Hanzhang Li, Zhitao Li, Ning Cheng 0001, Ming Li, Jing Xiao 0006, Jianzong Wang, 
Leveraging Biases in Large Language Models: "bias-kNN" for Effective Few-Shot Learning.
ICASSP2023
Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Xiaoyang Qu, Jing Xiao 0006, 
Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General Sound Classification.
ICASSP2023
Ganghui Ru, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Improving Music Genre Classification from multi-modal Properties of Music and Genre Correlations Perspective.
ICASSP2023
Huaizhen Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Learning Speech Representations with Flexible Hidden Feature Dimensions.
ICASSP2023
Huaizhen Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
VQ-CL: Learning Disentangled Speech Representations with Contrastive Learning and Vector Quantization.
ICASSP2023
Haobin Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis.
ICASSP2023
Xulong Zhang 0001, Haobin Tang, Jianzong Wang, Ning Cheng 0001, Jian Luo, Jing Xiao 0006, 
Dynamic Alignment Mask CTC: Improved Mask CTC With Aligned Cross Entropy.
ICASSP2023
Kexin Zhu, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Improving EEG-based Emotion Recognition by Fusing Time-Frequency and Spatial Representations.
Interspeech2023
Jiaxin Fan, Yong Zhang, Hanzhang Li, Jianzong Wang, Zhitao Li, Sheng Ouyang, Ning Cheng 0001, Jing Xiao 0006, 
Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism.
Interspeech2023
Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao 0006, 
SVVAD: Personal Voice Activity Detection for Speaker Verification.
Interspeech2023
Yifu Sun, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Kaiyu Hu, Jing Xiao 0006, 
Investigation of Music Emotion Recognition Based on Segmented Semi-Supervised Learning.
Interspeech2023
Haobin Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis.
Interspeech2023
Yong Zhang, Zhitao Li, Jianzong Wang, Yiming Gao 0010, Ning Cheng 0001, Fengying Yu, Jing Xiao 0006, 
Prompt Guided Copy Mechanism for Conversational Question Answering.
ICASSP2022
Shijing Si, Jianzong Wang, Junqing Peng, Jing Xiao 0006, 
Towards Speaker Age Estimation With Label Distribution Learning.
ICASSP2022
Huaizhen Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
AVQVC: One-Shot Voice Conversion By Vector Quantization With Applying Contrastive Learning.
ICASSP2022
Qiqi Wang 0005, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning.
ICASSP2022
Tong Ye, Shijing Si, Jianzong Wang, Rui Wang, Ning Cheng 0001, Jing Xiao 0006, 
VU-BERT: A Unified Framework for Visual Dialog.
ICASSP2022
Yong Zhang, Zhitao Li, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Self-Attention for Incomplete Utterance Rewriting.
TASLP2024
Chin-Po Chen, Ho-Hsien Pan, Susan Shur-Fen Gau, Chi-Chun Lee, 
Using Measures of Vowel Space for Autistic Traits Characterization.
ICASSP2024
An-Yan Chang, Jing-Tong Tzeng, Huan-Yu Chen, Chih-Wei Sung, Chun-Hsiang Huang, Edward Pei-Chuan Huang, Chi-Chun Lee, 
GaP-Aug: Gamma Patch-Wise Correction Augmentation Method for Respiratory Sound Classification.
ICASSP2024
Woan-Shiuan Chien, Shreya G. Upadhyay, Chi-Chun Lee, 
Balancing Speaker-Rater Fairness for Gender-Neutral Speech Emotion Recognition.
ICASSP2024
Po-Chen Lin, Jeng-Lin Li, Woan-Shiuan Chien, Chi-Chun Lee, 
In-The-Wild Physiological-Based Stress Detection Using Federated Strategy.
ICASSP2023
Woan-Shiuan Chien, Chi-Chun Lee, 
Achieving Fair Speech Emotion Recognition via Perceptual Fairness.
ICASSP2023
Shreya G. Upadhyay, Luz Martinez-Lucas, Bo-Hao Su, Wei-Cheng Lin, Woan-Shiuan Chien, Ya-Tse Wu, William Katz, Carlos Busso, Chi-Chun Lee, 
Phonetic Anchor-Based Transfer Learning to Facilitate Unsupervised Cross-Lingual Speech Emotion Recognition.
Interspeech2023
Huang-Cheng Chou, Lucas Goncalves, Seong-Gyun Leem, Chi-Chun Lee, Carlos Busso, 
The Importance of Calibration: Rethinking Confidence and Performance of Speech Multi-label Emotion Classifiers.
Interspeech2023
Yin-Tse Lin, Bo-Hao Su, Chi-Han Lin, Shih-Chan Kuo, Jyh-Shing Roger Jang, Chi-Chun Lee, 
Noise-Robust Bandwidth Expansion for 8K Speech Recordings.
Interspeech2023
Shao-Hao Lu, Yun-Shao Lin, Chi-Chun Lee, 
Speaking State Decoder with Transition Detection for Next Speaker Prediction.
Interspeech2023
Ya-Tse Wu, Yuan-Ting Chang, Shao-Hao Lu, Jing-Yi Chuang, Chi-Chun Lee, 
A Context-Constrained Sentence Modeling for Deception Detection in Real Interrogation.
Interspeech2023
Ya-Tse Wu, Chi-Chun Lee, 
MetricAug: A Distortion Metric-Lead Augmentation Strategy for Training Noise-Robust Speech Emotion Recognizer.
ICASSP2022
Huang-Cheng Chou, Wei-Cheng Lin, Chi-Chun Lee, Carlos Busso, 
Exploiting Annotators' Typed Description of Emotion Perception to Maximize Utilization of Ratings for Speech Emotion Recognition.
ICASSP2022
Ya-Tse Wu, Jeng-Lin Li, Chi-Chun Lee, 
An Audio-Saliency Masking Transformer for Audio Emotion Classification in Movies.
Interspeech2022
Chun-Yu Chen, Yun-Shao Lin, Chi-Chun Lee, 
Emotion-Shift Aware CRF for Decoding Emotion Sequence in Conversation.
Interspeech2022
Huang-Cheng Chou, Chi-Chun Lee, Carlos Busso, 
Exploiting Co-occurrence Frequency of Emotions in Perceptual Evaluations To Train A Speech Emotion Classifier.
Interspeech2022
Yu-Lin Huang, Bo-Hao Su, Y.-W. Peter Hong, Chi-Chun Lee, 
An Attention-Based Method for Guiding Attribute-Aligned Speech Representation Learning.
Interspeech2022
Bo-Hao Su, Chi-Chun Lee, 
Vaccinating SER to Neutralize Adversarial Attacks with Self-Supervised Augmentation Strategy.
Interspeech2021
Yu-Lin Huang, Bo-Hao Su, Y.-W. Peter Hong, Chi-Chun Lee, 
An Attribute-Aligned Strategy for Learning Speech Representation.
ICASSP2020
Ya-Lin Huang, Wan-Ting Hsieh, Hao-Chun Yang, Chi-Chun Lee, 
Conditional Domain Adversarial Transfer for Robust Cross-Site ADHD Classification Using Functional MRI.
ICASSP2020
Yun-Shao Lin, Chi-Chun Lee, 
Predicting Performance Outcome with a Conversational Graph Convolutional Network for Small Group Interactions.
ICASSP2024
Ilja Baumann, Dominik Wagner 0002, Maria Schuster, Elmar Nöth, Tobias Bocklet, 
Towards Interpretability of Automatic Phoneme Analysis in Cleft Lip and Palate Speech.
ICASSP2024
Paula Andrea Pérez-Toro, Judith Dineley, Agnieszka Kaczkowska, Pauline Conde, Yuezhou Zhang, Faith Matcham, Sara Siddi, Josep Maria Haro, Stuart Bruce, Til Wykes, Raquel Bailón, Srinivasan Vairavan, Richard J. B. Dobson, Andreas K. Maier, Elmar Nöth, Juan Rafael Orozco-Arroyave, Vaibhav A. Narayan, Nicholas Cummins, 
Longitudinal Modeling of Depression Shifts Using Speech and Language.
SpeechComm2023
Paula Andrea Pérez-Toro, Tomás Arias-Vergara, Philipp Klumpp, Juan Camilo Vásquez-Correa, Maria Schuster, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Depression assessment in people with Parkinson's disease: The combination of acoustic features and natural language processing.
ICASSP2023
Paula Andrea Pérez-Toro, Dalia Rodríguez-Salas, Tomás Arias-Vergara, Sebastian P. Bayerl, Philipp Klumpp, Korbinian Riedhammer, Maria Schuster, Elmar Nöth, Andreas K. Maier, Juan Rafael Orozco-Arroyave, 
Transferring Quantified Emotion Knowledge for the Detection of Depression in Alzheimer's Disease Using Forestnets.
Interspeech2023
Soroosh Tayebi Arasteh, Cristian David Ríos-Urrego, Elmar Nöth, Andreas Maier 0001, Seung Hee Yang, Jan Rusz, Juan Rafael Orozco-Arroyave, 
Federated Learning for Secure Development of AI Models for Parkinson's Disease Detection Using Speech from Different Languages.
Interspeech2023
Tomás Arias-Vergara, Elizabeth Londoño-Mora, Paula Andrea Pérez-Toro, Maria Schuster, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas Maier 0001, 
Measuring Phonological Precision in Children with Cleft Lip and Palate.
Interspeech2023
Ilja Baumann, Dominik Wagner 0002, Franziska Braun, Sebastian P. Bayerl, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet, 
Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate.
Interspeech2023
Sebastian P. Bayerl, Dominik Wagner 0002, Ilja Baumann, Florian Hönig, Tobias Bocklet, Elmar Nöth, Korbinian Riedhammer, 
A Stutter Seldom Comes Alone - Cross-Corpus Stuttering Detection as a Multi-label Problem.
Interspeech2023
Franziska Braun, Sebastian P. Bayerl, Paula Andrea Pérez-Toro, Florian Hönig, Hartmut Lehfeld, Thomas Hillemacher, Elmar Nöth, Tobias Bocklet, Korbinian Riedhammer, 
Classifying Dementia in the Presence of Depression: A Cross-Corpus Study.
Interspeech2023
Daniel Escobar-Grisales, Tomás Arias-Vergara, Cristian David Ríos-Urrego, Elmar Nöth, Adolfo M. García, Juan Rafael Orozco-Arroyave, 
An Automatic Multimodal Approach to Analyze Linguistic and Acoustic Cues on Parkinson's Disease Patients.
Interspeech2023
Hiuching Hung, Paula Andrea Pérez-Toro, Tomás Arias-Vergara, Andreas Maier 0001, Elmar Nöth, 
Speaking Clearly, Understanding Better: Predicting the L2 Narrative Comprehension of Chinese Bilingual Kindergarten Children Based on Speech Intelligibility Using a Machine Learning Approach.
Interspeech2023
Paula Andrea Pérez-Toro, Tomás Arias-Vergara, Franziska Braun, Florian Hönig, Carlos Andrés Tobón-Quintero, David Aguillón, Francisco Lopera, Liliana Hincapié-Henao, Maria Schuster, Korbinian Riedhammer, Andreas Maier 0001, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Automatic Assessment of Alzheimer's across Three Languages Using Speech and Language Features.
Interspeech2023
Cristian David Ríos-Urrego, Jan Rusz, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Automatic Classification of Hypokinetic and Hyperkinetic Dysarthria based on GMM-Supervectors.
Interspeech2023
Dominik Wagner 0002, Ilja Baumann, Franziska Braun, Sebastian P. Bayerl, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet, 
Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data?
Interspeech2022
Sebastian Peter Bayerl, Dominik Wagner 0002, Elmar Nöth, Korbinian Riedhammer, 
Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0.
Interspeech2022
Christian Bergler, Alexander Barnhill, Dominik Perrin, Manuel Schmitt, Andreas K. Maier, Elmar Nöth, 
ORCA-WHISPER: An Automatic Killer Whale Sound Type Generation Toolkit Using Deep Learning.
Interspeech2022
Teena tom Dieck, Paula Andrea Pérez-Toro, Tomas Arias, Elmar Nöth, Philipp Klumpp, 
Wav2vec behind the Scenes: How end2end Models learn Phonetics.
Interspeech2022
Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas K. Maier, Seung Hee Yang, 
Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition.
Interspeech2022
Paula Andrea Pérez-Toro, Philipp Klumpp, Abner Hernandez, Tomas Arias, Patricia Lillo, Andrea Slachevsky, Adolfo Martín García, Maria Schuster, Andreas K. Maier, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Alzheimer's Detection from English to Spanish Using Acoustic and Linguistic Embeddings.
Interspeech2022
P. Schäfer, Paula Andrea Pérez-Toro, Philipp Klumpp, Juan Rafael Orozco-Arroyave, Elmar Nöth, Andreas K. Maier, A. Abad, Maria Schuster, Tomás Arias-Vergara, 
CoachLea: an Android Application to Evaluate the Speech Production and Perception of Children with Hearing Loss.
SpeechComm2024
Georgios Karakasidis, Mikko Kurimo, Peter Bell 0001, Tamás Grósz, 
Comparison and analysis of new curriculum criteria for end-to-end ASR.
ICASSP2024
Xiaoliang Wu, Peter Bell 0001, Ajitha Rajan, 
Can We Trust Explainable AI Methods on ASR? An Evaluation on Phoneme Recognition.
TASLP2023
Erfan Loweimi, Andrea Carmantini, Peter Bell 0001, Steve Renals, Zoran Cvetkovic, 
Phonetic Error Analysis Beyond Phone Error Rate.
TASLP2023
Erfan Loweimi, Zhengjun Yue, Peter Bell 0001, Steve Renals, Zoran Cvetkovic, 
Multi-Stream Acoustic Modelling Using Raw Real and Imaginary Parts of the Fourier Transform.
ICASSP2023
Yuanchao Li, Peter Bell 0001, Catherine Lai, 
Multimodal Dyadic Impression Recognition via Listener Adaptive Cross-Domain Fusion.
ICASSP2023
Ramon Sanabria, Nikolay Bogoychev, Nina Markl, Andrea Carmantini, Ondrej Klejch, Peter Bell 0001, 
The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR.
ICASSP2023
Cassia Valentini-Botinhao, Andrea Lorena Aldana Blanco, Ondrej Klejch, Peter Bell 0001, 
Efficient Intelligibility Evaluation Using Keyword Spotting: A Study on Audio-Visual Speech Enhancement.
ICASSP2023
Xiaoliang Wu, Peter Bell 0001, Ajitha Rajan, 
Explanations for Automatic Speech Recognition.
Interspeech2023
Debasmita Bhattacharya, Jie Chi, Julia Hirschberg, Peter Bell 0001, 
Capturing Formality in Speech Across Domains and Languages.
Interspeech2023
Jie Chi, Brian Lu, Jason Eisner, Peter Bell 0001, Preethi Jyothi, Ahmed M. Ali 0002, 
Unsupervised Code-switched Text Generation from Parallel Text.
Interspeech2023
Yuanchao Li, Peter Bell 0001, Catherine Lai, 
Transfer Learning for Personality Perception via Speech Emotion Recognition.
Interspeech2023
Yuanchao Li, Zeyu Zhao 0004, Ondrej Klejch, Peter Bell 0001, Catherine Lai, 
ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition.
Interspeech2023
Christoph Minixhofer, Ondrej Klejch, Peter Bell 0001, 
Evaluating and reducing the distance between synthetic and real speech distributions.
Interspeech2023
Sarenne Wallbridge, Peter Bell 0001, Catherine Lai, 
Quantifying the perceptual value of lexical and non-lexical channels in speech.
Interspeech2023
Zeyu Zhao 0004, Peter Bell 0001, 
Regarding Topology and Variant Frame Rates for Differentiable WFST-based End-to-End ASR.
ICASSP2022
Yuanchao Li, Peter Bell 0001, Catherine Lai, 
Fusing ASR Outputs in Joint Training for Speech Emotion Recognition.
ICASSP2022
Zeyu Zhao 0004, Peter Bell 0001, 
Investigating Sequence-Level Normalisation For CTC-Like End-to-End ASR.
Interspeech2022
Ondrej Klejch, Electra Wallington, Peter Bell 0001, 
Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR.
Interspeech2022
Chau Luu, Steve Renals, Peter Bell 0001, 
Investigating the contribution of speaker attributes to speaker separability using disentangled speaker representations.
Interspeech2022
Sarenne Carrol Wallbridge, Catherine Lai, Peter Bell 0001, 
Investigating perception of spoken dialogue acceptability through surprisal.
ICASSP2024
Junwen Bai, Bo Li 0028, Qiujia Li, Tara N. Sainath, Trevor Strohman, 
Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR.
ICASSP2024
Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li 0028, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal, 
USM-Lite: Quantization and Sparsity Aware Fine-Tuning for Speech Recognition with Universal Speech Models.
ICASSP2024
Khe Chai Sim, Zhouyuan Huo, Tsendsuren Munkhdalai, Nikhil Siddhartha, Adam Stooke, Zhong Meng, Bo Li 0028, Tara N. Sainath, 
A Comparison of Parameter-Efficient ASR Domain Adaptation Methods for Universal Speech and Language Models.
NAACL2024
Weiran Wang, Rohit Prabhavalkar, Haozhe Shan, Zhong Meng, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li 0028, James Qin, Xingyu Cai, Adam Stooke, Chengjian Zheng, Yanzhang He, Tara N. Sainath, Pedro Moreno Mengibar, 
Massive End-to-end Speech Recognition Models with Time Reduction.
ICASSP2023
Shuo-Yiin Chang, Chao Zhang 0031, Tara N. Sainath, Bo Li 0028, Trevor Strohman, 
Context-Aware end-to-end ASR Using Self-Attentive Embedding and Tensor Fusion.
ICASSP2023
Ke Hu, Tara N. Sainath, Bo Li 0028, Nan Du 0002, Yanping Huang, Andrew M. Dai, Yu Zhang 0033, Rodrigo Cabrera, Zhifeng Chen, Trevor Strohman, 
Massively Multilingual Shallow Fusion with Large Language Models.
ICASSP2023
Zhouyuan Huo, Khe Chai Sim, Bo Li 0028, Dongseong Hwang, Tara N. Sainath, Trevor Strohman, 
Resource-Efficient Transfer Learning from Speech Foundation Model Using Hierarchical Feature Fusion.
ICASSP2023
Bo Li 0028, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang 0033, Wei Han 0002, Trevor Strohman, Françoise Beaufays, 
Efficient Domain Adaptation for Speech Foundation Models.
ICASSP2023
Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang 0033, Bo Li 0028, Andrew Rosenberg, Bhuvana Ramabhadran, 
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition.
ICASSP2023
Chao-Han Huck Yang, Bo Li 0028, Yu Zhang 0033, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman, 
From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition.
ICASSP2023
Chao-Han Huck Yang, Bo Li 0028, Yu Zhang 0033, Nanxin Chen, Tara N. Sainath, Sabato Marco Siniscalchi, Chin-Hui Lee 0001, 
A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition.
ICASSP2023
Chao Zhang 0031, Bo Li 0028, Tara N. Sainath, Trevor Strohman, Shuo-Yiin Chang, 
UML: A Universal Monolingual Output Layer For Multilingual ASR.
Interspeech2023
Zih-Ching Chen, Chao-Han Huck Yang, Bo Li 0028, Yu Zhang 0033, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath, 
How to Estimate Model Transferability of Pre-Trained Speech Models?
Interspeech2023
Ke Hu, Bo Li 0028, Tara N. Sainath, Yu Zhang 0033, Françoise Beaufays, 
Mixture-of-Expert Conformer for Streaming Multilingual ASR.
Interspeech2023
Qiujia Li, Bo Li 0028, Dongseong Hwang, Tara N. Sainath, Pedro Moreno Mengibar, 
Modular Domain Adaptation for Conformer-Based Streaming ASR.
ICASSP2022
Junwen Bai, Bo Li 0028, Yu Zhang 0033, Ankur Bapna, Nikhil Siddhartha, Khe Chai Sim, Tara N. Sainath, 
Joint Unsupervised and Supervised Training for Multilingual ASR.
ICASSP2022
Bo Li 0028, Ruoming Pang, Yu Zhang 0033, Tara N. Sainath, Trevor Strohman, Parisa Haghani, Yun Zhu, Brian Farris, Neeraj Gaur, Manasa Prasad, 
Massively Multilingual ASR: A Lifelong Learning Solution.
ICASSP2022
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang 0001, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.
ICASSP2022
Chao Zhang 0031, Bo Li 0028, Zhiyun Lu, Tara N. Sainath, Shuo-Yiin Chang, 
Improving the Fusion of Acoustic and Text Representations in RNN-T.
Interspeech2022
Shuo-Yiin Chang, Bo Li 0028, Tara N. Sainath, Chao Zhang 0031, Trevor Strohman, Qiao Liang 0001, Yanzhang He, 
Turn-Taking Prediction for Natural Conversational Speech.
ICML2024
Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan 0003, Detai Xin, Dongchao Yang, Eric Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu 0001, Tao Qin 0001, Xiangyang Li 0001, Wei Ye 0004, Shikun Zhang, Jiang Bian 0002, Lei He 0005, Jinyu Li 0001, Sheng Zhao, 
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
ICML2024
Dongchao Yang, Jinchuan Tian, Xu Tan 0003, Rongjie Huang, Songxiang Liu, Haohan Guo, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian 0002, Zhou Zhao, Xixin Wu, Helen M. Meng, 
UniAudio: Towards Universal Audio Generation with Large Language Models.
ICLR2024
Yichong Leng, Zhifang Guo, Kai Shen, Zeqian Ju, Xu Tan 0003, Eric Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He 0005, Xiangyang Li 0001, Sheng Zhao, Tao Qin 0001, Jiang Bian 0002, 
PromptTTS 2: Describing and Generating Voices with Text Prompt.
ICLR2024
Kai Shen, Zeqian Ju, Xu Tan 0003, Eric Liu, Yichong Leng, Lei He 0005, Tao Qin 0001, Sheng Zhao, Jiang Bian 0002, 
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers.
ICASSP2023
Zhifang Guo, Yichong Leng, Yihan Wu, Sheng Zhao, Xu Tan 0003, 
PromptTTS: Controllable Text-To-Speech With Text Descriptions.
ICASSP2023
Xiaoqiang Wang 0006, Yanqing Liu, Jinyu Li 0001, Sheng Zhao, 
Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation.
ICASSP2023
Chen Zhang 0020, Shubham Bansal, Aakash Lakhera, Jinzhu Li, Gang Wang 0001, Sandeepkumar Satpal, Sheng Zhao, Lei He 0005, 
LeanSpeech: The Microsoft Lightweight Speech Synthesis System for Limmits Challenge 2023.
Interspeech2023
Brendan Walsh, Mark Hamilton, Greg Newby, Xi Wang 0016, Serena Ruan, Sheng Zhao, Lei He 0005, Shaofei Zhang, Eric Dettinger, William T. Freeman, Markus Weimer, 
Large-Scale Automatic Audiobook Creation.
Interspeech2023
Yujia Xiao, Shaofei Zhang, Xi Wang 0016, Xu Tan 0003, Lei He 0005, Sheng Zhao, Frank K. Soong, Tan Lee 0001, 
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading.
NeurIPS2023
Yuancheng Wang, Zeqian Ju, Xu Tan 0003, Lei He 0005, Zhizheng Wu 0001, Jiang Bian 0002, Sheng Zhao, 
AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models.
AAAI2023
Yihan Wu, Junliang Guo, Xu Tan 0003, Chen Zhang 0020, Bohan Li 0003, Ruihua Song, Lei He 0005, Sheng Zhao, Arul Menezes, Jiang Bian 0002, 
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing.
TASLP2022
Xiaoqiang Wang 0006, Yanqing Liu, Jinyu Li 0001, Veljko Miljanic, Sheng Zhao, Hosam Khalil, 
Towards Contextual Spelling Correction for Customization of End-to-End Speech Recognition Systems.
ICASSP2022
Zehua Chen, Xu Tan 0003, Ke Wang, Shifeng Pan, Danilo P. Mandic, Lei He 0005, Sheng Zhao, 
InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training.
ICASSP2022
Liyang Chen, Zhiyong Wu 0001, Jun Ling, Runnan Li, Xu Tan 0003, Sheng Zhao, 
Transformer-S2A: Robust and Efficient Speech-to-Animation.
ICASSP2022
Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan 0003, Sheng Zhao, Tan Lee 0001, 
A Study on the Efficacy of Model Pre-Training In Developing Neural Text-to-Speech System.
Interspeech2022
Yanqing Liu, Ruiqing Xue, Lei He 0005, Xu Tan 0003, Sheng Zhao, 
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders.
Interspeech2022
Yihan Wu, Xu Tan 0003, Bohan Li 0003, Lei He 0005, Sheng Zhao, Ruihua Song, Tao Qin 0001, Tie-Yan Liu, 
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios.
Interspeech2022
Dacheng Yin, Chuanxin Tang, Yanqing Liu, Xiaoqiang Wang 0006, Zhiyuan Zhao, Yucheng Zhao, Zhiwei Xiong, Sheng Zhao, Chong Luo, 
RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion.
Interspeech2022
Guangyan Zhang, Kaitao Song, Xu Tan 0003, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang 0001, Wei Zhou, Tao Qin 0001, Tan Lee 0001, Sheng Zhao, 
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.
NeurIPS2022
Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen 0008, Xu Tan 0003, Danilo P. Mandic, Lei He 0005, Xiangyang Li 0001, Tao Qin 0001, Sheng Zhao, Tie-Yan Liu, 
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis.
TASLP2024
Mingyang Zhang 0003, Yi Zhou 0020, Yi Ren 0006, Chen Zhang 0020, Xiang Yin 0006, Haizhou Li 0001, 
RefXVC: Cross-Lingual Voice Conversion With Enhanced Reference Leveraging.
ICLR2024
Ziyue Jiang 0001, Jinglin Liu, Yi Ren 0006, Jinzheng He, Zhenhui Ye, Shengpeng Ji, Qian Yang, Chen Zhang 0020, Pengfei Wei 0001, Chunfeng Wang, Xiang Yin 0006, Zejun Ma, Zhou Zhao, 
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis.
AAAI2024
Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren 0006, Yuexian Zou, Zhou Zhao, Shinji Watanabe 0001, 
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
AAAI2024
Rui Liu 0008, Yifan Hu, Yi Ren 0006, Xiang Yin 0006, Haizhou Li 0001, 
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling.
ICASSP2023
Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen 0003, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren 0006, Zhou Zhao, 
Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG).
ICASSP2023
Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen 0003, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren 0006, Zhou Zhao, 
MUG: A General Meeting Understanding and Generation Benchmark.
Interspeech2023
Yahuan Cong, Haoyu Zhang, Haopeng Lin, Shichao Liu 0003, Chunfeng Wang, Yi Ren 0006, Xiang Yin 0006, Zejun Ma, 
GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech.
Interspeech2023
Kun Song, Yi Ren 0006, Yi Lei, Chunfeng Wang, Kun Wei, Lei Xie 0001, Xiang Yin 0006, Zejun Ma, 
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation.
ICML2023
Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren 0006, Luping Liu, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin 0006, Zhou Zhao, 
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models.
ICLR2023
Yi Ren 0006, Chen Zhang 0020, Shuicheng Yan, 
Bag of Tricks for Unsupervised Text-to-Speech.
ICLR2023
Rongjie Huang, Jinglin Liu, Huadai Liu, Yi Ren 0006, Lichao Zhang, Jinzheng He, Zhou Zhao, 
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation.
ICLR2023
Zhenhui Ye, Ziyue Jiang 0001, Yi Ren 0006, Jinglin Liu, Jinzheng He, Zhou Zhao, 
GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis.
ACL2023
Rongjie Huang, Huadai Liu, Xize Cheng, Yi Ren 0006, Linjun Li, Zhenhui Ye, Jinzheng He, Lichao Zhang, Jinglin Liu, Xiang Yin 0006, Zhou Zhao, 
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation.
ACL2023
Zhenhui Ye, Rongjie Huang, Yi Ren 0006, Ziyue Jiang 0001, Jinglin Liu, Jinzheng He, Xiang Yin 0006, Zhou Zhao, 
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training.
ACL-Findings2023
Rongjie Huang, Yi Ren 0006, Ziyue Jiang 0001, Chenye Cui, Jinglin Liu, Zhou Zhao, 
FastDiff 2: Revisiting and Incorporating GANs and Diffusion Models in High-Fidelity Speech Synthesis.
ACL-Findings2023
Rongjie Huang, Chunlei Zhang, Yi Ren 0006, Zhou Zhao, Dong Yu 0001, 
Prosody-TTS: Improving Prosody with Masked Autoencoder and Conditional Diffusion Model For Expressive Text-to-Speech.
ACL-Findings2023
Ziyue Jiang 0001, Qian Yang, Jialong Zuo, Zhenhui Ye, Rongjie Huang, Yi Ren 0006, Zhou Zhao, 
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models.
ICASSP2022
Yi Ren 0006, Ming Lei, Zhiying Huang, Shiliang Zhang, Qian Chen 0003, Zhijie Yan, Zhou Zhao, 
ProsoSpeech: Enhancing Prosody with Quantized Vector Pre-Training in Text-To-Speech.
ICASSP2022
Lichao Zhang, Yi Ren 0006, Liqun Deng, Zhou Zhao, 
HiFiDenoise: High-Fidelity Denoising Text to Speech with Adversarial Networks.
NeurIPS2022
Rongjie Huang, Yi Ren 0006, Jinglin Liu, Chenye Cui, Zhou Zhao, 
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech.
ICASSP2024
David M. Chan, Shalini Ghosh, Hitesh Tulsiani, Ariya Rastrow, Björn Hoffmeister, 
Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition.
ICASSP2024
Kevin Everson, Yile Gu, Chao-Han Huck Yang, Prashanth Gurunath Shivakumar, Guan-Ting Lin, Jari Kolehmainen, Ivan Bulyko, Ankur Gandhe, Shalini Ghosh, Wael Hamza, Hung-Yi Lee, Ariya Rastrow, Andreas Stolcke, 
Towards ASR Robust Spoken Language Understanding Through in-Context Learning with Word Confusion Networks.
ICASSP2024
Rupak Vignesh Swaminathan, Grant P. Strimel, Ariya Rastrow, Sri Harish Mallidi, Kai Zhen, Hieu Duy Nguyen, Nathan Susanj, Athanasios Mouchtaris, 
Max-Margin Transducer Loss: Improving Sequence-Discriminative Training Using a Large-Margin Learning Strategy.
ICML2024
Hitesh Tulsiani, David M. Chan, Shalini Ghosh, Garima Lalwani, Prabhat Pandey, Ankish Bansal, Sri Garimella, Ariya Rastrow, Björn Hoffmeister, 
An Efficient Self-Learning Framework For Interactive Spoken Dialog Systems.
ACL-Findings2024
Aditya Gourav, Jari Kolehmainen, Prashanth Gurunath Shivakumar, Yile Gu, Grant P. Strimel, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko, 
Multi-Modal Retrieval For Large Language Model Based Speech Recognition.
ICASSP2023
Anastasios Alexandridis, Kanthashree Mysore Sathyendra, Grant P. Strimel, Feng-Ju Chang, Ariya Rastrow, Nathan Susanj, Athanasios Mouchtaris, 
Gated Contextual Adapters For Selective Contextual Biasing In Neural Transducers.
ICASSP2023
David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister, 
Domain Adaptation with External Off-Policy Acoustic Catalogs for Scalable Contextual End-to-End Automated Speech Recognition.
ICASSP2023
Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant P. Strimel, Andreas Stolcke, Ivan Bulyko, 
PROCTER: Pronunciation-Aware Contextual Adapter For Personalized Speech Recognition In Neural Transducers.
ICASSP2023
Milind Rao, Gopinath Chennupati, Gautam Tiwari, Anit Kumar Sahu, Anirudh Raju, Ariya Rastrow, Jasha Droppo, 
Federated Self-Learning with Weak Supervision for Speech Recognition.
ICASSP2023
Saumya Y. Sahai, Jing Liu, Thejaswi Muniyappa, Kanthashree Mysore Sathyendra, Anastasios Alexandridis, Grant P. Strimel, Ross McGowan, Ariya Rastrow, Feng-Ju Chang, Athanasios Mouchtaris, Siegfried Kunzmann, 
Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition.
Interspeech2023
Denis Filimonov, Prabhat Pandey, Ariya Rastrow, Ankur Gandhe, Andreas Stolcke, 
Streaming Speech-to-Confusion Network Speech Recognition.
Interspeech2023
Yile Gu, Prashanth Gurunath Shivakumar, Jari Kolehmainen, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko, 
Scaling Laws for Discriminative Speech Recognition Rescoring Models.
Interspeech2023
Jari Kolehmainen, Yile Gu, Aditya Gourav, Prashanth Gurunath Shivakumar, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko, 
Personalization for BERT-based Discriminative Speech Recognition Rescoring.
Interspeech2023
Andreas Schwarz, Di He 0004, Maarten Van Segbroeck, Mohammed Hethnawi, Ariya Rastrow, 
Personalized Predictive ASR for Latency Reduction in Voice Assistants.
Interspeech2023
Prashanth Gurunath Shivakumar, Jari Kolehmainen, Yile Gu, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko, 
Distillation Strategies for Discriminative Speech Recognition Rescoring.
ICASSP2022
Anastasios Alexandridis, Grant P. Strimel, Ariya Rastrow, Pavel Kveton, Jon Webb, Maurizio Omologo, Siegfried Kunzmann, Athanasios Mouchtaris, 
Caching Networks: Capitalizing on Common Speech for ASR.
ICASSP2022
Liyan Xu, Yile Gu, Jari Kolehmainen, Haidar Khan, Ankur Gandhe, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko, 
RescoreBERT: Discriminative Speech Recognition Rescoring With BERT.
Interspeech2022
Phani Sankar Nidadavolu, Na Xu, Nick Jutila, Ravi Teja Gadde, Aswarth Abhilash Dara, Joseph Savold, Sapan Patel, Aaron Hoff, Veerdhawal Pande, Kevin Crews, Ankur Gandhe, Ariya Rastrow, Roland Maas, 
RefTextLAS: Reference Text Biased Listen, Attend, and Spell Model For Accurate Reading Evaluation.
Interspeech2022
Anirudh Raju, Milind Rao, Gautam Tiwari, Pranav Dheram, Bryan Anderson, Zhe Zhang, Chul Lee, Bach Bui, Ariya Rastrow, 
On joint training with interfaces for spoken language understanding.
Interspeech2022
Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian John King, Ariya Rastrow, Athanasios Mouchtaris, Grant P. Strimel, 
Compute Cost Amortized Transformer for Streaming ASR.
SpeechComm2024
Wei-Cheng Lin, Carlos Busso, 
Deep temporal clustering features for speech emotion recognition.
TASLP2024
Seong-Gyun Leem, Daniel Fulford, Jukka-Pekka Onnela, David Gard, Carlos Busso, 
Selective Acoustic Feature Enhancement for Speech Emotion Recognition With Noisy Speech.
ICASSP2024
Luz Martinez-Lucas, Carlos Busso, 
Dynamic Speech Emotion Recognition Using A Conditional Neural Process.
ICASSP2024
Abinay Reddy Naini, Mary A. Kohler, Elizabeth Richerson, Donita Robinson, Carlos Busso, 
Generalization of Self-Supervised Learning-Based Representations for Cross-Domain Speech Emotion Recognition.
ICASSP2024
Ismail Rasim Ulgen, Zongyang Du, Carlos Busso, Berrak Sisman, 
Revealing Emotional Clusters in Speaker Embeddings: A Contrastive Learning Strategy for Speech Emotion Recognition.
SpeechComm2023
Andrea Vidal, Carlos Busso, 
Multimodal attention for lip synthesis using conditional generative adversarial networks.
TASLP2023
Wei-Cheng Lin, Carlos Busso, 
Sequential Modeling by Leveraging Non-Uniform Distribution of Speech Emotion.
ICASSP2023
Lucas Goncalves, Carlos Busso, 
Learning Cross-Modal Audiovisual Representations with Ladder Networks for Emotion Recognition.
ICASSP2023
Seong-Gyun Leem, Daniel Fulford, Jukka-Pekka Onnela, David Gard, Carlos Busso, 
Adapting a Self-Supervised Speech Representation for Noisy Speech Emotion Recognition by Using Contrastive Teacher-Student Learning.
ICASSP2023
Wei-Cheng Lin, Carlos Busso, 
Role of Lexical Boundary Information in Chunk-Level Segmentation for Speech Emotion Recognition.
ICASSP2023
Abinay Reddy Naini, Mary A. Kohler, Carlos Busso, 
Unsupervised Domain Adaptation for Preference Learning Based Speech Emotion Recognition.
ICASSP2023
Shreya G. Upadhyay, Luz Martinez-Lucas, Bo-Hao Su, Wei-Cheng Lin, Woan-Shiuan Chien, Ya-Tse Wu, William Katz, Carlos Busso, Chi-Chun Lee, 
Phonetic Anchor-Based Transfer Learning to Facilitate Unsupervised Cross-Lingual Speech Emotion Recognition.
Interspeech2023
Huang-Cheng Chou, Lucas Goncalves, Seong-Gyun Leem, Chi-Chun Lee, Carlos Busso, 
The Importance of Calibration: Rethinking Confidence and Performance of Speech Multi-label Emotion Classifiers.
Interspeech2023
Nicolás Grágeda, Eduardo Alvarado, Rodrigo Mahú, Carlos Busso, Néstor Becerra Yoma, 
Distant Speech Emotion Recognition in an Indoor Human-robot Interaction Scenario.
Interspeech2023
Seong-Gyun Leem, Daniel Fulford, Jukka-Pekka Onnela, David Gard, Carlos Busso, 
Computation and Memory Efficient Noise Adaptation of Wav2Vec2.0 for Noisy Speech Emotion Recognition with Skip Connection Adapters.
Interspeech2023
Abinay Reddy Naini, Ali N. Salman, Carlos Busso, 
Preference Learning Labels by Anchoring on Consecutive Annotations.
ICASSP2022
Huang-Cheng Chou, Wei-Cheng Lin, Chi-Chun Lee, Carlos Busso, 
Exploiting Annotators' Typed Description of Emotion Perception to Maximize Utilization of Ratings for Speech Emotion Recognition.
ICASSP2022
Lucas Goncalves, Carlos Busso, 
AuxFormer: Robust Approach to Audiovisual Emotion Recognition.
ICASSP2022
Seong-Gyun Leem, Daniel Fulford, Jukka-Pekka Onnela, David Gard, Carlos Busso, 
Not All Features are Equal: Selection of Robust Features for Speech Emotion Recognition in Noisy Environments.
Interspeech2022
Huang-Cheng Chou, Chi-Chun Lee, Carlos Busso, 
Exploiting Co-occurrence Frequency of Emotions in Perceptual Evaluations To Train A Speech Emotion Classifier.
ICASSP2023
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, 
Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models.
ICASSP2023
Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix, Ryo Masumura, 
Leveraging Large Text Corpora For End-To-End Speech Summarization.
ICASSP2023
Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, 
Improving Scheduled Sampling for Neural Transducer-Based ASR.
ICASSP2023
Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Hiroshi Sato, Taiga Yamane, Takanori Ashihara, Kohei Matsuura, Takafumi Moriya, 
Leveraging Language Embeddings for Cross-Lingual Self-Supervised Speech Representation Learning.
Interspeech2023
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix, Yukinori Honma, 
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Interspeech2023
Nobukatsu Hojo, Saki Mizuno, Satoshi Kobashikawa, Ryo Masumura, Mana Ihori, Hiroshi Sato, Tomohiro Tanaka, 
Audio-Visual Praise Estimation for Conversational Video based on Synchronization-Guided Multimodal Transformer.
Interspeech2023
Mana Ihori, Hiroshi Sato, Tomohiro Tanaka, Ryo Masumura, Saki Mizuno, Nobukatsu Hojo, 
Transcribing Speech as Spoken and Written Dual Text Using an Autoregressive Model.
Interspeech2023
Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando, 
End-to-End Joint Target and Non-Target Speakers ASR.
Interspeech2023
Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, 
Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization.
Interspeech2023
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami, 
Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data.
Interspeech2023
Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo, 
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss.
ICASSP2022
Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix, Takahiro Shinozaki, 
Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.
Interspeech2022
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, 
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models.
Interspeech2022
Ryo Masumura, Yoshihiro Yamazaki, Saki Mizuno, Naoki Makishima, Mana Ihori, Mihiro Uchida, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Shota Orihashi, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando, 
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.
Interspeech2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.
Interspeech2022
Tomohiro Tanaka, Ryo Masumura, Hiroshi Sato, Mana Ihori, Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, 
Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks.
ICASSP2021
Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura, 
MAPGN: Masked Pointer-Generator Network for Sequence-to-Sequence Pre-Training.
ICASSP2021
Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, Ryo Masumura, 
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss.
ICASSP2021
Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, 
Hierarchical Transformer-Based Large-Context End-To-End ASR with Large-Context Knowledge Distillation.
ICASSP2021
Takafumi Moriya, Takanori Ashihara, Tomohiro Tanaka, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Yusuke Ijima, Ryo Masumura, Yusuke Shinohara, 
SimpleFlat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.
ICML2024
Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan 0003, Detai Xin, Dongchao Yang, Eric Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu 0001, Tao Qin 0001, Xiangyang Li 0001, Wei Ye 0004, Shikun Zhang, Jiang Bian 0002, Lei He 0005, Jinyu Li 0001, Sheng Zhao, 
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
ICML2024
Dongchao Yang, Jinchuan Tian, Xu Tan 0003, Rongjie Huang, Songxiang Liu, Haohan Guo, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian 0002, Zhou Zhao, Xixin Wu, Helen M. Meng, 
UniAudio: Towards Universal Audio Generation with Large Language Models.
ICLR2024
Yichong Leng, Zhifang Guo, Kai Shen, Zeqian Ju, Xu Tan 0003, Eric Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He 0005, Xiangyang Li 0001, Sheng Zhao, Tao Qin 0001, Jiang Bian 0002, 
PromptTTS 2: Describing and Generating Voices with Text Prompt.
ICLR2024
Kai Shen, Zeqian Ju, Xu Tan 0003, Eric Liu, Yichong Leng, Lei He 0005, Tao Qin 0001, Sheng Zhao, Jiang Bian 0002, 
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers.
ICASSP2023
Zhifang Guo, Yichong Leng, Yihan Wu, Sheng Zhao, Xu Tan 0003, 
PromptTTS: Controllable Text-To-Speech With Text Descriptions.
Interspeech2023
Yujia Xiao, Shaofei Zhang, Xi Wang 0016, Xu Tan 0003, Lei He 0005, Sheng Zhao, Frank K. Soong, Tan Lee 0001, 
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading.
NeurIPS2023
Yuancheng Wang, Zeqian Ju, Xu Tan 0003, Lei He 0005, Zhizheng Wu 0001, Jiang Bian 0002, Sheng Zhao, 
AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models.
AAAI2023
Yichong Leng, Xu Tan 0003, Wenjie Liu, Kaitao Song, Rui Wang 0028, Xiang-Yang Li 0001, Tao Qin 0001, Edward Lin, Tie-Yan Liu, 
SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition.
AAAI2023
Yihan Wu, Junliang Guo, Xu Tan 0003, Chen Zhang 0020, Bohan Li 0003, Ruihua Song, Lei He 0005, Sheng Zhao, Arul Menezes, Jiang Bian 0002, 
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing.
ICASSP2022
Zehua Chen, Xu Tan 0003, Ke Wang, Shifeng Pan, Danilo P. Mandic, Lei He 0005, Sheng Zhao, 
InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training.
ICASSP2022
Liyang Chen, Zhiyong Wu 0001, Jun Ling, Runnan Li, Xu Tan 0003, Sheng Zhao, 
Transformer-S2A: Robust and Efficient Speech-to-Animation.
ICASSP2022
Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan 0003, Sheng Zhao, Tan Lee 0001, 
A Study on the Efficacy of Model Pre-Training In Developing Neural Text-to-Speech System.
Interspeech2022
Yanqing Liu, Ruiqing Xue, Lei He 0005, Xu Tan 0003, Sheng Zhao, 
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders.
Interspeech2022
Yihan Wu, Xu Tan 0003, Bohan Li 0003, Lei He 0005, Sheng Zhao, Ruihua Song, Tao Qin 0001, Tie-Yan Liu, 
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios.
Interspeech2022
Guangyan Zhang, Kaitao Song, Xu Tan 0003, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang 0001, Wei Zhou, Tao Qin 0001, Tan Lee 0001, Sheng Zhao, 
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.
NeurIPS2022
Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen 0008, Xu Tan 0003, Danilo P. Mandic, Lei He 0005, Xiangyang Li 0001, Tao Qin 0001, Sheng Zhao, Tie-Yan Liu, 
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis.
ACL2022
Yi Ren 0006, Xu Tan 0003, Tao Qin 0001, Zhou Zhao, Tie-Yan Liu, 
Revisiting Over-Smoothness in Text to Speech.
ICASSP2021
Yichong Leng, Xu Tan 0003, Sheng Zhao, Frank K. Soong, Xiang-Yang Li 0001, Tao Qin 0001, 
MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network.
ICASSP2021
Renqian Luo, Xu Tan 0003, Rui Wang 0028, Tao Qin 0001, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu, 
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search.
ICASSP2021
Linghui Meng 0001, Jin Xu 0010, Xu Tan 0003, Jindong Wang 0001, Tao Qin 0001, Bo Xu 0002, 
MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition.
ICASSP2024
Ryandhimas E. Zezario, Bo-Ren Brian Bai, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao 0001, 
Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model.
TASLP2023
Qian-Bei Hong, Chung-Hsien Wu 0001, Hsin-Min Wang, 
Generalization Ability Improvement of Speaker Representation and Anti-Interference for Speaker Verification.
TASLP2023
Qian-Bei Hong, Chung-Hsien Wu 0001, Hsin-Min Wang, 
Decomposition and Reorganization of Phonetic Information for Speaker Embedding Learning.
TASLP2023
Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen 0011, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao 0001, 
Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features.
Interspeech2023
Hsin-Hao Chen 0006, Yung-Lun Chien, Ming-Chi Yen, Shu-Wei Tsai, Tai-Shih Chi, Hsin-Min Wang, Yu Tsao 0001, 
Mandarin Electrolaryngeal Speech Voice Conversion using Cross-domain Features.
Interspeech2023
Li-Wei Chen, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao 0001, Hsin-Min Wang, 
A Training and Inference Strategy Using Noisy and Enhanced Speech as Target for Speech Enhancement without Clean Speech.
Interspeech2023
Yung-Lun Chien, Hsin-Hao Chen 0006, Ming-Chi Yen, Shu-Wei Tsai, Hsin-Min Wang, Yu Tsao 0001, Tai-Shih Chi, 
Audio-Visual Mandarin Electrolaryngeal Speech Voice Conversion.
ICLR2023
Chi-Chang Lee, Yu Tsao 0001, Hsin-Min Wang, Chu-Song Chen, 
D4AM: A General Denoising Framework for Downstream Acoustic Models.
TASLP2022
Shang-Yi Chuang, Hsin-Min Wang, Yu Tsao 0001, 
Improved Lite Audio-Visual Speech Enhancement.
ICASSP2022
Kuan-Chen Wang, Kai-Chun Liu, Hsin-Min Wang, Yu Tsao 0001, 
EMGSE: Acoustic/EMG Fusion for Multimodal Speech Enhancement.
ICASSP2022
Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao 0001, Hsin-Min Wang, Helen Meng, 
Partially Fake Audio Detection by Self-Attention-Based Fake Span Discovery.
Interspeech2022
Wen-Chin Huang, Erica Cooper, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi, 
The VoiceMOS Challenge 2022.
Interspeech2022
Hung-Shin Lee, Pin-Tuan Huang, Yao-Fei Cheng, Hsin-Min Wang, 
Chain-based Discriminative Autoencoders for Speech Recognition.
Interspeech2022
Chi-Chang Lee, Cheng-Hung Hu, Yu-Chen Lin, Chu-Song Chen, Hsin-Min Wang, Yu Tsao 0001, 
NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling.
Interspeech2022
Fan-Lin Wang, Hung-Shin Lee, Yu Tsao 0001, Hsin-Min Wang, 
Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks.
Interspeech2022
Ryandhimas Edo Zezario, Fei Chen 0011, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao 0001, 
MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids.
Interspeech2022
Ryandhimas Edo Zezario, Szu-Wei Fu, Fei Chen 0011, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao 0001, 
MTI-Net: A Multi-Target Speech Intelligibility Prediction Model.
ICASSP2021
Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, Tomoki Toda, 
Speech Recognition by Simply Fine-Tuning BERT.
ICASSP2021
Chung-En Sun, Yi-Wei Chen, Hung-Shin Lee, Yen-Hsing Chen, Hsin-Min Wang, 
Melody Harmonization Using Orderless NADE, Chord Balancing, and Blocked Gibbs Sampling.
Interspeech2021
Yao-Fei Cheng, Hung-Shin Lee, Hsin-Min Wang, 
AlloST: Low-Resource Speech Translation Without Source Transcription.
SpeechComm2024
Mittapalle Kiran Reddy, Yagnavajjula Madhu Keerthana, Paavo Alku, 
Classification of functional dysphonia using the tunable Q wavelet transform.
SpeechComm2024
Paavo Alku, Manila Kodali, Laura Laaksonen, Sudarsana Reddy Kadiri, 
AVID: A speech database for machine learning studies on vocal intensity.
SpeechComm2024
Yagnavajjula Madhu Keerthana, Mittapalle Kiran Reddy, Paavo Alku, K. Sreenivasa Rao, Pabitra Mitra, 
Automatic classification of neurological voice disorders using wavelet scattering features.
SpeechComm2024
Farhad Javanmardi, Sudarsana Reddy Kadiri, Paavo Alku, 
Pre-trained models for detection and severity level classification of dysarthria from speech.
TASLP2023
Yuanyuan Liu 0002, Mittapalle Kiran Reddy, Nelly Penttilä, Tiina Ihalainen, Paavo Alku, Okko Räsänen, 
Automatic Assessment of Parkinson's Disease Using Speech Representations of Phonation and Articulation.
TASLP2023
Mittapalle Kiran Reddy, Paavo Alku, 
Exemplar-Based Sparse Representations for Detection of Parkinson's Disease From Speech.
ICASSP2023
Farhad Javanmardi, Saska Tirronen, Manila Kodali, Sudarsana Reddy Kadiri, Paavo Alku, 
Wav2vec-Based Detection and Severity Level Classification of Dysarthria From Speech.
ICASSP2023
Manila Kodali, Sudarsana Reddy Kadiri, Laura Laaksonen, Paavo Alku, 
Automatic Classification of Vocal Intensity Category from Speech.
ICASSP2023
Saska Tirronen, Farhad Javanmardi, Manila Kodali, Sudarsana Reddy Kadiri, Paavo Alku, 
Utilizing Wav2Vec In Database-Independent Voice Disorder Detection.
Interspeech2023
Sudarsana Reddy Kadiri, Manila Kodali, Paavo Alku, 
Severity Classification of Parkinson's Disease from Speech using Single Frequency Filtering-based Features.
Interspeech2023
Manila Kodali, Sudarsana Reddy Kadiri, Paavo Alku, 
Classification of Vocal Intensity Category from Speech using the Wav2vec2 and Whisper Embeddings.
SpeechComm2022
Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Paavo Alku, Mikko Kurimo, 
A formant modification method for improved ASR of children's speech.
SpeechComm2022
Mittapalle Kiran Reddy, Hilla Pohjalainen, Pyry Helkkula, Kasimir Kaitue, Mikko Minkkinen, Heli Tolppanen, Tuomo Nieminen, Paavo Alku, 
Glottal flow characteristics in vowels produced by speakers with heart failure.
Interspeech2022
Farhad Javanmardi, Sudarsana Reddy Kadiri, Manila Kodali, Paavo Alku, 
Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers.
Interspeech2022
Sudarsana Reddy Kadiri, Farhad Javanmardi, Paavo Alku, 
Convolutional Neural Networks for Classification of Voice Qualities from Speech and Neck Surface Accelerometer Signals.
SpeechComm2021
Krishna Gurugubelli, Anil Kumar Vuppala, N. P. Narendra, Paavo Alku, 
Duration of the rhotic approximant /ɹ/ in spastic dysarthria of different severity levels.
TASLP2021
N. P. Narendra, Björn W. Schuller, Paavo Alku, 
The Detection of Parkinson's Disease From Speech Using Voice Source Information.
SpeechComm2020
Sudarsana Reddy Kadiri, Paavo Alku, B. Yegnanarayana 0001, 
Analysis and classification of phonation types in speech and singing voice.
SpeechComm2020
N. P. Narendra, Paavo Alku, 
Automatic intelligibility assessment of dysarthric speech using glottal parameters.
TASLP2020
Dhananjaya N. Gowda, Sudarsana Reddy Kadiri, Brad H. Story, Paavo Alku, 
Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals.
TASLP2024
Xiaofei Wang 0007, Manthan Thakker, Zhuo Chen 0006, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu 0001, Jinyu Li 0001, Takuya Yoshioka, 
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
ICASSP2024
Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Naoyuki Kanda, Jinyu Li 0001, Yashesh Gaur, 
Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation.
ICASSP2024
Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Midia Yousefi, Takuya Yoshioka, Jian Wu, 
Profile-Error-Tolerant Target-Speaker Voice Activity Detection.
ICASSP2024
Jian Wu 0027, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao 0017, Zhuo Chen 0006, Jinyu Li 0001, 
T-SOT FNT: Streaming Multi-Talker ASR with Text-Only Domain Adaptation Capability.
ICASSP2024
Mu Yang, Naoyuki Kanda, Xiaofei Wang 0009, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li 0001, Takuya Yoshioka, 
Diarist: Streaming Speech Translation with Speaker Diarization.
NAACL-Findings2024
Ziyi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu 0001, Dongdong Chen 0001, Yao Qian, Xuemei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao 0004, Yu Shi 0001, Lu Yuan, Takuya Yoshioka, Michael Zeng 0001, Xuedong Huang 0001, 
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data.
ICASSP2023
Zhuo Chen 0006, Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiaofei Wang 0009, Takuya Yoshioka, Jinyu Li 0001, Sunit Sivasankaran, Sefik Emre Eskimez, 
Speech Separation with Large-Scale Self-Supervised Learning.
ICASSP2023
Zili Huang, Zhuo Chen 0006, Naoyuki Kanda, Jian Wu 0027, Yiming Wang, Jinyu Li 0001, Takuya Yoshioka, Xiaofei Wang 0009, Peidong Wang, 
Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition.
ICASSP2023
Naoyuki Kanda, Jian Wu 0027, Xiaofei Wang 0009, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.
ICASSP2023
Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu 0027, 
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-To-End Neural Diarization.
ICASSP2023
Muqiao Yang, Naoyuki Kanda, Xiaofei Wang 0009, Jian Wu 0027, Sunit Sivasankaran, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Simulating Realistic Speech Overlaps Improves Multi-Talker ASR.
Interspeech2023
Naoyuki Kanda, Takuya Yoshioka, Yang Liu, 
Factual Consistency Oriented Speech Recognition.
Interspeech2023
Chenda Li, Yao Qian, Zhuo Chen 0006, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng 0001, 
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers.
Interspeech2023
Midia Yousefi, Naoyuki Kanda, Dongmei Wang, Zhuo Chen 0006, Xiaofei Wang 0009, Takuya Yoshioka, 
Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach.
ICASSP2022
Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
Interspeech2022
Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li 0001, Xie Chen 0001, Yu Wu 0012, Yifan Gong 0001, 
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.
Interspeech2022
Xiaofei Wang 0009, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka, 
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation.
Interspeech2022
Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.
ICASSP2024
Cheol Jun Cho, Abdelrahman Mohamed, Shang-Wen Li 0001, Alan W. Black, Gopala Krishna Anumanchipalli, 
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT.
ICASSP2024
Cheol Jun Cho, Abdelrahman Mohamed, Alan W. Black, Gopala Krishna Anumanchipalli, 
Self-Supervised Models of Speech Infer Universal Articulatory Kinematics.
ICASSP2023
Jiachen Lian, Alan W. Black, Yijing Lu, Louis Goldstein, Shinji Watanabe 0001, Gopala Krishna Anumanchipalli, 
Articulatory Representation Learning via Joint Factor Analysis and Neural Matrix Factorization.
ICASSP2023
Yisi Liu, Peter Wu, Alan W. Black, Gopala Krishna Anumanchipalli, 
A Fast and Accurate Pitch Estimation Algorithm Based on the Pseudo Wigner-Ville Distribution.
ICASSP2023
Peter Wu, Li-Wei Chen, Cheol Jun Cho, Shinji Watanabe 0001, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli, 
Speaker-Independent Acoustic-to-Articulatory Speech Inversion.
Interspeech2023
Peter Wu, Tingle Li, Yijing Lu, Yubin Zhang, Jiachen Lian, Alan W. Black, Louis Goldstein, Shinji Watanabe 0001, Gopala Krishna Anumanchipalli, 
Deep Speech Synthesis from MRI-Based Articulatory Representations.
ICASSP2022
Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan 0003, Brian Yan, Ngoc Thang Vu, Alan W. Black, Shinji Watanabe 0001, 
ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet.
ICASSP2022
Roshan Sharma, Shruti Palaskar, Alan W. Black, Florian Metze, 
End-to-End Speech Summarization Using Restricted Self-Attention.
Interspeech2022
Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan W. Black, Shinji Watanabe 0001, 
Two-Pass Low Latency End-to-End Spoken Language Understanding.
Interspeech2022
Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe 0001, 
ASR2K: Speech Recognition for Around 2000 Languages without Audio.
Interspeech2022
Jiachen Lian, Alan W. Black, Louis Goldstein, Gopala Krishna Anumanchipalli, 
Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition.
Interspeech2022
Perez Ogayo, Graham Neubig, Alan W. Black, 
Building African Voices.
Interspeech2022
Peter Wu, Shinji Watanabe 0001, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli, 
Deep Speech Synthesis from Articulatory Representations.
Interspeech2022
Hemant Yadav, Akshat Gupta, Sai Krishna Rallabandi, Alan W. Black, Rajiv Ratn Shah, 
Intent classification using pre-trained language agnostic embeddings for low resource languages.
EMNLP-Findings2022
Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W. Black, Shinji Watanabe 0001, 
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models.
ICASSP2021
Akshat Gupta, Xinjian Li, Sai Krishna Rallabandi, Alan W. Black, 
Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages.
ICASSP2021
Xinjian Li, Juncheng Li 0001, Jiali Yao, Alan W. Black, Florian Metze, 
Phone Distribution Estimation for Low Resource Languages.
ICASSP2021
Xinjian Li, David R. Mortensen, Florian Metze, Alan W. Black, 
Multilingual Phonetic Dataset for Low Resource Speech Recognition.
Interspeech2021
Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan 0002, Siddharth Dalmia, Florian Metze, Shinji Watanabe 0001, Alan W. Black, 
Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding.
Interspeech2021
Xinjian Li, Juncheng Li 0001, Florian Metze, Alan W. Black, 
Hierarchical Phone Recognition with Compositional Phonetics.
ICASSP2024
Qian Chen 0003, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang 0003, 
Loss Masking Is Not Needed In Decoder-Only Transformer For Discrete-Token-Based ASR.
ICASSP2024
Zhihao Du, Shiliang Zhang, Kai Hu, Siqi Zheng, 
FunCodec: A Fundamental, Reproducible and Integrable Open-Source Toolkit for Neural Speech Codec.
ICASSP2024
Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen 0003, Shiliang Zhang, Xie Chen 0001, 
Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition.
ICASSP2024
Xian Shi, Yexin Yang, Zerui Li, Yanni Chen, Zhifu Gao, Shiliang Zhang, 
SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability.
ICASSP2024
Haoxu Wang, Fan Yu, Xian Shi, Yuezhang Wang, Shiliang Zhang, Ming Li, 
SlideSpeech: A Large Scale Slide-Enriched Audio-Visual Corpus.
ICASSP2024
Fan Yu, Haoxu Wang, Ziyang Ma, Shiliang Zhang, 
Hourglass-AVSR: Down-Up Sampling-Based Computational Efficiency Model for Audio-Visual Speech Recognition.
ICASSP2024
Fan Yu, Haoxu Wang, Xian Shi, Shiliang Zhang, 
LCB-Net: Long-Context Biasing for Audio-Visual Speech Recognition.
ACL-Findings2024
Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, Jinchao Li, Zhifu Gao, Shiliang Zhang, Xie Chen 0001, 
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation.
ICASSP2023
Haoyu Lu, Nan Li, Tongtong Song, Longbiao Wang, Jianwu Dang 0001, Xiaobao Wang, Shiliang Zhang, 
Speech and Noise Dual-Stream Spectrogram Refine Network With Speech Distortion Loss For Robust Speech Recognition.
ICASSP2023
Jiaming Wang, Zhihao Du, Shiliang Zhang, 
TOLD: a Novel Two-Stage Overlap-Aware Framework for Speaker Diarization.
Interspeech2023
Keyu An, Xian Shi, Shiliang Zhang, 
BAT: Boundary aware transducer for memory-efficient and low-latency ASR.
Interspeech2023
Zhifu Gao, Zerui Li, Jiaming Wang, Haoneng Luo, Xian Shi, Mengzhe Chen, Yabin Li, Lingyun Zuo, Zhihao Du, Shiliang Zhang, 
FunASR: A Fundamental End-to-End Speech Recognition Toolkit.
Interspeech2023
Yue Gu, Zhihao Du, Shiliang Zhang, Qian Chen 0003, Jiqing Han 0001, 
Personality-aware Training based Speaker Adaptation for End-to-end Speech Recognition.
Interspeech2023
Junjie Li, Meng Ge, Zexu Pan, Rui Cao, Longbiao Wang, Jianwu Dang 0001, Shiliang Zhang, 
Rethinking the Visual Cues in Audio-Visual Speaker Extraction.
Interspeech2023
Yuhao Liang, Fan Yu, Yangze Li, Pengcheng Guo, Shiliang Zhang, Qian Chen 0003, Lei Xie 0001, 
BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR.
Interspeech2023
Mohan Shi, Zhihao Du, Qian Chen 0003, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang 0042, Li-Rong Dai 0001, 
CASA-ASR: Context-Aware Speaker-Attributed ASR.
Interspeech2023
Xian Shi, Haoneng Luo, Zhifu Gao, Shiliang Zhang, Zhijie Yan, 
Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System.
Interspeech2023
Mohan Shi, Yuchun Shu, Lingyun Zuo, Qian Chen 0003, Shiliang Zhang, Jie Zhang 0042, Li-Rong Dai 0001, 
Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction.
Interspeech2023
Xiaohuan Zhou, Jiaming Wang, Zeyu Cui, Shiliang Zhang, Zhijie Yan, Jingren Zhou, Chang Zhou, 
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for speech recognition.
ICASSP2022
Yi Ren 0006, Ming Lei, Zhiying Huang, Shiliang Zhang, Qian Chen 0003, Zhijie Yan, Zhou Zhao, 
ProsoSpeech: Enhancing Prosody with Quantized Vector Pre-Training in Text-To-Speech.
TASLP2024
Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li 0001, Abdelrahman Mohamed, Shinji Watanabe 0001, Hung-yi Lee, 
A Large-Scale Evaluation of Speech Foundation Models.
ICASSP2024
Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-Weon Jung, Yichen Lu, Soumi Maiti, Roshan S. Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe 0001, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang, 
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.
ICASSP2024
Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, Shinji Watanabe 0001, 
HuBERTopic: Enhancing Semantic Representation of HuBERT Through Self-Supervision Utilizing Topic Model.
ICASSP2024
Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-Weon Jung, Xuankai Chang, Shinji Watanabe 0001, 
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks.
ICASSP2024
Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, Shinji Watanabe 0001, 
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing.
ICML2024
Dongchao Yang, Jinchuan Tian, Xu Tan 0003, Rongjie Huang, Songxiang Liu, Haohan Guo, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian 0002, Zhou Zhao, Xixin Wu, Helen M. Meng, 
UniAudio: Towards Universal Audio Generation with Large Language Models.
AAAI2024
Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren 0006, Yuexian Zou, Zhou Zhao, Shinji Watanabe 0001, 
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
ACL2024
Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Jinchuan Tian, Zhenhui Ye, Luping Liu, Zehan Wang 0001, Ziyue Jiang 0001, Xuankai Chang, Jiatong Shi, Chao Weng, Zhou Zhao, Dong Yu 0001, 
Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.
ICASSP2023
Junwei Huang, Karthik Ganesan 0003, Soumi Maiti, Young Min Kim, Xuankai Chang, Paul Liang, Shinji Watanabe 0001, 
FindAdaptNet: Find and Insert Adapters by Learned Layer Importance.
ICASSP2023
Takashi Maekaku, Yuya Fujita, Xuankai Chang, Shinji Watanabe 0001, 
Fully Unsupervised Topic Clustering of Unlabelled Spoken Audio Using Self-Supervised Representation Learning and Topic Model.
Interspeech2023
Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, Shinji Watanabe 0001, 
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning.
Interspeech2023
William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe 0001, 
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute.
Interspeech2023
Hakan Erdogan, Scott Wisdom, Xuankai Chang, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey, 
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition.
Interspeech2023
Jiatong Shi, Dan Berrebbi, William Chen, En-Pei Hu, Wei-Ping Huang, Ho-Lam Chung, Xuankai Chang, Shang-Wen Li 0001, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe 0001, 
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark.
Interspeech2023
Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe 0001, Brian MacWhinney, 
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning.
ICASSP2022
Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan 0003, Brian Yan, Ngoc Thang Vu, Alan W. Black, Shinji Watanabe 0001, 
ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet.
ICASSP2022
Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, 
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR.
ICASSP2022
Takashi Maekaku, Xuankai Chang, Yuya Fujita, Shinji Watanabe 0001, 
An Exploration of HuBERT with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion.
ICASSP2022
Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, Shinji Watanabe 0001, 
Joint Speech Recognition and Audio Captioning.
Interspeech2022
Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan W. Black, Shinji Watanabe 0001, 
Two-Pass Low Latency End-to-End Spoken Language Understanding.
ICASSP2024
Jian-Tao Zhang, Yan Song 0001, Jin Li, Wu Guo, Hao-Yu Song, Ian McLoughlin 0001, 
Meta Representation Learning Method for Robust Speaker Verification in Unseen Domains.
ICASSP2023
Hang-Rui Hu, Yan Song 0001, Jian-Tao Zhang, Li-Rong Dai 0001, Ian McLoughlin 0001, Zhu Zhuo, Yu Zhou, Yu-Hong Li, Hui Xue, 
StarGAN-VC Based Cross-Domain Data Augmentation for Speaker Verification.
Interspeech2023
Kang Li, Yan Song 0001, Ian McLoughlin 0001, Lin Liu 0017, Jin Li, Li-Rong Dai 0001, 
Fine-tuning Audio Spectrogram Transformer with Task-aware Adapters for Sound Event Detection.
Interspeech2023
Xiao-Min Zeng, Yan Song 0001, Ian McLoughlin 0001, Lin Liu 0017, Li-Rong Dai 0001, 
Robust Prototype Learning for Anomalous Sound Detection.
ICASSP2022
Han Chen, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu 0017, 
Self-Supervised Representation Learning for Unsupervised Anomalous Sound Detection Under Domain Shift.
ICASSP2022
Hang-Rui Hu, Yan Song 0001, Ying Liu, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu 0017, 
Domain Robust Deep Embedding Learning for Speaker Recognition.
ICASSP2022
Yuxuan Xi, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu 0017, 
Frontend Attributes Disentanglement for Speech Emotion Recognition.
Interspeech2022
Zhifu Gao, Shiliang Zhang, Ian McLoughlin 0001, Zhijie Yan, 
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition.
Interspeech2022
Hang-Rui Hu, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu 0017, 
Class-Aware Distribution Alignment based Unsupervised Domain Adaptation for Speaker Verification.
TASLP2021
Jian Tang, Jie Zhang 0042, Yan Song 0001, Ian McLoughlin 0001, Li-Rong Dai 0001, 
Multi-Granularity Sequence Alignment Mapping for Encoder-Decoder Based End-to-End ASR.
ICASSP2021
Ying Liu, Yan Song 0001, Ian McLoughlin 0001, Lin Liu 0017, Li-Rong Dai 0001, 
An Effective Deep Embedding Learning Method Based on Dense-Residual Networks for Speaker Verification.
ICASSP2021
Huy Phan, Huy Le Nguyen, Oliver Y. Chén, Philipp Koch, Ngoc Q. K. Duong, Ian McLoughlin 0001, Alfred Mertins, 
Self-Attention Generative Adversarial Network for Speech Enhancement.
ICASSP2021
Huy Phan, Huy Le Nguyen, Oliver Y. Chén, Lam Dang Pham, Philipp Koch, Ian McLoughlin 0001, Alfred Mertins, 
Multi-View Audio And Music Classification.
Interspeech2021
Zhifu Gao, Yiwu Yao, Shiliang Zhang, Jun Yang, Ming Lei, Ian McLoughlin 0001, 
Extremely Low Footprint End-to-End ASR System for Smart Device.
Interspeech2021
Hui Wang, Lin Liu 0017, Yan Song 0001, Lei Fang, Ian McLoughlin 0001, Li-Rong Dai 0001, 
A Weight Moving Average Based Alternate Decoupled Learning Algorithm for Long-Tailed Language Identification.
Interspeech2021
Xu Zheng, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu 0017, 
An Effective Mutual Mean Teaching Based Domain Adaptation Method for Sound Event Detection.
TASLP2020
Olivier Perrotin, Ian Vince McLoughlin, 
Glottal Flow Synthesis for Whisper-to-Speech Conversion.
ICASSP2020
Hui Wang, Yan Song 0001, Zengxi Li, Ian McLoughlin 0001, Li-Rong Dai 0001, 
An Online Speaker-aware Speech Separation Approach Based on Time-domain Representation.
ICASSP2020
Jie Yan, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, 
Task-Aware Mean Teacher Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection.
Interspeech2020
Zhifu Gao, Shiliang Zhang, Ming Lei, Ian McLoughlin 0001, 
SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition.
ICASSP2024
Takanori Ashihara, Marc Delcroix, Takafumi Moriya, Kohei Matsuura, Taichi Asami, Yusuke Ijima, 
What Do Self-Supervised Speech and Speaker Models Learn? New Findings from a Cross Model Layer-Wise Analysis.
ICASSP2024
Kenichi Fujita, Hiroshi Sato, Takanori Ashihara, Hiroki Kanagawa, Marc Delcroix, Takafumi Moriya, Yusuke Ijima, 
Noise-Robust Zero-Shot Text-to-Speech Synthesis Conditioned on Self-Supervised Speech-Representation Model with Adapters.
ICASSP2023
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, 
Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models.
ICASSP2023
Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix, Ryo Masumura, 
Leveraging Large Text Corpora For End-To-End Speech Summarization.
ICASSP2023
Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, 
Improving Scheduled Sampling for Neural Transducer-Based ASR.
ICASSP2023
Atsunori Ogawa, Takafumi Moriya, Naoyuki Kamo, Naohiro Tawara, Marc Delcroix, 
Iterative Shallow Fusion of Backward Language Model for End-To-End Speech Recognition.
ICASSP2023
Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Hiroshi Sato, Taiga Yamane, Takanori Ashihara, Kohei Matsuura, Takafumi Moriya, 
Leveraging Language Embeddings for Cross-Lingual Self-Supervised Speech Representation Learning.
Interspeech2023
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix, Yukinori Honma, 
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Interspeech2023
Hiroki Kanagawa, Takafumi Moriya, Yusuke Ijima, 
VC-T: Streaming Voice Conversion Based on Neural Transducer.
Interspeech2023
Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando, 
End-to-End Joint Target and Non-Target Speakers ASR.
Interspeech2023
Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, 
Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization.
Interspeech2023
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami, 
Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data.
Interspeech2023
Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo, 
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss.
ICASSP2022
Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix, Takahiro Shinozaki, 
Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.
ICASSP2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo, Takafumi Moriya, 
Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition.
Interspeech2022
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, 
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models.
Interspeech2022
Ryo Masumura, Yoshihiro Yamazaki, Saki Mizuno, Naoki Makishima, Mana Ihori, Mihiro Uchida, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Shota Orihashi, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando, 
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.
Interspeech2022
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki, 
Streaming Target-Speaker ASR with Neural Transducer.
Interspeech2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.
Interspeech2022
Tomohiro Tanaka, Ryo Masumura, Hiroshi Sato, Mana Ihori, Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, 
Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks.
ICASSP2024
Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno 0001, 
Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models.
ICASSP2024
Haozhe Shan, Albert Gu, Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara N. Sainath, 
Augmenting Conformers With Structured State-Space Sequence Models For Online Speech Recognition.
ICASSP2024
Khe Chai Sim, Zhouyuan Huo, Tsendsuren Munkhdalai, Nikhil Siddhartha, Adam Stooke, Zhong Meng, Bo Li 0028, Tara N. Sainath, 
A Comparison of Parameter-Efficient ASR Domain Adaptation Methods for Universal Speech and Language Models.
NAACL2024
Weiran Wang, Rohit Prabhavalkar, Haozhe Shan, Zhong Meng, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li 0028, James Qin, Xingyu Cai, Adam Stooke, Chengjian Zheng, Yanzhang He, Tara N. Sainath, Pedro Moreno Mengibar, 
Massive End-to-end Speech Recognition Models with Time Reduction.
ICASSP2023
Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang 0033, Bo Li 0028, Andrew Rosenberg, Bhuvana Ramabhadran, 
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition.
Interspeech2023
Shaan Bijwadia, Shuo-Yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, 
Text Injection for Capitalization and Turn-Taking Prediction in Speech Models.
Interspeech2023
Cal Peyser, Zhong Meng, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho, Ke Hu, 
Improving Joint Speech-Text Representations Without Alignment.
ICASSP2022
Xie Chen 0001, Zhong Meng, Sarangarajan Parthasarathy, Jinyu Li 0001, 
Factorized Neural Transducer for Efficient Language Model Adaptation.
ICASSP2022
Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.
ICASSP2022
Yixuan Zhang 0005, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li 0001, 
Continuous Speech Separation with Recurrent Selective Attention Network.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
Interspeech2022
Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li 0001, Xie Chen 0001, Yu Wu 0012, Yifan Gong 0001, 
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.
Interspeech2022
Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.
ICASSP2021
Xuankai Chang, Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Takuya Yoshioka, 
Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings.
ICASSP2021
Naoyuki Kanda, Zhong Meng, Liang Lu 0001, Yashesh Gaur, Xiaofei Wang 0009, Zhuo Chen 0006, Takuya Yoshioka, 
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.
ICASSP2021
Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu 0001, Xie Chen 0001, Jinyu Li 0001, Yifan Gong 0001, 
Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.
ICASSP2021
Eric Sun, Liang Lu 0001, Zhong Meng, Yifan Gong 0001, 
Sequence-Level Self-Teaching Regularization.
Interspeech2021
Liang Lu 0001, Zhong Meng, Naoyuki Kanda, Jinyu Li 0001, Yifan Gong 0001, 
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer.
Interspeech2021
Yan Deng, Rui Zhao 0017, Zhong Meng, Xie Chen 0001, Bing Liu, Jinyu Li 0001, Yifan Gong 0001, Lei He 0005, 
Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS.
TASLP2024
Michele Panariello, Natalia A. Tomashenko, Xin Wang 0037, Xiaoxiao Miao, Pierre Champion, Hubert Nourtel, Massimiliano Todisco, Nicholas W. D. Evans, Emmanuel Vincent 0001, Junichi Yamagishi, 
The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation.
ICASSP2024
Wanying Ge, Xin Wang 0037, Junichi Yamagishi, Massimiliano Todisco, Nicholas W. D. Evans, 
Spoofing Attack Augmentation: Can Differently-Trained Attack Models Improve Generalisation?
ICASSP2024
Xiaoxiao Miao, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Nicholas W. D. Evans, Massimiliano Todisco, Jean-François Bonastre, Mickael Rouvier, 
SynVox2: Towards A Privacy-Friendly VoxCeleb2 Dataset.
ICASSP2024
Michele Panariello, Francesco Nespoli, Massimiliano Todisco, Nicholas W. D. Evans, 
Speaker Anonymization Using Neural Audio Codec Language Models.
TASLP2023
Xuechen Liu, Xin Wang 0037, Md. Sahidullah, Jose Patino 0001, Héctor Delgado, Tomi Kinnunen, Massimiliano Todisco, Junichi Yamagishi, Nicholas W. D. Evans, Andreas Nautsch, Kong Aik Lee, 
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild.
TASLP2023
Lin Zhang, Xin Wang 0037, Erica Cooper, Nicholas W. D. Evans, Junichi Yamagishi, 
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance.
ICASSP2023
Wanying Ge, Hemlata Tak, Massimiliano Todisco, Nicholas W. D. Evans, 
Can Spoofing Countermeasure And Speaker Verification Systems Be Jointly Optimised?
Interspeech2023
Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang 0037, Xuechen Liu, Md. Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas W. D. Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung, 
Towards Single Integrated Spoofing-aware Speaker Verification Embeddings.
Interspeech2023
Michele Panariello, Wanying Ge, Hemlata Tak, Massimiliano Todisco, Nicholas W. D. Evans, 
Malafide: a novel adversarial convolutive noise attack against deepfake and spoofing detection systems.
Interspeech2023
Michele Panariello, Massimiliano Todisco, Nicholas W. D. Evans, 
Vocoder drift in x-vector-based speaker anonymization.
Interspeech2023
Lin Zhang, Xin Wang 0037, Erica Cooper, Nicholas W. D. Evans, Junichi Yamagishi, 
Range-Based Equal Error Rate for Spoof Localization.
ICASSP2022
Jee-weon Jung, Hee-Soo Heo, Hemlata Tak, Hye-jin Shim, Joon Son Chung, Bong-Jin Lee, Ha-Jin Yu, Nicholas W. D. Evans, 
AASIST: Audio Anti-Spoofing Using Integrated Spectro-Temporal Graph Attention Networks.
ICASSP2022
Hemlata Tak, Madhu R. Kamble, Jose Patino 0001, Massimiliano Todisco, Nicholas W. D. Evans, 
RawBoost: A Raw Data Boosting and Augmentation Method Applied to Automatic Speaker Verification Anti-Spoofing.
Interspeech2022
Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas W. D. Evans, Tomi Kinnunen, 
SASV 2022: The First Spoofing-Aware Speaker Verification Challenge.
ICASSP2021
Anthony Larcher, Ambuj Mehrish, Marie Tahon, Sylvain Meignier, Jean Carrive, David Doukhan, Olivier Galibert, Nicholas W. D. Evans, 
Speaker Embeddings for Diarization of Broadcast Data in the ALLIES Challenge.
ICASSP2021
Hemlata Tak, Jose Patino 0001, Massimiliano Todisco, Andreas Nautsch, Nicholas W. D. Evans, Anthony Larcher, 
End-to-End anti-spoofing with RawNet2.
Interspeech2021
Jose Patino 0001, Natalia A. Tomashenko, Massimiliano Todisco, Andreas Nautsch, Nicholas W. D. Evans, 
Speaker Anonymisation Using the McAdams Coefficient.
Interspeech2021
Oubaïda Chouchane, Baptiste Brossier, Jorge Esteban Gamboa Gamboa, Thomas Lardy, Hemlata Tak, Orhan Ermis, Madhu R. Kamble, Jose Patino 0001, Nicholas W. D. Evans, Melek Önen, Massimiliano Todisco, 
Privacy-Preserving Voice Anti-Spoofing Using Secure Multi-Party Computation.
Interspeech2021
Wanying Ge, Michele Panariello, Jose Patino 0001, Massimiliano Todisco, Nicholas W. D. Evans, 
Partially-Connected Differentiable Architecture Search for Deepfake and Spoofing Detection.
Interspeech2021
Madhu R. Kamble, José Andrés González López, Teresa Grau, Juan M. Espín, Lorenzo Cascioli, Yiqing Huang, Alejandro Gómez Alanís, Jose Patino 0001, Roberto Font, Antonio M. Peinado, Angel M. Gomez, Nicholas W. D. Evans, Maria A. Zuluaga, Massimiliano Todisco, 
PANACEA Cough Sound-Based Diagnosis of COVID-19 for the DiCOVA 2021 Challenge.
TASLP2024
Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura 0001, 
Improving Speech Translation Accuracy and Time Efficiency With Fine-Tuned wav2vec 2.0-Based Speech Segmentation.
ICASSP2023
Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001, 
Self-Adaptive Incremental Machine Speech Chain for Lombard TTS with High-Granularity ASR Feedback in Dynamic Noise Condition.
Interspeech2023
Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura 0001, 
Average Token Delay: A Latency Metric for Simultaneous Translation.
Interspeech2023
Yuta Nishikawa, Satoshi Nakamura 0001, 
Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation.
TASLP2022
Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001, 
A Machine Speech Chain Approach for Dynamically Adaptive Lombard TTS in Static and Dynamic Noise Environments.
TASLP2022
Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001, 
Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-Resource ASR.
Interspeech2022
Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura 0001, 
Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation.
Interspeech2022
Kei Furukawa, Takeshi Kishiyama, Satoshi Nakamura 0001, 
Applying Syntax-Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis.
Interspeech2022
Seiya Kawano, Muteki Arioka, Akishige Yuguchi, Kenta Yamamoto, Koji Inoue, Tatsuya Kawahara, Satoshi Nakamura 0001, Koichiro Yoshino, 
Multimodal Persuasive Dialogue Corpus using Teleoperated Android.
Interspeech2022
Heli Qi, Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001, 
Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing.
TASLP2021
Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001, 
Tackling Perception Bias in Unsupervised Phoneme Discovery Using DPGMM-RNN Hybrid Model and Functional Load.
Interspeech2021
Johanes Effendi, Sakriani Sakti, Satoshi Nakamura 0001, 
Weakly-Supervised Speech-to-Text Mapping with Visually Connected Non-Parallel Speech-Text Data Using Cyclic Partially-Aligned Transformer.
Interspeech2021
Yuka Ko, Katsuhito Sudoh, Sakriani Sakti, Satoshi Nakamura 0001, 
ASR Posterior-Based Loss for Multi-Task End-to-End Speech Translation.
Interspeech2021
Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001, 
Dynamically Adaptive Machine Speech Chain Inference for TTS in Noisy Environment: Listen and Speak Louder.
Interspeech2021
Shun Takahashi, Sakriani Sakti, Satoshi Nakamura 0001, 
Unsupervised Neural-Based Graph Clustering for Variable-Length Speech Representation Discovery of Zero-Resource Languages.
Interspeech2021
Hirotaka Tokuyama, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura 0001, 
Transcribing Paralinguistic Acoustic Cues to Target Language Text in Transformer-Based Speech-to-Text Translation.
TASLP2020
Yuta Nishimura, Katsuhito Sudoh, Graham Neubig, Satoshi Nakamura 0001, 
Multi-Source Neural Machine Translation With Missing Data.
TASLP2020
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
Machine Speech Chain.
TASLP2020
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
Corrections to "Machine Speech Chain".
ICASSP2020
Andros Tjandra, Chunxi Liu, Frank Zhang 0001, Xiaohui Zhang 0007, Yongqiang Wang 0005, Gabriel Synnaeve, Satoshi Nakamura 0001, Geoffrey Zweig, 
DEJA-VU: Double Feature Presentation and Iterated Loss in Deep Transformer Networks.
ICASSP2024
Kevin Everson, Yile Gu, Chao-Han Huck Yang, Prashanth Gurunath Shivakumar, Guan-Ting Lin, Jari Kolehmainen, Ivan Bulyko, Ankur Gandhe, Shalini Ghosh, Wael Hamza, Hung-Yi Lee, Ariya Rastrow, Andreas Stolcke, 
Towards ASR Robust Spoken Language Understanding Through in-Context Learning with Word Confusion Networks.
ICASSP2024
Chenyang Gao, Brecht Desplanques, Chelsea J.-T. Ju, Aman Chadha, Andreas Stolcke, 
Post-Training Embedding Alignment for Decoupling Enrollment and Runtime Speaker Recognition Models.
ICASSP2024
Guan-Ting Lin, Prashanth Gurunath Shivakumar, Ankur Gandhe, Chao-Han Huck Yang, Yile Gu, Shalini Ghosh, Andreas Stolcke, Hung-Yi Lee, Ivan Bulyko, 
Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue.
ICASSP2023
Do June Min, Andreas Stolcke, Anirudh Raju, Colin Vaz, Di He 0004, Venkatesh Ravichandran, Viet Anh Trinh, 
Adaptive Endpointing with Deep Contextual Multi-Armed Bandits.
ICASSP2023
Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant P. Strimel, Andreas Stolcke, Ivan Bulyko, 
Procter: Pronunciation-Aware Contextual Adapter For Personalized Speech Recognition In Neural Transducers.
ICASSP2023
Srinath Tankasala, Long Chen, Andreas Stolcke, Anirudh Raju, Qianli Deng, Chander Chandak, Aparna Khare, Roland Maas, Venkatesh Ravichandran, 
Cross-Utterance ASR Rescoring with Graph-Based Label Propagation.
Interspeech2023
Aakriti Agrawal, Milind Rao, Anit Kumar Sahu, Gopinath Chennupati, Andreas Stolcke, 
Learning When to Trust Which Teacher for Weakly Supervised ASR.
Interspeech2023
Denis Filimonov, Prabhat Pandey, Ariya Rastrow, Ankur Gandhe, Andreas Stolcke, 
Streaming Speech-to-Confusion Network Speech Recognition.
ICASSP2022
Metehan Cekic, Ruirui Li 0002, Zeya Chen, Yuguang Yang 0004, Andreas Stolcke, Upamanyu Madhow, 
Self-Supervised Speaker Recognition Training using Human-Machine Dialogues.
ICASSP2022
Aparna Khare, Eunjung Han, Yuguang Yang 0004, Andreas Stolcke, 
ASR-Aware End-to-End Neural Diarization.
ICASSP2022
K. C. Kishan, Zhenning Tan, Long Chen, Minho Jin, Eunjung Han, Andreas Stolcke, Chul Lee, 
OpenFEAT: Improving Speaker Identification by Open-Set Few-Shot Embedding Adaptation with Transformer.
ICASSP2022
Hua Shen, Yuguang Yang 0004, Guoli Sun, Ryan Langman, Eunjung Han, Jasha Droppo, Andreas Stolcke, 
Improving Fairness in Speaker Verification via Group-Adapted Fusion Network.
ICASSP2022
Liyan Xu, Yile Gu, Jari Kolehmainen, Haidar Khan, Ankur Gandhe, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko, 
RescoreBERT: Discriminative Speech Recognition Rescoring With BERT.
ICASSP2022
Chao-Han Huck Yang, Zeeshan Ahmed, Yile Gu, Joseph Szurley, Roger Ren, Linda Liu, Andreas Stolcke, Ivan Bulyko, 
Mitigating Closed-Model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition.
ICASSP2022
Xin Zhang, Minho Jin, Roger Cheng, Ruirui Li 0002, Eunjung Han, Andreas Stolcke, 
Contrastive-mixup Learning for Improved Speaker Verification.
Interspeech2022
Long Chen, Yixiong Meng, Venkatesh Ravichandran, Andreas Stolcke, 
Graph-based Multi-View Fusion and Local Adaptation: Mitigating Within-Household Confusability for Speaker Identification.
Interspeech2022
Pranav Dheram, Murugesan Ramakrishnan, Anirudh Raju, I-Fan Chen, Brian King, Katherine Powell, Melissa Saboowala, Karan Shetty, Andreas Stolcke, 
Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities.
Interspeech2022
Minho Jin, Chelsea Ju, Zeya Chen, Yi-Chieh Liu, Jasha Droppo, Andreas Stolcke, 
Adversarial Reweighting for Speaker Verification Fairness.
Interspeech2022
Viet Anh Trinh, Pegah Ghahremani, Brian John King, Jasha Droppo, Andreas Stolcke, Roland Maas, 
Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation.
ICASSP2021
Aditya Gourav, Linda Liu, Ankur Gandhe, Yile Gu, Guitang Lan, Xiangyang Huang, Shashank Kalmane, Gautam Tiwari, Denis Filimonov, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko, 
Personalization Strategies for End-to-End Speech Recognition Systems.
SpeechComm2024
Yuqin Lin, Jianwu Dang 0001, Longbiao Wang, Sheng Li 0010, Chenchen Ding, 
Disordered speech recognition considering low resources and abnormal articulation.
SpeechComm2024
Nan Li, Longbiao Wang, Meng Ge, Masashi Unoki, Sheng Li 0010, Jianwu Dang 0001, 
Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network.
ICASSP2024
Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li 0010, Raj Dabre, Yi Zhao, Tatsuya Kawahara, 
MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction.
ICASSP2023
Soky Kak, Sheng Li 0010, Chenhui Chu, Tatsuya Kawahara, 
Domain and Language Adaptation Using Heterogeneous Datasets for Wav2vec2.0-Based Speech Recognition of Low-Resource Language.
ICASSP2023
Qianying Liu, Zhuo Gong, Zhengdong Yang, Yuhang Yang, Sheng Li 0010, Chenchen Ding, Nobuaki Minematsu, Hao Huang 0009, Fei Cheng 0002, Chenhui Chu, Sadao Kurohashi, 
Hierarchical Softmax for End-To-End Low-Resource Multilingual Speech Recognition.
ICASSP2023
Chao Tan, Yang Cao 0011, Sheng Li 0010, Masatoshi Yoshikawa, 
General or Specific? Investigating Effective Privacy Protection in Federated Learning for Speech Emotion Recognition.
ICASSP2023
Kai Wang, Yuhang Yang, Hao Huang 0009, Ying Hu 0005, Sheng Li 0010, 
SpeakerAugment: Data Augmentation for Generalizable Source Separation via Speaker Parameter Manipulation.
ICASSP2023
Yuhang Yang, Haihua Xu, Hao Huang 0009, Eng Siong Chng, Sheng Li 0010, 
Speech-Text Based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition.
ACL-Findings2023
Shuichiro Shimizu, Chenhui Chu, Sheng Li 0010, Sadao Kurohashi, 
Towards Speech Dialogue Translation Mediating Speakers of Different Languages.
ICASSP2022
Yongjie Lv, Longbiao Wang, Meng Ge, Sheng Li 0010, Chenchen Ding, Lixin Pan, Yuguang Wang 0003, Jianwu Dang 0001, Kiyoshi Honda, 
Compressing Transformer-Based ASR Model by Task-Driven Loss and Attention-Based Multi-Level Feature Distillation.
ICASSP2022
Kai Wang, Yizhou Peng, Hao Huang 0009, Ying Hu 0005, Sheng Li 0010, 
Mining Hard Samples Locally And Globally For Improved Speech Separation.
Interspeech2022
Soky Kak, Sheng Li 0010, Masato Mimura, Chenhui Chu, Tatsuya Kawahara, 
Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism.
Interspeech2022
Kai Li 0018, Sheng Li 0010, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang 0001, Masashi Unoki, 
Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection.
Interspeech2022
Nan Li, Meng Ge, Longbiao Wang, Masashi Unoki, Sheng Li 0010, Jianwu Dang 0001, 
Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network.
Interspeech2022
Siqing Qin, Longbiao Wang, Sheng Li 0010, Yuqin Lin, Jianwu Dang 0001, 
Finer-grained Modeling units-based Meta-Learning for Low-resource Tibetan Speech Recognition.
Interspeech2022
Hao Shi, Longbiao Wang, Sheng Li 0010, Jianwu Dang 0001, Tatsuya Kawahara, 
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction.
Interspeech2022
Longfei Yang, Wenqing Wei, Sheng Li 0010, Jiyi Li, Takahiro Shinozaki, 
Augmented Adversarial Self-Supervised Learning for Early-Stage Alzheimer's Speech Detection.
Interspeech2022
Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Sheng Li 0010, Raj Dabre, Raphael Rubino, Yi Zhao, 
Fusion of Self-supervised Learned Models for MOS Prediction.
ICASSP2021
Shunfei Chen, Xinhui Hu, Sheng Li 0010, Xinkang Xu, 
An Investigation of Using Hybrid Modeling Units for Improving End-to-End Speech Recognition System.
ICASSP2021
Hao Huang 0009, Kai Wang, Ying Hu 0005, Sheng Li 0010, 
Encoder-Decoder Based Pitch Tracking and Joint Model Training for Mandarin Tone Classification.
TASLP2024
Cunhang Fan, Mingming Ding, Jianhua Tao 0001, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Zhao Lv, 
Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection.
ICASSP2023
Jun Xue, Cunhang Fan, Jiangyan Yi, Chenglong Wang, Zhengqi Wen, Dan Zhang, Zhao Lv, 
Learning From Yourself: A Self-Distillation Method For Fake Speech Detection.
TASLP2022
Tao Wang 0074, Ruibo Fu, Jiangyan Yi, Jianhua Tao 0001, Zhengqi Wen, 
NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.
TASLP2022
Tao Wang 0074, Jiangyan Yi, Ruibo Fu, Jianhua Tao 0001, Zhengqi Wen, 
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing.
ICASSP2022
Tao Wang 0074, Jiangyan Yi, Liqun Deng, Ruibo Fu, Jianhua Tao 0001, Zhengqi Wen, 
Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing.
TASLP2021
Ye Bai, Jiangyan Yi, Jianhua Tao 0001, Zhengkun Tian, Zhengqi Wen, Shuai Zhang 0014, 
Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT.
TASLP2021
Ye Bai, Jiangyan Yi, Jianhua Tao 0001, Zhengqi Wen, Zhengkun Tian, Shuai Zhang 0014, 
Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data.
TASLP2021
Cunhang Fan, Jiangyan Yi, Jianhua Tao 0001, Zhengkun Tian, Bin Liu 0041, Zhengqi Wen, 
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition.
ICASSP2021
Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao 0001, Zhengqi Wen, 
Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition.
ICASSP2021
Ruibo Fu, Jianhua Tao 0001, Zhengqi Wen, Jiangyan Yi, Tao Wang 0074, Chunyu Qiang, 
Bi-Level Style and Prosody Decoupling Modeling for Personalized End-to-End Speech Synthesis.
ICASSP2021
Tao Wang 0074, Ruibo Fu, Jiangyan Yi, Jianhua Tao 0001, Zhengqi Wen, Chunyu Qiang, Shiming Wang, 
Prosody and Voice Factorization for Few-Shot Speaker Adaptation in the Challenge M2VoC 2021.
Interspeech2021
Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao 0001, Xuefei Liu, Zhengqi Wen, 
End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition.
Interspeech2021
Zhengkun Tian, Jiangyan Yi, Ye Bai, Jianhua Tao 0001, Shuai Zhang 0014, Zhengqi Wen, 
FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization.
TASLP2020
Cunhang Fan, Jianhua Tao 0001, Bin Liu 0041, Jiangyan Yi, Zhengqi Wen, Xuefei Liu, 
End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features.
ICASSP2020
Ruibo Fu, Jianhua Tao 0001, Zhengqi Wen, Jiangyan Yi, Tao Wang 0074, 
Focusing on Attention: Prosody Transfer and Adaptative Optimization Strategy for Multi-Speaker End-to-End Speech Synthesis.
ICASSP2020
Zhengkun Tian, Jiangyan Yi, Ye Bai, Jianhua Tao 0001, Shuai Zhang 0014, Zhengqi Wen, 
Synchronous Transformers for end-to-end Speech Recognition.
Interspeech2020
Ye Bai, Jiangyan Yi, Jianhua Tao 0001, Zhengkun Tian, Zhengqi Wen, Shuai Zhang 0014, 
Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition.
Interspeech2020
Cunhang Fan, Jianhua Tao 0001, Bin Liu 0041, Jiangyan Yi, Zhengqi Wen, 
Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations.
Interspeech2020
Cunhang Fan, Jianhua Tao 0001, Bin Liu 0041, Jiangyan Yi, Zhengqi Wen, 
Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations.
Interspeech2020
Ruibo Fu, Jianhua Tao 0001, Zhengqi Wen, Jiangyan Yi, Chunyu Qiang, Tao Wang 0074, 
Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis.
ICASSP2024
Jiangyu Han, Federico Landini, Johan Rohdin, Mireia Díez, Lukás Burget, Yuhang Cao, Heng Lu, Jan Cernocký, 
DiaCorrect: Error Correction Back-End for Speaker Diarization.
ICASSP2024
Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Shoko Araki, Jan Cernocký, 
Target Speech Extraction with Pre-Trained Self-Supervised Learning Models.
TASLP2023
Bolaji Yusuf, Jan Cernocký, Murat Saraçlar, 
End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations.
ICASSP2023
Junyi Peng, Themos Stafylakis, Rongzhi Gu, Oldrich Plchot, Ladislav Mosner, Lukás Burget, Jan Cernocký, 
Parameter-Efficient Transfer Learning of Pre-Trained Transformer Models for Speaker Verification Using Adapters.
Interspeech2023
Ladislav Mosner, Oldrich Plchot, Junyi Peng, Lukás Burget, Jan Cernocký, 
Multi-Channel Speech Separation with Cross-Attention and Beamforming.
Interspeech2023
Junyi Peng, Oldrich Plchot, Themos Stafylakis, Ladislav Mosner, Lukás Burget, Jan Cernocký, 
Improving Speaker Verification with Self-Pretrained Transformer Models.
ICASSP2022
Jiangyu Han, Yanhua Long, Lukás Burget, Jan Cernocký, 
DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation and Extraction.
ICASSP2022
Ladislav Mosner, Oldrich Plchot, Lukás Burget, Jan Honza Cernocký, 
MultiSV: Dataset for Far-Field Multi-Channel Speaker Verification.
Interspeech2022
Murali Karthick Baskar, Tim Herzig, Diana Nguyen, Mireia Díez, Tim Polzehl, Lukás Burget, Jan Cernocký, 
Speaker adaptation for Wav2vec2 based dysarthric ASR.
Interspeech2022
Martin Kocour, Katerina Zmolíková, Lucas Ondel, Jan Svec, Marc Delcroix, Tsubasa Ochiai, Lukás Burget, Jan Cernocký, 
Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model.
Interspeech2022
Junyi Peng, Rongzhi Gu, Ladislav Mosner, Oldrich Plchot, Lukás Burget, Jan Cernocký, 
Learnable Sparse Filterbank for Speaker Verification.
Interspeech2022
Themos Stafylakis, Ladislav Mosner, Oldrich Plchot, Johan Rohdin, Anna Silnova, Lukás Burget, Jan Cernocký, 
Training speaker embedding extractors using multi-speaker audio with unknown speaker boundaries.
ICASSP2021
Murali Karthick Baskar, Lukás Burget, Shinji Watanabe 0001, Ramón Fernandez Astudillo, Jan Honza Cernocký, 
EAT: Enhanced ASR-TTS for Self-Supervised Speech Recognition.
ICASSP2021
Martin Karafiát, Karel Veselý, Jan Honza Cernocký, Ján Profant, Jirí Nytra, Miroslav Hlavácek, Tomás Pavlícek, 
Analysis of X-Vectors for Low-Resource Speech Recognition.
ICASSP2021
Hari Krishna Vydana, Martin Karafiát, Katerina Zmolíková, Lukás Burget, Honza Cernocký, 
Jointly Trained Transformers Models for Spoken Language Translation.
ICASSP2021
Bolaji Yusuf, Lucas Ondel, Lukás Burget, Jan Cernocký, Murat Saraçlar, 
A Hierarchical Subspace Model for Language-Attuned Acoustic Unit Discovery.
Interspeech2021
Ekaterina Egorova, Hari Krishna Vydana, Lukás Burget, Jan Cernocký, 
Out-of-Vocabulary Words Detection with Attention and CTC Alignments in an End-to-End ASR System.
Interspeech2021
Martin Kocour, Karel Veselý, Alexander Blatt, Juan Zuluaga-Gomez, Igor Szöke, Jan Cernocký, Dietrich Klakow, Petr Motlícek, 
Boosting of Contextual Information in ASR for Air-Traffic Call-Sign Recognition.
Interspeech2021
Junyi Peng, Xiaoyang Qu, Rongzhi Gu, Jianzong Wang, Jing Xiao 0006, Lukás Burget, Jan Cernocký, 
Effective Phase Encoding for End-To-End Speaker Verification.
Interspeech2021
Junyi Peng, Xiaoyang Qu, Jianzong Wang, Rongzhi Gu, Jing Xiao 0006, Lukás Burget, Jan Cernocký, 
ICSpk: Interpretable Complex Speaker Embedding Extractor from Raw Waveform.
TASLP2024
Xun Gong 0005, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, Rui Zhao 0017, Xie Chen 0001, Yanmin Qian, 
Advanced Long-Content Speech Recognition With Factorized Neural Transducer.
TASLP2024
Tianrui Wang, Long Zhou, Ziqiang Zhang, Yu Wu 0012, Shujie Liu 0001, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Furu Wei, 
VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation.
TASLP2024
Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu 0012, Shuo Ren, Shujie Liu 0001, Zhuoyuan Yao, Xun Gong 0005, Li-Rong Dai 0001, Jinyu Li 0001, Furu Wei, 
SpeechLM: Enhanced Speech Pre-Training With Unpaired Textual Data.
ICASSP2023
Zhuo Chen 0006, Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiaofei Wang 0009, Takuya Yoshioka, Jinyu Li 0001, Sunit Sivasankaran, Sefik Emre Eskimez, 
Speech Separation with Large-Scale Self-Supervised Learning.
ICASSP2023
Quchen Fu, Szu-Wei Fu, Yaran Fan, Yu Wu 0012, Zhuo Chen 0006, Jayant Gupchup, Ross Cutler, 
Real-Time Speech Interruption Analysis: from Cloud to Client Deployment.
ICASSP2023
Xun Gong 0005, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, Rui Zhao 0017, Xie Chen 0001, Yanmin Qian, 
LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer.
Interspeech2023
Youngdo Ahn, Chengyi Wang 0002, Yu Wu 0012, Jong Won Shin, Shujie Liu 0001, 
GRAVO: Learning to Generate Relevant Audio from Visual Features with Noisy Online Videos.
Interspeech2023
Yuang Li, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, 
Accelerating Transducers through Adjacent Token Merging.
Interspeech2023
Peidong Wang, Eric Sun, Jian Xue, Yu Wu 0012, Long Zhou, Yashesh Gaur, Shujie Liu 0001, Jinyu Li 0001, 
LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers.
ICML2023
Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Shujie Liu 0001, Daniel Tompkins, Zhuo Chen 0006, Wanxiang Che, Xiangzhan Yu, Furu Wei, 
BEATs: Audio Pre-Training with Acoustic Tokenizers.
ICASSP2022
Zhengyang Chen, Sanyuan Chen, Yu Wu 0012, Yao Qian, Chengyi Wang 0002, Shujie Liu 0001, Yanmin Qian, Michael Zeng 0001, 
Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification.
ICASSP2022
Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Zhengyang Chen, Zhuo Chen 0006, Shujie Liu 0001, Jian Wu 0027, Yao Qian, Furu Wei, Jinyu Li 0001, Xiangzhan Yu, 
UniSpeech-SAT: Universal Speech Representation Learning With Speaker Aware Pre-Training.
ICASSP2022
Yiming Wang, Jinyu Li 0001, Heming Wang, Yao Qian, Chengyi Wang 0002, Yu Wu 0012, 
Wav2vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition.
ICASSP2022
Chengyi Wang 0002, Yu Wu 0012, Sanyuan Chen, Shujie Liu 0001, Jinyu Li 0001, Yao Qian, Zhenglu Yang, 
Improving Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision.
Interspeech2022
Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Shujie Liu 0001, Zhuo Chen 0006, Peidong Wang, Gang Liu 0001, Jinyu Li 0001, Jian Wu 0027, Xiangzhan Yu, Furu Wei, 
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
Interspeech2022
Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li 0001, Xie Chen 0001, Yu Wu 0012, Yifan Gong 0001, 
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.
Interspeech2022
Shuo Ren, Shujie Liu 0001, Yu Wu 0012, Long Zhou, Furu Wei, 
Speech Pre-training with Acoustic Piece.
Interspeech2022
Chengyi Wang 0002, Yiming Wang, Yu Wu 0012, Sanyuan Chen, Jinyu Li 0001, Shujie Liu 0001, Furu Wei, 
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training.
ICASSP2024
Gowtham Ramesh, Kartik Audhkhasi, Bhuvana Ramabhadran, 
Task Vector Algebra for ASR Models.
ICASSP2024
Takaaki Saeki, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov, 
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data.
ICASSP2023
Kartik Audhkhasi, Brian Farris, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Modular Conformer Training for Flexible End-to-End ASR.
ICASSP2023
Tongzhou Chen, Cyril Allauzen, Yinghui Huang, Daniel S. Park, David Rybach, W. Ronny Huang, Rodrigo Cabrera, Kartik Audhkhasi, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Michael Riley 0001, 
Large-Scale Language Model Rescoring on Long-Form Data.
ICASSP2023
Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang 0033, Bo Li 0028, Andrew Rosenberg, Bhuvana Ramabhadran, 
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition.
ICASSP2023
Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang 0033, Ankur Bapna, Andrew Rosenberg, Bhuvana Ramabhadran, 
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech.
ICASSP2023
Gary Wang, Kyle Kastner, Ankur Bapna, Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang 0033, 
Understanding Shared Speech-Text Representations.
Interspeech2023
Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Kartik Audhkhasi, 
O-1: Self-training with Oracle and 1-best Hypothesis.
Interspeech2023
Yochai Blau, Rohan Agrawal, Lior Madmony, Gary Wang, Andrew Rosenberg, Zhehuai Chen, Zorik Gekhman, Genady Beryozkin, Parisa Haghani, Bhuvana Ramabhadran, 
Using Text Injection to Improve Recognition of Personal Identifiers in Speech.
ICASSP2022
Zhehuai Chen, Yu Zhang 0033, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Gary Wang, 
Tts4pretrain 2.0: Advancing the use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses.
ICASSP2022
Neeraj Gaur, Tongzhou Chen, Ehsan Variani, Parisa Haghani, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Multilingual Second-Pass Rescoring for Automatic Speech Recognition Systems.
Interspeech2022
Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition.
Interspeech2022
Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang 0033, Nicolás Serrano, 
Reducing Domain mismatch in Self-supervised speech pre-training.
Interspeech2022
Zhehuai Chen, Yu Zhang 0033, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Ankur Bapna, Heiga Zen, 
MAESTRO: Matched Speech Text Representations through Modality Matching.
Interspeech2022
Ehsan Variani, Michael Riley 0001, David Rybach, Cyril Allauzen, Tongzhou Chen, Bhuvana Ramabhadran, 
On Adaptive Weight Interpolation of the Hybrid Autoregressive Transducer.
Interspeech2022
Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach, 
Improving Rare Word Recognition with LM-aware MWER Training.
Interspeech2022
Gary Wang, Andrew Rosenberg, Bhuvana Ramabhadran, Fadi Biadsy, Jesse Emond, Yinghui Huang, Pedro J. Moreno 0001, 
Non-Parallel Voice Conversion for ASR Augmentation.
ICASSP2021
Rohan Doshi, Youzheng Chen, Liyang Jiang, Xia Zhang, Fadi Biadsy, Bhuvana Ramabhadran, Fang Chu, Andrew Rosenberg, Pedro J. Moreno 0001, 
Extending Parrotron: An End-to-End, Speech Conversion and Speech Recognition Model for Atypical Speech.
ICASSP2021
Neeraj Gaur, Brian Farris, Parisa Haghani, Isabel Leal, Pedro J. Moreno 0001, Manasa Prasad, Bhuvana Ramabhadran, Yun Zhu, 
Mixture of Informed Experts for Multilingual Speech Recognition.
ICASSP2021
Hainan Xu, Yinghui Huang, Yun Zhu, Kartik Audhkhasi, Bhuvana Ramabhadran, 
Convolutional Dropout and Wordpiece Augmentation for End-to-End Speech Recognition.
ICASSP2024
Yang Zhang 0089, Travis M. Bartley, Mariana Graterol-Fuenmayor, Vitaly Lavrukhin, Evelina Bakhturina, Boris Ginsburg, 
A Chat about Boring Problems: Studying GPT-Based Text Normalization.
ICASSP2024
Maxime Burchi, Krishna C. Puvvada, Jagadeesh Balam, Boris Ginsburg, Radu Timofte, 
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer.
ICASSP2024
Zhehuai Chen, He Huang, Andrei Andrusenko, Oleksii Hrinchuk, Krishna C. Puvvada, Jason Li, Subhankar Ghosh, Jagadeesh Balam, Boris Ginsburg, 
SALM: Speech-Augmented Language Model with in-Context Learning for Speech Recognition and Translation.
ICASSP2024
Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg, 
Investigating End-to-End ASR Architectures for Long Form Audio Transcription.
ICASSP2024
Vahid Noroozi, Somshubra Majumdar, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg, 
Stateful Conformer with Cache-Based Inference for Streaming Automatic Speech Recognition.
ICASSP2024
Krishna C. Puvvada, Nithin Rao Koluguri, Kunal Dhawan, Jagadeesh Balam, Boris Ginsburg, 
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition.
ICASSP2024
Hainan Xu, Zhehuai Chen, Fei Jia, Boris Ginsburg, 
Transducers with Pronunciation-Aware Embeddings for Automatic Speech Recognition.
ICML2024
Paarth Neekhara, Shehzeen Samarah Hussain, Rafael Valle, Boris Ginsburg, Rishabh Ranjan, Shlomo Dubnov, Farinaz Koushanfar, Julian J. McAuley, 
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations.
ICASSP2023
Rohan Badlani, Akshit Arora, Subhankar Ghosh, Rafael Valle, Kevin J. Shih, João Felipe Santos, Boris Ginsburg, Bryan Catanzaro, 
VANI: Very-Lightweight Accent-Controllable TTS for Native And Non-Native Speakers With Identity Preservation.
ICASSP2023
Travis M. Bartley, Fei Jia, Krishna C. Puvvada, Samuel Kriman, Boris Ginsburg, 
Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models.
ICASSP2023
Shehzeen Hussain, Paarth Neekhara, Jocelyn Huang, Jason Li, Boris Ginsburg, 
ACE-VC: Adaptive and Controllable Voice Conversion Using Explicitly Disentangled Self-Supervised Speech Representations.
ICASSP2023
Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe 0001, Boris Ginsburg, 
Multi-Blank Transducers for Speech Recognition.
ICASSP2023
Yang Zhang 0089, Krishna C. Puvvada, Vitaly Lavrukhin, Boris Ginsburg, 
Conformer-Based Target-Speaker Automatic Speech Recognition For Single-Channel Audio.
Interspeech2023
Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg, 
SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings.
Interspeech2023
Vladimir Bataev, Roman Korostik, Evgeny Shabalin, Vitaly Lavrukhin, Boris Ginsburg, 
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator.
Interspeech2023
Igor Gitman, Vitaly Lavrukhin, Aleksandr Laptev, Boris Ginsburg, 
Confidence-based Ensembles of End-to-End Speech Recognition Models.
Interspeech2023
Cheng-Ping Hsieh, Subhankar Ghosh, Boris Ginsburg, 
Adapter-Based Extension of Multi-Speaker Text-To-Speech Model for New Speakers.
Interspeech2023
He Huang, Jagadeesh Balam, Boris Ginsburg, 
Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling.
Interspeech2023
Fei Jia, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg, 
A Compact End-to-End Model with Local and Global Context for Spoken Language Identification.
Interspeech2023
Elena Rastorgueva, Vitaly Lavrukhin, Boris Ginsburg, 
NeMo Forced Aligner and its application to word alignment for subtitle generation.
ICASSP2024
Dianwen Ng, Chong Zhang 0003, Ruixi Zhang, Yukun Ma, Fabian Ritter Gutierrez, Trung Hieu Nguyen 0001, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma 0001, 
Are Soft Prompts Good Zero-Shot Learners for Speech Recognition?
ICASSP2024
Jia Qi Yip, Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang 0003, Hao Wang 0199, Trung Hieu Nguyen 0001, Kun Zhou 0003, Dianwen Ng, Eng Siong Chng, Bin Ma 0001, 
SPGM: Prioritizing Local Features for Enhanced Speech Separation Performance.
ICASSP2024
Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang 0003, Hao Wang 0199, Trung Hieu Nguyen 0001, Kun Zhou 0003, Jia Qi Yip, Dianwen Ng, Bin Ma 0001, 
MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation.
ICASSP2023
Yukun Ma, Trung Hieu Nguyen 0001, Jinjie Ni, Wen Wang, Qian Chen 0003, Chong Zhang 0003, Bin Ma 0001, 
Auxiliary Pooling Layer For Spoken Language Understanding.
ICASSP2023
Dianwen Ng, Ruixi Zhang, Jia Qi Yip, Zhao Yang, Jinjie Ni, Chong Zhang 0003, Yukun Ma, Chongjia Ni, Eng Siong Chng, Bin Ma 0001, 
De'HuBERT: Disentangling Noise in a Self-Supervised Model for Robust Speech Recognition.
ICASSP2023
Dianwen Ng, Ruixi Zhang, Jia Qi Yip, Chong Zhang 0003, Yukun Ma, Trung Hieu Nguyen 0001, Chongjia Ni, Eng Siong Chng, Bin Ma 0001, 
Contrastive Speech Mixup for Low-Resource Keyword Spotting.
ICASSP2023
Jinjie Ni, Yukun Ma, Wen Wang, Qian Chen 0033, Dianwen Ng, Han Lei, Trung Hieu Nguyen 0001, Chong Zhang 0003, Bin Ma 0001, Erik Cambria, 
Adaptive Knowledge Distillation Between Text and Speech Pre-Trained Models.
ICASSP2023
Shengkui Zhao, Bin Ma 0001, 
D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network Using Joint Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement.
ICASSP2023
Shengkui Zhao, Bin Ma 0001, 
MossFormer: Pushing the Performance Limit of Monaural Speech Separation Using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions.
Interspeech2023
Dianwen Ng, Chong Zhang 0003, Ruixi Zhang, Yukun Ma, Trung Hieu Nguyen 0001, Chongjia Ni, Shengkui Zhao, Qian Chen 0003, Wen Wang, Eng Siong Chng, Bin Ma 0001, 
Adapter-tuning with Effective Token-dependent Representation Shift for Automatic Speech Recognition.
Interspeech2023
Dianwen Ng, Yang Xiao, Jia Qi Yip, Zhao Yang, Biao Tian, Qiang Fu 0001, Eng Siong Chng, Bin Ma 0001, 
Small Footprint Multi-channel Network for Keyword Spotting with Centroid Based Awareness.
Interspeech2023
Zhao Yang, Dianwen Ng, Chong Zhang 0003, Xiao Fu, Rui Jiang, Wei Xi, Yukun Ma, Chongjia Ni, Eng Siong Chng, Bin Ma 0001, Jizhong Zhao, 
Dual Acoustic Linguistic Self-supervised Representation Learning for Cross-Domain Speech Recognition.
Interspeech2023
Zhao Yang, Dianwen Ng, Chong Zhang 0003, Rui Jiang, Wei Xi, Yukun Ma, Chongjia Ni, Jizhong Zhao, Bin Ma 0001, Eng Siong Chng, 
A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions.
Interspeech2023
Zhao Yang, Dianwen Ng, Xizhe Li, Chong Zhang 0003, Rui Jiang, Wei Xi, Yukun Ma, Chongjia Ni, Jizhong Zhao, Bin Ma 0001, Eng Siong Chng, 
Dual-Memory Multi-Modal Learning for Continual Spoken Keyword Spotting with Confidence Selection and Diversity Enhancement.
Interspeech2023
Jia Qi Yip, Duc-Tuan Truong, Dianwen Ng, Chong Zhang 0003, Yukun Ma, Trung Hieu Nguyen 0001, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma 0001, 
ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention.
ICASSP2022
Yukun Ma, Trung Hieu Nguyen 0001, Bin Ma 0001, 
CPT: Cross-Modal Prefix-Tuning for Speech-To-Text Translation.
ICASSP2022
Karn N. Watcharasupat, Thi Ngoc Tho Nguyen, Woon-Seng Gan, Shengkui Zhao, Bin Ma 0001, 
End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression.
ICASSP2022
Fan Yu, Shiliang Zhang, Yihui Fu, Lei Xie 0001, Siqi Zheng, Zhihao Du, Weilong Huang, Pengcheng Guo, Zhijie Yan, Bin Ma 0001, Xin Xu, Hui Bu, 
M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
ICASSP2022
Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie 0001, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma 0001, Xin Xu, Hui Bu, 
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
ICASSP2022
Shengkui Zhao, Bin Ma 0001, Karn N. Watcharasupat, Woon-Seng Gan, 
FRCRN: Boosting Feature Representation Using Frequency Recurrence for Monaural Speech Enhancement.
ICASSP2024
Xueyuan Chen, Xi Wang 0016, Shaofei Zhang, Lei He 0005, Zhiyong Wu 0001, Xixin Wu, Helen Meng, 
StyleSpeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis.
ICML2024
Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan 0003, Detai Xin, Dongchao Yang, Eric Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu 0001, Tao Qin 0001, Xiangyang Li 0001, Wei Ye 0004, Shikun Zhang, Jiang Bian 0002, Lei He 0005, Jinyu Li 0001, Sheng Zhao, 
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
ICLR2024
Yichong Leng, Zhifang Guo, Kai Shen, Zeqian Ju, Xu Tan 0003, Eric Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He 0005, Xiangyang Li 0001, Sheng Zhao, Tao Qin 0001, Jiang Bian 0002, 
PromptTTS 2: Describing and Generating Voices with Text Prompt.
ICLR2024
Kai Shen, Zeqian Ju, Xu Tan 0003, Eric Liu, Yichong Leng, Lei He 0005, Tao Qin 0001, Sheng Zhao, Jiang Bian 0002, 
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers.
ICASSP2023
Yan Deng, Long Zhou, Yuanhao Yi, Shujie Liu 0001, Lei He 0005, 
Prosody-Aware SpeechT5 for Expressive Neural TTS.
ICASSP2023
Kun Wei, Long Zhou, Ziqiang Zhang, Liping Chen, Shujie Liu 0001, Lei He 0005, Jinyu Li 0001, Furu Wei, 
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation.
ICASSP2023
Chen Zhang 0020, Shubham Bansal, Aakash Lakhera, Jinzhu Li, Gang Wang 0001, Sandeepkumar Satpal, Sheng Zhao, Lei He 0005, 
LeanSpeech: The Microsoft Lightweight Speech Synthesis System for LIMMITS Challenge 2023.
Interspeech2023
Brendan Walsh, Mark Hamilton, Greg Newby, Xi Wang 0016, Serena Ruan, Sheng Zhao, Lei He 0005, Shaofei Zhang, Eric Dettinger, William T. Freeman, Markus Weimer, 
Large-Scale Automatic Audiobook Creation.
Interspeech2023
Yujia Xiao, Shaofei Zhang, Xi Wang 0016, Xu Tan 0003, Lei He 0005, Sheng Zhao, Frank K. Soong, Tan Lee 0001, 
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading.
NeurIPS2023
Yuancheng Wang, Zeqian Ju, Xu Tan 0003, Lei He 0005, Zhizheng Wu 0001, Jiang Bian 0002, Sheng Zhao, 
AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models.
AAAI2023
Yihan Wu, Junliang Guo, Xu Tan 0003, Chen Zhang 0020, Bohan Li 0003, Ruihua Song, Lei He 0005, Sheng Zhao, Arul Menezes, Jiang Bian 0002, 
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing.
ICASSP2022
Zehua Chen, Xu Tan 0003, Ke Wang, Shifeng Pan, Danilo P. Mandic, Lei He 0005, Sheng Zhao, 
InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training.
ICASSP2022
Yujia Xiao, Xi Wang 0016, Lei He 0005, Frank K. Soong, 
Improving FastSpeech TTS with Efficient Self-Attention and Compact Feed-Forward Network.
ICASSP2022
Yuanhao Yi, Lei He 0005, Shifeng Pan, Xi Wang 0016, Yujia Xiao, 
ProsodySpeech: Towards Advanced Prosody Model for Neural Text-to-Speech.
ICASSP2022
Fengpeng Yue, Yan Deng, Lei He 0005, Tom Ko, Yu Zhang 0006, 
Exploring Machine Speech Chain For Domain Adaptation.
Interspeech2022
Mutian He 0001, Jingzhou Yang, Lei He 0005, Frank K. Soong, 
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge.
Interspeech2022
Yanqing Liu, Ruiqing Xue, Lei He 0005, Xu Tan 0003, Sheng Zhao, 
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders.
Interspeech2022
Yihan Wu, Xu Tan 0003, Bohan Li 0003, Lei He 0005, Sheng Zhao, Ruihua Song, Tao Qin 0001, Tie-Yan Liu, 
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios.
Interspeech2022
Yihan Wu, Xi Wang 0016, Shaofei Zhang, Lei He 0005, Ruihua Song, Jian-Yun Nie, 
Self-supervised Context-aware Style Representation for Expressive Speech Synthesis.
Interspeech2022
Yuanhao Yi, Lei He 0005, Shifeng Pan, Xi Wang 0016, Yuchao Zhang, 
SoftSpeech: Unsupervised Duration Model in FastSpeech 2.
SpeechComm2024
Keqi Deng, Philip C. Woodland, 
Decoupled structure for improved adaptability of end-to-end models.
TASLP2024
Keqi Deng, Philip C. Woodland, 
Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition.
TASLP2024
Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland, 
Graph Neural Networks for Contextual ASR With the Tree-Constrained Pointer Generator.
ICASSP2024
Keqi Deng, Philip C. Woodland, 
FastInject: Injecting Unpaired Text Data into CTC-Based ASR Training.
ICASSP2024
Nineli Lashkarashvili, Wen Wu, Guangzhi Sun, Philip C. Woodland, 
Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation.
ACL2024
Keqi Deng, Philip C. Woodland, 
Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation.
ACL-Findings2024
Guangzhi Sun, Shutong Feng, Dongcheng Jiang, Chao Zhang 0031, Milica Gasic, Philip C. Woodland, 
Speech-based Slot Filling using Large Language Models.
SpeechComm2023
Qiujia Li, Chao Zhang 0031, Philip C. Woodland, 
Combining hybrid DNN-HMM ASR systems with attention-based models using lattice rescoring.
TASLP2023
Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland, 
Minimising Biasing Word Errors for Contextual ASR With the Tree-Constrained Pointer Generator.
ICASSP2023
Keqi Deng, Philip C. Woodland, 
Adaptable End-to-End ASR Models Using Replaceable Internal LMs and Residual Softmax.
ICASSP2023
Evonne P. C. Lee, Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland, 
Spectral Clustering-Aware Learning of Embeddings for Speaker Diarisation.
ICASSP2023
Yuang Li, Xianrui Zheng, Philip C. Woodland, 
Self-Supervised Learning-Based Source Separation for Meeting Data.
ICASSP2023
Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland, 
End-to-End Spoken Language Understanding with Tree-Constrained Pointer Generator.
ICASSP2023
Wen Wu, Chao Zhang 0031, Philip C. Woodland, 
Self-Supervised Representations in Speech-Based Depression Detection.
Interspeech2023
Dongcheng Jiang, Chao Zhang 0031, Philip C. Woodland, 
A Neural Time Alignment Module for End-to-End Automatic Speech Recognition.
Interspeech2023
Florian L. Kreyssig, Yangyang Shi, Jinxi Guo, Leda Sari, Abdel-rahman Mohamed, Philip C. Woodland, 
Biased Self-supervised Learning for ASR.
Interspeech2023
Guangzhi Sun, Xianrui Zheng, Chao Zhang 0031, Philip C. Woodland, 
Can Contextual Biasing Remain Effective with Whisper and GPT-2?
Interspeech2023
Wen Wu, Chao Zhang 0031, Philip C. Woodland, 
Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations.
ICASSP2022
Qiujia Li, Yu Zhang 0033, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland, 
Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition.
ICASSP2022
Xiaoyu Yang, Qiujia Li, Philip C. Woodland, 
Knowledge Distillation for Neural Transducers from Large Self-Supervised Pre-Trained Models.
TASLP2024
Kristina Tesch, Timo Gerkmann, 
Multi-Channel Speech Separation Using Spatially Selective Deep Non-Linear Filters.
ICASSP2024
Bunlong Lay, Jean-Marie Lemercier, Julius Richter, Timo Gerkmann, 
Single and Few-Step Diffusion for Generative Speech Enhancement.
ICASSP2024
Danilo de Oliveira, Timo Gerkmann, 
Distilling HuBERT with LSTMs via Decoupled Knowledge Distillation.
ICASSP2024
Navin Raj Prabhu, Bunlong Lay, Simon Welker, Nale Lehmann-Willenbrock, Timo Gerkmann, 
EMOCONV-Diff: Diffusion-Based Speech Emotion Conversion for Non-Parallel and in-the-Wild Data.
ICASSP2024
Simon Welker, Tal Peer, Henry N. Chapman, Timo Gerkmann, 
Live Iterative Ptychography with Projection-Based Algorithms.
TASLP2023
Huajian Fang, Dennis Becker, Stefan Wermter, Timo Gerkmann, 
Integrating Uncertainty Into Neural Network-Based Speech Enhancement.
TASLP2023
Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann, 
StoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation.
TASLP2023
Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann, 
Speech Enhancement and Dereverberation With Diffusion-Based Generative Models.
TASLP2023
Kristina Tesch, Timo Gerkmann, 
Insights Into Deep Non-Linear Filters for Improved Multi-Channel Speech Enhancement.
ICASSP2023
Huajian Fang, Timo Gerkmann, 
Uncertainty Estimation in Deep Speech Enhancement Using Complex Gaussian Mixture Models.
ICASSP2023
Huajian Fang, Niklas Wittmer, Johannes Twiefel, Stefan Wermter, Timo Gerkmann, 
Partially Adaptive Multichannel Joint Reduction of Ego-Noise and Environmental Noise.
ICASSP2023
Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann, 
Analysing Diffusion-based Generative Approaches Versus Discriminative Approaches for Speech Restoration.
ICASSP2023
Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Tal Peer, Timo Gerkmann, 
Speech Signal Improvement Using Causal Generative Diffusion Models.
ICASSP2023
Kristina Tesch, Timo Gerkmann, 
Spatially Selective Deep Non-Linear Filters For Speaker Extraction.
Interspeech2023
Bunlong Lay, Simon Welker, Julius Richter, Timo Gerkmann, 
Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement.
Interspeech2023
Jean-Marie Lemercier, Julian Tobergte, Timo Gerkmann, 
Extending DNN-based Multiplicative Masking to Deep Subband Filtering for Improved Dereverberation.
Interspeech2023
Héctor Martel, Julius Richter, Kai Li, Xiaolin Hu 0001, Timo Gerkmann, 
Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model.
Interspeech2023
Danilo de Oliveira, Navin Raj Prabhu, Timo Gerkmann, 
Leveraging Semantic Information for Efficient Self-Supervised Emotion Recognition with Audio-Textual Distilled Models.
ICASSP2022
Huajian Fang, Tal Peer, Stefan Wermter, Timo Gerkmann, 
Integrating Statistical Uncertainty into Neural Network-Based Speech Enhancement.
ICASSP2022
Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann, 
Customizable End-To-End Optimization Of Online Neural Network-Supported Dereverberation For Hearing Devices.
TASLP2024
Jae-Hong Lee, Joon-Hyuk Chang, 
Partitioning Attention Weight: Mitigating Adverse Effect of Incorrect Pseudo-Labels for Self-Supervised ASR.
ICASSP2024
Won-Gook Choi, Donghyun Seong, Joon-Hyuk Chang, 
Adversarial Learning on Compressed Posterior Space for Non-Iterative Score-based End-to-End Text-to-Speech.
ICASSP2024
Dong-Hyun Kim, Jae-Hong Lee, Joon-Hyuk Chang, 
Text-Only Unsupervised Domain Adaptation for Neural Transducer-Based ASR Personalization Using Synthesized Data.
ICASSP2023
Jin-Seong Choi, Jae-Hong Lee, Chae-Won Lee, Joon-Hyuk Chang, 
M-CTRL: A Continual Representation Learning Framework with Slowly Improving Past Pre-Trained Model.
ICASSP2023
Sohee Jang, Jiye Kim, Yeon-Ju Kim, Joon-Hyuk Chang, 
Adaptive Time-Scale Modification for Improving Speech Intelligibility Based On Phoneme Clustering For Streaming Services.
ICASSP2023
Ye-Rin Jeoung, Joon-Young Yang, Jeong-Hwan Choi, Joon-Hyuk Chang, 
Improving Transformer-Based End-to-End Speaker Diarization by Assigning Auxiliary Losses to Attention Heads.
ICASSP2023
Jae-Hong Lee, Dong-Hyun Kim, Joon-Hyuk Chang, 
RepackagingAugment: Overcoming Prediction Error Amplification in Weight-Averaged Speech Recognition Models Subject to Self-Training.
ICASSP2023
Ju-Seok Seong, Jeong-Hwan Choi, Jehyun Kyung, Ye-Rin Jeoung, Joon-Hyuk Chang, 
Noise-Aware Target Extension with Self-Distillation for Robust Speech Recognition.
ICASSP2023
Da-Hee Yang, Joon-Hyuk Chang, 
Selective FiLM Conditioning with CTC-Based ASR Probability for Speech Enhancement.
Interspeech2023
Min-Sang Baek, Joon-Young Yang, Joon-Hyuk Chang, 
Deeply Supervised Curriculum Learning for Deep Neural Network-based Sound Source Localization.
Interspeech2023
Jae-Heung Cho, Joon-Hyuk Chang, 
SR-SRP: Super-Resolution based SRP-PHAT for Sound Source Localization and Tracking.
Interspeech2023
Won-Gook Choi, Joon-Hyuk Chang, 
Resolution Consistency Training on Time-Frequency Domain for Semi-Supervised Sound Event Detection.
Interspeech2023
Won-Gook Choi, So-Jeong Kim, Tae-Ho Kim, Joon-Hyuk Chang, 
Prior-free Guided TTS: An Improved and Efficient Diffusion-based Text-Guided Speech Synthesis.
Interspeech2023
Ye-Rin Jeoung, Jeong-Hwan Choi, Ju-Seok Seong, Jehyun Kyung, Joon-Hyuk Chang, 
Self-Distillation into Self-Attention Heads for Improving Transformer-based End-to-End Neural Speaker Diarization.
Interspeech2023
Do-Hee Kim, Ji-Eun Choi, Joon-Hyuk Chang, 
Intra-ensemble: A New Method for Combining Intermediate Outputs in Transformer-based Automatic Speech Recognition.
Interspeech2023
Do-Hee Kim, Daeyeol Shim, Joon-Hyuk Chang, 
General-purpose Adversarial Training for Enhanced Automatic Speech Recognition Model Generalization.
Interspeech2023
Jehyun Kyung, Ju-Seok Seong, Jeong-Hwan Choi, Ye-Rin Jeoung, Joon-Hyuk Chang, 
Improving Joint Speech and Emotion Recognition Using Global Style Tokens.
Interspeech2023
JungPhil Park, Jeong-Hwan Choi, Yungyeo Kim, Joon-Hyuk Chang, 
HAD-ANC: A Hybrid System Comprising an Adaptive Filter and Deep Neural Networks for Active Noise Control.
TASLP2022
Moa Lee, Junmo Lee, Joon-Hyuk Chang, 
Non-Autoregressive Fully Parallel Deep Convolutional Neural Speech Synthesis.
TASLP2022
Joon-Young Yang, Joon-Hyuk Chang, 
VACE-WPE: Virtual Acoustic Channel Expansion Based on Neural Networks for Weighted Prediction Error-Based Speech Dereverberation.
ICASSP2024
Ankit Shah 0001, Fuyu Tang, Zelin Ye, Rita Singh, Bhiksha Raj, 
Importance of Negative Sampling in Weak Label Learning.
ICASSP2024
Soham Deshmukh, Benjamin Elizalde, Dimitra Emmanouilidou, Bhiksha Raj, Rita Singh, Huaming Wang, 
Training Audio Captioning Models without Audio.
ICASSP2024
Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh, 
Prompting Audios Using Acoustic Properties for Emotion Representation.
ICASSP2024
Chien-Yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan S. Sharma, Shinji Watanabe 0001, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee, 
Dynamic-SUPERB: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark For Speech.
ICASSP2024
Jee-Weon Jung, Roshan S. Sharma, William Chen, Bhiksha Raj, Shinji Watanabe 0001, 
AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models.
ICASSP2024
Muqiao Yang, Chunlei Zhang, Yong Xu 0004, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu 0001, 
uSee: Unified Speech Enhancement And Editing with Conditional Diffusion Models.
ACL2024
Roshan Sharma 0001, Suwon Shon, Mark Lindsey, Hira Dhamyal, Bhiksha Raj, 
Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?
ACL-Findings2024
Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, Bhiksha Raj, 
Continual Contrastive Spoken Language Understanding.
NAACL-Findings2024
Roshan Sharma 0001, Ruchira Sharma, Hira Dhamyal, Rita Singh, Bhiksha Raj, 
R-BASS: Relevance-aided Block-wise Adaptation for Speech Summarization.
ICASSP2023
Francisco Teixeira, Alberto Abad, Bhiksha Raj, Isabel Trancoso, 
Privacy-Preserving Automatic Speaker Diarization.
ICASSP2023
Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar 0003, Shinji Watanabe 0001, Bhiksha Raj, 
PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement.
ICASSP2023
Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar 0003, Shinji Watanabe 0001, Bhiksha Raj, 
TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement.
Interspeech2023
Roshan Sharma 0001, Siddhant Arora, Kenneth Zheng, Shinji Watanabe 0001, Rita Singh, Bhiksha Raj, 
BASS: Block-wise Adaptation for Speech Summarization.
Interspeech2023
Raphaël Olivier, Bhiksha Raj, 
There is more than one kind of robustness: Fooling Whisper with adversarial examples.
Interspeech2023
Liao Qu, Xianwei Zou, Xiang Li 0106, Yandong Wen, Rita Singh, Bhiksha Raj, 
The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features.
NeurIPS2023
Shentong Mo, Bhiksha Raj, 
Weakly-Supervised Audio-Visual Segmentation.
AAAI2023
Xiang Li 0003, Haoyuan Cao, Shijie Zhao 0001, Junlin Li, Li Zhang 0006, Bhiksha Raj, 
Panoramic Video Salient Object Detection with Ambisonic Audio Guidance.
EMNLP2023
Xiang Li 0106, Jinglu Wang, Xiaohao Xu, Muqiao Yang, Fan Yang, Yizhou Zhao, Rita Singh, Bhiksha Raj, 
Towards Noise-Tolerant Speech-Referring Video Object Segmentation: Bridging Speech and Text.
Interspeech2022
Hira Dhamyal, Bhiksha Raj, Rita Singh, 
Positional Encoding for Capturing Modality Specific Cadence for Emotion Detection.
Interspeech2022
Raphaël Olivier, Bhiksha Raj, 
Recent improvements of ASR models in the face of adversarial attacks.
TASLP2024
Hao Zhang, Yixuan Zhang 0005, Meng Yu 0003, Dong Yu 0001, 
Enhanced Acoustic Howling Suppression via Hybrid Kalman Filter and Deep Learning Models.
Interspeech2023
Yong Xu 0004, Vinay Kothapally, Meng Yu 0003, Shixiong Zhang, Dong Yu 0001, 
Zoneformer: On-device Neural Beamformer For In-car Multi-zone Speech Separation, Enhancement and Echo Cancellation.
Interspeech2023
Hao Zhang, Meng Yu 0003, Yuzhong Wu, Tao Yu, Dong Yu 0001, 
Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression.
ICASSP2022
Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu 0003, Zhenyu Tang 0001, Dinesh Manocha, Dong Yu 0001, 
FAST-RIR: Fast Neural Diffuse Room Impulse Response Generator.
ICASSP2022
Brian Yan, Chunlei Zhang, Meng Yu 0003, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe 0001, Dong Yu 0001, 
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.
ICASSP2022
Chunlei Zhang, Jiatong Shi, Chao Weng, Meng Yu 0003, Dong Yu 0001, 
Towards end-to-end Speaker Diarization with Generalized Neural Speaker Clustering.
Interspeech2022
Vinay Kothapally, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Dong Yu 0001, 
Joint Neural AEC and Beamforming with Double-Talk Detection.
TASLP2021
Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu 0004, Meng Yu 0003, Dong Yu 0001, Jesper Jensen 0001, 
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.
TASLP2021
Zhuohuang Zhang, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Lianwu Chen, Donald S. Williamson, Dong Yu 0001, 
Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation.
ICASSP2021
Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe 0001, Meng Yu 0003, Dong Yu 0001, 
Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation.
ICASSP2021
Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe 0001, Meng Yu 0003, Yong Xu 0004, Shi-Xiong Zhang, Dong Yu 0001, 
Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization.
ICASSP2021
Wei Xia, Chunlei Zhang, Chao Weng, Meng Yu 0003, Dong Yu 0001, 
Self-Supervised Text-Independent Speaker Verification Using Prototypical Momentum Contrastive Learning.
ICASSP2021
Chunlei Zhang, Meng Yu 0003, Chao Weng, Dong Yu 0001, 
Towards Robust Speaker Verification with Target Speaker Enhancement.
ICASSP2021
Naijun Zheng, Na Li 0012, Bo Wu, Meng Yu 0003, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
A Joint Training Framework of Multi-Look Separator and Speaker Embedding Extractor for Overlapped Speech.
Interspeech2021
Xiyun Li, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Jiaming Xu 0001, Bo Xu 0002, Dong Yu 0001, 
MIMO Self-Attentive RNN Beamformer for Multi-Speaker Speech Separation.
Interspeech2021
Helin Wang, Bo Wu, Lianwu Chen, Meng Yu 0003, Jianwei Yu, Yong Xu 0004, Shi-Xiong Zhang, Chao Weng, Dan Su 0002, Dong Yu 0001, 
TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.
Interspeech2021
Yong Xu 0004, Zhuohuang Zhang, Meng Yu 0003, Shi-Xiong Zhang, Dong Yu 0001, 
Generalized Spatio-Temporal RNN Beamformer for Target Speech Separation.
Interspeech2021
Meng Yu 0003, Chunlei Zhang, Yong Xu 0004, Shi-Xiong Zhang, Dong Yu 0001, 
MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment.
ICASSP2020
Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu 0004, Meng Yu 0003, Dan Su 0002, Yuexian Zou, Dong Yu 0001, 
Enhancing End-to-End Multi-Channel Speech Separation Via Spatial Feature Learning.
ICASSP2020
Xuan Ji, Meng Yu 0003, Chunlei Zhang, Dan Su 0002, Tao Yu, Xiaoyu Liu, Dong Yu 0001, 
Speaker-Aware Target Speaker Enhancement by Jointly Learning with Speaker Embedding Extraction.
ICASSP2023
Takashi Fukuda, Samuel Thomas 0001, 
Effective Training of RNN Transducer Models on Diverse Sources of Speech and Text Data.
ICASSP2023
Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas 0001, Rogério Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James R. Glass, 
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval.
ICASSP2023
Vishal Sunder, Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Brian Kingsbury, Eric Fosler-Lussier, 
Fine-Grained Textual Knowledge Transfer to Improve RNN Transducers for Speech Recognition and Understanding.
ICASSP2023
Samuel Thomas 0001, Hong-Kwang Jeff Kuo, George Saon, Brian Kingsbury, 
Multi-Speaker Data Augmentation for Improved end-to-end Automatic Speech Recognition.
Interspeech2023
Andrew Rouditchenko, Sameer Khurana, Samuel Thomas 0001, Rogério Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James R. Glass, 
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages.
Interspeech2023
Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Brian Kingsbury, 
ConvKT: Conversation-Level Knowledge Transfer for Context Aware End-to-End Spoken Language Understanding.
ICASSP2022
Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas 0001, Boaz Carmeli, Ron Hoory, Brian Kingsbury, 
A New Data Augmentation Method for Intent Classification Enhancement and its Application on Spoken Conversation Datasets.
ICASSP2022
Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas 0001, Brian Kingsbury, George Saon, 
Improving End-to-end Models for Set Prediction in Spoken Language Understanding.
ICASSP2022
Vishal Sunder, Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Jatin Ganhotra, Brian Kingsbury, Eric Fosler-Lussier, 
Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding.
ICASSP2022
Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Brian Kingsbury, George Saon, 
Towards Reducing the Need for Speech Training Data to Build Spoken Language Understanding Systems.
ICASSP2022
Samuel Thomas 0001, Brian Kingsbury, George Saon, Hong-Kwang Jeff Kuo, 
Integrating Text Inputs for Training and Adapting RNN Transducer ASR Models.
Interspeech2022
Takashi Fukuda, Samuel Thomas 0001, Masayuki Suzuki, Gakuto Kurata, George Saon, Brian Kingsbury, 
Global RNN Transducer Models For Multi-dialect Speech Recognition.
Interspeech2022
Zvi Kons, Hagai Aronowitz, Edmilson da Silva Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas 0001, George Saon, 
Extending RNN-T-based speech recognition systems with emotion and language classification.
Interspeech2022
Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas 0001, Hong-Kwang Kuo, Brian Kingsbury, 
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems.
TASLP2021
Leda Sari, Mark Hasegawa-Johnson, Samuel Thomas 0001, 
Auxiliary Networks for Joint Speaker Adaptation and Speaker Change Detection.
ICASSP2021
Samuel Thomas 0001, Hong-Kwang Jeff Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, 
RNN Transducer Models for Spoken Language Understanding.
ICASSP2021
Edmilson da Silva Morais, Hong-Kwang Jeff Kuo, Samuel Thomas 0001, Zoltán Tüske, Brian Kingsbury, 
End-to-End Spoken Language Understanding Using Transformer Networks and Self-Supervised Pre-Trained Features.
Interspeech2021
Sujeong Cha, Wangrui Hou, Hyun Jung, My Phung, Michael Picheny, Hong-Kwang Jeff Kuo, Samuel Thomas 0001, Edmilson da Silva Morais, 
Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs.
Interspeech2021
Takashi Fukuda, Samuel Thomas 0001, 
Knowledge Distillation Based Training of Universal ASR Source Models for Cross-Lingual Transfer.
Interspeech2021
Jatin Ganhotra, Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury, 
Integrating Dialog History into End-to-End Spoken Language Understanding Systems.
TASLP2024
Jaesung Huh, Joon Son Chung, Arsha Nagrani, Andrew Brown 0006, Jee-weon Jung, Daniel Garcia-Romero, Andrew Zisserman, 
The VoxCeleb Speaker Recognition Challenge: A Retrospective.
ICASSP2024
Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-Weon Jung, Yichen Lu, Soumi Maiti, Roshan S. Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe 0001, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang, 
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.
ICASSP2024
Kwanghee Choi, Jee-Weon Jung, Shinji Watanabe 0001, 
Understanding Probe Behaviors Through Variational Bounds of Mutual Information.
ICASSP2024
Samuele Cornell, Jee-Weon Jung, Shinji Watanabe 0001, Stefano Squartini, 
One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition.
ICASSP2024
Jee-Weon Jung, Roshan S. Sharma, William Chen, Bhiksha Raj, Shinji Watanabe 0001, 
AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models.
ICASSP2024
Doyeop Kwak, Jaemin Jung, Kihyun Nam, Youngjoon Jang, Jee-Weon Jung, Shinji Watanabe 0001, Joon Son Chung, 
VoxMM: Rich Transcription of Conversations in the Wild.
ICASSP2024
Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-Weon Jung, Xuankai Chang, Shinji Watanabe 0001, 
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks.
ICASSP2024
Wangyou Zhang, Jee-weon Jung, Yanmin Qian, 
Improving Design of Input Condition Invariant Speech Enhancement.
NAACL2024
Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan S. Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe 0001, 
UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions.
ACL-Findings2024
Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan S. Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe 0001, 
On the Evaluation of Speech Foundation Models for Spoken Language Understanding.
ICASSP2023
Hee-Soo Heo, Youngki Kwon, Bong-Jin Lee, You Jin Kim, Jee-Weon Jung, 
High-Resolution Embedding Extractor for Speaker Diarisation.
ICASSP2023
Jee-Weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown 0006, Youngki Kwon, Shinji Watanabe 0001, Joon Son Chung, 
In Search of Strong Embedding Extractors for Speaker Diarisation.
ICASSP2023
You Jin Kim, Hee-Soo Heo, Jee-Weon Jung, Youngki Kwon, Bong-Jin Lee, Joon Son Chung, 
Advancing the Dimensionality Reduction of Speaker Embeddings for Speaker Diarisation: Disentangling Noise and Informing Speech Activity.
ICASSP2023
Youngki Kwon, Hee-Soo Heo, Bong-Jin Lee, You Jin Kim, Jee-Weon Jung, 
Absolute Decision Corrupts Absolutely: Conservative Online Speaker Diarisation.
Interspeech2023
Hee-Soo Heo, Jee-weon Jung, Jingu Kang, Youngki Kwon, Bong-Jin Lee, You Jin Kim, Joon Son Chung, 
Curriculum Learning for Self-supervised Speaker Verification.
Interspeech2023
Jee-weon Jung, Soonshin Seo, Hee-Soo Heo, Geonmin Kim, You Jin Kim, Youngki Kwon, Minjae Lee, Bong-Jin Lee, 
Encoder-decoder Multimodal Speaker Change Detection.
Interspeech2023
Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang 0037, Xuechen Liu, Md. Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas W. D. Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung, 
Towards Single Integrated Spoofing-aware Speaker Verification Embeddings.
Interspeech2023
Kihyun Nam, Youkyum Kim, Jaesung Huh, Hee-Soo Heo, Jee-weon Jung, Joon Son Chung, 
Disentangled Representation Learning for Multilingual Speaker Recognition.
Interspeech2023
Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen, 
Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio Anti-spoofing.
ICASSP2022
Jee-weon Jung, Hee-Soo Heo, Hemlata Tak, Hye-jin Shim, Joon Son Chung, Bong-Jin Lee, Ha-Jin Yu, Nicholas W. D. Evans, 
AASIST: Audio Anti-Spoofing Using Integrated Spectro-Temporal Graph Attention Networks.
TASLP2024
Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li 0001, Abdelrahman Mohamed, Shinji Watanabe 0001, Hung-yi Lee, 
A Large-Scale Evaluation of Speech Foundation Models.
ICASSP2024
Cheol Jun Cho, Abdelrahman Mohamed, Shang-Wen Li 0001, Alan W. Black, Gopala Krishna Anumanchipalli, 
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT.
ICASSP2024
Cheol Jun Cho, Abdelrahman Mohamed, Alan W. Black, Gopala Krishna Anumanchipalli, 
Self-Supervised Models of Speech Infer Universal Articulatory Kinematics.
ICASSP2024
Chyi-Jiunn Lin, Guan-Ting Lin, Yung-Sung Chuang, Wei-Lun Wu, Shang-Wen Li 0001, Abdelrahman Mohamed, Hung-yi Lee, Lin-Shan Lee, 
SpeechDPR: End-To-End Spoken Passage Retrieval For Open-Domain Spoken Question Answering.
ICASSP2024
Yuan Tseng, Layne Berry, Yiting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Poyao Huang 0001, Chun-Mao Lai, Shang-Wen Li 0001, David Harwath, Yu Tsao 0001, Abdelrahman Mohamed, Chi-Luen Feng, Hung-yi Lee, 
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.
ACL2024
Puyuan Peng, Po-Yao Huang 0001, Shang-Wen Li 0001, Abdelrahman Mohamed, David Harwath, 
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild.
TASLP2023
Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe 0001, Florian Metze, Luke Zettlemoyer, Abdelrahman Mohamed, 
LegoNN: Building Modular Encoder-Decoder Models.
ICASSP2023
Cheol Jun Cho, Peter Wu, Abdelrahman Mohamed, Gopala Krishna Anumanchipalli, 
Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech.
ICASSP2023
Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed, 
Continual Learning for On-Device Speech Recognition Using Disentangled Conformers.
ICASSP2023
Ali Elkahky, Wei-Ning Hsu, Paden Tomasello, Tu Anh Nguyen, Robin Algayres, Yossi Adi, Jade Copet, Emmanuel Dupoux, Abdelrahman Mohamed, 
Do Coarser Units Benefit Cluster Prediction-Based Speech Pre-Training?
ICASSP2023
Andros Tjandra, Nayan Singhal, David Zhang, Ozlem Kalinli, Abdelrahman Mohamed, Duc Le, Michael L. Seltzer, 
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities.
Interspeech2023
Florian L. Kreyssig, Yangyang Shi, Jinxi Guo, Leda Sari, Abdelrahman Mohamed, Philip C. Woodland, 
Biased Self-supervised Learning for ASR.
Interspeech2023
Puyuan Peng, Shang-Wen Li 0001, Okko Räsänen, Abdelrahman Mohamed, David Harwath, 
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model.
Interspeech2023
Jiatong Shi, Dan Berrebbi, William Chen, En-Pei Hu, Wei-Ping Huang, Ho-Lam Chung, Xuankai Chang, Shang-Wen Li 0001, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe 0001, 
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark.
Interspeech2022
Guan-Ting Lin, Yung-Sung Chuang, Ho-Lam Chung, Shu-Wen Yang, Hsuan-Jui Chen, Shuyan Annie Dong, Shang-Wen Li 0001, Abdelrahman Mohamed, Hung-yi Lee, Lin-Shan Lee, 
DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering.
Interspeech2022
Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed, 
Robust Self-Supervised Audio-Visual Speech Recognition.
Interspeech2022
Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu, 
Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT.
Interspeech2022
Weiyi Zheng, Alex Xiao, Gil Keren, Duc Le, Frank Zhang 0001, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Abdelrahman Mohamed, 
Scaling ASR Improves Zero and Few Shot Learning.
ICLR2022
Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed, 
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction.
ACL2022
Eugene Kharitonov, Ann Lee 0001, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu, 
Text-Free Prosody-Aware Generative Spoken Language Modeling.
TASLP2024
Xun Gong 0005, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, Rui Zhao 0017, Xie Chen 0001, Yanmin Qian, 
Advanced Long-Content Speech Recognition With Factorized Neural Transducer.
ICASSP2024
Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen 0001, Kai Yu 0004, 
VoiceFlow: Efficient Text-To-Speech with Rectified Flow Matching.
ICASSP2024
Junjie Li, Yiwei Guo, Xie Chen 0001, Kai Yu 0004, 
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention.
ICASSP2024
Sen Liu, Yiwei Guo, Xie Chen 0001, Kai Yu 0004, 
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations.
ICASSP2024
Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen 0003, Shiliang Zhang, Xie Chen 0001, 
Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition.
ICASSP2024
Feiyu Shen, Yiwei Guo, Chenpeng Du, Xie Chen 0001, Kai Yu 0004, 
Acoustic BPE for Speech Generation with Discrete Tokens.
ICASSP2024
Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu 0004, Daniel Povey, Xie Chen 0001, 
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS.
AAAI2024
Chenpeng Du, Yiwei Guo, Feiyu Shen, Zhijun Liu, Zheng Liang, Xie Chen 0001, Shuai Wang 0016, Hui Zhang, Kai Yu 0004, 
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding.
ACL-Findings2024
Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, Jinchao Li, Zhifu Gao, Shiliang Zhang, Xie Chen 0001, 
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation.
TASLP2023
Chenpeng Du, Yiwei Guo, Xie Chen 0001, Kai Yu 0004, 
Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature.
ICASSP2023
Xie Chen 0001, Ziyang Ma, Changli Tang, Yujin Wang, Zhisheng Zheng, 
Front-End Adapter: Adapting Front-End Input of Speech Based Self-Supervised Learning for Speech Recognition.
ICASSP2023
Xun Gong 0005, Yu Wu 0012, Jinyu Li 0001, Shujie Liu 0001, Rui Zhao 0017, Xie Chen 0001, Yanmin Qian, 
LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer.
ICASSP2023
Xun Gong 0005, Wei Wang 0010, Hang Shao, Xie Chen 0001, Yanmin Qian, 
Factorized AED: Factorized Attention-Based Encoder-Decoder for Text-Only Domain Adaptive ASR.
ICASSP2023
Yiwei Guo, Chenpeng Du, Xie Chen 0001, Kai Yu 0004, 
EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance.
ICASSP2023
Tianrui Wang, Xie Chen 0001, Zhuo Chen, Shu Yu, Weibin Zhu, 
An Adapter Based Multi-Label Pre-Training for Speech Separation and Enhancement.
Interspeech2023
Mingyu Cui, Jiawen Kang 0002, Jiajun Deng, Xi Yin 0010, Yutao Xie, Xie Chen 0001, Xunying Liu, 
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems.
Interspeech2023
Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu 0004, Xie Chen 0001, 
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation.
Interspeech2023
Sen Liu, Yiwei Guo, Chenpeng Du, Xie Chen 0001, Kai Yu 0004, 
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech.
Interspeech2023
Ziyang Ma, Zhisheng Zheng, Changli Tang, Yujin Wang, Xie Chen 0001, 
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets.
Interspeech2023
Ziyang Ma, Zhisheng Zheng, Guanrou Yang, Yu Wang 0027, Chao Zhang 0031, Xie Chen 0001, 
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation.
ICASSP2024
Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li 0028, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal, 
USM-Lite: Quantization and Sparsity Aware Fine-Tuning for Speech Recognition with Universal Speech Models.
ICASSP2024
Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno 0001, 
Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models.
NAACL2024
Weiran Wang, Rohit Prabhavalkar, Haozhe Shan, Zhong Meng, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li 0028, James Qin, Xingyu Cai, Adam Stooke, Chengjian Zheng, Yanzhang He, Tara N. Sainath, Pedro Moreno Mengibar, 
Massive End-to-end Speech Recognition Models with Time Reduction.
ICASSP2023
Steven M. Hernandez, Ding Zhao, Shaojin Ding, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He, Ian McGraw, 
Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models.
ICASSP2023
W. Ronny Huang, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, David Rybach, Robert David, Rohit Prabhavalkar, Cyril Allauzen, Cal Peyser, Trevor D. Strohman, 
E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model.
ICASSP2023
Tom O'Malley, Shaojin Ding, Arun Narayanan, Quan Wang, Rajeev Rikhye, Qiao Liang 0001, Yanzhang He, Ian McGraw, 
Conditional Conformer: Improving Speaker Modulation For Single And Multi-User Speech Enhancement.
ICASSP2023
Weiran Wang, Ding Zhao, Shaojin Ding, Hao Zhang 0010, Shuo-Yiin Chang, David Rybach, Tara N. Sainath, Yanzhang He, Ian McGraw, Shankar Kumar, 
Multi-Output RNN-T Joint Networks for Multi-Task Learning of ASR and Auxiliary Tasks.
Interspeech2023
Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding, David Qiu, Jian Li, David Rim, Yanzhang He, 
2-bit Conformer quantization for automatic speech recognition.
ICASSP2022
Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He, 
Large-Scale ASR Domain Adaptation Using Self- and Semi-Supervised Learning.
ICASSP2022
Qiujia Li, Yu Zhang 0033, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland, 
Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition.
ICASSP2022
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang 0001, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.
Interspeech2022
Shuo-Yiin Chang, Bo Li 0028, Tara N. Sainath, Chao Zhang 0031, Trevor Strohman, Qiao Liang 0001, Yanzhang He, 
Turn-Taking Prediction for Natural Conversational Speech.
Interspeech2022
Shaojin Ding, Phoenix Meadowlark, Yanzhang He, Lukasz Lew, Shivani Agrawal, Oleg Rybakov, 
4-bit Conformer with Native Quantization Aware Training for Speech Recognition.
Interspeech2022
Shaojin Ding, Rajeev Rikhye, Qiao Liang 0001, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw, 
Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition.
Interspeech2022
Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang 0016, Rina Panigrahy, Qiao Liang 0001, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman, 
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes.
Interspeech2022
Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang, 
Improving Deliberation by Text-Only and Semi-Supervised Training.
Interspeech2022
Bo Li 0028, Tara N. Sainath, Ruoming Pang, Shuo-Yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang 0001, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani, 
A Language Agnostic Multilingual Streaming On-Device ASR System.
Interspeech2022
Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach, 
Improving Rare Word Recognition with LM-aware MWER Training.
ICASSP2021
Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang 0001, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.
ICASSP2021
Qiujia Li, David Qiu, Yu Zhang 0033, Bo Li 0028, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman, 
Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition.
Interspeech2023
Zhaoqing Li, Tianzi Wang, Jiajun Deng, Junhao Xu, Shoukang Hu, Xunying Liu, 
Lossless 4-bit Quantization of Architecture Compressed Conformer ASR Systems on the 300-hr Switchboard Corpus.
Interspeech2023
Tianzi Wang, Shoukang Hu, Jiajun Deng, Zengrui Jin, Mengzhe Geng, Yi Wang, Helen Meng, Xunying Liu, 
Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition.
TASLP2022
Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
TASLP2022
Boyang Xue, Shoukang Hu, Junhao Xu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Neural Network Language Modeling for Speech Recognition.
ICASSP2022
Shujie Hu, Shansong Liu, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shoukang Hu, Mingyu Cui, Xunying Liu, Helen Meng, 
Exploiting Cross Domain Acoustic-to-Articulatory Inverted Features for Disordered Speech Recognition.
ICASSP2022
Xixin Wu, Shoukang Hu, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Neural Architecture Search for Speech Emotion Recognition.
Interspeech2022
Mingyu Cui, Jiajun Deng, Shoukang Hu, Xurong Xie, Tianzi Wang, Shujie Hu, Mengzhe Geng, Boyang Xue, Xunying Liu, Helen Meng, 
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems.
Interspeech2022
Tianzi Wang, Jiajun Deng, Mengzhe Geng, Zi Ye 0001, Shoukang Hu, Yi Wang, Mingyu Cui, Zengrui Jin, Xunying Liu, Helen Meng, 
Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection.
Interspeech2022
Yi Wang, Tianzi Wang, Zi Ye 0001, Lingwei Meng, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng, 
Exploring linguistic feature and model combination for speech recognition based automatic AD detection.
Interspeech2022
Junhao Xu, Shoukang Hu, Xunying Liu, Helen Meng, 
Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus.
TASLP2021
Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye 0001, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition.
TASLP2021
Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng, 
Recent Progress in the CUHK Dysarthric Speech Recognition System.
TASLP2021
Junhao Xu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng, 
Mixed Precision Low-Bit Quantization of Neural Network Language Models for Speech Recognition.
TASLP2021
Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech.
ICASSP2021
Shoukang Hu, Xurong Xie, Shansong Liu, Mingyu Cui, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
ICASSP2021
Junhao Xu, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng, 
Mixed Precision Quantization of Transformer Language Models for Speech Recognition.
ICASSP2021
Boyang Xue, Jianwei Yu, Junhao Xu, Shansong Liu, Shoukang Hu, Zi Ye 0001, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Transformer Language Models for Speech Recognition.
ICASSP2021
Zi Ye 0001, Shoukang Hu, Jinchao Li, Xurong Xie, Mengzhe Geng, Jianwei Yu, Junhao Xu, Boyang Xue, Shansong Liu, Xunying Liu, Helen Meng, 
Development of the CUHK Elderly Speech Recognition System for Neurocognitive Disorder Detection Using the DementiaBank Corpus.
Interspeech2021
Jiajun Deng, Fabian Ritter Gutierrez, Shoukang Hu, Mengzhe Geng, Xurong Xie, Zi Ye 0001, Shansong Liu, Jianwei Yu, Xunying Liu, Helen Meng, 
Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition.
Interspeech2021
Mengzhe Geng, Shansong Liu, Jianwei Yu, Xurong Xie, Shoukang Hu, Zi Ye 0001, Zengrui Jin, Xunying Liu, Helen Meng, 
Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition.
TASLP2024
Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe 0001, 
End-to-End Speech Recognition: A Survey.
ICASSP2024
Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li 0028, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal, 
USM-Lite: Quantization and Sparsity Aware Fine-Tuning for Speech Recognition with Universal Speech Models.
ICASSP2024
Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno 0001, 
Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models.
NAACL2024
Weiran Wang, Rohit Prabhavalkar, Haozhe Shan, Zhong Meng, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li 0028, James Qin, Xingyu Cai, Adam Stooke, Chengjian Zheng, Yanzhang He, Tara N. Sainath, Pedro Moreno Mengibar, 
Massive End-to-end Speech Recognition Models with Time Reduction.
ICASSP2023
Rami Botros, Rohit Prabhavalkar, Johan Schalkwyk, Ciprian Chelba, Tara N. Sainath, Françoise Beaufays, 
Lego-Features: Exporting Modular Encoder Features for Streaming and Deliberation ASR.
ICASSP2023
Steven M. Hernandez, Ding Zhao, Shaojin Ding, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He, Ian McGraw, 
Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models.
ICASSP2023
W. Ronny Huang, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, David Rybach, Robert David, Rohit Prabhavalkar, Cyril Allauzen, Cal Peyser, Trevor D. Strohman, 
E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model.
ICASSP2023
Soheil Khorram, Anshuman Tripathi, Jaeyoung Kim, Han Lu, Qian Zhang, Rohit Prabhavalkar, Hasim Sak, 
Cross-Training: A Semi-Supervised Training Scheme for Speech Recognition.
ICASSP2023
Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang 0033, Bo Li 0028, Andrew Rosenberg, Bhuvana Ramabhadran, 
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition.
ICASSP2023
Cal Peyser, Michael Picheny, Kyunghyun Cho, Rohit Prabhavalkar, W. Ronny Huang, Tara N. Sainath, 
A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale.
ICASSP2023
Tara N. Sainath, Rohit Prabhavalkar, Diamantino Caseiro, Pat Rondon, Cyril Allauzen, 
Improving Contextual Biasing with Text Injection.
ICASSP2023
Chao-Han Huck Yang, Bo Li 0028, Yu Zhang 0033, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman, 
From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition.
Interspeech2023
Zih-Ching Chen, Chao-Han Huck Yang, Bo Li 0028, Yu Zhang 0033, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath, 
How to Estimate Model Transferability of Pre-Trained Speech Models?
Interspeech2023
Cal Peyser, Zhong Meng, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho, Ke Hu, 
Improving Joint Speech-Text Representations Without Alignment.
ICASSP2022
Antoine Bruguier, Duc Le, Rohit Prabhavalkar, Dangna Li, Zhe Liu 0011, Bo Wang, Eun Chang, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer, 
Neural-FST Class Language Model for End-to-End Speech Recognition.
ICASSP2022
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang 0001, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.
Interspeech2022
Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang 0016, Rina Panigrahy, Qiao Liang 0001, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman, 
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes.
Interspeech2022
Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang, 
Improving Deliberation by Text-Only and Semi-Supervised Training.
Interspeech2022
W. Ronny Huang, Shuo-Yiin Chang, David Rybach, Tara N. Sainath, Rohit Prabhavalkar, Cal Peyser, Zhiyun Lu, Cyril Allauzen, 
E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR.
Interspeech2022
Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach, 
Improving Rare Word Recognition with LM-aware MWER Training.
ICASSP2024
Jian Wu 0027, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao 0017, Zhuo Chen 0006, Jinyu Li 0001, 
T-SOT FNT: Streaming Multi-Talker ASR with Text-Only Domain Adaptation Capability.
ICASSP2023
Zhuo Chen 0006, Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiaofei Wang 0009, Takuya Yoshioka, Jinyu Li 0001, Sunit Sivasankaran, Sefik Emre Eskimez, 
Speech Separation with Large-Scale Self-Supervised Learning.
ICASSP2023
Zili Huang, Zhuo Chen 0006, Naoyuki Kanda, Jian Wu 0027, Yiming Wang, Jinyu Li 0001, Takuya Yoshioka, Xiaofei Wang 0009, Peidong Wang, 
Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition.
ICASSP2023
Naoyuki Kanda, Jian Wu 0027, Xiaofei Wang 0009, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.
ICASSP2023
Mufan Sang, Yong Zhao 0008, Gang Liu 0001, John H. L. Hansen, Jian Wu 0027, 
Improving Transformer-Based Networks with Locality for Automatic Speaker Verification.
ICASSP2023
Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu 0027, 
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-To-End Neural Diarization.
ICASSP2023
Jian Wu 0027, Zhuo Chen 0006, Min Hu, Xiong Xiao, Jinyu Li 0001, 
Speaker Change Detection For Transformer Transducer ASR.
ICASSP2023
Muqiao Yang, Naoyuki Kanda, Xiaofei Wang 0009, Jian Wu 0027, Sunit Sivasankaran, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Simulating Realistic Speech Overlaps Improves Multi-Talker ASR.
ICASSP2022
Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Zhengyang Chen, Zhuo Chen 0006, Shujie Liu 0001, Jian Wu 0027, Yao Qian, Furu Wei, Jinyu Li 0001, Xiangzhan Yu, 
UniSpeech-SAT: Universal Speech Representation Learning With Speaker Aware Pre-Training.
ICASSP2022
Yixuan Zhang 0005, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li 0001, 
Continuous Speech Separation with Recurrent Selective Attention Network.
Interspeech2022
Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Shujie Liu 0001, Zhuo Chen 0006, Peidong Wang, Gang Liu 0001, Jinyu Li 0001, Jian Wu 0027, Xiangzhan Yu, Furu Wei, 
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
ICASSP2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Jinyu Li 0001, Takuya Yoshioka, Chengyi Wang 0002, Shujie Liu 0001, Ming Zhou 0001, 
Continuous Speech Separation with Conformer.
ICASSP2021
Amit Das 0007, Kshitiz Kumar, Jian Wu 0027, 
Multi-Dialect Speech Recognition in English Using Attention on Ensemble of Experts.
ICASSP2021
Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu 0001, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020.
Interspeech2021
Amber Afshan, Kshitiz Kumar, Jian Wu 0027, 
Sequence-Level Confidence Classifier for ASR Utterance Accuracy and Application to Acoustic Models.
Interspeech2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Ultra Fast Speech Separation Model with Teacher Student Learning.
Interspeech2021
Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen 0006, Yanxin Hu, Lei Xie 0001, Jian Wu 0027, Hui Bu, Xin Xu, Jun Du, Jingdong Chen, 
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario.
Interspeech2021
Jian Wu 0027, Zhuo Chen 0006, Sanyuan Chen, Yu Wu 0012, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, 
Investigation of Practical Aspects of Single Channel Speech Separation for ASR.
ICASSP2023
Marvin Borsdorf, Saurav Pahuja, Gabriel Ivucic, Siqi Cai, Haizhou Li 0001, Tanja Schultz, 
Multi-Head Attention and GRU for Improved Match-Mismatch Classification of Speech Stimulus and EEG Response.
ICASSP2023
Kevin Scheck, Tanja Schultz, 
Multi-Speaker Speech Synthesis from Electromyographic Signals by Soft Speech Unit Prediction.
Interspeech2023
Catarina Botelho, Alberto Abad, Tanja Schultz, Isabel Trancoso, 
Towards Reference Speech Characterization for Health Applications.
Interspeech2023
Kevin Scheck, Tanja Schultz, 
STE-GAN: Speech-to-Electromyography Signal Conversion using Generative Adversarial Networks.
SpeechComm2022
Martha Yifiru Tachbelie, Solomon Teferra Abate, Tanja Schultz, 
Multilingual speech recognition for GlobalPhone languages.
ICASSP2022
Ayimnisagul Ablimit, Catarina Botelho, Alberto Abad, Tanja Schultz, Isabel Trancoso, 
Exploring Dementia Detection from Speech: Cross Corpus Analysis.
ICASSP2022
Miguel Angrick, Maarten C. Ottenhoff, Lorenz Diener, Darius Ivucic, Gabriel Ivucic, Sophocles Goulis, Albert J. Colon, G. Louis Wagner, Dean J. Krusienski, Pieter Leonard Kubben, Tanja Schultz, Christian Herff, 
Towards Closed-Loop Speech Synthesis from Stereotactic EEG: A Unit Selection Approach.
ICASSP2022
Marvin Borsdorf, Kevin Scheck, Haizhou Li 0001, Tanja Schultz, 
Experts Versus All-Rounders: Target Language Extraction for Multiple Target Languages.
ICASSP2022
Sreeja Manghat, Sreeram Manghat, Tanja Schultz, 
Hybrid sub-word segmentation for handling long tail in morphologically rich low resource languages.
ICASSP2022
Kun Qian 0003, Tanja Schultz, Björn W. Schuller, 
An Overview of the FIRST ICASSP Special Session on Computer Audition for Healthcare.
Interspeech2022
Ayimnisagul Ablimit, Karen Scholz, Tanja Schultz, 
Deep Learning Approaches for Detecting Alzheimer's Dementia from Conversational Speech of ILSE Study.
Interspeech2022
Marvin Borsdorf, Kevin Scheck, Haizhou Li 0001, Tanja Schultz, 
Blind Language Separation: Disentangling Multilingual Cocktail Party Voices by Language.
Interspeech2022
Catarina Botelho, Tanja Schultz, Alberto Abad, Isabel Trancoso, 
Challenges of using longitudinal and cross-domain corpora on studies of pathological speech.
Interspeech2022
Sreeram Manghat, Sreeja Manghat, Tanja Schultz, 
Normalization of code-switched text for speech synthesis.
ICASSP2021
Solomon Teferra Abate, Martha Yifiru Tachbelie, Tanja Schultz, 
End-to-End Multilingual Automatic Speech Recognition for Less-Resourced Languages: The Case of Four Ethiopian Languages.
Interspeech2021
Marvin Borsdorf, Chenglin Xu, Haizhou Li 0001, Tanja Schultz, 
Universal Speaker Extraction in the Presence and Absence of Target Speakers for Speech of One and Two Talkers.
Interspeech2021
Marvin Borsdorf, Chenglin Xu, Haizhou Li 0001, Tanja Schultz, 
GlobalPhone Mix-To-Separate Out of 2: A Multilingual 2000 Speakers Mixtures Database for Speech Separation.
Interspeech2021
Catarina Botelho, Alberto Abad, Tanja Schultz, Isabel Trancoso, 
Visual Speech for Obstructive Sleep Apnea Detection.
Interspeech2021
Lars Steinert, Felix Putze, Dennis Küster, Tanja Schultz, 
Audio-Visual Recognition of Emotional Engagement of People with Dementia.
ICASSP2020
Solomon Teferra Abate, Martha Yifiru Tachbelie, Tanja Schultz, 
Deep Neural Networks Based Automatic Speech Recognition for Four Ethiopian Languages.
TASLP2023
Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe 0001, Florian Metze, Luke Zettlemoyer, Abdelrahman Mohamed, 
LegoNN: Building Modular Encoder-Decoder Models.
ICASSP2022
Roshan Sharma, Shruti Palaskar, Alan W. Black, Florian Metze, 
End-to-End Speech Summarization Using Restricted Self-Attention.
Interspeech2022
Juncheng Li 0001, Shuhui Qu, Po-Yao Huang 0001, Florian Metze, 
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification.
Interspeech2022
Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe 0001, 
ASR2K: Speech Recognition for Around 2000 Languages without Audio.
EMNLP-Findings2022
Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W. Black, Shinji Watanabe 0001, 
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models.
ICASSP2021
Xinjian Li, Juncheng Li 0001, Jiali Yao, Alan W. Black, Florian Metze, 
Phone Distribution Estimation for Low Resource Languages.
ICASSP2021
Xinjian Li, David R. Mortensen, Florian Metze, Alan W. Black, 
Multilingual Phonetic Dataset for Low Resource Speech Recognition.
Interspeech2021
Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan 0002, Siddharth Dalmia, Florian Metze, Shinji Watanabe 0001, Alan W. Black, 
Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding.
Interspeech2021
Xinjian Li, Juncheng Li 0001, Florian Metze, Alan W. Black, 
Hierarchical Phone Recognition with Compositional Phonetics.
Interspeech2021
Shruti Palaskar, Ruslan Salakhutdinov, Alan W. Black, Florian Metze, 
Multimodal Speech Summarization Through Semantic Concept Learning.
Interspeech2021
Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze, Shinji Watanabe 0001, 
Differentiable Allophone Graphs for Language-Universal Speech Recognition.
TASLP2020
Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller 0001, 
Speech Technology for Unwritten Languages.
ICASSP2020
Xinjian Li, Siddharth Dalmia, Juncheng Li 0001, Matthew Lee 0012, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W. Black, Florian Metze, 
Universal Phone Recognition with a Multilingual Allophone System.
ICASSP2020
Anirudh Mani, Shruti Palaskar, Nimshi Venkat Meripo, Sandeep Konam, Florian Metze, 
ASR Error Correction and Domain Adaptation Using Machine Translation.
ICASSP2020
Tejas Srinivasan, Ramon Sanabria, Florian Metze, 
Looking Enhances Listening: Recovering Missing Speech Using Images.
Interspeech2020
Mahaveer Jain, Gil Keren, Jay Mahadeokar, Geoffrey Zweig, Florian Metze, Yatharth Saraf, 
Contextual RNN-T for Open Domain ASR.
Interspeech2020
Zimeng Qiu, Yiyuan Li, Xinjian Li, Florian Metze, William M. Campbell, 
Towards Context-Aware End-to-End Code-Switching Speech Recognition.
EMNLP-Findings2020
Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott, 
Fine-Grained Grounding for Multimodal Speech Recognition.
SpeechComm2019
Okko Räsänen, Shreyas Seshadri, Julien Karadayi, Eric Riebling, John P. Bunce, Alejandrina Cristià, Florian Metze, Marisa Casillas, Celia Rosemberg, Elika Bergelson, Melanie Soderstrom, 
Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech.
ICASSP2019
Yun Wang 0005, Florian Metze, 
Connectionist Temporal Localization for Sound Event Detection with Sequential Labeling.
SpeechComm2024
Shuai Wang 0016, Zhengyang Chen, Bing Han, Hongji Wang, Chengdong Liang, Binbin Zhang, Xu Xiang, Wen Ding, Johan Rohdin, Anna Silnova, Yanmin Qian, Haizhou Li 0001, 
Advancing speaker embedding learning: Wespeaker toolkit for research and production.
TASLP2024
Zhengyang Chen, Bing Han, Shuai Wang 0016, Yanmin Qian, 
Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer.
TASLP2024
Wupeng Wang, Zexu Pan, Xinke Li, Shuai Wang 0016, Haizhou Li 0001, 
Speech Separation With Pretrained Frontend to Minimize Domain Mismatch.
ICASSP2024
Wen Huang 0004, Bing Han, Shuai Wang 0016, Zhengyang Chen, Yanmin Qian, 
Robust Cross-Domain Speaker Verification with Multi-Level Domain Adapters.
ICASSP2024
Sho Inoue, Kun Zhou 0003, Shuai Wang 0016, Haizhou Li 0001, 
Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis.
ICASSP2024
Junjie Li, Ruijie Tao, Zexu Pan, Meng Ge, Shuai Wang 0016, Haizhou Li 0001, 
Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-Talker Speech.
ICASSP2024
Shuai Wang 0016, Qibing Bai, Qi Liu 0018, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li 0001, 
Leveraging in-the-wild Data for Effective Self-supervised Pretraining in Speaker Recognition.
AAAI2024
Chenpeng Du, Yiwei Guo, Feiyu Shen, Zhijun Liu, Zheng Liang, Xie Chen 0001, Shuai Wang 0016, Hui Zhang, Kai Yu 0004, 
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding.
ICASSP2023
Hongji Wang, Chengdong Liang, Shuai Wang 0016, Zhengyang Chen, Binbin Zhang, Xu Xiang, Yanlei Deng, Yanmin Qian, 
Wespeaker: A Research and Production Oriented Speaker Embedding Learning Toolkit.
Interspeech2023
Zhengyang Chen, Bing Han, Shuai Wang 0016, Yanmin Qian, 
Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor.
ICASSP2022
Bei Liu, Haoyu Wang 0007, Zhengyang Chen, Shuai Wang 0016, Yanmin Qian, 
Self-Knowledge Distillation via Feature Enhancement for Speaker Verification.
Interspeech2022
Bei Liu, Zhengyang Chen, Shuai Wang 0016, Haoyu Wang 0007, Bing Han, Yanmin Qian, 
DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design.
TASLP2021
Heinrich Dinkel, Shuai Wang 0016, Xuenan Xu, Mengyue Wu, Kai Yu 0004, 
Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training.
TASLP2021
Yanmin Qian, Zhengyang Chen, Shuai Wang 0016, 
Audio-Visual Deep Neural Network for Robust Person Verification.
ICASSP2021
Zhengyang Chen, Shuai Wang 0016, Yanmin Qian, 
Self-Supervised Learning Based Domain Adaptation for Robust Speaker Verification.
ICASSP2021
Chenpeng Du, Bing Han, Shuai Wang 0016, Yanmin Qian, Kai Yu 0004, 
SynAug: Synthesis-Based Data Augmentation for Text-Dependent Speaker Verification.
ICASSP2021
Houjun Huang, Xu Xiang, Fei Zhao, Shuai Wang 0016, Yanmin Qian, 
Unit Selection Synthesis Based Data Augmentation for Fixed Phrase Speaker Verification.
TASLP2020
Shuai Wang 0016, Yexin Yang, Zhanghao Wu, Yanmin Qian, Kai Yu 0004, 
Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition.
ICASSP2020
Shuai Wang 0016, Johan Rohdin, Oldrich Plchot, Lukás Burget, Kai Yu 0004, Jan Cernocký, 
Investigation of SpecAugment for Deep Speaker Embedding Learning.
ICASSP2020
Zhengyang Chen, Shuai Wang 0016, Yanmin Qian, Kai Yu 0004, 
Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial Training.
ICASSP2024
Rehan Ahmad, Muhammad Umar Farooq, Thomas Hain, 
Progressive Unsupervised Domain Adaptation for ASR Using Ensemble Models and Multi-Stage Training.
ICASSP2024
George Close, William Ravenscroft, Thomas Hain, Stefan Goetze, 
Multi-CMGAN+/+: Leveraging Multi-Objective Speech Quality Metric Prediction for Speech Enhancement.
ICASSP2024
Amit Meghanani, Thomas Hain, 
SCORE: Self-Supervised Correspondence Fine-Tuning for Improved Content Representations.
ICASSP2024
Rhiannon Mogridge, George Close, Robert Sutherland, Thomas Hain, Jon Barker, Stefan Goetze, Anton Ragni, 
Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users Using Intermediate ASR Features and Human Memory Models.
ICASSP2024
William Ravenscroft, Stefan Goetze, Thomas Hain, 
Combining Conformer and Dual-Path-Transformer Networks for Single Channel Noisy Reverberant Speech Separation.
ICASSP2023
Rehan Ahmad, Md Asif Jalal, Muhammad Umar Farooq, Anna Ollerenshaw, Thomas Hain, 
Towards Domain Generalisation in ASR with Elitist Sampling and Ensemble Knowledge Distillation.
ICASSP2023
George Close, William Ravenscroft, Thomas Hain, Stefan Goetze, 
Perceive and Predict: Self-Supervised Speech Representation Based Loss Functions for Speech Enhancement.
ICASSP2023
William Ravenscroft, Stefan Goetze, Thomas Hain, 
Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation.
Interspeech2023
Cong-Thanh Do, Rama Doddipatla, Mohan Li, Thomas Hain, 
Domain Adaptive Self-supervised Training of Automatic Speech Recognition.
Interspeech2023
Muhammad Umar Farooq, Thomas Hain, 
Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition.
ICASSP2022
Chanho Park, Rehan Ahmad, Thomas Hain, 
Unsupervised Data Selection for Speech Recognition with Contrastive Loss Ratios.
ICASSP2022
Jose Antonio Lopez Saenz, Thomas Hain, 
A Model for Assessor Bias in Automatic Pronunciation Assessment.
Interspeech2022
George Close, Samuel Hollands, Stefan Goetze, Thomas Hain, 
Non-intrusive Speech Intelligibility Metric Prediction for Hearing Impaired Individuals.
Interspeech2022
Muhammad Umar Farooq, Thomas Hain, 
Investigating the Impact of Crosslingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition.
Interspeech2022
Muhammad Umar Farooq, Darshan Adiga Haniya Narayana, Thomas Hain, 
Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion.
ICASSP2021
Qiang Huang 0008, Thomas Hain, 
Improving Audio Anomalies Recognition Using Temporal Convolutional Attention Networks.
ICASSP2021
Mingjie Chen, Yanpei Shi, Thomas Hain, 
Towards Low-Resource StarGAN Voice Conversion Using Weight Adaptive Instance Normalization.
ICASSP2021
Cong-Thanh Do, Rama Doddipatla, Thomas Hain, 
Multiple-Hypothesis CTC-Based Semi-Supervised Adaptation of End-to-End Speech Recognition.
Interspeech2021
Anna Ollerenshaw, Md. Asif Jalal, Thomas Hain, 
Insights on Neural Representations for End-to-End Speech Recognition.
ICASSP2020
Yanpei Shi, Qiang Huang 0008, Thomas Hain, 
H-Vectors: Utterance-Level Speaker Embedding Using a Hierarchical Attention Model.
ICASSP2024
Siddhant Arora, George Saon, Shinji Watanabe 0001, Brian Kingsbury, 
Semi-Autoregressive Streaming ASR with Label Context.
ICASSP2024
Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Masayasu Muraoka, George Saon, 
Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems.
ICASSP2023
George Saon, Ankit Gupta 0001, Xiaodong Cui, 
Diagonal State Space Augmented Transformers for Speech Recognition.
ICASSP2023
Samuel Thomas 0001, Hong-Kwang Jeff Kuo, George Saon, Brian Kingsbury, 
Multi-Speaker Data Augmentation for Improved end-to-end Automatic Speech Recognition.
Interspeech2023
Xiaodong Cui, George Saon, Brian Kingsbury, 
Improving RNN Transducer Acoustic Models for English Conversational Speech Recognition.
EMNLP2023
Ashish R. Mittal, Sunita Sarawagi, Preethi Jyothi, George Saon, Gakuto Kurata, 
Speech-enriched Memory for Inference-time Adaptation of ASR Models to Word Dictionaries.
ICASSP2022
Thomas Bohnstingl, Ayush Garg 0006, Stanislaw Wozniak, George Saon, Evangelos Eleftheriou, Angeliki Pantazi, 
Speech Recognition Using Biologically-Inspired Neural Networks.
ICASSP2022
Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas 0001, Brian Kingsbury, George Saon, 
Improving End-to-end Models for Set Prediction in Spoken Language Understanding.
ICASSP2022
Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Brian Kingsbury, George Saon, 
Towards Reducing the Need for Speech Training Data to Build Spoken Language Understanding Systems.
ICASSP2022
Samuel Thomas 0001, Brian Kingsbury, George Saon, Hong-Kwang Jeff Kuo, 
Integrating Text Inputs for Training and Adapting RNN Transducer ASR Models.
Interspeech2022
Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata, 
Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing.
Interspeech2022
Andrea Fasoli, Chia-Yu Chen, Mauricio J. Serrano, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Kailash Gopalakrishnan, 
Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization.
Interspeech2022
Takashi Fukuda, Samuel Thomas 0001, Masayuki Suzuki, Gakuto Kurata, George Saon, Brian Kingsbury, 
Global RNN Transducer Models For Multi-dialect Speech Recognition.
Interspeech2022
Zvi Kons, Hagai Aronowitz, Edmilson da Silva Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas 0001, George Saon, 
Extending RNN-T-based speech recognition systems with emotion and language classification.
Interspeech2022
Jiatong Shi, George Saon, David Haws, Shinji Watanabe 0001, Brian Kingsbury, 
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States.
Interspeech2022
Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Nobuyasu Itoh, George Saon, 
Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems.
TASLP2021
Xiaodong Cui, Wei Zhang 0022, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, David S. Kung 0001, 
Asynchronous Decentralized Distributed Training of Acoustic Models.
ICASSP2021
Samuel Thomas 0001, Hong-Kwang Jeff Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, 
RNN Transducer Models for Spoken Language Understanding.
ICASSP2021
George Saon, Zoltán Tüske, Daniel Bolaños, Brian Kingsbury, 
Advancing RNN Transducer Technology for Speech Recognition.
Interspeech2021
Xiaodong Cui, Brian Kingsbury, George Saon, David Haws, Zoltán Tüske, 
Reducing Exposure Bias in Training Recurrent Neural Network Transducers.
TASLP2024
Shujie Hu, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, Guinan Li, Yi Wang, Mingyu Cui, Tianzi Wang, Helen Meng, Xunying Liu, 
Self-Supervised ASR Models and Features for Dysarthric and Elderly Speech Recognition.
ICASSP2024
Jiajun Deng, Xurong Xie, Guinan Li, Mingyu Cui, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Zhaoqing Li, Xunying Liu, 
Towards High-Performance and Low-Latency Feature-Based Speaker Adaptation of Conformer Speech Recognition Systems.
ICASSP2024
Zengrui Jin, Xurong Xie, Tianzi Wang, Mengzhe Geng, Jiajun Deng, Guinan Li, Shujie Hu, Xunying Liu, 
Towards Automatic Data Augmentation for Disordered Speech Recognition.
TASLP2023
Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Guinan Li, Shujie Hu, Xunying Liu, 
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems.
ICASSP2023
Shujie Hu, Xurong Xie, Zengrui Jin, Mengzhe Geng, Yi Wang, Mingyu Cui, Jiajun Deng, Xunying Liu, Helen Meng, 
Exploring Self-Supervised Pre-Trained ASR Models for Dysarthric and Elderly Speech Recognition.
ICASSP2023
Zengrui Jin, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shujie Hu, Jiajun Deng, Guinan Li, Xunying Liu, 
Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition.
ICASSP2023
Xurong Xie, Xunying Liu, Hui Chen 0020, Hongan Wang, 
Unsupervised Model-Based Speaker Adaptation of End-To-End Lattice-Free MMI Model for Speech Recognition.
Interspeech2023
Jiajun Deng, Guinan Li, Xurong Xie, Zengrui Jin, Mingyu Cui, Tianzi Wang, Shujie Hu, Mengzhe Geng, Xunying Liu, 
Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems.
Interspeech2023
Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Jiajun Deng, Mingyu Cui, Guinan Li, Jianwei Yu, Xurong Xie, Xunying Liu, 
Use of Speech Impairment Severity for Dysarthric Speech Recognition.
Interspeech2023
Mengzhe Geng, Xurong Xie, Rongfeng Su, Jianwei Yu, Zengrui Jin, Tianzi Wang, Shujie Hu, Zi Ye 0001, Helen Meng, Xunying Liu, 
On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition.
Interspeech2023
Shujie Hu, Xurong Xie, Mengzhe Geng, Mingyu Cui, Jiajun Deng, Guinan Li, Tianzi Wang, Helen Meng, Xunying Liu, 
Exploiting Cross-Domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition.
TASLP2022
Mengzhe Geng, Xurong Xie, Zi Ye 0001, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen Meng, 
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition.
TASLP2022
Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
ICASSP2022
Shujie Hu, Shansong Liu, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shoukang Hu, Mingyu Cui, Xunying Liu, Helen Meng, 
Exploiting Cross Domain Acoustic-to-Articulatory Inverted Features for Disordered Speech Recognition.
Interspeech2022
Mingyu Cui, Jiajun Deng, Shoukang Hu, Xurong Xie, Tianzi Wang, Shujie Hu, Mengzhe Geng, Boyang Xue, Xunying Liu, Helen Meng, 
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems.
Interspeech2022
Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Mengzhe Geng, Guinan Li, Xunying Liu, Helen Meng, 
Confidence Score Based Conformer Speaker Adaptation for Speech Recognition.
Interspeech2022
Jin Li, Rongfeng Su, Xurong Xie, Lan Wang, Nan Yan, 
A Multi-level Acoustic Feature Extraction Framework for Transformer Based End-to-End Speech Recognition.
TASLP2021
Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye 0001, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition.
TASLP2021
Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng, 
Recent Progress in the CUHK Dysarthric Speech Recognition System.
TASLP2021
Xurong Xie, Xunying Liu, Tan Lee 0001, Lan Wang, 
Bayesian Learning for Deep Neural Network Adaptation.
ICASSP2024
Junwen Bai, Bo Li 0028, Qiujia Li, Tara N. Sainath, Trevor Strohman, 
Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR.
ICASSP2023
Shuo-Yiin Chang, Chao Zhang 0031, Tara N. Sainath, Bo Li 0028, Trevor Strohman, 
Context-Aware end-to-end ASR Using Self-Attentive Embedding and Tensor Fusion.
ICASSP2023
Ke Hu, Tara N. Sainath, Bo Li 0028, Nan Du 0002, Yanping Huang, Andrew M. Dai, Yu Zhang 0033, Rodrigo Cabrera, Zhifeng Chen, Trevor Strohman, 
Massively Multilingual Shallow Fusion with Large Language Models.
ICASSP2023
W. Ronny Huang, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, David Rybach, Robert David, Rohit Prabhavalkar, Cyril Allauzen, Cal Peyser, Trevor D. Strohman, 
E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model.
ICASSP2023
Zhouyuan Huo, Khe Chai Sim, Bo Li 0028, Dongseong Hwang, Tara N. Sainath, Trevor Strohman, 
Resource-Efficient Transfer Learning from Speech Foundation Model Using Hierarchical Feature Fusion.
ICASSP2023
Dongseong Hwang, Khe Chai Sim, Yu Zhang 0033, Trevor Strohman, 
Comparison of Soft and Hard Target RNN-T Distillation for Large-Scale ASR.
ICASSP2023
Bo Li 0028, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang 0033, Wei Han 0002, Trevor Strohman, Françoise Beaufays, 
Efficient Domain Adaptation for Speech Foundation Models.
ICASSP2023
Chao-Han Huck Yang, Bo Li 0028, Yu Zhang 0033, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman, 
From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition.
ICASSP2023
Chao Zhang 0031, Bo Li 0028, Tara N. Sainath, Trevor Strohman, Shuo-Yiin Chang, 
UML: A Universal Monolingual Output Layer For Multilingual ASR.
ICASSP2022
Ke Hu, Tara N. Sainath, Arun Narayanan, Ruoming Pang, Trevor Strohman, 
Transducer-Based Streaming Deliberation for Cascaded Encoders.
ICASSP2022
Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He, 
Large-Scale ASR Domain Adaptation Using Self- and Semi-Supervised Learning.
ICASSP2022
Bo Li 0028, Ruoming Pang, Yu Zhang 0033, Tara N. Sainath, Trevor Strohman, Parisa Haghani, Yun Zhu, Brian Farris, Neeraj Gaur, Manasa Prasad, 
Massively Multilingual ASR: A Lifelong Learning Solution.
ICASSP2022
Tsendsuren Munkhdalai, Khe Chai Sim, Angad Chandorkar, Fan Gao, Mason Chua, Trevor Strohman, Françoise Beaufays, 
Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition.
ICASSP2022
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang 0001, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.
Interspeech2022
Shuo-Yiin Chang, Bo Li 0028, Tara N. Sainath, Chao Zhang 0031, Trevor Strohman, Qiao Liang 0001, Yanzhang He, 
Turn-Taking Prediction for Natural Conversational Speech.
Interspeech2022
Shuo-Yiin Chang, Guru Prakash, Zelin Wu, Tara N. Sainath, Bo Li 0028, Qiao Liang 0001, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman, 
Streaming Intended Query Detection using E2E Modeling for Continued Conversation.
Interspeech2022
Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang 0016, Rina Panigrahy, Qiao Liang 0001, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman, 
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes.
Interspeech2022
Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang, 
Improving Deliberation by Text-Only and Semi-Supervised Training.
Interspeech2022
W. Ronny Huang, Cal Peyser, Tara N. Sainath, Ruoming Pang, Trevor D. Strohman, Shankar Kumar, 
Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition.
Interspeech2022
Zhouyuan Huo, Dongseong Hwang, Khe Chai Sim, Shefali Garg, Ananya Misra, Nikhil Siddhartha, Trevor Strohman, Françoise Beaufays, 
Incremental Layer-Wise Self-Supervised Learning for Efficient Unsupervised Speech Domain Adaptation On Device.
SpeechComm2024
Jinhan Wang, Vijay Ravi, Jonathan Flint, Abeer Alwan, 
Speechformer-CTC: Sequential modeling of depression detection with speech temporal classification.
ICASSP2024
Natarajan Balaji Shankar, Alexander Johnson, Christina Chance, Hariram Veeramani, Abeer Alwan, 
CORAAL QA: A Dataset and Framework for Open Domain Spontaneous Speech Question Answering from Long Audio Files.
TASLP2023
Ruchao Fan, Wei Chu, Peng Chang 0002, Abeer Alwan, 
A CTC Alignment-Based Non-Autoregressive Transformer for End-to-End Automatic Speech Recognition.
ICASSP2023
Alexander Johnson, Vishwas M. Shetty, Mari Ostendorf, Abeer Alwan, 
Leveraging Multiple Sources in Automatic African American English Dialect Detection for Adults and Children.
Interspeech2023
Eray Eren, Lee Ngee Tan, Abeer Alwan, 
FusedF0: Improving DNN-based F0 Estimation by Fusion of Summary-Correlograms and Raw Waveform Representations of Speech Signals.
Interspeech2023
Alexander Johnson, Hariram Veeramani, Natarajan Balaji Shankar, Abeer Alwan, 
An Equitable Framework for Automatically Assessing Children's Oral Narrative Language Abilities.
Interspeech2023
Vishwas M. Shetty, Steven M. Lulich, Abeer Alwan, 
Developmental Articulatory and Acoustic Features for Six to Ten Year Old Children.
Interspeech2023
Jinhan Wang, Vijay Ravi, Abeer Alwan, 
Non-uniform Speaker Disentanglement For Depression Detection From Raw Speech Signals.
SpeechComm2022
Gary Yeung, Ruchao Fan, Abeer Alwan, 
Fundamental frequency feature warping for frequency normalization and data augmentation in child automatic speech recognition.
ICASSP2022
Alexander Johnson, Ruchao Fan, Robin Morris, Abeer Alwan, 
LPC Augment: an LPC-based ASR Data Augmentation Algorithm for Low and Zero-Resource Children's Dialects.
ICASSP2022
Vijay Ravi, Jinhan Wang, Jonathan Flint, Abeer Alwan, 
Fraug: A Frame Rate Based Data Augmentation Method for Depression Detection from Speech Signals.
ICASSP2022
Yunzheng Zhu, Ruchao Fan, Abeer Alwan, 
Towards Better Meta-Initialization with Task Augmentation for Kindergarten-Aged Speech Recognition.
Interspeech2022
Amber Afshan, Abeer Alwan, 
Attention-based conditioning methods using variable frame rate for style-robust speaker verification.
Interspeech2022
Amber Afshan, Abeer Alwan, 
Learning from human perception to improve automatic speaker verification in style-mismatched conditions.
Interspeech2022
Ruchao Fan, Abeer Alwan, 
DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR.
Interspeech2022
Alexander Johnson, Kevin Everson, Vijay Ravi, Anissa Gladney, Mari Ostendorf, Abeer Alwan, 
Automatic Dialect Density Estimation for African American English.
Interspeech2022
Vijay Ravi, Jinhan Wang, Jonathan Flint, Abeer Alwan, 
A Step Towards Preserving Speakers' Identity While Detecting Depression Via Speaker Disentanglement.
Interspeech2022
Jinhan Wang, Vijay Ravi, Jonathan Flint, Abeer Alwan, 
Unsupervised Instance Discriminative Learning for Depression Detection from Speech Signals.
ICASSP2021
Gary Yeung, Ruchao Fan, Abeer Alwan, 
Fundamental Frequency Feature Normalization and Data Augmentation for Child Speech Recognition.
Interspeech2021
Ruchao Fan, Wei Chu, Peng Chang 0002, Jing Xiao 0006, Abeer Alwan, 
An Improved Single Step Non-Autoregressive Transformer for Automatic Speech Recognition.
ICASSP2024
Stefano Bannò, Rao Ma, Mengjie Qian, Kate M. Knill, Mark J. F. Gales, 
Towards End-to-End Spoken Grammatical Error Correction.
NAACL2024
Rao Ma, Adian Liusie, Mark J. F. Gales, Kate M. Knill, 
Investigating the Emergent Audio Classification Ability of ASR Foundation Models.
ICASSP2023
Tian Huey Teh, Vivian Hu, Devang S. Ram Mohan, Zack Hodari, Christopher G. R. Wallis, Tomás Gómez Ibarrondo, Alexandra Torresquintero, James Leoni, Mark J. F. Gales, Simon King 0001, 
Ensemble Prosody Prediction For Expressive Speech Synthesis.
Interspeech2023
Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, Jay Mahadeokar, Chunxi Liu, Yangyang Shi, Ozlem Kalinli, Mike Seltzer, Mark J. F. Gales, 
Multi-Head State Space Model for Speech Recognition.
Interspeech2023
Rao Ma, Mark J. F. Gales, Kate M. Knill, Mengjie Qian, 
N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space.
Interspeech2023
Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill, 
Adapting an Unadaptable ASR System.
Interspeech2023
Diane Nicholls, Kate M. Knill, Mark J. F. Gales, Anton Ragni, Paul Ricketts, 
Speak & Improve: L2 English Speaking Practice Tool.
TASLP2022
Anton Ragni, Mark J. F. Gales, Oliver Rose, Katherine M. Knill, Alexandros Kastanos, Qiujia Li, Preben Ness, 
Increasing Context for Estimating Confidence Scores in Automatic Speech Recognition.
Interspeech2022
Stefano Bannò, Bhanu Balusu, Mark J. F. Gales, Kate M. Knill, Konstantinos Kyriakopoulos, 
View-Specific Assessment of L2 Spoken English.
ICASSP2021
Yassir Fathullah, Mark J. F. Gales, Andrey Malinin, 
Ensemble Distillation Approaches for Grammatical Error Correction.
ICASSP2021
Yiting Lu, Yu Wang 0027, Mark J. F. Gales, 
Efficient Use of End-to-End Data in Spoken Language Processing.
ICASSP2021
Xizi Wei, Mark J. F. Gales, Kate M. Knill, 
Analysing Bias in Spoken Language Assessment Using Concept Activation Vectors.
Interspeech2021
Qingyun Dou, Xixin Wu, Moquan Wan, Yiting Lu, Mark J. F. Gales, 
Deliberation-Based Multi-Pass Speech Synthesis.
ICASSP2020
Alexandros Kastanos, Anton Ragni, Mark J. F. Gales, 
Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks.
Interspeech2020
Qingyun Dou, Joshua Efiong, Mark J. F. Gales, 
Attention Forcing for Speech Synthesis.
Interspeech2020
Kate M. Knill, Linlin Wang, Yu Wang 0027, Xixin Wu, Mark J. F. Gales, 
Non-Native Children's Automatic Speech Recognition: The INTERSPEECH 2020 Shared Task ALTA Systems.
Interspeech2020
Konstantinos Kyriakopoulos, Kate M. Knill, Mark J. F. Gales, 
Automatic Detection of Accent and Lexical Pronunciation Errors in Spontaneous Non-Native English Speech.
Interspeech2020
Yiting Lu, Mark J. F. Gales, Yu Wang 0027, 
Spoken Language 'Grammatical Error Correction'.
Interspeech2020
Potsawee Manakul, Mark J. F. Gales, Linlin Wang, 
Abstractive Spoken Document Summarization Using Hierarchical Model with Multi-Stage Attention Diversity Optimization.
Interspeech2020
Vyas Raina, Mark J. F. Gales, Kate M. Knill, 
Universal Adversarial Attacks on Spoken Language Assessment Systems.
ICASSP2023
Jianan Chen, Sakriani Sakti, 
An Isotropy Analysis for Self-Supervised Acoustic Unit Embeddings on the Zero Resource Speech Challenge 2021 Framework.
ICASSP2023
Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001, 
Self-Adaptive Incremental Machine Speech Chain for Lombard TTS with High-Granularity ASR Feedback in Dynamic Noise Condition.
Interspeech2023
Shun Takahashi, Sakriani Sakti, 
Unsupervised Learning of Discrete Latent Representations with Data-Adaptive Dimensionality from Continuous Speech Streams.
Interspeech2023
Chung Tran, Chi Mai Luong, Sakriani Sakti, 
STEN-TTS: Improving Zero-shot Cross-Lingual Transfer for Multi-Lingual TTS with Style-Enhanced Normalization Diffusion Framework.
EMNLP2023
Ruhiyah Widiaputri, Ayu Purwarianti, Dessi Puji Lestari, Kurniawati Azizah, Dipta Tanaya, Sakriani Sakti, 
Speech Recognition and Meaning Interpretation: Towards Disambiguation of Structurally Ambiguous Spoken Utterances in Indonesian.
TASLP2022
Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001, 
A Machine Speech Chain Approach for Dynamically Adaptive Lombard TTS in Static and Dynamic Noise Environments.
TASLP2022
Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001, 
Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-Resource ASR.
Interspeech2022
Heli Qi, Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001, 
Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing.
TASLP2021
Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001, 
Tackling Perception Bias in Unsupervised Phoneme Discovery Using DPGMM-RNN Hybrid Model and Functional Load.
Interspeech2021
Johanes Effendi, Sakriani Sakti, Satoshi Nakamura 0001, 
Weakly-Supervised Speech-to-Text Mapping with Visually Connected Non-Parallel Speech-Text Data Using Cyclic Partially-Aligned Transformer.
Interspeech2021
Yuka Ko, Katsuhito Sudoh, Sakriani Sakti, Satoshi Nakamura 0001, 
ASR Posterior-Based Loss for Multi-Task End-to-End Speech Translation.
Interspeech2021
Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001, 
Dynamically Adaptive Machine Speech Chain Inference for TTS in Noisy Environment: Listen and Speak Louder.
Interspeech2021
Shun Takahashi, Sakriani Sakti, Satoshi Nakamura 0001, 
Unsupervised Neural-Based Graph Clustering for Variable-Length Speech Representation Discovery of Zero-Resource Languages.
Interspeech2021
Hirotaka Tokuyama, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura 0001, 
Transcribing Paralinguistic Acoustic Cues to Target Language Text in Transformer-Based Speech-to-Text Translation.
TASLP2020
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
Machine Speech Chain.
TASLP2020
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
Corrections to "Machine Speech Chain".
Interspeech2020
Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux, 
The Zero Resource Speech Challenge 2020: Discovering Discrete Subword and Word Units.
Interspeech2020
Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
Augmenting Images for ASR and TTS Through Single-Loop and Dual-Loop Multimodal Chain Framework.
Interspeech2020
Sashi Novitasari, Andros Tjandra, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura 0001, 
Incremental Machine Speech Chain Towards Enabling Listening While Speaking in Real-Time.
Interspeech2020
Ivan Halim Parmonangan, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura 0001, 
Combining Audio and Brain Activity for Predicting Speech Quality.
ICASSP2024
Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang 0029, Hongbo Lan, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao, 
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
SpeechComm2023
Bence Mark Halpern, Siyuan Feng 0001, Rob van Son, Michiel W. M. van den Brekel, Odette Scharenborg, 
Automatic evaluation of spontaneous oral cancer speech using ratings from naive listeners.
ICASSP2023
Hang Chen, Shilong Wu, Yusheng Dai, Zhe Wang, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge.
ICASSP2023
Bo Dekker, Alfred C. Schouten, Odette Scharenborg, 
DAIS: The Delft Database of EEG Recordings of Dutch Articulated and Imagined Speech.
ICASSP2023
Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization And Recognition.
SpeechComm2022
Bence Mark Halpern, Siyuan Feng 0001, Rob van Son, Michiel W. M. van den Brekel, Odette Scharenborg, 
Low-resource automatic speech recognition and error analyses of oral cancer speech.
ICASSP2022
Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines And Results.
ICASSP2022
Wen-Chin Huang, Bence Mark Halpern, Lester Phillip Violeta, Odette Scharenborg, Tomoki Toda, 
Towards Identity Preserving Normal to Dysarthric Voice Conversion.
Interspeech2022
Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee 0001, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Baocai Yin, Jia Pan, 
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.
Interspeech2022
Tanvina Patel, Odette Scharenborg, 
Using cross-model learnings for the Gram Vaani ASR Challenge 2022.
Interspeech2022
Luke Prananta, Bence Mark Halpern, Siyuan Feng 0001, Odette Scharenborg, 
The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition.
Interspeech2022
Yuanyuan Zhang, Yixuan Zhang, Bence Mark Halpern, Tanvina Patel, Odette Scharenborg, 
Mitigating bias against non-native accents.
Interspeech2022
Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee 0001, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Shifu Xiong, Jianqing Gao, 
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.
SpeechComm2021
Polina Drozdova, Roeland van Hout, Sven L. Mattys, Odette Scharenborg, 
The effect of intermittent noise on lexically-guided perceptual learning in native and non-native listening.
TASLP2021
Xinsheng Wang, Justin van der Hout, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg, 
Synthesizing Spoken Descriptions of Images.
TASLP2021
Xinsheng Wang, Tingting Qiao, Jihua Zhu, Alan Hanjalic, Odette Scharenborg, 
Generating Images From Spoken Descriptions.
ICASSP2021
Siyuan Feng 0001, Piotr Zelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak, 
How Phonotactics Affect Multilingual and Zero-Shot ASR Performance.
ICASSP2021
Xinsheng Wang, Siyuan Feng 0001, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg, 
Show and Speak: Directly Synthesize Spoken Description of Images.
ICASSP2021
Liming Wang, Xinsheng Wang, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak, 
Align or attend? Toward More Efficient and Accurate Spoken Word Discovery Using Speech-to-Image Retrieval.
Interspeech2021
Siyuan Feng 0001, Piotr Zelasko, Laureano Moro-Velázquez, Odette Scharenborg, 
Unsupervised Acoustic Unit Discovery by Leveraging a Language-Independent Subword Discriminative Feature Representation.
SpeechComm2024
Detai Xin, Shinnosuke Takamichi, Hiroshi Saruwatari, 
JNV corpus: A corpus of Japanese nonverbal vocalizations with diverse phrases and emotions.
TASLP2024
Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe 0001, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis.
ICASSP2024
Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryotaro Nagase, Takahiro Fukumori, Yoichi Yamashita, 
Environmental Sound Synthesis from Vocal Imitations and Sound Event Labels.
ICASSP2024
Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari, 
Diversity-Based Core-Set Selection for Text-to-Speech with Linguistic and Acoustic Features.
ICASSP2024
Shinnosuke Takamichi, Hiroki Maeda, Joonyong Park, Daisuke Saito, Hiroshi Saruwatari, 
Do Learned Speech Symbols Follow Zipf's Law?
ICASSP2023
Tomohiko Nakamura, Shinnosuke Takamichi, Naoko Tanji, Satoru Fukayama, Hiroshi Saruwatari, 
jaCappella Corpus: A Japanese a Cappella Vocal Ensemble Corpus.
ICASSP2023
Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Detai Xin, Hiroshi Saruwatari, 
Mid-Attribute Speaker Generation Using Optimal-Transport-Based Interpolation of Gaussian Mixture Models.
ICASSP2023
Detai Xin, Sharath Adavanne, Federico Ang, Ashish Kulkarni, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Improving Speech Prosody of Audiobook Text-To-Speech Synthesis with Acoustic and Textual Contexts.
Interspeech2023
Joonyong Park, Shinnosuke Takamichi, Tomohiko Nakamura, Kentaro Seki, Detai Xin, Hiroshi Saruwatari, 
How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics.
Interspeech2023
Yuki Saito, Eiji Iimori, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari, 
CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center.
Interspeech2023
Yuki Saito, Shinnosuke Takamichi, Eiji Iimori, Kentaro Tachibana, Hiroshi Saruwatari, 
ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings.
Interspeech2023
Yota Ueda, Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Hiroshi Saruwatari, 
HumanDiffusion: diffusion model using perceptual gradients.
Interspeech2023
Detai Xin, Shinnosuke Takamichi, Ai Morimatsu, Hiroshi Saruwatari, 
Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus.
IJCAI2023
Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe 0001, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining.
Interspeech2022
Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari, 
Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.
Interspeech2022
Yuto Nishimura, Yuki Saito, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari, 
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History.
Interspeech2022
Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, Hiroshi Saruwatari, 
SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling.
Interspeech2022
Takaaki Saeki, Detai Xin, Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari, 
UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022.
Interspeech2022
Yuki Saito, Yuto Nishimura, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari, 
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent.
Interspeech2022
Shinnosuke Takamichi, Wataru Nakata, Naoko Tanji, Hiroshi Saruwatari, 
J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis.
ICASSP2024
Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang 0029, Hongbo Lan, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao, 
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
ICASSP2024
Hao Yen, Sabato Marco Siniscalchi, Chin-Hui Lee 0001, 
Boosting End-to-End Multilingual Phoneme Recognition Through Exploiting Universal Speech Attributes Constraints.
ICLR2024
Chen Chen 0075, Ruizhe Li 0001, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Engsiong Chng, Chao-Han Huck Yang, 
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition.
ICASSP2023
Hang Chen, Shilong Wu, Yusheng Dai, Zhe Wang, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge.
ICASSP2023
Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization And Recognition.
ICASSP2023
Chao-Han Huck Yang, Bo Li 0028, Yu Zhang 0033, Nanxin Chen, Tara N. Sainath, Sabato Marco Siniscalchi, Chin-Hui Lee 0001, 
A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition.
Interspeech2023
Chun-Wei Ho, Chao-Han Huck Yang, Sabato Marco Siniscalchi, 
Differentially Private Adapters for Parameter Efficient Acoustic Modeling.
Interspeech2023
Pin-Jui Ku, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee 0001, 
A Multi-dimensional Deep Structured State Space Approach to Speech Enhancement Using Small-footprint Models.
Interspeech2023
Salvatore Sarni, Sandro Cumani, Sabato Marco Siniscalchi, Andrea Bottino, 
Description and analysis of the KPT system for NIST Language Recognition Evaluation 2022.
Interspeech2023
Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao 0001, 
Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition.
NeurIPS2023
Chen Chen 0075, Yuchen Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Pin-Yu Chen, Chng Eng Siong, 
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models.
TASLP2022
Abdolreza Sabzi Shahrebabaki, Giampiero Salvi, Torbjørn Svendsen, Sabato Marco Siniscalchi, 
Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models.
ICASSP2022
Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines And Results.
Interspeech2022
Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee 0001, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Baocai Yin, Jia Pan, 
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.
Interspeech2022
Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee 0001, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Shifu Xiong, Jianqing Gao, 
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.
ICASSP2021
Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai 0002, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee 0001, 
A Two-Stage Approach to Device-Robust Acoustic Scene Classification.
ICASSP2021
Abdolreza Sabzi Shahrebabaki, Negar Olfati, Ali Shariq Imran, Magne Hallstein Johnsen, Sabato Marco Siniscalchi, Torbjørn Svendsen, 
A Two-Stage Deep Modeling Approach to Articulatory Inversion.
ICASSP2021
Chao-Han Huck Yang, Jun Qi 0002, Samuel Yen-Chi Chen, Pin-Yu Chen, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee 0001, 
Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition.
Interspeech2021
Abdolreza Sabzi Shahrebabaki, Sabato Marco Siniscalchi, Torbjørn Svendsen, 
Raw Speech-to-Articulatory Inversion by Temporal Filtering and Decimation.
Interspeech2021
Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee 0001, 
PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification.
ICML2024
Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan 0003, Detai Xin, Dongchao Yang, Eric Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu 0001, Tao Qin 0001, Xiangyang Li 0001, Wei Ye 0004, Shikun Zhang, Jiang Bian 0002, Lei He 0005, Jinyu Li 0001, Sheng Zhao, 
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
ICLR2024
Yichong Leng, Zhifang Guo, Kai Shen, Zeqian Ju, Xu Tan 0003, Eric Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He 0005, Xiangyang Li 0001, Sheng Zhao, Tao Qin 0001, Jiang Bian 0002, 
PromptTTS 2: Describing and Generating Voices with Text Prompt.
ICLR2024
Kai Shen, Zeqian Ju, Xu Tan 0003, Eric Liu, Yichong Leng, Lei He 0005, Tao Qin 0001, Sheng Zhao, Jiang Bian 0002, 
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers.
AAAI2023
Yichong Leng, Xu Tan 0003, Wenjie Liu, Kaitao Song, Rui Wang 0028, Xiang-Yang Li 0001, Tao Qin 0001, Edward Lin, Tie-Yan Liu, 
SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition.
TASLP2022
Wenxin Hou, Han Zhu 0004, Yidong Wang, Jindong Wang 0001, Tao Qin 0001, Renjun Xu, Takahiro Shinozaki, 
Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition.
TASLP2022
Xiaobo Liang, Lijun Wu, Juntao Li, Tao Qin 0001, Min Zhang 0005, Tie-Yan Liu, 
Multi-Teacher Distillation With Single Model for Neural Machine Translation.
Interspeech2022
Yihan Wu, Xu Tan 0003, Bohan Li 0003, Lei He 0005, Sheng Zhao, Ruihua Song, Tao Qin 0001, Tie-Yan Liu, 
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios.
Interspeech2022
Guangyan Zhang, Kaitao Song, Xu Tan 0003, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang 0001, Wei Zhou, Tao Qin 0001, Tan Lee 0001, Sheng Zhao, 
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.
NeurIPS2022
Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen 0008, Xu Tan 0003, Danilo P. Mandic, Lei He 0005, Xiangyang Li 0001, Tao Qin 0001, Sheng Zhao, Tie-Yan Liu, 
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis.
ACL2022
Yi Ren 0006, Xu Tan 0003, Tao Qin 0001, Zhou Zhao, Tie-Yan Liu, 
Revisiting Over-Smoothness in Text to Speech.
ICASSP2021
Yichong Leng, Xu Tan 0003, Sheng Zhao, Frank K. Soong, Xiang-Yang Li 0001, Tao Qin 0001, 
MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network.
ICASSP2021
Renqian Luo, Xu Tan 0003, Rui Wang 0028, Tao Qin 0001, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu, 
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search.
ICASSP2021
Linghui Meng 0001, Jin Xu 0010, Xu Tan 0003, Jindong Wang 0001, Tao Qin 0001, Bo Xu 0002, 
MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition.
ICASSP2021
Yuzi Yan, Xu Tan 0003, Bohan Li 0003, Tao Qin 0001, Sheng Zhao, Yuan Shen 0001, Tie-Yan Liu, 
AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data.
ICASSP2021
Chen Zhang 0020, Yi Ren 0006, Xu Tan 0003, Jinglin Liu, Kejun Zhang, Tao Qin 0001, Sheng Zhao, Tie-Yan Liu, 
DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling.
Interspeech2021
Wenxin Hou, Jindong Wang 0001, Xu Tan 0003, Tao Qin 0001, Takahiro Shinozaki, 
Cross-Domain Speech Recognition with Unsupervised Character-Level Distribution Matching.
Interspeech2021
Yuzi Yan, Xu Tan 0003, Bohan Li 0003, Guangyan Zhang, Tao Qin 0001, Sheng Zhao, Yuan Shen 0001, Wei-Qiang Zhang, Tie-Yan Liu, 
Adaptive Text to Speech for Spontaneous Style.
NeurIPS2021
Jiawei Chen 0008, Xu Tan 0003, Yichong Leng, Jin Xu 0010, Guihua Wen, Tao Qin 0001, Tie-Yan Liu, 
Speech-T: Transducer for Text to Speech and Beyond.
NeurIPS2021
Yichong Leng, Xu Tan 0003, Linchen Zhu, Jin Xu 0010, Renqian Luo, Linquan Liu, Tao Qin 0001, Xiangyang Li 0001, Edward Lin, Tie-Yan Liu, 
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition.
ICLR2021
Yi Ren 0006, Chenxu Hu, Xu Tan 0003, Tao Qin 0001, Sheng Zhao, Zhou Zhao, Tie-Yan Liu, 
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.
TASLP2024
Jianchen Li, Jiqing Han 0001, Fan Qian, Tieran Zheng, Yongjun He, Guibin Zheng, 
Distance Metric-Based Open-Set Domain Adaptation for Speaker Verification.
ICASSP2024
Wenjie Song 0003, Jiqing Han 0001, Jianchen Li, Guibin Zheng, Tieran Zheng, Yongjun He, 
Modeling Quasi-Periodic Dependency via Self-Supervised Pre-Training for Respiratory Sound Classification.
ICASSP2024
Yadong Guan, Jiqing Han 0001, Hongwei Song, Wenjie Song 0003, Guibin Zheng, Tieran Zheng, Yongjun He, 
Contrastive Loss Based Frame-Wise Feature Disentanglement for Polyphonic Sound Event Detection.
ICASSP2023
Feng Chen, Shiwen Deng, Tieran Zheng, Yongjun He, Jiqing Han 0001, 
Graph-Based Spectro-Temporal Dependency Modeling for Anti-Spoofing.
ICASSP2023
Yadong Guan, Guibin Zheng, Jiqing Han 0001, Huanliang Wang, 
Subband Dependency Modeling for Sound Event Detection.
ICASSP2023
Dekai Sun, Yancheng He, Jiqing Han 0001, 
Using Auxiliary Tasks In Multimodal Fusion of Wav2vec 2.0 And BERT for Multimodal Emotion Recognition.
Interspeech2023
Ying Shi 0001, Dong Wang 0013, Lantian Li, Jiqing Han 0001, Shi Yin, 
Spot Keywords From Very Noisy and Mixed Speech.
Interspeech2023
Yue Gu, Zhihao Du, Shiliang Zhang, Qian Chen 0003, Jiqing Han 0001, 
Personality-aware Training based Speaker Adaptation for End-to-end Speech Recognition.
Interspeech2023
Jianchen Li, Jiqing Han 0001, Shiwen Deng, Tieran Zheng, Yongjun He, Guibin Zheng, 
Mutual Information-based Embedding Decoupling for Generalizable Speaker Verification.
ICASSP2022
Yadong Guan, Jiabin Xue, Guibin Zheng, Jiqing Han 0001, 
Sparse Self-Attention for Semi-Supervised Sound Event Detection.
ICASSP2022
Jianchen Li, Jiqing Han 0001, Hongwei Song, 
CDMA: Cross-Domain Distance Metric Adaptation for Speaker Verification.
Interspeech2022
Fan Qian, Hongwei Song, Jiqing Han 0001, 
Word-wise Sparse Attention for Multimodal Sentiment Analysis.
ICASSP2021
Hongwei Song, Jiqing Han 0001, Shiwen Deng, Zhihao Du, 
Capturing Temporal Dependencies Through Future Prediction for CNN-Based Audio Classifiers.
Interspeech2021
Jianchen Li, Jiqing Han 0001, Hongwei Song, 
Gradient Regularization for Noise-Robust Speaker Verification.
Interspeech2021
Fan Qian, Jiqing Han 0001, 
Multimodal Sentiment Analysis with Temporal Modality Attention.
Interspeech2021
Jiabin Xue, Tieran Zheng, Jiqing Han 0001, 
Model-Agnostic Fast Adaptive Multi-Objective Balancing Algorithm for Multilingual Automatic Speech Recognition Model Training.
TASLP2020
Zhihao Du, Xueliang Zhang 0001, Jiqing Han 0001, 
A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement.
TASLP2020
Hui Luo, Jiqing Han 0001, 
Nonnegative Matrix Factorization Based Transfer Subspace Learning for Cross-Corpus Speech Emotion Recognition.
ICASSP2020
Chen Chen 0086, Jiqing Han 0001, 
TDMF: Task-Driven Multilevel Framework for End-to-End Speaker Verification.
ICASSP2020
Zhihao Du, Ming Lei, Jiqing Han 0001, Shiliang Zhang, 
PAN: Phoneme-Aware Network for Monaural Speech Enhancement.
ICLR2024
Alon Ziv, Itai Gat, Gaël Le Lan, Tal Remez, Felix Kreuk, Jade Copet, Alexandre Défossez, Gabriel Synnaeve, Yossi Adi, 
Masked Audio Generation using a Single Non-Autoregressive Transformer.
AAAI2024
Guy Yariv, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, Yossi Adi, 
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation.
ICASSP2023
Ali Elkahky, Wei-Ning Hsu, Paden Tomasello, Tu Anh Nguyen, Robin Algayres, Yossi Adi, Jade Copet, Emmanuel Dupoux, Abdelrahman Mohamed, 
Do Coarser Units Benefit Cluster Prediction-Based Speech Pre-Training?
ICASSP2023
Wen-Chin Huang, Benjamin Peloquin, Justine Kao, Changhan Wang, Hongyu Gong, Elizabeth Salesky, Yossi Adi, Ann Lee 0001, Peng-Jen Chen, 
A Holistic Cascade System, Benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation.
ICASSP2023
Amitay Sicherman, Yossi Adi, 
Analysing Discrete Self Supervised Speech Representation For Spoken Language Modeling.
Interspeech2023
Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux, 
Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis.
Interspeech2023
Guy Yariv, Itai Gat, Lior Wolf, Yossi Adi, Idan Schwartz, 
Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation.
NeurIPS2023
Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Défossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz 0001, Yossi Adi, 
Textually Pretrained Speech Language Models.
NeurIPS2023
Matthew Le 0001, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, Wei-Ning Hsu, 
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale.
NeurIPS2023
Robin San Roman, Yossi Adi, Antoine Deleforge, Romain Serizel, Gabriel Synnaeve, Alexandre Défossez, 
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion.
ICLR2023
Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi, 
AudioGen: Textually Guided Audio Generation.
EMNLP2023
Robin Algayres, Yossi Adi, Tu Anh Nguyen, Jade Copet, Gabriel Synnaeve, Benoît Sagot, Emmanuel Dupoux, 
Generative Spoken Language Model based on continuous word-sized audio tokens.
EMNLP-Findings2023
Gallil Maimon, Yossi Adi, 
Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units.
ICASSP2022
Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar 0003, 
Continual Self-Training With Bootstrapped Remixing For Speech Enhancement.
Interspeech2022
Shahaf Bassan, Yossi Adi, Jeffrey S. Rosenschein, 
Unsupervised Symbolic Music Segmentation using Ensemble Temporal Prediction Errors.
Interspeech2022
Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino 0001, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee 0001, 
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation.
Interspeech2022
Maureen de Seyssel, Marvin Lavechin, Yossi Adi, Emmanuel Dupoux, Guillaume Wisniewski, 
Probing phoneme, language and speaker information in unsupervised speech representations.
Interspeech2022
Or Tal, Moshe Mandel, Felix Kreuk, Yossi Adi, 
A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement.
Interspeech2022
Arnon Turetzky, Tzvi Michelson, Yossi Adi, Shmuel Peleg, 
Deep Audio Waveform Prior.
ACL2022
Eugene Kharitonov, Ann Lee 0001, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu, 
Text-Free Prosody-Aware Generative Spoken Language Modeling.
ICASSP2024
Zhe Liu 0011, Ozlem Kalinli, 
Forgetting Private Textual Sequences in Language Models Via Leave-One-Out Ensemble.
ICASSP2024
Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, Jinxi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer, 
Prompting Large Language Models with Speech Recognition Abilities.
ICASSP2024
Egor Lakomkin, Chunyang Wu, Yassir Fathullah, Ozlem Kalinli, Michael L. Seltzer, Christian Fuegen, 
End-to-End Speech Recognition Contextualization with Large Language Models.
ICASSP2024
Yingyi Ma, Zhe Liu 0011, Ozlem Kalinli, 
Correction Focused Language Model Training For Speech Recognition.
ICASSP2024
Yuan Shangguan, Haichuan Yang, Danni Li, Chunyang Wu, Yassir Fathullah, Dilin Wang, Ayushi Dalmia, Raghuraman Krishnamoorthi, Ozlem Kalinli, Junteng Jia, Jay Mahadeokar, Xin Lei, Mike Seltzer, Vikas Chandra, 
TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-Device ASR Models.
ICASSP2024
Chuanneng Sun, Zeeshan Ahmed, Yingyi Ma, Zhe Liu 0011, Lucas Kabela, Yutong Pang, Ozlem Kalinli, 
Contextual Biasing of Named-Entities with Large Language Models.
ICASSP2024
Arpita Vats, Zhe Liu, Peng Su, Debjyoti Paul, Yingyi Ma, Yutong Pang, Zeeshan Ahmed, Ozlem Kalinli, 
Recovering from Privacy-Preserving Masking with Large Language Models.
ICASSP2024
Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli, 
Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of a Multilingual ASR Model.
NAACL2024
Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Ke Li, Junteng Jia, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer, 
AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs.
ICASSP2023
Desh Raj, Junteng Jia, Jay Mahadeokar, Chunyang Wu, Niko Moritz, Xiaohui Zhang 0007, Ozlem Kalinli, 
Anchored Speech Recognition with Neural Transducers.
ICASSP2023
Andros Tjandra, Nayan Singhal, David Zhang, Ozlem Kalinli, Abdelrahman Mohamed, Duc Le, Michael L. Seltzer, 
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities.
ICASSP2023
Mu Yang, Andros Tjandra, Chunxi Liu, David Zhang, Duc Le, Ozlem Kalinli, 
Learning ASR Pathways: A Sparse Multilingual ASR Model.
Interspeech2023
Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, Jay Mahadeokar, Chunxi Liu, Yangyang Shi, Ozlem Kalinli, Mike Seltzer, Mark J. F. Gales, 
Multi-Head State Space Model for Speech Recognition.
Interspeech2023
Suyoun Kim, Akshat Shrivastava, Duc Le, Ju Lin, Ozlem Kalinli, Michael L. Seltzer, 
Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding.
ICASSP2022
Antoine Bruguier, Duc Le, Rohit Prabhavalkar, Dangna Li, Zhe Liu 0011, Bo Wang, Eun Chang, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer, 
Neural-FST Class Language Model for End-to-End Speech Recognition.
ICASSP2022
Yangyang Shi, Chunyang Wu, Dilin Wang, Alex Xiao, Jay Mahadeokar, Xiaohui Zhang 0007, Chunxi Liu, Ke Li, Yuan Shangguan, Varun Nagaraja, Ozlem Kalinli, Mike Seltzer, 
Streaming Transformer Transducer based Speech Recognition Using Non-Causal Convolution.
ICASSP2022
Haichuan Yang, Yuan Shangguan, Dilin Wang, Meng Li 0004, Pierce Chuang, Xiaohui Zhang 0007, Ganesh Venkatesh, Ozlem Kalinli, Vikas Chandra, 
Omni-Sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR Via Supernet.
Interspeech2022
Junteng Jia, Jay Mahadeokar, Weiyi Zheng, Yuan Shangguan, Ozlem Kalinli, Frank Seide, 
Federated Domain Adaptation for ASR with Full Self-Supervision.
Interspeech2022
Suyoun Kim, Duc Le, Weiyi Zheng, Tarun Singh, Abhinav Arora, Xiaoyu Zhai, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer, 
Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric.
Interspeech2022
Duc Le, Akshat Shrivastava, Paden D. Tomasello, Suyoun Kim, Aleksandr Livshits, Ozlem Kalinli, Michael L. Seltzer, 
Deliberation Model for On-Device Spoken Language Understanding.
ICASSP2024
Yimin Deng, Huaizhen Tang, Xulong Zhang 0001, Ning Cheng 0001, Jing Xiao 0006, Jianzong Wang, 
Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval.
ICASSP2024
Haobin Tang, Xulong Zhang 0001, Ning Cheng 0001, Jing Xiao 0006, Jianzong Wang, 
ED-TTS: Multi-Scale Emotion Modeling Using Cross-Domain Emotion Diarization for Emotional Speech Synthesis.
ICASSP2024
Yong Zhang, Hanzhang Li, Zhitao Li, Ning Cheng 0001, Ming Li, Jing Xiao 0006, Jianzong Wang, 
Leveraging Biases in Large Language Models: "bias-kNN" for Effective Few-Shot Learning.
ICASSP2023
Ganghui Ru, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Improving Music Genre Classification from multi-modal Properties of Music and Genre Correlations Perspective.
ICASSP2023
Huaizhen Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Learning Speech Representations with Flexible Hidden Feature Dimensions.
ICASSP2023
Huaizhen Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
VQ-CL: Learning Disentangled Speech Representations with Contrastive Learning and Vector Quantization.
ICASSP2023
Haobin Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis.
ICASSP2023
Xulong Zhang 0001, Haobin Tang, Jianzong Wang, Ning Cheng 0001, Jian Luo, Jing Xiao 0006, 
Dynamic Alignment Mask CTC: Improved Mask CTC With Aligned Cross Entropy.
ICASSP2023
Kexin Zhu, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Improving EEG-based Emotion Recognition by Fusing Time-Frequency and Spatial Representations.
Interspeech2023
Jiaxin Fan, Yong Zhang, Hanzhang Li, Jianzong Wang, Zhitao Li, Sheng Ouyang, Ning Cheng 0001, Jing Xiao 0006, 
Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism.
Interspeech2023
Yifu Sun, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Kaiyu Hu, Jing Xiao 0006, 
Investigation of Music Emotion Recognition Based on Segmented Semi-Supervised Learning.
Interspeech2023
Haobin Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis.
Interspeech2023
Yong Zhang, Zhitao Li, Jianzong Wang, Yiming Gao 0010, Ning Cheng 0001, Fengying Yu, Jing Xiao 0006, 
Prompt Guided Copy Mechanism for Conversational Question Answering.
ICASSP2022
Huaizhen Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
AVQVC: One-Shot Voice Conversion By Vector Quantization With Applying Contrastive Learning.
ICASSP2022
Qiqi Wang 0005, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning.
ICASSP2022
Tong Ye, Shijing Si, Jianzong Wang, Rui Wang, Ning Cheng 0001, Jing Xiao 0006, 
VU-BERT: A Unified Framework for Visual Dialog.
ICASSP2022
Yong Zhang, Zhitao Li, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Self-Attention for Incomplete Utterance Rewriting.
ICASSP2022
Botao Zhao 0001, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-Shot Multi-speaker text-to-speech.
Interspeech2022
Jian Luo, Jianzong Wang, Ning Cheng 0001, Edward Xiao, Xulong Zhang 0001, Jing Xiao 0006, 
Tiny-Sepformer: A Tiny Time-Domain Transformer Network For Speech Separation.
Interspeech2022
Sicheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu 0001, Aolan Sun, Jianzong Wang, Ning Cheng 0001, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng, 
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion.
TASLP2024
Vishal Kumar, Vinayak Abrol, Mathew Magimai-Doss, 
On the Quantization of Neural Models for Speaker Verification.
ICASSP2024
Sevada Hovsepyan, Mathew Magimai-Doss, 
Syllable Level Features for Parkinson's Disease Detection from Speech.
ICASSP2024
Bogdan Vlasenko, Sargam Vyas, Mathew Magimai-Doss, 
Comparing Data-Driven and Handcrafted Features for Dimensional Emotion Recognition.
ICASSP2023
Tilak Purohit, Sarthak Yadav, Bogdan Vlasenko, S. Pavankumar Dubagunta, Mathew Magimai-Doss, 
Towards Learning Emotion Information from Short Segments of Speech.
Interspeech2023
Enno Hermann, Mathew Magimai-Doss, 
Few-shot Dysarthric Speech Recognition with Text-to-Speech Data Augmentation.
Interspeech2023
Timothy Piton, Enno Hermann, Angela Pasqualotto, Marjolaine Cohen, Mathew Magimai-Doss, Daphne Bavelier, 
Using Commercial ASR Solutions to Assess Reading Skills in Children: A Case Report.
Interspeech2023
Tilak Purohit, Bogdan Vlasenko, Mathew Magimai-Doss, 
Implicit phonetic information modeling for speech emotion recognition.
Interspeech2023
Eklavya Sarkar, Mathew Magimai-Doss, 
Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?
ICASSP2022
Zohreh Mostaani, RaviShankar Prasad, Bogdan Vlasenko, Mathew Magimai-Doss, 
Modeling of Pre-Trained Neural Network Embeddings Learned From Raw Waveform for COVID-19 Infection Detection.
Interspeech2022
Zohreh Mostaani, Mathew Magimai-Doss, 
On Breathing Pattern Information in Synthetic Speech.
Interspeech2022
Eklavya Sarkar, RaviShankar Prasad, Mathew Magimai-Doss, 
Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering.
ICASSP2021
Zohreh Mostaani, Venkata Srikanth Nallanthighal, Aki Härmä, Helmer Strik, Mathew Magimai-Doss, 
On The Relationship Between Speech-Based Breathing Signal Prediction Evaluation Measures and Breathing Parameters Estimation.
Interspeech2021
Enno Hermann, Mathew Magimai-Doss, 
Handling Acoustic Variation in Dysarthric Speech Recognition Systems Through Model Combination.
Interspeech2021
RaviShankar Prasad, Mathew Magimai-Doss, 
Identification of F1 and F2 in Speech Using Modified Zero Frequency Filtering.
Interspeech2021
Juan Camilo Vásquez-Correa, Julian Fritsch, Juan Rafael Orozco-Arroyave, Elmar Nöth, Mathew Magimai-Doss, 
On Modeling Glottal Source Information for Phonation Assessment in Parkinson's Disease.
Interspeech2021
Esaú Villatoro-Tello, S. Pavankumar Dubagunta, Julian Fritsch, Gabriela Ramírez-de-la-Rosa, Petr Motlícek, Mathew Magimai-Doss, 
Late Fusion of the Available Lexicon and Raw Waveform-Based Acoustic Modeling for Depression and Dementia Recognition.
ICASSP2020
Julian Fritsch, S. Pavankumar Dubagunta, Mathew Magimai-Doss, 
Estimating the Degree of Sleepiness by Integrating Articulatory Feature Knowledge in Raw Waveform Based CNNs.
ICASSP2020
Enno Hermann, Mathew Magimai-Doss, 
Dysarthric Speech Recognition with Lattice-Free MMI.
ICASSP2020
RaviShankar Prasad, Gürkan Yilmaz, Olivier Chételat, Mathew Magimai-Doss, 
Detection Of S1 And S2 Locations In Phonocardiogram Signals Using Zero Frequency Filter.
ICASSP2020
Sandrine Tornay, Marzieh Razavi, Mathew Magimai-Doss, 
Towards Multilingual Sign Language Recognition.
ICASSP2024
Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-Based ASR.
ICASSP2024
Yamato Ohtani, Takuma Okamoto, Tomoki Toda, Hisashi Kawai, 
FIRNet: Fundamental Frequency Controllable Fast Neural Vocoder With Trainable Finite Impulse Response Filter.
ICASSP2024
Takuma Okamoto, Yamato Ohtani, Tomoki Toda, Hisashi Kawai, 
Convnext-TTS And Convnext-VC: Convnext-Based Fast End-To-End Sequence-To-Sequence Text-To-Speech And Voice Conversion.
TASLP2023
Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Hisashi Kawai, 
Harmonic-Net: Fundamental Frequency and Speech Rate Controllable Fast Neural Vocoder.
Interspeech2023
Takuma Okamoto, Tomoki Toda, Hisashi Kawai, 
E2E-S2S-VC: End-To-End Sequence-To-Sequence Voice Conversion.
SpeechComm2022
Takuma Okamoto, Keisuke Matsubara, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, 
Neural speech-rate conversion with multispeaker WaveNet vocoder.
Interspeech2022
Peng Shen, Xugang Lu, Hisashi Kawai, 
Transducer-based language embedding for spoken language identification.
TASLP2021
Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Coupling a Generative Model With a Discriminative Learning Framework for Speaker Verification.
TASLP2021
Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda, 
Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.
ICASSP2021
Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Unsupervised Neural Adaptation Model Based on Optimal Transport for Spoken Language Identification.
ICASSP2021
Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, 
High-Intelligibility Speech Synthesis for Dysarthric Speakers with LPCNet-Based TTS and CycleVAE-Based VC.
ICASSP2021
Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, 
Noise Level Limited Sub-Modeling for Diffusion Probabilistic Vocoders.
Interspeech2021
Masakiyo Fujimoto, Hisashi Kawai, 
Noise Robust Acoustic Modeling for Single-Channel Speech Recognition Based on a Stream-Wise Transformer Architecture.
TASLP2020
Peng Shen, Xugang Lu, Sheng Li 0010, Hisashi Kawai, 
Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification.
ICASSP2020
Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, 
Transformer-Based Text-to-Speech with Weighted Forced Attention.
Interspeech2020
Peng Shen, Xugang Lu, Hisashi Kawai, 
Investigation of NICT Submission for Short-Duration Speaker Verification Challenge 2020.
Interspeech2020
Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda, 
Quasi-Periodic Parallel WaveGAN Vocoder: A Non-Autoregressive Pitch-Dependent Dilated Convolution Model for Parametric Speech Generation.
ICASSP2019
Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, 
Investigations of Real-time Gaussian Fftnet and Parallel Wavenet Neural Vocoders with Simple Acoustic Features.
ICASSP2019
Peng Shen, Xugang Lu, Sheng Li 0010, Hisashi Kawai, 
Interactive Learning of Teacher-student Model for Short Utterance Spoken Language Identification.
ICASSP2019
Ryoichi Takashima, Sheng Li 0010, Hisashi Kawai, 
Investigation of Sequence-level Knowledge Distillation Methods for CTC Acoustic Models.
ICASSP2024
Wenhao Guan, Qi Su, Haodong Zhou, Shiyu Miao, Xingjia Xie, Lin Li, Qingyang Hong, 
Reflow-TTS: A Rectified Flow Model for High-Fidelity Text-to-Speech.
ICASSP2024
Yishuang Li, Hukai Huang, Zhicong Chen, Wenhao Guan, Jiayan Lin, Lin Li, Qingyang Hong, 
SR-HuBERT: An Efficient Pre-Trained Model for Speaker Verification.
AAAI2024
Wenhao Guan, Yishuang Li, Tao Li, Hukai Huang, Feng Wang, Jiayan Lin, Lingyan Huang, Lin Li, Qingyang Hong, 
MM-TTS: Multi-Modal Prompt Based Style Transfer for Expressive Text-to-Speech Synthesis.
ICASSP2023
Zhicong Chen, Jie Wang, Wenxuan Hu, Lin Li 0032, Qingyang Hong, 
Unsupervised Speaker Verification Using Pre-Trained Model and Label Correction.
ICASSP2023
Tao Li, Haodong Zhou, Jie Wang, Qingyang Hong, Lin Li, 
The XMU System for Audio-Visual Diarization and Recognition in MISP Challenge 2022.
ICASSP2023
Dexin Liao, Tao Jiang 0033, Feng Wang, Lin Li 0032, Qingyang Hong, 
Towards A Unified Conformer Structure: from ASR to ASV Task.
ICASSP2023
Jie Wang, Zhicong Chen, Haodong Zhou, Lin Li, Qingyang Hong, 
Community Detection Graph Convolutional Network for Overlap-Aware Speaker Diarization.
ICASSP2023
Qiulin Wang, Wenxuan Hu, Lin Li 0032, Qingyang Hong, 
Meta Learning with Adaptive Loss Weight for Low-Resource Speech Recognition.
Interspeech2023
Wenhao Guan, Tao Li, Yishuang Li, Hukai Huang, Qingyang Hong, Lin Li, 
Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge.
Interspeech2023
Lingyan Huang, Tao Li, Haodong Zhou, Qingyang Hong, Lin Li, 
Cross-Modal Semantic Alignment before Fusion for Two-Pass End-to-End Spoken Language Understanding.
Interspeech2023
Feng Wang, Lingyan Huang, Tao Li, Qingyang Hong, Lin Li 0032, 
Conformer-based Language Embedding with Self-Knowledge Distillation for Spoken Language Identification.
TASLP2022
Lin Li 0032, Fuchuan Tong, Qingyang Hong, 
When Speaker Recognition Meets Noisy Labels: Optimizations for Front-Ends and Back-Ends.
ICASSP2022
Fuchuan Tong, Siqi Zheng, Min Zhang, Yafeng Chen, Hongbin Suo, Qingyang Hong, Lin Li 0032, 
Graph Convolutional Network Based Semi-Supervised Learning on Multi-Speaker Meeting Data.
Interspeech2022
Jie Wang, Yuji Liu, Binling Wang, Yiming Zhi, Song Li, Shipeng Xia, Jiayang Zhang, Feng Tong, Lin Li 0032, Qingyang Hong, 
Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting.
Interspeech2022
Binling Wang, Feng Wang, Wenxuan Hu, Qiulin Wang, Jing Li, Dong Wang 0013, Lin Li 0032, Qingyang Hong, 
Oriental Language Recognition (OLR) 2021: Summary and Analysis.
ICASSP2021
Song Li, Beibei Ouyang, Lin Li 0032, Qingyang Hong, 
Light-TTS: Lightweight Multi-Speaker Multi-Lingual Text-to-Speech.
ICASSP2021
Song Li, Beibei Ouyang, Dexin Liao, Shipeng Xia, Lin Li 0032, Qingyang Hong, 
End-To-End Multi-Accent Speech Recognition with Unsupervised Accent Modelling.
ICASSP2021
Fuchuan Tong, Miao Zhao, Jianfeng Zhou, Hao Lu, Zheng Li, Lin Li 0032, Qingyang Hong, 
ASV-SUBTOOLS: Open Source Toolkit for Automatic Speaker Verification.
Interspeech2021
Zheng Li, Yan Liu, Lin Li 0032, Qingyang Hong, 
Additive Phoneme-Aware Margin Softmax Loss for Language Recognition.
Interspeech2021
Song Li, Beibei Ouyang, Fuchuan Tong, Dexin Liao, Lin Li 0032, Qingyang Hong, 
Real-Time End-to-End Monaural Multi-Speaker Speech Recognition.
SpeechComm2024
Shuai Wang 0016, Zhengyang Chen, Bing Han, Hongji Wang, Chengdong Liang, Binbin Zhang, Xu Xiang, Wen Ding, Johan Rohdin, Anna Silnova, Yanmin Qian, Haizhou Li 0001, 
Advancing speaker embedding learning: Wespeaker toolkit for research and production.
TASLP2024
Zhengyang Chen, Bing Han, Shuai Wang 0016, Yanmin Qian, 
Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer.
TASLP2024
Bing Han, Zhengyang Chen, Yanmin Qian, 
Self-Supervised Learning With Cluster-Aware-DINO for High-Performance Robust Speaker Verification.
ICASSP2024
Bing Han, Zhiqiang Lv, Anbai Jiang, Wen Huang 0004, Zhengyang Chen, Yufeng Deng, Jiawei Ding, Cheng Lu 0007, Wei-Qiang Zhang 0001, Pingyi Fan, Jia Liu 0001, Yanmin Qian, 
Exploring Large Scale Pre-Trained Models for Robust Machine Anomalous Sound Detection.
ICASSP2024
Wen Huang 0004, Bing Han, Shuai Wang 0016, Zhengyang Chen, Yanmin Qian, 
Robust Cross-Domain Speaker Verification with Multi-Level Domain Adapters.
ICASSP2024
Yidi Jiang, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian, Haizhou Li 0001, 
Prompt-Driven Target Speech Diarization.
ICASSP2024
Shuai Wang 0016, Qibing Bai, Qi Liu 0018, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li 0001, 
Leveraging in-the-wild Data for Effective Self-supervised Pretraining in Speaker Recognition.
TASLP2023
Bei Liu, Zhengyang Chen, Yanmin Qian, 
Depth-First Neural Architecture With Attentive Feature Fusion for Efficient Speaker Verification.
ICASSP2023
Bing Han, Zhengyang Chen, Yanmin Qian, 
Exploring Binary Classification Loss for Speaker Verification.
ICASSP2023
Tao Liu, Zhengyang Chen, Yanmin Qian, Kai Yu 0004, 
Multi-Speaker End-to-End Multi-Modal Speaker Diarization System for the MISP 2022 Challenge.
ICASSP2023
Hongji Wang, Chengdong Liang, Shuai Wang 0016, Zhengyang Chen, Binbin Zhang, Xu Xiang, Yanlei Deng, Yanmin Qian, 
Wespeaker: A Research and Production Oriented Speaker Embedding Learning Toolkit.
ICASSP2023
Leying Zhang, Zhengyang Chen, Yanmin Qian, 
Adaptive Large Margin Fine-Tuning For Robust Speaker Verification.
Interspeech2023
Zhengyang Chen, Bing Han, Shuai Wang 0016, Yanmin Qian, 
Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor.
Interspeech2023
Zhengyang Chen, Bing Han, Xu Xiang, Houjun Huang, Bei Liu, Yanmin Qian, 
Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022.
ICASSP2022
Zhengyang Chen, Sanyuan Chen, Yu Wu 0012, Yao Qian, Chengyi Wang 0002, Shujie Liu 0001, Yanmin Qian, Michael Zeng 0001, 
Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification.
ICASSP2022
Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Zhengyang Chen, Zhuo Chen 0006, Shujie Liu 0001, Jian Wu 0027, Yao Qian, Furu Wei, Jinyu Li 0001, Xiangzhan Yu, 
Unispeech-Sat: Universal Speech Representation Learning With Speaker Aware Pre-Training.
ICASSP2022
Bing Han, Zhengyang Chen, Bei Liu, Yanmin Qian, 
MLP-SVNET: A Multi-Layer Perceptrons Based Network for Speaker Verification.
ICASSP2022
Bing Han, Zhengyang Chen, Yanmin Qian, 
Local Information Modeling with Self-Attention for Speaker Verification.
ICASSP2022
Bei Liu, Haoyu Wang 0007, Zhengyang Chen, Shuai Wang 0016, Yanmin Qian, 
Self-Knowledge Distillation via Feature Enhancement for Speaker Verification.
Interspeech2022
Bing Han, Zhengyang Chen, Yanmin Qian, 
Self-Supervised Speaker Verification Using Dynamic Loss-Gate and Label Correction.
TASLP2024
Shujie Hu, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, Guinan Li, Yi Wang, Mingyu Cui, Tianzi Wang, Helen Meng, Xunying Liu, 
Self-Supervised ASR Models and Features for Dysarthric and Elderly Speech Recognition.
TASLP2024
Zengrui Jin, Mengzhe Geng, Jiajun Deng, Tianzi Wang, Shujie Hu, Guinan Li, Xunying Liu, 
Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition.
ICASSP2024
Jiajun Deng, Xurong Xie, Guinan Li, Mingyu Cui, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Zhaoqing Li, Xunying Liu, 
Towards High-Performance and Low-Latency Feature-Based Speaker Adaptation of Conformer Speech Recognition Systems.
ICASSP2024
Zengrui Jin, Xurong Xie, Tianzi Wang, Mengzhe Geng, Jiajun Deng, Guinan Li, Shujie Hu, Xunying Liu, 
Towards Automatic Data Augmentation for Disordered Speech Recognition.
ICASSP2024
Huimeng Wang, Zengrui Jin, Mengzhe Geng, Shujie Hu, Guinan Li, Tianzi Wang, Haoning Xu, Xunying Liu, 
Enhancing Pre-Trained ASR System Fine-Tuning for Dysarthric Speech Recognition Using Adversarial Data Augmentation.
TASLP2023
Guinan Li, Jiajun Deng, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Mingyu Cui, Helen Meng, Xunying Liu, 
Audio-Visual End-to-End Multi-Channel Speech Separation, Dereverberation and Recognition.
ICASSP2023
Shujie Hu, Xurong Xie, Zengrui Jin, Mengzhe Geng, Yi Wang, Mingyu Cui, Jiajun Deng, Xunying Liu, Helen Meng, 
Exploring Self-Supervised Pre-Trained ASR Models for Dysarthric and Elderly Speech Recognition.
ICASSP2023
Zengrui Jin, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shujie Hu, Jiajun Deng, Guinan Li, Xunying Liu, 
Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition.
Interspeech2023
Jiajun Deng, Guinan Li, Xurong Xie, Zengrui Jin, Mingyu Cui, Tianzi Wang, Shujie Hu, Mengzhe Geng, Xunying Liu, 
Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems.
Interspeech2023
Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Jiajun Deng, Mingyu Cui, Guinan Li, Jianwei Yu, Xurong Xie, Xunying Liu, 
Use of Speech Impairment Severity for Dysarthric Speech Recognition.
Interspeech2023
Mengzhe Geng, Xurong Xie, Rongfeng Su, Jianwei Yu, Zengrui Jin, Tianzi Wang, Shujie Hu, Zi Ye 0001, Helen Meng, Xunying Liu, 
On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition.
Interspeech2023
Shujie Hu, Xurong Xie, Mengzhe Geng, Mingyu Cui, Jiajun Deng, Guinan Li, Tianzi Wang, Helen Meng, Xunying Liu, 
Exploiting Cross-Domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition.
Interspeech2023
Tianzi Wang, Shoukang Hu, Jiajun Deng, Zengrui Jin, Mengzhe Geng, Yi Wang, Helen Meng, Xunying Liu, 
Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition.
TASLP2022
Mengzhe Geng, Xurong Xie, Zi Ye 0001, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen Meng, 
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition.
TASLP2022
Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
TASLP2022
Boyang Xue, Shoukang Hu, Junhao Xu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Neural Network Language Modeling for Speech Recognition.
ICASSP2022
Shujie Hu, Shansong Liu, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shoukang Hu, Mingyu Cui, Xunying Liu, Helen Meng, 
Exploiting Cross Domain Acoustic-to-Articulatory Inverted Features for Disordered Speech Recognition.
Interspeech2022
Mingyu Cui, Jiajun Deng, Shoukang Hu, Xurong Xie, Tianzi Wang, Shujie Hu, Mengzhe Geng, Boyang Xue, Xunying Liu, Helen Meng, 
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems.
Interspeech2022
Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Mengzhe Geng, Guinan Li, Xunying Liu, Helen Meng, 
Confidence Score Based Conformer Speaker Adaptation for Speech Recognition.
Interspeech2022
Tianzi Wang, Jiajun Deng, Mengzhe Geng, Zi Ye 0001, Shoukang Hu, Yi Wang, Mingyu Cui, Zengrui Jin, Xunying Liu, Helen Meng, 
Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection.
ICASSP2024
Kevin Everson, Yile Gu, Chao-Han Huck Yang, Prashanth Gurunath Shivakumar, Guan-Ting Lin, Jari Kolehmainen, Ivan Bulyko, Ankur Gandhe, Shalini Ghosh, Wael Hamza, Hung-Yi Lee, Ariya Rastrow, Andreas Stolcke, 
Towards ASR Robust Spoken Language Understanding Through in-Context Learning with Word Confusion Networks.
ICASSP2024
Xianyan Fu, Xiao-Lei Zhang 0001, Chao-Han Huck Yang, Jun Qi 0002, 
Exploiting A Quantum Multiple Kernel Learning Approach For Low-Resource Spoken Command Recognition.
ICASSP2024
Pin-Jui Ku, I-Fan Chen, Chao-Han Huck Yang, Anirudh Raju, Pranav Dheram, Pegah Ghahremani, Brian King, Jing Liu, Roger Ren, Phani Sankar Nidadavolu, 
Hot-Fixing Wake Word Recognition for End-to-End ASR Via Neural Model Reprogramming.
ICASSP2024
Guan-Ting Lin, Prashanth Gurunath Shivakumar, Ankur Gandhe, Chao-Han Huck Yang, Yile Gu, Shalini Ghosh, Andreas Stolcke, Hung-Yi Lee, Ivan Bulyko, 
Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue.
ICASSP2024
Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang 0031, 
Can Whisper Perform Speech-Based In-Context Learning?
ICLR2024
Chen Chen 0075, Ruizhe Li 0001, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Engsiong Chng, Chao-Han Huck Yang, 
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition.
ICLR2024
Yuchen Hu, Chen Chen 0075, Chao-Han Huck Yang, Ruizhe Li 0001, Chao Zhang 0031, Pin-Yu Chen, Engsiong Chng, 
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition.
ACL2024
Yuchen Hu, Chen Chen 0075, Chao-Han Huck Yang, Ruizhe Li 0001, Dong Zhang, Zhehuai Chen, EngSiong Chng, 
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators.
TASLP2023
Jun Qi 0002, Chao-Han Huck Yang, Pin-Yu Chen, Javier Tejedor, 
Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on Riemannian Gradient Descent With Illustrations of Speech Processing.
ICASSP2023
Chao-Han Huck Yang, Bo Li 0028, Yu Zhang 0033, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman, 
From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition.
ICASSP2023
Chao-Han Huck Yang, Bo Li 0028, Yu Zhang 0033, Nanxin Chen, Tara N. Sainath, Sabato Marco Siniscalchi, Chin-Hui Lee 0001, 
A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition.
Interspeech2023
Chen Chen 0075, Chao-Han Huck Yang, Kai Li, Yuchen Hu, Pin-Jui Ku, Eng Siong Chng, 
A Neural State-Space Modeling Approach to Efficient Speech Separation.
Interspeech2023
Zih-Ching Chen, Chao-Han Huck Yang, Bo Li 0028, Yu Zhang 0033, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath, 
How to Estimate Model Transferability of Pre-Trained Speech Models?
Interspeech2023
Chun-Wei Ho, Chao-Han Huck Yang, Sabato Marco Siniscalchi, 
Differentially Private Adapters for Parameter Efficient Acoustic Modeling.
Interspeech2023
Pin-Jui Ku, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee 0001, 
A Multi-dimensional Deep Structured State Space Approach to Speech Enhancement Using Small-footprint Models.
Interspeech2023
Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, Narsis A. Kiani, David Gomez-Cabrero, Jesper N. Tegnér, 
A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-Trained General-Purpose Speech Model.
Interspeech2023
Li-Jen Yang, Chao-Han Huck Yang, Jen-Tzung Chien, 
Parameter-Efficient Learning for Text-to-Speech Accent Adaptation.
Interspeech2023
Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao 0001, 
Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition.
NeurIPS2023
Chen Chen 0075, Yuchen Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Pin-Yu Chen, Chng Eng Siong, 
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models.
EMNLP2023
Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, Rohit Kumar, Narsis A. Kiani, David Gomez-Cabrero, Jesper Tegnér, 
Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition.
TASLP2024
Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li 0001, Abdelrahman Mohamed, Shinji Watanabe 0001, Hung-yi Lee, 
A Large-Scale Evaluation of Speech Foundation Models.
ICASSP2024
Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-Weon Jung, Yichen Lu, Soumi Maiti, Roshan S. Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe 0001, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang, 
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.
ICASSP2024
Chien-Yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan S. Sharma, Shinji Watanabe 0001, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee, 
Dynamic-Superb: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark For Speech.
ICASSP2024
Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, Shinji Watanabe 0001, 
Hubertopic: Enhancing Semantic Representation of Hubert Through Self-Supervision Utilizing Topic Model.
ICML2024
Dongchao Yang, Jinchuan Tian, Xu Tan 0003, Rongjie Huang, Songxiang Liu, Haohan Guo, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian 0002, Zhou Zhao, Xixin Wu, Helen M. Meng, 
UniAudio: Towards Universal Audio Generation with Large Language Models.
ICLR2024
Jiatong Shi, Hirofumi Inaguma, Xutai Ma, Ilia Kulikov, Anna Y. Sun, 
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction.
AAAI2024
Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren 0006, Yuexian Zou, Zhou Zhao, Shinji Watanabe 0001, 
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
ACL2024
Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel Robinson, Jiatong Shi, Shinji Watanabe 0001, Graham Neubig, David R. Mortensen, Lori S. Levin, 
Wav2Gloss: Generating Interlinear Glossed Text from Speech.
ACL2024
Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Jinchuan Tian, Zhenhui Ye, Luping Liu, Zehan Wang 0001, Ziyue Jiang 0001, Xuankai Chang, Jiatong Shi, Chao Weng, Zhou Zhao, Dong Yu 0001, 
Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.
ICASSP2023
William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe 0001, 
Improving Massively Multilingual ASR with Auxiliary CTC Objectives.
ICASSP2023
Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola García, Hung-Yi Lee, Shinji Watanabe 0001, Sanjeev Khudanpur, 
Euro: Espnet Unsupervised ASR Open-Source Toolkit.
ICASSP2023
Jiatong Shi, Chan-Jan Hsu, Ho-Lam Chung, Dongji Gao, Paola García 0001, Shinji Watanabe 0001, Ann Lee 0001, Hung-Yi Lee, 
Bridging Speech and Textual Pre-Trained Models With Unsupervised ASR.
ICASSP2023
Jiatong Shi, Yun Tang 0002, Ann Lee 0001, Hirofumi Inaguma, Changhan Wang, Juan Pino 0001, Shinji Watanabe 0001, 
Enhancing Speech-To-Speech Translation with Multiple TTS Targets.
ICASSP2023
Yuning Wu, Jiatong Shi, Tao Qian, Dongji Gao, Qin Jin, 
Phoneix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation With Phoneme Distribution Predictor.
Interspeech2023
Jiatong Shi, Yun Tang 0002, Hirofumi Inaguma, Hongyu Gong, Juan Pino 0001, Shinji Watanabe 0001, 
Exploration on HuBERT with Multiple Resolution.
Interspeech2023
Jiatong Shi, Dan Berrebbi, William Chen, En-Pei Hu, Wei-Ping Huang, Ho-Lam Chung, Xuankai Chang, Shang-Wen Li 0001, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe 0001, 
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark.
Interspeech2023
Yui Sudo, Muhammad Shakeel 0001, Brian Yan, Jiatong Shi, Shinji Watanabe 0001, 
4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders.
ICASSP2022
Tao Qian, Jiatong Shi, Shuai Guo, Peter Wu, Qin Jin, 
Training Strategies for Automatic Song Writing: A Unified Framework Perspective.
ICASSP2022
Chunlei Zhang, Jiatong Shi, Chao Weng, Meng Yu 0003, Dong Yu 0001, 
Towards end-to-end Speaker Diarization with Generalized Neural Speaker Clustering.
Interspeech2022
Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel López-Francisco, Jonathan D. Amith, Shinji Watanabe 0001, 
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation.
TASLP2024
Ruiteng Zhang, Jianguo Wei, Xugang Lu, Wenhuan Lu, Di Jin 0001, Lin Zhang, Junhai Xu, 
Unsupervised Adaptive Speaker Recognition by Coupling-Regularized Optimal Transport.
ICASSP2024
Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-Based ASR.
ICASSP2024
Ruiteng Zhang, Jianguo Wei, Xugang Lu, Yongwei Li, Wenhuan Lu, Di Jin 0001, Junhai Xu, 
Self-Supervised Domain Exploration with an Optimal Transport Regularization for Open Set Cross-Domain Speech Emotion Recognition.
SpeechComm2023
Ruiteng Zhang, Jianguo Wei, Xugang Lu, Wenhuan Lu, Di Jin 0001, Lin Zhang, Yantao Ji, Junhai Xu, 
Self-supervised learning based domain regularization for mask-wearing speaker verification.
ICASSP2023
Ruiteng Zhang, Jianguo Wei, Xugang Lu, Wenhuan Lu, Di Jin 0001, Lin Zhang, Junhai Xu, 
Optimal Transport with a Diversified Memory Bank for Cross-Domain Speaker Verification.
Interspeech2023
Yang Liu, Haoqin Sun, Geng Chen, Qingyue Wang, Zhen Zhao, Xugang Lu, Longbiao Wang, 
Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions.
Interspeech2023
Ruiteng Zhang, Jianguo Wei, Xugang Lu, Yongwei Li, Junhai Xu, Di Jin 0001, Jianhua Tao 0001, 
SOT: Self-supervised Learning-Assisted Optimal Transport for Unsupervised Adaptive Speech Emotion Recognition.
ICASSP2022
Ruiteng Zhang, Jianguo Wei, Wenhuan Lu, Lin Zhang, Yantao Ji, Junhai Xu, Xugang Lu, 
CS-REP: Making Speaker Verification Networks Embracing Re-Parameterization.
Interspeech2022
Rong Chao, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao 0001, 
Perceptual Contrast Stretching on Target Feature for Speech Enhancement.
Interspeech2022
Kai Li 0018, Sheng Li 0010, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang 0001, Masashi Unoki, 
Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection.
Interspeech2022
Peng Shen, Xugang Lu, Hisashi Kawai, 
Transducer-based language embedding for spoken language identification.
TASLP2021
Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Coupling a Generative Model With a Discriminative Learning Framework for Speaker Verification.
ICASSP2021
Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Unsupervised Neural Adaptation Model Based on Optimal Transport for Spoken Language Identification.
Interspeech2021
Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao 0001, 
MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement.
Interspeech2021
Tsun-An Hsieh, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao 0001, 
Improving Perceptual Quality by Phone-Fortified Perceptual Loss Using Wasserstein Distance for Speech Enhancement.
NeurIPS2021
Hsin-Yi Lin, Huan-Hsin Tseng, Xugang Lu, Yu Tsao 0001, 
Unsupervised Noise Adaptive Speech Enhancement by Discriminator-Constrained Optimal Transport.
TASLP2020
Peng Shen, Xugang Lu, Sheng Li 0010, Hisashi Kawai, 
Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification.
TASLP2020
Cheng Yu, Ryandhimas E. Zezario, Syu-Siang Wang, Jonathan Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin-Min Wang, Yu Tsao 0001, 
Speech Enhancement Based on Denoising Autoencoder With Multi-Branched Encoders.
ICASSP2020
Ryandhimas E. Zezario, Tassadaq Hussain, Xugang Lu, Hsin-Min Wang, Yu Tsao 0001, 
Self-Supervised Denoising Autoencoder with Linear Regression Decoder for Speech Enhancement.
Interspeech2020
Yen-Ju Lu, Chien-Feng Liao, Xugang Lu, Jeih-weih Hung, Yu Tsao 0001, 
Incorporating Broad Phonetic Information for Speech Enhancement.
TASLP2024
Jagabandhu Mishra, S. R. Mahadeva Prasanna, 
Implicit Self-Supervised Language Representation for Spoken Language Diarization.
TASLP2023
Mrinmoy Bhattacharjee, S. R. M. Prasanna, Prithwijit Guha, 
Clean vs. Overlapped Speech-Music Detection Using Harmonic-Percussive Features and Multi-Task Learning.
Interspeech2023
Jagabandhu Mishra, Jayadev N. Patil, Amartya Chowdhury, S. R. Mahadeva Prasanna, 
End to End Spoken Language Diarization with Wav2vec Embeddings.
SpeechComm2022
Mrinmoy Bhattacharjee, S. R. Mahadeva Prasanna, Prithwijit Guha, 
Speech/music classification using phase-based and magnitude-based features.
Interspeech2022
Moakala Tzudir, Priyankoo Sarmah, S. R. Mahadeva Prasanna, 
Prosodic Information in Dialect Identification of a Tonal Language: The case of Ao.
Interspeech2021
Shikha Baghel, Mrinmoy Bhattacharjee, S. R. Mahadeva Prasanna, Prithwijit Guha, 
Automatic Detection of Shouted Speech Segments in Indian News Debates.
Interspeech2021
Moakala Tzudir, Shikha Baghel, Priyankoo Sarmah, S. R. Mahadeva Prasanna, 
Excitation Source Feature Based Dialect Identification in Ao - A Low Resource Language.
SpeechComm2020
Protima Nomo Sudro, S. R. Mahadeva Prasanna, 
Enhancement of cleft palate speech using temporal and spectral processing.
SpeechComm2020
Akhilesh Kumar Dubey, S. R. Mahadeva Prasanna, Samarendra Dandapat, 
Sinusoidal model-based hypernasality detection in cleft palate speech using CVCV sequence.
TASLP2020
Mrinmoy Bhattacharjee, S. R. Mahadeva Prasanna, Prithwijit Guha, 
Speech/Music Classification Using Features From Spectral Peaks.
TASLP2020
Vikram C. Mathad, S. R. Mahadeva Prasanna, 
Vowel Onset Point Based Screening of Misarticulated Stops in Cleft Lip and Palate Speech.
Interspeech2020
Ajish K. Abraham, M. Pushpavathi, N. Sreedevi, A. Navya, Vikram C. Mathad, S. R. Mahadeva Prasanna, 
Spectral Moment and Duration of Burst of Plosives in Speech of Children with Hearing Impairment and Typically Developing Children - A Comparative Study.
Interspeech2020
Ayush Agarwal, Jagabandhu Mishra, S. R. Mahadeva Prasanna, 
VOP Detection in Variable Speech Rate Condition.
TASLP2019
Vikram C. M., Nagaraj Adiga, S. R. Mahadeva Prasanna, 
Detection of Nasalized Voiced Stops in Cleft Palate Speech Using Epoch-Synchronous Features.
ICASSP2019
K. T. Deepak, Pavitra Kulkarni, U. Mudenagudi, S. R. M. Prasanna, 
Glottal Instants Extraction from Speech Signal Using Generative Adversarial Network.
Interspeech2019
Akhilesh Kumar Dubey, S. R. Mahadeva Prasanna, Samarendra Dandapat, 
Hypernasality Severity Detection Using Constant Q Cepstral Coefficients.
Interspeech2019
Sarfaraz Jelil, Abhishek Shrivastava, Rohan Kumar Das, S. R. Mahadeva Prasanna, Rohit Sinha 0003, 
SpeechMarker: A Voice Based Multi-Level Attendance Application.
Interspeech2019
Sishir Kalita, Protima Nomo Sudro, S. R. Mahadeva Prasanna, Samarendra Dandapat, 
Nasal Air Emission in Sibilant Fricatives of Cleft Lip and Palate Speech.
Interspeech2019
Protima Nomo Sudro, S. R. Mahadeva Prasanna, 
Modification of Devoicing Error in Cleft Lip and Palate Speech.
Interspeech2018
Kishalay Chakraborty, Senjam Shantirani Devi, Sanjeevan Devnath, S. R. Mahadeva Prasanna, Priyankoo Sarmah, 
Glotto Vibrato Graph: A Device and Method for Recording, Analysis and Visualization of Glottal Activity.
ICASSP2024
Jon Barker, Michael A. Akeroyd, Will Bailey, Trevor J. Cox, John F. Culling, Jennifer Firth, Simone Graetzer, Graham Naylor, 
The 2nd Clarity Prediction Challenge: A Machine Learning Challenge for Hearing Aid Intelligibility Prediction.
ICASSP2024
Rhiannon Mogridge, George Close, Robert Sutherland, Thomas Hain, Jon Barker, Stefan Goetze, Anton Ragni, 
Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users Using Intermediate ASR Features and Human Memory Models.
ICASSP2023
Michael A. Akeroyd, Will Bailey, Jon Barker, Trevor J. Cox, John F. Culling, Simone Graetzer, Graham Naylor, Zuzanna Podwinska, Zehai Tu, 
The 2nd Clarity Enhancement Challenge for Hearing Aid Speech Intelligibility Enhancement: Overview and Outcomes.
ICASSP2023
Trevor J. Cox, Jon Barker, Will Bailey, Simone Graetzer, Michael A. Akeroyd, John F. Culling, Graham Naylor, 
Overview of the 2023 ICASSP SP Clarity Challenge: Speech Enhancement for Hearing Aids.
TASLP2022
Zhengjun Yue, Erfan Loweimi, Heidi Christensen, Jon Barker, Zoran Cvetkovic, 
Acoustic Modelling From Raw Source and Filter Components for Dysarthric Speech Recognition.
ICASSP2022
Jack Deadman, Jon Barker, 
Improved Simulation of Realistically-Spatialised Simultaneous Speech Using Multi-Camera Analysis in The Chime-5 Dataset.
ICASSP2022
Zehai Tu, Jack Deadman, Ning Ma 0002, Jon Barker, 
Auditory-Based Data Augmentation for end-to-end Automatic Speech Recognition.
ICASSP2022
Zhengjun Yue, Erfan Loweimi, Zoran Cvetkovic, Heidi Christensen, Jon Barker, 
Multi-Modal Acoustic-Articulatory Feature Fusion For Dysarthric Speech Recognition.
Interspeech2022
Jon Barker, Michael Akeroyd, Trevor J. Cox, John F. Culling, Jennifer Firth, Simone Graetzer, Holly Griffiths, Lara Harris, Graham Naylor, Zuzanna Podwinska, Eszter Porter, Rhoddy Viveros Muñoz, 
The 1st Clarity Prediction Challenge: A machine learning challenge for hearing aid intelligibility prediction.
Interspeech2022
Jack Deadman, Jon Barker, 
Modelling Turn-taking in Multispeaker Parties for Realistic Data Simulation.
Interspeech2022
Zehai Tu, Ning Ma 0002, Jon Barker, 
Exploiting Hidden Representations from a DNN-based Speech Recogniser for Speech Intelligibility Prediction in Hearing-impaired Listeners.
Interspeech2022
Zehai Tu, Ning Ma 0002, Jon Barker, 
Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction.
Interspeech2022
Zhengjun Yue, Erfan Loweimi, Heidi Christensen, Jon Barker, Zoran Cvetkovic, 
Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs.
Interspeech2022
Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker, 
On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training.
ICASSP2021
Gerardo Roa Dabike, Jon Barker, 
The use of Voice Source Features for Sung Speech Recognition.
ICASSP2021
Zehai Tu, Ning Ma 0002, Jon Barker, 
DHASP: Differentiable Hearing Aid Speech Processing.
ICASSP2021
Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker, 
Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism.
Interspeech2021
Simone Graetzer, Jon Barker, Trevor J. Cox, Michael Akeroyd, John F. Culling, Graham Naylor, Eszter Porter, Rhoddy Viveros Muñoz, 
Clarity-2021 Challenges: Machine Learning Challenges for Advancing Hearing Aid Processing.
Interspeech2021
Zehai Tu, Ning Ma 0002, Jon Barker, 
Optimising Hearing Aid Fittings for Speech in Noise with a Differentiable Hearing Loss Model.
Interspeech2021
Zhengjun Yue, Jon Barker, Heidi Christensen, Cristina McKean, Elaine Ashton, Yvonne Wren, Swapnil Gadgil, Rebecca Bright, 
Parental Spoken Scaffolding and Narrative Skills in Crowd-Sourced Storytelling Samples of Young Children.
ICASSP2023
Brian Yan, Matthew Wiesner, Ondrej Klejch, Preethi Jyothi, Shinji Watanabe 0001, 
Towards Zero-Shot Code-Switched Speech Recognition.
Interspeech2023
Vineet Bhat, Preethi Jyothi, Pushpak Bhattacharyya, 
DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction.
Interspeech2023
Jie Chi, Brian Lu, Jason Eisner, Peter Bell 0001, Preethi Jyothi, Ahmed M. Ali 0002, 
Unsupervised Code-switched Text Generation from Parallel Text.
Interspeech2023
Tankala Pavan Kalyan, Preeti Rao, Preethi Jyothi, Pushpak Bhattacharyya, 
Narrator or Character: Voice Modulation in an Expressive Multi-speaker TTS.
Interspeech2023
Vinit S. Unni, Ashish R. Mittal, Preethi Jyothi, Sunita Sarawagi, 
Improving RNN-Transducers with Acoustic LookAhead.
ICLR2023
Ashish R. Mittal, Sunita Sarawagi, Preethi Jyothi, 
In-Situ Text-Only Adaptation of Speech Models with Low-Overhead Speech Imputations.
IJCAI2023
Piyush Singh Pasi, Karthikeya Battepati, Preethi Jyothi, Ganesh Ramakrishnan, Tanmay Mahapatra, Manoj Singh, 
Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration.
ACL2023
Suraj Kothawade, Anmol Reddy Mekala, D. Chandra Sekhara Hetha Havya, Mayank Kothyari, Rishabh K. Iyer, Ganesh Ramakrishnan, Preethi Jyothi, 
DITTO: Data-efficient and Fair Targeted Subset Selection for ASR Accent Adaptation.
EMNLP2023
Ashish R. Mittal, Sunita Sarawagi, Preethi Jyothi, George Saon, Gakuto Kurata, 
Speech-enriched Memory for Inference-time Adaptation of ASR Models to Word Dictionaries.
EMNLP2023
Darshan Prabhu, Preethi Jyothi, Sriram Ganapathy, Vinit Unni, 
Accented Speech Recognition With Accent-specific Codebooks.
ICASSP2022
Vinit Unni, Shreya Khare, Ashish R. Mittal, Preethi Jyothi, Sunita Sarawagi, Samarth Bharadwaj, 
Adaptive Discounting of Implicit Language Models in RNN-Transducers.
Interspeech2022
Arjit Jain, Pranay Reddy Samala, Deepak Mittal, Preethi Jyothi, Maneesh Singh 0001, 
SPLICEOUT: A Simple and Efficient Audio Augmentation Method.
Interspeech2022
Rishabh Kumar, Devaraja Adiga, Mayank Kothyari, Jatin Dalal, Ganesh Ramakrishnan, Preethi Jyothi, 
VAgyojaka: An Annotating and Post-Editing Tool for Automatic Speech Recognition.
Interspeech2022
Rishabh Kumar, Devaraja Adiga, Rishav Ranjan, Amrith Krishna, Ganesh Ramakrishnan, Pawan Goyal 0002, Preethi Jyothi, 
Linguistically Informed Post-processing for ASR Error correction in Sanskrit.
EMNLP-Findings2022
Ashish R. Mittal, Durga Sivasubramanian, Rishabh K. Iyer, Preethi Jyothi, Ganesh Ramakrishnan, 
Partitioned Gradient Matching-based Data Subset Selection for Compute-Efficient Robust ASR Training.
ICASSP2021
Abhijeet Awasthi, Aman Kansal, Sunita Sarawagi, Preethi Jyothi, 
Error-Driven Fixed-Budget ASR Personalization for Accented Speakers.
ICASSP2021
Archiki Prasad, Preethi Jyothi, Rajbabu Velmurugan, 
An Investigation of End-to-End Models for Robust Speech Recognition.
Interspeech2021
Anuj Diwan, Preethi Jyothi, 
Reduce and Reconstruct: ASR for Low-Resource Phonetic Languages.
Interspeech2021
Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan K. M., Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish R. Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan, 
MUCS 2021: Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages.
Interspeech2021
Shreya Khare, Ashish R. Mittal, Anuj Diwan, Sunita Sarawagi, Preethi Jyothi, Samarth Bharadwaj, 
Low Resource ASR: The Surprising Effectiveness of High Resource Transliteration.
TASLP2024
Michele Panariello, Natalia A. Tomashenko, Xin Wang 0037, Xiaoxiao Miao, Pierre Champion, Hubert Nourtel, Massimiliano Todisco, Nicholas W. D. Evans, Emmanuel Vincent 0001, Junichi Yamagishi, 
The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation.
ICASSP2024
Wanying Ge, Xin Wang 0037, Junichi Yamagishi, Massimiliano Todisco, Nicholas W. D. Evans, 
Spoofing Attack Augmentation: Can Differently-Trained Attack Models Improve Generalisation?
ICASSP2024
Xiaoxiao Miao, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Nicholas W. D. Evans, Massimiliano Todisco, Jean-François Bonastre, Mickael Rouvier, 
Synvox2: Towards A Privacy-Friendly Voxceleb2 Dataset.
ICASSP2024
Michele Panariello, Francesco Nespoli, Massimiliano Todisco, Nicholas W. D. Evans, 
Speaker Anonymization Using Neural Audio Codec Language Models.
TASLP2023
Xuechen Liu, Xin Wang 0037, Md. Sahidullah, Jose Patino 0001, Héctor Delgado, Tomi Kinnunen, Massimiliano Todisco, Junichi Yamagishi, Nicholas W. D. Evans, Andreas Nautsch, Kong Aik Lee, 
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild.
ICASSP2023
Wanying Ge, Hemlata Tak, Massimiliano Todisco, Nicholas W. D. Evans, 
Can Spoofing Countermeasure And Speaker Verification Systems Be Jointly Optimised?
Interspeech2023
Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang 0037, Xuechen Liu, Md. Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas W. D. Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung, 
Towards Single Integrated Spoofing-aware Speaker Verification Embeddings.
Interspeech2023
Michele Panariello, Wanying Ge, Hemlata Tak, Massimiliano Todisco, Nicholas W. D. Evans, 
Malafide: a novel adversarial convolutive noise attack against deepfake and spoofing detection systems.
Interspeech2023
Michele Panariello, Massimiliano Todisco, Nicholas W. D. Evans, 
Vocoder drift in x-vector-based speaker anonymization.
ICASSP2022
Madhu R. Kamble, Jose Patino 0001, Maria A. Zuluaga, Massimiliano Todisco, 
Exploring Auditory Acoustic Features for The Diagnosis of Covid-19.
ICASSP2022
Hemlata Tak, Madhu R. Kamble, Jose Patino 0001, Massimiliano Todisco, Nicholas W. D. Evans, 
Rawboost: A Raw Data Boosting and Augmentation Method Applied to Automatic Speaker Verification Anti-Spoofing.
ICASSP2021
Hemlata Tak, Jose Patino 0001, Massimiliano Todisco, Andreas Nautsch, Nicholas W. D. Evans, Anthony Larcher, 
End-to-End anti-spoofing with RawNet2.
Interspeech2021
Jose Patino 0001, Natalia A. Tomashenko, Massimiliano Todisco, Andreas Nautsch, Nicholas W. D. Evans, 
Speaker Anonymisation Using the McAdams Coefficient.
Interspeech2021
Oubaïda Chouchane, Baptiste Brossier, Jorge Esteban Gamboa Gamboa, Thomas Lardy, Hemlata Tak, Orhan Ermis, Madhu R. Kamble, Jose Patino 0001, Nicholas W. D. Evans, Melek Önen, Massimiliano Todisco, 
Privacy-Preserving Voice Anti-Spoofing Using Secure Multi-Party Computation.
Interspeech2021
Wanying Ge, Michele Panariello, Jose Patino 0001, Massimiliano Todisco, Nicholas W. D. Evans, 
Partially-Connected Differentiable Architecture Search for Deepfake and Spoofing Detection.
Interspeech2021
Madhu R. Kamble, José Andrés González López, Teresa Grau, Juan M. Espín, Lorenzo Cascioli, Yiqing Huang, Alejandro Gómez Alanís, Jose Patino 0001, Roberto Font, Antonio M. Peinado, Angel M. Gomez, Nicholas W. D. Evans, Maria A. Zuluaga, Massimiliano Todisco, 
PANACEA Cough Sound-Based Diagnosis of COVID-19 for the DiCOVA 2021 Challenge.
Interspeech2021
Tomi Kinnunen, Andreas Nautsch, Md. Sahidullah, Nicholas W. D. Evans, Xin Wang 0037, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee, 
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.
Interspeech2021
Hemlata Tak, Jee-weon Jung, Jose Patino 0001, Massimiliano Todisco, Nicholas W. D. Evans, 
Graph Attention Networks for Anti-Spoofing.
TASLP2020
Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang 0037, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds, 
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.
ICASSP2020
Pramod B. Bachhav, Massimiliano Todisco, Nicholas W. D. Evans, 
Artificial Bandwidth Extension Using Conditional Variational Auto-encoders and Adversarial Learning.
TASLP2024
Tsubasa Ochiai, Kazuma Iwamoto, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, 
Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance.
ICASSP2024
Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, 
How Does End-To-End Speech Recognition Training Impact Speech Enhancement Artifacts?
ICASSP2024
Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Shoko Araki, Jan Cernocký, 
Target Speech Extraction with Pre-Trained Self-Supervised Learning Models.
TASLP2023
Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Yasunori Ohishi, Shoko Araki, 
SoundBeam: Target Sound Extraction Conditioned on Sound-Class Labels and Enrollment Clues for Increased Performance and Continuous Learning.
Interspeech2023
Shoko Araki, Ayako Yamamoto, Tsubasa Ochiai, Kenichi Arai, Atsunori Ogawa, Tomohiro Nakatani, Toshio Irino, 
Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine.
Interspeech2023
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami, 
Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data.
Interspeech2023
Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo, 
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss.
ICASSP2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo, Takafumi Moriya, 
Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition.
Interspeech2022
Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani, 
Listen only to me! How well can target speech extraction handle false alarms?
Interspeech2022
Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, 
How bad are artifacts?: Analyzing the impact of speech enhancement errors on ASR.
Interspeech2022
Martin Kocour, Katerina Zmolíková, Lucas Ondel, Jan Svec, Marc Delcroix, Tsubasa Ochiai, Lukás Burget, Jan Cernocký, 
Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model.
Interspeech2022
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki, 
Streaming Target-Speaker ASR with Neural Transducer.
Interspeech2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.
ICASSP2021
Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach, 
Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.
ICASSP2021
Marc Delcroix, Katerina Zmolíková, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani, 
Speaker Activity Driven Neural Speech Extraction.
ICASSP2021
Takafumi Moriya, Takanori Ashihara, Tomohiro Tanaka, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Yusuke Ijima, Ryo Masumura, Yusuke Shinohara, 
Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.
ICASSP2021
Julio Wissing, Benedikt T. Boenninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura, 
Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain.
Interspeech2021
Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Shoko Araki, 
Few-Shot Learning of New Sound Classes for Target Sound Extraction.
Interspeech2021
Takafumi Moriya, Tomohiro Tanaka, Takanori Ashihara, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Ryo Masumura, Marc Delcroix, Taichi Asami, 
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.
Interspeech2021
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoyuki Kamo, 
Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition.
ICASSP2022
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang 0001, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.
Interspeech2022
Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang 0033, Yonghui Wu, Rob Clark, 
Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks.
ICML2022
Chung-Cheng Chiu, James Qin, Yu Zhang, Jiahui Yu, Yonghui Wu, 
Self-supervised learning with random-projection quantizer for speech recognition.
ICASSP2021
Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang 0033, Ye Jia, Ron J. Weiss, Yonghui Wu, 
Parallel Tacotron: Non-Autoregressive and Controllable TTS.
ICASSP2021
Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang 0001, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.
ICASSP2021
Jiahui Yu, Chung-Cheng Chiu, Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han 0002, Anmol Gulati, Yonghui Wu, Ruoming Pang, 
FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.
Interspeech2021
Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang 0033, Ye Jia, R. J. Skerry-Ryan, Yonghui Wu, 
Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling.
Interspeech2021
Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang 0033, Yonghui Wu, 
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS.
ICLR2021
Jiahui Yu, Wei Han 0002, Anmol Gulati, Chung-Cheng Chiu, Bo Li 0028, Tara N. Sainath, Yonghui Wu, Ruoming Pang, 
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling.
ICASSP2020
Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu, 
Towards Fast and Accurate Streaming End-To-End ASR.
ICASSP2020
Daniel S. Park, Yu Zhang 0033, Chung-Cheng Chiu, Youzheng Chen, Bo Li 0028, William Chan, Quoc V. Le, Yonghui Wu, 
Specaugment on Large Scale Datasets.
ICASSP2020
Tara N. Sainath, Yanzhang He, Bo Li 0028, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li 0133, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Anjuli Kannan, Qiao Liang 0001, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang 0033, Ding Zhao, 
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.
ICASSP2020
Guangzhi Sun, Yu Zhang 0033, Ron J. Weiss, Yuan Cao 0007, Heiga Zen, Andrew Rosenberg, Bhuvana Ramabhadran, Yonghui Wu, 
Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior.
ICASSP2020
Guangzhi Sun, Yu Zhang 0033, Ron J. Weiss, Yuan Cao 0007, Heiga Zen, Yonghui Wu, 
Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis.
ICASSP2020
Gary Wang, Andrew Rosenberg, Zhehuai Chen, Yu Zhang 0033, Bhuvana Ramabhadran, Yonghui Wu, Pedro J. Moreno 0001, 
Improving Speech Recognition Using Consistent Predictions on Synthesized Speech.
Interspeech2020
Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang 0033, Jiahui Yu, Wei Han 0002, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang, 
Conformer: Convolution-augmented Transformer for Speech Recognition.
Interspeech2020
Wei Han 0002, Zhengdong Zhang, Yu Zhang 0033, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu, 
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context.
Interspeech2020
Daniel S. Park, Yu Zhang 0033, Ye Jia, Wei Han 0002, Chung-Cheng Chiu, Bo Li 0028, Yonghui Wu, Quoc V. Le, 
Improved Noisy Student Training for Automatic Speech Recognition.
ICASSP2019
Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang 0001, Deepti Bhatia, Yuan Shangguan, Bo Li 0028, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-Yiin Chang, Kanishka Rao, Alexander Gruenstein, 
Streaming End-to-end Speech Recognition for Mobile Devices.
ICASSP2019
Wei-Ning Hsu, Yu Zhang 0033, Ron J. Weiss, Yu-An Chung, Yuxuan Wang 0002, Yonghui Wu, James R. Glass, 
Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization.
TASLP2024
Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe 0001, 
End-to-End Speech Recognition: A Survey.
ICASSP2023
Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang, 
Variable Attention Masking for Configurable Transformer Transducer Speech Recognition.
ICASSP2022
Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, 
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR.
ICASSP2022
Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori, 
Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy.
ICASSP2022
Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, 
Sequence Transduction with Graph-Based Supervision.
Interspeech2022
Chiori Hori, Takaaki Hori, Jonathan Le Roux, 
Low-Latency Online Streaming VideoQA Using Audio-Visual Transformers.
ICASSP2021
Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training.
ICASSP2021
Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Capturing Multi-Resolution Context by Dilated Self-Attention.
ICASSP2021
Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Semi-Supervised Speech Recognition Via Graph-Based Temporal Classification.
Interspeech2021
Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori, 
Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition.
Interspeech2021
Chiori Hori, Takaaki Hori, Jonathan Le Roux, 
Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers.
Interspeech2021
Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux, 
Advanced Long-Context End-to-End Speech Recognition Using Context-Expanded Transformers.
Interspeech2021
Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition.
TASLP2020
Ruizhi Li, Xiaofei Wang 0007, Sri Harish Mallidi, Shinji Watanabe 0001, Takaaki Hori, Hynek Hermansky, 
Multi-Stream End-to-End Speech Recognition.
ICASSP2020
Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Streaming Automatic Speech Recognition with the Transformer Model.
ICASSP2020
Leda Sari, Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Unsupervised Speaker Adaptation Using Attention-Based Speaker Memory for End-to-End ASR.
Interspeech2020
Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux, 
Transformer-Based Long-Context End-to-End Speech Recognition.
Interspeech2020
Niko Moritz, Gordon Wichern, Takaaki Hori, Jonathan Le Roux, 
All-in-One Transformer: Unifying Speech Recognition, Audio Tagging, and Event Detection.
ICASSP2019
Murali Karthick Baskar, Lukás Burget, Shinji Watanabe 0001, Martin Karafiát, Takaaki Hori, Jan Honza Cernocký, 
Promising Accurate Prefix Boosting for Sequence-to-sequence ASR.
ICASSP2019
Jaejin Cho, Shinji Watanabe 0001, Takaaki Hori, Murali Karthick Baskar, Hirofumi Inaguma, Jesús Villalba 0001, Najim Dehak, 
Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition.
ICASSP2024
Atli Sigurgeirsson, Simon King 0001, 
Controllable Speaking Styles Using A Large Language Model.
ICASSP2023
Atli Þór Sigurgeirsson, Simon King 0001, 
Do Prosody Transfer Models Transfer Prosody?
ICASSP2023
Tian Huey Teh, Vivian Hu, Devang S. Ram Mohan, Zack Hodari, Christopher G. R. Wallis, Tomás Gómez Ibarrondo, Alexandra Torresquintero, James Leoni, Mark J. F. Gales, Simon King 0001, 
Ensemble Prosody Prediction For Expressive Speech Synthesis.
ICASSP2023
Jacob J. Webber, Cassia Valentini-Botinhao, Evelyn Williams, Gustav Eje Henter, Simon King 0001, 
Autovocoder: Fast Waveform Generation from a Learned Speech Representation Using Differentiable Digital Signal Processing.
Interspeech2023
Niamh Corkey, Johannah O'Mahony, Simon King 0001, 
Intonation Control for Neural Text-to-Speech Synthesis with Polynomial Models of F0.
Interspeech2022
Jason Fong, Daniel Lyth, Gustav Eje Henter, Hao Tang, Simon King 0001, 
Speech Audio Corrector: using speech from non-target speakers for one-off correction of mispronunciations in grapheme-input text-to-speech.
Interspeech2022
Sébastien Le Maguer, Simon King 0001, Naomi Harte, 
Back to the Future: Extending the Blizzard Challenge 2013.
Interspeech2022
Johannah O'Mahony, Catherine Lai, Simon King 0001, 
Combining conversational speech with read speech to improve prosody in Text-to-Speech synthesis.
TASLP2021
Berrak Sisman, Junichi Yamagishi, Simon King 0001, Haizhou Li 0001, 
An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning.
Interspeech2021
Devang S. Ram Mohan, Qinmin Vivian Hu, Tian Huey Teh, Alexandra Torresquintero, Christopher G. R. Wallis, Marlene Staib, Lorenzo Foglianti, Jiameng Gao, Simon King 0001, 
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis.
Interspeech2021
Alexandra Torresquintero, Tian Huey Teh, Christopher G. R. Wallis, Marlene Staib, Devang S. Ram Mohan, Vivian Hu, Lorenzo Foglianti, Jiameng Gao, Simon King 0001, 
ADEPT: A Dataset for Evaluating Prosody Transfer.
Interspeech2021
Cassia Valentini-Botinhao, Simon King 0001, 
Detection and Analysis of Attention Errors in Sequence-to-Sequence Text-to-Speech.
TASLP2020
Xin Wang 0037, Shinji Takaki, Junichi Yamagishi, Simon King 0001, Keiichi Tokuda, 
A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis.
ICASSP2020
Ivan Himawan, Sandesh Aryal, Iris Ouyang, Sam Kang, Pierre Lanchantin, Simon King 0001, 
Speaker Adaptation of a Multilingual Acoustic Model for Cross-Language Synthesis.
Interspeech2020
Carol Chermaz, Simon King 0001, 
A Sound Engineering Approach to Near End Listening Enhancement.
Interspeech2020
Jason Fong, Jason Taylor, Simon King 0001, 
Testing the Limits of Representation Mixing for Pronunciation Correction in End-to-End Speech Synthesis.
Interspeech2020
Pilar Oplustil Gallegos, Jennifer Williams 0001, Joanna Rownicka, Simon King 0001, 
An Unsupervised Method to Select a Speaker Subset from Large Multi-Speaker Speech Synthesis Datasets.
Interspeech2020
Jacob J. Webber, Olivier Perrotin, Simon King 0001, 
Hider-Finder-Combiner: An Adversarial Architecture for General Speech Signal Modification.
ICASSP2019
Cheng-I Lai, Alberto Abad, Korin Richmond, Junichi Yamagishi, Najim Dehak, Simon King 0001, 
Attentive Filtering Networks for Audio Replay Attack Detection.
ICASSP2019
Oliver Watts, Cassia Valentini-Botinhao, Simon King 0001, 
Speech Waveform Reconstruction Using Convolutional Neural Networks with Noise and Periodic Inputs.
SpeechComm2024
Simon Stone, Peter Birkholz, 
Monophthong vocal tract shapes are sufficient for articulatory synthesis of German primary diphthongs.
TASLP2024
Yingming Gao, Peter Birkholz, Ya Li, 
Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab and Convolutional Recurrent Neural Networks.
SpeechComm2023
Daniel R. van Niekerk, Anqi Xu, Branislav Gerazov, Paul Konstantin Krug, Peter Birkholz, Lorna F. Halliday, Santitham Prom-on, Yi Xu 0007, 
Simulating vocal learning of spoken language: Beyond imitation.
TASLP2023
Paul Konstantin Krug, Peter Birkholz, Branislav Gerazov, Daniel Rudolph van Niekerk, Anqi Xu, Yi Xu 0007, 
Artificial Vocal Learning Guided by Phoneme Recognition and Visual Information.
Interspeech2023
Paul Konstantin Krug, Peter Birkholz, Branislav Gerazov, Daniel R. van Niekerk, Anqi Xu, Yi Xu 0007, 
Self-Supervised Solution to the Control Problem of Articulatory Synthesis.
TASLP2022
Simon Stone, Yingming Gao, Peter Birkholz, 
Articulatory Synthesis of Vocalized /r/ Allophones in German.
ICASSP2022
Peter Birkholz, P. Häsner, Steffen Kürbis, 
Acoustic Comparison of Physical Vocal Tract Models with Hard and Soft Walls.
ICASSP2022
Hannes Kath, Simon Stone, Stefan Rapp, Peter Birkholz, 
Carina - A Corpus of Aligned German Read Speech Including Annotations.
Interspeech2022
Pouriya Amini Digehsara, João Vítor Possamai de Menezes, Christoph Wagner, Michael Bärhold, Petr Schaffer, Dirk Plettemeier, Peter Birkholz, 
A user-friendly headset for radar-based silent speech recognition.
Interspeech2022
Arne-Lukas Fietkau, Simon Stone, Peter Birkholz, 
Relationship between the acoustic time intervals and tongue movements of German diphthongs.
Interspeech2022
Paul Konstantin Krug, Peter Birkholz, Branislav Gerazov, Daniel Rudolph van Niekerk, Anqi Xu, Yi Xu 0007, 
Articulatory Synthesis for Data Augmentation in Phoneme Recognition.
Interspeech2022
Ingo Langheinrich, Simon Stone, Xinyu Zhang, Peter Birkholz, 
Glottal inverse filtering based on articulatory synthesis and deep learning.
Interspeech2022
Leon Liebig, Christoph Wagner, Alexander Mainka, Peter Birkholz, 
An investigation of regression-based prediction of the femininity or masculinity in speech of transgender people.
Interspeech2022
João Vítor Menezes, Pouriya Amini Digehsara, Christoph Wagner, Marco Mütze, Michael Bärhold, Petr Schaffer, Dirk Plettemeier, Peter Birkholz, 
Evaluation of different antenna types and positions in a stepped frequency continuous-wave radar-based silent speech interface.
Interspeech2022
Debasish Ray Mohapatra, Mario Fleischer, Victor Zappi, Peter Birkholz, Sidney S. Fels, 
Three-dimensional finite-difference time-domain acoustic analysis of simplified vocal tract shapes.
Interspeech2022
Daniel R. van Niekerk, Anqi Xu, Branislav Gerazov, Paul Konstantin Krug, Peter Birkholz, Yi Xu 0007, 
Exploration strategies for articulatory synthesis of complex syllable onsets.
Interspeech2022
Yi Xu 0007, Anqi Xu, Daniel R. van Niekerk, Branislav Gerazov, Peter Birkholz, Paul Konstantin Krug, Santitham Prom-on, Lorna F. Halliday, 
Evoc-Learn - High quality simulation of early vocal learning.
SpeechComm2021
Peter Birkholz, Susanne Drechsel, 
Effects of the piriform fossae, transvelar acoustic coupling, and laryngeal wall vibration on the naturalness of articulatory speech synthesis.
Interspeech2021
Rémi Blandin, Marc Arnela, Simon Félix, Jean-Baptiste Doc, Peter Birkholz, 
Comparison of the Finite Element Method, the Multimodal Method and the Transmission-Line Model for the Computation of Vocal Tract Transfer Functions.
Interspeech2021
Alexander Wilbrandt, Simon Stone, Peter Birkholz, 
Articulatory Data Recorder: A Framework for Real-Time Articulatory Data Recording.
TASLP2024
Mathias Bach Pedersen, Søren Holdt Jensen, Zheng-Hua Tan, Jesper Jensen 0001, 
Data-Driven Non-Intrusive Speech Intelligibility Prediction Using Speech Presence Probability.
ICASSP2024
Holger Severin Bovbjerg, Jesper Jensen 0001, Jan Østergaard, Zheng-Hua Tan, 
Self-Supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions.
ICASSP2024
Amin Edraki, Wai-Yip Chan, Jesper Jensen 0001, Daniel Fogerty, 
Speaker Adaptation For Enhancement Of Bone-Conducted Speech.
ICASSP2024
Philippe Gonzalez, Zheng-Hua Tan, Jan Østergaard, Jesper Jensen 0001, Tommy Sonne Alstrøm, Tobias May, 
Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler.
ICASSP2024
Vikas Tokala, Eric Grinstein, Mike Brookes, Simon Doclo, Jesper Jensen 0001, Patrick A. Naylor, 
Binaural Speech Enhancement Using Deep Complex Convolutional Transformer Networks.
SpeechComm2023
Iván López-Espejo, Amin Edraki, Wai-Yip Chan, Zheng-Hua Tan, Jesper Jensen 0001, 
On the deficiency of intelligibility metrics as proxies for subjective intelligibility.
TASLP2023
Andreas Jonas Fuglsig, Jesper Jensen 0001, Zheng-Hua Tan, Lars Søndergaard Bertelsen, Jens Christian Lindof, Jan Østergaard, 
Minimum Processing Near-End Listening Enhancement.
ICASSP2023
Iván López-Espejo, Ram C. M. C. Shekar, Zheng-Hua Tan, Jesper Jensen 0001, John H. L. Hansen, 
Filterbank Learning for Noise-Robust Small-Footprint Keyword Spotting.
Interspeech2023
Juan Felipe Montesinos, Daniel Michelsanti, Gloria Haro, Zheng-Hua Tan, Jesper Jensen 0001, 
Speech inpainting: Context-based speech synthesis guided by video.
TASLP2022
Poul Hoang, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen 0001, 
Multichannel Speech Enhancement With Own Voice-Based Interfering Speech Suppression for Hearing Assistive Devices.
ICASSP2022
Andreas Jonas Fuglsig, Jan Østergaard, Jesper Jensen 0001, Lars Søndergaard Bertelsen, Peter Mariager, Zheng-Hua Tan, 
Joint Far- and Near-End Speech Intelligibility Enhancement Based on the Approximated Speech Intelligibility Index.
TASLP2021
Amin Edraki, Wai-Yip Chan, Jesper Jensen 0001, Daniel Fogerty, 
Speech Intelligibility Prediction Using Spectro-Temporal Modulation Analysis.
TASLP2021
Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen 0001, 
A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting.
TASLP2021
Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu 0004, Meng Yu 0003, Dong Yu 0001, Jesper Jensen 0001, 
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.
ICASSP2021
Giovanni Morrone, Daniel Michelsanti, Zheng-Hua Tan, Jesper Jensen 0001, 
Audio-Visual Speech Inpainting with Deep Learning.
Interspeech2021
Amin Edraki, Wai-Yip Chan, Jesper Jensen 0001, Daniel Fogerty, 
A Spectro-Temporal Glimpsing Index (STGI) for Speech Intelligibility Prediction.
SpeechComm2020
Daniel Michelsanti, Zheng-Hua Tan, Sigurdur Sigurdsson, Jesper Jensen 0001, 
Deep-learning-based audio-visual speech enhancement in presence of Lombard effect.
TASLP2020
Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen, Jesper Jensen 0001, 
On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement.
TASLP2020
Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen 0001, 
Improved External Speaker-Robust Keyword Spotting for Hearing Assistive Devices.
TASLP2020
Juan M. Martín-Doñas, Jesper Jensen 0001, Zheng-Hua Tan, Angel M. Gomez, Antonio M. Peinado, 
Online Multichannel Speech Enhancement Based on Recursive EM and DNN-Based Speech Presence Estimation.
TASLP2023
Erfan Loweimi, Andrea Carmantini, Peter Bell 0001, Steve Renals, Zoran Cvetkovic, 
Phonetic Error Analysis Beyond Phone Error Rate.
TASLP2023
Erfan Loweimi, Zhengjun Yue, Peter Bell 0001, Steve Renals, Zoran Cvetkovic, 
Multi-Stream Acoustic Modelling Using Raw Real and Imaginary Parts of the Fourier Transform.
TASLP2022
Dino Oglic, Zoran Cvetkovic, Peter Sollich, Steve Renals, Bin Yu 0001, 
Towards Robust Waveform-Based Acoustic Models.
Interspeech2022
Chau Luu, Steve Renals, Peter Bell 0001, 
Investigating the contribution of speaker attributes to speaker separability using disentangled speaker representations.
SpeechComm2021
Manuel Sam Ribeiro, Joanne Cleland, Aciel Eshky, Korin Richmond, Steve Renals, 
Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors.
SpeechComm2021
Aciel Eshky, Joanne Cleland, Manuel Sam Ribeiro, Eleanor Sugden, Korin Richmond, Steve Renals, 
Automatic audiovisual synchronisation for ultrasound tongue imaging.
ICASSP2021
Erfan Loweimi, Zoran Cvetkovic, Peter Bell 0001, Steve Renals, 
Speech Acoustic Modelling from Raw Phase Spectrum.
ICASSP2021
Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell 0001, Steve Renals, 
Train Your Classifier First: Cascade Neural Networks Training from Upper Layers to Lower Layers.
Interspeech2021
Erfan Loweimi, Zoran Cvetkovic, Peter Bell 0001, Steve Renals, 
Speech Acoustic Modelling Using Raw Source and Filter Components.
Interspeech2021
Chau Luu, Peter Bell 0001, Steve Renals, 
Leveraging Speaker Attribute Information Using Multi Task Learning for Speaker Verification and Diarization.
Interspeech2021
Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals, 
Silent versus Modal Multi-Speaker Speech Recognition from Ultrasound and Video.
Interspeech2021
Shucong Zhang, Erfan Loweimi, Peter Bell 0001, Steve Renals, 
Stochastic Attention Head Removal: A Simple and Effective Method for Improving Transformer Based ASR Models.
ICASSP2020
Alberto Abad, Peter Bell 0001, Andrea Carmantini, Steve Renals, 
Cross Lingual Transfer Learning for Zero-Resource Domain Adaptation.
ICASSP2020
Chau Luu, Peter Bell 0001, Steve Renals, 
Channel Adversarial Training for Speaker Verification and Diarization.
ICASSP2020
Joanna Rownicka, Peter Bell 0001, Steve Renals, 
Multi-Scale Octave Convolutions for Robust Speech Recognition.
ICASSP2020
Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Steve Renals, 
Learning Noise Invariant Features Through Transfer Learning For Robust End-to-End Speech Recognition.
Interspeech2020
Ahmed Ali 0002, Steve Renals, 
Word Error Rate Estimation Without ASR Output: e-WER2.
Interspeech2020
Neethu M. Joy, Dino Oglic, Zoran Cvetkovic, Peter Bell 0001, Steve Renals, 
Deep Scattering Power Spectrum Features for Robust Speech Recognition.
Interspeech2020
Erfan Loweimi, Peter Bell 0001, Steve Renals, 
On the Robustness and Training Dynamics of Raw Waveform Models.
Interspeech2020
Erfan Loweimi, Peter Bell 0001, Steve Renals, 
Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling.
TASLP2024
Matthew Baas, Herman Kamper, 
Disentanglement in a GAN for Unconditional Speech Synthesis.
TASLP2024
Leanne Nortje, Dan Oneata, Herman Kamper, 
Visually Grounded Few-Shot Word Learning in Low-Resource Settings.
TASLP2023
Herman Kamper, 
Word Segmentation on Discovered Phone Units With Dynamic Programming and Self-Supervised Scoring.
Interspeech2023
Matthew Baas, Benjamin van Niekerk, Herman Kamper, 
Voice Conversion With Just Nearest Neighbors.
Interspeech2023
Christiaan Jacobs, Nathanaël Carraz Rakotonirina, Everlyn Asiko Chimoto, Bruce A. Bassett, Herman Kamper, 
Towards hate speech detection in low-resource languages: Comparing ASR to acoustic word embeddings on Wolof and Swahili.
Interspeech2023
Ruan van der Merwe, Herman Kamper, 
Mitigating Catastrophic Forgetting for Few-Shot Spoken Word Classification Through Meta-Learning.
Interspeech2023
Leanne Nortje, Benjamin van Niekerk, Herman Kamper, 
Visually grounded few-shot word acquisition with fewer shots.
ICASSP2022
Benjamin van Niekerk, Marc-André Carbonneau, Julian Zaïdi, Matthew Baas, Hugo Seuté, Herman Kamper, 
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.
Interspeech2022
Matthew Baas, Herman Kamper, 
Voice Conversion Can Improve ASR in Very Low-Resource Settings.
Interspeech2022
Werner van der Merwe, Herman Kamper, Johan Adam du Preez, 
A Temporal Extension of Latent Dirichlet Allocation for Unsupervised Acoustic Unit Discovery.
TASLP2021
Herman Kamper, Yevgen Matusevych, Sharon Goldwater, 
Improved Acoustic Word Embeddings for Zero-Resource Languages Using Multilingual Transfer.
Interspeech2021
Christiaan Jacobs, Herman Kamper, 
Multilingual Transfer of Acoustic Word Embeddings Improves When Training on Languages Related to the Target Zero-Resource Language.
Interspeech2021
Herman Kamper, Benjamin van Niekerk, 
Towards Unsupervised Phone and Word Segmentation Using Self-Supervised Vector-Quantized Neural Networks.
Interspeech2021
Benjamin van Niekerk, Leanne Nortje, Matthew Baas, Herman Kamper, 
Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing.
Interspeech2021
Leanne Nortje, Herman Kamper, 
Direct Multimodal Few-Shot Learning of Speech and Images.
Interspeech2021
Kayode Olaleye, Herman Kamper, 
Attention-Based Keyword Localisation in Speech Using Visual Grounding.
ICASSP2020
Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater, 
Cross-Lingual Topic Prediction For Speech Using Translations.
ICASSP2020
Herman Kamper, Yevgen Matusevych, Sharon Goldwater, 
Multilingual Acoustic Word Embedding Models for Processing Zero-resource Languages.
Interspeech2020
Benjamin van Niekerk, Leanne Nortje, Herman Kamper, 
Vector-Quantized Neural Networks for Acoustic Unit Discovery in the ZeroSpeech 2020 Challenge.
Interspeech2020
Leanne Nortje, Herman Kamper, 
Unsupervised vs. Transfer Learning for Multimodal One-Shot Matching of Speech and Images.
ICASSP2023
Kazuhiro Kobayashi, Tomoki Hayashi, Tomoki Toda, 
Low-Latency Electrolaryngeal Speech Enhancement Based on FastSpeech2-Based Voice Conversion and Self-Supervised Speech Representation.
ICASSP2022
Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, 
An Investigation of Streaming Non-Autoregressive sequence-to-sequence Voice Conversion.
ICASSP2022
Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe 0001, Tomoki Toda, 
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations.
Interspeech2022
Jiatong Shi, Shuai Guo, Tao Qian, Tomoki Hayashi, Yuning Wu, Fangzheng Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe 0001, Qin Jin, 
Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis.
TASLP2021
Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda, 
Pretraining Techniques for Sequence-to-Sequence Voice Conversion.
TASLP2021
Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda, 
Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.
TASLP2021
Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda, 
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.
ICASSP2021
Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi 0003, Shinji Watanabe 0001, Kun Wei, Wangyou Zhang, Yuekai Zhang, 
Recent Developments on ESPnet Toolkit Boosted By Conformer.
ICASSP2021
Tomoki Hayashi, Wen-Chin Huang, Kazuhiro Kobayashi, Tomoki Toda, 
Non-Autoregressive Sequence-To-Sequence Voice Conversion.
ICASSP2021
Wen-Chin Huang, Yi-Chiao Wu, Tomoki Hayashi, 
Any-to-One Sequence-to-Sequence Voice Conversion Using Self-Supervised Discrete Speech Representations.
ICASSP2021
Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda, 
Crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder.
Interspeech2021
Tatsuya Komatsu, Shinji Watanabe 0001, Koichi Miyazaki, Tomoki Hayashi, 
Acoustic Event Detection with Classifier Chains.
ICASSP2020
Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe 0001, Tomoki Toda, Kazuya Takeda, Yu Zhang 0033, Xu Tan 0003, 
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit.
ICASSP2020
Katsuki Inoue, Sunao Hara, Masanobu Abe, Tomoki Hayashi, Ryuichi Yamamoto, Shinji Watanabe 0001, 
Semi-Supervised Speaker Adaptation for End-to-End Speech Synthesis with Pretrained Models.
ICASSP2020
Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe 0001, Tomoki Toda, Kazuya Takeda, 
Weakly-Supervised Sound Event Detection with Self-Attention.
ICASSP2020
Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, 
Efficient Shallow WaveNet Vocoder Using Multiple Samples Output Based on Laplacian Distribution and Linear Prediction.
ICASSP2020
Takenori Yoshimura, Tomoki Hayashi, Kazuya Takeda, Shinji Watanabe 0001, 
End-to-End Automatic Speech Recognition Integrated with CTC-Based Voice Activity Detection.
Interspeech2020
Shu Hikosaka, Shogo Seki, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Hideki Banno, Tomoki Toda, 
Intelligibility Enhancement Based on Speech Waveform Modification Using Hearing Impairment.
Interspeech2020
Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda, 
Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining.
Interspeech2020
Patrick Lumban Tobing, Tomoki Hayashi, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda, 
Cyclic Spectral Modeling for Unsupervised Unit Discovery into Voice Conversion with Excitation and Waveform Modeling.
ICASSP2022
Ke Hu, Tara N. Sainath, Arun Narayanan, Ruoming Pang, Trevor Strohman, 
Transducer-Based Streaming Deliberation for Cascaded Encoders.
ICASSP2022
Bo Li 0028, Ruoming Pang, Yu Zhang 0033, Tara N. Sainath, Trevor Strohman, Parisa Haghani, Yun Zhu, Brian Farris, Neeraj Gaur, Manasa Prasad, 
Massively Multilingual ASR: A Lifelong Learning Solution.
ICASSP2022
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang 0001, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.
Interspeech2022
W. Ronny Huang, Cal Peyser, Tara N. Sainath, Ruoming Pang, Trevor D. Strohman, Shankar Kumar, 
Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition.
Interspeech2022
Bo Li 0028, Tara N. Sainath, Ruoming Pang, Shuo-Yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang 0001, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani, 
A Language Agnostic Multilingual Streaming On-Device ASR System.
ICASSP2021
Thibault Doutre, Wei Han 0002, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang 0033, Liangliang Cao, 
Improving Streaming Automatic Speech Recognition with Non-Streaming Model Distillation on Unsupervised Data.
ICASSP2021
Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang 0001, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.
ICASSP2021
Arun Narayanan, Tara N. Sainath, Ruoming Pang, Jiahui Yu, Chung-Cheng Chiu, Rohit Prabhavalkar, Ehsan Variani, Trevor Strohman, 
Cascaded Encoders for Unifying Streaming and Non-Streaming ASR.
ICASSP2021
Zhaofeng Wu, Ding Zhao, Qiao Liang 0001, Jiahui Yu, Anmol Gulati, Ruoming Pang, 
Dynamic Sparsity Neural Networks for Automatic Speech Recognition.
ICASSP2021
Jiahui Yu, Chung-Cheng Chiu, Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han 0002, Anmol Gulati, Yonghui Wu, Ruoming Pang, 
FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.
Interspeech2021
Thibault Doutre, Wei Han 0002, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao, 
Bridging the Gap Between Streaming and Non-Streaming ASR Systems by Distilling Ensembles of CTC and RNN-T Models.
Interspeech2021
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Ruoming Pang, David Rybach, Cyril Allauzen, Ehsan Variani, James Qin, Quoc-Nam Le-The, Shuo-Yiin Chang, Bo Li 0028, Anmol Gulati, Jiahui Yu, Chung-Cheng Chiu, Diamantino Caseiro, Wei Li 0133, Qiao Liang 0001, Pat Rondon, 
An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling.
Interspeech2021
Andros Tjandra, Ruoming Pang, Yu Zhang 0033, Shigeki Karita, 
Unsupervised Learning of Disentangled Speech Content and Style Representation.
ICLR2021
Jiahui Yu, Wei Han 0002, Anmol Gulati, Chung-Cheng Chiu, Bo Li 0028, Tara N. Sainath, Yonghui Wu, Ruoming Pang, 
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling.
ICASSP2020
Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prabhavalkar, 
Deliberation Model Based Two-Pass End-To-End Speech Recognition.
ICASSP2020
Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu, 
Towards Fast and Accurate Streaming End-To-End ASR.
ICASSP2020
Tara N. Sainath, Yanzhang He, Bo Li 0028, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li 0133, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Anjuli Kannan, Qiao Liang 0001, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang 0033, Ding Zhao, 
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.
ICASSP2020
Tara N. Sainath, Ruoming Pang, Ron J. Weiss, Yanzhang He, Chung-Cheng Chiu, Trevor Strohman, 
An Attention-Based Joint Acoustic and Text on-Device End-To-End Model.
Interspeech2020
Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang 0033, Jiahui Yu, Wei Han 0002, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang, 
Conformer: Convolution-augmented Transformer for Speech Recognition.
Interspeech2020
Wei Han 0002, Zhengdong Zhang, Yu Zhang 0033, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu, 
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context.
ICASSP2024
William Chen, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe 0001, 
Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing.
ICASSP2023
Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Roshan S. Sharma, Kohei Matsuura, Shinji Watanabe 0001, 
Speech Summarization of Long Spoken Document: Improving Memory Efficiency of Speech/Text Encoders.
ICASSP2023
Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix, Ryo Masumura, 
Leveraging Large Text Corpora For End-To-End Speech Summarization.
ICASSP2023
Atsunori Ogawa, Takafumi Moriya, Naoyuki Kamo, Naohiro Tawara, Marc Delcroix, 
Iterative Shallow Fusion of Backward Language Model for End-To-End Speech Recognition.
Interspeech2023
Shoko Araki, Ayako Yamamoto, Tsubasa Ochiai, Kenichi Arai, Atsunori Ogawa, Tomohiro Nakatani, Toshio Irino, 
Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine.
Interspeech2023
Marc Delcroix, Naohiro Tawara, Mireia Díez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukás Burget, Shoko Araki, 
Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization.
Interspeech2023
Yuki Kitagishi, Naohiro Tawara, Atsunori Ogawa, Ryo Masumura, Taichi Asami, 
What are differences? Comparing DNN and Human by Their Performance and Characteristics in Speaker Age Estimation.
Interspeech2023
Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, 
Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization.
Interspeech2023
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami, 
Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data.
ICASSP2022
Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe 0001, 
Integrating Multiple ASR Systems into NLP Backend with Attention Fusion.
ICASSP2022
Atsunori Ogawa, Naohiro Tawara, Marc Delcroix, Shoko Araki, 
Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models.
Interspeech2022
Koharu Horii, Meiko Fukuda, Kengo Ohta, Ryota Nishimura, Atsunori Ogawa, Norihide Kitaoka, 
End-to-End Spontaneous Speech Recognition Using Disfluency Labeling.
ICASSP2021
Atsunori Ogawa, Naohiro Tawara, Takatomo Kano, Marc Delcroix, 
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition.
Interspeech2021
Ayako Yamamoto, Toshio Irino, Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, 
Comparison of Remote Experiments Using Crowdsourcing and Laboratory Experiments on Speech Intelligibility.
ICASSP2020
Naohiro Tawara, Hosana Kamiyama, Satoshi Kobashikawa, Atsunori Ogawa, 
Improving Speaker-Attribute Estimation by Voting Based on Speaker Cluster Information.
ICASSP2020
Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Marc Delcroix, Tetsuji Ogawa, 
Frame-Level Phoneme-Invariant Speaker Embedding for Text-Independent Speaker Recognition on Extremely Short Utterances.
Interspeech2020
Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Toshio Irino, 
Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System.
Interspeech2020
Atsunori Ogawa, Naohiro Tawara, Marc Delcroix, 
Language Model Data Augmentation Based on Text Domain Transfer.
ICASSP2019
Michael Hentschel, Marc Delcroix, Atsunori Ogawa, Tomoharu Iwata, Tomohiro Nakatani, 
A Unified Framework for Feature-based Domain Adaptation of Neural Network Language Models.
ICASSP2019
Shigeki Karita, Shinji Watanabe 0001, Tomoharu Iwata, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani, 
Semi-supervised End-to-end Speech Recognition Using Text-to-speech and Autoencoders.
ICASSP2024
Mrinmoy Bhattacharjee, Iuliia Nigmatulina, Amrutha Prasad, Pradeep Rangappa, Srikanth R. Madikeri, Petr Motlícek, Hartmut Helmke, Matthias Kleinert, 
Contextual Biasing Methods for Improving Rare Word Detection in Automatic Speech Recognition.
ICASSP2024
Shashi Kumar, Srikanth R. Madikeri, Iuliia Nigmatulina, Esaú Villatoro-Tello, Petr Motlícek, Karthik Pandia, S. Pavankumar Dubagunta, Aravind Ganapathiraju, 
Multitask Speech Recognition and Speaker Change Detection for Unknown Number of Speakers.
ICASSP2024
Amrutha Prasad, Andrés Carofilis, Geoffroy Vanderreydt, Driss Khalil, Srikanth R. Madikeri, Petr Motlícek, Christof Schüpbach, 
Fine-Tuning Self-Supervised Models for Language Identification Using Orthonormal Constraint.
ICASSP2024
Esaú Villatoro-Tello, Srikanth R. Madikeri, Bidisha Sharma, Driss Khalil, Shashi Kumar, Iuliia Nigmatulina, Petr Motlícek, Aravind Ganapathiraju, 
Probability-Aware Word-Confusion-Network-To-Text Alignment Approach for Intent Classification.
ICASSP2023
Esaú Villatoro-Tello, Srikanth R. Madikeri, Juan Zuluaga-Gomez, Bidisha Sharma, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Petr Motlícek, Alexei V. Ivanov, Aravind Ganapathiraju, 
Effectiveness of Text, Acoustic, and Lattice-Based Representations in Spoken Language Understanding Tasks.
Interspeech2023
Sergio Burdisso, Esaú Villatoro-Tello, Srikanth R. Madikeri, Petr Motlícek, 
Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews.
Interspeech2023
Florian Mai, Juan Zuluaga-Gomez, Titouan Parcollet, Petr Motlícek, 
HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition.
Interspeech2023
Iuliia Nigmatulina, Srikanth R. Madikeri, Esaú Villatoro-Tello, Petr Motlícek, Juan Zuluaga-Gomez, Karthik Pandia, Aravind Ganapathiraju, 
Implementing Contextual Biasing in GPU Decoder for Online ASR.
ICASSP2022
Iuliia Nigmatulina, Juan Zuluaga-Gomez, Amrutha Prasad, Seyyed Saeed Sarfjoo, Petr Motlícek, 
A Two-Step Approach to Leverage Contextual Data: Speech Recognition in Air-Traffic Communications.
ICASSP2021
Rudolf A. Braun, Srikanth R. Madikeri, Petr Motlícek, 
A Comparison of Methods for OOV-Word Recognition on a New Public Dataset.
Interspeech2021
Maël Fabien, Shantipriya Parida, Petr Motlícek, Dawei Zhu, Aravind Krishnan, Hoang H. Nguyen, 
ROXANNE Research Platform: Automate Criminal Investigations.
Interspeech2021
Weipeng He, Petr Motlícek, Jean-Marc Odobez, 
Multi-Task Neural Network for Robust Multiple Speaker Embedding Extraction.
Interspeech2021
Martin Kocour, Karel Veselý, Alexander Blatt, Juan Zuluaga-Gomez, Igor Szöke, Jan Cernocký, Dietrich Klakow, Petr Motlícek, 
Boosting of Contextual Information in ASR for Air-Traffic Call-Sign Recognition.
Interspeech2021
Srikanth R. Madikeri, Petr Motlícek, Hervé Bourlard, 
Multitask Adaptation with Lattice-Free MMI for Multi-Genre Speech Recognition of Low Resource Languages.
Interspeech2021
Oliver Ohneiser, Seyyed Saeed Sarfjoo, Hartmut Helmke, Shruthi Shetty, Petr Motlícek, Matthias Kleinert, Heiko Ehr, Sarunas Murauskas, 
Robust Command Recognition for Lithuanian Air Traffic Control Tower Utterances.
Interspeech2021
Seyyed Saeed Sarfjoo, Srikanth R. Madikeri, Petr Motlícek, 
Speech Activity Detection Based on Multilingual Speech Recognition System.
Interspeech2021
Esaú Villatoro-Tello, S. Pavankumar Dubagunta, Julian Fritsch, Gabriela Ramírez-de-la-Rosa, Petr Motlícek, Mathew Magimai-Doss, 
Late Fusion of the Available Lexicon and Raw Waveform-Based Acoustic Modeling for Depression and Dementia Recognition.
Interspeech2021
Juan Zuluaga-Gomez, Iuliia Nigmatulina, Amrutha Prasad, Petr Motlícek, Karel Veselý, Martin Kocour, Igor Szöke, 
Contextual Semi-Supervised Learning: An Approach to Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems.
ICASSP2020
Banriskhem K. Khonglah, Srikanth R. Madikeri, Subhadeep Dey, Hervé Bourlard, Petr Motlícek, Jayadev Billa, 
Incremental Semi-Supervised Learning for Multi-Genre Speech Recognition.
Interspeech2020
Srikanth R. Madikeri, Banriskhem K. Khonglah, Sibo Tong, Petr Motlícek, Hervé Bourlard, Daniel Povey, 
Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems.
Interspeech2022
Cécile Fougeron, Nicolas Audibert, Ina Kodrasi, Parvaneh Janbakhshi, Michaela Pernon, Nathalie Lévêque, Stephanie Borel, Marina Laganaro, Hervé Bourlard, Frédéric Assal, 
Comparison of 5 methods for the evaluation of intelligibility in mild to moderate French dysarthric speech.
Interspeech2022
Selen Hande Kabil, Hervé Bourlard, 
From Undercomplete to Sparse Overcomplete Autoencoders to Improve LF-MMI based Speech Recognition.
ICASSP2021
Deepak Baby, Hervé Bourlard, 
Speech Dereverberation Using Variational Autoencoders.
ICASSP2021
Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard, 
Automatic Dysarthric Speech Detection Exploiting Pairwise Distance-Based Convolutional Neural Networks.
ICASSP2021
Ina Kodrasi, Michaela Pernon, Marina Laganaro, Hervé Bourlard, 
Automatic And Perceptual Discrimination Between Dysarthria, Apraxia of Speech, and Neurotypical Speech.
ICASSP2021
Apoorv Vyas, Srikanth R. Madikeri, Hervé Bourlard, 
Lattice-Free MMI Adaptation of Self-Supervised Pretrained Acoustic Models.
Interspeech2021
Srikanth R. Madikeri, Petr Motlícek, Hervé Bourlard, 
Multitask Adaptation with Lattice-Free MMI for Multi-Genre Speech Recognition of Low Resource Languages.
Interspeech2021
Apoorv Vyas, Srikanth R. Madikeri, Hervé Bourlard, 
Comparing CTC and LFMMI for Out-of-Domain Adaptation of wav2vec 2.0 Acoustic Model.
SpeechComm2020
Pranay Dighe, Afsaneh Asaei, Hervé Bourlard, 
On quantifying the quality of acoustic models in hybrid DNN-HMM ASR.
TASLP2020
Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard, 
Automatic Pathological Speech Intelligibility Assessment Exploiting Subspace-Based Analyses.
TASLP2020
Ina Kodrasi, Hervé Bourlard, 
Spectro-Temporal Sparsity Characterization for Dysarthric Speech Detection.
TASLP2020
Dhananjay Ram, Lesly Miculicich, Hervé Bourlard, 
Neural Network Based End-to-End Query by Example Spoken Term Detection.
ICASSP2020
Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard, 
Synthetic Speech References for Automatic Pathological Speech Intelligibility Assessment.
ICASSP2020
Banriskhem K. Khonglah, Srikanth R. Madikeri, Subhadeep Dey, Hervé Bourlard, Petr Motlícek, Jayadev Billa, 
Incremental Semi-Supervised Learning for Multi-Genre Speech Recognition.
Interspeech2020
Ina Kodrasi, Michaela Pernon, Marina Laganaro, Hervé Bourlard, 
Automatic Discrimination of Apraxia of Speech and Dysarthria Using a Minimalistic Set of Handcrafted Features.
Interspeech2020
Srikanth R. Madikeri, Banriskhem K. Khonglah, Sibo Tong, Petr Motlícek, Hervé Bourlard, Daniel Povey, 
Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems.
SpeechComm2019
Pranay Dighe, Afsaneh Asaei, Hervé Bourlard, 
Low-rank and sparse subspace modeling of speech for DNN based acoustic modeling.
ICASSP2019
Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard, 
Pathological Speech Intelligibility Assessment Based on the Short-time Objective Intelligibility Measure.
ICASSP2019
Ina Kodrasi, Hervé Bourlard, 
Super-gaussianity of Speech Spectral Coefficients as a Potential Biomarker for Dysarthric Speech Detection.
ICASSP2019
François Marelli, Bastian Schnell, Hervé Bourlard, Thierry Dutoit, Philip N. Garner, 
An End-to-end Network to Synthesize Intonation Using a Generalized Command Response Model.
TASLP2024
Mathias Bach Pedersen, Søren Holdt Jensen, Zheng-Hua Tan, Jesper Jensen 0001, 
Data-Driven Non-Intrusive Speech Intelligibility Prediction Using Speech Presence Probability.
ICASSP2024
Holger Severin Bovbjerg, Jesper Jensen 0001, Jan Østergaard, Zheng-Hua Tan, 
Self-Supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions.
ICASSP2024
Philippe Gonzalez, Zheng-Hua Tan, Jan Østergaard, Jesper Jensen 0001, Tommy Sonne Alstrøm, Tobias May, 
Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler.
ICLR2024
Sarthak Yadav, Sergios Theodoridis, Lars Kai Hansen, Zheng-Hua Tan, 
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners.
SpeechComm2023
Iván López-Espejo, Amin Edraki, Wai-Yip Chan, Zheng-Hua Tan, Jesper Jensen 0001, 
On the deficiency of intelligibility metrics as proxies for subjective intelligibility.
TASLP2023
Andreas Jonas Fuglsig, Jesper Jensen 0001, Zheng-Hua Tan, Lars Søndergaard Bertelsen, Jens Christian Lindof, Jan Østergaard, 
Minimum Processing Near-End Listening Enhancement.
TASLP2023
Yiming Zhang, Hong Yu 0006, Ruoyi Du, Zheng-Hua Tan, Wenwu Wang 0001, Zhanyu Ma, Yuan Dong, 
ACTUAL: Audio Captioning With Caption Feature Space Regularization.
ICASSP2023
Iván López-Espejo, Ram C. M. C. Shekar, Zheng-Hua Tan, Jesper Jensen 0001, John H. L. Hansen, 
Filterbank Learning for Noise-Robust Small-Footprint Keyword Spotting.
Interspeech2023
Juan Felipe Montesinos, Daniel Michelsanti, Gloria Haro, Zheng-Hua Tan, Jesper Jensen 0001, 
Speech inpainting: Context-based speech synthesis guided by video.
TASLP2022
Poul Hoang, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen 0001, 
Multichannel Speech Enhancement With Own Voice-Based Interfering Speech Suppression for Hearing Assistive Devices.
ICASSP2022
Andreas Jonas Fuglsig, Jan Østergaard, Jesper Jensen 0001, Lars Søndergaard Bertelsen, Peter Mariager, Zheng-Hua Tan, 
Joint Far- and Near-End Speech Intelligibility Enhancement Based on the Approximated Speech Intelligibility Index.
ICASSP2022
Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie 0001, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma 0001, Xin Xu, Hui Bu, 
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
Interspeech2022
Claus M. Larsen, Peter Koch 0001, Zheng-Hua Tan, 
Adversarial Multi-Task Deep Learning for Noise-Robust Voice Activity Detection with Low Algorithmic Delay.
TASLP2021
Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen 0001, 
A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting.
TASLP2021
Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu 0004, Meng Yu 0003, Dong Yu 0001, Jesper Jensen 0001, 
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.
ICASSP2021
Giovanni Morrone, Daniel Michelsanti, Zheng-Hua Tan, Jesper Jensen 0001, 
Audio-Visual Speech Inpainting with Deep Learning.
SpeechComm2020
Daniel Michelsanti, Zheng-Hua Tan, Sigurdur Sigurdsson, Jesper Jensen 0001, 
Deep-learning-based audio-visual speech enhancement in presence of Lombard effect.
TASLP2020
Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen, Jesper Jensen 0001, 
On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement.
TASLP2020
Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen 0001, 
Improved External Speaker-Robust Keyword Spotting for Hearing Assistive Devices.
TASLP2020
Juan M. Martín-Doñas, Jesper Jensen 0001, Zheng-Hua Tan, Angel M. Gomez, Antonio M. Peinado, 
Online Multichannel Speech Enhancement Based on Recursive EM and DNN-Based Speech Presence Estimation.
TASLP2024
Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien, 
Contrastive Self-Supervised Speaker Embedding With Sequential Disentanglement.
ICASSP2024
Bobbi Aditya, Mahdin Rohmatillah, Liang-Hsuan Tai, Jen-Tzung Chien, 
Attention-Guided Adaptation for Code-Switching Speech Recognition.
ICASSP2024
Chong-Xin Gan, Man-Wai Mak, Weiwei Lin 0002, Jen-Tzung Chien, 
Asymmetric Clean Segments-Guided Self-Supervised Learning for Robust Speaker Verification.
ICASSP2024
Mahdin Rohmatillah, Jen-Tzung Chien, 
Revise the NLU: A Prompting Strategy for Robust Dialogue System.
ICASSP2024
Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien, 
Contrastive Speaker Embedding With Sequential Disentanglement.
TASLP2023
Mahdin Rohmatillah, Jen-Tzung Chien, 
Hierarchical Reinforcement Learning With Guidance for Multi-Domain Dialogue Policy.
ICASSP2023
Ming-Yen Chen, Mahdin Rohmatillah, Ching-Hsien Lee, Jen-Tzung Chien, 
Meta Learning for Domain Agnostic Soft Prompt.
ICASSP2023
Jen-Tzung Chien, Yuan-An Chen, 
Self-Supervised Adversarial Training for Contrastive Sentence Embedding.
Interspeech2023
Jen-Tzung Chien, Shang-En Li, 
Contrastive Disentangled Learning for Memory-Augmented Transformer.
Interspeech2023
Mahdin Rohmatillah, Bobbi Aditya, Li-Jen Yang, Bryan Gautama Ngo, Willianto Sulaiman, Jen-Tzung Chien, 
Promoting Mental Self-Disclosure in a Spoken Dialogue System.
Interspeech2023
Li-Jen Yang, Chao-Han Huck Yang, Jen-Tzung Chien, 
Parameter-Efficient Learning for Text-to-Speech Accent Adaptation.
ICASSP2022
Chang-Ting Chu, Mahdin Rohmatillah, Ching-Hsien Lee, Jen-Tzung Chien, 
Augmentation Strategy Optimization for Language Understanding.
ICASSP2022
Hou Lio, Shang-En Li, Jen-Tzung Chien, 
Adversarial Mask Transformer for Sequential Learning.
Interspeech2022
Jen-Tzung Chien, Yu-Han Huang, 
Bayesian Transformer Using Disentangled Mask Attention.
ICASSP2021
Sheng-Jhe Huang, Jen-Tzung Chien, 
Attribute Decomposition for Flow-Based Domain Mapping.
ICASSP2021
Tien-Ching Luo, Jen-Tzung Chien, 
Variational Dialogue Generation with Normalizing Flows.
Interspeech2021
Chi-Hang Leong, Yu-Han Huang, Jen-Tzung Chien, 
Online Compressive Transformer for End-to-End Speech Recognition.
Interspeech2021
Mahdin Rohmatillah, Jen-Tzung Chien, 
Causal Confusion Reduction for Robust Multi-Domain Dialogue Policy.
TASLP2020
Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien, 
Variational Domain Adversarial Learning With Mutual Information Maximization for Speaker Verification.
ICASSP2020
Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien, 
Information Maximized Variational Domain Adversarial Learning for Speaker Verification.
TASLP2024
Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien, 
Contrastive Self-Supervised Speaker Embedding With Sequential Disentanglement.
ICASSP2024
Chong-Xin Gan, Man-Wai Mak, Weiwei Lin 0002, Jen-Tzung Chien, 
Asymmetric Clean Segments-Guided Self-Supervised Learning for Robust Speaker Verification.
ICASSP2024
Zhe Li, Man-Wai Mak, Helen Mei-Ling Meng, 
Dual Parameter-Efficient Fine-Tuning for Speaker Representation Via Speaker Prompt Tuning and Adapters.
ICASSP2024
Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien, 
Contrastive Speaker Embedding With Sequential Disentanglement.
ICASSP2024
Lishi Zuo, Man-Wai Mak, Youzhi Tu, 
Promoting Independence of Depression and Speaker Features for Speaker Disentanglement in Speech-Based Depression Detection.
TASLP2023
Weiwei Lin 0002, Man-Wai Mak, 
Robust Speaker Verification Using Deep Weight Space Ensemble.
TASLP2023
Weiwei Lin 0002, Man-Wai Mak, 
Model-Agnostic Meta-Learning for Fast Text-Dependent Speaker Embedding Adaptation.
ICASSP2023
Xiaoquan Ke, Man-Wai Mak, Helen M. Meng, 
Feature Selection and Text Embedding for Detecting Dementia from Spontaneous Cantonese.
ICASSP2023
Zhe Li, Man-Wai Mak, Helen Mei-Ling Meng, 
Discriminative Speaker Representation Via Contrastive Learning with Class-Aware Attention in Angular Space.
Interspeech2023
Helen Meng, Brian Mak, Man-Wai Mak, Helene H. Fung, Xianmin Gong, Timothy C. Y. Kwok, Xunying Liu, Vincent C. T. Mok, Patrick C. M. Wong, Jean Woo, Xixin Wu, Ka Ho Wong, Sean Shensheng Xu, Naijun Zheng, Ranzo Huang, Jiawen Kang 0002, Xiaoquan Ke, Junan Li, Jinchao Li, Yi Wang, 
Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders.
ICML2023
Weiwei Lin 0002, Chenhang He, Man-Wai Mak, Youzhi Tu, 
Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations.
TASLP2022
Weiwei Lin 0002, Man-Wai Mak, 
Mixture Representation Learning for Deep Speaker Embedding.
TASLP2022
Youzhi Tu, Man-Wai Mak, 
Aggregating Frame-Level Information in the Spectral Domain With Self-Attention for Speaker Embedding.
ICASSP2022
Weiwei Lin 0002, Man-Wai Mak, 
Robust Speaker Verification Using Population-Based Data Augmentation.
ICASSP2022
Lu Yi 0001, Man-Wai Mak, 
Disentangled Speaker Embedding for Robust Speaker Verification.
Interspeech2022
Zhenke Gao, Man-Wai Mak, Weiwei Lin 0002, 
UNet-DenseNet for Robust Far-Field Speaker Verification.
Interspeech2022
Xiaoquan Ke, Man-Wai Mak, Helen M. Meng, 
Automatic Selection of Discriminative Features for Dementia Detection in Cantonese-Speaking People.
ICASSP2021
Jinchao Li, Jianwei Yu, Zi Ye 0001, Simon Wong, Man-Wai Mak, Brian Mak, Xunying Liu, Helen Meng, 
A Comparative Study of Acoustic and Linguistic Features Classification for Alzheimer's Disease Detection.
ICASSP2021
Youzhi Tu, Man-Wai Mak, 
Short-Time Spectral Aggregation for Speaker Embedding.
Interspeech2021
Youzhi Tu, Man-Wai Mak, 
Mutual Information Enhanced Training for Speaker Embedding.
TASLP2024
Zhong-Qiu Wang, 
USDnet: Unsupervised Speech Dereverberation via Neural Forward Filtering.
ICASSP2024
Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhongqiu Wang, Shinji Watanabe 0001, 
Boosting Unknown-Number Speaker Separation with Transformer Decoder-Based Attractor.
ICASSP2024
Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang 0029, Hongbo Lan, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao, 
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
TASLP2023
Darius Petermann, Gordon Wichern, Aswin Shanmugam Subramanian, Zhong-Qiu Wang, Jonathan Le Roux, 
Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks.
TASLP2023
Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe 0001, 
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation.
TASLP2023
Zhong-Qiu Wang, Gordon Wichern, Shinji Watanabe 0001, Jonathan Le Roux, 
STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency.
ICASSP2023
Samuele Cornell, Zhong-Qiu Wang, Yoshiki Masuyama, Shinji Watanabe 0001, Manuel Pariente, Nobutaka Ono, Stefano Squartini, 
Multi-Channel Speaker Extraction with Adversarial Training: The Wavlab Submission to The Clarity ICASSP 2023 Grand Challenge.
ICASSP2023
Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe 0001, 
TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation.
ICASSP2023
Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe 0001, 
Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated full- and sub-band Modeling.
NeurIPS2023
Zhong-Qiu Wang, Shinji Watanabe 0001, 
UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures.
TASLP2022
Ke Tan 0001, Zhong-Qiu Wang, DeLiang Wang, 
Neural Spectrospatial Filtering.
ICASSP2022
Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe 0001, Alexander Richard, Cheng Yu, Yu Tsao 0001, 
Conditional Diffusion Probabilistic Model for Speech Enhancement.
ICASSP2022
Darius Petermann, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux, 
The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks.
ICASSP2022
Zhong-Qiu Wang, DeLiang Wang, 
Localization based Sequential Grouping for Continuous Speech Separation.
Interspeech2022
Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao 0001, Yanmin Qian, Shinji Watanabe 0001, 
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.
TASLP2021
Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux, 
Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation.
ICASSP2021
Zhong-Qiu Wang, DeLiang Wang, 
Count And Separate: Incorporating Speaker Counting For Continuous Speaker Separation.
TASLP2020
Hassan Taherian, Zhong-Qiu Wang, Jorge Chang, DeLiang Wang, 
Robust Speaker Recognition Based on Single-Channel and Multi-Channel Speech Enhancement.
TASLP2020
Zhong-Qiu Wang, DeLiang Wang, 
Deep Learning Based Target Cancellation for Speech Dereverberation.
TASLP2020
Zhong-Qiu Wang, Peidong Wang, DeLiang Wang, 
Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR.
ICASSP2024
Shun Lei, Yixuan Zhou 0002, Liyang Chen, Dan Luo, Zhiyong Wu 0001, Xixin Wu, Shiyin Kang, Tao Jiang, Yahui Zhou, Yuxing Han 0001, Helen Meng, 
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts.
ICASSP2024
Xingda Li, Fan Zhuo, Dan Luo, Jun Chen 0024, Shiyin Kang, Zhiyong Wu 0001, Tao Jiang, Yang Li, Han Fang, Yahui Zhou, 
Generating Stereophonic Music with Single-Stage Language Models.
ICASSP2024
Zhiwei Lin, Jun Chen 0024, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju, Fan Fan, Shiyin Kang, Zhiyong Wu 0001, Helen Meng, 
Multi-View MidiVAE: Fusing Track- and Bar-View Representations for Long Multi-Track Symbolic Music Generation.
TASLP2023
Shun Lei, Yixuan Zhou 0002, Liyang Chen, Zhiyong Wu 0001, Xixin Wu, Shiyin Kang, Helen Meng, 
MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis.
ICASSP2023
Shun Lei, Yixuan Zhou 0002, Liyang Chen, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis.
ICASSP2023
Weinan Tong, Jiaxu Zhu, Jun Chen 0024, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
TFCnet: Time-Frequency Domain Corrector for Speech Separation.
ICASSP2023
Yaoxun Xu, Baiji Liu, Qiaochu Huang, Xingchen Song, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
CB-Conformer: Contextual Biasing Conformer for Biased Word Recognition.
Interspeech2023
Weiqin Li, Shun Lei, Qiaochu Huang, Yixuan Zhou 0002, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis.
ICASSP2022
Jun Chen 0024, Zilin Wang, Deyi Tuo, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement.
ICASSP2022
Shun Lei, Yixuan Zhou 0002, Liyang Chen, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
ICASSP2022
Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu 0001, Shiyin Kang, Deyi Tuo, Helen Meng, 
Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion.
Interspeech2022
Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu 0001, Helen Meng, 
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information.
Interspeech2022
Shun Lei, Yixuan Zhou 0002, Liyang Chen, Jiankun Hu, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
Interspeech2022
Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information.
TASLP2021
Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Shiyin Kang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Exemplar-Based Emotive Speech Synthesis.
ICASSP2021
Jie Wang, Yuren You, Feng Liu, Deyi Tuo, Shiyin Kang, Zhiyong Wu 0001, Helen Meng, 
The Huya Multi-Speaker and Multi-Style Speech Synthesis System for M2VoC Challenge 2020.
Interspeech2021
Hui Lu, Zhiyong Wu 0001, Xixin Wu, Xu Li 0015, Shiyin Kang, Xunying Liu, Helen Meng, 
VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis.
Interspeech2021
Jie Wang, Jingbei Li, Xintao Zhao, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Adversarially Learning Disentangled Speech Representations for Robust Multi-Factor Voice Conversion.
ICASSP2020
Yuewen Cao, Songxiang Liu, Xixin Wu, Shiyin Kang, Peng Liu, Zhiyong Wu 0001, Xunying Liu, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.
ICASSP2020
Songxiang Liu, Disong Wang, Yuewen Cao, Lifa Sun, Xixin Wu, Shiyin Kang, Zhiyong Wu 0001, Xunying Liu, Dan Su 0002, Dong Yu 0001, Helen Meng, 
End-To-End Accent Conversion Without Using Native Utterances.
TASLP2023
Haohan Guo, Fenglong Xie, Xixin Wu, Frank K. Soong, Helen Meng, 
MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS.
Interspeech2023
Yujia Xiao, Shaofei Zhang, Xi Wang 0016, Xu Tan 0003, Lei He 0005, Sheng Zhao, Frank K. Soong, Tan Lee 0001, 
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading.
TASLP2022
Xiaochun An, Frank K. Soong, Lei Xie 0001, 
Disentangling Style and Speaker Attributes for TTS Style Transfer.
TASLP2022
Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie 0001, 
ParaTTS: Learning Linguistic and Prosodic Cross-Sentence Information in Paragraph-Based TTS.
ICASSP2022
Shaoguang Mao, Frank K. Soong, Yan Xia 0005, Jonathan Tien, 
A Universal Ordinal Regression for Assessing Phoneme-Level Pronunciation.
ICASSP2022
Yujia Xiao, Xi Wang 0016, Lei He 0005, Frank K. Soong, 
Improving FastSpeech TTS with Efficient Self-Attention and Compact Feed-Forward Network.
ICASSP2022
Wenxuan Ye, Shaoguang Mao, Frank K. Soong, Wenshan Wu, Yan Xia 0005, Jonathan Tien, Zhiyong Wu 0001, 
An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings.
Interspeech2022
Mutian He 0001, Jingzhou Yang, Lei He 0005, Frank K. Soong, 
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge.
Interspeech2022
Haohan Guo, Feng-Long Xie, Frank K. Soong, Xixin Wu, Helen Meng, 
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS.
ICASSP2021
Liping Chen, Yan Deng, Xi Wang 0016, Frank K. Soong, Lei He 0005, 
Speech BERT Embedding for Improving Prosody in Neural TTS.
ICASSP2021
Yichong Leng, Xu Tan 0003, Sheng Zhao, Frank K. Soong, Xiang-Yang Li 0001, Tao Qin 0001, 
MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network.
ICASSP2021
Bin Su, Shaoguang Mao, Frank K. Soong, Yan Xia 0005, Jonathan Tien, Zhiyong Wu 0001, 
Improving Pronunciation Assessment Via Ordinal Regression with Anchored Reference Samples.
ICASSP2021
Feng-Long Xie, Xinhui Li, Wen-Chao Su, Li Lu, Frank K. Soong, 
A New High Quality Trajectory Tiling Based Hybrid TTS In Real Time.
Interspeech2021
Xiaochun An, Frank K. Soong, Lei Xie 0001, 
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-End Neural TTS.
ICASSP2020
Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank K. Soong, Hong-Goo Kang, 
Improving LPCNet-Based Text-to-Speech with Linear Prediction-Structured Mixture Density Network.
ICASSP2020
Yujia Xiao, Lei He 0005, Huaiping Ming, Frank K. Soong, 
Improving Prosody with Linguistic and BERT Derived Features in Multi-Speaker Based Mandarin Chinese Neural TTS.
ICASSP2020
Feng-Long Xie, Xinhui Li, Bo Liu, Yibin Zheng, Li Meng, Li Lu, Frank K. Soong, 
An Improved Frame-Unit-Selection Based Voice Conversion System Without Parallel Training Data.
Interspeech2020
Yang Cui, Xi Wang 0016, Lei He 0005, Frank K. Soong, 
An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis.
Interspeech2020
Yuanbo Hou, Frank K. Soong, Jian Luan 0001, Shengchen Li, 
Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music.
SpeechComm2019
Feng-Long Xie, Frank K. Soong, Haifeng Li 0001, 
Voice conversion with SI-DNN and KL divergence based mapping without parallel training data.
TASLP2024
Zhihua Fang, Liang He 0003, Lin Li 0032, Ying Hu 0005, 
Improving Speaker Verification With Noise-Aware Label Ensembling and Sample Selection: Learning and Correcting Noisy Speaker Labels.
ICASSP2024
Jian Zhang, Jing Ma, Xiaochen Guo, Lin Li 0032, Liang He 0003, 
A Speaker Recognition Method Based on Stable Learning.
ICASSP2023
Zhicong Chen, Jie Wang, Wenxuan Hu, Lin Li 0032, Qingyang Hong, 
Unsupervised Speaker Verification Using Pre-Trained Model and Label Correction.
ICASSP2023
Dexin Liao, Tao Jiang 0033, Feng Wang, Lin Li 0032, Qingyang Hong, 
Towards A Unified Conformer Structure: from ASR to ASV Task.
ICASSP2023
Qiulin Wang, Wenxuan Hu, Lin Li 0032, Qingyang Hong, 
Meta Learning with Adaptive Loss Weight for Low-Resource Speech Recognition.
Interspeech2023
Zhihua Fang, Liang He 0003, Hanhan Ma, Xiaochen Guo, Lin Li 0032, 
Robust Training for Speaker Verification against Noisy Labels.
Interspeech2023
Feng Wang, Lingyan Huang, Tao Li, Qingyang Hong, Lin Li 0032, 
Conformer-based Language Embedding with Self-Knowledge Distillation for Spoken Language Identification.
TASLP2022
Lin Li 0032, Fuchuan Tong, Qingyang Hong, 
When Speaker Recognition Meets Noisy Labels: Optimizations for Front-Ends and Back-Ends.
ICASSP2022
Fuchuan Tong, Siqi Zheng, Min Zhang, Yafeng Chen, Hongbin Suo, Qingyang Hong, Lin Li 0032, 
Graph Convolutional Network Based Semi-Supervised Learning on Multi-Speaker Meeting Data.
Interspeech2022
Jie Wang, Yuji Liu, Binling Wang, Yiming Zhi, Song Li, Shipeng Xia, Jiayang Zhang, Feng Tong, Lin Li 0032, Qingyang Hong, 
Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting.
Interspeech2022
Binling Wang, Feng Wang, Wenxuan Hu, Qiulin Wang, Jing Li, Dong Wang 0013, Lin Li 0032, Qingyang Hong, 
Oriental Language Recognition (OLR) 2021: Summary and Analysis.
ICASSP2021
Song Li, Beibei Ouyang, Lin Li 0032, Qingyang Hong, 
Light-TTS: Lightweight Multi-Speaker Multi-Lingual Text-to-Speech.
ICASSP2021
Song Li, Beibei Ouyang, Dexin Liao, Shipeng Xia, Lin Li 0032, Qingyang Hong, 
End-To-End Multi-Accent Speech Recognition with Unsupervised Accent Modelling.
ICASSP2021
Fuchuan Tong, Miao Zhao, Jianfeng Zhou, Hao Lu, Zheng Li, Lin Li 0032, Qingyang Hong, 
ASV-SUBTOOLS: Open Source Toolkit for Automatic Speaker Verification.
Interspeech2021
Zheng Li, Yan Liu, Lin Li 0032, Qingyang Hong, 
Additive Phoneme-Aware Margin Softmax Loss for Language Recognition.
Interspeech2021
Song Li, Beibei Ouyang, Fuchuan Tong, Dexin Liao, Lin Li 0032, Qingyang Hong, 
Real-Time End-to-End Monaural Multi-Speaker Speech Recognition.
Interspeech2021
Jing Li, Binling Wang, Yiming Zhi, Zheng Li, Lin Li 0032, Qingyang Hong, Dong Wang 0013, 
Oriental Language Recognition (OLR) 2020: Summary and Analysis.
Interspeech2021
Dexin Liao, Jing Li, Yiming Zhi, Song Li, Qingyang Hong, Lin Li 0032, 
An Integrated Framework for Two-Pass Personalized Voice Trigger.
Interspeech2021
Yan Liu, Zheng Li, Lin Li 0032, Qingyang Hong, 
Phoneme-Aware and Channel-Wise Attentive Learning for Text Dependent Speaker Verification.
Interspeech2021
Fuchuan Tong, Yan Liu, Song Li, Jie Wang, Lin Li 0032, Qingyang Hong, 
Automatic Error Correction for Speaker Embedding Learning with Noisy Labels.
TASLP2024
Qijie Shao, Pengcheng Guo, Jinghao Yan, Pengfei Hu 0004, Lei Xie 0001, 
Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition.
TASLP2024
Jixun Yao, Qing Wang 0039, Pengcheng Guo, Ziqian Ning, Lei Xie 0001, 
Distinctive and Natural Speaker Anonymization via Singular Value Transformation-Assisted Matrix.
ICASSP2024
Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-Weon Jung, Yichen Lu, Soumi Maiti, Roshan S. Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe 0001, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang, 
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.
ICASSP2024
He Wang, Pengcheng Guo, Pan Zhou, Lei Xie 0001, 
MLCA-AVSR: Multi-Layer Cross Attention Fusion Based Audio-Visual Speech Recognition.
TASLP2023
Qing Wang 0039, Jixun Yao, Li Zhang 0106, Pengcheng Guo, Lei Xie 0001, 
Timbre-Reserved Adversarial Attack in Speaker Identification.
ICASSP2023
Pengcheng Guo, He Wang, Bingshen Mu, Ao Zhang, Peikun Chen, 
The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge.
ICASSP2023
Jixun Yao, Yi Lei, Qing Wang 0039, Pengcheng Guo, Ziqian Ning, Lei Xie 0001, Hai Li, Junhui Liu, Danming Xie, 
Preserving Background Sound in Noise-Robust Voice Conversion Via Multi-Task Learning.
ICASSP2023
Jixun Yao, Qing Wang 0039, Yi Lei, Pengcheng Guo, Lei Xie 0001, Namin Wang, Jie Liu, 
Distinguishable Speaker Anonymization Based on Formant and Fundamental Frequency Scaling.
ICASSP2023
Ao Zhang, He Wang, Pengcheng Guo, Yihui Fu, Lei Xie 0001, Yingying Gao, Shilei Zhang, Junlan Feng, 
VE-KWS: Visual Modality Enhanced End-to-End Keyword Spotting.
Interspeech2023
Qing Wang 0039, Jixun Yao, Ziqian Wang, Pengcheng Guo, Lei Xie 0001, 
Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification.
Interspeech2023
Kaixun Huang, Ao Zhang, Zhanheng Yang, Pengcheng Guo, Bingshen Mu, Tianyi Xu, Lei Xie 0001, 
Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network.
Interspeech2023
Yuhao Liang, Fan Yu, Yangze Li, Pengcheng Guo, Shiliang Zhang, Qian Chen 0003, Lei Xie 0001, 
BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR.
Interspeech2023
Tianyi Xu, Zhanheng Yang, Kaixun Huang, Pengcheng Guo, Ao Zhang, Biao Li, Changru Chen, Chao Li, Lei Xie 0001, 
Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition.
Interspeech2023
Hongfei Xue, Qijie Shao, Peikun Chen, Pengcheng Guo, Lei Xie 0001, Jie Liu, 
TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning for Cross-lingual Speech Recognition.
ICASSP2022
Fan Yu, Shiliang Zhang, Yihui Fu, Lei Xie 0001, Siqi Zheng, Zhihao Du, Weilong Huang, Pengcheng Guo, Zhijie Yan, Bin Ma 0001, Xin Xu, Hui Bu, 
M2Met: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
ICASSP2022
Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie 0001, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma 0001, Xin Xu, Hui Bu, 
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
ICASSP2022
Binbin Zhang, Hang Lv 0001, Pengcheng Guo, Qijie Shao, Chao Yang 0031, Lei Xie 0001, Xin Xu, Hui Bu, Xiaoyu Chen, Chenchen Zeng, Di Wu 0061, Zhendong Peng, 
WenetSpeech: A 10000+ Hours Multi-Domain Mandarin Corpus for Speech Recognition.
Interspeech2022
Qijie Shao, Jinghao Yan, Jian Kang 0006, Pengcheng Guo, Xian Shi, Pengfei Hu 0004, Lei Xie 0001, 
Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition.
Interspeech2022
Kun Wei, Pengcheng Guo, Ning Jiang, 
Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism.
ICASSP2021
Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi 0003, Shinji Watanabe 0001, Kun Wei, Wangyou Zhang, Yuekai Zhang, 
Recent Developments on ESPnet Toolkit Boosted By Conformer.
NAACL2024
Patrick Foley, Matthew Wiesner, Bismarck Odoom, Leibny Paola García-Perera, Kenton Murray, Philipp Koehn, 
Where are you from? Geolocating Speech and Applications to Language Identification.
TASLP2023
Shota Horiguchi, Shinji Watanabe 0001, Paola García 0001, Yuki Takashima, Yohei Kawaguchi, 
Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors.
ICASSP2023
Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola García, Hung-Yi Lee, Shinji Watanabe 0001, Sanjeev Khudanpur, 
EURO: ESPnet Unsupervised ASR Open-Source Toolkit.
ICASSP2023
Zili Huang, Desh Raj, Paola García 0001, Sanjeev Khudanpur, 
Adapting Self-Supervised Models to Multi-Talker Speech Recognition Using Speaker Embeddings.
ICASSP2023
Ruizhe Huang, Matthew Wiesner, Leibny Paola García-Perera, Daniel Povey, Jan Trmal, Sanjeev Khudanpur, 
Building Keyword Search System from End-To-End ASR Systems.
ICASSP2023
Hexin Liu, Haihua Xu, Leibny Paola García, Andy W. H. Khong, Yi He, Sanjeev Khudanpur, 
Reducing Language Confusion for Code-Switching Speech Recognition with Token-Level Language Diarization.
ICASSP2023
Jiatong Shi, Chan-Jan Hsu, Ho-Lam Chung, Dongji Gao, Paola García 0001, Shinji Watanabe 0001, Ann Lee 0001, Hung-Yi Lee, 
Bridging Speech and Textual Pre-Trained Models With Unsupervised ASR.
Interspeech2023
Jesús Villalba 0001, Jonas Borgstrom, Maliha Jahan, Saurabh Kataria, Leibny Paola García, Pedro A. Torres-Carrasquillo, Najim Dehak, 
Advances in Language Recognition in Low Resource African Languages: The JHU-MIT Submission for NIST LRE22.
Interspeech2023
Yi Han Victoria Chua, Hexin Liu, Leibny Paola García, Fei Ting Woon, Jinyi Wong, Xiangyu Zhang, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels, Suzy J. Styles, 
MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization.
Interspeech2023
Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola García, Daniel Povey, Sanjeev Khudanpur, 
Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts.
Interspeech2023
Suzy J. Styles, Yi Han Victoria Chua, Fei Ting Woon, Hexin Liu, Leibny Paola García, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels, 
Investigating model performance in language identification: beyond simple error statistics.
TASLP2022
Shota Horiguchi, Yusuke Fujita, Shinji Watanabe 0001, Yawen Xue, Paola García 0001, 
Encoder-Decoder Based Attractors for End-to-End Neural Diarization.
ICASSP2022
Zili Huang, Shinji Watanabe 0001, Shu-Wen Yang, Paola García 0001, Sanjeev Khudanpur, 
Investigating Self-Supervised Learning for Speech Enhancement and Separation.
Interspeech2022
Hexin Liu, Leibny Paola García-Perera, Andy W. H. Khong, Suzy J. Styles, Sanjeev Khudanpur, 
PHO-LID: A Unified Model Incorporating Acoustic-Phonetic and Phonotactic Information for Language Identification.
Interspeech2022
Yuki Takashima, Shota Horiguchi, Shinji Watanabe 0001, Leibny Paola García-Perera, Yohei Kawaguchi, 
Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models.
ICASSP2021
Shota Horiguchi, Paola García 0001, Yusuke Fujita, Shinji Watanabe 0001, Kenji Nagamatsu, 
End-To-End Speaker Diarization as Post-Processing.
Interspeech2021
Hexin Liu, Leibny Paola García-Perera, Xinyi Zhang, Justin Dauwels, Andy W. H. Khong, Sanjeev Khudanpur, Suzy J. Styles, 
End-to-End Language Diarization for Bilingual Code-Switching Speech.
Interspeech2021
Yuki Takashima, Yusuke Fujita, Shota Horiguchi, Shinji Watanabe 0001, Leibny Paola García-Perera, Kenji Nagamatsu, 
Semi-Supervised Training with Pseudo-Labeling for End-To-End Neural Diarization.
Interspeech2021
Matthew Wiesner, Mousmita Sarma, Ashish Arora, Desh Raj, Dongji Gao, Ruizhe Huang, Supreet Preet, Moris Johnson, Zikra Iqbal, Nagendra Goel, Jan Trmal, Leibny Paola García-Perera, Sanjeev Khudanpur, 
Training Hybrid Models on Noisy Transliterated Transcripts for Code-Switched Speech Recognition.
Interspeech2021
Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe 0001, Leibny Paola García-Perera, Kenji Nagamatsu, 
Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers.
TASLP2024
Jiaming Xu 0001, Jian Cui, Yunzhe Hao, Bo Xu 0002, 
Multi-Cue Guided Semi-Supervised Learning Toward Target Speaker Separation in Real Environments.
ICASSP2024
Ziyi Ni, Minglun Han, Feilong Chen, Linghui Meng 0001, Jing Shi 0003, Pin Lv, Bo Xu 0002, 
ViLaS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition.
ICASSP2024
Jingqing Ruan, Runpeng Xie, Xuantang Xiong, Shuang Xu, Bo Xu 0002, 
MaDE: Multi-Scale Decision Enhancement for Multi-Agent Reinforcement Learning.
ACL2024
Kexin Wang, Jiahong Zhang, Yong Ren, Man Yao, Di Shang, Bo Xu 0002, Guoqi Li, 
SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural Network.
ICASSP2023
Zefa Hu, Xiuyi Chen, Haoran Wu, Minglun Han, Ziyi Ni, Jing Shi 0003, Shuang Xu, Bo Xu 0002, 
Matching-Based Term Semantics Pre-Training for Spoken Patient Query Understanding.
Interspeech2023
Feilong Chen, Minglun Han, Jing Shi 0003, Shuang Xu, Bo Xu 0002, 
Enhancing Visual Question Answering via Deconstructing Questions and Explicating Answers.
Interspeech2023
Minglun Han, Feilong Chen, Jing Shi 0003, Shuang Xu, Bo Xu 0002, 
Knowledge Transfer from Pre-trained Language Models to CIF-based Speech Recognizers via Hierarchical Distillation.
AAAI2023
Qingyu Wang, Tielin Zhang, Minglun Han, Yi Wang, Duzhen Zhang, Bo Xu 0002, 
Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition.
ICASSP2022
Xiuyi Chen, Feilong Chen, Shuang Xu, Bo Xu 0002, 
A Multi Domain Knowledge Enhanced Matching Network for Response Selection in Retrieval-Based Dialogue Systems.
ICASSP2022
Feilong Chen, Xiuyi Chen, Shuang Xu, Bo Xu 0002, 
Improving Cross-Modal Understanding in Visual Dialog Via Contrastive Learning.
ICASSP2022
Minglun Han, Linhao Dong, Zhenlin Liang, Meng Cai, Shiyu Zhou, Zejun Ma, Bo Xu 0002, 
Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection.
ICASSP2021
Yunzhe Hao, Jiaming Xu 0001, Peng Zhang, Bo Xu 0002, 
WASE: Learning When to Attend for Speaker Extraction in Cocktail Party Environments.
ICASSP2021
Chenxing Li, Jiaming Xu 0001, Nima Mesgarani, Bo Xu 0002, 
Speaker and Direction Inferred Dual-Channel Speech Separation.
ICASSP2021
Linghui Meng 0001, Jin Xu 0010, Xu Tan 0003, Jindong Wang 0001, Tao Qin 0001, Bo Xu 0002, 
MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition.
Interspeech2021
Zhiyun Fan, Meng Li, Shiyu Zhou, Bo Xu 0002, 
Exploring wav2vec 2.0 on Speaker Verification and Language Identification.
Interspeech2021
Xiyun Li, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Jiaming Xu 0001, Bo Xu 0002, Dong Yu 0001, 
MIMO Self-Attentive RNN Beamformer for Multi-Speaker Speech Separation.
AAAI2021
Qianqian Dong, Mingxuan Wang, Hao Zhou 0012, Shuang Xu, Bo Xu 0002, Lei Li 0005, 
Consecutive Decoding for Speech-to-text Translation.
AAAI2021
Qianqian Dong, Rong Ye, Mingxuan Wang, Hao Zhou 0012, Shuang Xu, Bo Xu 0002, Lei Li 0005, 
Listen, Understand and Translate: Triple Supervision Decouples End-to-end Speech-to-text Translation.
ICASSP2020
Linhao Dong, Bo Xu 0002, 
CIF: Continuous Integrate-And-Fire for End-To-End Speech Recognition.
Interspeech2020
Jing Shi 0003, Jiaming Xu 0001, Yusuke Fujita, Shinji Watanabe 0001, Bo Xu 0002, 
Speaker-Conditional Chain Model for Speech Separation and Extraction.
ICASSP2024
Ilya Gurvich, Ido Leichter, Dharmendar Reddy Palle, Yossi Asher, Alon Vinnikov, Igor Abramovski, Vishak Gopal, Ross Cutler, Eyal Krupka, 
A Real-Time Active Speaker Detection System Integrating an Audio-Visual Signal with a Spatial Querying Mechanism.
ICASSP2024
Babak Naderi, Ross Cutler, Nicolae-Catalin Ristea, 
Multi-Dimensional Speech Quality Assessment in Crowdsourcing.
ICASSP2023
Quchen Fu, Szu-Wei Fu, Yaran Fan, Yu Wu 0012, Zhuo Chen 0006, Jayant Gupchup, Ross Cutler, 
Real-Time Speech Interruption Analysis: from Cloud to Client Deployment.
ICASSP2023
Xavier Gitiaux, Aditya Khant, Ebrahim Beyrami, Chandan K. A. Reddy, Jayant Gupchup, Ross Cutler, 
AURA: Privacy-Preserving Augmentation to Improve Test Set Diversity in Speech Enhancement.
Interspeech2023
Lorenz Diener, Marju Purin, Sten Sootla, Ando Saabas, Robert Aichner, Ross Cutler, 
PLCMOS - A Data-driven Non-intrusive Metric for The Evaluation of Packet Loss Concealment Algorithms.
Interspeech2023
Nicolae-Catalin Ristea, Evgenii Indenbom, Ando Saabas, Tanel Pärnamaa, Jegor Guzvin, Ross Cutler, 
DeepVQE: Real Time Deep Voice Quality Enhancement for Joint Acoustic Echo Cancellation, Noise Suppression and Dereverberation.
ICASSP2022
Ross Cutler, Ando Saabas, Tanel Pärnamaa, Marju Purin, Hannes Gamper, Sebastian Braun, Karsten Sørensen, Robert Aichner, 
ICASSP 2022 Acoustic Echo Cancellation Challenge.
ICASSP2022
Harishchandra Dubey, Vishak Gopal, Ross Cutler, Ashkan Aazami, Sergiy Matusevych, Sebastian Braun, Sefik Emre Eskimez, Manthan Thakker, Takuya Yoshioka, Hannes Gamper, Robert Aichner, 
ICASSP 2022 Deep Noise Suppression Challenge.
ICASSP2022
Marju Purin, Sten Sootla, Mateja Sponza, Ando Saabas, Ross Cutler, 
AECMOS: A Speech Quality Assessment Metric for Echo Impairment.
ICASSP2022
Chandan K. A. Reddy, Vishak Gopal, Ross Cutler, 
DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors.
Interspeech2022
Lorenz Diener, Sten Sootla, Solomiya Branets, Ando Saabas, Robert Aichner, Ross Cutler, 
INTERSPEECH 2022 Audio Deep Packet Loss Concealment Challenge.
Interspeech2022
Chandan K. A. Reddy, Vishak Gopal, Harishchandra Dubey, Ross Cutler, Sergiy Matusevych, Robert Aichner, 
MusicNet: Compact Convolutional Neural Network for Real-time Background Music Detection.
Interspeech2022
Gaoxiong Yi, Wei Xiao, Yiming Xiao, Babak Naderi, Sebastian Möller 0001, Wafaa Wardah, Gabriel Mittag, Ross Cutler, Zhuohuang Zhang, Donald S. Williamson, Fei Chen 0011, Fuzheng Yang, Shidong Shang, 
ConferencingSpeech 2022 Challenge: Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge for Online Conferencing Applications.
ICASSP2021
Ross Cutler, Babak Naderi, Markus Loide, Sten Sootla, Ando Saabas, 
Crowdsourcing Approach for Subjective Evaluation of Echo Impairment.
ICASSP2021
Chandan K. A. Reddy, Harishchandra Dubey, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan 0003, 
ICASSP 2021 Deep Noise Suppression Challenge.
ICASSP2021
Chandan K. A. Reddy, Vishak Gopal, Ross Cutler, 
DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors.
ICASSP2021
Kusha Sridhar, Ross Cutler, Ando Saabas, Tanel Pärnamaa, Markus Loide, Hannes Gamper, Sebastian Braun, Robert Aichner, Sriram Srinivasan 0003, 
ICASSP 2021 Acoustic Echo Cancellation Challenge: Datasets, Testing Framework, and Results.
Interspeech2021
Ross Cutler, Ando Saabas, Tanel Pärnamaa, Markus Loide, Sten Sootla, Marju Purin, Hannes Gamper, Sebastian Braun, Karsten Sørensen, Robert Aichner, Sriram Srinivasan 0003, 
INTERSPEECH 2021 Acoustic Echo Cancellation Challenge.
Interspeech2021
Babak Naderi, Ross Cutler, 
Subjective Evaluation of Noise Suppression Algorithms in Crowdsourcing.
Interspeech2021
Chandan K. A. Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Asokan Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan 0003, 
INTERSPEECH 2021 Deep Noise Suppression Challenge.
ICASSP2024
Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-Weon Jung, Yichen Lu, Soumi Maiti, Roshan S. Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe 0001, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang, 
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.
ICASSP2024
Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe 0001, Sanjeev Khudanpur, 
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization.
ICASSP2024
Amir Hussein, Dorsa Zeinali, Ondrej Klejch, Matthew Wiesner, Brian Yan, Shammur Absar Chowdhury, Ahmed Ali 0002, Shinji Watanabe 0001, Sanjeev Khudanpur, 
Speech Collage: Code-Switched Audio Generation by Collaging Monolingual Corpora.
ICASSP2024
Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, Shinji Watanabe 0001, 
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing.
ICASSP2023
Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, Shinji Watanabe 0001, 
Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History.
ICASSP2023
Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe 0001, 
A Study on the Integration of Pipeline and E2E SLU Systems for Spoken Semantic Parsing Toward STOP Quality Challenge.
ICASSP2023
Dan Berrebbi, Brian Yan, Shinji Watanabe 0001, 
Avoid Overthinking in Self-Supervised Models for Speech Recognition.
ICASSP2023
William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe 0001, 
Improving Massively Multilingual ASR with Auxiliary CTC Objectives.
ICASSP2023
Hayato Futami, Jessica Huynh, Siddhant Arora, Shih-Lun Wu, Yosuke Kashiwagi, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe 0001, 
The Pipeline System of ASR and NLU with MLM-based Data Augmentation Toward STOP Low-Resource Challenge.
ICASSP2023
Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe 0001, 
E-Branchformer-Based E2E SLU Toward STOP On-Device Challenge.
ICASSP2023
Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, Shinji Watanabe 0001, 
Align, Write, Re-Order: Explainable End-to-End Speech Translation via Operation Sequence Generation.
ICASSP2023
Brian Yan, Matthew Wiesner, Ondrej Klejch, Preethi Jyothi, Shinji Watanabe 0001, 
Towards Zero-Shot Code-Switched Speech Recognition.
Interspeech2023
Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe 0001, 
Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding.
Interspeech2023
Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, Shinji Watanabe 0001, 
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning.
Interspeech2023
Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe 0001, 
Tensor decomposition for minimization of E2E SLU model toward on-device processing.
Interspeech2023
Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe 0001, 
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks.
Interspeech2023
Puyuan Peng, Brian Yan, Shinji Watanabe 0001, David Harwath, 
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization.
Interspeech2023
Peter Polák, Brian Yan, Shinji Watanabe 0001, Alex Waibel, Ondrej Bojar, 
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff.
Interspeech2023
Yui Sudo, Muhammad Shakeel 0001, Brian Yan, Jiatong Shi, Shinji Watanabe 0001, 
4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders.
Interspeech2023
Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu 0001, Shinji Watanabe 0001, 
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction.
TASLP2024
Xuechen Liu, Md. Sahidullah, Kong Aik Lee, Tomi Kinnunen, 
Generalizing Speaker Verification for Spoof Awareness in the Embedding Space.
TASLP2023
Xuechen Liu, Xin Wang 0037, Md. Sahidullah, Jose Patino 0001, Héctor Delgado, Tomi Kinnunen, Massimiliano Todisco, Junichi Yamagishi, Nicholas W. D. Evans, Andreas Nautsch, Kong Aik Lee, 
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild.
ICASSP2023
Mark Anderson 0006, Tomi Kinnunen, Naomi Harte, 
Learnable Frontends That Do Not Learn: Quantifying Sensitivity To Filterbank Initialisation.
Interspeech2023
Xuechen Liu, Md. Sahidullah, Kong Aik Lee, Tomi Kinnunen, 
Speaker-Aware Anti-spoofing.
Interspeech2023
Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang 0037, Xuechen Liu, Md. Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas W. D. Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung, 
Towards Single Integrated Spoofing-aware Speaker Verification Embeddings.
Interspeech2023
Hye-jin Shim, Rosa González Hautamäki, Md. Sahidullah, Tomi Kinnunen, 
How to Construct Perfect and Worse-than-Coin-Flip Spoofing Countermeasures: A Word of Warning on Shortcut Learning.
Interspeech2023
Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen, 
Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio Anti-spoofing.
Interspeech2023
Vishwanath Pratap Singh, Md. Sahidullah, Tomi Kinnunen, 
Speaker Verification Across Ages: Investigating Deep Speaker Embedding Sensitivity to Age Mismatch in Enrollment and Test Speech.
SpeechComm2022
Lauri Tavi, Tomi Kinnunen, Rosa González Hautamäki, 
Improving speaker de-identification with functional data analysis of f0 trajectories.
TASLP2022
Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi, 
Optimizing Tandem Speaker Verification and Anti-Spoofing Systems.
ICASSP2022
Xuechen Liu, Md. Sahidullah, Tomi Kinnunen, 
Learnable Nonlinear Compression for Robust Speaker Verification.
Interspeech2022
Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas W. D. Evans, Tomi Kinnunen, 
SASV 2022: The First Spoofing-Aware Speaker Verification Challenge.
Interspeech2021
Bhusan Chettri, Rosa González Hautamäki, Md. Sahidullah, Tomi Kinnunen, 
Data Quality as Predictor of Voice Anti-Spoofing Generalization.
Interspeech2021
Tomi Kinnunen, Andreas Nautsch, Md. Sahidullah, Nicholas W. D. Evans, Xin Wang 0037, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee, 
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.
TASLP2020
Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang 0037, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds, 
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.
Interspeech2020
Rohan Kumar Das, Xiaohai Tian, Tomi Kinnunen, Haizhou Li 0001, 
The Attacker's Perspective on Automatic Speaker Verification: An Overview.
Interspeech2020
Rosa González Hautamäki, Tomi Kinnunen, 
Why Did the x-Vector System Miss a Target Speaker? Impact of Acoustic Mismatch Upon Target Score on VoxCeleb Data.
Interspeech2020
Xuechen Liu, Md. Sahidullah, Tomi Kinnunen, 
A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings.
Interspeech2020
Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee, 
Extrapolating False Alarm Rates in Automatic Speaker Verification.
TASLP2019
Akihiro Kato, Tomi H. Kinnunen, 
Statistical Regression Models for Noise Robust F0 Estimation Using Recurrent Deep Neural Networks.
TASLP2024
Christoph Böddeker, Aswin Shanmugam Subramanian, Gordon Wichern, Reinhold Haeb-Umbach, Jonathan Le Roux, 
TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings.
ICASSP2024
Tobias Cord-Landwehr, Christoph Böddeker, Catalin Zorila, Rama Doddipatla, Reinhold Haeb-Umbach, 
Geodesic Interpolation of Frame-Wise Speaker Embeddings for the Diarization of Meeting Scenarios.
TASLP2023
Thilo von Neumann, Keisuke Kinoshita, Christoph Böddeker, Marc Delcroix, Reinhold Haeb-Umbach, 
Segment-Less Continuous Speech Separation of Meetings: Training and Evaluation Criteria.
ICASSP2023
Tobias Cord-Landwehr, Christoph Böddeker, Catalin Zorila, Rama Doddipatla, Reinhold Haeb-Umbach, 
Frame-Wise and Overlap-Robust Speaker Embeddings for Meeting Diarization.
ICASSP2023
Thilo von Neumann, Christoph Böddeker, Keisuke Kinoshita, Marc Delcroix, Reinhold Haeb-Umbach, 
On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech Recognition Systems.
Interspeech2023
Simon Berger, Peter Vieting, Christoph Böddeker, Ralf Schlüter, Reinhold Haeb-Umbach, 
Mixture Encoder for Joint Speech Separation and Recognition.
Interspeech2023
Tobias Cord-Landwehr, Christoph Böddeker, Catalin Zorila, Rama Doddipatla, Reinhold Haeb-Umbach, 
A Teacher-Student Approach for Extracting Informative Speaker Embeddings From Speech Mixtures.
ICASSP2022
Thilo von Neumann, Keisuke Kinoshita, Christoph Böddeker, Marc Delcroix, Reinhold Haeb-Umbach, 
SA-SDR: A Novel Loss Function for Separation of Meeting Style Data.
Interspeech2022
Christoph Böddeker, Tobias Cord-Landwehr, Thilo von Neumann, Reinhold Haeb-Umbach, 
An Initialization Scheme for Meeting Separation with Spatial Mixture Models.
Interspeech2022
Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Christoph Böddeker, Reinhold Haeb-Umbach, 
Utterance-by-utterance overlap-aware neural diarization with Graph-PIT.
Interspeech2022
Michael Kuhlmann, Fritz Seebauer, Janek Ebbers, Petra Wagner, Reinhold Haeb-Umbach, 
Investigation into Target Speaking Rate Adaptation for Voice Conversion.
ICASSP2021
Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach, 
Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.
Interspeech2021
Thilo von Neumann, Keisuke Kinoshita, Christoph Böddeker, Marc Delcroix, Reinhold Haeb-Umbach, 
Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation of Arbitrary Numbers of Speakers.
TASLP2020
Tomohiro Nakatani, Christoph Böddeker, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Reinhold Haeb-Umbach, 
Jointly Optimal Denoising, Dereverberation, and Source Separation.
ICASSP2020
Jens Heitkaemper, Darius Jakobeit, Christoph Böddeker, Lukas Drude, Reinhold Haeb-Umbach, 
Demystifying TasNet: A Dissecting Approach.
ICASSP2020
Thilo von Neumann, Keisuke Kinoshita, Lukas Drude, Christoph Böddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach, 
End-to-End Training of Time Domain Audio Separation and Recognition.
Interspeech2020
Jens Heitkaemper, Joerg Schmalenstroeer, Reinhold Haeb-Umbach, 
Statistical and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic Environments.
Interspeech2020
Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach, 
Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation.
Interspeech2020
Thilo von Neumann, Christoph Böddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach, 
Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR.
ICASSP2019
Jahn Heymann, Lukas Drude, Reinhold Haeb-Umbach, Keisuke Kinoshita, Tomohiro Nakatani, 
Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for Robust Online ASR.
TASLP2024
Zainab Alhakeem, Se-In Jang, Hong-Goo Kang, 
Disentangled Representations in Local-Global Contexts for Arabic Dialect Identification.
ICASSP2024
Hejung Yang, Hong-Goo Kang, 
On Fine-Tuning Pre-Trained Speech Models With EMA-Target Self-Supervised Loss.
ICASSP2023
Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang, 
Style Modeling for Multi-Speaker Articulation-to-Speech.
ICASSP2023
Zhenyu Piao, Miseul Kim, Hyungchan Yoon, Hong-Goo Kang, 
HappyQuokka System for ICASSP 2023 Auditory EEG Challenge.
Interspeech2023
Woo-Jin Chung, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang, 
MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion.
Interspeech2023
Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang, 
HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders.
Interspeech2023
Jihyun Kim, Hong-Goo Kang, 
Contrastive Learning based Deep Latent Masking for Music Source Separation.
Interspeech2023
Hejung Yang, Hong-Goo Kang, 
Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement.
Interspeech2023
Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang, 
Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech.
Interspeech2023
Hyungchan Yoon, Seyun Um, Changhwan Kim, Hong-Goo Kang, 
Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech.
ICASSP2022
Doyeon Kim, Hyewon Han, Hyeon-Kyeong Shin, Soo-Whan Chung, Hong-Goo Kang, 
Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement.
Interspeech2022
Miseul Kim, Zhenyu Piao, Seyun Um, Ran Lee, Jaemin Joh, Seungshin Lee, Hong-Goo Kang, 
Light-Weight Speaker Verification with Global Context Information.
Interspeech2022
Changhwan Kim, Seyun Um, Hyungchan Yoon, Hong-Goo Kang, 
FluentTTS: Text-dependent Fine-grained Style Control for Multi-style TTS.
Interspeech2022
Hyeon-Kyeong Shin, Hyewon Han, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang, 
Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting.
Interspeech2021
Huu-Kim Nguyen, Kihyuk Jeong, Seyun Um, Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang, 
LiteTTS: A Lightweight Mel-Spectrogram-Free Text-to-Wave Synthesizer Based on Generative Adversarial Networks.
ICASSP2020
Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank K. Soong, Hong-Goo Kang, 
Improving LPCNet-Based Text-to-Speech with Linear Prediction-Structured Mixture Density Network.
ICASSP2020
Se-Yun Um, Sangshin Oh, Kyungguen Byun, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang, 
Emotional Speech Synthesis with Rich and Granularized Control.
Interspeech2020
Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang, 
FaceFilter: Audio-Visual Speech Separation Using Still Images.
Interspeech2020
Soo-Whan Chung, Hong-Goo Kang, Joon Son Chung, 
Seeing Voices and Hearing Voices: Learning Discriminative Embeddings Using Cross-Modal Self-Supervision.
Interspeech2020
Hyewon Han, Soo-Whan Chung, Hong-Goo Kang, 
MIRNet: Learning Multiple Identities Representations in Overlapped Speech.
ICASSP2024
Jian-Tao Zhang, Yan Song 0001, Jin Li, Wu Guo, Hao-Yu Song, Ian McLoughlin 0001, 
Meta Representation Learning Method for Robust Speaker Verification in Unseen Domains.
ICASSP2023
Hang-Rui Hu, Yan Song 0001, Jian-Tao Zhang, Li-Rong Dai 0001, Ian McLoughlin 0001, Zhu Zhuo, Yu Zhou, Yu-Hong Li, Hui Xue, 
StarGAN-VC Based Cross-Domain Data Augmentation for Speaker Verification.
Interspeech2023
Kang Li, Yan Song 0001, Ian McLoughlin 0001, Lin Liu 0017, Jin Li, Li-Rong Dai 0001, 
Fine-tuning Audio Spectrogram Transformer with Task-aware Adapters for Sound Event Detection.
Interspeech2023
Xiao-Min Zeng, Yan Song 0001, Ian McLoughlin 0001, Lin Liu 0017, Li-Rong Dai 0001, 
Robust Prototype Learning for Anomalous Sound Detection.
ICASSP2022
Han Chen, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu 0017, 
Self-Supervised Representation Learning for Unsupervised Anomalous Sound Detection Under Domain Shift.
ICASSP2022
Hang-Rui Hu, Yan Song 0001, Ying Liu, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu 0017, 
Domain Robust Deep Embedding Learning for Speaker Recognition.
ICASSP2022
Yuxuan Xi, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu 0017, 
Frontend Attributes Disentanglement for Speech Emotion Recognition.
Interspeech2022
Hang-Rui Hu, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu 0017, 
Class-Aware Distribution Alignment based Unsupervised Domain Adaptation for Speaker Verification.
TASLP2021
Jian Tang, Jie Zhang 0042, Yan Song 0001, Ian McLoughlin 0001, Li-Rong Dai 0001, 
Multi-Granularity Sequence Alignment Mapping for Encoder-Decoder Based End-to-End ASR.
ICASSP2021
Ying Liu, Yan Song 0001, Ian McLoughlin 0001, Lin Liu 0017, Li-Rong Dai 0001, 
An Effective Deep Embedding Learning Method Based on Dense-Residual Networks for Speaker Verification.
Interspeech2021
Hui Wang, Lin Liu 0017, Yan Song 0001, Lei Fang, Ian McLoughlin 0001, Li-Rong Dai 0001, 
A Weight Moving Average Based Alternate Decoupled Learning Algorithm for Long-Tailed Language Identification.
Interspeech2021
Xu Zheng, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu 0017, 
An Effective Mutual Mean Teaching Based Domain Adaptation Method for Sound Event Detection.
ICASSP2020
Hui Wang, Yan Song 0001, Zengxi Li, Ian McLoughlin 0001, Li-Rong Dai 0001, 
An Online Speaker-aware Speech Separation Approach Based on Time-domain Representation.
ICASSP2020
Jie Yan, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, 
Task-Aware Mean Teacher Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection.
Interspeech2020
Ying Liu, Yan Song 0001, Yiheng Jiang, Ian McLoughlin 0001, Lin Liu 0017, Li-Rong Dai 0001, 
An Effective Speaker Recognition Method Based on Joint Identification and Verification Supervisions.
Interspeech2020
Zi-qiang Zhang, Yan Song 0001, Jian-Shu Zhang, Ian McLoughlin 0001, Li-Rong Dai 0001, 
Semi-Supervised End-to-End ASR via Teacher-Student Learning with Conditional Posterior Distribution.
Interspeech2020
Xu Zheng, Yan Song 0001, Jie Yan, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu 0017, 
An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection.
TASLP2019
Zengxi Li, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, 
Listening and Grouping: An Online Autoregressive Approach for Monaural Speech Separation.
ICASSP2019
Jian Sun, Wu Guo, Zhi Chen, Yan Song 0001, 
Topic Detection in Conversational Telephone Speech Using CNN with Multi-stream Inputs.
ICASSP2019
Jie Yan, Yan Song 0001, Wu Guo, Li-Rong Dai 0001, Ian McLoughlin 0001, Liang Chen, 
A Region Based Attention Method for Weakly Supervised Sound Event Detection and Classification.
ICASSP2024
Rupak Vignesh Swaminathan, Grant P. Strimel, Ariya Rastrow, Sri Harish Mallidi, Kai Zhen, Hieu Duy Nguyen, Nathan Susanj, Athanasios Mouchtaris, 
Max-Margin Transducer Loss: Improving Sequence-Discriminative Training Using a Large-Margin Learning Strategy.
ICASSP2023
Anastasios Alexandridis, Kanthashree Mysore Sathyendra, Grant P. Strimel, Feng-Ju Chang, Ariya Rastrow, Nathan Susanj, Athanasios Mouchtaris, 
Gated Contextual Adapters For Selective Contextual Biasing In Neural Transducers.
ICASSP2023
Xuandi Fu, Kanthashree Mysore Sathyendra, Ankur Gandhe, Jing Liu, Grant P. Strimel, Ross McGowan, Athanasios Mouchtaris, 
Robust Acoustic And Semantic Contextual Biasing In Neural Transducers For Speech Recognition.
ICASSP2023
Markus Müller, Anastasios Alexandridis, Zach Trozenski, Joel Whiteman, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris, Siegfried Kunzmann, 
Multilingual End-To-End Spoken Language Understanding For Ultra-Low Footprint Applications.
ICASSP2023
Saumya Y. Sahai, Jing Liu, Thejaswi Muniyappa, Kanthashree Mysore Sathyendra, Anastasios Alexandridis, Grant P. Strimel, Ross McGowan, Ariya Rastrow, Feng-Ju Chang, Athanasios Mouchtaris, Siegfried Kunzmann, 
Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition.
Interspeech2023
Martin Radfar, Paulina Lyskawa, Brandon Trujillo, Yi Xie, Kai Zhen, Jahn Heymann, Denis Filimonov, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris, 
Conmer: Streaming Conformer Without Self-attention for Interactive Voice Assistants.
ICASSP2022
Bhuvan Agrawal, Markus Müller, Samridhi Choudhary, Martin Radfar, Athanasios Mouchtaris, Ross McGowan, Nathan Susanj, Siegfried Kunzmann, 
Tie Your Embeddings Down: Cross-Modal Latent Spaces for End-to-end Spoken Language Understanding.
ICASSP2022
Anastasios Alexandridis, Grant P. Strimel, Ariya Rastrow, Pavel Kveton, Jon Webb, Maurizio Omologo, Siegfried Kunzmann, Athanasios Mouchtaris, 
Caching Networks: Capitalizing on Common Speech for ASR.
ICASSP2022
Anastasios Alexandridis, Kanthashree Mysore Sathyendra, Grant P. Strimel, Pavel Kveton, Jon Webb, Athanasios Mouchtaris, 
TinyS2I: A Small-Footprint Utterance Classification Model with Contextual Support for On-Device SLU.
ICASSP2022
Kanthashree Mysore Sathyendra, Thejaswi Muniyappa, Feng-Ju Chang, Jing Liu, Jinru Su, Grant P. Strimel, Athanasios Mouchtaris, Siegfried Kunzmann, 
Contextual Adapters for Personalized Speech Recognition in Neural Transducers.
ICASSP2022
Kai Wei, Dillon Knox, Martin Radfar, Thanh Tran, Markus Müller, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris, Maurizio Omologo, 
A Neural Prosody Encoder for End-to-End Dialogue Act Classification.
Interspeech2022
Kaiqi Zhao 0002, Hieu Nguyen, Animesh Jain, Nathan Susanj, Athanasios Mouchtaris, Lokesh Gupta, Ming Zhao 0002, 
Knowledge Distillation via Module Replacing for Automatic Speech Recognition with Recurrent Neural Network Transducer.
Interspeech2022
Martin Radfar, Rohit Barnwal, Rupak Vignesh Swaminathan, Feng-Ju Chang, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris, 
ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition.
Interspeech2022
Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian John King, Ariya Rastrow, Athanasios Mouchtaris, Grant P. Strimel, 
Compute Cost Amortized Transformer for Streaming ASR.
Interspeech2022
Kai Zhen, Hieu Duy Nguyen, Raviteja Chinta, Nathan Susanj, Athanasios Mouchtaris, Tariq Afzal, Ariya Rastrow, 
Sub-8-Bit Quantization Aware Training for 8-Bit Neural Network Accelerator with On-Device Speech Recognition.
ICASSP2021
Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Brian John King, Siegfried Kunzmann, 
End-to-End Multi-Channel Transformer for Speech Recognition.
ICASSP2021
Surabhi Punjabi, Harish Arsikere, Zeynab Raeesy, Chander Chandak, Nikhil Bhave, Ankish Bansal, Markus Müller, Sergio Murillo, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Sri Garimella, Roland Maas, Mat Hans, Athanasios Mouchtaris, Siegfried Kunzmann, 
Joint ASR and Language Identification Using RNN-T: An Efficient Approach to Dynamic Language Switching.
Interspeech2021
Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Maurizio Omologo, 
Multi-Channel Transformer Transducer for Speech Recognition.
Interspeech2021
Vasileios Papadourakis, Markus Müller, Jing Liu, Athanasios Mouchtaris, Maurizio Omologo, 
Phonetically Induced Subwords for End-to-End Speech Recognition.
Interspeech2021
Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann, Ariya Rastrow, 
FANS: Fusing ASR and NLU for On-Device SLU.
ICASSP2024
Yuan Tseng, Layne Berry, Yiting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Poyao Huang 0001, Chun-Mao Lai, Shang-Wen Li 0001, David Harwath, Yu Tsao 0001, Abdelrahman Mohamed, Chi-Luen Feng, Hung-Yi Lee, 
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.
ACL2024
Puyuan Peng, Po-Yao Huang 0001, Shang-Wen Li 0001, Abdelrahman Mohamed, David Harwath, 
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild.
ACL2024
Jordan Voas, David Harwath, Raymond Mooney, 
Multimodal Contextualized Semantic Parsing from Speech.
ICASSP2023
Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-Yi Lee, David Harwath, 
M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval.
ICASSP2023
Changan Chen, Wei Sun, David Harwath, Kristen Grauman, 
Learning Audio-Visual Dereverberation.
ICASSP2023
Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed, 
Continual Learning for On-Device Speech Recognition Using Disentangled Conformers.
ICASSP2023
Reem Gody, David Harwath, 
Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models.
ICASSP2023
Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas 0001, Rogério Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James R. Glass, 
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval.
Interspeech2023
Chiori Hori, Puyuan Peng, David Harwath, Xinyu Liu, Kei Ota, Siddarth Jain, Radu Corcodel, Devesh K. Jha, Diego Romeres, Jonathan Le Roux, 
Style-transfer based Speech and Audio-visual Scene understanding for Robot Action Sequence Acquisition from Videos.
Interspeech2023
Puyuan Peng, Shang-Wen Li 0001, Okko Räsänen, Abdelrahman Mohamed, David Harwath, 
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model.
Interspeech2023
Puyuan Peng, Brian Yan, Shinji Watanabe 0001, David Harwath, 
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization.
Interspeech2023
Andrew Rouditchenko, Sameer Khurana, Samuel Thomas 0001, Rogério Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James R. Glass, 
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages.
ICLR2023
Yuan Gong 0001, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James R. Glass, 
Contrastive Audio-Visual Masked Autoencoder.
ICASSP2022
Puyuan Peng, David Harwath, 
Fast-Slow Transformer for Visually Grounding Speech.
ICASSP2022
David Xu 0006, David Harwath, 
Adversarial Input Ablation for Audio-Visual Learning.
Interspeech2022
Alan Baade, Puyuan Peng, David Harwath, 
MAE-AST: Masked Autoencoding Audio Spectrogram Transformer.
Interspeech2022
Tyler Miller, David Harwath, 
Exploring Few-Shot Fine-Tuning Strategies for Models of Visually Grounded Speech.
Interspeech2022
Puyuan Peng, David Harwath, 
Word Discovery in Visually Grounded, Self-Supervised Speech Models.
Interspeech2021
Andrew Rouditchenko, Angie W. Boggust, David Harwath, Samuel Thomas 0001, Hilde Kuehne, Brian Chen 0001, Rameswar Panda, Rogério Feris, Brian Kingsbury, Michael Picheny, James R. Glass, 
Cascaded Multilingual Audio-Visual Learning from Videos.
Interspeech2021
Andrew Rouditchenko, Angie W. Boggust, David Harwath, Brian Chen 0001, Dhiraj Joshi, Samuel Thomas 0001, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogério Schmidt Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba 0001, James R. Glass, 
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.
ICASSP2023
Ali Elkahky, Wei-Ning Hsu, Paden Tomasello, Tu Anh Nguyen, Robin Algayres, Yossi Adi, Jade Copet, Emmanuel Dupoux, Abdelrahman Mohamed, 
Do Coarser Units Benefit Cluster Prediction-Based Speech Pre-Training?
Interspeech2023
Mark Hallap, Emmanuel Dupoux, Ewan Dunbar, 
Evaluating context-invariance in unsupervised speech representations.
Interspeech2023
Marvin Lavechin, Yaya Sy, Hadrien Titeux, María Andrea Cruz Blandón, Okko Räsänen, Hervé Bredin, Emmanuel Dupoux, Alejandrina Cristià, 
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models.
Interspeech2023
Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux, 
Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis.
Interspeech2023
Maureen de Seyssel, Marvin Lavechin, Hadrien Titeux, Arthur Thomas, Gwendal Virlet, Andrea Santos Revilla, Guillaume Wisniewski, Bogdan Ludusan, Emmanuel Dupoux, 
ProsAudit, a prosodic benchmark for self-supervised speech models.
Interspeech2023
Yaya Sy, William N. Havard, Marvin Lavechin, Emmanuel Dupoux, Alejandrina Cristià, 
Measuring Language Development From Child-centered Recordings.
NeurIPS2023
Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Défossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz 0001, Yossi Adi, 
Textually Pretrained Speech Language Models.
EMNLP2023
Robin Algayres, Yossi Adi, Tu Anh Nguyen, Jade Copet, Gabriel Synnaeve, Benoît Sagot, Emmanuel Dupoux, 
Generative Spoken Language Model based on continuous word-sized audio tokens.
EMNLP-Findings2023
Robin Algayres, Pablo Diego-Simon, Benoît Sagot, Emmanuel Dupoux, 
XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words.
Interspeech2022
Robin Algayres, Adel Nabli, Benoît Sagot, Emmanuel Dupoux, 
Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning.
Interspeech2022
Maureen de Seyssel, Marvin Lavechin, Yossi Adi, Emmanuel Dupoux, Guillaume Wisniewski, 
Probing phoneme, language and speaker information in unsupervised speech representations.
ACL2022
Eugene Kharitonov, Ann Lee 0001, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu, 
Text-Free Prosody-Aware Generative Spoken Language Modeling.
EMNLP2022
Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi, 
Textless Speech Emotion Conversion using Discrete & Decomposed Representations.
Interspeech2021
Ewan Dunbar, Mathieu Bernard, Nicolas Hamilakis, Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Eugene Kharitonov, Emmanuel Dupoux, 
The Zero Resource Speech Challenge 2021: Spoken Language Modelling.
Interspeech2021
Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, 
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
ACL2021
Changhan Wang, Morgane Rivière, Ann Lee 0001, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Miguel Pino, Emmanuel Dupoux, 
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation.
TASLP2020
Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller 0001, 
Speech Technology for Unwritten Languages.
ICASSP2020
Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdelrahman Mohamed, Emmanuel Dupoux, 
Libri-Light: A Benchmark for ASR with Limited or No Supervision.
ICASSP2020
Morgane Rivière, Armand Joulin, Pierre-Emmanuel Mazaré, Emmanuel Dupoux, 
Unsupervised Pretraining Transfers Well Across Languages.
Interspeech2020
Robin Algayres, Mohamed Salah Zaïem, Benoît Sagot, Emmanuel Dupoux, 
Evaluating the Reliability of Acoustic Speech Embeddings.
ICASSP2024
Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, Jinxi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer, 
Prompting Large Language Models with Speech Recognition Abilities.
ICASSP2024
Egor Lakomkin, Chunyang Wu, Yassir Fathullah, Ozlem Kalinli, Michael L. Seltzer, Christian Fuegen, 
End-to-End Speech Recognition Contextualization with Large Language Models.
NAACL2024
Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Ke Li, Junteng Jia, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer, 
AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs.
Interspeech2023
Pingchuan Ma 0001, Niko Moritz, Stavros Petridis, Christian Fuegen, Maja Pantic, 
Streaming Audio-Visual Speech Recognition with Alignment Regularization.
Interspeech2023
Ju Lin, Niko Moritz, Ruiming Xie, Kaustubh Kalgaonkar, Christian Fuegen, Frank Seide, 
Directional Speech Recognition for Speaker Disambiguation and Cross-talk Suppression.
Interspeech2022
Suyoun Kim, Duc Le, Weiyi Zheng, Tarun Singh, Abhinav Arora, Xiaoyu Zhai, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer, 
Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric.
Interspeech2022
Weiyi Zheng, Alex Xiao, Gil Keren, Duc Le, Frank Zhang 0001, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Abdelrahman Mohamed, 
Scaling ASR Improves Zero and Few Shot Learning.
ICASSP2021
Suyoun Kim, Yuan Shangguan, Jay Mahadeokar, Antoine Bruguier, Christian Fuegen, Michael L. Seltzer, Duc Le, 
Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer.
ICASSP2021
Ju Lin, Yun Wang, Kaustubh Kalgaonkar, Gil Keren, Didi Zhang, Christian Fuegen, 
A Time-Domain Convolutional Recurrent Network for Packet Loss Concealment.
ICASSP2021
Ganesh Venkatesh, Alagappan Valliappan, Jay Mahadeokar, Yuan Shangguan, Christian Fuegen, Michael L. Seltzer, Vikas Chandra, 
Memory-Efficient Speech Recognition on Smart Devices.
ICASSP2021
Alex Xiao, Christian Fuegen, Abdelrahman Mohamed, 
Contrastive Semi-Supervised Learning for ASR.
Interspeech2021
Anurag Kumar 0003, Yun Wang, Vamsi Krishna Ithapu, Christian Fuegen, 
Do Sound Event Representations Generalize to Other Audio Tasks? A Case Study in Audio Transfer Learning.
Interspeech2021
Suyoun Kim, Abhinav Arora, Duc Le, Ching-Feng Yeh, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer, 
Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding.
Interspeech2021
Duc Le, Mahaveer Jain, Gil Keren, Suyoun Kim, Yangyang Shi, Jay Mahadeokar, Julian Chan, Yuan Shangguan, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Michael L. Seltzer, 
Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion.
Interspeech2021
Ju Lin, Yun Wang, Kaustubh Kalgaonkar, Gil Keren, Didi Zhang, Christian Fuegen, 
A Two-Stage Approach to Speech Bandwidth Extension.
Interspeech2021
Jay Mahadeokar, Yangyang Shi, Yuan Shangguan, Chunyang Wu, Alex Xiao, Hang Su, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer, 
Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios.
Interspeech2021
Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer, 
Dissecting User-Perceived Latency of On-Device E2E Speech Recognition.
Interspeech2021
Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer, 
Dynamic Encoder Transducer: A Flexible Solution for Trading Off Accuracy for Latency.
Interspeech2021
Chunyang Wu, Zhiping Xiu, Yangyang Shi, Ozlem Kalinli, Christian Fuegen, Thilo Köhler, Qing He, 
Transformer-Based Acoustic Modeling for Streaming Speech Synthesis.
ICASSP2020
Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdelrahman Mohamed, Emmanuel Dupoux, 
Libri-Light: A Benchmark for ASR with Limited or No Supervision.
ICASSP2024
Rupak Vignesh Swaminathan, Grant P. Strimel, Ariya Rastrow, Sri Harish Mallidi, Kai Zhen, Hieu Duy Nguyen, Nathan Susanj, Athanasios Mouchtaris, 
Max-Margin Transducer Loss: Improving Sequence-Discriminative Training Using a Large-Margin Learning Strategy.
ACL-Findings2024
Aditya Gourav, Jari Kolehmainen, Prashanth Gurunath Shivakumar, Yile Gu, Grant P. Strimel, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko, 
Multi-Modal Retrieval For Large Language Model Based Speech Recognition.
ICASSP2023
Anastasios Alexandridis, Kanthashree Mysore Sathyendra, Grant P. Strimel, Feng-Ju Chang, Ariya Rastrow, Nathan Susanj, Athanasios Mouchtaris, 
Gated Contextual Adapters For Selective Contextual Biasing In Neural Transducers.
ICASSP2023
Feng-Ju Chang, Thejaswi Muniyappa, Kanthashree Mysore Sathyendra, Kai Wei, Grant P. Strimel, Ross McGowan, 
Dialog Act Guided Contextual Adapter for Personalized Speech Recognition.
ICASSP2023
Xuandi Fu, Kanthashree Mysore Sathyendra, Ankur Gandhe, Jing Liu, Grant P. Strimel, Ross McGowan, Athanasios Mouchtaris, 
Robust Acoustic And Semantic Contextual Biasing In Neural Transducers For Speech Recognition.
ICASSP2023
Markus Müller, Anastasios Alexandridis, Zach Trozenski, Joel Whiteman, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris, Siegfried Kunzmann, 
Multilingual End-To-End Spoken Language Understanding For Ultra-Low Footprint Applications.
ICASSP2023
Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant P. Strimel, Andreas Stolcke, Ivan Bulyko, 
Procter: Pronunciation-Aware Contextual Adapter For Personalized Speech Recognition In Neural Transducers.
ICASSP2023
Saumya Y. Sahai, Jing Liu, Thejaswi Muniyappa, Kanthashree Mysore Sathyendra, Anastasios Alexandridis, Grant P. Strimel, Ross McGowan, Ariya Rastrow, Feng-Ju Chang, Athanasios Mouchtaris, Siegfried Kunzmann, 
Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition.
Interspeech2023
Yiting Lu, Philip Harding, Kanthashree Mysore Sathyendra, Sibo Tong, Xuandi Fu, Jing Liu, Feng-Ju Chang, Simon Wiesler, Grant P. Strimel, 
Model-Internal Slot-triggered Biasing for Domain Expansion in Neural Transducer ASR Models.
Interspeech2023
Martin Radfar, Paulina Lyskawa, Brandon Trujillo, Yi Xie, Kai Zhen, Jahn Heymann, Denis Filimonov, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris, 
Conmer: Streaming Conformer Without Self-attention for Interactive Voice Assistants.
ICASSP2022
Anastasios Alexandridis, Grant P. Strimel, Ariya Rastrow, Pavel Kveton, Jon Webb, Maurizio Omologo, Siegfried Kunzmann, Athanasios Mouchtaris, 
Caching Networks: Capitalizing on Common Speech for ASR.
ICASSP2022
Anastasios Alexandridis, Kanthashree Mysore Sathyendra, Grant P. Strimel, Pavel Kveton, Jon Webb, Athanasios Mouchtaris, 
TINYS2I: A Small-Footprint Utterance Classification Model with Contextual Support for On-Device SLU.
ICASSP2022
Xuandi Fu, Feng-Ju Chang, Martin Radfar, Kai Wei, Jing Liu, Grant P. Strimel, Kanthashree Mysore Sathyendra, 
Multi-Task RNN-T with Semantic Decoder for Streamable Spoken Language Understanding.
ICASSP2022
Kanthashree Mysore Sathyendra, Thejaswi Muniyappa, Feng-Ju Chang, Jing Liu, Jinru Su, Grant P. Strimel, Athanasios Mouchtaris, Siegfried Kunzmann, 
Contextual Adapters for Personalized Speech Recognition in Neural Transducers.
ICASSP2022
Kai Wei, Dillon Knox, Martin Radfar, Thanh Tran, Markus Müller, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris, Maurizio Omologo, 
A Neural Prosody Encoder for End-to-End Dialogue Act Classification.
Interspeech2022
Christin Jose, Joe Wang, Grant P. Strimel, Mohammad Omar Khursheed, Yuriy Mishchenko, Brian Kulis, 
Latency Control for Keyword Spotting.
Interspeech2022
Martin Radfar, Rohit Barnwal, Rupak Vignesh Swaminathan, Feng-Ju Chang, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris, 
ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition.
Interspeech2022
Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian John King, Ariya Rastrow, Athanasios Mouchtaris, Grant P. Strimel, 
Compute Cost Amortized Transformer for Streaming ASR.
AAAI2022
Thanh Tran, Kai Wei, Weitong Ruan, Ross McGowan, Nathan Susanj, Grant P. Strimel, 
Adaptive Global-Local Context Fusion for Multi-Turn Spoken Language Understanding.
ICASSP2021
Jon Macoskey, Grant P. Strimel, Ariya Rastrow, 
Bifocal Neural ASR: Exploiting Keyword Spotting for Inference Optimization.
TASLP2024
Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang 0001, 
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research.
ICLR2024
Qianqian Dong, Zhiying Huang, Qi Tian 0001, Chen Xu 0008, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li 0001, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu 0015, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang 0002, 
PolyVoice: Language Models for Speech to Speech Translation.
ACL2024
Zhichao Huang, Chutong Meng, Tom Ko, 
RepCodec: A Speech Representation Codec for Speech Tokenization.
ICASSP2023
Xuxin Cheng, Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Yuexian Zou, 
M3ST: Mix at Three Levels for Speech Translation.
Interspeech2023
Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang 0006, H. Lilian Tang, Mark D. Plumbley, Volkan Kiliç, Wenwu Wang 0001, 
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention.
Interspeech2023
Chutong Meng, Junyi Ao, Tom Ko, Mingxuan Wang, Haizhou Li 0001, 
CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning.
Interspeech2023
Rong Ye, Chengqi Zhao, Tom Ko, Chutong Meng, Tao Wang, Mingxuan Wang, Jun Cao, 
GigaST: A 10,000-hour Pseudo Speech Translation Corpus.
IJCAI2023
Chen Xu 0008, Rong Ye, Qianqian Dong, Chengqi Zhao, Tom Ko, Mingxuan Wang, Tong Xiao, Jingbo Zhu, 
Recent Advances in Direct Speech-to-text Translation.
ACL2023
Chen Xu 0008, Xiaoqian Liu, Xiaowen Liu, Qingxuan Sun, Yuhao Zhang, Murun Yang, Qianqian Dong, Tom Ko, Mingxuan Wang, Tong Xiao, Anxiang Ma, Jingbo Zhu, 
CTC-based Non-autoregressive Speech Translation.
ACL-Findings2023
Dong Zhang, Rong Ye, Tom Ko, Mingxuan Wang, Yaqian Zhou, 
DUB: Discrete Unit Back-translation for Speech Translation.
ICASSP2022
Rui Wang 0073, Junyi Ao, Long Zhou, Shujie Liu 0001, Zhihua Wei 0001, Tom Ko, Qing Li 0001, Yu Zhang 0006, 
Multi-View Self-Attention Based Transformer for Speaker Recognition.
ICASSP2022
Fengpeng Yue, Yan Deng, Lei He 0005, Tom Ko, Yu Zhang 0006, 
Exploring Machine Speech Chain For Domain Adaptation.
Interspeech2022
Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu 0001, Haizhou Li 0001, Tom Ko, Lirong Dai 0001, Jinyu Li 0001, Yao Qian, Furu Wei, 
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.
Interspeech2022
Qibing Bai, Tom Ko, Yu Zhang 0006, 
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis.
Interspeech2022
Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Qibing Bai, Yu Zhang 0006, 
Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation.
Interspeech2022
Rui Wang 0073, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei 0001, Yu Zhang 0006, Tom Ko, Haizhou Li 0001, 
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT.
ACL2022
Junyi Ao, Rui Wang 0073, Long Zhou, Chengyi Wang 0002, Shuo Ren, Yu Wu 0012, Shujie Liu 0001, Tom Ko, Qing Li 0001, Yu Zhang 0006, Zhihua Wei 0001, Yao Qian, Jinyu Li 0001, Furu Wei, 
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing.
Interspeech2021
Yangbin Chen, Tom Ko, Jianping Wang 0001, 
A Meta-Learning Approach for User-Defined Spoken Term Classification with Varying Classes and Examples.
Interspeech2021
Qiushi Huang, Tom Ko, H. Lilian Tang, Xubo Liu, Bo Wu 0018, 
Token-Level Supervised Contrastive Learning for Punctuation Restoration.
Interspeech2021
Jingsong Wang, Yuxuan He, Chunyu Zhao, Qijie Shao, Wei-Wei Tu, Tom Ko, Hung-yi Lee, Lei Xie, 
Auto-KWS 2021 Challenge: Task, Datasets, and Baselines.
ICASSP2024
Paula Andrea Pérez-Toro, Judith Dineley, Agnieszka Kaczkowska, Pauline Conde, Yuezhou Zhang, Faith Matcham, Sara Siddi, Josep Maria Haro, Stuart Bruce, Til Wykes, Raquel Bailón, Srinivasan Vairavan, Richard J. B. Dobson, Andreas K. Maier, Elmar Nöth, Juan Rafael Orozco-Arroyave, Vaibhav A. Narayan, Nicholas Cummins, 
Longitudinal Modeling of Depression Shifts Using Speech and Language.
Interspeech2023
Edward L. Campbell, Judith Dineley, Pauline Conde, Faith Matcham, Katie M. White, Carolin Oetzmann, Sara Simblett, Stuart Bruce, Amos A. Folarin, Til Wykes, Srinivasan Vairavan, Richard J. B. Dobson, Laura Docío Fernández, Carmen García-Mateo, Vaibhav A. Narayan, Matthew Hotopf, Nicholas Cummins, 
Classifying depression symptom severity: Assessment of speech representations in personalized and generalized machine learning models.
Interspeech2023
Judith Dineley, Ewan Carr, Faith Matcham, Johnny Downs, Richard J. B. Dobson, Thomas F. Quatieri, Nicholas Cummins, 
Towards robust paralinguistic assessment for real-world mobile health (mHealth) monitoring: an initial study of reverberation effects on speech.
Interspeech2023
Salvatore Fara, Orlaith Hickey, Alexandra Livia Georgescu, Stefano Goria, Emilia Molimpakis, Nicholas Cummins, 
Bayesian Networks for the robust and unbiased prediction of depression and its symptoms utilizing speech and multimodal data.
Interspeech2022
Salvatore Fara, Stefano Goria, Emilia Molimpakis, Nicholas Cummins, 
Speech and the n-Back task as a lens into depression. How combining both may allow us to isolate different core symptoms of depression.
Interspeech2022
Bahman Mirheidari, André Bittar, Nicholas Cummins, Johnny Downs, Helen L. Fisher, Heidi Christensen, 
Automatic Detection of Expressed Emotion from Five-Minute Speech Samples: Challenges and Opportunities.
TASLP2021
Kazi Nazmul Haque, Rajib Rana, Jiajun Liu, John H. L. Hansen, Nicholas Cummins, Carlos Busso, Björn W. Schuller, 
Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation Using Fewer Labelled Audio Data.
ICASSP2021
Chao Li, Boyang Chen, Ziping Zhao 0001, Nicholas Cummins, Björn W. Schuller, 
Hierarchical Attention-Based Temporal Convolutional Networks for EEG-Based Emotion Recognition.
Interspeech2021
Judith Dineley, Grace Lavelle, Daniel Leightley, Faith Matcham, Sara Siddi, Maria Teresa Peñarrubia-María, Katie M. White, Alina Ivan, Carolin Oetzmann, Sara Simblett, Erin Dawe-Lane, Stuart Bruce, Daniel Stahl, Yatharth Ranjan, Zulqarnain Rashid, Pauline Conde, Amos A. Folarin, Josep Maria Haro, Til Wykes, Richard J. B. Dobson, Vaibhav A. Narayan, Matthew Hotopf, Björn W. Schuller, Nicholas Cummins, RADAR-CNS Consortium, 
Remote Smartphone-Based Speech Collection: Acceptance and Barriers in Individuals with Major Depressive Disorder.
ICASSP2020
Ziping Zhao 0001, Zhongtian Bao, Zixing Zhang 0001, Nicholas Cummins, Haishuai Wang, Björn W. Schuller, 
Hierarchical Attention Transfer Networks for Depression Assessment from Speech.
Interspeech2020
Merlin Albes, Zhao Ren, Björn W. Schuller, Nicholas Cummins, 
Squeeze for Sneeze: Compact Neural Networks for Cold and Flu Recognition.
Interspeech2020
Alice Baird, Nicholas Cummins, Sebastian Schnieder, Jarek Krajewski, Björn W. Schuller, 
An Evaluation of the Effect of Anxiety on Speech - Computational Prediction of Anxiety from Sustained Vowels.
Interspeech2020
Nicholas Cummins, Yilin Pan, Zhao Ren, Julian Fritsch, Venkata Srikanth Nallanthighal, Heidi Christensen, Daniel Blackburn, Björn W. Schuller, Mathew Magimai-Doss, Helmer Strik, Aki Härmä, 
A Comparison of Acoustic and Linguistics Methodologies for Alzheimer's Dementia Recognition.
Interspeech2020
Adria Mallol-Ragolta, Nicholas Cummins, Björn W. Schuller, 
An Investigation of Cross-Cultural Semi-Supervised Learning for Continuous Affect Recognition.
Interspeech2020
Zhao Ren, Jing Han 0010, Nicholas Cummins, Björn W. Schuller, 
Enhancing Transferability of Black-Box Adversarial Attacks via Lifelong Learning for Speech Emotion Recognition Models.
Interspeech2020
Ziping Zhao 0001, Qifei Li, Nicholas Cummins, Bin Liu 0041, Haishuai Wang, Jianhua Tao 0001, Björn W. Schuller, 
Hybrid Network Feature Extraction for Depression Assessment from Speech.
ICASSP2019
Lukas Stappen, Nicholas Cummins, Eva-Maria Meßner, Harald Baumeister, Judith Dineley, Björn W. Schuller, 
Context Modelling Using Hierarchical Attention Networks for Sentiment and Self-assessed Emotion Detection in Spoken Narratives.
Interspeech2019
Alice Baird, Shahin Amiriparian, Nicholas Cummins, Sarah Sturmbauer, Johanna Janson, Eva-Maria Meßner, Harald Baumeister, Nicolas Rohleder, Björn W. Schuller, 
Using Speech to Predict Sequentially Measured Cortisol Levels During a Trier Social Stress Test.
Interspeech2019
Adria Mallol-Ragolta, Ziping Zhao 0001, Lukas Stappen, Nicholas Cummins, Björn W. Schuller, 
A Hierarchical Attention Network-Based Approach for Depression Detection from Transcribed Clinical Interviews.
Interspeech2019
Maximilian Schmitt, Nicholas Cummins, Björn W. Schuller, 
Continuous Emotion Recognition in Speech - Do We Need Recurrence?
TASLP2024
Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen, 
A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training.
ICASSP2024
Yurii Iotov, Sidsel Marie Nørholm, Peter John McCutcheon, Mads Græsbøll Christensen, 
Improving Speech Attenuation in Headphones using Harmonic Model Decomposition and Multiple-Frequency ANC.
SpeechComm2023
Alfredo Esquivel Jaramillo, Jesper Kjær Nielsen, Mads Græsbøll Christensen, 
An adaptive autoregressive pre-whitener for speech and acoustic signals based on parametric NMF.
TASLP2023
Jesper Kjær Nielsen, Mads Græsbøll Christensen, Jesper Bünsow Boldt, 
An Analysis of Traditional Noise Power Spectral Density Estimators Based on the Gaussian Stochastic Volatility Model.
TASLP2023
Junqing Zhang, Liming Shi, Mads Græsbøll Christensen, Wen Zhang 0002, Lijun Zhang 0004, Jingdong Chen, 
CGMM-Based Sound Zone Generation Using Robust Pressure Matching With ATF Perturbation Constraints.
ICASSP2023
Shuai Tao, Himavanth Reddy, Jesper Rindom Jensen, Mads Græsbøll Christensen, 
Frequency Bin-Wise Single Channel Speech Presence Probability Estimation Using Multiple DNNs.
Interspeech2023
Debang Liu, Tianqi Zhang, Mads Græsbøll Christensen, Ying Wei, Zeliang An, 
Audio-Visual Fusion using Multiscale Temporal Convolutional Attention for Time-Domain Speech Separation.
ICASSP2022
Yurii Iotov, Sidsel Marie Nørholm, Valiantsin Belyi, Mads Dyrholm, Mads Græsbøll Christensen, 
Computationally Efficient Fixed-Filter ANC for Speech Based on Long-Term Prediction for Headphone Applications.
ICASSP2022
Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen, 
A Bayesian Permutation Training Deep Representation Learning Method for Speech Enhancement with Variational Autoencoder.
SpeechComm2021
Amir Hossein Poorjam, Mathew Shaji Kavalekalam, Liming Shi, Yordan P. Raykov, Jesper Rindom Jensen, Max A. Little, Mads Græsbøll Christensen, 
Automatic quality control and enhancement for voice-based remote Parkinson's disease detection.
TASLP2021
Liming Shi, Taewoong Lee, Lijun Zhang 0004, Jesper Kjær Nielsen, Mads Græsbøll Christensen, 
Generation of Personal Sound Zones With Physical Meaningful Constraints and Conjugate Gradient Method.
ICASSP2021
Yang Xiang, Liming Shi, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen, 
A Novel NMF-HMM Speech Enhancement Algorithm Based on Poisson Mixture Model.
Interspeech2021
Alfredo Esquivel Jaramillo, Jesper Kjær Nielsen, Mads Græsbøll Christensen, 
Speech Decomposition Based on a Hybrid Speech Model and Optimal Segmentation.
SpeechComm2020
Jesper Rindom Jensen, Sam Karimian-Azari, Mads Græsbøll Christensen, Jacob Benesty, 
Harmonic beamformers for speech enhancement and dereverberation in the time domain.
ICASSP2020
Zihao Cui, Changchun Bao, Jesper Kjær Nielsen, Mads Græsbøll Christensen, 
Autoregressive Parameter Estimation with DNN-Based Pre-Processing.
ICASSP2020
Alfredo Esquivel Jaramillo, Andreas Jakobsson, Jesper Kjær Nielsen, Mads Græsbøll Christensen, 
Robust Fundamental Frequency Estimation in Coloured Noise.
ICASSP2020
Liming Shi, Taewoong Lee, Lijun Zhang 0004, Jesper Kjær Nielsen, Mads Græsbøll Christensen, 
A Fast Reduced-Rank Sound Zone Control Algorithm Using The Conjugate Gradient Method.
Interspeech2020
Yang Xiang, Liming Shi, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen, 
An NMF-HMM Speech Enhancement Method Based on Kullback-Leibler Divergence.
TASLP2019
Martin Weiss Hansen, Jesper Rindom Jensen, Mads Græsbøll Christensen, 
Estimation of Fundamental Frequencies in Stereophonic Music Mixtures.
TASLP2019
Mathew Shaji Kavalekalam, Jesper Kjær Nielsen, Jesper Bünsow Boldt, Mads Græsbøll Christensen, 
Model-Based Speech Enhancement for Intelligibility Improvement in Binaural Hearing Aids.
ICASSP2024
Paula Andrea Pérez-Toro, Judith Dineley, Agnieszka Kaczkowska, Pauline Conde, Yuezhou Zhang, Faith Matcham, Sara Siddi, Josep Maria Haro, Stuart Bruce, Til Wykes, Raquel Bailón, Srinivasan Vairavan, Richard J. B. Dobson, Andreas K. Maier, Elmar Nöth, Juan Rafael Orozco-Arroyave, Vaibhav A. Narayan, Nicholas Cummins, 
Longitudinal Modeling of Depression Shifts Using Speech and Language.
SpeechComm2023
Paula Andrea Pérez-Toro, Tomás Arias-Vergara, Philipp Klumpp, Juan Camilo Vásquez-Correa, Maria Schuster, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Depression assessment in people with Parkinson's disease: The combination of acoustic features and natural language processing.
ICASSP2023
Paula Andrea Pérez-Toro, Dalia Rodríguez-Salas, Tomás Arias-Vergara, Sebastian P. Bayerl, Philipp Klumpp, Korbinian Riedhammer, Maria Schuster, Elmar Nöth, Andreas K. Maier, Juan Rafael Orozco-Arroyave, 
Transferring Quantified Emotion Knowledge for the Detection of Depression in Alzheimer's Disease Using Forestnets.
Interspeech2023
Soroosh Tayebi Arasteh, Cristian David Ríos-Urrego, Elmar Nöth, Andreas Maier 0001, Seung Hee Yang, Jan Rusz, Juan Rafael Orozco-Arroyave, 
Federated Learning for Secure Development of AI Models for Parkinson's Disease Detection Using Speech from Different Languages.
Interspeech2023
Tomás Arias-Vergara, Elizabeth Londoño-Mora, Paula Andrea Pérez-Toro, Maria Schuster, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas Maier 0001, 
Measuring Phonological Precision in Children with Cleft Lip and Palate.
Interspeech2023
Daniel Escobar-Grisales, Tomás Arias-Vergara, Cristian David Ríos-Urrego, Elmar Nöth, Adolfo M. García, Juan Rafael Orozco-Arroyave, 
An Automatic Multimodal Approach to Analyze Linguistic and Acoustic Cues on Parkinson's Disease Patients.
Interspeech2023
Paula Andrea Pérez-Toro, Tomás Arias-Vergara, Franziska Braun, Florian Hönig, Carlos Andrés Tobón-Quintero, David Aguillón, Francisco Lopera, Liliana Hincapié-Henao, Maria Schuster, Korbinian Riedhammer, Andreas Maier 0001, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Automatic Assessment of Alzheimer's across Three Languages Using Speech and Language Features.
Interspeech2023
Cristian David Ríos-Urrego, Jan Rusz, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Automatic Classification of Hypokinetic and Hyperkinetic Dysarthria based on GMM-Supervectors.
Interspeech2022
Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas K. Maier, Seung Hee Yang, 
Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition.
Interspeech2022
Paula Andrea Pérez-Toro, Philipp Klumpp, Abner Hernandez, Tomas Arias, Patricia Lillo, Andrea Slachevsky, Adolfo Martín García, Maria Schuster, Andreas K. Maier, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Alzheimer's Detection from English to Spanish Using Acoustic and Linguistic Embeddings.
Interspeech2022
P. Schäfer, Paula Andrea Pérez-Toro, Philipp Klumpp, Juan Rafael Orozco-Arroyave, Elmar Nöth, Andreas K. Maier, A. Abad, Maria Schuster, Tomás Arias-Vergara, 
CoachLea: an Android Application to Evaluate the Speech Production and Perception of Children with Hearing Loss.
ICASSP2021
Paula Andrea Pérez-Toro, Juan Camilo Vásquez-Correa, Tomás Arias-Vergara, Philipp Klumpp, M. Sierra-Castrillón, M. E. Roldán-López, David Aguillón, Liliana Hincapié-Henao, Carlos Andrés Tobón-Quintero, Tobias Bocklet, Maria Schuster, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Acoustic and Linguistic Analyses to Assess Early-Onset and Genetic Alzheimer's Disease.
ICASSP2021
Juan Camilo Vásquez-Correa, Tomás Arias-Vergara, Philipp Klumpp, Paula Andrea Pérez-Toro, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
End-2-End Modeling of Speech and Gait from Patients with Parkinson's Disease: Comparison Between High Quality Vs. Smartphone Data.
Interspeech2021
Philipp Klumpp, Tobias Bocklet, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Sebastian P. Bayerl, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
The Phonetic Footprint of Covid-19?
Interspeech2021
Paula Andrea Pérez-Toro, Sebastian P. Bayerl, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Philipp Klumpp, Maria Schuster, Elmar Nöth, Juan Rafael Orozco-Arroyave, Korbinian Riedhammer, 
Influence of the Interviewer on the Automatic Assessment of Alzheimer's Disease in the Context of the ADReSSo Challenge.
Interspeech2021
Juan Camilo Vásquez-Correa, Julian Fritsch, Juan Rafael Orozco-Arroyave, Elmar Nöth, Mathew Magimai-Doss, 
On Modeling Glottal Source Information for Phonation Assessment in Parkinson's Disease.
SpeechComm2020
Juan Camilo Vásquez-Correa, Tomás Arias-Vergara, Maria Schuster, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Parallel Representation Learning for the Classification of Pathological Speech: Studies on Parkinson's Disease and Cleft Lip and Palate.
ICASSP2020
Juan Camilo Vásquez-Correa, Tobias Bocklet, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Comparison of User Models Based on GMM-UBM and I-Vectors for Speech, Handwriting, and Gait Assessment of Parkinson's Disease Patients.
Interspeech2020
Philipp Klumpp, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Florian Hönig, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Surgical Mask Detection with Deep Recurrent Phonetic Models.
Interspeech2019
Tomás Arias-Vergara, Juan Rafael Orozco-Arroyave, Milos Cernak, Sandra Gollwitzer, Maria Schuster, Elmar Nöth, 
Phone-Attribute Posteriors to Evaluate the Speech of Cochlear Implant Users.
TASLP2024
Michele Panariello, Natalia A. Tomashenko, Xin Wang 0037, Xiaoxiao Miao, Pierre Champion, Hubert Nourtel, Massimiliano Todisco, Nicholas W. D. Evans, Emmanuel Vincent 0001, Junichi Yamagishi, 
The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation.
Interspeech2023
Sewade Ogun, Vincent Colotte, Emmanuel Vincent 0001, 
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS.
Interspeech2023
Prerak Srivastava, Antoine Deleforge, Archontis Politis, Emmanuel Vincent 0001, 
How to (Virtually) Train Your Speaker Localizer.
TASLP2022
Brij Mohan Lal Srivastava, Mohamed Maouche, Md. Sahidullah, Emmanuel Vincent 0001, Aurélien Bellet, Marc Tommasi, Natalia A. Tomashenko, Xin Wang 0037, Junichi Yamagishi, 
Privacy and Utility of X-Vector Based Speaker Anonymization.
ICASSP2022
Michel Olvera, Emmanuel Vincent 0001, Gilles Gasso, 
On The Impact of Normalization Strategies in Unsupervised Adversarial Domain Adaptation for Acoustic Scene Classification.
Interspeech2022
Mohamed Maouche, Brij Mohan Lal Srivastava, Nathalie Vauquier, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent 0001, 
Enhancing Speech Privacy with Slicing.
Interspeech2021
Sunit Sivasankaran, Emmanuel Vincent 0001, Dominique Fohr, 
Explaining Deep Learning Models for Speech Enhancement.
ICASSP2020
Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent 0001, 
Filterbank Design for End-to-end Speech Separation.
ICASSP2020
Sunit Sivasankaran, Emmanuel Vincent 0001, Dominique Fohr, 
SLOGD: Speaker Location Guided Deflation Approach to Speech Separation.
ICASSP2020
Brij Mohan Lal Srivastava, Nathalie Vauquier, Md. Sahidullah, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent 0001, 
Evaluating Voice Conversion-Based Privacy Protection against Informed Attackers.
ICASSP2020
Nicolas Turpault, Romain Serizel, Emmanuel Vincent 0001, 
Limitations of Weak Labels for Embedding and Tagging.
Interspeech2020
Samuele Cornell, Maurizio Omologo, Stefano Squartini, Emmanuel Vincent 0001, 
Detecting and Counting Overlapping Speakers in Distant Speech Scenarios.
Interspeech2020
Mathieu Hu, Laurent Pierron, Emmanuel Vincent 0001, Denis Jouvet, 
Kaldi-Web: An Installation-Free, On-Device Speech Recognition System.
Interspeech2020
Mohamed Maouche, Brij Mohan Lal Srivastava, Nathalie Vauquier, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent 0001, 
A Comparative Study of Speech Anonymization Metrics.
Interspeech2020
Manuel Pariente, Samuele Cornell, Joris Cosentino, Sunit Sivasankaran, Efthymios Tzinis, Jens Heitkaemper, Michel Olvera, Fabian-Robert Stöter, Mathieu Hu, Juan M. Martín-Doñas, David Ditter, Ariel Frank, Antoine Deleforge, Emmanuel Vincent 0001, 
Asteroid: The PyTorch-Based Audio Source Separation Toolkit for Researchers.
Interspeech2020
Imran A. Sheikh, Emmanuel Vincent 0001, Irina Illina, 
On Semi-Supervised LF-MMI Training of Acoustic Models with Limited Data.
Interspeech2020
Brij Mohan Lal Srivastava, Natalia A. Tomashenko, Xin Wang 0037, Emmanuel Vincent 0001, Junichi Yamagishi, Mohamed Maouche, Aurélien Bellet, Marc Tommasi, 
Design Choices for X-Vector Based Speaker Anonymization.
Interspeech2020
Natalia A. Tomashenko, Brij Mohan Lal Srivastava, Xin Wang 0037, Emmanuel Vincent 0001, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Jose Patino 0001, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco, 
Introducing the VoicePrivacy Initiative.
Interspeech2020
M. A. Tugtekin Turan, Emmanuel Vincent 0001, Denis Jouvet, 
Achieving Multi-Accent ASR via Unsupervised Acoustic Model Adaptation.
SpeechComm2019
Nancy Bertin, Ewen Camberlein, Romain Lebarbenchon, Emmanuel Vincent 0001, Sunit Sivasankaran, Irina Illina, Frédéric Bimbot, 
VoiceHome-2, an extended corpus for multichannel speech processing in real homes.
TASLP2024
Tsubasa Ochiai, Kazuma Iwamoto, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, 
Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance.
TASLP2024
Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki, Shoji Makino, 
Blind and Spatially-Regularized Online Joint Optimization of Source Separation, Dereverberation, and Noise Reduction.
ICASSP2024
Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, 
How Does End-To-End Speech Recognition Training Impact Speech Enhancement Artifacts?
ICASSP2024
Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Shoko Araki, Jan Cernocký, 
Target Speech Extraction with Pre-Trained Self-Supervised Learning Models.
TASLP2023
Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Yasunori Ohishi, Shoko Araki, 
SoundBeam: Target Sound Extraction Conditioned on Sound-Class Labels and Enrollment Clues for Increased Performance and Continuous Learning.
Interspeech2023
Shoko Araki, Ayako Yamamoto, Tsubasa Ochiai, Kenichi Arai, Atsunori Ogawa, Tomohiro Nakatani, Toshio Irino, 
Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine.
Interspeech2023
Marc Delcroix, Naohiro Tawara, Mireia Díez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukás Burget, Shoko Araki, 
Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization.
ICASSP2022
Atsunori Ogawa, Naohiro Tawara, Marc Delcroix, Shoko Araki, 
Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models.
Interspeech2022
Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, 
How bad are artifacts?: Analyzing the impact of speech enhancement errors on ASR.
ICASSP2021
Julio Wissing, Benedikt T. Boenninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura, 
Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain.
Interspeech2021
Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Shoko Araki, 
Few-Shot Learning of New Sound Classes for Target Sound Extraction.
Interspeech2021
Christopher Schymura, Benedikt T. Bönninghoff, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa, 
PILOT: Introducing Transformers for Probabilistic Sound Event Localization.
Interspeech2021
Ayako Yamamoto, Toshio Irino, Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, 
Comparison of Remote Experiments Using Crowdsourcing and Laboratory Experiments on Speech Intelligibility.
SpeechComm2020
Katsuhiko Yamamoto, Toshio Irino, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani, 
GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech.
ICASSP2020
Marc Delcroix, Tsubasa Ochiai, Katerina Zmolíková, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki, 
Improving Speaker Discrimination of Target Speech Extraction With Time-Domain Speakerbeam.
ICASSP2020
Satoru Emura, Hiroshi Sawada, Shoko Araki, Noboru Harada, 
A Frequency-Domain BSS Method Based on ℓ1 Norm, Unitary Constraint, and Cayley Transform.
ICASSP2020
Rintaro Ikeshita, Tomohiro Nakatani, Shoko Araki, 
Overdetermined Independent Vector Analysis.
ICASSP2020
Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani, 
Tackling Real Noisy Reverberant Meetings with All-Neural Source Separation, Counting, and Diarization System.
ICASSP2020
Christopher Schymura, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa, 
A Dynamic Stream Weight Backprop Kalman Filter for Audiovisual Speaker Tracking.
Interspeech2020
Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Toshio Irino, 
Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System.
ICASSP2024
Shansong Liu, Atin Sakkeer Hussain, Chenshuo Sun, Ying Shan, 
Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning.
ICASSP2024
Shansong Liu, Xu Li 0015, Dian Li, Ying Shan, 
HumTrans: A Novel Open-Source Dataset for Humming Melody Transcription and Beyond.
ICASSP2024
Tianjun Mao, Shansong Liu, Yunxuan Zhang, Dian Li, Ying Shan, 
Unified Pretraining Target Based Video-Music Retrieval with Music Rhythm and Video Optical Flow Information.
Interspeech2023
Zhihan Yang, Shansong Liu, Xu Li 0015, Haozhe Wu, Zhiyong Wu 0001, Ying Shan, Jia Jia 0001, 
Prosody Modeling with 3D Visual Information for Expressive Video Dubbing.
TASLP2022
Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
ICASSP2022
Shujie Hu, Shansong Liu, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shoukang Hu, Mingyu Cui, Xunying Liu, Helen Meng, 
Exploiting Cross Domain Acoustic-to-Articulatory Inverted Features for Disordered Speech Recognition.
Interspeech2022
Xu Li 0015, Shansong Liu, Ying Shan, 
A Hierarchical Speaker Representation Framework for One-shot Singing Voice Conversion.
TASLP2021
Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye 0001, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition.
TASLP2021
Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng, 
Recent Progress in the CUHK Dysarthric Speech Recognition System.
TASLP2021
Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech.
ICASSP2021
Shoukang Hu, Xurong Xie, Shansong Liu, Mingyu Cui, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
ICASSP2021
Boyang Xue, Jianwei Yu, Junhao Xu, Shansong Liu, Shoukang Hu, Zi Ye 0001, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Transformer Language Models for Speech Recognition.
ICASSP2021
Zi Ye 0001, Shoukang Hu, Jinchao Li, Xurong Xie, Mengzhe Geng, Jianwei Yu, Junhao Xu, Boyang Xue, Shansong Liu, Xunying Liu, Helen Meng, 
Development of the CUHK Elderly Speech Recognition System for Neurocognitive Disorder Detection Using the DementiaBank Corpus.
Interspeech2021
Jiajun Deng, Fabian Ritter Gutierrez, Shoukang Hu, Mengzhe Geng, Xurong Xie, Zi Ye 0001, Shansong Liu, Jianwei Yu, Xunying Liu, Helen Meng, 
Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition.
Interspeech2021
Mengzhe Geng, Shansong Liu, Jianwei Yu, Xurong Xie, Shoukang Hu, Zi Ye 0001, Zengrui Jin, Xunying Liu, Helen Meng, 
Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition.
Interspeech2021
Zengrui Jin, Mengzhe Geng, Xurong Xie, Jianwei Yu, Shansong Liu, Xunying Liu, Helen Meng, 
Adversarial Data Augmentation for Disordered Speech Recognition.
ICASSP2020
Jianwei Yu, Shi-Xiong Zhang, Jian Wu 0027, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset.
Interspeech2020
Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng, 
Investigation of Data Augmentation Techniques for Disordered Speech Recognition.
Interspeech2020
Shansong Liu, Xurong Xie, Jianwei Yu, Shoukang Hu, Mengzhe Geng, Rongfeng Su, Shi-Xiong Zhang, Xunying Liu, Helen Meng, 
Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition.
ICASSP2019
Shoukang Hu, Max W. Y. Lam, Xurong Xie, Shansong Liu, Jianwei Yu, Xixin Wu, Xunying Liu, Helen Meng, 
Bayesian and Gaussian Process Neural Networks for Large Vocabulary Continuous Speech Recognition.
AAAI2024
Yaoxun Xu, Hangting Chen, Jianwei Yu, Qiaochu Huang, Zhiyong Wu 0001, Shi-Xiong Zhang, Guangzhi Li, Yi Luo 0004, Rongzhi Gu, 
SECap: Speech Emotion Captioning with Large Language Model.
ICASSP2023
Ruize Xu, Ruoxuan Feng, Shi-Xiong Zhang, Di Hu 0001, 
MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning.
Interspeech2023
Yong Xu 0004, Vinay Kothapally, Meng Yu 0003, Shi-Xiong Zhang, Dong Yu 0001, 
Zoneformer: On-device Neural Beamformer For In-car Multi-zone Speech Separation, Enhancement and Echo Cancellation.
ICASSP2022
Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu 0003, Zhenyu Tang 0001, Dinesh Manocha, Dong Yu 0001, 
FAST-RIR: Fast Neural Diffuse Room Impulse Response Generator.
ICASSP2022
Brian Yan, Chunlei Zhang, Meng Yu 0003, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe 0001, Dong Yu 0001, 
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.
Interspeech2022
Vinay Kothapally, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Dong Yu 0001, 
Joint Neural AEC and Beamforming with Double-Talk Detection.
TASLP2021
Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu 0004, Meng Yu 0003, Dong Yu 0001, Jesper Jensen 0001, 
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.
TASLP2021
Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech.
TASLP2021
Zhuohuang Zhang, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Lianwu Chen, Donald S. Williamson, Dong Yu 0001, 
Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation.
ICASSP2021
Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe 0001, Meng Yu 0003, Yong Xu 0004, Shi-Xiong Zhang, Dong Yu 0001, 
Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization.
Interspeech2021
Saurabh Kataria, Shi-Xiong Zhang, Dong Yu 0001, 
Multi-Channel Speaker Verification for Single and Multi-Talker Speech.
Interspeech2021
Xiyun Li, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Jiaming Xu 0001, Bo Xu 0002, Dong Yu 0001, 
MIMO Self-Attentive RNN Beamformer for Multi-Speaker Speech Separation.
Interspeech2021
Helin Wang, Bo Wu, Lianwu Chen, Meng Yu 0003, Jianwei Yu, Yong Xu 0004, Shi-Xiong Zhang, Chao Weng, Dan Su 0002, Dong Yu 0001, 
TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.
Interspeech2021
Yong Xu 0004, Zhuohuang Zhang, Meng Yu 0003, Shi-Xiong Zhang, Dong Yu 0001, 
Generalized Spatio-Temporal RNN Beamformer for Target Speech Separation.
Interspeech2021
Meng Yu 0003, Chunlei Zhang, Yong Xu 0004, Shi-Xiong Zhang, Dong Yu 0001, 
MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment.
ICASSP2020
Yifan Ding, Yong Xu 0004, Shi-Xiong Zhang, Yahuan Cong, Liqiang Wang, 
Self-Supervised Learning for Audio-Visual Speaker Diarization.
ICASSP2020
Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu 0004, Meng Yu 0003, Dan Su 0002, Yuexian Zou, Dong Yu 0001, 
Enhancing End-to-End Multi-Channel Speech Separation Via Spatial Feature Learning.
ICASSP2020
Aswin Shanmugam Subramanian, Chao Weng, Meng Yu 0003, Shi-Xiong Zhang, Yong Xu 0004, Shinji Watanabe 0001, Dong Yu 0001, 
Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives.
ICASSP2020
Jianwei Yu, Shi-Xiong Zhang, Jian Wu 0027, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset.
Interspeech2020
Shansong Liu, Xurong Xie, Jianwei Yu, Shoukang Hu, Mengzhe Geng, Rongfeng Su, Shi-Xiong Zhang, Xunying Liu, Helen Meng, 
Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition.
TASLP2024
Tsubasa Ochiai, Kazuma Iwamoto, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, 
Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance.
ICASSP2024
Kenichi Fujita, Hiroshi Sato, Takanori Ashihara, Hiroki Kanagawa, Marc Delcroix, Takafumi Moriya, Yusuke Ijima, 
Noise-Robust Zero-Shot Text-to-Speech Synthesis Conditioned on Self-Supervised Speech-Representation Model with Adapters.
ICASSP2024
Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, 
How Does End-To-End Speech Recognition Training Impact Speech Enhancement Artifacts?
ICASSP2023
Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, 
Improving Scheduled Sampling for Neural Transducer-Based ASR.
ICASSP2023
Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Hiroshi Sato, Taiga Yamane, Takanori Ashihara, Kohei Matsuura, Takafumi Moriya, 
Leveraging Language Embeddings for Cross-Lingual Self-Supervised Speech Representation Learning.
Interspeech2023
Nobukatsu Hojo, Saki Mizuno, Satoshi Kobashikawa, Ryo Masumura, Mana Ihori, Hiroshi Sato, Tomohiro Tanaka, 
Audio-Visual Praise Estimation for Conversational Video based on Synchronization-Guided Multimodal Transformer.
Interspeech2023
Mana Ihori, Hiroshi Sato, Tomohiro Tanaka, Ryo Masumura, Saki Mizuno, Nobukatsu Hojo, 
Transcribing Speech as Spoken and Written Dual Text Using an Autoregressive Model.
Interspeech2023
Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando, 
End-to-End Joint Target and Non-Target Speakers ASR.
Interspeech2023
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami, 
Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data.
Interspeech2023
Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo, 
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss.
ICASSP2022
Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix, Takahiro Shinozaki, 
Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.
ICASSP2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo, Takafumi Moriya, 
Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition.
Interspeech2022
Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani, 
Listen only to me! How well can target speech extraction handle false alarms?
Interspeech2022
Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, 
How bad are artifacts?: Analyzing the impact of speech enhancement errors on ASR.
Interspeech2022
Ryo Masumura, Yoshihiro Yamazaki, Saki Mizuno, Naoki Makishima, Mana Ihori, Mihiro Uchida, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Shota Orihashi, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando, 
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.
Interspeech2022
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki, 
Streaming Target-Speaker ASR with Neural Transducer.
Interspeech2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.
Interspeech2022
Tomohiro Tanaka, Ryo Masumura, Hiroshi Sato, Mana Ihori, Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, 
Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks.
ICASSP2021
Atsushi Ando, Ryo Masumura, Hiroshi Sato, Takafumi Moriya, Takanori Ashihara, Yusuke Ijima, Tomoki Toda, 
Speech Emotion Recognition Based on Listener Adaptive Models.
ICASSP2021
Takafumi Moriya, Takanori Ashihara, Tomohiro Tanaka, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Yusuke Ijima, Ryo Masumura, Yusuke Shinohara, 
SimpleFlat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.
ICASSP2024
Shefali Garg, Zhouyuan Huo, Khe Chai Sim, Suzan Schwartz, Mason Chua, Alëna Aksënova, Tsendsuren Munkhdalai, Levi King, Darryl Wright, Zion Mengesha, Dongseong Hwang, Tara N. Sainath, Françoise Beaufays, Pedro Moreno Mengibar, 
Improving Speech Recognition for African American English with Audio Classification.
ICASSP2024
Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno 0001, 
Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models.
NAACL2024
Weiran Wang, Rohit Prabhavalkar, Haozhe Shan, Zhong Meng, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li 0028, James Qin, Xingyu Cai, Adam Stooke, Chengjian Zheng, Yanzhang He, Tara N. Sainath, Pedro Moreno Mengibar, 
Massive End-to-end Speech Recognition Models with Time Reduction.
ICASSP2023
Kartik Audhkhasi, Brian Farris, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Modular Conformer Training for Flexible End-to-End ASR.
ICASSP2023
Tongzhou Chen, Cyril Allauzen, Yinghui Huang, Daniel S. Park, David Rybach, W. Ronny Huang, Rodrigo Cabrera, Kartik Audhkhasi, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Michael Riley 0001, 
Large-Scale Language Model Rescoring on Long-Form Data.
Interspeech2023
Zhouyuan Huo, Khe Chai Sim, Dongseong Hwang, Tsendsuren Munkhdalai, Tara N. Sainath, Pedro Moreno Mengibar, 
Re-investigating the Efficient Transfer Learning of Speech Foundation Model using Feature Fusion Methods.
Interspeech2023
Qiujia Li, Bo Li 0028, Dongseong Hwang, Tara N. Sainath, Pedro Moreno Mengibar, 
Modular Domain Adaptation for Conformer-Based Streaming ASR.
ICASSP2022
Zhehuai Chen, Yu Zhang 0033, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Gary Wang, 
Tts4pretrain 2.0: Advancing the use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses.
ICASSP2022
Neeraj Gaur, Tongzhou Chen, Ehsan Variani, Parisa Haghani, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Multilingual Second-Pass Rescoring for Automatic Speech Recognition Systems.
Interspeech2022
Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition.
Interspeech2022
Fadi Biadsy, Youzheng Chen, Xia Zhang, Oleg Rybakov, Andrew Rosenberg, Pedro J. Moreno 0001, 
A Scalable Model Specialization Framework for Training and Inference using Submodels and its Application to Speech Model Personalization.
Interspeech2022
Zhehuai Chen, Yu Zhang 0033, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Ankur Bapna, Heiga Zen, 
MAESTRO: Matched Speech Text Representations through Modality Matching.
Interspeech2022
Gary Wang, Andrew Rosenberg, Bhuvana Ramabhadran, Fadi Biadsy, Jesse Emond, Yinghui Huang, Pedro J. Moreno 0001, 
Non-Parallel Voice Conversion for ASR Augmentation.
ICASSP2021
Rohan Doshi, Youzheng Chen, Liyang Jiang, Xia Zhang, Fadi Biadsy, Bhuvana Ramabhadran, Fang Chu, Andrew Rosenberg, Pedro J. Moreno 0001, 
Extending Parrotron: An End-to-End, Speech Conversion and Speech Recognition Model for Atypical Speech.
ICASSP2021
Neeraj Gaur, Brian Farris, Parisa Haghani, Isabel Leal, Pedro J. Moreno 0001, Manasa Prasad, Bhuvana Ramabhadran, Yun Zhu, 
Mixture of Informed Experts for Multilingual Speech Recognition.
Interspeech2021
Kartik Audhkhasi, Tongzhou Chen, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Mixture Model Attention: Flexible Streaming and Non-Streaming Automatic Speech Recognition.
Interspeech2021
Zhehuai Chen, Bhuvana Ramabhadran, Fadi Biadsy, Xia Zhang, Youzheng Chen, Liyang Jiang, Fang Chu, Rohan Doshi, Pedro J. Moreno 0001, 
Conformer Parrotron: A Faster and Stronger End-to-End Speech Conversion and Recognition Model for Atypical Speech.
Interspeech2021
Zhehuai Chen, Andrew Rosenberg, Yu Zhang 0033, Heiga Zen, Mohammadreza Ghodsi, Yinghui Huang, Jesse Emond, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation.
Interspeech2021
Isabel Leal, Neeraj Gaur, Parisa Haghani, Brian Farris, Pedro J. Moreno 0001, Manasa Prasad, Bhuvana Ramabhadran, Yun Zhu, 
Self-Adaptive Distillation for Multilingual Speech Recognition: Leveraging Student Independence.
ICASSP2020
Ehsan Variani, Tongzhou Chen, James Apfel, Bhuvana Ramabhadran, Seungji Lee, Pedro J. Moreno 0001, 
Neural Oracle Search on N-Best Hypotheses.
ICLR2024
Alon Ziv, Itai Gat, Gaël Le Lan, Tal Remez, Felix Kreuk, Jade Copet, Alexandre Défossez, Gabriel Synnaeve, Yossi Adi, 
Masked Audio Generation using a Single Non-Autoregressive Transformer.
Interspeech2023
Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux, 
Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis.
NeurIPS2023
Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Défossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz 0001, Yossi Adi, 
Textually Pretrained Speech Language Models.
NeurIPS2023
Robin San Roman, Yossi Adi, Antoine Deleforge, Romain Serizel, Gabriel Synnaeve, Alexandre Défossez, 
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion.
ICLR2023
Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi, 
AudioGen: Textually Guided Audio Generation.
EMNLP2023
Robin Algayres, Yossi Adi, Tu Anh Nguyen, Jade Copet, Gabriel Synnaeve, Benoît Sagot, Emmanuel Dupoux, 
Generative Spoken Language Model based on continuous word-sized audio tokens.
ICASSP2022
Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert, 
Pseudo-Labeling for Massively Multilingual Speech Recognition.
ICASSP2022
Vineel Pratap, Qiantong Xu, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert, 
Word Order does not Matter for Speech Recognition.
ICASSP2021
Chaitanya Talnikar, Tatiana Likhomanenko, Ronan Collobert, Gabriel Synnaeve, 
Joint Masked CPC And CTC Training For ASR.
ICASSP2021
Qiantong Xu, Alexei Baevski, Tatiana Likhomanenko, Paden Tomasello, Alexis Conneau, Ronan Collobert, Gabriel Synnaeve, Michael Auli, 
Self-Training and Pre-Training are Complementary for Speech Recognition.
Interspeech2021
Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee 0001, Ronan Collobert, Gabriel Synnaeve, Michael Auli, 
Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training.
Interspeech2021
Tatiana Likhomanenko, Qiantong Xu, Jacob Kahn, Gabriel Synnaeve, Ronan Collobert, 
slimIPL: Language-Model-Free Iterative Pseudo-Labeling.
Interspeech2021
Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Paden Tomasello, Jacob Kahn, Gilad Avidov, Ronan Collobert, Gabriel Synnaeve, 
Rethinking Evaluation in ASR: Are Our Models Robust Enough?
ICASSP2020
Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdelrahman Mohamed, Emmanuel Dupoux, 
Libri-Light: A Benchmark for ASR with Limited or No Supervision.
ICASSP2020
Andros Tjandra, Chunxi Liu, Frank Zhang 0001, Xiaohui Zhang 0007, Yongqiang Wang 0005, Gabriel Synnaeve, Satoshi Nakamura 0001, Geoffrey Zweig, 
DEJA-VU: Double Feature Presentation and Iterated Loss in Deep Transformer Networks.
Interspeech2020
Alexandre Défossez, Gabriel Synnaeve, Yossi Adi, 
Real Time Speech Enhancement in the Waveform Domain.
Interspeech2020
Da-Rong Liu, Chunxi Liu, Frank Zhang 0001, Gabriel Synnaeve, Yatharth Saraf, Geoffrey Zweig, 
Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model.
Interspeech2020
Vineel Pratap, Anuroop Sriram, Paden Tomasello, Awni Y. Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert, 
Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters.
Interspeech2020
Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Y. Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert, 
Scaling Up Online Speech Recognition Using ConvNets.
Interspeech2020
Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, Ronan Collobert, 
MLS: A Large-Scale Multilingual Dataset for Speech Research.
ACL2024
HyoJung Han, Mohamed Anwar, Juan Pino 0001, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang, 
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception.
ICASSP2023
Jiatong Shi, Yun Tang 0002, Ann Lee 0001, Hirofumi Inaguma, Changhan Wang, Juan Pino 0001, Shinji Watanabe 0001, 
Enhancing Speech-To-Speech Translation with Multiple TTS Targets.
Interspeech2023
Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino 0001, Changhan Wang, 
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation.
Interspeech2023
Jiatong Shi, Yun Tang 0002, Hirofumi Inaguma, Hongyu Gong, Juan Pino 0001, Shinji Watanabe 0001, 
Exploration on HuBERT with Multiple Resolution.
ICML2023
Phuong-Hang Le, Hongyu Gong, Changhan Wang, Juan Pino 0001, Benjamin Lecouteux, Didier Schwab, 
Pre-training for Speech Translation: CTC Meets Optimal Transport.
ACL2023
Yun Tang 0002, Anna Y. Sun, Hirofumi Inaguma, Xinyue Chen, Ning Dong, Xutai Ma, Paden Tomasello, Juan Pino 0001, 
Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks.
ACL2023
Paul-Ambroise Duquenne, Hongyu Gong, Ning Dong, Jingfei Du, Ann Lee 0001, Vedanuj Goswami, Changhan Wang, Juan Pino 0001, Benoît Sagot, Holger Schwenk, 
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations.
ACL2023
Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang 0002, Ann Lee 0001, Shinji Watanabe 0001, Juan Pino 0001, 
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units.
ACL2023
Changhan Wang, Hirofumi Inaguma, Peng-Jen Chen, Ilia Kulikov, Yun Tang 0002, Wei-Ning Hsu, Michael Auli, Juan Pino 0001, 
Simple and Effective Unsupervised Speech Translation.
ACL-Findings2023
Peng-Jen Chen, Kevin Tran, Yilin Yang, Jingfei Du, Justine Kao, Yu-An Chung, Paden Tomasello, Paul-Ambroise Duquenne, Holger Schwenk, Hongyu Gong, Hirofumi Inaguma, Sravya Popuri, Changhan Wang, Juan Pino 0001, Wei-Ning Hsu, Ann Lee 0001, 
Speech-to-Speech Translation for a Real-world Unwritten Language.
Interspeech2022
Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino 0001, Alexei Baevski, Alexis Conneau, Michael Auli, 
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale.
Interspeech2022
Danni Liu, Changhan Wang, Hongyu Gong, Xutai Ma, Yun Tang 0002, Juan Miguel Pino, 
From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation.
Interspeech2022
Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino 0001, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee 0001, 
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation.
ACL2022
Ann Lee 0001, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang 0002, Juan Pino 0001, Wei-Ning Hsu, 
Direct Speech-to-Speech Translation With Discrete Units.
ACL2022
Yun Tang 0002, Hongyu Gong, Ning Dong, Changhan Wang, Wei-Ning Hsu, Jiatao Gu, Alexei Baevski, Xian Li, Abdelrahman Mohamed, Michael Auli, Juan Miguel Pino, 
Unified Speech-Text Pre-training for Speech Translation and Recognition.
NAACL2022
Ann Lee 0001, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Changhan Wang, Sravya Popuri, Yossi Adi, Juan Miguel Pino, Jiatao Gu, Wei-Ning Hsu, 
Textless Speech-to-Speech Translation on Real Data.
ICASSP2021
Xutai Ma, Yongqiang Wang, Mohammad Javad Dousti, Philipp Koehn, Juan Miguel Pino, 
Streaming Simultaneous Speech Translation with Augmented Memory Transformer.
ICASSP2021
Yun Tang 0002, Juan Miguel Pino, Changhan Wang, Xutai Ma, Dmitriy Genzel, 
A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks.
Interspeech2021
Changhan Wang, Anne Wu, Jiatao Gu, Juan Pino 0001, 
CoVoST 2 and Massively Multilingual Speech Translation.
Interspeech2021
Changhan Wang, Anne Wu, Juan Pino 0001, Alexei Baevski, Michael Auli, Alexis Conneau, 
Large-Scale Self- and Semi-Supervised Learning for Speech Translation.
SpeechComm2024
Chao Pan 0001, Jingdong Chen, Jacob Benesty, 
On intrusive speech quality measures and a global SNR based metric.
TASLP2024
Xianrui Wang, Yichen Yang 0010, Andreas Brendel, Tetsuya Ueda, Shoji Makino, Jacob Benesty, Walter Kellermann, Jingdong Chen, 
On Semi-Blind Source Separation-Based Approaches to Nonlinear Echo Cancellation Based on Bilinear Alternating Optimization.
ICASSP2024
Zhiheng Wang, Hongsen He, Jingdong Chen, Jacob Benesty, Yi Yu 0002, 
A Steered Response Power Approach with Bilinear Prediction-Based Trade-Off Prewhitening for Speaker Localization.
ICASSP2024
Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang 0029, Hongbo Lan, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao, 
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
TASLP2023
Junqing Zhang, Liming Shi, Mads Græsbøll Christensen, Wen Zhang 0002, Lijun Zhang 0004, Jingdong Chen, 
CGMM-Based Sound Zone Generation Using Robust Pressure Matching With ATF Perturbation Constraints.
ICASSP2023
Hang Chen, Shilong Wu, Yusheng Dai, Zhe Wang, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge.
ICASSP2023
Hongsen He, Jingdong Chen, Jacob Benesty, Yi Yu 0002, 
A Frequency-Domain Recursive Least-Squares Adaptive Filtering Algorithm Based On A Kronecker Product Decomposition.
ICASSP2023
Gongping Huang, Jacob Benesty, Israel Cohen, Emil Winebrand, Jingdong Chen, Walter Kellermann, 
Switching Kronecker Product Linear Filtering for Multispeaker Adaptive Speech Dereverberation.
ICASSP2023
Xianrui Wang, Andreas Brendel, Gongping Huang, Yichen Yang 0010, Walter Kellermann, Jingdong Chen, 
Spatially Informed Independent vector analysis for Source Extraction based on the convolutive Transfer Function Model.
ICASSP2023
Xianrui Wang, Ningning Pan, Jacob Benesty, Jingdong Chen, 
On Multiple-Input/Binaural-Output Antiphasic Speaker Signal Extraction.
ICASSP2023
Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization And Recognition.
TASLP2022
Zhongxin Bai, Jianyu Wang, Xiao-Lei Zhang 0001, Jingdong Chen, 
End-to-End Speaker Verification via Curriculum Bipartite Ranking Weighted Binary Cross-Entropy.
TASLP2022
Gongping Huang, Jacob Benesty, Israel Cohen, Jingdong Chen, 
Kronecker Product Multichannel Linear Filtering for Adaptive Weighted Prediction Error-Based Speech Dereverberation.
ICASSP2022
Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee 0001, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines And Results.
ICASSP2022
Ningning Pan, Jingdong Chen, Jacob Benesty, 
DNN Based Multiframe Single-Channel Noise Reduction Filters.
Interspeech2022
Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee 0001, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Baocai Yin, Jia Pan, 
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.
Interspeech2022
Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee 0001, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Shifu Xiong, Jianqing Gao, 
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.
Interspeech2021
Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen 0006, Yanxin Hu, Lei Xie 0001, Jian Wu 0027, Hui Bu, Xin Xu, Jun Du, Jingdong Chen, 
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario.
SpeechComm2020
Zhongxin Bai, Xiao-Lei Zhang 0001, Jingdong Chen, 
Cosine metric learning based speaker verification.
TASLP2020
Zhongxin Bai, Xiao-Lei Zhang 0001, Jingdong Chen, 
Speaker Verification by Partial AUC Optimization With Mahalanobis Distance Metric Learning.
TASLP2023
Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li 0001, 
Self-Supervised Training of Speaker Encoder With Multi-Modal Diverse Positive Pairs.
Interspeech2023
Tanmay Khandelwal, Rohan Kumar Das, 
A Multi-Task Learning Framework for Sound Event Detection using High-level Acoustic Characteristics of Sounds.
ICASSP2022
Tianchi Liu 0004, Rohan Kumar Das, Kong Aik Lee, Haizhou Li 0001, 
MFA: TDNN with Multi-Scale Frequency-Channel Attention for Text-Independent Speaker Verification with Short Utterances.
ICASSP2022
Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li 0001, 
Self-Supervised Speaker Recognition with Loss-Gated Learning.
TASLP2021
Jichen Yang, Hongji Wang, Rohan Kumar Das, Yanmin Qian, 
Modified Magnitude-Phase Spectrum Information for Spoofing Detection.
ICASSP2021
Rohan Kumar Das, Jichen Yang, Haizhou Li 0001, 
Data Augmentation with Signal Companding for Detection of Logical Access Attacks.
Interspeech2021
Rohan Kumar Das, Maulik C. Madhavi, Haizhou Li 0001, 
Diagnosis of COVID-19 Using Auditory Acoustic Cues.
ICASSP2020
Rohan Kumar Das, Haizhou Li 0001, 
On the Importance of Vocal Tract Constriction for Speaker Characterization: The Whispered Speech Study.
ICASSP2020
Rohan Kumar Das, Jichen Yang, Haizhou Li 0001, 
Assessing the Scope of Generalized Countermeasures for Anti-Spoofing.
ICASSP2020
Xuehao Zhou, Xiaohai Tian, Grandee Lee, Rohan Kumar Das, Haizhou Li 0001, 
End-to-End Code-Switching TTS with Cross-Lingual Language Model.
Interspeech2020
Tianchi Liu 0004, Rohan Kumar Das, Maulik C. Madhavi, Shengmei Shen, Haizhou Li 0001, 
Speaker-Utterance Dual Attention for Speaker and Utterance Verification.
Interspeech2020
Rohan Kumar Das, Xiaohai Tian, Tomi Kinnunen, Haizhou Li 0001, 
The Attacker's Perspective on Automatic Speaker Verification: An Overview.
Interspeech2020
Xiaoyi Qin, Ming Li 0026, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li 0001, 
The INTERSPEECH 2020 Far-Field Speaker Verification Challenge.
Interspeech2020
Ruijie Tao, Rohan Kumar Das, Haizhou Li 0001, 
Audio-Visual Speaker Recognition with a Cross-Modal Discriminative Network.
Interspeech2020
Zhenzong Wu, Rohan Kumar Das, Jichen Yang, Haizhou Li 0001, 
Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks.
TASLP2019
Jichen Yang, Rohan Kumar Das, Nina Zhou, 
Extraction of Octave Spectra Information for Spoofing Attack Detection.
ICASSP2019
Yi Zhou 0020, Xiaohai Tian, Haihua Xu, Rohan Kumar Das, Haizhou Li 0001, 
Cross-lingual Voice Conversion with Bilingual Phonetic Posteriorgram and Average Modeling.
Interspeech2019
Rohan Kumar Das, Haizhou Li 0001, 
Instantaneous Phase and Long-Term Acoustic Cues for Orca Activity Detection.
Interspeech2019
Rohan Kumar Das, Jichen Yang, Haizhou Li 0001, 
Long Range Acoustic Features for Spoofed Speech Detection.
Interspeech2019
Sarfaraz Jelil, Abhishek Shrivastava, Rohan Kumar Das, S. R. Mahadeva Prasanna, Rohit Sinha 0003, 
SpeechMarker: A Voice Based Multi-Level Attendance Application.
ICASSP2023
Sebastian Ellis, Stefan Goetze, Heidi Christensen, 
Moving Towards Non-Binary Gender Identification Via Analysis of System Errors in Binary Gender Classification.
TASLP2022
Zhengjun Yue, Erfan Loweimi, Heidi Christensen, Jon Barker, Zoran Cvetkovic, 
Acoustic Modelling From Raw Source and Filter Components for Dysarthric Speech Recognition.
ICASSP2022
Zhengjun Yue, Erfan Loweimi, Zoran Cvetkovic, Heidi Christensen, Jon Barker, 
Multi-Modal Acoustic-Articulatory Feature Fusion For Dysarthric Speech Recognition.
Interspeech2022
Samuel Hollands, Daniel Blackburn, Heidi Christensen, 
Evaluating the Performance of State-of-the-Art ASR Systems on Non-Native English using Corpora with Extensive Language Background Variation.
Interspeech2022
Bahman Mirheidari, Daniel Blackburn, Heidi Christensen, 
Automatic cognitive assessment: Combining sparse datasets with disparate cognitive scores.
Interspeech2022
Bahman Mirheidari, André Bittar, Nicholas Cummins, Johnny Downs, Helen L. Fisher, Heidi Christensen, 
Automatic Detection of Expressed Emotion from Five-Minute Speech Samples: Challenges and Opportunities.
Interspeech2022
Zhengjun Yue, Erfan Loweimi, Heidi Christensen, Jon Barker, Zoran Cvetkovic, 
Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs.
SpeechComm2021
Lubna Alhinti, Heidi Christensen, Stuart P. Cunningham, 
Acoustic differences in emotional speech of people with dysarthria.
ICASSP2021
Yilin Pan, Venkata Srikanth Nallanthighal, Daniel Blackburn, Heidi Christensen, Aki Härmä, 
Multi-Task Estimation of Age and Cognitive Decline from Speech.
Interspeech2021
Heidi Christensen, 
Towards Automatic Speech Recognition for People with Atypical Speech.
Interspeech2021
Bahman Mirheidari, Yilin Pan, Daniel Blackburn, Ronan O'Malley, Heidi Christensen, 
Identifying Cognitive Impairment Using Sentence Representation Vectors.
Interspeech2021
Yilin Pan, Bahman Mirheidari, Jennifer M. Harris, Jennifer C. Thompson, Matthew Jones, Julie S. Snowden, Daniel Blackburn, Heidi Christensen, 
Using the Outputs of Different Automatic Speech Recognition Paradigms for Acoustic- and BERT-Based Alzheimer's Dementia Detection Through Spontaneous Speech.
Interspeech2021
Zhengjun Yue, Jon Barker, Heidi Christensen, Cristina McKean, Elaine Ashton, Yvonne Wren, Swapnil Gadgil, Rebecca Bright, 
Parental Spoken Scaffolding and Narrative Skills in Crowd-Sourced Storytelling Samples of Young Children.
ICASSP2020
Feifei Xiong, Jon Barker, Zhengjun Yue, Heidi Christensen, 
Source Domain Data Selection for Improved Transfer Learning Targeting Dysarthric Speech Recognition.
ICASSP2020
Zhengjun Yue, Feifei Xiong, Heidi Christensen, Jon Barker, 
Exploring Appropriate Acoustic and Language Modelling Choices for Continuous Dysarthric Speech Recognition.
Interspeech2020
Lubna Alhinti, Stuart P. Cunningham, Heidi Christensen, 
Recognising Emotions in Dysarthric Speech Using Typical Speech Data.
Interspeech2020
Nicholas Cummins, Yilin Pan, Zhao Ren, Julian Fritsch, Venkata Srikanth Nallanthighal, Heidi Christensen, Daniel Blackburn, Björn W. Schuller, Mathew Magimai-Doss, Helmer Strik, Aki Härmä, 
A Comparison of Acoustic and Linguistics Methodologies for Alzheimer's Dementia Recognition.
Interspeech2020
Bahman Mirheidari, Daniel Blackburn, Ronan O'Malley, Annalena Venneri, Traci Walker, Markus Reuber, Heidi Christensen, 
Improving Cognitive Impairment Classification by Generative Neural Network-Based Feature Augmentation.
Interspeech2020
Yilin Pan, Bahman Mirheidari, Markus Reuber, Annalena Venneri, Daniel Blackburn, Heidi Christensen, 
Improving Detection of Alzheimer's Disease Using Automatic Speech Recognition to Identify High-Quality Segments for More Robust Feature Extraction.
Interspeech2020
Yilin Pan, Bahman Mirheidari, Zehai Tu, Ronan O'Malley, Traci Walker, Annalena Venneri, Markus Reuber, Daniel Blackburn, Heidi Christensen, 
Acoustic Feature Extraction with Interpretable Deep Neural Network for Neurodegenerative Related Disorder Classification.
SpeechComm2024
Nan Li, Longbiao Wang, Meng Ge, Masashi Unoki, Sheng Li 0010, Jianwu Dang 0001, 
Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network.
ICASSP2024
Junjie Li, Ruijie Tao, Zexu Pan, Meng Ge, Shuai Wang 0016, Haizhou Li 0001, 
Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-Talker Speech.
ICASSP2024
Yi Ma, Kong Aik Lee, Ville Hautamäki, Meng Ge, Haizhou Li 0001, 
Gradient Weighting for Speaker Verification in Extremely Low Signal-to-Noise Ratio.
ICASSP2024
Qu Yang, Qianhui Liu, Nan Li, Meng Ge, Zeyang Song, Haizhou Li 0001, 
SVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks.
Interspeech2023
Yanjie Fu, Meng Ge, Honglong Wang, Nan Li, Haoran Yin, Longbiao Wang, Gaoyan Zhang, Jianwu Dang 0001, Chengyun Deng, Fei Wang, 
Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation.
Interspeech2023
Junjie Li, Meng Ge, Zexu Pan, Rui Cao, Longbiao Wang, Jianwu Dang 0001, Shiliang Zhang, 
Rethinking the Visual Cues in Audio-Visual Speaker Extraction.
Interspeech2023
Qinghua Liu, Meng Ge, Zhizheng Wu 0001, Haizhou Li 0001, 
PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network.
Interspeech2023
Honglong Wang, Chengyun Deng, Yanjie Fu, Meng Ge, Longbiao Wang, Gaoyan Zhang, Jianwu Dang 0001, Fei Wang, 
SDNet: Stream-attention and Dual-feature Learning Network for Ad-hoc Array Speech Separation.
TASLP2022
Zexu Pan, Meng Ge, Haizhou Li 0001, 
USEV: Universal Speaker Extraction With Visual Cue.
ICASSP2022
Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang 0001, Haizhou Li 0001, 
L-SpEx: Localized Target Speaker Extraction.
ICASSP2022
Yongjie Lv, Longbiao Wang, Meng Ge, Sheng Li 0010, Chenchen Ding, Lixin Pan, Yuguang Wang 0003, Jianwu Dang 0001, Kiyoshi Honda, 
Compressing Transformer-Based ASR Model by Task-Driven Loss and Attention-Based Multi-Level Feature Distillation.
Interspeech2022
Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang 0001, 
Iterative Sound Source Localization for Unknown Number of Sources.
Interspeech2022
Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang 0001, 
VCSE: Time-Domain Visual-Contextual Speaker Extraction Network.
Interspeech2022
Nan Li, Meng Ge, Longbiao Wang, Masashi Unoki, Sheng Li 0010, Jianwu Dang 0001, 
Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network.
Interspeech2022
Zexu Pan, Meng Ge, Haizhou Li 0001, 
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction.
Interspeech2022
Tongtong Song, Qiang Xu, Meng Ge, Longbiao Wang, Hao Shi, Yongjie Lv, Yuqin Lin, Jianwu Dang 0001, 
Language-specific Characteristic Assistance for Code-switching Speech Recognition.
Interspeech2022
Qiang Xu, Tongtong Song, Longbiao Wang, Hao Shi, Yuqin Lin, Yongjie Lv, Meng Ge, Qiang Yu 0005, Jianwu Dang 0001, 
Self-Distillation Based on High-level Information Supervision for Compressing End-to-End ASR Model.
Interspeech2022
Haoran Yin, Meng Ge, Yanjie Fu, Gaoyan Zhang, Longbiao Wang, Lei Zhang, Lin Qiu, Jianwu Dang 0001, 
MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with Unknown Number of Sound Sources.
ICASSP2021
Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang 0001, Haizhou Li 0001, 
Multi-Stage Speaker Extraction with Utterance and Frame-Level Reference Signals.
ICASSP2021
Nan Li, Longbiao Wang, Masashi Unoki, Sheng Li 0010, Rui Wang 0102, Meng Ge, Jianwu Dang 0001, 
Robust Voice Activity Detection Using a Masked Auditory Encoder Based Convolutional Neural Network.
SpeechComm2024
Sanli Tian, Zehan Li, Zhaobiao Lyv, Gaofeng Cheng, Qing Xiao, Ta Li, Qingwei Zhao, 
Factorized and progressive knowledge distillation for CTC-based ASR models.
TASLP2024
Yifan Chen, Gaofeng Cheng, Runyan Yang, Pengyuan Zhang, Yonghong Yan 0002, 
Interrelate Training and Clustering for Online Speaker Diarization.
TASLP2024
Han Zhu 0004, Gaofeng Cheng, Jindong Wang 0001, Wenxin Hou, Pengyuan Zhang, Yonghong Yan 0002, 
Boosting Cross-Domain Speech Recognition With Self-Supervision.
TASLP2023
Han Zhu 0004, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, Yonghong Yan 0002, 
Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition.
TASLP2022
Gaofeng Cheng, Haoran Miao, Runyan Yang, Keqi Deng, Yonghong Yan 0002, 
ETEH: Unified Attention-Based End-to-End ASR and KWS Architecture.
TASLP2022
Keqi Deng, Gaofeng Cheng, Runyan Yang, Yonghong Yan 0002, 
Alleviating ASR Long-Tailed Problem by Decoupling the Learning of Representation and Classification.
TASLP2022
Changfeng Gao, Gaofeng Cheng, Ta Li, Pengyuan Zhang, Yonghong Yan 0002, 
Self-Supervised Pre-Training for Attention-Based Encoder-Decoder ASR Model.
ICASSP2022
Keqi Deng, Songjun Cao, Yike Zhang, Long Ma, Gaofeng Cheng, Ji Xu, Pengyuan Zhang, 
Improving CTC-Based Speech Recognition Via Knowledge Transferring from Pre-Trained Language Models.
ICASSP2022
Keqi Deng, Zehui Yang, Shinji Watanabe 0001, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang, 
Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models.
Interspeech2022
Yifan Chen, Yifan Guo, Qingxuan Li, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan 0002, 
Interrelate Training and Searching: A Unified Online Clustering Framework for Speaker Diarization.
Interspeech2022
Zehan Li, Haoran Miao, Keqi Deng, Gaofeng Cheng, Sanli Tian, Ta Li, Yonghong Yan 0002, 
Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies.
Interspeech2022
Sanli Tian, Keqi Deng, Zehan Li, Lingxuan Ye, Gaofeng Cheng, Ta Li, Yonghong Yan 0002, 
Knowledge Distillation For CTC-based Speech Recognition Via Consistent Acoustic Representation Learning.
Interspeech2022
Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie 0001, Yonghong Yan 0002, 
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset.
Interspeech2022
Lingxuan Ye, Gaofeng Cheng, Runyan Yang, Zehui Yang, Sanli Tian, Pengyuan Zhang, Yonghong Yan 0002, 
Improving Recognition of Out-of-vocabulary Words in E2E Code-switching ASR by Fusing Speech Generation Methods.
Interspeech2022
Han Zhu 0004, Li Wang, Gaofeng Cheng, Jindong Wang 0001, Pengyuan Zhang, Yonghong Yan 0002, 
Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR.
Interspeech2022
Han Zhu 0004, Jindong Wang 0001, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan 0002, 
Decoupled Federated Learning for ASR with Non-IID Data.
TASLP2021
Runyan Yang, Gaofeng Cheng, Haoran Miao, Ta Li, Pengyuan Zhang, Yonghong Yan 0002, 
Keyword Search Using Attention-Based End-to-End ASR and Frame-Synchronous Phoneme Alignments.
ICASSP2021
Keqi Deng, Gaofeng Cheng, Haoran Miao, Pengyuan Zhang, Yonghong Yan 0002, 
History Utterance Embedding Transformer LM for Speech Recognition.
ICASSP2021
Changfeng Gao, Gaofeng Cheng, Runyan Yang, Han Zhu 0004, Pengyuan Zhang, Yonghong Yan 0002, 
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Text Data.
TASLP2020
Haoran Miao, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan 0002, 
Online Hybrid CTC/Attention End-to-End Automatic Speech Recognition Architecture.
TASLP2023
Vikram C. Mathad, Julie M. Liss, Kathy Chapman, Nancy Scherer, Visar Berisha, 
Consonant-Vowel Transition Models Based on Deep Learning for Objective Evaluation of Articulation.
TASLP2023
Jianwei Zhang, Julie Liss, Suren Jayasuriya, Visar Berisha, 
Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection.
ICASSP2023
Leo Hsu, Visar Berisha, 
Does Human Speech Follow Benford's Law?
ICASSP2023
Lingfeng Xu, Kimberly D. Mueller, Julie Liss, Visar Berisha, 
Decorrelating Language Model Embeddings for Speech-Based Prediction of Cognitive Impairment.
Interspeech2023
Yan Xiong, Visar Berisha, Chaitali Chakrabarti, 
Aligning Speech Enhancement for Improving Downstream Classification Performance.
NeurIPS2023
Jianwei Zhang, Suren Jayasuriya, Visar Berisha, 
Learning Repeatable Speech Embeddings Using An Intra-class Correlation Regularizer.
Interspeech2022
Visar Berisha, Chelsea Krantsevich, Gabriela Stegmann, Shira Hahn, Julie Liss, 
Are reported accuracies in the clinical speech machine learning literature overoptimistic?
Interspeech2022
Kelvin Tran, Lingfeng Xu, Gabriela Stegmann, Julie Liss, Visar Berisha, Rene Utianski, 
Investigating the Impact of Speech Compression on the Acoustics of Dysarthric Speech.
ICASSP2021
Vikram C. Mathad, Nancy Scherer, Kathy Chapman, Julie Liss, Visar Berisha, 
An Attention Model for Hypernasality Prediction in Children with Cleft Palate.
Interspeech2021
Vikram C. Mathad, Tristan J. Mahr, Nancy Scherer, Kathy Chapman, Katherine C. Hustad, Julie Liss, Visar Berisha, 
The Impact of Forced-Alignment Errors on Automatic Pronunciation Evaluation.
Interspeech2021
Jianwei Zhang, Suren Jayasuriya, Visar Berisha, 
Restoring Degraded Speech via a Modified Diffusion Model.
TASLP2020
Michael Saxon, Ayush Tripathi, Yishan Jiao, Julie M. Liss, Visar Berisha, 
Robust Estimation of Hypernasality in Dysarthria With Acoustic Model Likelihood Features.
ICASSP2020
Vikram C. Mathad, Kathy Chapman, Julie Liss, Nancy Scherer, Visar Berisha, 
Deep Learning Based Prediction of Hypernasality for Clinical Applications.
Interspeech2020
Deepak Kadetotad, Jian Meng, Visar Berisha, Chaitali Chakrabarti, Jae-sun Seo, 
Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity.
Interspeech2020
Meredith Moore, Piyush Papreja, Michael Saxon, Visar Berisha, Sethuraman Panchanathan, 
UncommonVoice: A Crowdsourced Dataset of Dysphonic Speech.
ICASSP2019
Jacob Peplinski, Visar Berisha, Julie Liss, Shira Hahn, Jeremy Shefner, Seward B. Rutkove, Kristin Qi, Kerisa Shelton, 
Objective Assessment of Vocal Tremor.
ICASSP2019
Michael Saxon, Julie Liss, Visar Berisha, 
Objective Measures of Plosive Nasalization in Hypernasal Speech.
ICASSP2019
Rohit Voleti, Julie M. Liss, Visar Berisha, 
Investigating the Effects of Word Substitution Errors on Sentence Embeddings.
Interspeech2019
Nichola Lubold, Stephanie A. Borrie, Tyson S. Barrett, Megan M. Willi, Visar Berisha, 
Do Conversational Partners Entrain on Articulatory Precision?
Interspeech2019
Meredith Moore, Michael Saxon, Hemanth Venkateswara, Visar Berisha, Sethuraman Panchanathan, 
Say What? A Dataset for Exploring the Error Patterns That Two ASR Engines Make.
ICASSP2024
Shulin He, Jinjiang Liu, Hao Li 0046, Yang Yang 0121, Fei Chen 0011, Xueliang Zhang 0001, 
3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications.
ICASSP2024
Shulin He, Huaiwen Zhang, Wei Rao, Kanghao Zhang, Yukai Ju, Yang Yang 0121, Xueliang Zhang 0001, 
Hierarchical Speaker Representation for Target Speaker Extraction.
ICASSP2023
Shulin He, Wei Rao, Jinjiang Liu, Jun Chen 0024, Yukai Ju, Xueliang Zhang 0001, Yannan Wang, Shidong Shang, 
Speech Enhancement with Intelligent Neural Homomorphic Synthesis.
TASLP2022
Heming Wang, Xueliang Zhang 0001, DeLiang Wang, 
Fusing Bone-Conduction and Air-Conduction Sensors for Complex-Domain Speech Enhancement.
ICASSP2022
Jinjiang Liu, Xueliang Zhang 0001, 
DRC-NET: Densely Connected Recurrent Convolutional Neural Network for Speech Dereverberation.
ICASSP2022
Heming Wang, Xueliang Zhang 0001, DeLiang Wang, 
Attention-Based Fusion for Bone-Conducted and Air-Conducted Speech Enhancement in the Complex Domain.
ICASSP2022
Yang Yang 0121, Hui Zhang 0031, Xueliang Zhang 0001, Huaiwen Zhang, 
Alleviating the Loss-Metric Mismatch in Supervised Single-Channel Speech Enhancement.
Interspeech2022
Jiahui Pan, Shuai Nie, Hui Zhang 0031, Shulin He, Kanghao Zhang, Shan Liang, Xueliang Zhang 0001, Jianhua Tao 0001, 
Speaker recognition-assisted robust audio deepfake detection.
Interspeech2022
Chenggang Zhang, Jinjiang Liu, Xueliang Zhang 0001, 
LCSM: A Lightweight Complex Spectral Mapping Framework for Stereophonic Acoustic Echo Cancellation.
TASLP2021
Hao Li 0046, DeLiang Wang, Xueliang Zhang 0001, Guanglai Gao, 
Recurrent Neural Networks and Acoustic Features for Frame-Level Signal-to-Noise Ratio Estimation.
ICASSP2021
Ke Tan 0001, Xueliang Zhang 0001, DeLiang Wang, 
Real-Time Speech Enhancement for Mobile Communication Based on Dual-Channel Complex Spectral Mapping.
Interspeech2021
Jinjiang Liu, Xueliang Zhang 0001, 
Inplace Gated Convolutional Recurrent Neural Network for Dual-Channel Speech Enhancement.
Interspeech2021
Kanghao Zhang, Shulin He, Hao Li 0046, Xueliang Zhang 0001, 
DBNet: A Dual-Branch Network Architecture Processing on Spectrum and Waveform for Single-Channel Speech Enhancement.
TASLP2020
Zhihao Du, Xueliang Zhang 0001, Jiqing Han 0001, 
A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement.
ICASSP2020
Shulin He, Hao Li 0046, Xueliang Zhang 0001, 
Speakerfilter: Deep Learning-Based Target Speaker Extraction Using Anchor Speech.
Interspeech2020
Zhihao Du, Jiqing Han 0001, Xueliang Zhang 0001, 
Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition.
Interspeech2020
Hao Li 0046, DeLiang Wang, Xueliang Zhang 0001, Guanglai Gao, 
Frame-Level Signal-to-Noise Ratio Estimation Using Deep Learning.
Interspeech2020
Tianjiao Xu, Hui Zhang 0031, Xueliang Zhang 0001, 
Polishing the Classical Likelihood Ratio Test by Supervised Learning for Voice Activity Detection.
Interspeech2020
Chenggang Zhang, Xueliang Zhang 0001, 
A Robust and Cascaded Acoustic Echo Cancellation Based on Deep Learning.
TASLP2019
Zhong-Qiu Wang, Xueliang Zhang 0001, DeLiang Wang, 
Robust Speaker Localization Guided by Deep Learning-Based Time-Frequency Masking.
TASLP2024
Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda, 
Pretraining and Adaptation Techniques for Electrolaryngeal Speech Recognition.
TASLP2024
Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li 0001, Abdelrahman Mohamed, Shinji Watanabe 0001, Hung-yi Lee, 
A Large-Scale Evaluation of Speech Foundation Models.
ICASSP2024
Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda, 
Electrolaryngeal Speech Intelligibility Enhancement through Robust Linguistic Encoders.
ICASSP2023
Wen-Chin Huang, Benjamin Peloquin, Justine Kao, Changhan Wang, Hongyu Gong, Elizabeth Salesky, Yossi Adi, Ann Lee 0001, Peng-Jen Chen, 
A Holistic Cascade System, Benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation.
ICASSP2023
Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda, 
Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition.
ICASSP2022
Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi, 
Generalization Ability of MOS Prediction Networks.
ICASSP2022
Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda, 
LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech.
ICASSP2022
Wen-Chin Huang, Bence Mark Halpern, Lester Phillip Violeta, Odette Scharenborg, Tomoki Toda, 
Towards Identity Preserving Normal to Dysarthric Voice Conversion.
ICASSP2022
Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe 0001, Tomoki Toda, 
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations.
ICASSP2022
Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda, 
Direct Noisy Speech Modeling for Noisy-To-Noisy Voice Conversion.
Interspeech2022
Wen-Chin Huang, Erica Cooper, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi, 
The VoiceMOS Challenge 2022.
Interspeech2022
Wen-Chin Huang, Dejan Markovic, Alexander Richard, Israel Dejene Gebru, Anjali Menon, 
End-to-End Binaural Speech Synthesis.
Interspeech2022
Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda, 
Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition.
ACL2022
Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-Wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li 0001, Shinji Watanabe 0001, Abdelrahman Mohamed, Hung-yi Lee, 
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities.
TASLP2021
Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda, 
Pretraining Techniques for Sequence-to-Sequence Voice Conversion.
TASLP2021
Hirokazu Kameoka, Wen-Chin Huang, Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Tomoki Toda, 
Many-to-Many Voice Transformer Network.
ICASSP2021
Tomoki Hayashi, Wen-Chin Huang, Kazuhiro Kobayashi, Tomoki Toda, 
Non-Autoregressive Sequence-To-Sequence Voice Conversion.
ICASSP2021
Wen-Chin Huang, Yi-Chiao Wu, Tomoki Hayashi, 
Any-to-One Sequence-to-Sequence Voice Conversion Using Self-Supervised Discrete Speech Representations.
ICASSP2021
Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, Tomoki Toda, 
Speech Recognition by Simply Fine-Tuning BERT.
ICASSP2021
Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda, 
Crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder.
ICASSP2023
Liangjie Huang, Tian Yuan, Yunming Liang, Zeyu Chen, Can Wen, Yanlu Xie, Jinsong Zhang 0001, Dengfeng Ke, 
LIMI-VC: A Light Weight Voice Conversion Model with Mutual Information Disentanglement.
Interspeech2023
Lixia Hao, Qi Gong, Jinsong Zhang 0001, 
The effect of stress on Mandarin tonal perception in continuous speech for Spanish-speaking learners.
Interspeech2023
Ruishan Li, Yingming Gao, Yanlu Xie, Dengfeng Ke, Jinsong Zhang 0001, 
Dual Audio Encoders Based Mandarin Prosodic Boundary Prediction by Using Multi-Granularity Prosodic Representations.
TASLP2022
Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001, 
Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-Resource ASR.
Interspeech2022
Jingwen Cheng, Yuchen Yan, Yingming Gao, Xiaoli Feng, Yannan Wang, Jinsong Zhang 0001, 
A study of production error analysis for Mandarin-speaking Children with Hearing Impairment.
Interspeech2022
Yujia Jin, Yanlu Xie, Jinsong Zhang 0001, 
A VR Interactive 3D Mandarin Pronunciation Teaching Model.
Interspeech2022
Longfei Yang, Jinsong Zhang 0001, Takahiro Shinozaki, 
Self-Supervised Learning with Multi-Target Contrastive Coding for Non-Native Acoustic Modeling of Mispronunciation Verification.
TASLP2021
Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001, 
Tackling Perception Bias in Unsupervised Phoneme Discovery Using DPGMM-RNN Hybrid Model and Functional Load.
Interspeech2021
Linkai Peng, Kaiqi Fu, Binghuai Lin, Dengfeng Ke, Jinsong Zhang 0001, 
A Study on Fine-Tuning wav2vec2.0 Model for the Task of Mispronunciation Detection and Diagnosis.
Interspeech2021
Yuqing Zhang 0003, Zhu Li, Binghuai Lin, Jinsong Zhang 0001, 
A Preliminary Study on Discourse Prosody Encoding in L1 and L2 English Spontaneous Narratives.
Interspeech2021
Yuqing Zhang 0003, Zhu Li, Bin Wu, Yanlu Xie, Binghuai Lin, Jinsong Zhang 0001, 
Relationships Between Perceptual Distinctiveness, Articulatory Complexity and Functional Load in Speech Communication.
Interspeech2020
Wang Dai, Jinsong Zhang 0001, Yingming Gao, Wei Wei, Dengfeng Ke, Binghuai Lin, Yanlu Xie, 
Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism.
Interspeech2020
Dan Du, Xianjin Zhu, Zhu Li, Jinsong Zhang 0001, 
Perception and Production of Mandarin Initial Stops by Native Urdu Speakers.
Interspeech2020
Yingming Gao, Xinyu Zhang, Yi Xu, Jinsong Zhang 0001, Peter Birkholz, 
An Investigation of the Target Approximation Model for Tone Modeling and Recognition in Continuous Mandarin Speech.
Interspeech2020
Binghuai Lin, Liyuan Wang, Xiaoli Feng, Jinsong Zhang 0001, 
Automatic Scoring at Multi-Granularity for L2 Pronunciation.
Interspeech2020
Binghuai Lin, Liyuan Wang, Xiaoli Feng, Jinsong Zhang 0001, 
Joint Detection of Sentence Stress and Phrase Boundary for Prosody.
Interspeech2020
Yanlu Xie, Xiaoli Feng, Boxue Li, Jinsong Zhang 0001, Yujia Jin, 
A Mandarin L2 Learning APP with Mispronunciation Detection and Feedback.
Interspeech2020
Longfei Yang, Kaiqi Fu, Jinsong Zhang 0001, Takahiro Shinozaki, 
Pronunciation Erroneous Tendency Detection with Language Adversarial Represent Learning.
Interspeech2019
Dan Du, Jinsong Zhang 0001, 
The Production of Chinese Affricates /ts/ and /tsh/ by Native Urdu Speakers.
Interspeech2019
Shuju Shi, Chilin Shih, Jinsong Zhang 0001, 
Capturing L1 Influence on L2 Pronunciation by Simulating Perceptual Space Using Acoustic Features.
TASLP2024
Chenfeng Miao, Qingying Zhu, Minchuan Chen, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
EfficientTTS 2: Variational End-to-End Text-to-Speech Synthesis and Voice Conversion.
ICASSP2024
Zeyu Yang, Minchuan Chen, Yanping Li, Wei Hu, Shaojun Wang, Jing Xiao 0006, Zijian Li, 
ESVC: Combining Adaptive Style Fusion and Multi-Level Feature Disentanglement for Expressive Singing Voice Conversion.
ICASSP2024
Ziyang Zhuang, Kun Zou, Chenfeng Miao, Ming Fang, Tao Wei, Zijian Li, Wei Hu, Shaojun Wang, Jing Xiao 0006, 
Improving Attention-Based End-to-End Speech Recognition by Monotonic Alignment Attention Matrix Reconstruction.
ICML2024
Chenfeng Miao, Qingying Zhu, Minchuan Chen, Wei Hu, Zijian Li, Shaojun Wang, Jing Xiao 0006, 
DFlow: A Generative Model Combining Denoising AutoEncoder and Normalizing Flow for High Fidelity Waveform Generation.
Interspeech2023
Minchuan Chen, Chenfeng Miao, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
Exploring multi-task learning and data augmentation in dementia detection with self-supervised pretrained models.
Interspeech2023
Fengyun Tan, Chaofeng Feng, Tao Wei, Shuai Gong, Jinqiang Leng, Wei Chu, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
Improving End-to-End Modeling For Mandarin-English Code-Switching Using Lightweight Switch-Routing Mixture-of-Experts.
TASLP2022
Suliang Bu, Yunxin Zhao, Tuo Zhao, Shaojun Wang, Mei Han, 
Modeling Speech Structure to Improve T-F Masks for Speech Enhancement and Recognition.
Interspeech2022
Chenfeng Miao, Ting Chen, Minchuan Chen, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
A compact transformer-based GAN vocoder.
Interspeech2022
Chenfeng Miao, Kun Zou, Ziyang Zhuang, Tao Wei, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
Towards Efficiently Learning Monotonic Alignments for Attention-based End-to-End Speech Recognition.
Interspeech2022
Zongfeng Quan, Nick J. C. Wang, Wei Chu, Tao Wei, Shaojun Wang, Jing Xiao 0006, 
FFM: A Frame Filtering Mechanism To Accelerate Inference Speed For Conformer In Speech Recognition.
Interspeech2022
Ye Wang, Baishun Ling, Yanmeng Wang, Junhao Xue, Shaojun Wang, Jing Xiao 0006, 
Adversarial Knowledge Distillation For Robust Spoken Language Understanding.
ICASSP2021
Weiwei Jiang, Junjie Li, Minchuan Chen, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
Improving Neural Text Normalization with Partial Parameter Generator and Pointer-Generator Network.
ICASSP2021
Shuang Liang, Chenfeng Miao, Minchuan Chen, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
Unsupervised Learning for Multi-Style Speech Synthesis with Limited Data.
ICASSP2021
Hao Pan, Zhongdi Chao, Jiang Qian, Bojin Zhuang, Shaojun Wang, Jing Xiao 0006, 
Network Pruning Using Linear Dependency Analysis on Feature Maps.
ICASSP2021
Yanmeng Wang, Ye Wang, Xingyu Lou, Wenge Rong, Zhenghong Hao, Shaojun Wang, 
Improving Dialogue Response Generation Via Knowledge Graph Filter.
Interspeech2021
Suliang Bu, Yunxin Zhao, Shaojun Wang, Mei Han, 
Learning Speech Structure to Improve Time-Frequency Masks.
Interspeech2021
Junjie Li, Zhiyu Zhang, Minchuan Chen, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
Improving Polyphone Disambiguation for Mandarin Chinese by Combining Mix-Pooling Strategy and Window-Based Attention.
Interspeech2021
Zhengchen Liu, Chenfeng Miao, Qingying Zhu, Minchuan Chen, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
EfficientSing: A Chinese Singing Voice Synthesis System Using Duration-Free Acoustic Model and HiFi-GAN Vocoder.
ICML2021
Chenfeng Miao, Shuang Liang, Zhengchen Liu, Minchuan Chen, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture.
ICASSP2020
Chenfeng Miao, Shuang Liang, Minchuan Chen, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
Flow-TTS: A Non-Autoregressive Network for Text to Speech Based on Flow.